<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
<article-title>Enhancing Object Detection and Classification in High-Resolution Images Using SAHI Algorithm and Modern Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksii Bychkov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kateryna Merkulova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yelyzaveta Zhabska</string-name>
          <email>y.zhabska@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Yaroshenko</string-name>
          <email>andrii.yaroshenko@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Volodymyrska str. 64/13, Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper presents a novel approach to address the challenges of object detection and classification in high-resolution images by combining the Slicing Aided Hyper Inference (SAHI) algorithm with modern neural networks. The proposed method involves slicing high-resolution images into smaller patches, which are then processed by five state-of-the-art neural networks: YOLOv5, YOLOv8, YOLOX, Torchvision, and RetinaNet. Experimental results on a high-resolution image dataset demonstrate the effectiveness of the proposed approach in terms of both accuracy and efficiency. The influence of various SAHI parameters on the performance of object detection is also investigated. The developed software with a user-friendly interface allows easy adaptation of the proposed approach to a wide range of practical applications. The presented solution offers a promising direction for efficient object detection and classification in high-resolution images.</p>
      </abstract>
      <kwd-group>
<kwd>object detection</kwd>
        <kwd>image classification</kwd>
        <kwd>high-resolution images</kwd>
        <kwd>SAHI algorithm</kwd>
        <kwd>neural networks</kwd>
<kwd>computer vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The increasing availability of high-resolution images in various domains has led to a growing need
for efficient object detection and classification methods. However, processing such images poses
significant challenges due to their large size and the presence of small objects. Traditional object
detection methods often struggle to efficiently handle high-resolution images, resulting in high
computational costs and suboptimal performance. To address these challenges, we propose a novel
approach that combines the Slicing Aided Hyper Inference (SAHI) algorithm with an ensemble of
modern neural networks. The SAHI algorithm involves slicing the high-resolution image into smaller
patches, which are then processed independently by the object detection models. This approach
enables more efficient processing of large images and improves detection of small objects.</p>
      <p>In this paper, we integrate the SAHI algorithm with five state-of-the-art neural networks:
YOLOv5, YOLOv8, YOLOX, Torchvision, and RetinaNet. These networks have demonstrated
impressive performance on various object detection tasks. By combining them with the SAHI
algorithm, we aim to leverage their strengths while addressing the challenges posed by
high-resolution images.</p>
      <p>To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on a
dataset containing high-resolution images of various scenes, such as beaches and bays. The
experiments focus on assessing the accuracy and efficiency of object detection and classification
using different combinations of neural networks and SAHI parameters. Furthermore, we investigate
the influence of various SAHI parameters, such as tile size and overlap, on the performance of object
detection. By systematically varying these parameters, we aim to provide guidelines for their optimal
selection, enabling users to adapt the proposed approach to their specific needs.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>Object detection and classification in high-resolution images have been the focus of numerous
studies in the field of computer vision. Traditional approaches, such as sliding window and image
pyramids, have been widely used for this task. However, these methods suffer
from high computational complexity and limited scalability, making them unsuitable for real-time
applications.</p>
      <p>In recent years, deep learning-based approaches, particularly convolutional neural networks
(CNNs), have revolutionized the field of object detection. Architectures like YOLO (You Only Look
Once), SSD (Single Shot MultiBox Detector), and Faster R-CNN have achieved remarkable
performance on benchmark datasets. However, these networks face several challenges when applied
to high-resolution images.</p>
      <p>
        One of the main limitations of neural networks in processing high-resolution images is their fixed
input size requirement. To accommodate this, images are typically resized or cropped before being
fed into the network. This resizing process can lead to a loss of information, especially
for small objects, which may become too small to be detected after resizing. Moreover, resizing large
images can be time-consuming [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Another challenge faced by neural networks in high-resolution object detection is their difficulty
in detecting small objects. Even when trained on datasets specifically designed for this task, neural
networks often struggle to capture sufficient visual information from small objects, leading to
suboptimal performance. This problem is further exacerbated in high-resolution images, where
objects of interest may occupy only a small portion of the image.</p>
      <p>Furthermore, processing high-resolution images with neural networks demands significant
computational resources, including memory and processing power. As the image size increases, the
number of computations required by the network grows rapidly, making it challenging to
process large images in real time or on resource-constrained devices.</p>
      <p>
        To address these limitations, various approaches have been proposed in the literature. One such
approach is the use of image pyramids [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where the image is resized to multiple scales, and object
detection is performed at each scale. However, this approach can be computationally expensive and
may still miss small objects.
      </p>
      <p>
        Another approach is the use of selective search [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where the image is segmented into regions,
and object detection is performed on each region. Despite the fact that this approach can handle
objects of different sizes, it can be time-consuming and may generate a large number of false
positives.
      </p>
      <p>
        Sliding window techniques have also been employed for object detection in high-resolution
images [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this approach, a window of fixed size is slid over the image, and object detection is
performed at each window location. However, this approach can be computationally expensive,
especially for large images, and may struggle with objects of varying sizes.
      </p>
      <p>
        More recently, the concept of slicing images into smaller patches has gained attention as a
potential solution to the challenges of high-resolution object detection [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. By processing smaller
patches independently, the computational burden can be reduced, and the detection of small objects
can be improved. However, the effectiveness of this approach depends on the proper selection of
patch size and the handling of objects that span multiple patches.
      </p>
      <p>Despite the progress made in object detection and classification in high-resolution images, there
remains a need for more efficient and effective solutions. The proposed approach in this paper aims
to address this need by combining the Slicing Aided Hyper Inference (SAHI) algorithm with an
ensemble of modern neural networks. By leveraging the strengths of both techniques, the proposed
approach aims to overcome the limitations of traditional neural networks and provide a scalable
and accurate solution for object detection in high-resolution images.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset Acquisition</title>
      <p>To evaluate the effectiveness of the proposed approach for object detection and classification in
high-resolution images, a suitable dataset is required. However, acquiring a large-scale dataset of
high-resolution images with annotated objects can be challenging. Publicly available datasets, such as
COCO (Common Objects in Context) and PASCAL VOC, often contain images of relatively low
resolution, which may not adequately represent the challenges faced in real-world high-resolution
object detection tasks.</p>
      <p>To address this issue, a custom dataset was collected specifically for this study. The dataset
consists of high-resolution panoramic images sourced from the web. Panoramic images were chosen
because they offer a wide field of view and capture a large amount of visual information, making
them suitable for testing object detection algorithms in complex scenes.</p>
      <p>The dataset acquisition process involved several steps. First, a list of potential sources for
high-resolution panoramic images was compiled. These sources included online repositories, such as
Gigapan, 360cities, and Google Street View, as well as individual images shared on social
media platforms.</p>
      <p>Next, a set of search queries was formulated to identify relevant images within these sources. The
queries included keywords related to specific scenes, such as beaches, bays, cityscapes, and
landmarks, as well as terms indicating the presence of objects of interest, such as people, vehicles,
and boats.</p>
      <p>The search results were then manually reviewed to select images that met the following criteria:
1. High resolution: the images should have a minimum resolution of 0.1 gigapixels to ensure
that small objects remain visible in sufficient detail.
2. Diversity: the selected images should cover a wide range of scenes, locations, and object types
to assess the generalization capability of the proposed approach.
3. Clarity: the images should be of good quality, with minimal blur, distortion, or other artifacts
that could hinder object detection performance.
4. Licensing: the images should be available under licenses that allow their usage in research
and publication.</p>
      <p>After the initial selection, the images were downloaded in their original resolution and format.
However, it was observed that many of the panoramic images were not provided as a single file but
rather as a set of tiles that needed to be stitched together to form the complete image.</p>
      <p>To address this issue, a custom script was developed to automate the stitching process. The script
utilized the metadata associated with each tile, such as its position and dimensions, to determine the
correct arrangement of the tiles. The tiles were then loaded into memory and combined using image
processing techniques, such as blending and feathering, to create a seamless high-resolution
panorama.</p>
      <p>Once the panoramic images were stitched, they were manually inspected to ensure the quality of
the stitching process. Images with visible seams, misalignments, or other artifacts were discarded,
and additional images were collected to replace them.</p>
      <p>The final dataset consists of 10 high-resolution panoramic images, with an average resolution of
0.7 gigapixels. The images cover a diverse range of scenes, including beaches, bays, cityscapes, and
landmarks, and contain various objects of interest, such as people, vehicles, boats, and buildings.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. SAHI Algorithm</title>
        <p>
          The Slicing Aided Hyper Inference (SAHI) algorithm [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is a key component of the proposed
approach for object detection and classification in high-resolution images. The main idea behind
SAHI is to divide the large input image into smaller, overlapping patches, which can then be
processed independently by the object detection models. This approach has several advantages over
traditional methods that rely on resizing or cropping the image to fit the input size of the neural
network.
        </p>
        <p>By processing smaller patches, the SAHI algorithm can effectively handle high-resolution images
without the need for resizing, which often leads to a loss of information, especially for small objects.
Each patch is fed into the neural network at its original resolution, preserving the fine details
necessary for accurate object detection.</p>
        <p>The SAHI algorithm can be easily parallelized, as each patch can be processed independently by
a separate instance of the object detection model. This parallelization can significantly speed up the
inference process, making it more suitable for real-time applications.</p>
        <p>The SAHI algorithm consists of the following steps:
1. Slicing: the input image is divided into smaller, overlapping patches of a fixed size. The size
of the patches and the amount of overlap between them are hyperparameters that can be
adjusted based on the specific requirements of the task. In this study, we experiment with
patch sizes of 256×256, 512×512, and 1024×1024 pixels, with overlap ratios of 0.25 and 0.5.
2. Inference: each patch is independently processed by the object detection model, which
outputs a set of bounding boxes and corresponding class probabilities for the objects detected
within the patch. In this study, we evaluate five state-of-the-art object detection models,
namely YOLOv5, YOLOv8, YOLOX, Torchvision, and RetinaNet.
3. Merging: the bounding boxes and class probabilities obtained from each patch are combined
to form the final set of detections for the entire image. This merging process involves
resolving any duplicate detections that may occur in the overlapping regions between
patches. Several strategies can be employed for this purpose, such as non-maximum
suppression (NMS), which retains only the bounding box with the highest class probability
among a set of overlapping detections.
4. Post-processing: the merged detections are further refined through post-processing
techniques, such as thresholding based on class probabilities and adjusting bounding box
coordinates to account for the original image size.</p>
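        <p>The slicing step described above can be sketched in a few lines; the following is a minimal illustration under our own function and parameter names, not the actual SAHI implementation:</p>

```python
# Illustrative sketch of SAHI-style slicing (not the SAHI library itself):
# compute (x, y, w, h) windows that cover the image with a given overlap ratio.
def slice_coords(img_w, img_h, patch=512, overlap=0.25):
    step = int(patch * (1 - overlap))  # stride between neighboring patches
    xs = list(range(0, max(img_w - patch, 0) + 1, step))
    ys = list(range(0, max(img_h - patch, 0) + 1, step))
    # ensure the right and bottom borders are always covered
    if xs[-1] != max(img_w - patch, 0):
        xs.append(max(img_w - patch, 0))
    if ys[-1] != max(img_h - patch, 0):
        ys.append(max(img_h - patch, 0))
    return [(x, y, patch, patch) for y in ys for x in xs]
```

        <p>For example, a 1024×512 image with 512-pixel patches and an overlap ratio of 0.5 yields three patches along the width and a single row along the height.</p>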
        <p>The SAHI algorithm provides a flexible and efficient framework for object detection in
high-resolution images, enabling the use of existing state-of-the-art object detection models without the
need for extensive modifications.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Role of NMS in the SAHI algorithm</title>
        <p>
          Non-Maximum Suppression (NMS) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a crucial post-processing step in object detection
algorithms, including the SAHI algorithm. Its primary purpose is to eliminate redundant and
overlapping bounding boxes, keeping only the most confident detection for each object in the image.
        </p>
        <p>In object detection, the model typically generates a large number of bounding boxes, many of
which may belong to the same object. This is particularly common in sliding window-based
approaches, such as the SAHI algorithm, where the image is divided into overlapping patches. Each
patch is processed independently, leading to multiple detections for objects that appear in multiple
patches.</p>
        <p>NMS addresses this issue by suppressing less confident detections that significantly overlap with
more confident ones. The algorithm works as follows:
1. Sort the detected bounding boxes in descending order of their confidence scores.
2. Select the bounding box with the highest confidence score and add it to the list of final
detections.</p>
        <p>3. Compare the selected bounding box with all the remaining boxes and calculate their
Intersection over Union (IoU) scores.
4. Remove any bounding box that has an IoU score higher than a predefined threshold (typically
0.5) with the selected box.
5. Repeat steps 2-4 until all bounding boxes have been either selected or suppressed.</p>
        <p>
          The IoU score measures the overlap between two bounding boxes and is calculated as the area of
their intersection divided by the area of their union. A high IoU score indicates that the two bounding
boxes significantly overlap and likely belong to the same object [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]:
IoU(B1, B2) = area(B1 ∩ B2) / area(B1 ∪ B2), (1)
where B1 and B2 are the bounding boxes of two detections.
        </p>
        <p>NMS is crucial for several reasons:
1. Improved precision: by removing redundant detections, NMS helps improve the precision of
the object detection algorithm. Precision measures the percentage of detected objects that are
actually correct, and reducing false positives is essential for achieving high precision.
2. Reduced clutter: NMS helps declutter the output of the object detection algorithm, making it
easier to interpret and use the results. Without NMS, the output would be overwhelmed by
numerous overlapping bounding boxes, making it difficult to distinguish individual objects.
3. Increased efficiency: by reducing the number of bounding boxes, NMS helps improve the
efficiency of downstream processes that rely on the object detection results, such as tracking
or counting objects. Processing fewer bounding boxes requires fewer computational resources
and can lead to faster overall pipeline performance.
4. Better user experience: NMS helps provide a cleaner and more intuitive visual representation
of the detected objects, which is particularly important for applications that involve human
interaction, such as video surveillance or autonomous driving.</p>
        <p>In the context of the SAHI algorithm, NMS plays a vital role in merging the detections from
multiple patches into a coherent set of final detections. Without NMS, the SAHI algorithm would
produce numerous duplicate detections for objects that appear in multiple patches, leading to a
cluttered and imprecise output.</p>
        <p>By applying NMS with a carefully chosen IoU threshold, the SAHI algorithm can effectively
merge the detections from multiple patches and provide a clean and accurate set of final detections.
The IoU threshold is a hyperparameter that can be tuned based on the specific characteristics of the
dataset and the desired balance between precision and recall.</p>
        <p>In summary, Non-Maximum Suppression is a crucial post-processing step in object detection
algorithms, including the SAHI algorithm. It helps improve precision, reduce clutter, increase
efficiency, and provide a better user experience by eliminating redundant and overlapping bounding
boxes. NMS is particularly important for the SAHI algorithm, as it enables the effective merging of
detections from multiple patches into a coherent set of final detections.</p>
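        <p>The IoU score described above can be computed directly from the box corners; a minimal sketch (the (x1, y1, x2, y2) corner format is our assumption):</p>

```python
# IoU of two axis-aligned boxes given as (x1, y1, x2, y2):
# intersection area divided by union area.
def iou(b1, b2):
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])  # intersection corners
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0
```

        <p>Identical boxes score 1.0, disjoint boxes score 0.0, and partially overlapping boxes fall in between, which is what makes a threshold such as 0.5 a usable duplicate test.</p>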
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Non-Maximum Suppression Algorithms</title>
      </sec>
      <sec id="sec-4-4">
        <title>4.3.1. Basic Non-Maximum Suppression (NMS)</title>
        <p>
          The basic Non-Maximum Suppression (NMS) algorithm [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] works as follows:
        </p>
        <p>Score Sorting: sort all bounding boxes by their confidence scores in descending order.
Selection: select the box with the highest score and remove it from the list.</p>
        <p>Overlap Removal: remove all other boxes that have an Intersection over Union (IoU) greater
than a predefined threshold with the selected box.</p>
        <p>Repetition: repeat the process until no more boxes remain.</p>
        <p>This algorithm ensures that only the most confident and least overlapping bounding boxes are
retained.</p>
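        <p>The four steps above translate into a short routine; a simplified sketch (function and argument names are ours, not from any particular library):</p>

```python
# Basic NMS sketch: keep the highest-scoring box, drop boxes that overlap it
# beyond the IoU threshold, and repeat on the remainder.
def nms(boxes, scores, iou_thr=0.5):
    def iou(a, b):
        iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
        ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
        inter = iw * ih
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    # step 1: indices sorted by confidence, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)        # step 2: select the top-scoring box
        keep.append(best)
        # step 3: suppress boxes whose IoU with the selected box is too high
        order = [i for i in order if not iou(boxes[best], boxes[i]) > iou_thr]
    return keep                    # step 4: repeat until the list is empty
```

        <p>The returned indices identify the retained detections, so the caller can keep the original boxes and scores untouched.</p>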
      </sec>
      <sec id="sec-4-5">
        <title>4.3.2. GREEDYNMM (Greedy Non-Maximum Merging)</title>
        <p>
          Greedy Non-Maximum Merging (GREEDYNMM) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is an algorithm designed to improve upon basic
NMS by merging overlapping boxes instead of discarding them:
        </p>
        <p>Score Sorting: sort all bounding boxes by their confidence scores in descending order.</p>
        <p>Selection: select the box with the highest score.</p>
        <p>Merging: merge this box with all other boxes that have an IoU greater than a certain
threshold. The merging process involves averaging the coordinates of the overlapping boxes
weighted by their confidence scores.</p>
        <p>Update: replace the selected box with the merged box and remove the other overlapping
boxes.</p>
        <p>Repetition: repeat the process until no more boxes overlap significantly.</p>
        <p>GREEDYNMM helps retain more information by merging boxes rather than discarding them,
which can be beneficial in densely populated object scenarios.</p>
      </sec>
      <sec id="sec-4-6">
        <title>4.3.3. NMM (Non-Maximum Merging)</title>
        <p>
          Non-Maximum Merging (NMM) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is similar to GREEDYNMM but uses a different strategy for
merging. NMM focuses on maintaining spatial accuracy and consistency by carefully merging
overlapping boxes:
        </p>
        <p>Score Sorting: sort bounding boxes by confidence scores.</p>
        <p>Selection: select the highest-scoring box.</p>
        <p>Merging: for boxes with IoU above the threshold, calculate the weighted average of the box
coordinates and confidence scores.</p>
        <p>Replacement: replace the selected box with the merged result and remove the overlapping
boxes.</p>
        <p>Repetition: continue this process until all boxes have been processed.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.3.4. Large Scale Non-Maximum Suppression (LSNMS)</title>
        <p>
          Large Scale Non-Maximum Suppression (LSNMS) is an optimized algorithm designed to address the
inefficiencies of traditional Non-Maximum Suppression (NMS) when working with large-scale image
data [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This method significantly speeds up the NMS process, especially for high-dimensional
images and large numbers of bounding boxes.
        </p>
        <p>Key features of LSNMS:</p>
        <p>R-Tree Structure: LSNMS constructs an R-Tree on the bounding boxes before starting the
NMS process. The R-Tree structure allows for efficient querying of overlapping boxes in
logarithmic time, reducing the complexity of the NMS process.</p>
        <p>Complexity Reduction: traditional NMS has a worst-case quadratic time complexity, which
can be prohibitive with large numbers of boxes. LSNMS reduces this to O(n log(n)) by only
considering boxes that are spatially close to each other during the suppression steps.</p>
        <p>Handling Large Images: when dealing with large images (e.g., satellite or histology images),
LSNMS handles the patching of images and applies NMS independently to each patch. A final
NMS step is performed to consolidate results from overlapping patches, ensuring accurate
detection without redundant computations.</p>
        <p>Just-in-Time Compilation: LSNMS relies on just-in-time compilation for efficient
computation, ensuring that even the tree-building process and subsequent NMS steps are
executed swiftly.</p>
        <p>Multiclass Support: LSNMS also supports multiclass NMS by offsetting bounding boxes in a
way that minimizes query times and maximizes the efficiency of the R-Tree structure.</p>
        <p>Performance: LSNMS offers significant speed improvements over traditional NMS. For
example, on 40k×40k pixel images with about 300,000 bounding boxes, naive NMS took
approximately 5 minutes on a modern CPU, whereas LSNMS completed in just 5 seconds,
providing nearly a 60 times speedup.</p>
        <p>In this paper, we will evaluate each NMS algorithm for its accuracy.</p>
      </sec>
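      <p>The confidence-weighted merging used by the NMM-family algorithms above can be sketched as follows; this is our simplified illustration, and returning the maximum score for the merged box is one common choice (the weighted average of scores described for NMM is an equally valid variant):</p>

```python
# Confidence-weighted merging of overlapping boxes (NMM/GREEDYNMM style):
# each corner of the merged box is the score-weighted average of the inputs.
def merge_boxes(boxes, scores):
    total = sum(scores)
    merged = tuple(
        sum(box[k] * s for box, s in zip(boxes, scores)) / total
        for k in range(4)
    )
    return merged, max(scores)  # merged box plus a representative score
```

      <p>With equal confidences the merged box is the plain average of the inputs; a higher-confidence box pulls the merged corners toward itself.</p>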
      <sec id="sec-4-8">
        <title>4.4. Object Detection Models</title>
        <p>
          In this study, we evaluate five state-of-the-art object detection models in combination with the SAHI
algorithm:
1. YOLOv5 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]: YOLOv5 is a single-stage object detector that builds upon the success of
previous YOLO (You Only Look Once) models. It achieves real-time inference speeds while
maintaining high accuracy, making it suitable for various applications. YOLOv5 utilizes a
novel backbone network, a feature pyramid network (FPN) for multi-scale feature fusion, and
an anchor-free detection head.
2. YOLOv8 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]: YOLOv8 is an improved version of YOLOv5, featuring a redesigned
architecture and enhanced training techniques. It achieves state-of-the-art performance on
several object detection benchmarks while maintaining real-time inference capabilities.
3. YOLOX [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]: YOLOX is an anchor-free variant of the YOLO family of object detectors. It
introduces a decoupled head design, where classification and localization are performed
separately, leading to improved accuracy and flexibility. YOLOX also incorporates advanced
data augmentation techniques and a novel loss function to enhance its performance.
4. Torchvision [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]: Torchvision is a popular computer vision library that provides a collection
of pre-trained models for various tasks, including object detection. In this study, we use the
Faster R-CNN model with a ResNet-50 backbone, which has demonstrated strong
performance on several benchmark datasets.
5. RetinaNet [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]: RetinaNet is a single-stage object detector that addresses the class imbalance
problem often encountered in object detection tasks. It introduces a novel focal loss function
that focuses on hard examples during training, leading to improved accuracy, especially for
small and rare objects.
        </p>
        <p>Each of these object detection models has its own strengths and weaknesses, and their
performance may vary depending on the specific characteristics of the dataset and the objects of
interest. By evaluating multiple models in combination with the SAHI algorithm, we aim to provide
a comprehensive analysis of their suitability for high-resolution object detection tasks.</p>
      </sec>
      <sec id="sec-4-9">
        <title>4.5. Experimental Setup</title>
        <p>To evaluate the effectiveness of the proposed approach, we conduct a series of experiments on the
dataset described in previous section. The experiments are designed to assess the performance of the
SAHI algorithm in combination with each of the five object detection models under different settings.</p>
        <p>The main factors considered in the experiments are:</p>
        <p>Patch size: we evaluate three different patch sizes: 256×256, 512×512, and 1024×1024 pixels.
Smaller patch sizes allow for more fine-grained processing but may increase the
computational overhead, while larger patch sizes can reduce the number of patches processed
but may miss small objects.</p>
        <p>Overlap ratio: we experiment with two overlap ratios: 0.25 and 0.5. A higher overlap ratio
ensures that objects are less likely to be split across patch boundaries but increases the
number of patches that need to be processed.</p>
        <p>Object detection model: we evaluate the performance of each of the five object detection
models (YOLOv5, YOLOv8, YOLOX, Torchvision, and RetinaNet) in combination with the
SAHI algorithm.</p>
        <p>
          For each combination of patch size, overlap ratio, and object detection model, we run the SAHI
algorithm on the test set and compute several performance metrics, including execution time, error
percentage, and efficiency [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. These metrics provide a comprehensive assessment of the object
detection performance, considering both the accuracy of the bounding box predictions and the
correctness of the class assignments [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]:
        </p>
        <p>The error (2) is a percentage-based deviation from the true value, which is determined
manually:
Error = |N / N_true − 1| ∙ 100%, (2)
where N is the number of objects detected by the model and N_true is the manually determined
true count.</p>
        <p>
          The proposed formula (3) emphasizes the significance of error rather than execution time in terms of
efficiency [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]:
Efficiency = 1 / (10 ∙ Error + 10⁻⁶ ∙ t) ∙ 10², (3)
where t is the execution time. The 10² multiplier is for the normalization of the efficiency value. Because efficiency
is calculated per each experiment and is used only to compare different combinations of object
detection models and NMS algorithms, its absolute scale can vary.
        </p>
        <p>In addition to the quantitative evaluation, we also perform a qualitative analysis of the results,
visually inspecting the detected objects and their bounding boxes for a subset of the test images. This
analysis provides insights into the strengths and weaknesses of each approach and helps identify
potential areas for improvement.</p>
        <p>To ensure the reproducibility of the results, all experiments are conducted using a fixed random
seed, and the code and data used in the study will be made publicly available upon the acceptance of
this paper.</p>
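        <p>The percentage error metric can be computed directly from the detected and manually counted object numbers; a minimal sketch (the function and argument names are ours):</p>

```python
# Percentage-based deviation of the detected object count from the
# manually determined true count, as described for the error metric (2).
def error_pct(n_detected, n_true):
    return abs(n_detected / n_true - 1) * 100.0
```

        <p>For example, detecting 990 objects against a manual count of 1100 gives an error of 10%.</p>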
      </sec>
      <sec id="sec-4-10">
        <title>4.6. Implementation Details</title>
        <p>The proposed approach is implemented using the Python programming language and the PyTorch
deep learning framework. The SAHI algorithm is used as a standalone module that can be easily
integrated with existing object detection models.</p>
        <p>For each of the object detection models, we use pretrained weights available from their respective
repositories. During inference, the SAHI algorithm is applied to each test image, and the resulting
patches are processed by the object detection model. The detected bounding boxes and class
probabilities are then merged using different non-maximum suppression algorithms with an IoU
threshold of 0.5.</p>
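        <p>The merging step can be sketched as a plain greedy NMS over the boxes collected from all patches. This is a minimal, self-contained stand-in for the post-processing algorithms used in the experiments, not the SAHI library's own implementation:</p>

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any later box
    whose IoU with an already-kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep  # indices of surviving boxes
```
        <p>With the IoU threshold of 0.5 used here, two boxes merged from overlapping patches that cover the same object are collapsed into the single higher-scoring detection.</p>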
        <p>All experiments are conducted on a workstation with an AMD Ryzen 9 5900X CPU, 64 GB of
RAM, and an NVIDIA RTX 2070 GPU.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and Results</title>
      <p>We conducted two experiments on object detection and classification in high-resolution images
following the developed methodology presented in Section 4.</p>
      <sec id="sec-5-1">
        <title>5.1. First Experiment</title>
        <p>As a good example of searching for and classifying small objects, a large beach panorama was chosen,
containing a significant number of people, cars, and boats, which is presented in Figure 1. Its
characteristics are:</p>
        <p>• Image size: 19968×6144 pixels (0.122 GPixels);</p>
        <p>• Image format: PNG;</p>
        <p>• Image size on disk: 185.56 MiB;</p>
        <p>• Image size in RAM: 351.00 MiB.</p>
        <p>As a result of processing this image with all neural networks without using the SAHI algorithm,
no objects were detected. To begin, we performed object searches using a tile size of 256×256 pixels
with a 50% overlap ratio.</p>
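        <p>The tiling geometry implied by a tile size and overlap ratio can be sketched as follows. This is a simplified stand-in for the SAHI slicing routine (it assumes the image is at least one tile in each dimension, and shifts edge tiles inward so every tile is exactly tile×tile):</p>

```python
def slice_coords(img_w, img_h, tile=256, overlap=0.5):
    """Return (x1, y1, x2, y2) tile coordinates covering the image.

    The stride between tiles is tile * (1 - overlap); a 50% overlap
    ratio therefore steps by half a tile. Edge tiles are aligned to
    the right/bottom borders so no pixels are missed.
    """
    step = max(1, int(tile * (1 - overlap)))
    xs = list(range(0, img_w - tile, step)) + [img_w - tile]
    ys = list(range(0, img_h - tile, step)) + [img_h - tile]
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]
```
        <p>For the 19968×6144 beach panorama, 256×256 tiles with a 50% overlap step by 128 pixels, so the image decomposes into tens of thousands of patches, each processed independently by the detector.</p>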
        <p>By manually verifying the results of different networks, the approximate number of people in the
photo was visually counted to be around 1100.</p>
        <p>The results of the first experiment, presented in Figure 2, demonstrate the effectiveness of various
combinations of neural networks and post-processing algorithms for object detection in images.</p>
        <p>General Observations:</p>
        <p>Object Detection Variability: the number of detected objects varies significantly depending
on the neural network and post-processing method used.</p>
        <p>SAHI Performance: applying SAHI without any post-processing (RAW) results in the highest
number of detections for all neural networks but includes many false positives.</p>
        <p>Post-Processing Methods: methods such as NMS, NMM, Greedy NMM, and LSNMS
significantly reduce the number of detections compared to RAW, indicating their
effectiveness in removing redundant bounding boxes, as shown in Figure 3.</p>
        <p>Comparison of Neural Networks:</p>
        <p>YOLOv5 and YOLOv8 show similar results, with YOLOv8 having a slightly higher number of
detections.</p>
        <p>YOLOX detects more objects than the YOLO models, especially for classes with a small
number of objects.</p>
        <p>TorchVision demonstrates a high number of detections for certain classes (e.g., person,
umbrella) but underperforms compared to other networks for many other classes.</p>
        <p>RetinaNet shows the fewest detections among all the networks.</p>
        <p>Comparison of Post-Processing Methods:</p>
        <p>NMS, NMM, and GREEDYNMM: These methods yield very similar results for most classes
and neural networks.</p>
        <p>LSNMS: generally produces slightly more detections than the other post-processing methods,
which might indicate lower precision. This algorithm is experimental and is not suitable for
production use.</p>
        <p>Interesting Observations:</p>
        <p>Person Detection: TorchVision significantly outperforms YOLOv5 (1218 vs. 526 detections)
when using the same post-processing method (GREEDYNMM). This can be attributed to water, as
shown in Figure 4.</p>
        <p>For other object classes, YOLOv5 and YOLOv8 demonstrate better performance.</p>
        <p>Each neural network classifies objects at different speeds. For this example, the experimental data
provided in the table can be used to calculate the error using the formula (2). To calculate the
efficiency of the neural network along with SAHI settings, use the formula (3).</p>
        <p>Analyzing the results, presented in Figure 5 and Table 1, the following conclusions can be drawn:</p>
        <p>TorchVision with a tile size of 512×512 and a 50% overlap achieves the highest efficiency due
to its superior detection accuracy.</p>
        <p>YOLOX and YOLOv8 also demonstrate high efficiency, particularly with a tile size of 512×512
and a 50% overlap, providing a good balance between accuracy and speed.</p>
        <p>Models with lower accuracy, such as YOLOv5 and RetinaNet, exhibit significantly lower
efficiency despite their relatively high processing speed.</p>
        <p>Increasing the tile size and decreasing the overlap generally reduces efficiency, as it often
lowers detection accuracy.</p>
        <p>The best-performing models for this task are TorchVision with a tile size of 512×512 and a 50%
overlap, and YOLOX and YOLOv8 with a tile size of 512×512 and a 50% overlap. The choice between
them may depend on specific requirements for processing speed and available computational
resources.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Second Experiment</title>
        <p>To evaluate object detection for larger objects, such as boats, a large panorama of a bay was selected
for the second experiment.</p>
        <p>The characteristics of the image, presented in Figure 6, are:</p>
        <p>• Image size: 30208×10752 pixels (0.324 GPixels);</p>
        <p>• Image format: PNG;</p>
        <p>• Image size on disk: 548.34 MiB;</p>
        <p>• Image size in RAM: 929.25 MiB.</p>
        <p>As with the previous image, processing this image with all neural networks without using the
SAHI algorithm resulted in no detected objects.</p>
        <p>Since the objects in the image are much larger than people, the tile size needs to be increased.
Object detection was performed using a tile size of 1024 by 1024 pixels with a 50% overlap ratio.</p>
        <p>As can be seen from the results of the experiment, presented in Figure 7, the trend that detection
without post-processing is unrepresentative persists. It is also evident that TorchVision again
produces the highest number of detections, but it found a lot of duplicate detections, which are
difficult to remove with the existing post-processing algorithms. The most reasonable result was
obtained by the YOLOv8 model with GREEDYNMM post-processing.</p>
        <p>The approximate number of boats in the image is 391.</p>
        <p>As can be seen from Figure 8, decreasing the tile size leads to an increase in the number of
duplicate detections, making it challenging for existing algorithms to manage and resulting in many
false positives.</p>
        <p>As shown in Figure 9, the closest to the true value are YOLOv8, YOLOv5, and RetinaNet when
detecting with a tile size of 1024 by 1024 pixels with a 50% overlap ratio.</p>
        <p>Decreasing the tile size for this example generally reduces efficiency as it promotes the
occurrence of more false positives.</p>
        <p>From Table 2, it is clear that for detecting large objects, such as boats, a tile size of 1024×1024
pixels is sufficient, as the highest efficiency is achieved with the TorchVision model at 1024×1024
with 25% overlap.</p>
        <p>The results of the second experiment demonstrate the effectiveness of various neural network
models and SAHI parameters for detecting boats in the image:</p>
        <p>1. Detection Accuracy (Error):</p>
        <p>The highest accuracy is achieved by TorchVision with a tile size of 1024×1024 and 25%
overlap (error of only 3.32%).</p>
        <p>The lowest accuracy is with TorchVision with a tile size of 512×512 and 50% overlap (error
of 90.54%).</p>
        <p>YOLOv5, YOLOv8, YOLOX, and RetinaNet models show moderate accuracy with errors
ranging from 3% to 40%, depending on the parameters.</p>
        <p>2. Efficiency:</p>
        <p>The highest efficiency is shown by TorchVision with a tile size of 1024×1024 and 25% overlap
(efficiency of 89.09).</p>
        <p>YOLOv8 and YOLOX also demonstrate high efficiency, especially with a tile size of 1024×1024
and 50% overlap (efficiency of 74.42 and 56.41, respectively).</p>
        <p>The lowest efficiency is seen in TorchVision with a tile size of 512×512 and 50% and 25%
overlap (efficiency of 7.81 and 7.26, respectively).</p>
        <sec id="sec-5-2-1">
          <title>3. Impact of Tile Size and Overlap:</title>
          <p>Increasing the tile size from 512×512 to 1024×1024 generally improves accuracy and
efficiency for most models.</p>
          <p>Reducing the overlap from 50% to 25% has varying impacts on accuracy and efficiency
depending on the model and tile size, requiring further experiments to determine the optimal
configuration for each model.</p>
          <p>4. Model Comparison:</p>
          <p>TorchVision shows the best results in terms of accuracy and efficiency with a tile size of
1024×1024 and 25% overlap.</p>
          <p>YOLOv8 and YOLOX offer a good balance of accuracy and efficiency, especially with a tile
size of 1024×1024 and 50% overlap.</p>
          <p>YOLOv5 and RetinaNet show moderate results in terms of both accuracy and efficiency.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this study, a novel approach was proposed for object detection in high-resolution images by
integrating the Slicing Aided Hyper Inference (SAHI) algorithm with an ensemble of state-of-the-art
neural networks, including YOLOv5, YOLOv8, YOLOX, Torchvision, and RetinaNet. The approach
addresses the challenges posed by high-resolution images, such as computational inefficiency and
difficulties in detecting small objects, by dividing the images into smaller, manageable patches while
maintaining resolution and detail.</p>
      <p>Experimental results demonstrate that the combination of SAHI with modern object detection
models significantly enhances the detection accuracy and efficiency. Specifically, the ensemble of
YOLOv8 and YOLOX models, aided by the SAHI algorithm, achieves superior performance in terms
of both precision and recall, compared to traditional methods and individual neural network models.
The use of RetinaNet further highlights the importance of addressing class imbalance in detecting
small and rare objects.</p>
      <p>The proposed method not only improves object detection in high-resolution images but also offers
a scalable solution adaptable to various real-world applications, including satellite imagery analysis,
medical imaging, and surveillance systems. Future work will focus on optimizing the patching
strategy and exploring more advanced neural network architectures to further enhance detection
performance. Additionally, integrating advanced techniques such as attention mechanisms and
transformer models could provide further improvements.</p>
      <p>In conclusion, the research provides a robust and efficient framework for object detection in
high-resolution images, leveraging the strengths of both the SAHI algorithm and cutting-edge neural
networks. This approach paves the way for more accurate and reliable detection systems in diverse
and demanding applications.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Martsenyuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bychkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Merkulova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhabska</surname>
          </string-name>
          ,
          <article-title>Exploring Image Unified Space for Improving Information Technology for Person Identification</article-title>
          ,
          <source>in IEEE Access</source>
          , vol.
          <volume>11</volume>
          , pp.
          <fpage>76347</fpage>
          -
          <lpage>76358</lpage>
          ,
          <year>2023</year>
          . doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2023</year>
          .
          <volume>3297488</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Revisiting Image Pyramid Structure for High Resolution Salient Object Detection</article-title>
          ,
          <source>Computer Vision ACCV 2022, Lecture Notes in Computer Science</source>
          , vol.
          <volume>13847</volume>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>031</fpage>
          -26293-7_
          <fpage>16</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Selective Multi-Scale Learning for Object Detection</article-title>
          ,
          <source>Artificial Neural Networks and Machine Learning ICANN 2021, Lecture Notes in Computer Science()</source>
          , vol.
          <volume>12892</volume>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -86340-
          <issue>1</issue>
          _
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Papais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Waslander</surname>
          </string-name>
          ,
          <article-title>SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking</article-title>
          ,
          <source>2024 IEEE International Conference on Robotics and Automation (ICRA)</source>
          , Yokohama, Japan,
          <year>2024</year>
          , pp.
          <fpage>4939</fpage>
          -
          <lpage>4945</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICRA57147.
          <year>2024</year>
          .
          <volume>10611067</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Qing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Region Proposal Networks (RPN) Enhanced Slicing for Improved Multi-Scale Object Detection</article-title>
          ,
          <source>2024 7th International Conference on Communication Engineering and Technology (ICCET)</source>
          , Tokyo, Japan,
          <year>2024</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F. C.</given-names>
            <surname>Akyon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Onur</given-names>
            <surname>Altinuc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Temizel</surname>
          </string-name>
          ,
          <article-title>Slicing Aided Hyper Inference and Fine-Tuning for Small Object Detection</article-title>
          ,
          <source>2022 IEEE International Conference on Image Processing (ICIP)</source>
          , Bordeaux, France,
          <year>2022</year>
          , pp.
          <fpage>966</fpage>
          -
          <lpage>970</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICIP46576.
          <year>2022</year>
          .
          <volume>9897990</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>O3NMS: An Out-Of-Order-Based Low-Latency Accelerator for Non-Maximum Suppression</article-title>
          ,
          <source>2023 IEEE International Symposium on Circuits and Systems (ISCAS)</source>
          , Monterey, CA, USA,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Petrivskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shevchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bychkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pokotylo</surname>
          </string-name>
          ,
          <article-title>Models and Information Technologies of Coverage of the Territory by Sensors with Energy Consumption Optimization</article-title>
          ,
          <source>In: Mathematical Modeling and Simulation of Systems, MODS 2021, Lecture Notes in Networks and Systems</source>
          , vol.
          <volume>344</volume>
          , Springer, Cham,
          <year>2021</year>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -89902-
          <issue>8</issue>
          _
          <fpage>2</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <article-title>Standard Greedy Non Maximum Suppression Optimization for Efficient and High Speed Inference</article-title>
          ,
          <source>2021 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia)</source>
          , Gangwon, Korea, Republic of,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>PENet: Object Detection Using Points Estimation in High Definition Aerial Images</article-title>
          ,
          <source>2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)</source>
          , Miami, FL, USA,
          <year>2020</year>
          , pp.
          <fpage>392</fpage>
          -
          <lpage>398</lpage>
          . doi:
          <volume>10</volume>
          .1109/ICMLA51294.
          <year>2020</year>
          .
          <volume>00069</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          et al.,
          <article-title>Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges</article-title>
          ,
          <source>in IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>44</volume>
          , no.
          <issue>11</issue>
          , pp.
          <fpage>7778</fpage>
          -
          <issue>7796</issue>
          , 1 Nov.
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1109/TPAMI.
          <year>2021</year>
          .
          <volume>3117983</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          , MS-YOLO:
          <article-title>Object Detection Based on YOLOv5 Optimized Fusion Millimeter-Wave Radar and Machine Vision</article-title>
          ,
          <source>in IEEE Sensors Journal</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>15</issue>
          , pp.
          <fpage>15435</fpage>
          -
          <issue>15447</issue>
          , 1 Aug.1,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1109/JSEN.
          <year>2022</year>
          .
          <volume>3167251</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , E. Liu,
          <article-title>Small Object Detection Algorithm Based on Improved YOLOv8 for Remote Sensing</article-title>
          , in
          <source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source>
          , vol.
          <volume>17</volume>
          , pp.
          <fpage>1734</fpage>
          -
          <lpage>1747</lpage>
          ,
          <year>2024</year>
          . doi:
          <volume>10</volume>
          .1109/JSTARS.
          <year>2023</year>
          .
          <volume>3339235</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Noon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amjad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Qureshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mannan</surname>
          </string-name>
          ,
          <article-title>Handling Severity Levels of Multiple CoOccurring Cotton Plant Diseases Using Improved YOLOX Model</article-title>
          ,
          <source>in IEEE Access</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>134811</fpage>
          -
          <lpage>134825</lpage>
          ,
          <year>2022</year>
          . doi:
          <volume>10</volume>
          .1109/ACCESS.
          <year>2022</year>
          .
          <volume>3232751</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Albardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. D.</given-names>
            <surname>Kabir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M. I.</given-names>
            <surname>Bhuiyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Kebria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khosravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nahavandi</surname>
          </string-name>
          ,
          <article-title>A Comprehensive Study on Torchvision Pre-trained Models for Fine-grained Inter-species Classification</article-title>
          ,
          <source>2021 IEEE International Conference on Systems, Man, and Cybernetics</source>
          (SMC), Melbourne, Australia,
          <year>2021</year>
          , pp.
          <fpage>2767</fpage>
          -
          <lpage>2774</lpage>
          . doi:
          <volume>10</volume>
          .1109/SMC52423.
          <year>2021</year>
          .
          <volume>9659161</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Focal Loss for Dense Object Detection</article-title>
          ,
          <source>in IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>318</fpage>
          -
          <issue>327</issue>
          , 1 Feb.
          <year>2020</year>
          . doi:
          <volume>10</volume>
          .1109/TPAMI.
          <year>2018</year>
          .
          <volume>2858826</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bychkov</surname>
          </string-name>
          et al.,
          <source>Using Neural Networks Application for the Font Recognition Task Solution</source>
          ,
          <year>2020</year>
          55th International Scientific Conference on Information, Communication and Energy -
          <volume>170</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          et al.,
          <source>Increasing the Classification Accuracy of EEG based Brain-computer Interface Signals</source>
          ,
          <year>2020</year>
          10th International Conference on Advanced Computer Information Technologies (ACIT), Deggendorf, Germany,
          <year>2020</year>
          , pp.
          <fpage>386</fpage>
          -
          <lpage>390</lpage>
          . doi:
          <volume>10</volume>
          .1109/ACIT49673.
          <year>2020</year>
          .
          <volume>9208944</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>V.</given-names>
            <surname>Petrivskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shevchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bychkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brazhenenko</surname>
          </string-name>
          ,
          <article-title>Estimation of Noise Hazards in Environmental Monitoring Tools Designe in the Subway</article-title>
          ,
          <source>2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM)</source>
          , Polyana, Ukraine,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . doi:
          <volume>10</volume>
          .1109/CADSM.
          <year>2019</year>
          .
          <volume>8779315</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>