<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving accuracy of photogrammetry method using image segmentation by YOLO neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Serhiy Balovsyak</string-name>
          <email>s.balovsyak@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nazar Hrynyk</string-name>
          <email>hrynyk.nazar@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhengbing Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Khrystyna Odaiska</string-name>
          <email>k.odaiska@chnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivanna</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>School of Computer Science, Hubei University of Technology</institution>
          ,
          <addr-line>Wuhan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Yuriy Fedkovych Chernivtsi National University</institution>
          ,
          <addr-line>Kotsiubynsky 2, 58012, Chernivtsi</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A review of modern methods and tools for building three-dimensional (3D) models of physical objects has been carried out. An information technology for building 3D models of objects using the photogrammetry method has been proposed, which increases the accuracy of their construction due to automatic masking of images. The hardware of the technology consists of video cameras, computers and network equipment. The software of the technology consists of the developed program "YOLO_mask_24" for forming masked images and the 3DF Zephyr program for building 3D models using the photogrammetry method. The program "YOLO_mask_24" was developed in Python on the Google Colab cloud platform for segmentation, scaling, dilation, filtering and masking of object images. Image segmentation was performed by the convolutional artificial neural network YOLOv8. 3D models of an object (a bench) were constructed using the 3DF Zephyr program based on images without masking. The accuracy of most of the object model is satisfactory, but defects appeared in part of the model. It was found that constructing 3D models using photogrammetry only from masked images leads to significant model defects. The novelty of the work is the simultaneous use of images of objects without masking and with masking to build their 3D models using the photogrammetry method. To verify this approach, three-dimensional models of the object (bench) were constructed using the 3DF Zephyr program based on images without masking and with masking. The use of the proposed information technology ensured high accuracy for all parts of the 3D model of the object. The resulting 3D models of objects are used, in particular, in three-dimensional computer graphics, virtual and augmented reality systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Information technology</kwd>
        <kwd>3D model</kwd>
        <kwd>photogrammetry</kwd>
        <kwd>image segmentation</kwd>
        <kwd>image masking</kwd>
        <kwd>artificial neural networks</kwd>
        <kwd>software</kwd>
        <kwd>Python</kwd>
        <kwd>cloud platform</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The purpose of the photogrammetry method is to build three-dimensional (3D) models of physical
objects based on a series of their digital photographs obtained from different angles [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ]. The
photogrammetry method is used to build, in particular, 3D models of geographical and natural
objects (terrain relief, landscape, etc.), architectural objects (buildings, structures, etc.), industrial
and technical objects (mechanisms, devices, etc.), cultural and artistic objects (sculptures, etc.),
biological and medical objects. Such 3D models make it possible to automate and increase the
efficiency of solving many applied problems, therefore photogrammetry is widely used in
architecture, archaeology, construction, video games, education, medicine, virtual reality and other
industries [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The construction of three-dimensional models of objects is performed by specialized
programs, for example, 3DF Zephyr [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In addition to the photogrammetry method, LIDAR (Light Detection and Ranging) laser
rangefinders are used to build 3D models; they calculate the distance to objects from the
propagation delays of the optical radiation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The use of LIDAR
provides high accuracy of 3D models, but requires special hardware and software. Therefore, the
photogrammetry method is preferable for solving many problems of building models, the initial
images for which can be obtained using video cameras.
      </p>
      <p>A significant problem in the application of the photogrammetry method is that photographs, in
addition to the objects under study, often contain images of extraneous objects (background). Such
objects in the images can lead to defects in the construction of three-dimensional models. The
impact of extraneous objects on the reduction of the quality of 3D models is especially significant
in the case of a non-uniform background and in the presence of transparent or specularly reflective
surfaces.</p>
      <p>One of the ways to increase the accuracy of constructing three-dimensional models using the
photogrammetry method is image masking. The essence of masking is to select and further process
only certain areas of the image (mask) that belong to the object under study. In this case, other
areas of the image outside the mask are usually painted with a uniform color. Thus, masking
removes excess information from the original images that can cause artifacts.</p>
      <p>
        In most photogrammetry programs, image masking is performed manually, which is a rather
laborious and subjective process [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Image masking can be automated using segmentation
methods that separate the studied objects as separate segments. Currently, the photogrammetry
method is actively developing and improving, so the task of increasing the accuracy of the
photogrammetry method using image segmentation is relevant.
      </p>
      <p>
        Segmentation methods, such as region-based and watersheds, provide accurate segmentation
only in the case of clear object boundaries and a significant difference between the color
(brightness) of objects and the background [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. When obtaining images using digital video
cameras, it is important that the contrast and brightness of the images are within the permissible
range [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Otherwise (for example, in the case of overexposure), some of the details in the image may be lost.
It is possible to increase the accuracy of image segmentation using artificial intelligence [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ], in
particular, using artificial neural networks (ANN) [
        <xref ref-type="bibr" rid="ref10">10-12</xref>
        ]. In image processing, convolutional
neural networks (CNN) [13, 14] are among the most effective, the structure of which is adapted for
processing two-dimensional signals.
      </p>
      <p>Therefore, the paper proposes to perform image segmentation using convolutional neural
networks YOLO (You Only Look Once) [15, 16]. Such neural networks allow for object detection in
images, as well as their segmentation. According to the principle of operation of YOLO neural
networks, objects in images are detected in one stage. An important advantage of neural networks
is the ability to accurately detect and segment real images of objects that differ only slightly from the
background in brightness, color and texture [17, 18]. Neural networks also allow for the
selection of areas of objects of a certain class as segments. YOLO neural networks are pre-trained
to detect 80 classes of objects, but can be further trained to detect objects of other classes.</p>
      <p>The aim of the work is to increase the accuracy of constructing 3D models of objects using the
photogrammetry method, which is implemented in the proposed information technology by
preprocessing a series of images, namely, by segmenting images using neural networks YOLO and
highlighting segments of certain classes as object masks. The object of research is the information
technology for building 3D models using the photogrammetry method.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The basics of the photogrammetry method are described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The photogrammetry method is
based on mathematical models that describe the geometry of the video camera, the spatial
arrangement of the objects under study, etc. Both special and household cameras are used to obtain
photographs of objects. Based on the analysis of a series of photographs, the three-dimensional
coordinates of points on its surface are determined by the photogrammetry method.
      </p>
      <p>In work [19] the creation of three-dimensional digital models of heritage sites using laser
scanning (by LIDAR) and photogrammetry methods is described. It is shown that using laser
scanning, a triangulation grid of a 3D model with high accuracy and detailed geometry is obtained.
However, in the case of laser scanning, hidden areas often occur, especially for objects of complex
shape. Such gaps can be significantly reduced by combining the results of laser scanning and
photogrammetry. The initial images for the photogrammetry method are obtained by terrestrial
and unmanned aerial vehicle (UAV) photography. The combination of photogrammetry with laser
scanning has given good results in constructing three-dimensional models of historical buildings
with complex architectural and decorative details. High-resolution photographs not only increase
the accuracy of 3D models, but also improve the quality of textures. At the same time, it is
important to obtain images of the object under study from all angles, which is possible when using
a UAV.</p>
      <p>Ways to improve the quality of models constructed using the 3D reconstruction method are
considered in the study [20]. The importance of obtaining high-quality 3D models for building
information modeling and city management is described. It is shown that obtaining a series of
images of buildings using a UAV for the photogrammetry method is economically beneficial.
Defects are possible in models of buildings of complex shape, in which some parts may overlap. To
improve the quality of building reconstruction, the study proposes a new method of planning the
UAV trajectory, which involves obtaining images of all surface fragments from different
viewpoints. Model construction errors were determined through the exact spatial coordinates x, y
and z of checkpoints and the coordinates of these points obtained by the photogrammetry method.
It is proposed to increase the accuracy of models by adding viewpoints for areas where larger
construction errors are observed.</p>
      <p>The modern neural network model SSG2 (Semantic Segmentation Generation 2) for semantic
image segmentation is considered in work [21]. The peculiarity of the model is that it does not
operate on separate images, but integrates the results of several images (observations). The SSG2
model uses a dual-encoder, single-decoder base network augmented with a
sequence model. Due to the consideration of several similar images, the accuracy of their
segmentation is increased. The resulting segmented images are used in the photogrammetry
method and in the analysis of object images.</p>
      <p>The initial images for the photogrammetry method can be obtained as individual frames of the
video stream. Obtaining important frames (Keyshot) from the video stream is described in research
[22]. The proposed algorithm allows obtaining important frames even from long video sequences.</p>
      <p>
        Detection and segmentation of object images using neural networks YOLO (version 8) is
described in research [15]. Neural networks YOLO are pre-trained on the COCO (Common Objects
in Context) dataset, which contains over 100,000 images with labeled objects (80 classes). However,
if necessary, YOLO networks can be trained on their own datasets. After training, the YOLO
network selects the object under study in the image as a segment. The advantages of YOLO
networks include high image processing speed and accurate selection of object boundaries, even of
complex shapes. YOLO makes it possible to detect objects not only in individual images, but also in real-time
video. The accuracy of segmentation by YOLO networks in most cases exceeds the accuracy of
known methods of region-based and watersheds [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The analysis of the reviewed publications confirms the relevance of research related to
increasing the accuracy of the photogrammetry method using image segmentation by neural
networks YOLO.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed information technology for building three-dimensional models of objects</title>
      <p>An information technology for constructing three-dimensional models of objects using the
photogrammetry method is proposed, which provides an increase in the accuracy of constructing
3D models using image segmentation by neural networks YOLO. The information technology uses
hardware (video cameras, computers and network equipment) and software (a Python program has
been developed for image segmentation by neural networks YOLO and the formation of masked
images, the 3DF Zephyr program for constructing 3D models using the photogrammetry method).</p>
      <p>
        The initial images are written to the array img(i, k, c), where i = 0, ..., M-1; k = 0, ..., N-1,
c = 0, ..., QC-1; M, N are the image height and width (in pixels); QC = 3 is the number of color channels (red,
green, blue) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
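      <p>As an illustration of this array representation, the following is a minimal sketch (the file name is
only an example, not from the described dataset) of reading one photo into the array img(i, k, c) using OpenCV:</p>
      <preformat>
# Minimal sketch: reading one photo into the array img(i, k, c)
# (the file name "bench_00.jpg" is hypothetical)
import cv2

img_bgr = cv2.imread("bench_00.jpg")            # BGR image of shape (M, N, QC)
img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # reorder channels to (red, green, blue)
M, N, QC = img.shape                            # image height, width, number of color channels
      </preformat>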
      <p>The process of constructing a 3D model includes the following steps:</p>
      <sec id="sec-3-1">
        <title>Reading a series of QIm photos img of the object from the video camera.</title>
        <p>Segmentation of images img by neural networks YOLO.</p>
        <p>Extraction of masks for segments of the object in the image img.</p>
        <p>Calculation of a scaled image of masks mask_sc based on the initial masks, where the scale
of mask_sc corresponds to the scale of the initial image img.</p>
        <p>
          Dilation of mask_sc with the number of iterations HD, resulting in the calculation of a
dilated image mask_scD; image dilation [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is designed to increase the transition region
between the object and the background.
the calculation of the image mask_scDG.
        </p>
        <p>mask_scDG for all color channels.
6. Filtering mask_scD with a Gaussian filter kernel (standard deviation σHD = HD/3, resulting in</p>
        <p>Obtaining a masked image img_mask by multiplying the initial image img by the</p>
      </sec>
      <sec id="sec-3-2">
        <title>8. Forming a set of initial img and masked img_mask images.</title>
        <p>Building a three-dimensional model of the object by photogrammetry based on a set of
images using the 3DF Zephyr program, saving the 3D model to files (.zep, .ply, Obj
formats).</p>
        <p>Steps #2-#7 are implemented using the developed software "YOLO_mask_24" in Python. Let's
consider in more detail the individual steps of building the 3D model.</p>
        <p>Step #2 consists of segmenting the initial images img with a pre-trained neural network
YOLOv8 (one of the latest versions of YOLO) [15, 16]. The neural network for object segmentation,
in addition to the object class and the probability (confidence) of its detection, returns the
coordinates of the object center (x, y) in the image, its width and height, and masks for the object
segments in the image. It is possible to detect several objects in the image at the same time, the
network YOLO returns a segment for each detected object.</p>
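        <p>As a hedged sketch of step #2 (not the exact code of "YOLO_mask_24"), segmentation with the
pre-trained YOLOv8 model from the "ultralytics" library can be performed as follows; the variable
img is assumed to be an image array read with OpenCV, and error handling is omitted:</p>
        <preformat>
# Sketch of step #2: segmentation of the image img by the YOLOv8 network
from ultralytics import YOLO

model_seg = YOLO("yolov8m-seg.pt")     # pre-trained segmentation model (COCO, 80 classes)
results = model_seg.predict(img)       # one Results object is returned per input image
r = results[0]
classes = r.boxes.cls.cpu().numpy()    # class numbers of the detected objects
confs = r.boxes.conf.cpu().numpy()     # detection probabilities (confidence)
masks = r.masks.data.cpu().numpy()     # one mask per detected segment (assumes detections exist)
        </preformat>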
        <p>
          Step #6 consists of filtering the mask_scD by convolving it with a Gaussian filter kernel wG (size
Mw × Nw elements) with standard deviation σHD = HD/3 (HD is the width of the transition region
between the object and the background, which is formed as a result of dilation) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The filtered
image mask_scDG with smoothed edges is calculated by the formula

_ 
( ,  ,  ) =
 −1
 =0
 −1
 =0
 −1
 =0

_  ( 1,  1,  ) ∙   ( ,  ) ,
(1)
where i = 0, ..., M-1; k = 0, ..., N-1; c = 0,..., QC; i1 = i – m + mC; k1= k – n – nC;
M, N are image height and width (in pixels);
mC is center of the filter kernel in height; nC is center of the filter kernel in width.
        </p>
        <p>Smoothing the mask edges in the image mask_scDG provides a smooth transition between the
object and the background.</p>
        <p>The two-dimensional Gaussian function with the standard deviation σHD is used as the filter
kernel wG, which is described by the formula:

wG(m, n) = (1 / (√(2π) ∙ σHD)) ∙ exp(−((m − mC)² + (n − nC)²) / (2 ∙ σHD²)),   (2)

where m = 0, ..., Mw−1; n = 0, ..., Nw−1;
mC is the center of the filter kernel in height; nC is the center of the filter kernel in width.</p>
        <p>The sizes of the filter kernel are calculated taking into account the 3σ rule for the normal
distribution: Mw = 6 ∙ σHD, Nw = 6 ∙ σHD.</p>
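        <p>A small sketch of formulas (1) and (2) follows (an assumed NumPy/SciPy implementation, not the
code of "YOLO_mask_24"): the kernel wG is built with σHD = HD/3, sized by the 3σ rule, normalized to
unit sum, and applied to mask_scD by convolution:</p>
        <preformat>
# Sketch of formulas (1)-(2): Gaussian kernel wG and filtering of mask_scD
import numpy as np
from scipy.ndimage import convolve

HD = 30                               # width of the transition region (dilation iterations)
sigma_HD = HD / 3.0                   # standard deviation of the Gaussian kernel
Mw = Nw = int(round(6 * sigma_HD))    # kernel sizes by the 3*sigma rule
mC, nC = Mw // 2, Nw // 2             # center of the filter kernel

m = np.arange(Mw).reshape(-1, 1)      # kernel row indices
n = np.arange(Nw).reshape(1, -1)      # kernel column indices
wG = np.exp(-((m - mC) ** 2 + (n - nC) ** 2) / (2 * sigma_HD ** 2))
wG = wG / wG.sum()                    # normalization keeps the mask values in [0, 1]

mask_scDG = convolve(mask_scD.astype(float), wG, mode="nearest")  # formula (1)
        </preformat>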
        <p>
          Step #9 consists in constructing a three-dimensional model of the object using the
photogrammetry method in the 3DF Zephyr program [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The construction of a 3D model of
objects is performed on the basis of a set of QIR images, which are obtained from different angles
using one or more video cameras. The positions of keypoints (for example, object boundaries or
corners) are determined on the obtained images. Image matching consists in matching their
keypoints. To construct the model, QIS images (from the set of QIR images) are used, for which
satisfactory matching with other images by keypoints is obtained. Images that are poorly matched
with others are not used for 3D reconstruction (as this may lead to model defects).
        </p>
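        <p>The keypoint pipeline of 3DF Zephyr itself is not published; purely as an illustration of detecting
and matching keypoints between two views, a short sketch with OpenCV ORB features is given
below (the image variables img1 and img2 are assumptions):</p>
        <preformat>
# Illustration only: keypoint detection and matching between two views with OpenCV ORB
# (3DF Zephyr uses its own, unpublished keypoint algorithms)
import cv2

gray1 = cv2.cvtColor(img1, cv2.COLOR_RGB2GRAY)   # img1, img2: two photos of the object
gray2 = cv2.cvtColor(img2, cv2.COLOR_RGB2GRAY)

orb = cv2.ORB_create(nfeatures=2000)             # keypoint detector and descriptor
kp1, des1 = orb.detectAndCompute(gray1, None)
kp2, des2 = orb.detectAndCompute(gray2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)              # keypoint correspondences between the views
print(len(kp1), len(kp2), len(matches))          # poorly matched images yield few correspondences
        </preformat>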
    </sec>
    <sec id="sec-4">
      <title>4. Software implementation of object mask selection</title>
      <p>The software «YOLO_mask_24» for masking images of objects was developed in Python in the
Google Colab cloud platform using the Jupyter Notebook and the computer vision library
“OpenCV” [23, 24]. Image segmentation was performed by a convolutional neural network
YOLOv8 [15] using the “ultralytics” library [17], from which the YOLO class is imported. Neural
networks YOLO were used, pre-trained on the COCO training set [25].</p>
      <p>A medium-sized neural network model for object detection is created with the command:
model = YOLO("yolov8m.pt"), and a model for image segmentation is created with the command:
model_seg = YOLO("yolov8m-seg.pt"). To increase the accuracy of segmentation (if necessary), the
size of the neural network model can be increased, but this reduces the performance.</p>
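      <p>In a runnable form (a minimal sketch reproducing the commands above), the models are created as
follows:</p>
      <preformat>
# Creating the YOLOv8 models used for detection and segmentation
from ultralytics import YOLO

model = YOLO("yolov8m.pt")          # medium-sized model for object detection
model_seg = YOLO("yolov8m-seg.pt")  # medium-sized model for image segmentation
      </preformat>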
      <p>The method «predict» of the "yolov8m-seg.pt" model receives the initial image img as an input
parameter, and returns the result in the form of an Ultralytics YOLO Results object, which contains
information about the predicted masks, coordinates of the bounding boxes, classes and
probabilities. The program keeps the bounding boxes only for objects of the given class.</p>
      <p>The result of method «predict» is a list of ultralytics.engine.results.Results objects, where each
object corresponds to one input image. The segmentation results are masks of the segments in the
image and boxes objects, which contain:
• cls – object class number;
• xyxy – coordinates of the rectangular bounding boxes of the segments in the format [x1, y1, x2, y2];
• conf – object recognition probabilities (Confidence).</p>
      <p>The calculation of the scaled image of the mask_sc based on the initial masks is performed
using the bicubic interpolation method, namely the «resize» function of the OpenCV library.
The scale of the mask_sc is equal to the scale of the initial image img, which allows spatially
selecting arbitrary segments.</p>
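      <p>A hedged sketch of these operations (illustrative variable names; a single image and a single target
class «bench», COCO class number 13, are assumed) is given below:</p>
      <preformat>
# Sketch: selecting the "bench" segments (COCO class 13) and scaling a mask to the image size
import cv2
import numpy as np

CLASS_BENCH = 13                               # class number of the studied object
r = model_seg.predict(img)[0]                  # Results object for one input image
cls = r.boxes.cls.cpu().numpy().astype(int)    # class numbers
conf = r.boxes.conf.cpu().numpy()              # recognition probabilities (Confidence)
xyxy = r.boxes.xyxy.cpu().numpy()              # bounding boxes in the format [x1, y1, x2, y2]
masks = r.masks.data.cpu().numpy()             # masks at the network resolution

M, N = img.shape[:2]
mask_sc = np.zeros((M, N), dtype=np.float32)
for c, m in zip(cls, masks):
    if c == CLASS_BENCH:                       # keep only segments of the given class
        m_sc = cv2.resize(m, (N, M), interpolation=cv2.INTER_CUBIC)  # bicubic scaling
        mask_sc = np.maximum(mask_sc, m_sc)    # combine the segments into mask_sc
mask_sc = np.clip(mask_sc, 0.0, 1.0)
      </preformat>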
      <p>The dilation of the mask_sc produces the dilated mask image mask_scD.
The dilation is performed by the distance HD (in pixels), which creates a transition area
between the object and the background. The dilation is performed programmatically by the
function «binary_dilation» of the «scipy» library.</p>
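      <p>A minimal sketch of this step (the binarization threshold 0.5 is an assumption) follows:</p>
      <preformat>
# Sketch: dilation of the mask by HD iterations
from scipy.ndimage import binary_dilation

HD = 30                                                            # transition area width in pixels
mask_scD = binary_dilation(mask_sc > 0.5, iterations=HD).astype(float)
      </preformat>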
      <p>The filtering of the mask_scD with a Gaussian filter kernel is performed by the function
«gaussian_filter» of the «scipy» library, as a result of which the image mask_scDG with a smooth
transition between the object mask and the background is calculated.</p>
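      <p>A minimal sketch of the filtering and of the subsequent multiplication of the initial image by the
smoothed mask (step #7) is given below, under the variable names used above:</p>
      <preformat>
# Sketch: smoothing the dilated mask and forming the masked image img_mask
import numpy as np
from scipy.ndimage import gaussian_filter

sigma_HD = HD / 3.0
mask_scDG = gaussian_filter(mask_scD, sigma=sigma_HD)          # smooth object/background transition
img_mask = img.astype(float) * mask_scDG[:, :, np.newaxis]     # multiply all color channels
img_mask = img_mask.astype(np.uint8)                           # background areas become black
      </preformat>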
    </sec>
    <sec id="sec-5">
      <title>5. Results of constructing three-dimensional models of objects</title>
      <sec id="sec-5-1">
        <title>5.1. Results of constructing a three-dimensional model of an object without masking</title>
        <p>In accordance with the proposed information technology (section 3), three-dimensional models of
an object (bench) were constructed. For this purpose, a set of QIm images img of the object was
obtained using a video camera (Figure 1). The position and orientation of the video camera were
chosen so as to obtain an image of the object from all angles.</p>
        <p>Let us consider an example of constructing a three-dimensional model of a bench in the
integrated 3DF Zephyr software package based on QIR = QIm images without masking. The
construction of a 3D model (3D Reconstruction) consists in creating:</p>
        <sec id="sec-5-1-1">
          <title>1. Sparse Point Cloud. 2. Dense Point Cloud. 3. Meshes of polygons. 4. Textured Meshes.</title>
          <p>In the process of determining the alignment with neighboring images and the orientation of the
cameras, satisfactory values were obtained for all images, so the 3D model was built on the basis of
QIS = QIm images. The coefficient of the relative number of images taken into account when
building the model is equal to kI = QIS/QIR = 1. A high value of QIS and kI ensures a relatively high
quality of construction of the front part of the bench (Figure 2), however, defects appeared in the
rear part of the bench (part of the rear supports of the bench is missing) (Figure 3, Figure 4).</p>
          <p>This result can be explained by the small number of images of the object (bench) and the
negative impact of background objects that were captured by the frame. Therefore, to increase the
accuracy of the model, the image set was masked.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Results of constructing a three-dimensional model of an object with masking</title>
        <p>Let us consider an example of constructing a three-dimensional model of a bench in the integrated
software package 3DF Zephyr based on QIR = QIm images (Figure 1) after masking. Image masking
was performed using the software «YOLO_mask_24». In each image, the neural network YOLO
selects segments of objects, and for further processing, only segments of a given class (class
«bench» with number 13) are used (Figure 5). Image masking consists in the fact that all areas of
the image outside the frame of the studied object are painted over with a uniform color (black)
(Figure 5b).</p>
        <p>The image of the mask (Figure 6a) is scaled to the size of the original image (Figure 6b).
Distortions (edge effects) appear on the clear borders of the mask, so around the mask it is possible
to create a transition area between the object and the background using the dilation method (area
width HD = 30) and perform smoothing of the transition area with a Gaussian filter (Figure 6b).
Multiplying the mask image by the original image results in a masked image img_mask (Figure 7).</p>
        <p>
          The images processed by masking (Figure 8) were used to build a three-dimensional model of
the object using the photogrammetry method in the integrated software package 3DF Zephyr [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
In the case of building a three-dimensional model only based on images after masking, some of the
images are not taken into account by the program 3DF Zephyr (QIS = 17 images with QIR = 23, kI =
QIS/QIR = 17/23 = 0.739 were selected for building), since in this case it is more difficult to find
common points between different images. As a result, this leads to deterioration in the quality of
the model, so building a model only based on images after masking is impractical. This applies to
both images with clear mask boundaries and images with blurred boundaries (Figure 7).
        </p>
        <p>A significantly better quality of the model is obtained when using simultaneously images
without a mask (Figure 1) and images with a mask (Figure 8) for its construction. In this case, the
model is built on the basis of QIR = 2QIm = 46 images.</p>
        <p>At the same time, smaller distortions of the 3D model were obtained in the case of a mask with
clear boundaries (the coefficient of the relative number of images selected for building the model, kI
= QIS/QIR = 44/46 = 0.957), since when using a mask with blurred boundaries, distortions occur in
the transition areas and the accuracy of image matching in the photogrammetry method
deteriorates. Therefore, the best result was obtained when using simultaneously images without a
mask and images with a clear mask for building the 3D model (Figure 9, Figure 10, Figure 11).</p>
        <p>As a result (Figure 11), high quality of all parts of the 3D model was obtained. Thus, by
constructing an object model using simultaneously images without a mask and images with a
mask, it was possible to reduce 3D model defects. Due to the absence of extraneous objects on the
masked images, higher accuracy of constructing 3D models of objects is ensured. Images without a
mask are required for better matching of different images with each other, since key points of not
only the object, but also the background are used to match the images. The results of the research
show that by simultaneously using images without a mask and images with a mask (obtained by
segmenting the initial set of images), it is possible to increase the accuracy of constructing 3D
models of objects.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>An information technology for constructing three-dimensional models of objects using the
photogrammetry method is proposed, which provides an increase in the accuracy of constructing
3D models due to automatic masking of images. The hardware of the technology consists of video
cameras, computers and network equipment. The software of the technology consists of a program
developed in the Python language for forming masked images, the 3DF Zephyr program for
constructing three-dimensional models using the photogrammetry method. The software
"YOLO_mask_24" was developed in Python on the cloud platform Google Colab, which is designed
for segmentation, scaling, dilation, filtering, and masking of object images. Image segmentation is
performed by a convolutional neural network YOLOv8.</p>
      <p>Three-dimensional object models (benches) were constructed using the 3DF Zephyr program
based on QIR = 23 images without masking using the photogrammetry method. All QIS = 23 images
were used for construction. Sparse Point Cloud, Dense Point Cloud, Meshes and Textured Meshes
were created. The created model contains 302686 Points, 395105 Triangles. The accuracy of most of
the object model is satisfactory, however, defects appeared in the rear part of the model. Part of the
rear supports of the bench is missing, approximately 40% of the surface of the rear supports is
damaged.</p>
      <p>It was found that building models using photogrammetry only on the basis of masked images
leads to significant model defects, since in this case the images are poorly combined with each
other at key points and only QIS = 17 images from QIR = 23 images (relative image number
coefficient kI = 0.739) are used to build the 3D model. This applies to both images with clear mask
boundaries and images with blurred boundaries.</p>
      <p>The novelty of the work is the simultaneous use of images of objects without masking and with
masking to build their three-dimensional models using the photogrammetry method. To verify this
approach, three-dimensional object models (benches) were built using the 3DF Zephyr program
based on QIR = 46 images without masking and with masking. QIS = 44 of these images were used for
construction (relative image number coefficient kI = 0.957). The created model contains 250132
Points, 331378 Triangles. High accuracy was obtained for all parts of the 3D model.</p>
      <p>The accuracy of constructing three-dimensional models of objects using the photogrammetry
method has been improved, therefore the goal of the work has been achieved. It has been shown
that the use of artificial neural networks YOLO makes it possible to automate image masking for the
photogrammetry method, increase the accuracy and speed of image processing, and obtain more
accurate and detailed 3D models. The resulting 3D models of objects are used, in particular, in
three-dimensional computer graphics, virtual and augmented reality systems, reverse engineering.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Stylianidis</surname>
          </string-name>
          , Measurements: Introduction to Photogrammetry. in:
          <article-title>Photogrammetric Survey for the Recording and Documentation of Historic Buildings</article-title>
          . Springer Tracts in Civil Engineering, Springer, Cham.
          <year>2020</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>195</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -47310-
          <issue>5</issue>
          _
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>3DF</given-names>
            <surname>Zephyr. The Complete Photogrammetry Solution</surname>
          </string-name>
          ,
          <year>2024</year>
          , URL: https://www.3dflow.net.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Mikeroyal.</surname>
          </string-name>
          Photogrammetry-Guide,
          <year>2024</year>
          , URL: https://github.com/mikeroyal/Photogrammetry-Guide?
          <article-title>tab=readme-ov-file</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Nikolov</surname>
          </string-name>
          . Python Libraries for Mesh,
          <source>Point Cloud, and Data Visualization (Part 1)</source>
          ,
          <year>2024</year>
          , URL: https://towardsdatascience.com
          <article-title>/python-libraries-for-mesh-and-point-cloud-visualization-part1-daa2af36de30</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Palani</surname>
          </string-name>
          ,
          <source>Principles of Digital Signal Processing</source>
          , Springer Cham.,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Woods</surname>
          </string-name>
          , Digital image processing, Pearson/ Prentice Hall, New York,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Balovsyak</surname>
          </string-name>
          , Kh. Odaiska,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakovenko</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.Iakovlieva</surname>
          </string-name>
          ,
          <article-title>Adjusting the Brightness and Contrast parameters of digital video cameras using artificial neural networks</article-title>
          ,
          <source>in: Proc. SPIE, Sixteenth International Conference on Correlation Optics 12938</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>129380I</fpage>
          -
          <lpage>1</lpage>
          - 129380I-4. doi:
          <volume>10</volume>
          .1117/12.3009429.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Norvig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Artificial</given-names>
            <surname>Intelligence</surname>
          </string-name>
          .
          <article-title>A Modern Approach</article-title>
          , Pearson Education,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hovorushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kysil</surname>
          </string-name>
          ,
          <article-title>Selection of the artificial intelligence component for consultative and diagnostic information technology for glaucoma diagnosis</article-title>
          ,
          <source>Computer Systems and Information Technologies</source>
          <volume>4</volume>
          (
          <year>2023</year>
          )
          <fpage>87</fpage>
          -
          <lpage>90</lpage>
          . doi:
          <volume>10</volume>
          .31891/csit-2023-4-12.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Geron</surname>
          </string-name>
          ,
          <article-title>Hands-On Machine Learning with Scikit-Learn, Keras,</article-title>
          and
          <string-name>
            <surname>TensorFlow. O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>