Wavelet transform based optimization method for Three-Dimensional computer vision

Svetlana Antoshchuk1, Galina Shcherbakova1, Sergey Kondratyev1, Daria Koshutina1 and Oleksandr Usov1

1 Odessa Polytechnic National University, Shevchenko Ave., 1, Odessa, 65044, Ukraine

Abstract
This work aims to develop approximate depth estimation methods for three-dimensional computer vision in unmanned vehicles (UVs) based on a newly developed wavelet-transform-based optimization method. The method can be used to assess informative features for matching and/or to optimize costs in image analysis, to refine discrepancies, and for related tasks. Possible solutions are demonstrated for obtaining an approximate depth map by simplifying the calculation of the disparity values traditionally used for forming a depth map in intensity space, and by using an edge description with adjustable detail based on the wavelet transform. The advantage of the developed optimization method over existing algorithms in the wavelet space is its increased speed, achieved through the rational selection of the Haar wavelet support length in the extremum search area. Modeling confirmed the effectiveness of the proposed approach for constructing depth maps and supports recommending the method for unmanned vehicles operating under limited computational and energy resources.

Keywords
Three-dimensional computer vision, contour detection, wavelet transform, optimization, image analysis

1. Introduction

Video cameras are the primary source of information about the surrounding environment for a multitude of tasks in fields such as unmanned vehicles (UVs: quadcopters, unmanned cars), robotics, and vision systems for the visually impaired [1]. Recent advancements in video sensor technologies, as well as in image and video analysis and processing methods, have enabled machine vision to advance confidently into automation and security systems, including household, industrial, and military applications. Modern machine vision systems not only extract video information but also present it in a form and quantity that allows for the identification of significant, informative features of objects and processes. These systems recognize objects, assess their position and condition, select rational trajectories for UV movement, and, if necessary, generate corresponding control actions.

A distinctive feature of machine vision systems is not only the identification and recognition of an object and its position but also the consideration of scene and object depth, i.e., the transformation of a two-dimensional image into a three-dimensional one, where information about the object is represented not just in units of brightness but with pixel/distance parameters.

The analysis of the evolution of three-dimensional computer vision and of the corresponding hardware and software designed for obtaining images and forming depth parameters has shown that the main problem is the high cost of the sensors and image processors used, making them largely inaccessible to the broader consumer market. Therefore, finding solutions for obtaining 3D depth information about objects in an image that are characterized by low cost, low power consumption, and acceptable
performance is an important and relevant scientific and practical task for a wide range of the aforementioned UVs.

2. Leveraging LiDAR and Stereo Cameras for Efficient Depth Information Retrieval in UVs

When solving navigation tasks for UVs, or when collecting depth information about objects for them, the acquisition of this information can be redistributed and/or transferred to backup subsystems and/or systems operating on other physical principles [2]. One such approach is the combination of data from LiDAR (laser sensors) and video sensors. LiDAR-SLAM technology is currently being actively researched by Suzuki [3], Li, Savkin, and Vucetic [4], Tripicchio et al. [5], Bi et al. [6], Mansouri et al. [7], Vong et al. [8], and others. These researchers have demonstrated the feasibility of using LiDAR for path planning, collision avoidance, orientation, and navigation. Image sensors (video cameras) are lightweight, but their accuracy can decrease under low lighting, and the loss (absence) of depth information increases the computational load during data processing. LiDAR allows for direct distance measurement, but the result depends on the reflective properties of the surface, and the sensor is energy-intensive [9].

Therefore, a less energy-intensive and promising solution for obtaining depth information about objects and/or scenes in an image is the stereoscopic vision approach. This approach uses two video cameras with known optical characteristics and known parameters of their relative positioning [4, 7, 8]. Typically, video cameras with similar optical characteristics are used. These cameras point in the same direction, with the distance between their optical centers much smaller than the distance to the observed objects. Depth information is extracted by comparing and analyzing the resulting pair of images. This approach models image processing similarly to how it occurs in the vision of many living beings: just as a given control point in space projects to different positions in each human eye, the difference in these positions allows the system to calculate the position of the point in space. As a result, a depth map is obtained in which objects closer to the observer are displayed in lighter shades and those further away in darker shades.

Several solutions enable the creation of depth maps using local block matching (the StereoBM function) or semi-global block matching (the StereoSGBM function) [10, 11]. Consider the method for constructing a depth map implemented by the StereoBM function. The algorithm involves the following stages of data processing:
1. Loading images from both video cameras (often using the OpenCV library).
2. Converting the images from color to grayscale.
3. Creating a StereoBM object with specified parameters.
4. Calculating the disparity map (the magnitude of the difference in the localization of corresponding image elements obtained from the right and left cameras). Traditionally, this is done by matching pixels in the left and right images.
5. Outputting and saving the disparity map.
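For concreteness, a minimal sketch of stages 1-5 using the OpenCV Python bindings is shown below; the file names and parameter values are illustrative assumptions rather than settings from this work.

```python
import cv2

# 1. Load images from both video cameras (file names are illustrative).
left = cv2.imread("left.png")
right = cv2.imread("right.png")

# 2. Convert from color to grayscale (StereoBM expects single-channel input).
gray_l = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)
gray_r = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)

# 3. Create a StereoBM object; numDisparities must be a multiple of 16.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# 4. Calculate the disparity map (OpenCV returns fixed-point values scaled by 16).
disparity = stereo.compute(gray_l, gray_r)

# 5. Normalize to [0, 255] for display and save the result.
disp_vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
cv2.imwrite("disparity.png", disp_vis)
```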
In the StereoBM algorithm, stages 1, 2, 3, and 5 are standard and offer no opportunities to increase execution speed and thereby reduce energy consumption. Stage 4, which calculates the disparity map, is therefore more promising. This stage involves the following steps:
1. Selecting blocks. Blocks of a fixed size are highlighted in the image.
2. Comparing blocks. For each block in the left image, the algorithm tries to find a matching block in the right image.
3. Calculating disparity. For each block in the left image, a displacement (disparity) is calculated, indicating by how many pixels the corresponding block in the right image is shifted. This disparity is stored in a disparity map.
4. Normalizing the disparity map. The resulting disparity map may contain values in an arbitrary range, so it is normalized to [0, 255] for convenient display.

Within this part of the algorithm, the block comparison step is the most complex and the most critical for the quality of the depth map. The block comparison algorithm searches the left frame for the pattern corresponding to a block of the right frame, moving along the horizontal axis of the right image and comparing candidate blocks with the corresponding block of the left image. Various approaches can be used to assess the degree of block matching [10-14]:
1. Assessing the proximity of pixel intensity values in the blocks.
2. Assessing the minimum sums of squared differences of intensities (Sum of Squared Differences, SSD).
3. Determining the minimum normalized cross-correlation coefficient (Normalized Cross-Correlation).
4. The Semi-Global Matching (SGM) method, which takes into account global and local features of low-quality images with different intensities or difficult image acquisition conditions.
5. Adaptive window matching. Instead of a fixed block size, algorithms can use adaptive windows to account for changes in texture and intensity in the images.
6. The Graph Cuts method, which uses graph structures to calculate the disparity map. Such algorithms can take global and local image properties into account and improve accuracy in challenging imaging conditions (e.g., low light).
7. Machine learning methods. These rely on models trained to predict disparity based on feature detection in images and can handle images with uneven texture and variations in illumination.
8. Deep learning methods, such as Convolutional Neural Networks (CNN). These networks can be trained on large datasets and perform well in complex lighting conditions.
9. Joint use of sensors implemented on various physical principles, such as LiDAR or infrared cameras, to improve the accuracy of the disparity map.

Note that the approaches are numbered in order of increasing computational cost. Approaches N2 to N6 require high energy consumption, which may be critically unacceptable for UVs. Approaches N7 and N8 rely on neural network calculations, requiring computational resources and a training database, which can be challenging to provide for some UV applications. Approaches involving the joint use of sensors implemented on different physical principles (e.g., N9) lead to increased energy consumption. Therefore, for further research, the method of comparing image blocks in the right and left images by intensity proximity (N1) was chosen as the base method, since it ensures fast computation; a minimal sketch of this approach is given below.
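The following NumPy sketch illustrates approach N1 for one image row; the block size, search range, and absolute-difference score are illustrative choices rather than the exact procedure used in StereoBM.

```python
import numpy as np

def disparity_row(left_row, right_row, block=5, max_disp=64):
    """Row-wise block comparison by intensity proximity (approach N1).

    For each block in the left row, search along the same row of the
    right image and keep the shift whose intensities are the closest.
    """
    half = block // 2
    disp = np.zeros(left_row.size, dtype=np.float32)
    for x in range(half + max_disp, left_row.size - half):
        ref = left_row[x - half:x + half + 1].astype(np.float32)
        best_d, best_score = 0, np.inf
        for d in range(max_disp):
            cand = right_row[x - d - half:x - d + half + 1].astype(np.float32)
            score = np.sum(np.abs(ref - cand))   # intensity proximity
            if score < best_score:
                best_score, best_d = score, d
        disp[x] = best_d
    return disp
```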
However, this method has low noise immunity, which limits the quality of the resulting map. This shortcoming is especially noticeable under changing and/or low lighting conditions, which are often encountered in the applied problems solved by UVs.

The analysis also established the following. When developing algorithms for obtaining depth information about objects and/or scenes, two assumptions about the structure of the observed scene are used [9]. The first assumption is that neighboring pixels have similar disparity values (the difference between the positions of an object in the left and right images), because scene objects are piecewise smooth. The second assumption is that homogeneous areas of the scene correspond to flat surfaces in 3D.

The algorithm for constructing a depth map measures the displacement along the x-axis of each point in the right frame and correlates it with the corresponding point in the left frame [9, 15]. The search for the corresponding point runs strictly along the horizontal line of each frame. Therefore, to correctly determine the distance to objects, vertical calibration (along the y-axis in the image) sets the position of the cameras so that the horizontal lines of both cameras coincide. Horizontal calibration (along the x-axis in the image) is done by rotating the cameras relative to each other until the x-coordinates of points at a distance of more than 10 meters coincide. Consequently, this method requires pixel-precise positioning of the cameras both horizontally and vertically, which complicates alignment and reduces the quality of object positioning [9, 12].

A significant argument in favor of using stereoscopic systems in UVs is the availability of numerous open-source algorithms in the OpenCV computer vision library (e.g., the StereoBM class) for implementing stereo vision in languages such as C/C++, Python, Java, Ruby, Matlab, and others [12].

The main disadvantages of stereoscopic systems for constructing depth maps for mobile navigation systems and UVs include the following [9]:
● The need for camera calibration. Even with perfectly accurate camera positioning, obtaining a depth map in these systems (e.g., when avoiding obstacles) is difficult because of the need for pixel-by-pixel image matching.
● Dependence on the quality of the initial images and/or on incorrectly set camera parameters, lighting, and illumination for each camera.
● Dependence of the number of computational operations, and consequently of performance, on the size and quality of the image.

As the above analysis showed, a number of methods based on the assessment of local and global image characteristics have been developed to evaluate disparity. Algorithms based on estimating local image characteristics are fast but have low accuracy, while global methods can improve accuracy and (often) noise immunity but have low performance. The search for a compromise between accuracy and speed can be aimed at developing methods for assessing disparity based on optimization algorithms. When disparity is assessed from changes in pixel intensity along an image line, the extremum coordinates must be estimated at signal-to-noise ratios below 6. Known extremum search methods based on estimating the value of the first derivative, traditionally used at this stage, do not work when the signal-to-noise ratio is less than 15.
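A small numerical illustration of this failure mode (the test function and noise level are illustrative assumptions): at low SNR the sign of the finite difference flips at almost every sample, so a first-derivative search cannot localize the extremum.

```python
import numpy as np

# Illustrative noisy 1-D objective with its true minimum at x = 1.
rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 501)
j = (x - 1.0) ** 2 + rng.normal(0.0, 2.0, x.size)   # low signal-to-noise

# A first-derivative search looks for a sign change of the finite
# difference; here the noise makes the sign flip almost everywhere.
s = np.sign(np.diff(j))
n_flips = np.count_nonzero(np.diff(s))
print(f"sign changes of the finite difference: {n_flips}")  # hundreds, not one
```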
To search for the minimum under such conditions, the authors developed and investigated an optimization method based on the wavelet transform (WT) [16]. However, because it involves a large number of stages that use several types of wavelet functions, this method has low performance. Improving the method to increase its performance for the application described above is therefore relevant.

3. Assessment of Depth Information for Object Positions

To reduce the impact of the above disadvantages when constructing depth maps from stereo images, an approximate approach to depth information assessment is proposed. This approach uses the results of identifying areas of similar intensity, object contours in the images, or points of maximum curvature in the contour description.

Figure 1 illustrates the main steps of depth assessment by identifying areas of similar intensity in a test image (see Figure 1, a). The proposed approach involves the following:
● For a segment of the intensity row, determine the region of maximum intensity on the image from one of the cameras, for example, the left one (1) (see Figure 1, b).
● Create a brightness envelope template: the intensity value corresponding to the extremum coordinate of the row and the intensity values of neighboring pixels (Figure 1, c), and locate a similar fragment in the image from the right camera (2).
● Determine the distance between the similar elements and the corresponding disparity value.
● Processing is performed row-wise and column-wise; if necessary, the results are combined using a logical OR scheme.

To illustrate, brightness envelope templates representing a block of 3 pixels are provided for the left and right frames of the image (see Figure 1, c). The templates show the intensity distribution around the extremum in the given image row for the left and right cameras (see Figure 1, b).

Figure 1: a - Test image: right and left frames; b - Fragments of intensity rows from the right and left frames; c - Template showing the distribution of intensities around the extremum in the given image row.

Both templates have a "peak" shape (other possible shapes include "trough," "plateau," "rise," "fall," etc.). Black squares highlight the brightness levels of each pixel in the blocks, while gray areas indicate the brightness dispersion boundaries. The disparity between the blocks is 4 pixels along the X-axis, and the absolute difference in brightness is 2. In real stereo images, the brightness difference can reach up to 30. This approach is simpler than the well-known StereoBM method [10, 12-14], which uses two computationally intensive procedures for similar searches, Sum of Squared Differences (SSD) and Normalized Cross-Correlation (NCC), both of which are also slower. A minimal sketch of the proposed template search is given below.
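The sketch assumes a 3-pixel brightness-envelope template and a simple absolute-difference similarity test; both are illustrative simplifications of the procedure described above.

```python
import numpy as np

def row_disparity_by_template(left_row, right_row, tol=30):
    """Approximate disparity for one row via a brightness-envelope template.

    1. Take the intensity extremum of the left row and its two neighbours
       as a 3-pixel template.
    2. Scan the right row for the most similar 3-pixel fragment.
    3. The shift between the two positions is the disparity estimate.
    `tol` bounds the admissible brightness difference (the text notes
    differences of up to ~30 in real stereo pairs).
    """
    xl = int(np.argmax(left_row))
    xl = min(max(xl, 1), left_row.size - 2)        # keep neighbours in range
    template = left_row[xl - 1:xl + 2].astype(np.float32)

    best_xr, best_score = None, np.inf
    for xr in range(1, right_row.size - 1):
        frag = right_row[xr - 1:xr + 2].astype(np.float32)
        score = np.abs(template - frag).sum()
        if score < best_score:
            best_score, best_xr = score, xr

    if best_xr is None or abs(float(template[1]) - float(right_row[best_xr])) > tol:
        return None                                 # no confident match
    return xl - best_xr                             # disparity in pixels
```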
The computational complexity of finding templates can be reduced if object contours are used in the calculation of disparity. Contours are the most informative part of object images, and their analysis can significantly reduce the number of computational operations [15]. In this case, however, the quality of the depth map depends on the effectiveness of the contour detection methods used, among which the most noise-resistant are those based on the wavelet transform (WT) [15]. To extract object contours using the wavelet transform, the summing (approximation) component of the smallest scale (a1) of the discrete wavelet transform is removed by setting its values to zero. Calculating the inverse transformation from the detailing component (d1) alone then reconstructs an image in which the (horizontal) contours are selected. The vertical boundary of an object is selected in the same way. The procedure for identifying contours using the wavelet transform can be written as

$$ f \xrightarrow{\;WT\;} \{a_1, d_1\}, \quad a_1 := 0, \quad f' = WT^{-1}\{0, d_1\} = K\{f\}, \qquad (1) $$

where f is a row (column) of the original image; a1 is the summing component of the wavelet transform; d1 is the detailing component of the wavelet transform; f' is the row (column) of the image after the inverse transformation; and K is the operator for obtaining a contour preparation.

Due to the frequency-selective properties of the WT, the ratio of the contour intensity of a smaller object to that of a larger object decreases as the WT scale increases [15]. This property makes the technique insensitive to changes in object intensity and allows the object's size to be taken into account (i.e., the level of detail to be adjusted).

This scheme is proposed for estimating depth by identifying the contours of objects with adjustable detail in the wavelet transform space (see Figure 2; a sketch of the contour extraction is given at the end of this section):
● The rows of the left frame image (see Figure 2, b) and of the right frame image are convolved with wavelet functions of a given scale, and contour areas are detected (see Figure 2, c);
● The distance between object contours and the corresponding disparity value are determined;
● Processing is performed along rows and columns as needed, and the results are combined using a logical OR scheme;
● If necessary, the process is repeated for different scales.

Figure 2: a - Test image, left camera; b - Fragment of an intensity row, left camera; c - Wavelet transform result for the images from the left and right cameras with a wavelet function support length of s = 1.

Research has also shown that with a wavelet function support length s = 1, both fine and coarse image details produce peaks of similar amplitude in the intensity drop area after processing. As the scale increases, the relative size of the peaks for fine details decreases, while the amplitude of the peak at the boundaries of large-scale objects increases [15]. Furthermore, the proposed methodology allows the level of image detail to be adjusted, ensuring high noise resilience and good resolution of object contour extraction, and provides an approximate depth information assessment that meets the requirements of a range of tasks.

Thus, depth map construction methods can be categorized into local and global groups. Algorithms based on local methods are fast but less accurate, while global methods enhance accuracy but are slower. To find a compromise between accuracy and performance, a group of methods based on optimization algorithms is actively being developed. These methods can be used for evaluating informative features, matching fragments, and/or optimizing costs in image analysis, as well as for refining discrepancies and other related tasks [11].
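As an illustration of the contour-extraction scheme (1), the sketch below uses the PyWavelets package (an assumed implementation choice; the text does not name a library) to zero the approximation component and reconstruct from the detail component alone.

```python
import numpy as np
import pywt

def contour_preparation(image):
    """Contour preparation per scheme (1): compute the smallest-scale DWT,
    set the approximation a1 to zero, and reconstruct from the detail d1."""
    img = image.astype(np.float32)

    def detail_only(rows):
        out = np.zeros_like(rows)
        for i, row in enumerate(rows):
            a1, d1 = pywt.dwt(row, "haar")       # smallest-scale Haar DWT
            rec = pywt.idwt(None, d1, "haar")    # a1 treated as zeros
            n = min(rec.size, out.shape[1])
            out[i, :n] = rec[:n]
        return out

    horizontal = detail_only(img)        # contours along the rows
    vertical = detail_only(img.T).T      # contours along the columns
    return np.abs(horizontal) + np.abs(vertical)   # OR-like combination
```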
4. Development of the Optimization Method

Analysis has shown that global optimization methods based on exhaustive search incur high computational costs, while local methods based on first- and second-derivative estimates exhibit low noise resilience [16, 17]. Therefore, for solving a range of optimization problems in image analysis for UVs, a wavelet transform (WT) based approach has been developed that enhances noise resilience while providing accuracy sufficient for these tasks.

Known optimization methods using the WT exploit the property that the gradient estimate obtained by WT processing changes sign as the search approaches an extremum [15]. The continuous wavelet transform is defined by the convolution

$$ W(s, x_0) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{+\infty} f(x)\, \Psi^{*}\!\left(\frac{x - x_0}{s}\right) dx, \qquad (2) $$

where f is the function being transformed (analyzed); Ψs,x0(x) is a two-parameter basis function derived from the mother wavelet Ψ0(x) through scaling by a factor s ∈ R+ and translation by a parameter x0 ∈ R; and Ψ* is the complex conjugate of Ψ (s corresponds to the width of the wavelet and x0 defines the position of the wavelet on the x-axis). The factor 1/√|s| is introduced for normalization.

Iterative extremum search method in wavelet transform space. Define the one-dimensional partial wavelet transform

$$ h_n = \frac{1}{\pi} \int \frac{J(a)}{c_n - a_n}\, da_n, \qquad (3) $$

where J(a) is the objective function and a is the parameter vector with components a_n. The condition for an optimum is taken to be the vanishing of all partial wavelet transforms, WT(c) = 0. The operator

$$ WT(c) = (h_1, \ldots, h_N) \qquad (4) $$

was used for synthesizing regular iterative algorithms. Based on this approach, several authors have developed and investigated algorithms for adaptation, optimization, segmentation, classification, and clustering [16-18]. For instance, a regular iterative optimization algorithm in the wavelet transform space [19] is

$$ c[n] = c[n-1] - \gamma[n] \cdot WT(c[n-1]), \qquad (5) $$

where γ[n] is the step size, n is the iteration number, and c is the extremum coordinate.

It should be noted that, because the search direction is assessed using wavelet processing, the optimization requires significant computational resources. This limits the applicability of the method for practical UV tasks, where optimization must remain efficient, especially for multimodal objective functions with low signal-to-noise ratios [16, 17].
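To make the sign-change property concrete, the sketch below estimates the search direction by Haar-weighted averaging of a noisy objective around a point c; the objective, noise level, and parameter values are illustrative assumptions in the spirit of (5) and of equation (8) below.

```python
import numpy as np

def haar_direction(J, c, support=10, step=0.1):
    """Direction estimate at c: weights -1 to the left of c and +1 to the
    right (i = 0 skipped). Its sign flips as c crosses the minimum of J."""
    half = support // 2
    total = 0.0
    for i in range(-half, half + 1):
        if i:
            total += J(c + i * step) * (1.0 if i > 0 else -1.0)
    return total / support

rng = np.random.default_rng(1)
J = lambda x: (x - 2.0) ** 2 + rng.normal(0.0, 0.5)   # noisy, minimum at x = 2

for c in (0.0, 1.5, 2.5, 4.0):
    print(f"c = {c}: direction estimate = {haar_direction(J, c):+.3f}")
# The estimates are negative left of x = 2 and positive to the right, so the
# sign change marks the extremum even though individual samples are noisy.
```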
In the known method, the assessment of the search direction using Haar wavelet functions (WF) provides robustness against noise. Among the many existing wavelets, the Haar wavelet was chosen for its low computational complexity: the computational procedures involve only multiplications by +1 and -1, which simplifies the hardware implementation of the procedure and increases the speed of operation [17]. However, with a large Haar wavelet support length and a high signal-to-noise ratio of the objective function, the search may deviate in the wrong direction, moving away from the global minimum towards a local minimum, which reduces search efficiency. The proposed work aims to improve search efficiency by selecting the Haar wavelet support length within the search area. To achieve this, it is proposed that after the minimum area of the functional has been localized during the search phase with the Haar wavelet function Ψ1(i), and the search area has been narrowed by defining the constraints g_c ≤ 0, the Haar wavelet support length is selected judiciously, taking into account the characteristics of the extrema in disparity evaluation.

The method with iterative constraint evaluation using Haar wavelet functions is implemented in the following sequence:

Step 1. Execute steps 1-5 of the basic wavelet optimization method, following the iterative scheme [17]. The extremum coordinates are determined as

$$ c[n] = c[n-1] - \gamma[n] \cdot WT_k\big(Q(x[n], c[n-1])\big), \qquad (6) $$

where γ[n] is the step size; n is the iteration number; k is the start number; Q(x, c) is the functional vector dependent on the vector c = (c_1, ..., c_N); x = (x_1, ..., x_M); c[n] is the minimum coordinate at iteration n; and WT_k is the direction of movement towards the extremum, calculated as

$$ WT_k\big(Q(x[n], c[n-1])\big) = \{G_1^k, G_2^k, \ldots, G_N^k\}, \qquad (7) $$

where G_j^k is the result of processing with respect to the j-th variable:

$$ G_j^k = \frac{1}{s_k} \sum_{\substack{i=-s_k/2 \\ i \neq 0}}^{s_k/2} Q(x[n], c_j + ia) \cdot \Psi_k(i), \qquad (8) $$

where s_k is the length of the Haar wavelet support; a is the step size of the Haar wavelet discretization; Ψ_k(i) is the Haar wavelet at the k-th start; and j = 1, ..., N indexes the components of the parameter vector.

This approach allows the range of variation of the extremum coordinates to be determined [18], with δ1 the search error for the optimal start (determined during preliminary studies of the quality functional) and δ2 the search error for the optimum of the practical task. The search is conducted with the following parameters: c[0], the initial approximation to the optimum coordinate; γ[1], the step size; a, the Haar wavelet discretization step; s1, the length of the Haar wavelet support Ψ_k(i) for the first start (determined during preliminary studies of the quality functional); Δs, the step for changing the Haar wavelet support length when determining the range of extremum coordinates; k = 1, the start number; n = 1, the iteration number; and A1, the minimal height of a registered extremum. The search continues until the sign of the processing result WT_k(Q(x[n], c[n-1])) changes when estimating the direction of movement towards the extremum.

Step 2. The constraints of the search zone for the next start of the algorithm, g1(c[n]) and g2(c[n]), are determined:

$$ g_1(c[n]) \ge c^{*}[n-1]; \quad g_2(c[n]) \ge c^{*}[n], \qquad (9) $$

where c*[n-1] is the extremum coordinate at step n-1 and c*[n] is the extremum coordinate at step n, at which the sign changed (see Figure 3).

Step 3. The length of the Haar wavelet support s_k for the subsequent start is determined by selecting from the range s_k = {2, 4, 6} according to the condition

$$ s_k \le \frac{|g_2(c[n]) - g_1(c[n])|}{a}. $$

Figure 3 illustrates the identification of the constraints using Haar wavelet functions Ψ_k(i), with a first-start support length s1 = 10.

When the SNR (signal-to-noise ratio) is reduced to 2, the performance of the proposed method (measured by timer) is on average 1.1 times higher (see Figure 4, a), while the relative error in finding the minimum is 1.2 times greater, a consequence of the introduced stage of selecting the Haar WF support length (see Figure 4, b).

The sensitivity of the developed optimization method to local extrema and to the starting point of the search was studied using the test Schwefel function

$$ f_s(x) = 418.9829 - x \cdot \sin\sqrt{|x|}. \qquad (10) $$

This function has a false global minimum. On x ∈ (-500; 500) the function has a global minimum f_s(x) = 0 at x = 420.9829. During the research, the starting point was chosen randomly. The gradient descent method found only the minimum closest to the start, while the optimization method with the Haar wavelet function reached the global minimum with an error δ ≤ 10^-2 in 128 out of 150 cases, i.e., the probability of finding the extremum coordinate is 0.85. The global minimum was not found when the starting point was chosen outside the interval x ∈ (-420; 470).
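A compact end-to-end sketch of the search on the Schwefel test function (10) is given below; Step 3 is simplified to stepping down a fixed candidate set of support lengths on each sign change, and the parameter values are illustrative rather than the tuned values reported in Section 5.

```python
import numpy as np

def schwefel(x):
    """1-D Schwefel test function (10); global minimum near x = 420.9829."""
    return 418.9829 - x * np.sin(np.sqrt(np.abs(x)))

def haar_direction(J, c, s, a):
    """Direction estimate in the spirit of (8): Haar weights -1/+1 over a
    support of s samples spaced a apart, skipping i = 0."""
    g = 0.0
    for i in range(-(s // 2), s // 2 + 1):
        if i:
            g += J(c + i * a) * (1.0 if i > 0 else -1.0)
    return g / s

def wavelet_search(J, c0, gamma=0.9, a=1.0, supports=(10, 6, 4, 2), iters=100):
    """Iterative scheme (6) with a simplified Step 3: when the direction
    estimate changes sign (the minimum was straddled), switch to the next,
    shorter Haar support from the candidate set."""
    c, prev, k = c0, None, 0
    for _ in range(iters):
        g = haar_direction(J, c, supports[k], a)
        if prev is not None and np.sign(g) != np.sign(prev):
            k = min(k + 1, len(supports) - 1)    # narrow: shorter support
        c, prev = c - gamma * g, g
    return c

# Started inside the basin of the global minimum, the iterations are
# expected to settle near x = 420.98.
print(wavelet_search(schwefel, c0=350.0))
```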
Figure 3: Constraints identification using Haar wavelet functions. The coordinates c*[n-1] and c*[n] are marked with squares.

Figure 4: Experimental results of the basic [16] and proposed methods: execution time, in seconds (a); relative error in determining the extremum coordinates (b); 1 - basic method [16], 2 - proposed method.

Future research in this direction will focus on evaluating the impact of the wavelet function support length on the performance of the extremum search procedure, as well as the effects of the wavelet discretization step and of the step size γ in iterative searches with Haar wavelets.

5. Testing

The developed wavelet-transform-based optimization method was tested in the context of constructing a depth map from a test image in the image database [20, 21]. After investigation, the length of the wavelet support used in the minimum search was chosen as 17, the wavelet discretization step as 1, and the step γ in the iterative Haar wavelet search as 0.9. The minimum image alignment error was found after 12 iterations (starting the search at [1; 1]). Furthermore, the simulation results indicated that modifying the known StereoBM method by incorporating the wavelet transform into the disparity calculation enhances noise robustness, reduces the error in locating the characteristic fragment on the intensity line, and also reduces power consumption from 950 mA to 620 mA, i.e., by about a third (Table 1).

Table 1
Results of Simulation

Algorithm                 StereoBM    Developed
Power consumption, mA     950         620
Depth map obtained        (image)     (image)

6. Conclusions

The paper addresses the development of approximate depth estimation methods for three-dimensional computer vision in unmanned vehicles based on an improved optimization method utilizing the wavelet transform. This method can be used for evaluating informative features for fragment matching and/or optimizing costs in image analysis, as well as for refining inconsistencies, among other applications. The study demonstrates possible solutions for obtaining an approximate depth map by simplifying the calculation of the disparity values traditionally used for depth map formation within the intensity space, and by using image contour descriptions with adjustable detail based on the wavelet transform. The advantage of the developed optimization method over existing algorithms based on the wavelet transform is its increased efficiency due to the rational choice of the Haar wavelet support length in the extremum search area. Future research in this direction will focus on evaluating the impact of the wavelet function support length on the performance of the extremum search procedure, as well as the effects of the wavelet discretization step and the step size in iterative searches with Haar wavelets. This improvement makes the proposed method suitable for unmanned vehicles operating under constraints of computational and energy resources.

References

[1] A. Karnati, D. Mehta, and K. S. Manu, "Artificial Intelligence in Self Driving Cars: Applications, Implications and Challenges," Universal Journal of Business and Management 21 (2022): 1-28. doi:10.12725/ujbm.61.1.
[2] I. Konovalenko, E. Kuznetsova, A. Miller, B. Miller, A. Popov, D. Shepelev, and K. Stepanyan, "New Approaches to the Integration of Navigation Systems for Autonomous Unmanned Vehicles (UAV)," Sensors 18 (2018): 3010. doi:10.3390/s18093010.
Suzuki, "Integrated Navigation for Autonomous Drone in GPS and GPS-Denied Environments," Journal of Robotics and Mechatronics 30.3 (2018): 373 379. doi:10.20965/jrm.2018.p0373. [4] H. Li, A. V. Savkin, and B. Vucetic, "Collision Free Navigation of a Flying Robot for Underground Mine Search and Mapping," in 2018 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 1102 1106. IEEE, 2018. doi:10.1109/ROBIO.2018.8665108. [5] P. Tripicchio, M. Satler, M. Unetti, and C. A. Avizzano, "Confined Spaces Industrial Inspection with Micro Aerial Vehicles and Laser Range Finder Localization," International Journal of Micro Air Vehicles 10.2 (2018): 207 224. doi:10.1177/1756829318757471. [6] Y. Bi, M. Lan, J. Li, K. Zhang, H. Qin, S. Lai, and B. M. Chen, "Robust Autonomous Flight and Mission Management for MAVs in GPS-Denied Environments," in: 11th Asian Control Conference (ASCC), 2017. doi:10.1109/ascc.2017.8287144. [7] S. S. Mansouri, C. Kanellakis, D. Kominiak, and G. Nikolakopoulos, "Deploying MAVs for Autonomous Navigation in Dark Underground Mine Environments," Robotics and Autonomous Systems 126 (2020): 103472. doi:10.1016/j.robot.2020.103472. [8] C. H. Vong, R. Ravitharan, P. Reichl, J. Chevin, and H. Chung, "Small Scale Unmanned Aerial System (UAS) for Railway Culvert and Tunnel Inspection," in: ICRT 2017, 2018, pp. 1024 1032.. doi:10.1061/9780784481257.102. [9] H.-W. Choi, et al., "An Overview of Drone Applications in the Construction Industry," Drones 7.8 (2023): 515. doi:10.3390/drones7080515. [10] R. A. Hamzah and H. Ibrahim, "Literature Survey on Stereo Vision Disparity Map Algorithms," J. Sensors (2016): 8742920:1-8742920:23. doi:10.1155/2016/8742920. [11] J. Du and J. Okae, "Optimization of Stereo Vision Depth Estimation Using Edge-Based Disparity Map," in: 10th International Conference on Electrical and Electronics Engineering (ELECO), 2017, pp. 1171-1175. [12] M. Richter, M. Rosenberger, R. Illmann, D. Buchanan, and G. Notni, "Suitability Study for Real- Time Depth Map Generation Using Stereo Matchers in OpenCV and Python," in: Engineering for a Changing World: Proceedings: 60th ISC, Ilmenau Scientific Colloquium, Technische Universität Ilmenau, September 04-08, 2023. doi:10.22032/dbt.58859. [13] A. Aslam and M. S. Ansari, "Depth-Map Generation Using Pixel Matching in Stereoscopic Pair of Images," (2019) arXiv:1902.03471v3 [cs.CV] [14] A. A. Fahmy, O. Ismail, and A. K. Al-Janabi, "Stereovision Based Depth Estimation Algorithm in Uncalibrated Rectification," International Journal of Video & Image Processing & Network Security 13.2 (2013). [15] S. G. Antoshchuk, S. B. Kondratyev, G. Y. Shcherbakova, and M. A. Hodovychenko, "Depth Map Generation for Mobile Navigation Systems Based on Objects Localization in Images," Herald of Advanced Information Technology 5.1 (2022): 11-18. doi:10.15276/hait.05.2022.1. [16] G. Shcherbakova, V. Krylov, V. Abakumov, V. Brovkov, and I. Kozina, "Sub Gradient Iterative Method for Neural Networks Training," in: Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications: 6th IEEE Int. Workshop IDAACS'2011, Prague, Czech Republic, 15 17 Sept. 2011: Proceedings, pp. 361-364. [17] W. Huan, G. Shcherbakova, A. Sachenko, L. Yan, N. Volkova, B. Rusyn, and A. Molga, "Haar Wavelet-Based Classification Method for Visual Information Processing Systems," Applied Sciences 13.9 (2023): 5515. doi:10.3390/app13095515. [18] Y. Bodyanskiy, N. Lamonova, I. Pliss, and O. 
Vynokurova, "An Adaptive Learning Algorithm for a Wavelet Neural Network," Expert Systems 22.5 (2005): 235-240. doi:10.1111/j.1468- 0394.2005.00314.x. [19] G. Shcherbakova, H.-S. Shi, V. Krylov, N. Bilous, and S. Antoshchuk, "Estimation of the Duration of RR-Intervals of Electrocardiograms by Means of Multi-Start Optimization Based on Wavelet Transformation," in: IEEE 9th International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, 21-23 September 2017, Bucharest, Romania. doi:10.1109/IDAACS58523.2023.10348849. [20] Middlebury Stereo Vision Dataset, "Full-Size Stereo Data and Scene Information," Middlebury Stereo Vision Project. URL: https://vision.middlebury.edu/stereo/data/scenes2005/FullSize/Art/Illum1/Exp1/. [21] Middlebury Stereo Vision Dataset, "Stereo Data Archive," Middlebury Stereo Vision Project. URL: https://vision.middlebury.edu/stereo/data/scenes2005/.