<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A method of iterative image normalization for tasks of visual navigation of UAVs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M O Elantcev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I O Arkhipov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R M Gafarov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kalashnikov Izhevsk State Technical University</institution>
          ,
          <addr-line>Russian Federation, Izhevsk, Studencheskaya 7, 426069</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>144</fpage>
      <lpage>152</lpage>
      <abstract>
<p>The work deals with a method of eliminating the perspective distortion of an image acquired from an unmanned aerial vehicle (UAV) camera in order to transform it to match the parameters of the satellite image. The normalization is performed in one of two ways. The first variant consists in calculating an image transformation matrix based on the camera position and orientation. The second variant is based on matching the current frame with the previous one. The matching yields the shift, rotation, and scale parameters that are used to obtain an initial set of pairs of corresponding keypoints. From this set four pairs are selected to calculate the perspective transformation matrix. This matrix is in turn used to obtain a new set of pairs of corresponding keypoints. The process is repeated while the number of pairs in the new set exceeds the number in the current one. The accumulated transformation matrix is then multiplied by the transformation matrix obtained during the normalization of the previous frame. The final part presents experimental results showing that the proposed method can improve the accuracy of the visual navigation system at low computational cost.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>This paper is concerned with the development of a method for image normalization that is used in a visual navigation system of unmanned aerial vehicles (UAVs). The navigation system being developed [1] matches images acquired from a UAV onboard camera with the satellite image of the flying area. The matching yields the shift, rotation, and scale values that can be used to transform the input image to the corresponding part of the satellite image. Knowing the coordinates of the satellite image and the shift, rotation, and scale parameters of the current input image, it is possible to determine the coordinates of the UAV.</p>
<p>During the flight the tilt of the UAV is unstable: it changes because of weather conditions or when the UAV is making a pivot. This results in perspective distortions of the images acquired from the onboard camera, even if a stabilized platform is used. The distortions interfere with the correct matching of the images from the camera with the satellite image, which is usually represented in an orthogonal projection. To increase the likelihood of correct matching, a method of image normalization for eliminating the perspective distortions is needed. The method should be computationally simple because it is supposed to run on low-power UAV processors.</p>
<p>There are several approaches to solve this problem. The first approach is to calculate the matrix of the perspective transformation based on the external parameters of the camera obtained from the UAV sensors, namely pitch, yaw, roll, and altitude. In [2], the distortion is compensated by determining the orientation of a virtual camera with its optical axis collinear to the normal vector of the shooting plane. This virtual camera provides an image without perspective distortions. Therefore, to eliminate the distortions of the input image we need to calculate the homography matrix between the images from the real and the virtual cameras. There are two difficulties with this approach: the moment of shooting and the moment of recording the external parameters should be strictly synchronized, and the surface of the flying area should be flat.</p>
      <p>The second approach is to find and analyze the location of the known structural elements in the
input image. This approach assumes that each frame from the camera is processed independently. The
paper [3] focuses on the distortion elimination for the task of bar code recognition. Its idea is to find
four straight lines bounding the quadrilateral barcode area. These lines provide four corner points that
are used to calculate the transformation matrix. This approach is difficult to apply to the task of
normalizing the UAV images because the shapes of the objects in the images are usually unknown.</p>
      <p>The third approach is to find the pairs of corresponding elements in the two images and use their
positions to determine the transformation matrix of one image into another. Usually the methods
search for the pairs of image points with characteristic local features known as keypoints. Depending
on the task, different methods of keypoints extraction and matching can be used [4,5].</p>
      <p>Under ideal conditions, four pairs of the keypoints are enough to determine the transformation
matrix. However, in practice even the positions of the correctly matched points can be different
because of noise. To obtain a more accurate solution a system of equations based on all the matched
pairs is constructed. This system is usually solved by the least squares method so as to minimize the
given error function (the DLT method [6]). In [7], it is proposed to perform this method iteratively.
This method helps to increase the likelihood of image matching by means of eliminating the noise
points.</p>
<p>The set of the pairs of corresponding points can also contain mismatched pairs. In this case the DLT method can yield a wrong result, and one of the robust methods should be used, such as RANSAC [8, 9]. In this method four pairs of corresponding points are randomly selected among all the matched pairs. Using them the homography matrix is calculated, and then the algorithm checks how many pairs fit this transform (inliers). The process is repeated, and other random pairs are selected from the set of inliers, until a solution is found that satisfies the specified number of pairs or the maximum number of iterations is exceeded. Another similar robust method is LMS. It estimates the four currently selected pairs of points not by the number of inliers but by the median distance among all the pairs of points [6].</p>
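The RANSAC scheme described above can be illustrated with a short numpy-only sketch (not the OpenCV implementation; function names are ours): four pairs are sampled, a homography is fitted by the DLT, and the model with the most inliers is kept.

```python
import numpy as np

def homography_from_pairs(src, dst):
    # DLT: each pair contributes two linear equations in the nine
    # homography coefficients; the solution is the right singular
    # vector associated with the smallest singular value.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def project(h, pts):
    # Apply the homography with the perspective divide.
    p = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    # Randomly sample four pairs, fit a homography, count the pairs
    # that fit the transform (inliers), and keep the best model.
    rng = np.random.default_rng(seed)
    best_h, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        h = homography_from_pairs(src[idx], dst[idx])
        err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = int(np.sum(err < thresh))
        if inliers > best_inliers:
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

With a sufficient number of iterations, the probability of sampling at least one all-inlier quadruple approaches one, which is why mismatched pairs do not corrupt the final estimate.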
<p>The described robust methods require a large number of iterations; therefore, they are not effective on low-power processors. The RHO method [10] aims to solve this problem. Instead of randomly selecting four pairs of points at each iteration, the method preliminarily evaluates each pair, selects the most promising ones and then corrects the estimation. This allows obtaining the result in significantly fewer iterations.</p>
<p>The fourth approach is to analyze the input image using a neural network, as demonstrated in [11]. It is noted that the advantage of this approach is the ability to work when there are very few keypoints or when the input images are blurred. However, this method is computationally complex and requires special hardware to work in real time, which makes it difficult to apply to the task of UAV visual navigation.</p>
      <sec id="sec-1-1">
        <title>Normalization based on the external parameters of the camera</title>
      </sec>
      <sec id="sec-1-2">
        <title>Normalization based on keypoints matching</title>
        <p>The proposed method of iterative image normalization for tasks of visual navigation of UAVs combines the first and the third approaches. A simplified scheme of the method is shown in Figure 1. Normalization based on the external parameters of the camera (pitch, yaw, roll, and flight altitude) is used either when processing the initial camera frame, or when the previous frame has not been successfully matched with the satellite image, or when the current frame cannot be matched with the previous one. In other cases the iterative method of homography matrix calculation based on matching the keypoints of the previous and the current frames is used.</p>
        <p>2. Description of the peculiarities of the task considered</p>
        <p>The UAV flies along the given trajectory, which contains both straight sections and turns. The considered flight altitude is 250-500 m. There is a camera onboard the UAV that is directed strictly downwards. There is no stabilized platform on the UAV. The shooting speed of the camera is enough to provide at least an 80% overlap between the current and the previous frames. The focal length and the dimensions of the camera's sensor are known.</p>
        <p>The UAV is equipped with an accelerometer and a pressure sensor that provide information about the UAV orientation (roll, pitch, and yaw) and the flight altitude. However, according to experiments on real images, the moments of shooting and of recording the parameters are not exactly the same. As a result, strong wind gusts cause inaccuracies in the recorded external parameters of the camera. In addition, the pressure sensor estimates the flight altitude only above sea level. When calculating the flight altitude above ground level, the height map of the flying area should be taken into account.</p>
        <p>The result of the normalization is the image transformed into the orthogonal projection so that it can later be matched with the satellite image.</p>
        <p>3. Normalization based on the external parameters of the camera</p>
        <p>The first algorithm of the proposed normalization method is based on the information about the camera orientation and position in space, namely the roll ρ, pitch τ, yaw γ, and flight altitude hf. In addition, the input information includes the terrain height ha at the shooting point and the reference altitude hr at which the shooting scale equals the satellite image scale. The output information is the coefficients of the transformation matrix that eliminates the distortion. The algorithm consists of two stages: the calculation of the transformation coefficients disregarding the scale and the calculation of the scaling factor.</p>
        <p>3.1. Calculation of the transformation coefficients disregarding the scale</p>
        <p>The camera's field of view can be represented as a pyramid that intersects the earth's surface plane. To simplify the calculation, instead of rotating the pyramid according to the camera orientation, the earth's surface plane is rotated and placed at one of the base vertices of the pyramid, as Figure 2a shows.</p>
        <p>The purpose of the first stage is to calculate the coordinates of the points M, N, K on the top view
projection (Figure 2b). These coordinates can be used to calculate the resultant homography matrix Ta
by solving an equation system constructed with the coordinates transformation equations of the pairs
of points: A-A, B-K, C-N, D-M. The transformation of each pair of points can be described by two
equations [12]:</p>
        <p>$x' = \dfrac{t_{11}x + t_{12}y + t_{13}}{t_{31}x + t_{32}y + 1}$,  $y' = \dfrac{t_{21}x + t_{22}y + t_{23}}{t_{31}x + t_{32}y + 1}$, (1)
where x and y are the coordinates of the point before the transformation, x' and y' are the coordinates of the point after the transformation, and $t_{11}$, $t_{12}$, $t_{13}$, $t_{21}$, $t_{22}$, $t_{23}$, $t_{31}$, and $t_{32}$ are the coefficients of the transformation matrix $T_a$.</p>
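For illustration, the perspective transform of a single point can be evaluated directly; a minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def apply_homography(t, x, y):
    # Perspective transform of a point by a 3x3 matrix whose element
    # t[2][2] is fixed to 1, so the denominator is t31*x + t32*y + 1.
    denom = t[2, 0] * x + t[2, 1] * y + 1.0
    x_new = (t[0, 0] * x + t[0, 1] * y + t[0, 2]) / denom
    y_new = (t[1, 0] * x + t[1, 1] * y + t[1, 2]) / denom
    return x_new, y_new
```

With the identity matrix the point is unchanged; a pure translation matrix shifts it by its third-column entries.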
        <p>The shape of the pyramid depends on the size of the photo sensor and the focal length. The viewing
angles are calculated as follows:</p>
        <p>$\alpha_x = 2\,\mathrm{arctg}\dfrac{m_w}{2f_L}$,  $\alpha_y = 2\,\mathrm{arctg}\dfrac{m_h}{2f_L}$, (2)
where $m_w$ and $m_h$ are the dimensions of the photo sensor and $f_L$ is the focal length. The vertex angles of the pyramid are calculated as follows:</p>
        <p>$\beta_x = 2\arcsin\dfrac{m_w}{\sqrt{m_w^2 + m_h^2 + 4f_L^2}}$,  $\beta_y = 2\arcsin\dfrac{m_h}{\sqrt{m_w^2 + m_h^2 + 4f_L^2}}$. (3)</p>
        <p>The length of the edge of the pyramid $l_e$ and the diagonal of the base of the pyramid $l_d$ are calculated as follows:
$l_e = \dfrac{m_h}{2\sin(\beta_y/2)}$,  $l_d = \sqrt{m_w^2 + m_h^2}$. (4)</p>
        <p>Depending on the orientation of the camera, the earth's surface plane is placed at one of the four
base vertices, as shown in Figure 3.</p>
        <sec id="sec-1-2-1">
          <title>3.2. Calculation of the scale factor</title>
          <p>Since the earth's surface plane was placed at one of the base vertices of the pyramid, the obtained transformation matrix disregards the original scale of the image. To calculate the scaling factor, the following distance between the transformed points is used:
$k_{nd} = \sqrt{(N_x - A'_x)^2 + (N_y - A'_y)^2}$. (5)</p>
          <p>To transform the image acquired from the UAV camera to the scale of the satellite image, the transformation matrix should be adjusted using the scale factor defined in the next subsection.</p>
        </sec>
        <sec id="sec-1-2-2">
          <title>3.3. Obtaining the result of the normalization</title>
          <p>To obtain the final transformation matrix the following formula should be used:
$T = T_s T_a$, (6)
where $T_a$ is the matrix obtained at the first stage and $T_s$ is the scaling matrix
$T_s = \mathrm{diag}(k, k, 1)$, (7)
with the scale factor
$k = \dfrac{l_d}{k_{nd}}\,h_{nf}$, (8)
where $h_{nf}$ is the normalized flight altitude
$h_{nf} = \dfrac{h_f - h_a}{h_r - h_a}$. (9)</p>
          <p>4. Normalization based on keypoints matching</p>
          <p>The first algorithm of the proposed method eliminates the distortions of the image coarsely because, as mentioned above, the moment of recording the external parameters of the camera does not necessarily coincide with the moment of shooting. Moreover, it is rather difficult to accurately take into account the elevation difference of the underlying surface. Therefore, it is used only when the previous frame from the camera cannot be matched with the satellite image. In other cases the second algorithm is used, which is based on matching the current frame with the previous frame already normalized.</p>
          <p>4.1. Matching the current frame with the previous one</p>
          <p>To reduce its complexity, the image matching algorithm analyzes not all the pixels but only the keypoints extracted with the method described in [13].</p>
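Assuming the scale enters as a uniform scaling matrix composed with the perspective correction $T_a$ (function and variable names below are illustrative, not from the paper), the final matrix of section 3.3 can be assembled as:

```python
import numpy as np

def altitude_ratio(h_f, h_a, h_r):
    # Normalized flight altitude: above-ground altitude relative to the
    # reference altitude at which the shooting scale equals the
    # satellite-image scale.
    return (h_f - h_a) / (h_r - h_a)

def final_transform(t_a, k):
    # Compose the perspective correction t_a with a uniform scaling
    # diag(k, k, 1) to reach the satellite-image scale.
    return np.diag([k, k, 1.0]) @ t_a
```

At the reference altitude the ratio equals one and the composition leaves the scale of the corrected image unchanged.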
          <p>After the keypoint sets of both images are found, they are used to determine the shift, rotation, and scale values between the two images using the statistical differentiation method [14]. These parameters form the following transformation matrix [15]:
$T_m = \begin{bmatrix} s\cos\varphi &amp; -s\sin\varphi &amp; t_x \\ s\sin\varphi &amp; s\cos\varphi &amp; t_y \\ 0 &amp; 0 &amp; 1 \end{bmatrix}$, (10)
where $t_x$ and $t_y$ are the shift values along the X axis and the Y axis respectively, $\varphi$ is the rotation value, and s is the scale value.</p>
          <p>4.2. Selecting pairs of corresponding points</p>
          <p>After the transformation matrix is found, it is used to determine the pairs of corresponding points. A point A of the current frame corresponds to a point B of the previous frame when, after applying the transformation matrix to the coordinates of point A, the distance between A and B does not exceed the specified threshold (1-2 pixels).</p>
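The pairing rule above can be sketched as follows; taking the nearest previous-frame keypoint among the candidates under the threshold is our assumption, and the names are illustrative:

```python
import numpy as np

def match_by_transform(t_m, pts_cur, pts_prev, thresh=2.0):
    # Warp each keypoint of the current frame with the estimated matrix
    # and pair it with the nearest previous-frame keypoint if the
    # distance does not exceed the threshold (1-2 pixels in the text).
    ones = np.ones((len(pts_cur), 1))
    warped = np.hstack([pts_cur, ones]) @ t_m.T
    warped = warped[:, :2] / warped[:, 2:3]
    pairs = []
    for i, w in enumerate(warped):
        d = np.linalg.norm(pts_prev - w, axis=1)
        j = int(np.argmin(d))
        if d[j] < thresh:
            pairs.append((i, j))
    return pairs
```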
          <p>From the whole set of the pairs of corresponding points four pairs are selected that will be used to
calculate the homography matrix (1). The choice is limited by the condition that no three points are
collinear. The best case occurs when a quadrilateral of maximum area with as large angles as possible
is found. However, the search for such a set of points is a computationally complex operation. The
following approaches to selecting the points were tried:
1) the four points nearest to the corners of the image;
2) the four points most distant from the center of mass of the keypoints set and located in different
quadrants relative to the center of mass;</p>
          <p>3) the first point is the most distant from the center of mass, the second point is the most distant
from point 1, the third is the point with the greatest sum of its distances from points 1 and 2, the fourth
is the point with the greatest sum of its distances from points 1, 2 and 3;
4) a random selection of points.</p>
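Variant 3 above (greedy selection of mutually distant points) can be sketched as:

```python
import numpy as np

def select_four_points(pts):
    # Start from the point most distant from the center of mass of the
    # keypoint set, then greedily add the point with the greatest sum of
    # distances to the points already chosen.
    pts = np.asarray(pts, dtype=float)
    center = pts.mean(axis=0)
    chosen = [int(np.argmax(np.linalg.norm(pts - center, axis=1)))]
    for _ in range(3):
        total = np.zeros(len(pts))
        for i in chosen:
            total += np.linalg.norm(pts - pts[i], axis=1)
        total[chosen] = -1.0  # exclude already chosen points
        chosen.append(int(np.argmax(total)))
    return chosen
```

On a set consisting of the corners of a square plus its center, the procedure picks the four corners, i.e. a large quadrilateral, which is the desired behaviour.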
          <p>If this algorithm finds fewer than four corresponding pairs, the normalization based on the external parameters of the camera is applied.</p>
          <p>4.3. Calculation of the transformation matrix</p>
          <p>After the four pairs of corresponding keypoints are selected, the refined transformation matrix is calculated. This homography matrix matches the projection of the current frame with the projection of the previous frame already normalized.</p>
          <p>Then the search for the pairs of corresponding points is repeated using the current frame warped according to the refined transformation matrix. From the found pairs a new set of four pairs is selected and a new refined matrix is calculated. The process is repeated as long as the number of the pairs of corresponding points increases. Experiments show that in most cases two iterations are enough to obtain the solution. The final matrix of the transformation of the original image into the normalized one is determined by the formula:
$T_r = T_p \prod_{i=1}^{n} H_i$, (13)
where $T_p$ is the perspective transformation matrix for normalizing the previous frame, $H_i$ are the transformation matrices obtained at each iteration, and n is the number of iterations.</p>
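The iteration of section 4.3 can be sketched with a hypothetical refine_step callback (our abstraction) that matches the keypoints under the current warp and returns a refined homography together with its pair count:

```python
import numpy as np

def iterative_normalization(t_p, refine_step):
    # refine_step(t) is assumed to return (H_i, n_pairs) for the frame
    # warped by t; iterate while the number of corresponding pairs
    # grows, accumulating T_r = T_p * H_1 * ... * H_n (formula (13)).
    t_r = np.array(t_p, dtype=float)
    best_pairs = -1
    while True:
        h_i, n_pairs = refine_step(t_r)
        if n_pairs <= best_pairs:
            return t_r
        best_pairs = n_pairs
        t_r = t_r @ h_i
```

A refinement whose pair count does not grow is discarded, so the loop terminates as soon as the matching stops improving.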
          <p>Since the previous frame is used to normalize the current frame, the normalization error can accumulate. However, this method is used as a part of the visual navigation system, which implies that the next step is to match the normalized image with the satellite image. This matching eliminates the accumulated error.</p>
          <p>5. Experiments</p>
          <p>To test the proposed method, it has been integrated into the preprocessing stage of the UAV visual navigation system that is being developed. Then a series of experiments using real images of a UAV flight was performed. The trajectory of the UAV flight is shown in Figure 5 and consists of sections of two types: straight sections (270 frames) and turns (70 frames). The positions of the UAV on the straight sections are marked with green dots. The positions of the UAV at the turns are marked with yellow dots. The black lines show the boundaries between the sections. The white lines connect the coordinates of the UAV to the corresponding centers of photographing.</p>
          <p>The first experiment aims to choose the best algorithm for image normalization based on keypoints matching. The proposed iterative algorithm was tested (with the four variants of selecting the four pairs of points described above), as well as the methods implemented in the OpenCV 3.4.5 library: DLT, RANSAC, LMS and RHO. To determine the speed of the methods, each input frame was processed 100 times and the average execution time was calculated. Processing was performed on an AMD FX-8320E 3.2 GHz processor without using a graphics accelerator. The results show that the accuracy of the proposed method is comparable to that of the best method from the OpenCV library, while the proposed method performs much faster, which is very important for integrating it into a UAV visual navigation system.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Conclusion</title>
<p>The proposed method of iterative image normalization successfully solves the task of image preprocessing in the UAV visual navigation system being developed. The integration of this method into the navigation system improves the accuracy of matching the images acquired from the onboard camera with the satellite image and reduces the probability of losing the UAV.</p>
<p>Owing to the use of the two normalization approaches, the proposed method can eliminate perspective distortions resulting from both the camera orientation and the inclination of the underlying surface. At the same time, the method is computationally simple and can be integrated into the UAV navigation system, which has to run on low-power processors.</p>
      <p>The reported study was funded by Kalashnikov ISTU according to the research project № 09.06.01/18 GMM.</p>
    </sec>
  </body>
  <back>
  </back>
</article>