<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A method of iterative image normalization for tasks of visual navigation of UAVs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M O Elantcev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I O Arkhipov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R M Gafarov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kalashnikov Izhevsk State Technical University</institution>
          ,
          <addr-line>Russian Federation, Izhevsk, Studencheskaya 7, 426069</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>144</fpage>
      <lpage>152</lpage>
      <abstract>
<p>The work deals with a method of eliminating the perspective distortion of an image acquired from an unmanned aerial vehicle (UAV) camera in order to transform it to match the parameters of the satellite image. The normalization is performed in one of two ways. The first variant consists in calculating an image transformation matrix based on the camera position and orientation. The second variant is based on matching the current frame with the previous one. The matching yields the shift, rotation, and scale parameters that are used to obtain an initial set of pairs of corresponding keypoints. From this set four pairs are selected to calculate the perspective transformation matrix. This matrix is in turn used to obtain a new set of pairs of corresponding keypoints. The process is repeated while the number of pairs in the new set exceeds the number in the current one. The accumulated transformation matrix is then multiplied by the transformation matrix obtained during the normalization of the previous frame. The final part presents experimental results showing that the proposed method can improve the accuracy of the visual navigation system at low computational cost.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>This paper is concerned with the development of a method for image normalization that is used in a visual navigation system of unmanned aerial vehicles (UAVs). The navigation system being developed [1] matches images acquired from a UAV onboard camera with the satellite image of the flying area. The matching yields the shift, rotation, and scale values that can be used to transform the input image to the corresponding part of the satellite image. Knowing the coordinates of the satellite image and the shift, rotation, and scale parameters of the current input image, it is possible to determine the coordinates of the UAV.</p>
<p>During the flight the tilt of the UAV is unstable: it changes because of weather conditions or when the UAV is making a pivot. This results in perspective distortions of the images acquired from the onboard camera, even if a stabilized platform is used. The distortions interfere with the correct matching of the images from the camera with the satellite image, which is usually represented in an orthogonal projection. To increase the likelihood of correct matching, a method of image normalization for eliminating the perspective distortions is needed. The method should be computationally simple because it is supposed to run on low-power UAV processors.</p>
<p>There are several approaches to solve this problem. The first approach is to calculate the matrix of the perspective transformation based on the external parameters of the camera obtained from the UAV sensors, namely pitch, yaw, roll, and altitude. In [2], the distortion is compensated by determining the orientation of a virtual camera with its optical axis collinear to the normal vector of the shooting plane. This virtual camera provides an image without perspective distortions. Therefore, to eliminate the distortions of the input image we need to calculate the homography matrix between the images from the real and the virtual cameras. There are two difficulties with this approach: the moment of shooting and the moment of recording the external parameters should be strictly synchronized, and the surface of the flying area should be flat.</p>
      <p>The second approach is to find and analyze the location of the known structural elements in the
input image. This approach assumes that each frame from the camera is processed independently. The
paper [3] focuses on the distortion elimination for the task of bar code recognition. Its idea is to find
four straight lines bounding the quadrilateral barcode area. These lines provide four corner points that
are used to calculate the transformation matrix. This approach is difficult to apply to the task of
normalizing the UAV images because the shapes of the objects in the images are usually unknown.</p>
      <p>The third approach is to find the pairs of corresponding elements in the two images and use their
positions to determine the transformation matrix of one image into another. Usually the methods
search for the pairs of image points with characteristic local features known as keypoints. Depending
on the task, different methods of keypoints extraction and matching can be used [4,5].</p>
      <p>Under ideal conditions, four pairs of the keypoints are enough to determine the transformation
matrix. However, in practice even the positions of the correctly matched points can be different
because of noise. To obtain a more accurate solution a system of equations based on all the matched
pairs is constructed. This system is usually solved by the least squares method so as to minimize the
given error function (the DLT method [6]). In [7], it is proposed to perform this method iteratively.
This method helps to increase the likelihood of image matching by means of eliminating the noise
points.</p>
<p>The set of the pairs of corresponding points can also contain mismatched pairs. In this case the DLT method can yield a wrong result, and one of the robust methods should be used, such as RANSAC [8, 9]. In this method four pairs of corresponding points are randomly selected among all the matched pairs. Using them the homography matrix is calculated, and then the algorithm checks how many pairs fit this transform (inliers). The process is repeated, and other random pairs are selected from the set of inliers, until a solution is found that satisfies the specified number of pairs or the maximum number of iterations is exceeded. Another similar robust method is LMS. It estimates the four currently selected pairs of points not by the number of inliers but by the median distance among all the pairs of points [6].</p>
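The RANSAC scheme described above can be illustrated with a short numpy-only sketch (not the OpenCV implementation; function names are ours): four pairs are sampled, a homography is fitted by the DLT, and the model with the most inliers is kept.

```python
import numpy as np

def homography_from_pairs(src, dst):
    # DLT: each pair contributes two linear equations in the nine
    # homography coefficients; the solution is the right singular
    # vector associated with the smallest singular value.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]

def project(h, pts):
    # Apply the homography with the perspective divide.
    p = np.column_stack([pts, np.ones(len(pts))]) @ h.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=500, thresh=2.0, seed=0):
    # Randomly sample four pairs, fit a homography, count the pairs
    # that fit the transform (inliers), and keep the best model.
    rng = np.random.default_rng(seed)
    best_h, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        h = homography_from_pairs(src[idx], dst[idx])
        err = np.linalg.norm(project(h, src) - dst, axis=1)
        inliers = int(np.sum(err < thresh))
        if inliers > best_inliers:
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

With a sufficient number of iterations, the probability of sampling at least one all-inlier quadruple approaches one, which is why mismatched pairs do not corrupt the final estimate.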
<p>The described robust methods require a large number of iterations; therefore, they are not effective on low-power processors. The RHO method [10] aims to solve this problem. Instead of randomly selecting four pairs of points at each iteration, the method preliminarily evaluates each pair, selects the most promising ones and then corrects the estimation. This allows obtaining the result in significantly fewer iterations.</p>
<p>The fourth approach is to analyze the input image using a neural network, as demonstrated in [11]. It is noted that the advantage of this approach is the ability to work when there are very few keypoints or when the input images are blurred. However, this method is computationally complex and requires special hardware to work in real time, which makes it difficult to apply to the task of UAV visual navigation.</p>
      <sec id="sec-1-1">
        <title>Normalization based on the external parameters of the camera</title>
      </sec>
      <sec id="sec-1-2">
        <title>Normalization based on keypoints matching</title>
        <p>The proposed method of iterative image normalization for tasks of visual navigation of UAVs combines the first and the third approaches. A simplified scheme of the method is shown in Figure 1. Normalization based on the external parameters of the camera (pitch, yaw, roll, and flight altitude) is used either when processing the initial camera frame, or when the previous frame has not been successfully matched with the satellite image, or when the current frame cannot be matched with the previous one. In other cases the iterative method of homography matrix calculation based on matching the keypoints of the previous and the current frames is used.</p>
        <p>2. Description of the peculiarities of the task considered</p>
        <p>The UAV flies along the given trajectory, which contains both straight sections and turns. The considered flight altitude is 250-500 m. There is a camera onboard the UAV that is directed strictly downwards. There is no stabilized platform on the UAV. The shooting speed of the camera is enough to provide at least an 80% overlap between the current and the previous frames. The focal length and the dimensions of the camera's sensor are known.</p>
        <p>The UAV is equipped with an accelerometer and a pressure sensor that provide information about the UAV orientation (roll, pitch, and yaw) and the flight altitude. However, according to experiments on real images, the moments of shooting and of recording the parameters are not exactly the same. As a result, strong wind gusts cause inaccuracies in the recorded external parameters of the camera. In addition, the pressure sensor estimates the flight altitude only above sea level. When calculating the flight altitude above ground level, the height map of the flying area should be taken into account.</p>
        <p>The result of the normalization is the image transformed into the orthogonal projection so that it can later be matched with the satellite image.</p>
        <p>3. Normalization based on the external parameters of the camera</p>
        <p>The first algorithm of the proposed normalization method is based on the information about the camera orientation and position in space, namely the roll ρ, pitch τ, yaw γ, and flight altitude hf. In addition, the input information includes the terrain height ha at the shooting point and the reference altitude hr at which the shooting scale equals the satellite image scale. The output information is the coefficients of the transformation matrix that eliminates the distortion. The algorithm consists of two stages: the calculation of the transformation coefficients disregarding the scale and the calculation of the scaling factor.</p>
        <p>3.1. Calculation of the transformation coefficients disregarding the scale</p>
        <p>The camera's field of view can be represented as a pyramid that intersects the earth's surface plane. To simplify the calculation, instead of rotating the pyramid according to the camera orientation, the earth's surface plane is rotated and placed at one of the base vertices of the pyramid, as Figure 2a shows.</p>
        <p>The purpose of the first stage is to calculate the coordinates of the points M, N, K on the top view
projection (Figure 2b). These coordinates can be used to calculate the resultant homography matrix Ta
by solving an equation system constructed with the coordinates transformation equations of the pairs
of points: A-A, B-K, C-N, D-M. The transformation of each pair of points can be described by two
equations [12]:</p>
        <p>$x' = \dfrac{t_{11}x + t_{12}y + t_{13}}{t_{31}x + t_{32}y + 1}$,  $y' = \dfrac{t_{21}x + t_{22}y + t_{23}}{t_{31}x + t_{32}y + 1}$, (1)
where x and y are the coordinates of the point before the transformation, x' and y' are the coordinates of the point after the transformation, and $t_{11}$, $t_{12}$, $t_{13}$, $t_{21}$, $t_{22}$, $t_{23}$, $t_{31}$, and $t_{32}$ are the coefficients of the transformation matrix $T_a$.</p>
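For illustration, the perspective transform of a single point can be evaluated directly; a minimal numpy sketch (the helper name is ours):

```python
import numpy as np

def apply_homography(t, x, y):
    # Perspective transform of a point by a 3x3 matrix whose element
    # t[2][2] is fixed to 1, so the denominator is t31*x + t32*y + 1.
    denom = t[2, 0] * x + t[2, 1] * y + 1.0
    x_new = (t[0, 0] * x + t[0, 1] * y + t[0, 2]) / denom
    y_new = (t[1, 0] * x + t[1, 1] * y + t[1, 2]) / denom
    return x_new, y_new
```

With the identity matrix the point is unchanged; a pure translation matrix shifts it by its third-column entries.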
        <p>The shape of the pyramid depends on the size of the photo sensor and the focal length. The viewing
angles are calculated as follows:</p>
        <p>$\alpha_x = 2\,\mathrm{arctg}\dfrac{m_w}{2f_L}$,  $\alpha_y = 2\,\mathrm{arctg}\dfrac{m_h}{2f_L}$, (2)
where $m_w$ and $m_h$ are the dimensions of the photo sensor and $f_L$ is the focal length. The vertex angles of the pyramid are calculated as follows:</p>
        <p>$\beta_x = 2\arcsin\dfrac{m_w}{\sqrt{m_w^2 + m_h^2 + 4f_L^2}}$,  $\beta_y = 2\arcsin\dfrac{m_h}{\sqrt{m_w^2 + m_h^2 + 4f_L^2}}$. (3)</p>
        <p>The length of the edge of the pyramid $l_e$ and the diagonal of the base of the pyramid $l_d$ are calculated as follows:
$l_e = \dfrac{m_h}{2\sin(\beta_y/2)}$,  $l_d = \sqrt{m_w^2 + m_h^2}$. (4)</p>
        <p>Depending on the orientation of the camera, the earth's surface plane is placed at one of the four
base vertices, as shown in Figure 3.</p>
        <sec id="sec-1-2-1">
          <title>3.2. Calculation of the scale factor</title>
          <p>Since the earth's surface plane was placed at one of the base vertices of the pyramid, the obtained transformation matrix disregards the original scale of the image. To calculate the scaling factor, the following distance between the transformed points is used:
$k_{nd} = \sqrt{(N_x - A'_x)^2 + (N_y - A'_y)^2}$. (5)</p>
          <p>To transform the image acquired from the UAV camera to the scale of the satellite image, the transformation matrix should be adjusted using the scale factor defined in the next subsection.</p>
        </sec>
        <sec id="sec-1-2-2">
          <title>3.3. Obtaining the result of the normalization</title>
          <p>To obtain the final transformation matrix the following formula should be used:
$T = T_s T_a$, (6)
where $T_a$ is the matrix obtained at the first stage and $T_s$ is the scaling matrix
$T_s = \mathrm{diag}(k, k, 1)$, (7)
with the scale factor
$k = \dfrac{l_d}{k_{nd}}\,h_{nf}$, (8)
where $h_{nf}$ is the normalized flight altitude
$h_{nf} = \dfrac{h_f - h_a}{h_r - h_a}$. (9)</p>
          <p>4. Normalization based on keypoints matching</p>
          <p>The first algorithm of the proposed method eliminates the distortions of the image coarsely because, as mentioned above, the moment of recording the external parameters of the camera does not necessarily coincide with the moment of shooting. Moreover, it is rather difficult to accurately take into account the elevation difference of the underlying surface. Therefore, it is used only when the previous frame from the camera cannot be matched with the satellite image. In other cases the second algorithm is used, which is based on matching the current frame with the previous frame already normalized.</p>
          <p>4.1. Matching the current frame with the previous one</p>
          <p>To reduce its complexity, the image matching algorithm analyzes not all the pixels but only the keypoints extracted with the method described in [13].</p>
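Assuming the scale enters as a uniform scaling matrix composed with the perspective correction $T_a$ (function and variable names below are illustrative, not from the paper), the final matrix of section 3.3 can be assembled as:

```python
import numpy as np

def altitude_ratio(h_f, h_a, h_r):
    # Normalized flight altitude: above-ground altitude relative to the
    # reference altitude at which the shooting scale equals the
    # satellite-image scale.
    return (h_f - h_a) / (h_r - h_a)

def final_transform(t_a, k):
    # Compose the perspective correction t_a with a uniform scaling
    # diag(k, k, 1) to reach the satellite-image scale.
    return np.diag([k, k, 1.0]) @ t_a
```

At the reference altitude the ratio equals one and the composition leaves the scale of the corrected image unchanged.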
          <p>After the keypoint sets of both images are found, they are used to determine the shift, rotation, and scale values between the two images using the statistical differentiation method [14]. These parameters form the following transformation matrix [15]:
$T_m = \begin{bmatrix} s\cos\varphi &amp; -s\sin\varphi &amp; t_x \\ s\sin\varphi &amp; s\cos\varphi &amp; t_y \\ 0 &amp; 0 &amp; 1 \end{bmatrix}$, (10)
where $t_x$ and $t_y$ are the shift values along the X axis and the Y axis respectively, $\varphi$ is the rotation value, and s is the scale value.</p>
          <p>4.2. Selecting pairs of corresponding points</p>
          <p>After the transformation matrix is found, it is used to determine the pairs of corresponding points. A point A of the current frame corresponds to a point B of the previous frame when, after applying the transformation matrix to the coordinates of point A, the distance between A and B does not exceed the specified threshold (1-2 pixels).</p>
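The pairing rule above can be sketched as follows; taking the nearest previous-frame keypoint among the candidates under the threshold is our assumption, and the names are illustrative:

```python
import numpy as np

def match_by_transform(t_m, pts_cur, pts_prev, thresh=2.0):
    # Warp each keypoint of the current frame with the estimated matrix
    # and pair it with the nearest previous-frame keypoint if the
    # distance does not exceed the threshold (1-2 pixels in the text).
    ones = np.ones((len(pts_cur), 1))
    warped = np.hstack([pts_cur, ones]) @ t_m.T
    warped = warped[:, :2] / warped[:, 2:3]
    pairs = []
    for i, w in enumerate(warped):
        d = np.linalg.norm(pts_prev - w, axis=1)
        j = int(np.argmin(d))
        if d[j] < thresh:
            pairs.append((i, j))
    return pairs
```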
          <p>From the whole set of the pairs of corresponding points four pairs are selected that will be used to
calculate the homography matrix (1). The choice is limited by the condition that no three points are
collinear. The best case occurs when a quadrilateral of maximum area with as large angles as possible
is found. However, the search for such a set of points is a computationally complex operation. The
following approaches to selecting the points were tried:
1) the four points nearest to the corners of the image;
2) the four points most distant from the center of mass of the keypoints set and located in different
quadrants relative to the center of mass;</p>
          <p>3) the first point is the most distant from the center of mass, the second point is the most distant
from point 1, the third is the point with the greatest sum of its distances from points 1 and 2, the fourth
is the point with the greatest sum of its distances from points 1, 2 and 3;
4) a random selection of points.</p>
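Variant 3 above (greedy selection of mutually distant points) can be sketched as:

```python
import numpy as np

def select_four_points(pts):
    # Start from the point most distant from the center of mass of the
    # keypoint set, then greedily add the point with the greatest sum of
    # distances to the points already chosen.
    pts = np.asarray(pts, dtype=float)
    center = pts.mean(axis=0)
    chosen = [int(np.argmax(np.linalg.norm(pts - center, axis=1)))]
    for _ in range(3):
        total = np.zeros(len(pts))
        for i in chosen:
            total += np.linalg.norm(pts - pts[i], axis=1)
        total[chosen] = -1.0  # exclude already chosen points
        chosen.append(int(np.argmax(total)))
    return chosen
```

On a set consisting of the corners of a square plus its center, the procedure picks the four corners, i.e. a large quadrilateral, which is the desired behaviour.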
          <p>If this algorithm finds fewer than four corresponding pairs, the normalization based on the external parameters of the camera is applied.</p>
          <p>4.3. Calculation of the transformation matrix</p>
          <p>After the four pairs of corresponding keypoints are selected, the refined transformation matrix is calculated. This homography matrix matches the projection of the current frame with the projection of the previous frame already normalized.</p>
          <p>Then the search for the pairs of corresponding points is repeated using the current frame warped according to the refined transformation matrix. From the found pairs a new set of four pairs is selected and a new refined matrix is calculated. The process is repeated as long as the number of the pairs of corresponding points increases. Experiments show that in most cases two iterations are enough to obtain the solution. The final matrix of the transformation of the original image into the normalized one is determined by the formula:
$T_r = T_p \prod_{i=1}^{n} H_i$, (13)
where $T_p$ is the perspective transformation matrix for normalizing the previous frame, $H_i$ are the transformation matrices obtained at each iteration, and n is the number of iterations.</p>
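The iteration of section 4.3 can be sketched with a hypothetical refine_step callback (our abstraction) that matches the keypoints under the current warp and returns a refined homography together with its pair count:

```python
import numpy as np

def iterative_normalization(t_p, refine_step):
    # refine_step(t) is assumed to return (H_i, n_pairs) for the frame
    # warped by t; iterate while the number of corresponding pairs
    # grows, accumulating T_r = T_p * H_1 * ... * H_n (formula (13)).
    t_r = np.array(t_p, dtype=float)
    best_pairs = -1
    while True:
        h_i, n_pairs = refine_step(t_r)
        if n_pairs <= best_pairs:
            return t_r
        best_pairs = n_pairs
        t_r = t_r @ h_i
```

A refinement whose pair count does not grow is discarded, so the loop terminates as soon as the matching stops improving.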
          <p>Since the previous frame is used to normalize the current frame, the normalization error can accumulate. However, this method is used as a part of the visual navigation system, which implies that the next step is to match the normalized image with the satellite image. This matching eliminates the accumulated error.</p>
          <p>5. Experiments</p>
          <p>To test the proposed method, it has been integrated into the preprocessing stage of the UAV visual navigation system that is being developed. Then a series of experiments using real images of a UAV flight was performed. The trajectory of the UAV flight is shown in Figure 5 and consists of sections of two types: straight sections (270 frames) and turns (70 frames). The positions of the UAV on the straight sections are marked with green dots. The positions of the UAV at the turns are marked with yellow dots. The black lines show the boundaries between the sections. The white lines connect the coordinates of the UAV to the corresponding centers of photographing.</p>
          <p>The first experiment aims to choose the best algorithm for image normalization based on keypoints matching. The proposed iterative algorithm was tested (with the four variants of selecting the four pairs of points described above), as well as the methods implemented in the OpenCV 3.4.5 library: DLT, RANSAC, LMS and RHO. To determine the speed of the methods, each input frame was processed 100 times and the average execution time was calculated. Processing was performed on an AMD FX-8320E 3.2 GHz processor without using a graphics accelerator. The results show that the accuracy of the proposed method is comparable to that of the best method from the OpenCV library, while the proposed method performs much faster, which is very important for integrating it into a UAV visual navigation system.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Conclusion</title>
<p>The proposed method of iterative image normalization successfully solves the task of image preprocessing in the UAV visual navigation system being developed. The integration of this method into the navigation system improves the accuracy of matching the images acquired from the onboard camera with the satellite image and reduces the probability of losing the UAV.</p>
<p>Owing to the use of the two normalization approaches, the proposed method can eliminate perspective distortions resulting from both the camera orientation and the inclination of the underlying surface. At the same time, the method is computationally simple and can be integrated into the UAV navigation system, which has to run on low-power processors.</p>
      <p>The reported study was funded by Kalashnikov ISTU according to the research project № 09.06.01/18 GMM.</p>
    </sec>
  </body>
  <back>
  </back>
</article>