<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The algorithm for video panorama construction and its software implementation using CUDA technology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>I.A. Kudinov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O.V. Pavlov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I.S. Kholopov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M.Yu. Khramov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ryazan State Instrument-making Enterprise</institution>
          ,
          <addr-line>Seminarskaya str. 32, 390000, Ryazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ryazan State Radio Engineering University</institution>
          ,
          <addr-line>Gagarina str. 59/1, 390005, Ryazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>37</fpage>
      <lpage>42</lpage>
      <abstract>
        <p>A video panorama construction algorithm based on information from five pre-calibrated cameras of different types with partially overlapping fields of view was developed and implemented in the CUDA C language. Distortion compensation, image stitching on the virtual unit sphere surface, and blending are performed for an operator-controlled 1024×768 pixels region of interest at 50 fps.</p>
      </abstract>
      <kwd-group>
        <kwd>video panorama</kwd>
        <kwd>camera calibration</kwd>
        <kwd>distortion compensation</kwd>
        <kwd>spherical panorama</kwd>
        <kwd>region of interest</kwd>
        <kwd>inclinometer</kwd>
        <kwd>blending</kwd>
        <kwd>CUDA technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p> u j
A(2  9) = 
 0
 v j
0
1
0</p>
      <p>0
 u j</p>
      <p>0
 v j
0
1
u u
i j
viu j
uiv j
viv j
ui  .</p>
      <p>vi 
A(2m  9) = UVT, h = V&lt;min&gt;.
Нij = Ki[Rij – tijnT/d]Kj-1,
where Ki and Kj are intrinsic matrices, i, j  1..N , Rij and tij are respectively a rotation matrix and translation vector for
transition from j-th camera coordinate system to i-th camera coordinate system, d – the perpendicular length to the shooting</p>
      <p>
        High-Performance Computing / I.A. Kudinov, O.V. Pavlov, I.S. Kholopov, M.Yu. Khramov
plane with the normal n in the reference (i-th) camera coordinate system. If the distance to the observed objects is large
(||tij||&lt;&lt;d), then we have the following approximate equality from (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ):
      </p>
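      <p>As an illustration of (2), a minimal host-side C sketch of assembling the design matrix from point correspondences is given below; the function name is ours, and the SVD step is assumed to be delegated to an external routine (for example, LAPACK's dgesvd), which is not specified in the text.</p>
      <preformat>
#include &lt;stddef.h&gt;

/* Fill the two rows of A(2m x 9) contributed by the k-th pair of
   corresponding points (ui, vi) and (uj, vj); A is stored row-major. */
void dlt_fill_rows(double *A, size_t k,
                   double ui, double vi, double uj, double vj)
{
    double *r0 = A + (2 * k) * 9;     /* first row of the pair  */
    double *r1 = A + (2 * k + 1) * 9; /* second row of the pair */

    r0[0] = uj;  r0[1] = vj;  r0[2] = 1.0;
    r0[3] = 0.0; r0[4] = 0.0; r0[5] = 0.0;
    r0[6] = -ui * uj; r0[7] = -ui * vj; r0[8] = -ui;

    r1[0] = 0.0; r1[1] = 0.0; r1[2] = 0.0;
    r1[3] = uj;  r1[4] = vj;  r1[5] = 1.0;
    r1[6] = -vi * uj; r1[7] = -vi * vj; r1[8] = -vi;
}
      </preformat>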
      <p>While combining information from several cameras, it is advisable to form the resultant panoramic image not in the plane in accordance with (1), but on a virtual surface of uniform curvature: usually a unit sphere or a cylinder with a unit radius [5]. It allows us to work with normalized homogeneous pixel and spatial coordinates. The geometric problem statement for combining N = 3 camera frames with intersecting FoVs on the unit sphere surface (the coordinate system of the camera with number 0 is taken as the reference) is shown in Fig. 1. It is assumed that, by analogy with (4), all nodal points of the camera lenses are located at the unit sphere center O.</p>
      <p>In Fig. 1 the symbol M denotes an object point in spatial homogeneous coordinates whose image is projected onto the point Msph on the unit sphere surface, ||Msph|| = 1, in the sphere coordinate system OXsphYsphZsph; the symbols x0 and x2 are the homogeneous pixel coordinates of its images on the frames of cameras 0 and 2 respectively.</p>
      <p>Fig. 1. The geometric problem statement for combining N = 3 camera frames on the unit sphere surface: the sphere coordinate system OXsphYsphZsph with the center O, the camera coordinate systems with axes X0Y0Z0, X1Y1Z1, X2Y2Z2, the accelerometer axes XaYaZa, the rotation matrices R01 and R02, and the object point M with its projection Msph imaged at x0 and x2.</p>
      <p>The correct construction of spherical panoramas implies the availability of information about the angular position of the optical axes of the cameras relative to the horizon plane. In our work this problem is solved by mounting an inclinometer (based on a triaxial MEMS accelerometer) on the reference camera case. The accelerometer coordinate system is designated OXaYaZa (Fig. 1). The roll φ and pitch θ angles are estimated by the formulas [6]:
φ = atan2(ay, az), θ = atan2[–ax, (ay2 + az2)0.5],   (5)
where [ax, ay, az]T is the vector of accelerometer measurements (taking into account its calibration [7]).</p>
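      <p>A minimal CUDA C sketch of (5); the function name is ours, and the inputs are assumed to be already calibrated accelerometer measurements:</p>
      <preformat>
#include &lt;math.h&gt;

/* Roll and pitch of the reference camera from calibrated
   accelerometer measurements, formula (5). */
__host__ __device__ void tilt_angles(float ax, float ay, float az,
                                     float *roll, float *pitch)
{
    *roll  = atan2f(ay, az);
    *pitch = atan2f(-ax, sqrtf(ay * ay + az * az));
}
      </preformat>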
      <p>The choice of pixels in the areas where the radius-vector OMsph crosses several camera planes can be made by several criteria: for example, the maximum distance from the pixel to the intersection line of the camera planes, or the minimum length of the normalized pixel coordinates vector. In order to minimize computations, we choose the criterion of the minimum angle between the vector to a pixel and the i-th camera principal axis, as such pixels usually have the minimal distortion correction error:
min_i arccos[(R0iRθ)&lt;3&gt; · Msph],   (6)
where
Rθ = [ cosφ  –sinφ  0;  sinφ  cosφ  0;  0  0  1 ] · [ 1  0  0;  0  cosθ  –sinθ;  0  sinθ  cosθ ]   (7)
is the rotation matrix of the reference camera coordinate system relative to the horizon plane, R0i is the rotation matrix of the i-th camera relative to the reference camera estimated during calibration, &lt;k&gt; denotes the k-th column of a matrix, and «·» is a dot product.</p>
      <p>To realize the sliding RoI function with a size of W × H pixels and angular dimensions αw and βh in the horizontal and vertical directions respectively, we introduce a quaternion qvis [8] that is defined by the current line of sight azimuth α and elevation angle β:
qvis = [cos(α/2)cos(β/2), cos(α/2)sin(β/2), sin(α/2)cos(β/2), sin(α/2)sin(β/2)]T.   (8)</p>
      <p>To each pixel of the RoI with coordinates (u, v) corresponds the point Muv = [xuv, yuv, zuv]T/||[xuv, yuv, zuv]T|| on the unit sphere (Fig. 2), determined by the radius-vector with the corresponding quaternion
quv = qvis ⊗ quv0,   (9)
where from the geometric constructions of Fig. 2 the initial position (α = β = 0, zuv = 1) of the RoI pixel (u, v) corresponds to the quaternion
quv0 = [cos(αu/2)cos(βv/2), cos(αu/2)sin(βv/2), sin(αu/2)cos(βv/2), sin(αu/2)sin(βv/2)]T,   (10)
αu = arctan(xuv), βv = arcsin[yuv/(xuv2 + yuv2 + 1)0.5],
xuv = (2u/W – 1)tan(αw/2), yuv = –(2v/H – 1)tan(βh/2).</p>
      <p>The quaternion quv allows us to determine the coordinates of the point Muv in the unit sphere coordinate system [8]:
Muv = [2(qxqz + qwqy), 2(qyqz – qwqx), qw2 + qz2 – (qx2 + qy2)]T,
where qw and [qx, qy, qz]T are the scalar and vector parts of the quaternion quv.</p>
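      <p>A CUDA C sketch of the chain (8)-(10) for one RoI pixel is given below; the struct and function names are ours, and the quaternion components are assumed to be ordered [qw, qx, qy, qz]:</p>
      <preformat>
/* Quaternion as (w, x, y, z). */
struct quat { float w, x, y, z; };

/* Hamilton product q = a ⊗ b, used in (9). */
__device__ struct quat qmul(struct quat a, struct quat b)
{
    struct quat q;
    q.w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z;
    q.x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y;
    q.y = a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x;
    q.z = a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w;
    return q;
}

/* Maps the RoI pixel (u, v) to the point Muv on the unit sphere:
   angles and quv0 by (10), quv by (9), then the Muv formula. */
__device__ void roi_pixel_to_sphere(int u, int v, int W, int H,
                                    float aw, float bh,
                                    struct quat qvis, float M[3])
{
    float xuv = (2.0f * u / W - 1.0f) * tanf(0.5f * aw);
    float yuv = -(2.0f * v / H - 1.0f) * tanf(0.5f * bh);
    float au = atanf(xuv);
    float bv = asinf(yuv * rsqrtf(xuv * xuv + yuv * yuv + 1.0f));

    struct quat q0 = { cosf(0.5f*au)*cosf(0.5f*bv), cosf(0.5f*au)*sinf(0.5f*bv),
                       sinf(0.5f*au)*cosf(0.5f*bv), sinf(0.5f*au)*sinf(0.5f*bv) };
    struct quat q = qmul(qvis, q0);

    M[0] = 2.0f * (q.x*q.z + q.w*q.y);
    M[1] = 2.0f * (q.y*q.z - q.w*q.x);
    M[2] = q.w*q.w + q.z*q.z - (q.x*q.x + q.y*q.y);
}
      </preformat>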
      <p>Fig. 2. The RoI geometry: the pixel (u, v) of the W × H RoI with angular dimensions αw × βh corresponds to the point Muv = (xuv, yuv, zuv) on the unit sphere OXsphYsphZsph.</p>
      <p>On each i-th camera principal plane the homogeneous pixel coordinates xuvi corresponding to the point Muv are determined (taking into account the orientation relative to the horizon plane) through the projection matrix Pi [4]:
xuvi = PiMuv,   (11)
xni = Ki–1xuvi,   (12)
where Pi = Ki[R0iRθ | 0], 0 = [0, 0, 0]T, and the selection of the pixel transferred from the i-th camera plane to the RoI is performed by criterion (6).</p>
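      <p>A sketch of (11)-(12) in CUDA C; the function name is ours, and since the translation column is zero, only the left 3×3 part of Pi is passed, stored row-major:</p>
      <preformat>
/* Projects the sphere point M onto the i-th camera plane; P3 is the
   3x3 matrix Ki*R0i*Rtheta. Returns 0 if the point is behind the camera. */
__device__ int project_to_camera(const float P3[9], const float M[3],
                                 float *u, float *v)
{
    float x = P3[0]*M[0] + P3[1]*M[1] + P3[2]*M[2];
    float y = P3[3]*M[0] + P3[4]*M[1] + P3[5]*M[2];
    float z = P3[6]*M[0] + P3[7]*M[1] + P3[8]*M[2];
    if (z &lt;= 0.0f)
        return 0;
    *u = x / z;   /* pixel coordinates from homogeneous xuvi */
    *v = y / z;
    return 1;
}
      </preformat>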
      <p>For the cameras with wide-angle lenses it is necessary to perform distortion compensation: from the normalized coordinates with distortion, the coordinates without distortion are estimated according to the Brown-Conrady model (13) [9]; in order to reduce the amount of calculations, only the first two radial distortion coefficients are used in our work. The corrected pixel coordinates without distortion are calculated by multiplying by the intrinsic matrix Ki:
xcorri = Kixni,   (14)
and since they are fractional, the brightness value is interpolated (in our work we used a bilinear interpolation [10]).</p>
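      <p>A sketch of the distortion-compensation step: the radial correction is written in the standard two-coefficient Brown-Conrady form, which is our assumption for the exact form of (13), and the fractional pixel position is sampled with bilinear interpolation; the function names are ours.</p>
      <preformat>
/* Radial correction of the normalized coordinates (xn, yn) with two
   coefficients k1, k2 (assumed form of (13)). */
__device__ void undistort(float *xn, float *yn, float k1, float k2)
{
    float r2 = (*xn) * (*xn) + (*yn) * (*yn);
    float f = 1.0f + k1 * r2 + k2 * r2 * r2;
    *xn *= f;
    *yn *= f;
}

/* Bilinear interpolation of the brightness at the fractional pixel
   position (x, y) of a single-channel 8-bit image. */
__device__ float bilinear(const unsigned char *img, int width, int height,
                          float x, float y)
{
    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    float dx = x - x0, dy = y - y0;
    int x1 = min(x0 + 1, width - 1), y1 = min(y0 + 1, height - 1);
    x0 = max(x0, 0); y0 = max(y0, 0);

    float i00 = img[y0 * width + x0], i10 = img[y0 * width + x1];
    float i01 = img[y1 * width + x0], i11 = img[y1 * width + x1];
    return (1.0f - dy) * ((1.0f - dx) * i00 + dx * i10)
         + dy * ((1.0f - dx) * i01 + dx * i11);
}
      </preformat>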
      <p>Since the scenes observed by the cameras are different, in the automatic exposure time mode the average brightness of the frames composing the panorama differs too. Therefore, after filling the panorama (or the RoI) it is additionally necessary to perform a procedure smoothing the brightness differences, i.e. blending. We used a simplified approach to blending, similar to the work [11]; a generic sketch of the mixing step is given below.</p>
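      <p>The sketch is our illustration only, not the exact procedure of [11]: within a transition band of width d pixels around a seam, the brightness values a and b of the two overlapping frames are mixed with a weight changing linearly with the signed distance s to the seam.</p>
      <preformat>
/* Linear feathering across a seam: s &lt; 0 on the side of frame a,
   s &gt; 0 on the side of frame b, d is the band width in pixels. */
__device__ float feather(float a, float b, float s, float d)
{
    float w = fminf(fmaxf(0.5f + s / d, 0.0f), 1.0f);
    return (1.0f - w) * a + w * b;
}
      </preformat>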
    </sec>
    <sec id="sec-3">
      <title>3. The algorithm for a video panorama construction and its software implementation</title>
      <p>The algorithm for forming the RoI image contains the following steps.</p>
      <p>1. Initialization: calculating the quaternions quv0 by formula (10).</p>
      <p>Main operation cycle:
2. Estimation of the current angular position of the reference camera by (5) and of its rotation matrix Rθ by (7).
3. Calculation of the quaternion qvis for the current line of sight angular position by (8) and of the quaternions quv by (9).
4. Filling the RoI (with distortion compensation) according to (11)-(14) by criterion (6).</p>
      <p>5. Performing the blending procedure (optional).</p>
      <p>As the processing of each RoI pixel is homogeneous, the computations can be parallelized. In the PC-based layout, steps 1, 4 and 5 of the algorithm were implemented on the resources of an NVIDIA GPU: using CUDA technology and the CUDA C language, the whole amount of computations was distributed over 3072 parallel blocks (64 horizontally and 48 vertically) with 256 threads in each block (16 threads horizontally and vertically). In this case, according to [12], the pixel indices are represented as
u = blockIdx.y * blockDim.y + threadIdx.y; v = blockIdx.x * blockDim.x + threadIdx.x;   (15)</p>
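      <p>A minimal CUDA C sketch of this configuration; the kernel name and parameter list are illustrative:</p>
      <preformat>
/* One thread processes one RoI pixel; the indices follow (15). */
__global__ void fill_roi(const unsigned char *frames, const float *angles,
                         unsigned char *roi)
{
    int u = blockIdx.y * blockDim.y + threadIdx.y;
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    /* steps 3-4 of the algorithm for the pixel (u, v) ... */
}

void launch_fill_roi(const unsigned char *d_frames, const float *d_angles,
                     unsigned char *d_roi)
{
    /* 64 x 48 = 3072 blocks of 16 x 16 = 256 threads */
    dim3 blocks(64, 48), threads(16, 16);
    fill_roi&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;(d_frames, d_angles, d_roi);
}
      </preformat>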
      <p>In (15) the standard CUDA notation is used: blockIdx is the block identifier (number), blockDim is the block dimension, threadIdx is the identifier (number) of a parallel thread, and the attributes "x" and "y" indicate the axes of the two-dimensional grid of blocks and threads.</p>
      <p>As copying from CPU memory to GPU memory and back is rather slow [12], the number of such operations in the implementation of the algorithm is minimized: at initialization, GPU memory arrays are filled with the data on the camera parameters (intrinsic matrices and distortion coefficients) and the initial quaternions quv0 (10) that correspond to the RoI pixels. In the main operation cycle the frames from each camera, the line of sight angular coordinates and the angular coordinates of the reference camera estimated from the MEMS signals are copied into the GPU memory; then steps 3-5 of the algorithm are performed, and the result (the array of brightness values in the RoI) is copied back to the CPU memory for displaying on the PC monitor.</p>
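      <p>A sketch of one iteration of the main cycle on the host side; buffer names and sizes are illustrative, and fill_roi with its launch configuration is as in the previous sketch:</p>
      <preformat>
/* Copy the inputs in, run the RoI kernel, copy the RoI back. */
void roi_frame(const unsigned char *h_frames, size_t frames_bytes,
               const float *h_angles, size_t angles_bytes,
               unsigned char *h_roi, size_t roi_bytes,
               unsigned char *d_frames, float *d_angles,
               unsigned char *d_roi)
{
    cudaMemcpy(d_frames, h_frames, frames_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_angles, h_angles, angles_bytes, cudaMemcpyHostToDevice);

    dim3 blocks(64, 48), threads(16, 16);
    fill_roi&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;(d_frames, d_angles, d_roi);

    cudaMemcpy(h_roi, d_roi, roi_bytes, cudaMemcpyDeviceToHost);
}
      </preformat>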
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>The layout of the video panorama system is implemented on a PC with an Intel Core i5 processor and an NVIDIA GeForce GTX 560 Ti GPU (384 cores) and consists of five digital GigE interface video cameras mounted on a special rigid polyamide bracket (Fig. 3): two Basler acA2000 cameras with a 2048×1088 frame resolution and 5 mm megapixel Cowa lenses (below), and three IDS 5240 RE cameras (above) with a 1280×1024 frame resolution and 5 mm megapixel Computar lenses (the two outer cameras) and an 8 mm Navitar lens (the central reference camera); an evaluation board with an InvenSense MPU-9250 MEMS sensor is mounted on the reference camera; two motorized rotation positioners Standa 8MR190-2 move the bracket with the cameras in the horizontal and vertical planes, and a USB joystick Defender Cobra M5 serves for the RoI navigation.</p>
      <p>The layout allows creating a panorama with a 3600×2400 pixels resolution and a 180°×120° FoV, or displaying a RoI with user-defined resolution and angular FoV dimensions.</p>
      <p>Preliminary camera calibration was performed using the OpenCV library [13]: the intrinsic matrices and the distortion coefficients of the lenses were estimated on 40 test "chessboard" pattern frames containing 12×11 cells with a 3 cm side; then the rotation matrices of the cameras relative to the reference camera were estimated on 40 test "chessboard" pattern frames containing 9×6 cells with a 3 cm side.</p>
      <p>The influence of the angular orientation estimation error of the reference camera relative to the horizon plane on the geometry of
the panorama is shown in Fig. 4. As can be seen from the figure, the pitch estimation error leads to an incorrect horizon display (the
position of the spherical panorama equator is shown by the orange line).</p>
      <p>The results of forming the 1024×768 pixels RoI with a 40°×30° FoV and the line of sight angular coordinates α = 15° and β = –5°, with blending and without it, are shown in Figs. 5 and 6 respectively. The times required to create one frame on the CPU and GPU are summarized in Table 1 (the time for copying the data from the CPU memory to the GPU memory and back is 3.5 ms), and the times for different RoI sizes are summarized in Table 2.</p>
      <p>Figs. 7 and 8 show the results of intermediate computations for the blending implementation: the boundaries of the camera frames in the RoI (Fig. 7) and the 100-pixel-wide binary mask over which the brightness is smoothed (Fig. 8).</p>
      <p>The more-than-twofold increase in the RoI forming time with blending is explained by the need to perform auxiliary procedures: searching for the intersection lines of the camera frames in the RoI, determining the areas for brightness fusion, and smoothing the images to estimate the low-frequency brightness component in each camera frame. To reduce the amount of calculation, we apply a smoothing procedure with an 8×8 window and an accumulator; its computational complexity is O(n2) for an n×n pixel region, i.e. proportional to the number of processed pixels and independent of the window size. A sketch of one pass of such smoothing is given below.</p>
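      <p>The sketch shows one horizontal pass in C; the vertical pass is analogous, and the function name is ours:</p>
      <preformat>
/* Box smoothing of one row with a running accumulator: each output
   costs O(1) operations regardless of the window size win (win = 8 in
   our case); the borders are clamped. */
void box_filter_row(const float *src, float *dst, int width, int win)
{
    int half = win / 2;
    float acc = 0.0f;
    /* sum of the initial window centered at x = 0 */
    for (int k = -half; k &lt; win - half; ++k)
        acc += src[k &lt; 0 ? 0 : k];
    for (int x = 0; x &lt; width; ++x) {
        dst[x] = acc / win;
        int sub = x - half;          /* element leaving the window  */
        int add = x + (win - half);  /* element entering the window */
        acc -= src[sub &lt; 0 ? 0 : sub];
        acc += src[add &gt;= width ? width - 1 : add];
    }
}
      </preformat>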
      <p>For ||t0i|| &lt; 15 cm and distances to objects d &gt; 70 m, the hypothesis of the superposition of the camera lens nodal points provides a stitching error on the camera frame boundaries of not more than 5 pixels.</p>
      <p>The results of the experiment showed that the use of GPU resources makes it possible to reduce the RoI forming time by up to a factor of 16 in comparison with the CPU (when only one CPU core is used).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The algorithm for video panorama construction, designed and implemented with CUDA technology and parallelization of computations on a GPU, provides panorama navigation at 50 fps for a one-megapixel region of interest over the whole 8.2-megapixel panoramic video frame with a 180°×120° field of view.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Banta</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donaldson</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Apparatus and method for panoramic video hosting</article-title>
          .
          <source>Patent US 9516225</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>AN/AAQ-37 Distributed Aperture System (DAS) for the F-35</article-title>
          . URL: http://www.northropgrumman.com/Capabilities/ANAAQ37F35/Pages/default.aspx (accessed 01.10.2016).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>IronVision™</article-title>
          . URL: http://elbitsystems.com/media/IronVision.pdf (accessed 10.09.2016).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hartley</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Multiple view geometry in computer vision</article-title>
          .
          <source>2nd edition</source>
          . Cambridge: Cambridge University Press,
          <year>2003</year>
          ; 656 p.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Szeliski</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>Image alignment and stitching: a tutorial</article-title>
          .
          <source>Foundations and Trends in Computer Graphics and Vision</source>
          <year>2006</year>
          ;
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>Tilt sensing using a three-axis accelerometer</article-title>
          . URL: http://www.freescale.com/files/sensors/doc/app_note/AN3461.pdf (accessed 11.04.2016).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Wang</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>Intelligent calibration method of low cost MEMS inertial measurement unit for an FPGA-based navigation system</article-title>
          .
          <source>International J. of Intelligent Engineering and Systems</source>
          <year>2011</year>
          ;
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>32</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kuipers</surname>
            <given-names>JB</given-names>
          </string-name>
          .
          <article-title>Quaternions and rotation sequences</article-title>
          . Princeton, NJ: Princeton University Press,
          <year>1998</year>
          ; 400 p.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Brown</surname>
            <given-names>DC</given-names>
          </string-name>
          .
          <article-title>Close-range camera calibration</article-title>
          .
          <source>Photogrammetric Engineering</source>
          <year>1971</year>
          ;
          <volume>37</volume>
          (
          <issue>8</issue>
          ):
          <fpage>855</fpage>
          -
          <lpage>866</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Parker</surname>
            <given-names>JA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kenyon</surname>
            <given-names>RV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troxel</surname>
            <given-names>DE</given-names>
          </string-name>
          .
          <article-title>Comparison of interpolating methods for image resampling</article-title>
          .
          <source>IEEE Trans. on Medical Imaging</source>
          <year>1983</year>
          ;
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Xiong</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pulli</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Fast image stitching and editing for panorama painting on mobile phones</article-title>
          .
          <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          , 13-18 June 2010:
          <fpage>47</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Sanders</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kandrot</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>CUDA by example</article-title>
          . New York: Addison-Wesley,
          <year>2010</year>
          ; 290 p.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <article-title>Camera Calibration and 3D Reconstruction</article-title>
          . URL: http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (accessed 14.03.2015).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>