<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The algorithm for video panorama construction and its software implementation using CUDA technology</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>I.A. Kudinov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O.V. Pavlov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I.S. Kholopov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M.Yu. Khramov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ryazan State Instrument-making Enterprise</institution>
          ,
          <addr-line>Seminarskaya str. 32, 390000, Ryazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ryazan State Radio Engineering University</institution>
          ,
          <addr-line>Gagarina str. 59/1, 390005, Ryazan</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>37</fpage>
      <lpage>42</lpage>
      <abstract>
        <p>A video panorama construction algorithm based on information from five pre-calibrated cameras of different types with partially overlapping fields of view was developed and implemented in the CUDA C language. Distortion compensation, image stitching on the virtual unit sphere surface, and blending are performed for an operator-controlled 1024×768 pixels region of interest at 50 fps.</p>
      </abstract>
      <kwd-group>
        <kwd>video panorama</kwd>
        <kwd>camera calibration</kwd>
        <kwd>distortion compensation</kwd>
        <kwd>spherical panorama</kwd>
        <kwd>region of interest</kwd>
        <kwd>inclinometer</kwd>
        <kwd>blending</kwd>
        <kwd>CUDA technology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p> u j
A(2  9) = 
 0
 v j
0
1
0</p>
      <p>0
 u j</p>
      <p>0
 v j
0
1
u u
i j
viu j
uiv j
viv j
ui  .</p>
      <p>vi 
A(2m  9) = UVT, h = V&lt;min&gt;.
Нij = Ki[Rij – tijnT/d]Kj-1,
where Ki and Kj are intrinsic matrices, i, j  1..N , Rij and tij are respectively a rotation matrix and translation vector for
transition from j-th camera coordinate system to i-th camera coordinate system, d – the perpendicular length to the shooting</p>
      <p>
        High-Performance Computing / I.A. Kudinov, O.V. Pavlov, I.S. Kholopov, M.Yu. Khramov
plane with the normal n in the reference (i-th) camera coordinate system. If the distance to the observed objects is large
(||tij||&lt;&lt;d), then we have the following approximate equality from (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ):
      </p>
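      <p>As an illustration of (2), a minimal host-side C sketch of assembling the design matrix from point correspondences is given below; the function name is ours, and the SVD step is assumed to be delegated to an external routine (for example, LAPACK's dgesvd), which is not specified in the text.</p>
      <preformat>
#include &lt;stddef.h&gt;

/* Fill the two rows of A(2m x 9) contributed by the k-th pair of
   corresponding points (ui, vi) and (uj, vj); A is stored row-major. */
void dlt_fill_rows(double *A, size_t k,
                   double ui, double vi, double uj, double vj)
{
    double *r0 = A + (2 * k) * 9;     /* first row of the pair  */
    double *r1 = A + (2 * k + 1) * 9; /* second row of the pair */

    r0[0] = uj;  r0[1] = vj;  r0[2] = 1.0;
    r0[3] = 0.0; r0[4] = 0.0; r0[5] = 0.0;
    r0[6] = -ui * uj; r0[7] = -ui * vj; r0[8] = -ui;

    r1[0] = 0.0; r1[1] = 0.0; r1[2] = 0.0;
    r1[3] = uj;  r1[4] = vj;  r1[5] = 1.0;
    r1[6] = -vi * uj; r1[7] = -vi * vj; r1[8] = -vi;
}
      </preformat>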
      <p>While combining information from several cameras, it is advisable to form the resultant panoramic image not in the plane in accordance with (1), but on a virtual surface of uniform curvature: usually a unit sphere or a cylinder with a unit radius [5]. It allows us to work with normalized homogeneous pixel and spatial coordinates. The geometric problem statement for combining N = 3 camera frames with intersecting FoVs on the unit sphere surface (the coordinate system of the camera with number 0 is taken as the reference) is shown in Fig. 1. It is assumed that, by analogy with (4), all nodal points of the camera lenses are located at the unit sphere center O.</p>
      <p>In Fig. 1 the symbol M denotes an object point in spatial homogeneous coordinates whose image is projected onto the point Msph on the unit sphere surface, ||Msph|| = 1, in the sphere coordinate system OXsphYsphZsph; the symbols x0 and x2 are the homogeneous pixel coordinates of its images on the frames of cameras 0 and 2 respectively.</p>
      <p>Fig. 1. The geometric problem statement for combining N = 3 camera frames on the unit sphere surface: the sphere coordinate system OXsphYsphZsph with the center O, the camera coordinate systems with axes X0Y0Z0, X1Y1Z1, X2Y2Z2, the accelerometer axes XaYaZa, the rotation matrices R01 and R02, and the object point M with its projection Msph imaged at x0 and x2.</p>
      <p>The correct construction of spherical panoramas implies the availability of information about the angular position of the optical axes of the cameras relative to the horizon plane. In our work this problem is solved by mounting an inclinometer (based on a triaxial MEMS accelerometer) on the reference camera case. The accelerometer coordinate system is designated OXaYaZa (Fig. 1). The roll φ and pitch θ angles are estimated by the formulas [6]:
φ = atan2(ay, az), θ = atan2[–ax, (ay2 + az2)0.5],   (5)
where [ax, ay, az]T is the vector of accelerometer measurements (taking into account its calibration [7]).</p>
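      <p>A minimal CUDA C sketch of (5); the function name is ours, and the inputs are assumed to be already calibrated accelerometer measurements:</p>
      <preformat>
#include &lt;math.h&gt;

/* Roll and pitch of the reference camera from calibrated
   accelerometer measurements, formula (5). */
__host__ __device__ void tilt_angles(float ax, float ay, float az,
                                     float *roll, float *pitch)
{
    *roll  = atan2f(ay, az);
    *pitch = atan2f(-ax, sqrtf(ay * ay + az * az));
}
      </preformat>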
      <p>The choice of pixels in the areas where the radius-vector OMsph crosses several camera planes can be made by several criteria: for example, the maximum distance from the pixel to the intersection line of the camera planes, or the minimum length of the normalized pixel coordinates vector. In order to minimize computations, we choose the criterion of the minimum angle between the vector to a pixel and the i-th camera principal axis, as such pixels usually have the minimal distortion correction error:
min_i arccos[(R0iRθ)&lt;3&gt; · Msph],   (6)
where
Rθ = [ cosφ  –sinφ  0;  sinφ  cosφ  0;  0  0  1 ] · [ 1  0  0;  0  cosθ  –sinθ;  0  sinθ  cosθ ]   (7)
is the rotation matrix of the reference camera coordinate system relative to the horizon plane, R0i is the rotation matrix of the i-th camera relative to the reference camera estimated during calibration, &lt;k&gt; denotes the k-th column of a matrix, and «·» is a dot product.</p>
      <p>To realize the sliding RoI function with a size of W × H pixels and angular dimensions αw and βh in the horizontal and vertical directions respectively, we introduce a quaternion qvis [8] that is defined by the current line of sight azimuth α and elevation angle β:
qvis = [cos(α/2)cos(β/2), cos(α/2)sin(β/2), sin(α/2)cos(β/2), sin(α/2)sin(β/2)]T.   (8)</p>
      <p>To each pixel of the RoI with coordinates (u, v) corresponds the point Muv = [xuv, yuv, zuv]T/||[xuv, yuv, zuv]T|| on the unit sphere (Fig. 2), determined by the radius-vector with the corresponding quaternion
quv = qvis ⊗ quv0,   (9)
where from the geometric constructions of Fig. 2 the initial position (α = β = 0, zuv = 1) of the RoI pixel (u, v) corresponds to the quaternion
quv0 = [cos(αu/2)cos(βv/2), cos(αu/2)sin(βv/2), sin(αu/2)cos(βv/2), sin(αu/2)sin(βv/2)]T,   (10)
αu = arctan(xuv), βv = arcsin[yuv/(xuv2 + yuv2 + 1)0.5],
xuv = (2u/W – 1)tan(αw/2), yuv = –(2v/H – 1)tan(βh/2).</p>
      <p>The quaternion quv allows us to determine the coordinates of the point Muv in the unit sphere coordinate system [8]:
Muv = [2(qxqz + qwqy), 2(qyqz – qwqx), qw2 + qz2 – (qx2 + qy2)]T,
where qw and [qx, qy, qz]T are the scalar and vector parts of the quaternion quv.</p>
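      <p>A CUDA C sketch of the chain (8)-(10) for one RoI pixel is given below; the struct and function names are ours, and the quaternion components are assumed to be ordered [qw, qx, qy, qz]:</p>
      <preformat>
/* Quaternion as (w, x, y, z). */
struct quat { float w, x, y, z; };

/* Hamilton product q = a ⊗ b, used in (9). */
__device__ struct quat qmul(struct quat a, struct quat b)
{
    struct quat q;
    q.w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z;
    q.x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y;
    q.y = a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x;
    q.z = a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w;
    return q;
}

/* Maps the RoI pixel (u, v) to the point Muv on the unit sphere:
   angles and quv0 by (10), quv by (9), then the Muv formula. */
__device__ void roi_pixel_to_sphere(int u, int v, int W, int H,
                                    float aw, float bh,
                                    struct quat qvis, float M[3])
{
    float xuv = (2.0f * u / W - 1.0f) * tanf(0.5f * aw);
    float yuv = -(2.0f * v / H - 1.0f) * tanf(0.5f * bh);
    float au = atanf(xuv);
    float bv = asinf(yuv * rsqrtf(xuv * xuv + yuv * yuv + 1.0f));

    struct quat q0 = { cosf(0.5f*au)*cosf(0.5f*bv), cosf(0.5f*au)*sinf(0.5f*bv),
                       sinf(0.5f*au)*cosf(0.5f*bv), sinf(0.5f*au)*sinf(0.5f*bv) };
    struct quat q = qmul(qvis, q0);

    M[0] = 2.0f * (q.x*q.z + q.w*q.y);
    M[1] = 2.0f * (q.y*q.z - q.w*q.x);
    M[2] = q.w*q.w + q.z*q.z - (q.x*q.x + q.y*q.y);
}
      </preformat>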
      <p>Fig. 2. The RoI geometry: the pixel (u, v) of the W × H RoI with angular dimensions αw × βh corresponds to the point Muv = (xuv, yuv, zuv) on the unit sphere OXsphYsphZsph.</p>
      <p>On each i-th camera principal plane the homogeneous pixel coordinates xuvi corresponding to the point Muv are determined (taking into account the orientation relative to the horizon plane) through the projection matrix Pi [4]:
xuvi = PiMuv,   (11)
xni = Ki–1xuvi,   (12)
where Pi = Ki[R0iRθ | 0], 0 = [0, 0, 0]T, and the selection of the pixel transferred from the i-th camera plane to the RoI is performed by criterion (6).</p>
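      <p>A sketch of (11)-(12) in CUDA C; the function name is ours, and since the translation column is zero, only the left 3×3 part of Pi is passed, stored row-major:</p>
      <preformat>
/* Projects the sphere point M onto the i-th camera plane; P3 is the
   3x3 matrix Ki*R0i*Rtheta. Returns 0 if the point is behind the camera. */
__device__ int project_to_camera(const float P3[9], const float M[3],
                                 float *u, float *v)
{
    float x = P3[0]*M[0] + P3[1]*M[1] + P3[2]*M[2];
    float y = P3[3]*M[0] + P3[4]*M[1] + P3[5]*M[2];
    float z = P3[6]*M[0] + P3[7]*M[1] + P3[8]*M[2];
    if (z &lt;= 0.0f)
        return 0;
    *u = x / z;   /* pixel coordinates from homogeneous xuvi */
    *v = y / z;
    return 1;
}
      </preformat>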
      <p>For the cameras with wide-angle lenses it is necessary to perform distortion compensation: from the normalized coordinates with distortion, the coordinates without distortion are estimated according to the Brown-Conrady model (13) [9]; in order to reduce the amount of calculations, only the first two radial distortion coefficients are used in our work. The corrected pixel coordinates without distortion are calculated by multiplying by the intrinsic matrix Ki:
xcorri = Kixni,   (14)
and since they are fractional, the brightness value is interpolated (in our work we used a bilinear interpolation [10]).</p>
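      <p>A sketch of the distortion-compensation step: the radial correction is written in the standard two-coefficient Brown-Conrady form, which is our assumption for the exact form of (13), and the fractional pixel position is sampled with bilinear interpolation; the function names are ours.</p>
      <preformat>
/* Radial correction of the normalized coordinates (xn, yn) with two
   coefficients k1, k2 (assumed form of (13)). */
__device__ void undistort(float *xn, float *yn, float k1, float k2)
{
    float r2 = (*xn) * (*xn) + (*yn) * (*yn);
    float f = 1.0f + k1 * r2 + k2 * r2 * r2;
    *xn *= f;
    *yn *= f;
}

/* Bilinear interpolation of the brightness at the fractional pixel
   position (x, y) of a single-channel 8-bit image. */
__device__ float bilinear(const unsigned char *img, int width, int height,
                          float x, float y)
{
    int x0 = (int)floorf(x), y0 = (int)floorf(y);
    float dx = x - x0, dy = y - y0;
    int x1 = min(x0 + 1, width - 1), y1 = min(y0 + 1, height - 1);
    x0 = max(x0, 0); y0 = max(y0, 0);

    float i00 = img[y0 * width + x0], i10 = img[y0 * width + x1];
    float i01 = img[y1 * width + x0], i11 = img[y1 * width + x1];
    return (1.0f - dy) * ((1.0f - dx) * i00 + dx * i10)
         + dy * ((1.0f - dx) * i01 + dx * i11);
}
      </preformat>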
      <p>Since the scenes observed by the cameras are different, in the automatic exposure time mode the average brightness of the frames composing the panorama differs too. Therefore, after filling the panorama (or the RoI) it is additionally necessary to perform a procedure smoothing the brightness differences, i.e. blending. We used a simplified approach to blending, similar to the work [11]; a generic sketch of the mixing step is given below.</p>
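      <p>The sketch is our illustration only, not the exact procedure of [11]: within a transition band of width d pixels around a seam, the brightness values a and b of the two overlapping frames are mixed with a weight changing linearly with the signed distance s to the seam.</p>
      <preformat>
/* Linear feathering across a seam: s &lt; 0 on the side of frame a,
   s &gt; 0 on the side of frame b, d is the band width in pixels. */
__device__ float feather(float a, float b, float s, float d)
{
    float w = fminf(fmaxf(0.5f + s / d, 0.0f), 1.0f);
    return (1.0f - w) * a + w * b;
}
      </preformat>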
    </sec>
    <sec id="sec-3">
      <title>3. The algorithm for a video panorama construction and its software implementation</title>
      <p>The algorithm for forming the RoI image contains the following steps.</p>
      <p>1. Initialization: calculating the quaternions quv0 by formula (10).</p>
      <p>Main operation cycle:
2. Estimation of the current angular position of the reference camera by (5) and of its rotation matrix Rθ by (7).
3. Calculation of the quaternion qvis for the current line of sight angular position by (8) and of the quaternions quv by (9).
4. Filling the RoI (with distortion compensation) according to (11)-(14) by criterion (6).</p>
      <p>5. Performing the blending procedure (optional).</p>
      <p>As the processing of each RoI pixel is homogeneous, the computations can be parallelized. In the PC-based layout, steps 1, 4 and 5 of the algorithm were implemented on the resources of an NVIDIA GPU: using CUDA technology and the CUDA C language, the whole amount of computations was distributed over 3072 parallel blocks (64 horizontally and 48 vertically) with 256 threads in each block (16 threads horizontally and vertically). In this case, according to [12], the pixel indices are represented as
u = blockIdx.y * blockDim.y + threadIdx.y; v = blockIdx.x * blockDim.x + threadIdx.x;   (15)</p>
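      <p>A minimal CUDA C sketch of this configuration; the kernel name and parameter list are illustrative:</p>
      <preformat>
/* One thread processes one RoI pixel; the indices follow (15). */
__global__ void fill_roi(const unsigned char *frames, const float *angles,
                         unsigned char *roi)
{
    int u = blockIdx.y * blockDim.y + threadIdx.y;
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    /* steps 3-4 of the algorithm for the pixel (u, v) ... */
}

void launch_fill_roi(const unsigned char *d_frames, const float *d_angles,
                     unsigned char *d_roi)
{
    /* 64 x 48 = 3072 blocks of 16 x 16 = 256 threads */
    dim3 blocks(64, 48), threads(16, 16);
    fill_roi&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;(d_frames, d_angles, d_roi);
}
      </preformat>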
      <p>In (15) the standard CUDA notation is used: blockIdx is the block identifier (number), blockDim is the block dimension, threadIdx is the identifier (number) of a parallel thread, and the attributes "x" and "y" indicate the axes of the two-dimensional grid of blocks and threads.</p>
      <p>As copying from CPU memory to GPU memory and back is rather slow [12], the number of such operations in the implementation of the algorithm is minimized: at initialization, GPU memory arrays are filled with the data on the camera parameters (intrinsic matrices and distortion coefficients) and the initial quaternions quv0 (10) that correspond to the RoI pixels. In the main operation cycle the frames from each camera, the line of sight angular coordinates and the angular coordinates of the reference camera estimated from the MEMS signals are copied into the GPU memory; then steps 3-5 of the algorithm are performed, and the result (the array of brightness values in the RoI) is copied back to the CPU memory for displaying on the PC monitor.</p>
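      <p>A sketch of one iteration of the main cycle on the host side; buffer names and sizes are illustrative, and fill_roi with its launch configuration is as in the previous sketch:</p>
      <preformat>
/* Copy the inputs in, run the RoI kernel, copy the RoI back. */
void roi_frame(const unsigned char *h_frames, size_t frames_bytes,
               const float *h_angles, size_t angles_bytes,
               unsigned char *h_roi, size_t roi_bytes,
               unsigned char *d_frames, float *d_angles,
               unsigned char *d_roi)
{
    cudaMemcpy(d_frames, h_frames, frames_bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_angles, h_angles, angles_bytes, cudaMemcpyHostToDevice);

    dim3 blocks(64, 48), threads(16, 16);
    fill_roi&lt;&lt;&lt;blocks, threads&gt;&gt;&gt;(d_frames, d_angles, d_roi);

    cudaMemcpy(h_roi, d_roi, roi_bytes, cudaMemcpyDeviceToHost);
}
      </preformat>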
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>The layout of the video panorama system is implemented on a PC with an Intel Core i5 processor and an NVIDIA GeForce GTX 560 Ti GPU (384 cores) and consists of five digital GigE interface video cameras mounted on a special rigid polyamide bracket (Fig. 3): two Basler acA2000 cameras with a 2048×1088 frame resolution and 5 mm megapixel Cowa lenses (below), and three IDS 5240 RE cameras (above) with a 1280×1024 frame resolution and 5 mm megapixel Computar lenses (the two outer cameras) and an 8 mm Navitar lens (the central reference camera); an evaluation board with an InvenSense MPU-9250 MEMS sensor is mounted on the reference camera; two motorized rotation positioners Standa 8MR190-2 move the bracket with the cameras in the horizontal and vertical planes, and a USB joystick Defender Cobra M5 serves for the RoI navigation.</p>
      <p>The layout allows creating a panorama with a 3600×2400 pixels resolution and a 180°×120° FoV, or displaying a RoI with user-defined resolution and angular FoV dimensions.</p>
      <p>Preliminary camera calibration was performed using the OpenCV library [13]: the intrinsic matrices and the distortion coefficients of the lenses were estimated on 40 test "chessboard" pattern frames containing 12×11 cells with a 3 cm side; then the rotation matrices of the cameras relative to the reference camera were estimated on 40 test "chessboard" pattern frames containing 9×6 cells with a 3 cm side.</p>
      <p>The influence of the angular orientation estimation error of the reference camera relative to the horizon plane on the geometry of
the panorama is shown in Fig. 4. As can be seen from the figure, the pitch estimation error leads to an incorrect horizon display (the
position of the spherical panorama equator is shown by the orange line).</p>
      <p>The results of forming the 1024×768 pixels RoI with a 40°×30° FoV and the line of sight angular coordinates α = 15° and β = –5°, with blending and without it, are shown in Figs. 5 and 6 respectively. The times required to create one frame on the CPU and GPU are summarized in Table 1 (the time for copying the data from the CPU memory to the GPU memory and back is 3.5 ms), and the times for different RoI sizes are summarized in Table 2.</p>
      <p>Figs. 7 and 8 show the results of intermediate computations for the blending implementation: the boundaries of the camera frames in the RoI (Fig. 7) and the 100-pixel-wide binary mask over which the brightness is smoothed (Fig. 8).</p>
      <p>The more-than-twofold increase in the RoI forming time with blending is explained by the need to perform auxiliary procedures: searching for the intersection lines of the camera frames in the RoI, determining the areas for brightness fusion, and smoothing the images to estimate the low-frequency brightness component in each camera frame. To reduce the amount of calculation, we apply a smoothing procedure with an 8×8 window and an accumulator; its computational complexity is O(n2) for an n×n pixel region, i.e. proportional to the number of processed pixels and independent of the window size. A sketch of one pass of such smoothing is given below.</p>
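      <p>The sketch shows one horizontal pass in C; the vertical pass is analogous, and the function name is ours:</p>
      <preformat>
/* Box smoothing of one row with a running accumulator: each output
   costs O(1) operations regardless of the window size win (win = 8 in
   our case); the borders are clamped. */
void box_filter_row(const float *src, float *dst, int width, int win)
{
    int half = win / 2;
    float acc = 0.0f;
    /* sum of the initial window centered at x = 0 */
    for (int k = -half; k &lt; win - half; ++k)
        acc += src[k &lt; 0 ? 0 : k];
    for (int x = 0; x &lt; width; ++x) {
        dst[x] = acc / win;
        int sub = x - half;          /* element leaving the window  */
        int add = x + (win - half);  /* element entering the window */
        acc -= src[sub &lt; 0 ? 0 : sub];
        acc += src[add &gt;= width ? width - 1 : add];
    }
}
      </preformat>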
      <p>For ||t0i|| &lt; 15 cm and distances to objects d &gt; 70 m, the hypothesis of the superposition of the camera lens nodal points provides a stitching error on the camera frame boundaries of not more than 5 pixels.</p>
      <p>The results of the experiment showed that the use of GPU resources makes it possible to reduce the RoI forming time by up to a factor of 16 in comparison with the CPU (when only one CPU core is used).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The algorithm for video panorama construction, designed and implemented with CUDA technology and parallelization of computations on a GPU, provides panorama navigation at 50 fps for a one-megapixel region of interest over the whole 8.2-megapixel panoramic video frame with a 180°×120° field of view.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Banta</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donaldson</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Apparatus and method for panoramic video hosting</article-title>
          .
          <source>Patent US 9516225</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>AN/AAQ-37 Distributed Aperture System (DAS) for the F-35</article-title>
          . URL: http://www.northropgrumman.com/Capabilities/ANAAQ37F35/Pages/default.aspx (accessed 01.10.2016).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>IronVision™</article-title>
          . URL: http://elbitsystems.com/media/IronVision.pdf (accessed 10.09.2016).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Hartley</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Multiple view geometry in computer vision</article-title>
          .
          <source>2nd edition</source>
          . Cambridge: Cambridge University Press,
          <year>2003</year>
          ; 656 p.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Szeliski</surname>
            <given-names>R.</given-names>
          </string-name>
          <article-title>Image alignment and stitching: a tutorial</article-title>
          .
          <source>Foundations and Trends in Computer Graphics and Vision</source>
          <year>2006</year>
          ;
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>Tilt sensing using a three-axis accelerometer</article-title>
          . URL: http://www.freescale.com/files/sensors/doc/app_note/AN3461.pdf (accessed 11.04.2016).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Wang</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>F</given-names>
          </string-name>
          .
          <article-title>Intelligent calibration method of low cost MEMS inertial measurement unit for an FPGA-based navigation system</article-title>
          .
          <source>International J. of Intelligent Engineering and Systems</source>
          <year>2011</year>
          ;
          <volume>4</volume>
          (
          <issue>2</issue>
          ):
          <fpage>32</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Kuipers</surname>
            <given-names>JB</given-names>
          </string-name>
          .
          <article-title>Quaternions and rotation sequences</article-title>
          . Princeton, NJ: Princeton University Press,
          <year>1998</year>
          ; 400 p.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Brown</surname>
            <given-names>DC</given-names>
          </string-name>
          .
          <article-title>Close-range camera calibration</article-title>
          .
          <source>Photogrammetric Engineering</source>
          <year>1971</year>
          ;
          <volume>37</volume>
          (
          <issue>8</issue>
          ):
          <fpage>855</fpage>
          -
          <lpage>866</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Parker</surname>
            <given-names>JA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kenyon</surname>
            <given-names>RV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Troxel</surname>
            <given-names>DE</given-names>
          </string-name>
          .
          <article-title>Comparison of interpolating methods for image resampling</article-title>
          .
          <source>IEEE Trans. on Medical Imaging</source>
          <year>1983</year>
          ;
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Xiong</surname>
            <given-names>Y</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pulli</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Fast image stitching and editing for panorama painting on mobile phones</article-title>
          .
          <source>IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</source>
          , 13-18 June 2010:
          <fpage>47</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Sanders</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kandrot</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>CUDA by example</article-title>
          . New York: Addison-Wesley,
          <year>2010</year>
          ; 290 p.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <article-title>Camera Calibration and 3D Reconstruction</article-title>
          . URL: http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (accessed 14.03.2015).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>