<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Panorama Stitching Method Using Sensor Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleksei Goncharov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei Bykovskii</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ITMO University</institution>
          ,
          <addr-line>49 Kronverksky Pr., St. Petersburg, 197101</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A common approach to stitching a set of images into a panorama is to use computer vision algorithms. The greatest computational complexity in these algorithms comes from the image-analysis stage, specifically from the methods for finding key points. Many key-point detection methods now exist, suited to different conditions and shooting parameters of the initial set of frames. By choosing the right method, stitching defects can be avoided and the final image obtained faster. This article introduces a method that analyzes the initial set of images and selects a suitable key-point detection algorithm using data from various sensors. The method produces final panoramic images without significant defects and performs better than the compared key-point detection methods. Using the PASSTA dataset as an example, the developed method produced a final panoramic image of about 1.33 Mb in 16 seconds, regardless of the number of frames used (11, 8, or 6) with angular displacements of 25, 35, or 45 degrees, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer vision</kwd>
        <kwd>Embedded systems</kwd>
        <kwd>Cyber-Physical systems</kwd>
        <kwd>Panoramic photography</kwd>
        <kwd>Sensor fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>position of the camera relative to the captured object (a change in perspective). To extract
interpretable information from an image, it is necessary to rely on the image's local features.</p>
      <p>Different algorithms for selecting key points do not provide universal solutions for
different images, owing to the specifics of how local features are determined. This study proposes
using data from various sensors to select the most appropriate key-point detection algorithm,
depending on the scene in the image, the angular displacement between frames, the illumination,
and other parameters.</p>
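      <p>This selection step can be sketched as a simple decision rule. The sensor inputs and cut-off values below are illustrative assumptions, not thresholds taken from this article:</p>

```python
def select_detector(angular_step_deg: float, illumination_lux: float) -> str:
    """Pick a key-point detection algorithm from sensor readings.

    The thresholds are hypothetical: dim scenes or large angular
    displacement favour the more robust (but slower) SIFT, moderate
    displacement favours SURF, and small, well-lit steps allow the
    fast ORB detector.
    """
    if illumination_lux < 50 or angular_step_deg > 40:
        return "SIFT"
    if angular_step_deg > 25:
        return "SURF"
    return "ORB"
```

      <p>The rule encodes the speed/robustness trade-off discussed in the related-works section; in a real system the thresholds would be tuned per device.</p>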
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The technical literature is rich in new feature detection and image description algorithms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
However, to this day there is no ideal detector [4]. This is mainly due to the almost infinite
number of possible computer vision applications (which may require one or more functions) [5],
the variation in imaging conditions (zoom, viewpoint, lighting and contrast, image quality,
compression, etc.) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and the possible scene [6]. The computational efficiency of such detectors
becomes even more important for real-time applications [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Three algorithms (SIFT, SURF, ORB) were studied in detail, and the following conclusions
were drawn:
1. The ORB algorithm is the fastest, but has a lower match percentage than the other
algorithms.
2. The SIFT algorithm is the slowest, but it surpasses the other algorithms in match
percentage for most of the frame distortions considered.
3. The SURF algorithm is close to SIFT in match percentage and
close to ORB in speed.
4. It is important to note that the ORB algorithm finds key points mainly in the center of
the image, while SIFT and SURF key points are distributed evenly over the entire
image.</p>
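      <p>To make point 1 concrete: ORB detects candidate key points with the FAST corner test. A deliberately simplified, pure-Python sketch of that test (not the OpenCV implementation) is shown below; the threshold and arc-length defaults follow the common FAST-9 variant:</p>

```python
import numpy as np

# Offsets of the 16-pixel Bresenham circle of radius 3 used by FAST.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def fast_corners(img, threshold=20, arc=9):
    """Simplified FAST: a pixel is a corner if at least `arc` contiguous
    circle pixels are all brighter or all darker than the centre by
    `threshold` (no non-maximum suppression, unlike real detectors)."""
    height, width = img.shape
    corners = []
    for y in range(3, height - 3):
        for x in range(3, width - 3):
            centre = int(img[y, x])
            ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
            for hits in ([v > centre + threshold for v in ring],
                         [v < centre - threshold for v in ring]):
                run = best = 0
                for hit in hits + hits:  # doubled list handles wrap-around
                    run = run + 1 if hit else 0
                    best = max(best, run)
                if best >= arc:
                    corners.append((x, y))
                    break
    return corners
```

      <p>On a synthetic image with a bright square on a dark background, the square's corners are flagged while edge midpoints and the interior are not, illustrating why such tests respond to corner-like local features.</p>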
    </sec>
    <sec id="sec-3">
      <title>3. Proposal</title>
      <p>When creating panoramas, various auxiliary data from the capturing device can be used:
a timer, an encoder, a gyroscope, or other sensors.
Smartphones are often used to capture panoramic images; the image below shows the various sensors
found on most smartphones.</p>
      <p>To control the initial set of images overall, including the overlap between frames, the
offsets along the axes, and the angular displacements, the basic algorithm based on
the OpenCV library can be modified as shown in the block diagram below. As the
diagram shows, before the algorithm starts, the relative camera positions
between images are analyzed so the user can be warned when processing the set of frames would be pointless. During
stitching, the displacement between frames, and hence the total overlap area between frames,
is estimated. This approach allows full verification that
the original images are suitable for stitching into an overall panoramic image, as well as
control between individual frames: the algorithm stops when the overlap between
images is lost, so the user receives not a full
panorama, but one that is correct in terms of image integrity. Figure 1 below shows the minimum
equipment required for the developed method and briefly illustrates the algorithm.</p>
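      <p>The overlap control described above can be estimated directly from gyroscope readings. A minimal sketch, assuming pure rotation about the optical centre and a hypothetical minimum-overlap requirement of 30% (not a value stated in this article):</p>

```python
def overlap_fraction(angular_step_deg: float, horizontal_fov_deg: float) -> float:
    """Fraction of two adjacent frames that overlaps, assuming pure
    rotation about the optical centre at a fixed horizontal field of view."""
    if angular_step_deg >= horizontal_fov_deg:
        return 0.0
    return 1.0 - angular_step_deg / horizontal_fov_deg

def frames_suitable(steps_deg, fov_deg, min_overlap=0.3) -> bool:
    """Pre-flight check from gyroscope data: warn the user before
    stitching if any pair of neighbouring frames overlaps too little."""
    return all(overlap_fraction(s, fov_deg) >= min_overlap for s in steps_deg)
```

      <p>For example, with the roughly 105-degree wide-angle lens mentioned in the evaluation, angular steps of 25, 35, and 45 degrees all leave more than half of each frame overlapping its neighbour.</p>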
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>To test the developed method, the PASSTA datasets (i.e., image sets) from Linköping University were used.
These image sets have several properties:
1. The images were taken from a camera mounted on a tripod.
2. Between each subsequent image, the camera rotates around the vertical axis through the
optical center.</p>
      <p>3. The translation of the optical center between frames is small enough to be negligible.</p>
      <p>The dataset includes three sets of images. The following sets were used for the experiment:
1. Blue Dining Room: contains 72 images captured with a Canon DS50 and a perspective
lens, with a resolution of 1280 x 1920 pixels, under poor lighting. A panoramic head
was used to rotate the camera by approximately 5 degrees around the vertical axis through the optical
center.
2. Dining Room: consists of 72 images captured with a Canon DS70 and a Samyang 2.8/10mm
wide-angle lens (about 105 degrees), with a resolution of 5740 x 3780 pixels. A panoramic
head was used to rotate the camera by approximately 5 degrees around the vertical axis through the optical
center.</p>
      <p>Figure 2: Proposed algorithm based on the OpenCV algorithm.</p>
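      <p>As a consistency check on the dataset description: at a fixed angular step, the number of images needed for a full turn follows directly from the step size, and a 5-degree step corresponds to the 72 images in each set:</p>

```python
def images_for_full_turn(step_deg: float) -> int:
    """Number of images covering a full 360-degree rotation at a fixed
    angular step between consecutive frames."""
    return round(360 / step_deg)
```
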
      <p>Figures 3 and 6 below show the results of the developed application at various angular
displacements between frames on the proposed datasets. For comparison, the results of the
basic algorithm on the same initial data are presented in Figures 4 and 5 for the “LunchRoomBlue”
set and in Figures 7 and 8 for the “LunchRoom” set. Figure 9 shows the defects and distortions that appear in
panoramas when the algorithm operates at angular values that are at the limit for the search
for key points, with the overlap requirement violated. Figure 10 compares the work of the
developed method and the OpenCV tools when using various key-point detection methods; the
defects arising in the final images are marked separately.</p>
      <p>Figure 11 shows a comparison diagram for different input data for the developed method
and the standard OpenCV library method using different key-point detection methods.
Defects in the final images are indicated separately. Table 1 below shows the results obtained with
various input data and methods used.</p>
      <p>The operating time of the developed method was estimated for various sets of initial images.
Based on the data obtained, the following conclusions can be drawn:
1. The developed method creates a panoramic image in approximately the same time,
regardless of the displacement between frames.
2. With an angular displacement between frames of up to 45 degrees for light scenes and up
to 40 degrees for dark scenes, the developed method does not create obvious defects in the final
image.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>An analysis of existing methods for creating panoramic images was carried out, and computer vision
methods for finding key points in images were analyzed in detail. The
analysis concluded that the variety of methods is dictated by the variety of
applied problems and, accordingly, of the objects in the images for which key points must be found.
A method for creating panoramic images using multisensor data was developed, based on the
OpenCV library in the Python programming language. To improve the quality of the created
panoramic images by using the key-point search algorithm best suited to the scenes in the
original frames, as well as to control the mutual overlap between frames, shifts, and
displacements, it was proposed to add data from position sensors (a gyroscope and an accelerometer) to the
algorithm. Choosing the optimal key-point detection algorithm also reduces the
total running time of the algorithm without loss of quality. It can be concluded that multisensor
data is useful for creating panoramic images. Moreover, the reduced running time from using
an optimal key-point detection algorithm means the developed method can be
implemented in an embedded system.</p>
      <p>Using the PASSTA dataset as an example, the developed method produced
a final panoramic image of about 1.33 Mb in 16 seconds, regardless of the number of
frames used (11, 8, or 6) with angular displacements of 25, 35, or 45 degrees, respectively.</p>
      <p>The developed method is designed to be extended with other key-point detection
algorithms, as well as with data from additional sensors.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Results obtained with various input data and methods used.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Number of frames</th>
              <th>Panorama stitching time, seconds</th>
              <th>Final image size, Mb</th>
              <th>Presence of obvious defects</th>
            </tr>
          </thead>
          <tbody>
            <tr><td/><td>16.04</td><td>1.27</td><td>No</td></tr>
            <tr><td/><td>18.49</td><td>1.3</td><td>No</td></tr>
            <tr><td/><td>19.23</td><td>1.29</td><td>No</td></tr>
            <tr><td/><td>16.04</td><td>1.27</td><td>No</td></tr>
            <tr><td/><td>14.11</td><td>1.33</td><td>Yes</td></tr>
            <tr><td/><td>16.27</td><td>1.38</td><td>No</td></tr>
            <tr><td/><td>16.65</td><td>1.32</td><td>No</td></tr>
            <tr><td/><td>16.27</td><td>1.38</td><td>No</td></tr>
            <tr><td/><td>13.41</td><td>1.35</td><td>Yes</td></tr>
            <tr><td/><td>14.93</td><td>1.31</td><td>Yes</td></tr>
            <tr><td/><td>15.58</td><td>1.34</td><td>No</td></tr>
            <tr><td/><td>15.58</td><td>1.34</td><td>No</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>[4] Tareen S. A. K., Saleem Z., “A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK,” 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET), IEEE, 2018, pp. 1-10.</p>
      <p>[5] Karami E., Prasad S., Shehata M., “Image matching using SIFT, SURF, BRIEF and ORB: performance comparison for distorted images,” arXiv preprint arXiv:1710.02726, 2017.</p>
      <p>[6] Jayanthi N., Indu S., “Comparison of image matching techniques,” International Journal of Latest Trends in Engineering and Technology, vol. 7, no. 3, 2016, pp. 396-401.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Tian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Ding</surname>
          </string-name>
          , “
          <article-title>A survey of recent advances in visual feature detection</article-title>
          ,”
          <source>Neurocomputing</source>
          , vol.
          <volume>149</volume>
          , pp.
          <fpage>736</fpage>
          -
          <lpage>751</lpage>
          ,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gu</surname>
          </string-name>
          , “
          <article-title>Extracting semantic information from visual data: A survey</article-title>
          ,”
          <source>Robotics</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>1</issue>
          , p.
          <fpage>8</fpage>
          ,
          <year>2016</year>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Salahat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Saleh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Qutayri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sluzek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ismail</surname>
          </string-name>
          , “
          <article-title>Automated real-time video surveillance algorithms for SoC implementation: A survey</article-title>
          ,” in
          <source>Electronics, Circuits, and Systems</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>