<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visual-inertial odometry algorithms based on a thermal camera</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>A P Alekseev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>E V Goshin</string-name>
          <email>goshine@yandex.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N S Davydov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N A Ivliev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A V Nikonorov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute of RAS - Branch of the FSRC "Crystallography and Photonics" RAS</institution>
          ,
          <addr-line>Molodogvardejskaya street 151, Samara, Russia, 443001</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>Moskovskoe Shosse, 34А, Samara, Russia, 443086</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>183</fpage>
      <lpage>188</lpage>
      <abstract>
        <p>Trajectory reconstruction from camera data is one of the most popular tasks in the field of machine vision. In particular, this task arises when navigation is required in the absence of signals from global navigation systems such as GLONASS and GPS. In this work, existing visual odometry methods were studied for restoring a flight trajectory from footage captured by an infrared camera of the thermal range. To improve accuracy, it is proposed to use data from inertial sensors. As a result, it is shown that the proposed solution successfully solves the trajectory reconstruction problem.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays, there is a need for methods of building navigation systems that use video data and rely on cheap off-the-shelf hardware and software. One of the fundamental tasks in the field of mobile robots and unmanned vehicles is the localization of the object or vehicle. Along with existing systems, such as pulsed laser systems (e.g. LiDAR [2, 7]), IMU [5], GPS and radar [6], visual odometry methods are of great interest. These methods use the video stream of a camera installed on the object. Due to its low cost in comparison with most other technical means, and the availability of algorithms capable of reliably converting photometric information into location information, this approach turns out to be quite promising. Of course, the method also has disadvantages: poor illumination of the scene can adversely affect the motion estimate, and static objects must dominate the scene for correct matching.</p>
      <p>Visual odometry on infrared (IR) imagery presents additional difficulties. The low contrast of images from an infrared camera makes additional processing necessary, which affects speed. The sharpness of objects in the IR range is low. In addition, there are fundamental geometric constraints on determining the precise rotation and translation of a camera from images. For many systems, however, this is the most promising approach, and its combinations with other additional sensors (LiDAR, IMU, etc.) are not uncommon. Modern research has gone far in easing the restrictions on the applicability of this method. Such research is based on several 3D reconstruction techniques that are widely useful in other applications, e.g. [8], [9]. This paper presents the results of testing modern visual odometry approaches on data from a monocular IR camera.</p>
      <p>In this article, the existing approaches to trajectory building from an IR camera video sequence are compared, the optimal approach is chosen, and a conclusion is drawn about its applicability to the visual odometry problem. An approach to improving the accuracy of IR odometry through the use of inertial sensor readings is also proposed.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Monocular visual inertial odometry methods</title>
      <p>
        Most monocular visual odometry algorithms for small UAVs are based on PTAM (Parallel Tracking and Mapping) technology [3]. PTAM builds on the method of simultaneous localization and mapping (SLAM). SLAM provides reliability by tracking and mapping hundreds of control points, and it works in real time. Its special feature is the simultaneous execution of the mapping task and the displacement estimation, together with a rather effective correction based on processing images from different viewing angles. PTAM technology was developed for augmented reality applications in small spaces. Several modifications of it exist, for example one with a limited number of keyframes, which ensures its full-fledged operation in urban-type built environments [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>According to [4, 8], the following methods are currently applied:
• single-camera (monoVO)
• multi-camera (stereoVO, RGBD-VO)
Methods using data from one camera are divided into:
Direct:
• direct sparse (DSO, direct sparse odometry)
• semi-direct (SVO, semi-direct odometry)
Indirect:
• feature-based visual odometry
• semi-dense
A comparison of the approaches is presented in Table 1; the details are analyzed in [4, 8].</p>
      <table-wrap id="tbl1">
        <label>Table 1.</label>
        <caption>
          <p>Comparison of indirect (feature-based) and direct visual odometry methods.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Indirect methods</th>
              <th>Direct methods</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Use a small amount of information from the images</td>
              <td>Use more complete information</td>
            </tr>
            <tr>
              <td>Do not require complex initialization</td>
              <td>Require complex initialization</td>
            </tr>
            <tr>
              <td>Sensitive to image intensity distortions</td>
              <td>Less sensitive to image intensity distortions</td>
            </tr>
            <tr>
              <td>Over 20 years of intensive development</td>
              <td>About 4 years of research</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-2-8">
        <p>All monocular algorithms share a set of similar requirements and limitations:
1. The need for accurate camera calibration; this is less critical for feature-based algorithms.
2. The inability to determine scale without the help of external sensors or the user.</p>
        <p>3. Camera requirements: high shooting speed and wide viewing angle. These parameters are related to each other and to the maximum speed of camera movement.</p>
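        <p>The scale ambiguity in item 2 can be illustrated in a few lines of numpy: scaling the whole scene and the camera translation by the same factor leaves every projection unchanged, so a monocular camera alone cannot recover metric scale. The intrinsic matrix, scene point and translation below are arbitrary assumptions, not values from this work.</p>

```python
import numpy as np

def project(K, X):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])        # assumed intrinsics
X = np.array([1.0, 2.0, 5.0])          # assumed scene point
t = np.array([0.2, 0.0, 0.0])          # assumed camera translation
s = 3.7                                # arbitrary scale factor

p1 = project(K, X + t)
p2 = project(K, s * X + s * t)         # scaled scene + scaled translation
assert np.allclose(p1, p2)             # identical pixel: scale is unobservable
```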
        <p>For the odometry problem addressed here, which uses IR video streams, some drawbacks of the indirect methods are critical; in particular, under IR shooting the sensitivity of the method to intensity changes becomes essential. That is why the choice was made in favor of the DSO algorithm.</p>
        <sec id="sec-2-8-1">
          <title>2.1. Description of the direct visual odometry method</title>
          <p>The semi-direct odometry model is related to the dense model; however, as shown later, sparse depth information is sufficient to obtain a rough estimate of the motion and to search for control points [4]. As soon as the control points and the initial position of the camera are found, the algorithm uses only the control points, which explains its name, "semi-direct". This technique makes it possible to quickly locate the processing frame in a new image. A Bayesian filter, which explicitly removes erroneous measurements, also estimates the depths and positions of the control points. That is, a point is added to the three-dimensional map only when the associated depth filter has converged, which requires multiple measurements. The result is a three-dimensional map of control points whose reliability has been verified.</p>
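          <p>As a rough illustration of how such a depth filter converges over multiple measurements, here is a minimal Gaussian-fusion update of an inverse-depth estimate. This is a simplified stand-in, not the filter used in SVO, which additionally models outlier measurements.</p>

```python
def depth_filter_update(mu, var, z, var_z):
    """Fuse an inverse-depth estimate (mean mu, variance var) with a new
    measurement z of variance var_z.  Each update shrinks the variance;
    the point is accepted into the map once var falls below a threshold."""
    var_new = var * var_z / (var + var_z)
    mu_new = var_new * (mu / var + z / var_z)
    return mu_new, var_new
```

        <p>Two equally uncertain values average exactly, and the variance halves, so repeated consistent measurements drive the filter toward convergence.</p>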
          <p>The speed of the DSO algorithm increases significantly if control points are extracted not from all frames but only from key frames. Accuracy also increases when sub-pixel control point extraction is used. Unlike other direct methods, many small patches are used here rather than a few large flat regions, which positively affects both the speed of the algorithm and its reliability.</p>
          <p>As part of the project, the task was to reconstruct the trajectory of the infrared camera with reference to objects on the terrain. The first stage was an assessment of existing approaches to solving this problem in the visible range. Despite the variety of methods, common to all of them is the conversion of the processed image to grayscale; all further processing operates on the monochromatic image.</p>
          <p>Direct sparse odometry is based on continuous optimization of the photometric error over a window of recent frames, taking into account a photometrically calibrated imaging model. Unlike existing direct methods, all involved parameters are jointly optimized here (camera intrinsics, camera extrinsics and inverse depth values), effectively performing the photometric equivalent of windowed sparse bundle adjustment. The geometric representation used by other direct approaches is preserved, that is, points are represented by their inverse depth in a reference frame (and thus have one degree of freedom).</p>
        </sec>
        <sec id="sec-2-8-2">
          <title>2.2. Visual inertial odometry</title>
          <p>This paper proposes a modification of DSO that takes the data of inertial sensors into account when restoring the trajectory. According to [4], the key procedure in the DSO algorithm is the minimization of the photometric error E_{pj} between the neighborhood of a point p in the reference frame I_i and the neighborhood of the corresponding point p' in the target frame I_j:

E_{pj} = \sum_{p \in N_p} w_p \left\| \left( I_j(p') - b_j \right) - \frac{t_j e^{a_j}}{t_i e^{a_i}} \left( I_i(p) - b_i \right) \right\|_{\gamma} .   (1)</p>
          <p>Here N_p is the neighborhood of the point over which the minimization is performed, t_i e^{a_i} and t_j e^{a_j} describe the exposure of the corresponding frames, \|\cdot\|_{\gamma} is the Huber norm, and w_p is the point weight.</p>
          <p>In the original DSO, the point p' is found as a reprojection of the point p from the reference frame:

p' = \Pi_c \left( R \, \Pi_c^{-1}(p, d_p) + t \right),   (2)

where d_p is the inverse depth of the point, R and t are the rotation matrix and translation vector, and \Pi_c is the camera projection.</p>
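          <p>Formula (2) can be sketched with a standard pinhole model for \Pi_c; the intrinsic matrix below is a placeholder assumption:</p>

```python
import numpy as np

def pi(K, X):
    """Projection Pi_c: 3D camera-frame point to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

def pi_inv(K, p, d_p):
    """Inverse projection Pi_c^{-1}: pixel p with inverse depth d_p to 3D."""
    ph = np.array([p[0], p[1], 1.0])
    return (np.linalg.inv(K) @ ph) / d_p

def reproject(K, p, d_p, R, t):
    """p' = Pi_c(R Pi_c^{-1}(p, d_p) + t), i.e. formula (2)."""
    return pi(K, R @ pi_inv(K, p, d_p) + t)
```

          <p>With R = I and t = 0 the point maps back onto itself regardless of depth, a useful check when wiring in a calibration matrix.</p>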
          <p>In this work, it is proposed to replace the rotation matrix and translation vector in formula (2) with their estimates obtained from the inertial sensors. Experimental results showed that this visual-inertial modification of the DSO algorithm significantly increases the accuracy of visual odometry.</p>
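          <p>The text does not detail how the inertial estimates of R and t are obtained; one minimal (hypothetical) option is Euler integration of gyroscope and accelerometer readings between frames, ignoring sensor biases and noise:</p>

```python
import numpy as np

def so3_exp(w):
    """Rodrigues formula: rotation matrix for a rotation vector w."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)

def imu_propagate(R, v, p, gyro, accel, g, dt):
    """One Euler step integrating gyro (rad/s) and accelerometer (m/s^2)
    readings to predict inter-frame rotation R and translation p.
    Simplified sketch: biases and measurement noise are not modelled."""
    R_new = R @ so3_exp(gyro * dt)
    a_world = R @ accel + g            # rotate body acceleration, add gravity
    v_new = v + a_world * dt
    p_new = p + v * dt + 0.5 * a_world * dt**2
    return R_new, v_new, p_new
```

          <p>The predicted R_new and (p_new - p) then stand in for R and t in formula (2).</p>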
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. IR Camera Calibration</title>
        <p>For successful computation and trajectory construction, the camera must be calibrated. For visible-range cameras, special calibration targets exist (Figure 1).</p>
        <p>The infrared camera requires a similar calibration board with contrast fields. After a series of experiments, the substrate material, the printing method and the ink composition for the areas absorbing infrared radiation were selected, and a coating with maximum contrast was found by testing materials. Figure 1 (right) shows the finished layout of the board for calibrating the IR camera.</p>
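        <p>Regardless of spectral range, calibration then proceeds as usual: corners are detected on the board and fed, together with their known 3D positions, into a standard routine (e.g. OpenCV's calibrateCamera). A minimal numpy sketch of building those 3D object points; the board dimensions and square size are assumptions:</p>

```python
import numpy as np

def board_object_points(cols, rows, square_mm):
    """3D coordinates (Z = 0 board plane) of the inner corners of a
    cols x rows calibration board with a given square size in mm."""
    grid = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    objp = np.zeros((cols * rows, 3), np.float64)
    objp[:, :2] = grid * square_mm
    return objp
```

        <p>One such array is paired with each image's detected corner set; the calibration routine then recovers the intrinsic matrix and distortion coefficients.</p>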
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiment</title>
        <p>To check the quality of the developed visual-inertial odometry algorithms, a full-scale survey was conducted from a hydroplane using two cameras, an IR camera and a visible-range camera. A COX 1000 thermal-range camera with a resolution of 1024 × 768 was used as the infrared camera. A suspension mount was made for the field survey (Figure 2). A GoPro3 camera was used for the control shooting. The pitch of both cameras is the same. The platform was fixed on the leading edge of the wing to keep structural elements of the aircraft out of the frame (Figure 2).</p>
        <p>To control the accuracy of the constructed trajectory, GPS coordinates were recorded during the flight using a mobile navigator. The flight trajectory built from GPS is shown in Figure 3. The inertial sensor readings were taken from a mobile phone.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Accuracy of trajectory recovery using visual inertial odometry methods</title>
        <p>In some areas of the flight, it was possible to build a fairly accurate flight trajectory. This selectivity is due to the characteristics of the infrared camera's sensor, as well as the presence in the frame of fairly large areas with a uniform surface (water, forest), on which trajectory restoration is difficult. Examples of frames where trajectory recovery completed successfully are shown in Figure 4.</p>
        <p>The proposed adjustment of the visual odometry algorithm, using inertial sensor data, significantly improved the accuracy of trajectory recovery. Figure 5 compares the obtained trajectory sections with GPS data. The average deviation in the studied areas did not exceed 20 meters, which confirms the efficiency of the proposed approach.</p>
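        <p>The reported average deviation can be computed as the mean point-wise distance between the recovered trajectory and the GPS track, assuming both are resampled to common timestamps (and, for monocular odometry, scale-aligned beforehand):</p>

```python
import numpy as np

def mean_deviation(traj, gps):
    """Mean Euclidean distance between corresponding points of two
    time-aligned trajectories (N x 2 arrays, in metres)."""
    return float(np.mean(np.linalg.norm(traj - gps, axis=1)))
```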
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this work, direct and indirect visual odometry methods are investigated for the problem of trajectory recovery from a sequence of frames taken in the thermal infrared range. For the experimental study, the IR camera was calibrated, a test bench was built, a test laboratory was assembled, and full-scale video recordings were carried out with a camera installed on a small aircraft, under various lighting and temperature conditions. The direct visual odometry methods showed acceptable quality of hydroplane trajectory recovery in comparison with the GPS data. The modification proposed in this paper, which uses inertial sensor data to refine visual odometry, provided a significant increase in trajectory recovery accuracy.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>Methods and algorithms were designed and developed with the support of RFBR grants (projects 19-29-01235-mk, 16-29-11744-ofi_m, № 16-29-09528-ofi_m, № 17-29-03112-ofi_m, № 18-07-01390А, № 18-37-00457-mol_а); optics and experimental studies were carried out within the framework of the state assignment of the IPSI RAS, a branch of the Federal Scientific-Research Center "Crystallography and Photonics" of the RAS (agreement № 007-ГЗ/Ч3363/26).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baker</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matthews</surname>
            <given-names>I</given-names>
          </string-name>
          <year>2004</year>
          <article-title>Lucas-Kanade 20 years on: A unifying framework</article-title>
          <source>International Journal of Computer Vision</source>
          <volume>56</volume>
          (
          <issue>3</issue>
          )
          <fpage>221</fpage>
          -
          <lpage>255</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>