<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Accuracy of object tracking based on time-multiplexed structured light</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>B. Wagner</string-name>
          <email>wagner@rob.uni-luebeck.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Stüber</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Wissel</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Bruder</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Schweikard</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Ernst</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Lübeck, Graduate School for Computing in Medicine and Life Sciences</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Lübeck, Institute for Robotics and Cognitive Systems</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>139</fpage>
      <lpage>142</lpage>
      <abstract>
        <p>Our research group is currently developing a new optical head tracking system which utilises infrared laser light to measure features of the soft tissue on the patient's forehead. These features are intended to offer highly accurate registration with respect to the rigid skull structure by means of compensating for the soft tissue. In this context, the system also has to be able to quickly generate accurate reconstructions of the skin surface. For this purpose, we have developed a laser scanning device which uses time-multiplexed structured light to triangulate surface points. This paper shows that time-multiplexed structured light can be used to generate highly accurate reconstructions of surfaces (RMS error 0.17 mm, Kinect: RMS error 0.89 mm). Moreover, we used our laser scanner to track a rigid object in order to determine how this process is influenced by the remaining triangulation errors. It turned out that our scanning device can be used for high-accuracy tracking of objects (RMS errors of 0.33 mm and 0.12 degrees).</p>
      </abstract>
      <kwd-group>
        <kwd>optical head tracking</kwd>
        <kwd>time-multiplexed structured light</kwd>
        <kwd>triangulation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The laser scanning device generates a red laser point which is redirected by two movable mirrors to create specific laser point patterns on a target object. An IDS UI-3240CP-NIR-GL camera captures the laser points pattern by pattern. The laser scanning device was configured to project a grid of 56 x 72 equidistant laser points at a frame rate of 10 Hz. Template matching based on normalized cross-correlation is used for highly accurate detection of the centres of the imaged laser points. In order to reconstruct the surface of a target object, every laser ray was calibrated with respect to the camera. For this purpose, the laser grid was projected onto a planar calibration body which was placed at different distances from the laser scanning device. By means of the calibration body, the homography matrix [5] was calculated to map all imaged laser points to spatial points relative to the calibration body. This was done for every projected grid, and afterwards all spatial points were used to fit a 3D line for every laser ray. The final calibration step consisted of defining the pose of every laser ray with respect to the camera.</p>
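<p>The line-fitting step of the per-ray calibration described above can be sketched as follows. This is a minimal illustration, not the scanner's actual implementation; the function name and the SVD-based least-squares fit are our own assumptions.</p>

```python
import numpy as np

def fit_laser_ray(points):
    """Fit a 3D line to the spatial points collected for one laser ray.

    points: (n, 3) array of plane-mapped spatial points for this ray,
    one per calibration-body position. Returns (origin, direction),
    i.e. a point on the line and a unit direction vector.
    """
    points = np.asarray(points, dtype=float)
    origin = points.mean(axis=0)          # the centroid lies on the best-fit line
    # The dominant right-singular vector of the centred points is the
    # least-squares direction of the line.
    _, _, vt = np.linalg.svd(points - origin)
    direction = vt[0] / np.linalg.norm(vt[0])
    return origin, direction
```

Collecting one such (origin, direction) pair per laser ray, expressed in camera coordinates, yields the pose of every ray with respect to the camera.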
      <p>The reconstruction of the surface of a target object requires knowledge of the mapping between the laser points in an image and the matching calibrated laser rays. This is known as the correspondence problem. Owing to the limited field of view of the camera and to shadowing, it cannot be guaranteed that all laser points of the projected pattern are captured by the camera. Hence, the correspondence problem cannot be solved directly in the image. For this reason, time-multiplexed structured light [6] has been utilised in this work. The basic idea of this method is that within n frames, every laser ray projects a unique code for its identification. In this work, an extended version of the binary code is used for this task. Since the target object will move during head tracking, a high number of consecutive zeros has to be avoided in the code sequences to ensure reliable tracking of the laser points across an image sequence. For this reason, the binary code was extended by so-called full frames. When the camera captures a full frame, it is ensured that all laser points of the pattern are projected. As shown in Fig. 1, the process starts with a full frame to obtain the initial locations of the laser points that are visible in the image. Subsequently, every image sequence is arranged as subsequences. Every subsequence has a configurable length and always ends with a full frame. In this context, a length of two images means that a subsequence consists of one code frame followed by a full frame. Furthermore, the code frames start with the least significant bit and end with the most significant bit. After the first n subsequences, all tracked laser points have been identified, and consequently a reconstruction can be carried out for the final full frame. Since the identified points are tracked in the following images, a reconstruction can also be carried out for every subsequent full frame. Moreover, points that were not visible before are considered by the identification process as soon as they become visible at the beginning of a sequence.</p>
      <p>The reconstruction of the surface of a target object is carried out by triangulation of spatial laser points with respect to the camera. The method used for the triangulation of a spatial laser point is referred to as the Linear-Eigen method and depends on the respective laser point in the image and the corresponding calibrated laser ray. For triangulation in general, incorrectly identified laser points in the image lead to outlying spatial points. To avoid this, the identification of a laser point in the image is verified by means of the corresponding calibrated laser ray. To this end, the calibrated laser ray is projected onto the image plane; the resulting line in the image is called the epipolar line. Afterwards, the identification of the laser point is verified by calculating the perpendicular distance of the point to the epipolar line. The point is considered to be identified correctly if this distance does not exceed a threshold of 1 pixel.</p>
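<p>The interleaving of code frames and full frames described above can be sketched as follows, assuming a plain binary code with a subsequence length of two (one code frame followed by one full frame, least significant bit first). Identifiers and data layout are illustrative assumptions, not the actual firmware of the scanner.</p>

```python
def frame_schedule(n_bits, codes):
    """Build the projection schedule: an initial full frame, then one
    subsequence (code frame + full frame) per code bit, LSB first.

    codes: dict mapping ray id -> integer code.
    Each frame is represented as the set of ray ids switched on.
    """
    all_rays = set(codes)
    frames = [all_rays]                       # initial full frame
    for bit in range(n_bits):                 # least significant bit first
        code_frame = {r for r, c in codes.items() if (c >> bit) & 1}
        frames.append(code_frame)             # code frame of this subsequence
        frames.append(all_rays)               # full frame closing the subsequence
    return frames

def decode(observed_bits):
    """Recover a ray's code from the on/off observations of one tracked
    image point across the n code frames (LSB first)."""
    return sum(b << i for i, b in enumerate(observed_bits))
```

Because every second frame is a full frame, a tracked image point is never dark for two consecutive frames, which is what keeps the point trackable while the code is being transmitted.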
      <p>To analyse the triangulation accuracy of the developed laser scanning device, two experiments were carried out. In the first experiment, a planar surface was reconstructed to calculate the deviation of the resulting points with respect to the plane. For this purpose, the resulting point cloud was processed with Principal Component Analysis (PCA) to calculate its three principal axes. Subsequently, the point cloud was transformed using the rotation matrix given by the three computed principal axes. Afterwards, the mean-free equivalent of the transformed points was calculated. Consequently, the sought deviation of the triangulated points was given by the point data along the axis of least variance. Since Microsoft's Kinect represents an alternative for fast surface reconstruction (reconstruction rate of 30 Hz), we also compare our results to the triangulation accuracy of the Kinect device.</p>
      <p>In the second experiment, the overall accuracy of the triangulation was determined by considering the deviations along all axes. For this purpose, a plastic triangulation phantom was designed (see Fig. 2). Subsequently, a ground truth for the reconstruction of the phantom was created. This ground truth was given by a CT-based point cloud of the phantom's surface (Siemens SOMATOM® Definition AS+, voxel size 0.359 x 0.359 x 0.6 mm³). Afterwards, the triangulation phantom was reconstructed using our laser scanning device. In order to calculate the overall accuracy of the triangulation, the resulting point cloud was registered to the ground truth using the Iterative Closest Point (ICP) algorithm [7]. Here, the point-to-plane distance metric was utilised, since different reconstructions of the same object contain only few or no corresponding points. After registration, the overall triangulation accuracy was given by the remaining distances between points and planes. Again, our results were compared to the overall triangulation accuracy of the Kinect device.</p>
      <p>To determine the influence of the remaining triangulation errors on tracking, we used our laser scanner to track a rigid object (here given by the triangulation phantom). For this measurement, the scanning device was mounted to the end-effector of an Adept Viper s850 robot. This approach allows the calculation of a ground truth for the tracking. The tracking itself is realised by using the ICP algorithm to register a 3D reference point cloud to a second 3D point cloud. Here, the point-to-plane distance metric was used again. The result of the registration is given by a rotation matrix R and a translation vector t, which define the estimated pose of the object with respect to the camera. After the acquisition of the first reconstruction (the reference point cloud) of the surface of the object, consecutive translational displacements of 0.2 mm were applied to the robot end-effector. The space containing all applied end-effector displacements is described by a sphere with radius r = 6 mm, whose centre is defined by the initial pose of the end-effector. After the completion of a displacement, the laser point pattern was projected onto the surface of the object and the camera captured a new image. Since the imaged laser points were already identified after the acquisition of the reference point cloud, a new reconstruction of the surface could be calculated for every new full frame. Concerning the described method for time-multiplexed structured light, a subsequence length of two images was utilised. For tracking, the reference point cloud was registered to all new reconstructions. Finally, the ground truth of the tracking was given by the respective translational end-effector displacements. Based on this procedure, the tracking accuracy was computed for a set of 200 tracking results.</p>
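<p>The PCA-based evaluation of the planar reconstruction can be sketched as follows. This is an illustrative reimplementation under the assumption that the out-of-plane deviation is read off along the principal axis of least variance; it is not the authors' exact code.</p>

```python
import numpy as np

def plane_deviation_rms(points):
    """RMS deviation of a point cloud from its best-fit plane.

    points: (n, 3) array. The cloud is made mean-free and expressed in
    its principal axes; for a roughly planar scan the residuals along
    the axis of least variance are the out-of-plane deviations.
    """
    points = np.asarray(points, dtype=float)
    centred = points - points.mean(axis=0)   # mean-free point cloud
    _, _, vt = np.linalg.svd(centred)        # rows of vt: principal axes,
                                             # ordered by decreasing variance
    rotated = centred @ vt.T                 # express points in those axes
    residuals = rotated[:, 2]                # axis of least variance
    return float(np.sqrt(np.mean(residuals ** 2)))
```

For an ideal planar cloud the returned RMS is zero; applied to a real scan of a plane it yields the per-point triangulation error of the first experiment.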
    </sec>
    <sec id="sec-2">
      <title>Results</title>
      <p>The results in Tab. 1 and Tab. 2 show that our laser scanning device clearly outperforms the Kinect device as far as triangulation accuracy is concerned. One major reason for the high triangulation accuracy is the use of time-multiplexed structured light. This method ensures that the number of incorrectly identified laser points in an image is very low. Furthermore, identified points are verified by means of their corresponding epipolar lines. The results in Tab. 3 show that the remaining triangulation errors cause only small inaccuracies in the tracking of a rigid object. Hence, the reconstructed point clouds can be used for highly accurate tracking of objects.</p>
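<p>The epipolar verification credited above for the low outlier rate can be sketched as a simple point-to-line distance test in the image plane. The function names and the two-point representation of the epipolar line are illustrative assumptions.</p>

```python
import numpy as np

def epipolar_distance(p, a, b):
    """Perpendicular distance (in pixels) of image point p to the
    epipolar line through a and b, the projections of two points of the
    calibrated laser ray into the image plane (all 2D)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    d = b - a
    # The 2D cross product |d x (p - a)| is twice the triangle area;
    # dividing by the base length |d| gives the height, i.e. the distance.
    return abs(d[0] * (p[1] - a[1]) - d[1] * (p[0] - a[0])) / np.linalg.norm(d)

def is_correctly_identified(p, a, b, threshold=1.0):
    """Accept the identification if the point lies within `threshold`
    pixels of its epipolar line (1 pixel in the paper)."""
    return epipolar_distance(p, a, b) <= threshold
```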
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>Currently, we are developing a new optical head tracking system which utilises infrared laser light to measure features of the soft tissue on the patient's head. These features are intended to offer highly accurate registration with respect to the rigid skull structure by means of compensating for the soft tissue. In this context, the system also has to be capable of quickly generating accurate reconstructions of the skin surface. For this purpose, we developed a laser scanning device which uses time-multiplexed structured light to triangulate surface points. This paper shows that time-multiplexed structured light can be used to generate highly accurate reconstructions of surfaces. Since Microsoft's Kinect represents an alternative for fast surface reconstruction, we also compared our results to the triangulation accuracy of the Kinect device. The results show that our laser scanning device outperforms the Kinect device by a factor of five. To determine the influence of the remaining triangulation errors on tracking, we used our laser scanner to track a rigid object. It turned out that the remaining triangulation errors cause only small tracking inaccuracies. In future work, the presented tracking will be improved by using a stochastic filter which offers better compensation of noise over time.</p>
      <p>This work was supported by Varian Medical Systems Inc. (Palo Alto, CA, USA). Furthermore, this work was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/1].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation><string-name><surname>Fu</surname>, <given-names>D.</given-names></string-name>; <string-name><surname>Kuduvalli</surname>, <given-names>G.</given-names></string-name>; <string-name><surname>Mitrovic</surname>, <given-names>V.</given-names></string-name>; <string-name><surname>Main</surname>, <given-names>W.</given-names></string-name>; <string-name><surname>Thomson</surname>, <given-names>L.</given-names></string-name>: <article-title>Automated skull tracking for the CyberKnife image-guided radiosurgery system</article-title>. <source>Proc. SPIE 5744, Medical Imaging 2005: Visualization, Image-Guided Procedures, and Display</source>, <year>April 2005</year></mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation><string-name><surname>Kim</surname>, <given-names>G.Y.</given-names></string-name>; <string-name><surname>Pawlicki</surname>, <given-names>T.</given-names></string-name>; <string-name><surname>Le</surname>, <given-names>Q.T.</given-names></string-name>; <string-name><surname>Luxton</surname>, <given-names>G.</given-names></string-name>: <article-title>Linac-based on-board imaging feasibility and the dosimetric consequences of head roll in head-and-neck IMRT plans</article-title>. <source>Medical Dosimetry</source>, Volume <volume>33</volume>, Issue <issue>1</issue>, <year>Spring 2008</year>, pages <fpage>93</fpage>-<lpage>99</lpage></mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation><string-name><surname>Ernst</surname>, <given-names>F.</given-names></string-name>; <string-name><surname>Bruder</surname>, <given-names>R.</given-names></string-name>; <string-name><surname>Wissel</surname>, <given-names>T.</given-names></string-name>; <string-name><surname>Stüber</surname>, <given-names>P.</given-names></string-name>; <string-name><surname>Wagner</surname>, <given-names>B.</given-names></string-name>; <string-name><surname>Schweikard</surname>, <given-names>A.</given-names></string-name>: <article-title>Real time contact-free and non-invasive tracking of the human skull - first light and initial validation</article-title>. <source>Proceedings of SPIE Optical Engineering + Applications, SPIE Optics + Photonics</source>, SPIE, <year>August 2013</year></mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation><string-name><surname>Wagner</surname>, <given-names>B.</given-names></string-name>; <string-name><surname>Stüber</surname>, <given-names>P.</given-names></string-name>; <string-name><surname>Wissel</surname>, <given-names>T.</given-names></string-name>; <string-name><surname>Bruder</surname>, <given-names>R.</given-names></string-name>; <string-name><surname>Schweikard</surname>, <given-names>A.</given-names></string-name>; <string-name><surname>Ernst</surname>, <given-names>F.</given-names></string-name>: <article-title>Time-multiplexed structured light for head tracking</article-title>. <source>44. Jahrestagung der Deutschen Gesellschaft für Medizinische Physik</source>, DGMP, <year>September 2013</year></mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation><string-name><surname>Hartley</surname>, <given-names>R.</given-names></string-name>; <string-name><surname>Zisserman</surname>, <given-names>A.</given-names></string-name>: <source>Multiple View Geometry in Computer Vision</source>. 2nd edition. Cambridge University Press, <year>2003</year>. ISBN 0-521-54051-8</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation><string-name><surname>Salvi</surname>, <given-names>J.</given-names></string-name>; <string-name><surname>Fernandez</surname>, <given-names>S.</given-names></string-name>; <string-name><surname>Pribanic</surname>, <given-names>T.</given-names></string-name>; <string-name><surname>Llado</surname>, <given-names>X.</given-names></string-name>: <article-title>A state of the art in structured light patterns for surface profilometry</article-title>. <source>Pattern Recognition</source>, Volume <volume>43</volume>, Issue <issue>8</issue>, <year>August 2010</year>, pages <fpage>2666</fpage>-<lpage>2680</lpage></mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation><string-name><surname>Besl</surname>, <given-names>P.J.</given-names></string-name>; <string-name><surname>McKay</surname>, <given-names>H.D.</given-names></string-name>: <article-title>A method for registration of 3-D shapes</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, Volume <volume>14</volume>, Issue <issue>2</issue>, <year>1992</year>, pages <fpage>239</fpage>-<lpage>256</lpage></mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>