<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Estimation of error sources for optical head tracking in cranial radiation therapy</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>P. Grüning</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P. Stüber</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Richter</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O. Blanck</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>R. Bruder</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Schweikard</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department for Radiation Oncology, University Hospital of Lübeck</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Graduate School for Computing in Life Science, University of Lübeck</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Robotics and Cognitive Systems, University of Lübeck</institution>
          ,
          <addr-line>Lübeck</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2012</year>
      </pub-date>
      <fpage>219</fpage>
      <lpage>222</lpage>
      <abstract>
<p>There is a growing demand for high-accuracy, frameless solutions in cranial radiation therapy. Among the different approaches to intra-fractional head tracking, such as X-ray or MV imaging or cone-beam CT, optical head tracking in particular promises high spatial and temporal resolution with minimal system latency and no additional dose exposure. It may therefore be ideal for motion-compensated or high-accuracy cranial radiation therapy. Nevertheless, up to now optical systems lack accuracy and are therefore only found in prototypes or test setups. Using a consumer-grade optical rangefinder, we built a test setup to systematically quantify critical error sources for tracking systems based on triangulation. Subsequently, we present and discuss potential solutions to minimize the error.</p>
      </abstract>
      <kwd-group>
        <kwd>Head tracking</kwd>
        <kwd>cranial radiation therapy</kwd>
        <kwd>image-guidance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Purpose</title>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>The basic setup for this measurement was a human head phantom made of Styrofoam, which was mounted to an
industrial robot (Adept Viper s850, Adept Technology, Inc., Livermore, CA, USA), serving as a ground truth.
The robot moved to over 1500 positions with known locations. At each position, data was acquired and the
distances of the movements were later compared to the ground truth. In this way, we were able to systematically evaluate the tracking capabilities of the Microsoft Kinect. Furthermore, the amount and location of the noise were calculated. However, in contrast to off-the-shelf tracking systems (e.g. the Polaris system), the PrimeSense sensor does not provide the spatial position and orientation of a tracked marker. Instead, it delivers a 3D point cloud
representing the surface of the scanned object. Generally, a point cloud P can be described as a set of points p ∈ P, where each point carries a set of features f(p) = {f₁, f₂, …, fₙ}. In our case, those features are the 3D position with respect to the origin of the 3D camera's head and the RGB-coded color. Further data processing must therefore be applied to obtain the head pose over time. For this direct head tracking, a template is matched
onto the captured data. The Microsoft Kinect data is range-filtered to the area in which the object is presumed to lie, discarding unnecessary points to decrease the computation time. The distance between the template and the acquired data
can be quite large. Therefore, surface normals are estimated for both point clouds, approximating a plane using
points in a predefined radius for each normal. With this information, the template is roughly matched upon the
sensor data. Subsequently, an iterative closest point (ICP) algorithm is employed to precisely match both data sets. The ICP algorithm iteratively computes a rigid transformation to fit one set of points to another with a
given accuracy or maximum number of iterations. The template is now successively transformed and a rigid
transformation matrix is obtained for every scan. Based on the current position of the template and the estimated
transformation, assertions can be made about the head’s pose.</p>
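<p>The normal-based prealignment is specific to our setup, but the ICP refinement described above can be sketched in a few lines of Python. The following is a minimal point-to-point ICP with a brute-force nearest-neighbour search and a Kabsch-based transform estimate; these choices, and all function names, are illustrative and not the implementation used in our system.</p>
<preformat>
```python
import numpy as np

def best_rigid_transform(src, dst):
    """Kabsch algorithm: least-squares rotation R and translation t
    mapping src onto dst (both are N x 3 arrays of paired points)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # correct an improper rotation (reflection) if the SVD produces one
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, c_dst - R @ c_src

def icp(template, scan, iters=30):
    """Iteratively match the template point cloud onto the scan;
    returns the accumulated rigid transformation (R, t)."""
    src = template.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    for _ in range(iters):
        # nearest neighbour in the scan for every template point
        d2 = ((src[:, None, :] - scan[None, :, :]) ** 2).sum(axis=2)
        nn = scan[d2.argmin(axis=1)]
        R, t = best_rigid_transform(src, nn)
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```
</preformat>
<p>In a synthetic test, applying this sketch to a copy of a point cloud shifted by a small translation recovers that translation, with the rotation estimate remaining near the identity.</p>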
<p>The depth sensor consists of an infrared laser projector (λ = 830 nm, P &lt; 60 mW) emitting a known speckle pattern. An infrared sensor (IR sensor) collects the light of the projected pattern reflected by the scene. The resolution is 640×480 pixels with a field of view of 57° horizontally and 43° vertically, providing 11 bit of depth information at 30 frames per second. In addition, an on-board processor (PrimeSense PS1080-A2 chip) computes the distances in the captured image.</p>
<p>The processor’s memory holds a reference image Iref, capturing a known, usually planar, object at a certain distance Z0. The distance estimation for a new image is done by comparing parts of it to Iref and searching for the best match. Deviations in the scaling of the speckle pattern indicate a spatial change in the z-direction. Dark areas, i.e. pixels which fall below a certain threshold, are regarded as shadow areas and are of no interest for further processing. Considering triangulation, a change δZ in the z-direction produces a proportional speckle shift δx in the x-direction, described by δx = (S / Z0) · δZ, where S is the distance from the projector to the sensor.</p>
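<p>As a numerical illustration of the triangulation relation δx = (S / Z0) · δZ, consider the following short computation; the values chosen for S and Z0 below are assumptions for illustration only, not the Kinect's actual calibration constants.</p>
<preformat>
```python
# Object-space speckle shift from triangulation: delta_x = (S / Z0) * delta_z
S = 75.0        # assumed projector-sensor baseline in mm (illustrative value)
Z0 = 790.0      # assumed reference distance in mm (distance of Iref)
delta_z = 10.0  # change of the surface along the z-direction in mm

delta_x = (S / Z0) * delta_z  # proportional shift of the speckle pattern
print(round(delta_x, 3))      # prints 0.949 (shift in mm)
```
</preformat>
<p>A surface displacement of 10 mm thus maps to a speckle shift of roughly 1 mm at the reference plane, which illustrates why the baseline and the working distance directly bound the achievable depth resolution.</p>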
<p>As we were able to obtain a 3D point cloud of any scanned object, we used our robot-based setup to evaluate the sensor. To systematically analyze the accuracy and performance of the PrimeSense sensor system, we mounted a human head phantom to the robot's end effector. We positioned the PrimeSense sensor opposite the robot, facing the mounted head phantom. Subsequently, we moved the effector with the attached head phantom to a set of roughly 1500 well-known poses within a 3D grid spaced at 20 mm. At each pose, we tracked the head
phantom with the sensor. To evaluate the stability of the measurements, we recorded 8 point clouds at each
position.</p>
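<p>The measurement grid itself is simple to reproduce. The sketch below generates a 20 mm grid of target positions; the 12 × 12 × 11 extent is an assumption chosen only so that the number of positions is close to the roughly 1500 poses of our protocol, not the exact workspace of the robot.</p>
<preformat>
```python
import itertools

# 20 mm grid of target positions (coordinates in mm); extents are assumed
xs = [20 * i for i in range(12)]
ys = [20 * i for i in range(12)]
zs = [20 * i for i in range(11)]
poses = list(itertools.product(xs, ys, zs))
print(len(poses))  # prints 1584 target positions
```
</preformat>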
<p>Further, those point clouds were transformed into 2D range images, containing each point's z-location as the pixel value at a resolution of 2 mm per pixel. This format allowed the calculation of mean and standard deviation images. The latter give information about the distribution of noise at each position. In a next step, the accuracy of the movement detection was evaluated. The main problem was to reduce the set of points of each data set to a single point representing the head's position precisely. To minimize the error of the evaluation itself, this single point should correspond to every measured point cloud in the same manner. Two possible solutions were used:
1. For every mean image, a center spot was calculated by computing the mean vector of the points that could be found in all of the 8 images of the position. In several locations, especially on the sides of the head, whole parts of the point cloud can appear and disappear from picture to picture. A sudden gain of points on a particular side of the image can dramatically change its center. Leaving out those areas provided a consistent center spot for every position.
2. Since the measurement focused on translations and the head itself did not rotate while varying its position, we could assume that the phantom's nose tip was always the area nearest to the camera. Therefore, the 20 nearest points of each range image were selected and outliers among them were removed. The median pixel represented the nose tip for a single range image, and over the 8 scans of each position a grand mean was calculated.</p>
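<p>The conversion of a point cloud into a 2 mm range image and the mean/SD statistics over the 8 scans of a position can be sketched as follows. The image size and the last-point-wins rasterization are simplifying assumptions of this sketch, not the processing used in the study.</p>
<preformat>
```python
import numpy as np

def to_range_image(points, pixel_mm=2.0, shape=(128, 128)):
    """Project a point cloud (N x 3 array in mm) to a 2D range image
    whose pixel value is the point's z-coordinate; 2 mm per pixel."""
    img = np.full(shape, np.nan)
    cols = np.clip((points[:, 0] / pixel_mm).astype(int), 0, shape[1] - 1)
    rows = np.clip((points[:, 1] / pixel_mm).astype(int), 0, shape[0] - 1)
    img[rows, cols] = points[:, 2]  # last point per pixel wins in this sketch
    return img

def mean_and_sd(images):
    """Mean and standard-deviation images over the scans of one position,
    keeping only pixels that are present (finite) in every single image,
    mirroring the exclusion of appearing/disappearing points."""
    stack = np.stack(images)
    valid = np.isfinite(stack).all(axis=0)
    mean = np.where(valid, stack.mean(axis=0), np.nan)
    sd = np.where(valid, stack.std(axis=0), np.nan)
    return mean, sd
```
</preformat>
<p>Pixels missing from any of the scans stay NaN in both output images, so they are automatically excluded from the center-spot computation.</p>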
<p>After estimating those particular points, the movement from one point to another was calculated using the Euclidean norm, and the difference from the ground truth was determined.</p>
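<p>The per-movement error measure is then a simple comparison of Euclidean distances. The positions below are hypothetical values for illustration, not measured data.</p>
<preformat>
```python
import numpy as np

# Illustrative check of one movement between two grid poses (values assumed)
p_prev = np.array([100.0, 40.0, 780.0])  # estimated head point at pose k-1 (mm)
p_curr = np.array([120.0, 40.0, 780.0])  # estimated head point at pose k (mm)

detected = np.linalg.norm(p_curr - p_prev)  # Euclidean movement length
error = abs(detected - 20.0)                # ground-truth grid step is 20 mm
print(detected, error)  # prints 20.0 0.0
```
</preformat>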
<p>Further, an ICP evaluation was done. For every captured point cloud, a transformation from its preceding position was calculated. With the estimated rotation angles and the length of the translation, it was possible to examine whether the movement was recognized correctly.</p>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
<p>The estimated standard deviation (SD) images showed an accumulation of relatively large error pixels in areas of a high pixel-value gradient (Fig. 1). Those regions were, for example, the side of the nose as well as the corners of the face. Pixels which could not be consistently found in all 8 images were disregarded, because the presence and absence of certain pixels are hard to quantify. However, those pixels are another error source, and all object edges show this problem. From certain perspectives, whole areas, i.e. body parts, can alternately appear and disappear. The noise in planar areas was scattered evenly with an average of less than 0.4 mm. Regarding the spatial distribution of a position's mean SD, a small increase to 0.2 mm could be detected in areas farther than 790 mm away from the camera. On average, the SD is 0.4 mm (Fig. 2).</p>
<p>The Microsoft Kinect camera was able to detect movements with an accuracy of 1.7 mm ± 2.7 mm, using the center point of the mean image (Fig. 3). The detection of the tip of the nose yielded an accuracy of 1.7 mm ± 3.6 mm. The spatial distribution showed no noticeable areas. In addition, the estimated points representing the tip of the nose were compared to each other for every position. The average maximum error was 1 mm ± 1.6 mm. Due to noise, the highest estimated value was 650 mm. The median error was 0.4 mm. For the above-mentioned measurements, the mean percentage of outliers was 5.24% ± 1.6%. Those, on average, 80 outliers were mostly caused by noise speckles that did not belong to the object but were not removed adequately by the implemented filter methods. Outliers located nearer to the camera than the object drastically changed the outcome of the computations.</p>
<p>With an average error of 7.5 mm ± 10 mm, the ICP algorithm could not keep up with the preceding results. It is very likely that a 20 mm step is too large for a sufficient matching and that the algorithm is misled by local minima. The Euclidean norm of the three averaged rotation angles was 0.015°. This means the method succeeded in identifying that the movement was a pure translation. In the 3D error plot (Fig. 5), the large errors were equally distributed and showed no particular accumulation.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
<p>In general, a motion tracking system based on triangulation, like the Microsoft Kinect, might be applied for head tracking during radiation therapy. However, the presented setup is not optimal. The sensor used is only a standard consumer electronics device, so there is considerable potential for an accuracy increase, for example by using specialized equipment.
</p>
<p>Nevertheless, our measurements give an insight into the capabilities of all range finders based on the triangulation principle. We found a set of common problems: First, no information is gained from shadowed areas, leaving the user with a constrained data field. Second, high depth gradients produce high noise, including an inconsistent number of points. Third, noise speckles occur which do not belong to the object. On the positive side, the camera showed no accumulation of errors in a certain location and no spatial distortion, but it is essential to consider the camera's base noise. It was not possible to achieve an accuracy better than 1.7 mm, which clearly does not fulfill the localization requirements for radiation therapy. As for the software, the ICP algorithm could not compete with our other methods, as it is too susceptible to noise and too unstable for a step of 20 mm. Still, the ICP remains an essential part of the data processing, since it can track the head with full six degrees of freedom. To reduce the ICP error, smart templates are needed which, on the one hand, avoid noisy areas to increase the robustness of the calculation but, on the other hand, contain distinguishable landmarks. Moreover, data filtering and preprocessing should be used, for example by calculating averaged data. Further investigations in both hard- and software are needed to overcome the identified problems.</p>
<p>In summary, we presented a first setup that allows for a systematic analysis of error sources of tracking devices based on structured light. Even though current systems do not reach the desired accuracy, our proposed approach might become applicable with the next generation of depth sensors.</p>
    </sec>
  </body>
  <back>
  </back>
</article>