<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Instrument segmentation in hybrid 3-D endoscopy using multi-sensor super-resolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S. Haase</string-name>
          <email>sven.haase@fau.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Köhler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>T. Kilgus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Maier-Hein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>J. Hornegger</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>H. Feußner</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Div. Medical and Biological Informatics Junior Group: Computer-assisted Interventions, German Cancer Research Center (DKFZ) Heidelberg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Erlangen Graduate School in Advanced Optical Technologies (SAOT)</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pattern Recognition Lab, Dept. of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Research Group Minimally-invasive interdisciplinary therapeutical intervention, Klinikum rechts der Isar of the Technical University Munich</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>194</fpage>
      <lpage>197</lpage>
      <abstract>
        <p>In hybrid 3-D endoscopy, photometric information is augmented by range data for guidance in minimally invasive procedures. In this paper, we propose a method for instrument segmentation exploiting sensor data fusion between range data and complementary photometric information. For improved robustness and to overcome the limited spatial resolution of range sensors, we make use of multi-sensor super-resolution to obtain high-quality range images. The data of both modalities are then segmented separately using thresholding techniques, and the results are consolidated into a common segmentation mask. Our approach was evaluated on real image data acquired from a liver phantom, with manually labeled ground truth. Compared to purely color driven segmentation, we improved the F-score from 0.61 to 0.73.</p>
      </abstract>
      <kwd-group>
        <kwd>Time-of-Flight</kwd>
        <kwd>3-D Endoscopy</kwd>
        <kwd>Super-Resolution</kwd>
        <kwd>Segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>3-D endoscopy has gained considerable attention as it enables new applications in minimally invasive surgery [1]. Besides
structured light [2] and stereo vision [3], Time-of-Flight (ToF) technology has recently been integrated into a first hybrid 3-D
endoscope prototype. In contrast to stereo vision, ToF is independent of texture information and delivers range images at a
constant resolution of 64×48 px. As the ToF sensor is integrated into a conventional endoscope, we additionally acquire
high-resolution color images of 640×480 px through a common optical system using a beam splitter. Both range and
complementary color information can be used to develop robust algorithms for image-guided surgery. Haase et al. [4]
proposed a tool localization framework that exploits range and color information for increased robustness. Nevertheless, as
ToF technology exhibits a low signal-to-noise ratio, preprocessing is a required first step, and various preprocessing
techniques for ToF range images have been proposed recently [5, 6]. For instrument segmentation, approaches based on
geometric information [7] or color information [8] have been investigated. The segmentation result can then be used for
further applications, e.g. the avoidance of risk situations as proposed in [9]. In contrast to purely 2-D driven approaches,
we are able to incorporate 3-D surface data as well as 2-D photometric data to improve robustness. Our preliminary
framework describes a first approach towards complete instrument segmentation on 3-D surface information using a
ToF/RGB endoscope.</p>
      <p>We propose a multi-sensor instrument segmentation framework that uses super-resolution to denoise ToF data and
increase its spatial resolution [6]. Our framework exploits data fusion between range and color images [10]. After
upsampling the ToF data, segmentation is performed on both modalities and the results are consolidated into a common
segmentation mask, as sketched below.</p>
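      <p>To make the pipeline concrete, a minimal sketch on synthetic data is given below. It is an illustration rather than the authors' implementation: bicubic upsampling stands in for the multi-sensor super-resolution of Sec. 2.1, and the threshold values tau_z, tau_s and tau_v are hypothetical placeholders.</p>
      <preformat><![CDATA[
# Illustrative pipeline sketch (Python); synthetic inputs scaled to [0, 1].
import numpy as np
from scipy import ndimage

def segment_instruments(range_lr, hsv_hr, tau_z=0.5, tau_s=0.3, tau_v=0.6):
    # Stage 1: upsample the low-resolution range image to the working
    # resolution (placeholder for the MAP super-resolution of Sec. 2.1).
    zoom = (hsv_hr.shape[0] / range_lr.shape[0],
            hsv_hr.shape[1] / range_lr.shape[1])
    range_hr = ndimage.zoom(range_lr, zoom, order=3)
    # Stage 2: threshold each modality separately (Sec. 2.2).
    mask_range = range_hr < tau_z                  # instruments lie closer
    mask_color = (hsv_hr[..., 1] < tau_s) & (hsv_hr[..., 2] > tau_v)
    # Stage 3: consolidate both binary masks by multiplication.
    return mask_range & mask_color

# Toy usage: 48x64 range image, 480x640 HSV color image.
mask = segment_instruments(np.random.rand(48, 64), np.random.rand(480, 640, 3))
]]></preformat>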
    </sec>
    <sec id="sec-2">
      <title>2.1 Super-Resolution for Range Image Preprocessing</title>
      <p>We cope with the low signal-to-noise ratio of the ToF images by applying super-resolution as described in [6]. The
super-resolution approach is subdivided into motion estimation, range correction and numerical optimization. Multi-frame
super-resolution employs subpixel displacements between consecutive frames as a cue to obtain a super-resolved image
from multiple low-resolution frames. These displacements are induced by navigating the endoscope. Our objective
function to obtain a maximum a-posteriori (MAP) estimate for the high-resolution image x is:</p>
      <disp-formula>
        <tex-math><![CDATA[\hat{x} = \operatorname*{arg\,min}_{x} \sum_{k=1}^{K} \left\lVert y^{(k)} - W^{(k)} x \right\rVert_2^2 + \lambda \sum_{i=1}^{N} \phi\!\left( (H x)_i \right)]]></tex-math>
      </disp-formula>
      <p>The first sum denotes the data term and the second sum is a regularizer based on a pseudo-Huber loss function φ of a
high-pass filtered version H x of the image x. The weight λ controls the regularizer, K denotes the number of low-resolution
input frames and N denotes the number of pixels in the super-resolved output image. The data term describes the
distance between the kth low-resolution input frame y<sup>(k)</sup> and a mathematical model of our image acquisition.
The system matrix W<sup>(k)</sup> incorporates the blur induced by the point spread function, the downsampling and the
displacement field relating the kth frame to the high-resolution image x. As the low signal-to-noise ratio of ToF data limits
the accuracy of displacement field estimation, we exploit data fusion to estimate a high-quality displacement field in the
color domain using optical flow [11] and transfer it to the range domain. As we acquire images from different angles and
distances, the range data has to be corrected so that all low-resolution range images lie in a common reference plane; the
correction model is given in [6]. For more details on the multi-sensor super-resolution see Köhler et al. [6].</p>
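      <p>For illustration, the following sketch evaluates this objective for a toy 1-D signal. It is a simplified reading of the formula, assuming the system matrices W<sup>(k)</sup> are given explicitly; in the actual method they are composed of PSF blur, downsampling and the displacement fields estimated via optical flow, and a first-order finite difference stands in for the high-pass filter H here.</p>
      <preformat><![CDATA[
# Sketch of the MAP super-resolution objective (Python/NumPy), toy 1-D case.
import numpy as np

def pseudo_huber(t, delta=1e-2):
    # Smooth, robust approximation of |t|.
    return delta**2 * (np.sqrt(1.0 + (t / delta)**2) - 1.0)

def objective(x, frames, systems, lam=0.1):
    # Data term: squared distance between each low-resolution frame y_k
    # and the acquisition model W_k x.
    data = sum(np.sum((y - W @ x)**2) for y, W in zip(frames, systems))
    # Regularizer: pseudo-Huber loss of a high-pass filtered x; here a
    # first-order finite difference serves as the high-pass filter.
    return data + lam * np.sum(pseudo_huber(np.diff(x)))

# Toy usage: K = 3 frames, N = 8 high-resolution samples, 2x downsampling.
rng = np.random.default_rng(0)
x_true = rng.random(8)
systems = [rng.random((4, 8)) for _ in range(3)]
frames = [W @ x_true for W in systems]
print(objective(x_true, frames, systems))  # data term vanishes at x_true
]]></preformat>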
    </sec>
    <sec id="sec-3">
      <title>2.2 Multi-Sensor Segmentation</title>
      <p>Based on the output of the preprocessing, we apply instrument segmentation to the data of both modalities. We distinguish
between instruments and background by different thresholding techniques [8]. Our segmentation exploits the fact that
instruments are usually closer to the sensor and usually grayish. Due to the data fusion in our hybrid 3-D endoscope, we
can not only exploit the range data but also incorporate the color information into the segmentation process, similar to [9].
A range value z(u) at pixel u is considered an instrument pixel if z(u) &lt; τ<sub>z</sub>. In the color domain we exploit the
value and the saturation channel of the HSV color space to segment the instrument. Here, a pixel u is considered an
instrument pixel if S(u) &lt; τ<sub>S</sub> and V(u) &gt; τ<sub>V</sub>, where S(u) and V(u) denote the saturation channel
and the value channel of the color image, respectively. Both binary results are then consolidated into a common
segmentation mask by multiplication. To remove outliers caused by noisy data, we apply morphological operators to close
small holes and remove separated areas with fewer than 1000 instrument pixels as false instrument candidates, as
sketched below.</p>
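      <p>A minimal sketch of this consolidation and outlier-removal step, assuming the two binary masks have already been computed, is given below; the 5×5 structuring element is an assumption, while the 1000-pixel minimum size is taken from the text.</p>
      <preformat><![CDATA[
# Consolidation and morphological cleanup (Python/SciPy).
import numpy as np
from scipy import ndimage

def consolidate(mask_range, mask_color, min_size=1000):
    mask = mask_range & mask_color                 # multiply binary masks
    # Close small holes inside the instrument regions.
    mask = ndimage.binary_closing(mask, structure=np.ones((5, 5), bool))
    # Remove connected components with fewer than min_size pixels.
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    keep = np.flatnonzero(sizes >= min_size) + 1   # component labels start at 1
    return np.isin(labels, keep)
]]></preformat>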
    </sec>
    <sec id="sec-4">
      <title>2.3 Experimental Setup</title>
      <p>Our algorithm is evaluated on real data from a realistic liver phantom. The data was acquired with a ToF/RGB endoscope
manufactured by Richard Wolf GmbH, Knittlingen, Germany. We assembled realistic scenarios including two different
endoscopic instruments. For evaluation, we investigated the results in two different scenarios, with 6 frames each. The
upsampled images had a resolution of 240×160 px. Our instrument segmentation is compared to segmentation on each
modality separately. For ground truth, the endoscopic instruments were manually segmented by an expert in the color
domain. The threshold parameters τ<sub>z</sub>, τ<sub>S</sub> and τ<sub>V</sub> were set empirically by analyzing the
first frame. This frame was excluded from further evaluation to separate training and evaluation data.</p>
    </sec>
    <sec id="sec-5">
      <title>3 Results</title>
      <p>For quantitative evaluation, we report the sensitivity, the specificity and the F-score of our approach in Table 1. Here,
we compare our segmentation results to those of our framework for a purely range driven approach based on
super-resolution and for a purely color driven approach.</p>
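      <p>For reference, these measures can be computed pixel-wise from a predicted segmentation mask and the manually labeled ground truth as in the sketch below; the standard definitions are assumed here.</p>
      <preformat><![CDATA[
# Pixel-wise evaluation measures (Python/NumPy); pred and gt are boolean masks.
import numpy as np

def scores(pred, gt):
    tp = np.sum(pred & gt)        # instrument pixels correctly detected
    tn = np.sum(~pred & ~gt)      # background pixels correctly rejected
    fp = np.sum(pred & ~gt)       # oversegmented background pixels
    fn = np.sum(~pred & gt)       # missed instrument pixels
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_score
]]></preformat>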
      <p>For qualitative evaluation, we illustrate the results of all three approaches in Fig. 2. The benefit of super-resolution for
our noisy range data is shown in Fig. 3, with the color overlay encoding the segmentation result.</p>
    </sec>
    <sec id="sec-6">
      <title>4 Discussion</title>
      <p>Table 1 illustrates that our approach achieves the best specificity, i.e. only a few background pixels are classified as
instrument pixels. Both single-sensor approaches achieve satisfying sensitivities, i.e. only a few instrument pixels are
missed. Nevertheless, the F-score, as a measure of accuracy, indicates a more reliable performance of our approach.
Furthermore, as our approach consolidates both modalities, it is more robust with respect to the choice of the threshold
parameters: oversegmentation in one modality can be compensated by the other modality. The qualitative results confirm
the comparison in Tab. 1 and highlight that both single-modality approaches oversegment the image in areas close to the
sensor with surface normals pointing directly at the camera. In those areas the instruments are too close to the tissue to
be distinguished in the range image, while specular highlights likewise prevent the use of the color image. Our approach
achieves a reasonable compromise, where only few instrument pixels are missed and oversegmentation is reduced. The
3-D reconstructions show that most parts of the instruments are segmented correctly by our approach and that
preprocessing is required to provide an intuitive visualization.</p>
    </sec>
    <sec id="sec-7">
      <title>5 Summary</title>
      <p>In this paper we proposed an instrument segmentation framework for 3-D ToF/RGB endoscopy. Our method applies
robust multi-sensor super-resolution, with motion estimated on the high-resolution RGB images, to upsample and
denoise the low-resolution range images. Due to the improved signal-to-noise ratio of the range images, we can apply
instrument segmentation using thresholding techniques and consolidate the results of both modalities. Compared to
purely color driven segmentation, we improved the F-score from 0.61 to 0.73.</p>
      <p>Future work will consider different segmentation techniques and a refinement of our super-resolution for further
denoising. For the consolidation of both sensor results, additional weighting factors will be taken into account, as
proposed in [4]. In experiments on real organs, we will investigate the robustness of our segmentation in real medical
scenarios.</p>
    </sec>
    <sec id="sec-8">
      <title>6 Acknowledgments</title>
      <p>We gratefully acknowledge the support of the Deutsche Forschungsgemeinschaft (DFG) under Grant No. HO 1791/7-1.
This research was funded and supported by the Graduate School of Information Science in Health (GSISH) and the TUM
Graduate School. The authors gratefully acknowledge funding of the Erlangen Graduate School in Advanced Optical
Technologies (SAOT) by the DFG in the framework of the German excellence initiative. We thank Metrilus GmbH
for their support. This project was supported by the research training group 1126 funded by the DFG.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Röhl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bodenstedt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suwelack</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kenngott</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueller-Stich</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillmann</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speidel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>Real-time surface reconstruction from stereo endoscopic images for intraoperative registration</article-title>
          ,
          <string-name>
            <surname>Proc</surname>
            <given-names>SPIE</given-names>
          </string-name>
          , Volume
          <volume>7964</volume>
          ,
          <fpage>796414</fpage>
          -
          <lpage>796414</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Schmalz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forster</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelopoulou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <source>An Endoscopic 3D Scanner based on Structured Light, Med Image Anal</source>
          <volume>16</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1063</fpage>
          -
          <lpage>1072</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Field</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strup</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seales</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <article-title>Stereo Endoscopy as a 3-D Measurement Tool</article-title>
          ,
          <string-name>
            <surname>EMBC</surname>
          </string-name>
          <year>2009</year>
          ,
          <volume>5748</volume>
          -
          <fpage>5751</fpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wasza</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kilgus</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornegger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <source>Laparoscopic Instrument Localization using</source>
          a 3-
          <string-name>
            <given-names>D</given-names>
            <surname>Time</surname>
          </string-name>
          -ofFlight/RGB Endoscope,
          <year>WACV 2013</year>
          ,
          <volume>449</volume>
          -
          <fpage>454</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Lenzen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , In Kim,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Nair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Meister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Garbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Theobalt</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          ,
          <article-title>Denoising Strategies for Time-of-Flight Data, Time-of-Flight Imaging: Algorithms, Sensors and Applications (</article-title>
          <year>2012</year>
          ) Köhler,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Wasza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Kilgus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Maier-Hein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Feußner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Hornegger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>ToF Meets</surname>
          </string-name>
          <string-name>
            <surname>RGB</surname>
          </string-name>
          :
          <article-title>Novel Multi-Sensor Super-Resolution for Hybrid 3</article-title>
          -
          <string-name>
            <given-names>D</given-names>
            <surname>Endoscopy</surname>
          </string-name>
          ,
          <string-name>
            <surname>MICCAI</surname>
          </string-name>
          , LNCS
          <volume>8149</volume>
          , To Appear (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Climent</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mares</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <source>Automatic Instrument Localization in Laparoscopic Surgery, Electronic Letters on Computer Vision and Image Analysis</source>
          ,
          <volume>4</volume>
          (
          <issue>1</issue>
          ):
          <fpage>21</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Doignon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nageotte</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>De Mathelin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Detection of grey regions in color images : application to the segmentation of a surgical instrument in robotized laparoscopy</article-title>
          ,
          <source>Proc of IROS</source>
          , Volume
          <volume>4</volume>
          ,
          <fpage>3394</fpage>
          -
          <lpage>3399</lpage>
          (
          <year>2004</year>
          ) Speidel,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Sudra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Senemaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Drentschew</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Müller-Stich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. P.</given-names>
            ,
            <surname>Gutt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dillmann</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ,
          <source>Recognition of Risk Situations Based on Endoscopic Instrument Tracking and Knowledge Based Situation Modeling</source>
          ,
          <string-name>
            <surname>Proc</surname>
            <given-names>SPIE</given-names>
          </string-name>
          , Volume
          <volume>6918</volume>
          ,
          <fpage>69180X</fpage>
          -691
          <lpage>8</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Forman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kilgus</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bammer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maier-Hein</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornegger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , ToF/RGB Sensor Fusion for 3-
          <string-name>
            <given-names>D</given-names>
            <surname>Endoscopy</surname>
          </string-name>
          ,
          <source>Current Medical Imaging Reviews</source>
          <volume>9</volume>
          ,
          <fpage>113</fpage>
          -
          <lpage>119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyond</surname>
            <given-names>Pixels</given-names>
          </string-name>
          :
          <article-title>Exploring New Representations and Applications for Motion Analysis</article-title>
          ,
          <source>PhD thesis</source>
          , Massachusetts Institute of Technology (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>