Image stitching of sphenoid sinuses from monocular endoscopic views

                      T. Bergen¹, P. Hastreiter², C. Münzenmayer¹, M. Buchfelder², T. Wittenberg¹

                          ¹ Fraunhofer Institute for Integrated Circuits IIS, Erlangen, Germany
                         ² Department for Neurosurgery, University Clinics Erlangen, Germany


                             Contact: thomas.wittenberg@iis.fraunhofer.de

Abstract:

For operations on the pituitary gland, the gentlest method is an intervention through the paranasal sinuses, in
particular through the sphenoid sinus. To avoid dangerous interference with adjacent organs and nerves, the surgeon
has to orient himself in the very small sphenoid cavity and navigate across the hollow space to break through the
sellar floor to the pituitary gland above. Especially in reoperations or with anatomical variants such as so-called
kissing carotids, transsphenoidal surgery is a challenge even in experienced hands. To support such surgery, various
imaging modalities can be applied, such as CT, MRI or endoscopy. While pre-operative MRI or CT data can be used for
intervention planning and navigation support, endoscopy can be applied intra-operatively to examine surfaces inside
the sphenoid sinus. In this work, we present initial experiments and results from real-time panorama-endoscopy of the
sphenoid sinus for navigation and orientation support, based on monocular endoscopic sequences of a skull phantom,
yielding partial reconstructions of the walls of the sphenoid sinus.

Key words: pituitary surgery, sinus surgery, panorama-endoscopy, stitching, mosaicking, real-time

1       Introduction
The gentlest method for operations on the pituitary gland, such as the removal of tumors or adenomas, is
transsphenoidal surgery. This involves maneuvering through the paranasal sinuses, in particular the sphenoid sinus, a
small cavity behind the eyes, to break through the sellar floor and gain access to the pituitary gland. The operation
is difficult because of the risk of damaging adjacent nerves and organs, such as the internal carotid artery. Figure 1
depicts the situation in a CT slice. Various imaging modalities can be applied to support the surgeon, including CT,
MRI and endoscopy. CT and MRI are available in the pre-operative planning phase. The standard imaging modality during
the operation is the view through an endoscope. One major difficulty is the limited field of view provided by the
endoscope. To improve orientation and maneuverability for the surgeon, image stitching techniques can be applied to
provide an augmented field of view. In this paper, we propose a real-time panorama-imaging approach for navigation and
orientation support, based on monocular endoscopic sequences of a skull phantom, yielding partial reconstructions of
the walls of the sphenoid sinus. These experiments are based on prior experience gained from a 3D reconstruction
approach from endoscopic views [1]. Further work concerning view enhancement in sinus surgery includes techniques for
CT/endoscopy registration by Burschka et al. and Mirota et al. [2, 3]. Wise and DelGaudio as well as Palmer and
Kennedy provide review articles on computer assistance in paranasal sinus surgery [4, 5]. Different aspects of
navigation and registration of pre- and intra-operative imaging techniques are discussed as a means of facilitating
orientation for the surgeon. However, panorama-endoscopy has not yet been considered in the field of sinus surgery.

Figure 1: Transnasal approach to the tumor in the pituitary gland, depicted in an axial CT slice of a head. Labeled
structures: nasal cavity, nasal septum, optical nerve, sphenoidal sinus, tumor.


    [Workflow diagram: Single endoscopy image -> Image preprocessing -> Hybrid feature extraction and matching ->
    Frame-to-scene registration -> Panorama rendering -> Panorama image]

    Figure 2: Workflow of the proposed system. Every video frame is preprocessed. A hybrid feature tracking is
    applied to align the image with the scene. Finally, the panorama is rendered to the screen.


2        Materials and Methods

In this work, we present a system for real-time panorama imaging (“mosaicking”) of monocular endoscopic views with
application to sphenoid sinuses. The approach is based on a system for real-time stitching of the urinary bladder,
which we published earlier [6]. Figure 2 gives an overview of the proposed approach; the following sections describe
each algorithmic component in further detail.

Image Preprocessing
The video frames are captured from the camera at a rate of about 30 frames per second. Every video frame is
preprocessed to detect the circular mask (aperture) typical for endoscopic recordings. This is achieved by segmenting
all non-black pixels from the image and fitting a circular disc to the extracted region. Furthermore, we compensate
for lens distortion and inhomogeneous illumination. Endoscopic images usually suffer from barrel distortion. To reduce
this effect, we apply an undistortion filter, computed from previously captured images of a checkerboard pattern. We
use the undistortion filter provided by the OpenCV software library. Inhomogeneous illumination, i.e. a strong
vignetting effect, is caused by the point light source generally used in endoscopy. We compensate for it by high-pass
filtering the input image: a strongly smoothed copy of the image is subtracted from it before the frame is passed to
the feature tracking module.
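
The mask detection and illumination correction described above can be sketched as follows. This is a minimal
illustration in Python with NumPy, not the implementation used in this work: the threshold black_thresh, the disc fit
via centroid and pixel count, the box filter standing in for the "strong smoothing", and the window size k are all
assumptions chosen for brevity. Lens undistortion is omitted here.

```python
import numpy as np

def fit_aperture_disc(gray, black_thresh=10):
    """Fit a disc to all non-black pixels (assumed rule: centroid + area)."""
    ys, xs = np.nonzero(gray > black_thresh)
    if xs.size == 0:
        raise ValueError("no non-black pixels found")
    cx, cy = xs.mean(), ys.mean()
    radius = np.sqrt(xs.size / np.pi)  # disc area = pi * r^2
    return cx, cy, radius

def box_blur(img, k):
    """Separable box blur with odd window k, edge-padded, via cumulative sums."""
    if k % 2 == 0:
        raise ValueError("k must be odd")
    pad = k // 2
    out = img.astype(np.float64)
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in (0, 1)]
        padded = np.pad(out, widths, mode="edge")
        csum = np.insert(np.cumsum(padded, axis=axis), 0, 0.0, axis=axis)
        n = csum.shape[axis]
        upper = np.take(csum, np.arange(k, n), axis=axis)
        lower = np.take(csum, np.arange(0, n - k), axis=axis)
        out = (upper - lower) / k  # mean over each window of k samples
    return out

def high_pass(gray, k=31):
    """Subtract a strongly smoothed copy to suppress slow illumination changes."""
    return gray.astype(np.float64) - box_blur(gray, k)
```

A larger k removes only very slow brightness variation such as vignetting; a smaller k also suppresses image
structure that the feature tracker may need, so the value would have to be tuned to the endoscope at hand.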

Feature tracking and frame registration
This section is based on a recent publication of ours describing a hybrid tracking approach for real-time stitching
during cystoscopy [7]. Here, we briefly describe the essential steps and refer the reader to [7] for details. The
tracking module encapsulates both the SURF (Speeded Up Robust Features [8]) and KLT (Kanade-Lucas-Tomasi [9]) tracking
algorithms in a multi-threaded implementation. While SURF generates feature descriptors that allow matching features
from the current video frame to the global set of all prior features, KLT is better suited for matching features
between successive video frames. Consequently, a mosaicking system based on KLT tracking alone suffers from a drift
error, which increases over time. On the other hand, KLT requires less computational time than SURF, making it very
suitable for real-time applications. To exploit the advantages of both approaches, we combine KLT and SURF tracking to
achieve both fast processing and high matching accuracy. Both tracking threads calculate a projective transformation
within a RANSAC (RANdom SAmple Consensus) scheme to align the current video frame to the scene, i.e. the panorama
coordinate space.
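
To illustrate the last step, the following sketch estimates a projective transformation (homography) from point
correspondences inside a RANSAC loop. It is a generic textbook variant in Python with NumPy under stated assumptions,
not the tracking module of [7]: the direct linear transform (DLT) solver, the iteration count n_iter, the inlier
tolerance tol (in pixels) and the fixed seed are illustrative choices, and the feature matching that produces src and
dst is assumed to have happened already.

```python
import numpy as np

def homography_dlt(src, dst):
    """DLT estimate of H (up to scale) mapping src (N,2) to dst (N,2), N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value

def project(H, pts):
    """Apply a homography to (N,2) points, including the perspective divide."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, n_iter=500, tol=2.0, seed=0):
    """Return (H, inlier_mask); H is refitted on all inliers of the best sample."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=4, replace=False)  # minimal sample
        H = homography_dlt(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol  # NaN errors from degenerate samples count as outliers
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("too few inliers for a homography")
    return homography_dlt(src[best], dst[best]), best
```

The returned matrix has arbitrary scale; dividing by its lower-right entry gives the usual normalized form. For
real image coordinates, normalizing the points before the DLT step generally improves numerical stability.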

Panorama rendering
Based on the previous image registration step, all processed frames are rendered as a panorama image using OpenGL.
The two-dimensional projective transformation is converted to a three-dimensional transform, which maps the respective
frame texture to the xy-plane. To reduce visible seams along the edges of a frame, we use a basic alpha blending
approach. The alpha channel of each frame is designed as a center-weighted function with α = 1 in the central image
region and a linearly decreasing value towards the outer image edge. Because the system runs in real time, it can
dynamically extend the endoscopic field of view during the procedure. This motivates our choice to always use the most
recent video frame as the reference frame, placed at the center of the screen and surrounded by the panorama. For the
final visualization (as depicted in the results section), the panorama is displayed with reference to a coordinate
system defined by the average projective transform of all images, in order to present a panorama image with small
global deformation.
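
Two of the rendering steps above lend themselves to a short sketch: embedding the 3x3 projective transformation in a
4x4 matrix that keeps the texture in the xy-plane, and building the center-weighted alpha channel. The code below is
an illustration in Python with NumPy, not the OpenGL renderer used in this work. The parameter inner_frac, which sets
the size of the central region with α = 1, is an assumption, since the text does not state it; the function names are
hypothetical.

```python
import numpy as np

def homography_to_mat4(H):
    """Embed a 3x3 homography in a 4x4 matrix acting on (x, y, z, 1) with z kept."""
    M = np.eye(4)
    M[np.ix_([0, 1, 3], [0, 1, 3])] = H  # rows/columns for x, y and w
    return M

def center_weighted_alpha(size, radius, inner_frac=0.5):
    """Alpha = 1 within inner_frac * radius of the center, falling linearly to 0 at radius."""
    h, w = size
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(xx - (w - 1) / 2.0, yy - (h - 1) / 2.0)
    inner = inner_frac * radius
    return np.clip((radius - r) / (radius - inner), 0.0, 1.0)

def alpha_blend(canvas, frame, alpha):
    """Blend an (H,W,3) frame over an (H,W,3) canvas using per-pixel alpha of shape (H,W)."""
    a = alpha[..., None]
    return a * frame + (1.0 - a) * canvas
```

Note that NumPy stores the matrix row by row, whereas OpenGL conventionally expects column-major order, so the matrix
would typically be transposed (or flattened in column order) before being handed to the graphics API.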


Figure 3: Nasal endoscopy on a real skull phantom (left) and a plastic skull phantom (right)

3           Results

Twenty endoscopic panoramas from monocular image sequences of the sphenoid sinus and the pituitary gland have been
obtained from the skull phantoms in real time (see Figure 3). All panoramas were computed directly during slow,
manual, translational movements of the endoscope tip through the cavities. Figure 4 depicts typical examples of five
panoramic images obtained from the pituitary gland of the skull phantom. A white circle approximates the size of one
original endoscopic view. Table 1 summarizes further information about the panorama images. Frames Total is the total
number of image frames considered for the panorama generation. Frames In Scene is the number of frames that could be
successfully registered to form the panorama. Other frames are omitted either because of insufficient quality (too few
corresponding feature points) or because the SURF tracking is not executed at the full video frame rate but processes
only about every third to fourth frame. In general, this is not a disadvantage, since the remaining frames still
provide enough overlap for successful stitching. Scene Features is the number of SURF feature points the scene
consists of.

 Figure 4: Five panorama images (A to E) of the pituitary gland obtained from real-time stitching. A white circle
 approximates the size of one original endoscopic view.
                         Panorama     Frames Total     Frames In Scene     Scene Features
                         A                     773                 283              61292
                         B                     541                 200              46912
                         C                     701                 225              36111
                         D                     447                 183              44837
                         E                     532                 230              42034


                      Table 1: Frames Total: Number of frames considered for panorama generation.
                      Frames In Scene: Number of frames successfully registered. Scene Features:
                      Number of SURF features present in panorama.
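
As a small worked example, the share of frames that entered each panorama can be computed directly from the table.
The snippet below is only arithmetic on the published counts; the names FRAMES and registered_fraction are
illustrative.

```python
# (frames total, frames in scene), copied from the table above.
FRAMES = {
    "A": (773, 283),
    "B": (541, 200),
    "C": (701, 225),
    "D": (447, 183),
    "E": (532, 230),
}

def registered_fraction(total, in_scene):
    """Fraction of the considered frames that were registered into the scene."""
    return in_scene / total

fractions = {name: registered_fraction(*counts) for name, counts in FRAMES.items()}
for name, frac in fractions.items():
    print(f"Panorama {name}: {frac:.1%} of frames registered")
```

For these five panoramas the fraction lies between roughly 32% (C) and 43% (E).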

4       Discussion

The results show that the proposed approach is applicable to the problem of real-time image stitching of the sphenoid
sinuses. Panorama images generated in five experiments with a skull phantom have been presented, each consisting of
about two to three hundred single video frames. These first experiments show the potential of the approach. Further
experiments will be conducted with clinical endoscopic video sequences obtained during transsphenoidal surgery to
validate the method with real patient data.

5       Summary

By applying a real-time image stitching approach to monocular endoscopic images from a skull phantom, an augmented
field of view can be provided to the surgeon. We successfully stitched several image sequences obtained with an
endoscope and generated panorama views in real time, each consisting of several hundred single video frames. This
technique has the potential to improve orientation and maneuverability for the surgeon during difficult
transsphenoidal procedures.

6       Literature

[1]   Wittenberg, T., Winter, C., Scholz, I., Rupp, S., Stamminger, M., Bumm, K., Nimsky, C., 3-D reconstruction of
      the sphenoid sinus from monocular endoscopic views: First results, Proceedings, Gemeinsame Jahrestagung der
      Deutschen, Österreichischen und Schweizerischen Gesellschaften für Biomedizinische Technik, DGBMT,
      Zürich, Switzerland (2006)
[2]   Burschka, D., Li, M., Ishii, M., Taylor, R. and Hager, G., Scale-Invariant Registration of Monocular
      Endoscopic Images to CT-Scans for Sinus Surgery. In Proceedings of the 7th International Conference on Medical
      Image Computing and Computer Assisted Intervention, 413–421 (2004)
[3]   Mirota, D., Wang, H., Taylor, R., Ishii, M., Gallia, G. and Hager, G., A System for Video-Based Navigation for
      Endoscopic Endonasal Skull Base Surgery, IEEE Transactions on Medical Imaging, Vol. 31/4, 963–976, (2012)
[4]   Wise, S. and DelGaudio, J., Computer-aided surgery of the paranasal sinuses and skull base, Expert Review of
      Medical Devices, Vol. 2/4, 395–408, (2005)
[5]   Palmer, J. and Kennedy, D., Historical Perspective on Image-guided Sinus Surgery, Otolaryngologic Clinics of
      North America, Vol. 38/3, 419–428, (2005)
[6]   Bergen, T., Ruthotto, S., Münzenmayer, C., Rupp, S., Paulus, D. & Winter, C., Feature-based real-time
      endoscopic mosaicking, Proceedings of 6th International Symposium on Image and Signal Processing and Analysis,
      ISPA, Salzburg, Austria (2009)
[7]   Bergen, T., Nowack, S., Münzenmayer, C., Wittenberg, T., A hybrid tracking approach for endoscopic real-time
      panorama imaging, 17th Annual Conference of the International Society for Computer Aided Surgery,
      Heidelberg, Germany, International Journal of Computer Assisted Radiology and Surgery, Vol. 8, Suppl. 1,
      Springer, 352–354 (2013)
[8]   Bay, H., Tuytelaars, T., & Van Gool, L., SURF: Speeded Up Robust Features, Computer Vision – ECCV 2006,
      Lecture Notes in Computer Science, Vol. 3951, Springer, 404–417 (2006)
[9]   Shi, J., & Tomasi, C., Good features to track, Proceedings IEEE Computer Society Conference on Computer
      Vision and Pattern Recognition, CVPR, 593–600 (1994)

