=Paper=
{{Paper
|id=None
|storemode=property
|title=Video see-through in the clinical practice
|pdfUrl=https://ceur-ws.org/Vol-727/eics4med4.pdf
|volume=Vol-727
|dblpUrl=https://dblp.org/rec/conf/eics/FerrariFM11
}}
==Video see-through in the clinical practice==
Vincenzo Ferrari, Mauro Ferrari, Franco Mosca
Centro EndoCAS, Università di Pisa
+39 (0) 50 995689
vincenzo.ferrari@endocas.org, name.surname@med.unipi.it
ABSTRACT
In this paper we discuss the potentialities, and the technological limits to overcome, for the introduction into clinical practice of useful functionalities based on video see-through visualizations, created by mixing virtual preoperative information, obtained from radiological images, with live images of the real patient, for procedures where the physician has to interact with the patient (palpation, percutaneous biopsy, catheterization, intervention, etc.).

Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of EICS4Med 2011.
Keywords
Mixed reality, surgical navigation, general surgery.
INTRODUCTION
Modern CT and MRI scanners, coupled with new contrast media, allow the acquisition of volumetric datasets describing human anatomy, function and pathology with a high level of detail.

The detailed information contained in a volumetric dataset is fully used during the diagnostic phase, but is partially lost in the passage from the radiological department to the surgical department. In fact, surgeons generally plan interventions using only the limited information provided by the radiologist, consisting of the textual diagnosis coupled with a few significant 2D images selected from the volumetric dataset.

The application of the "computer assisted" model to the patient workflow, consisting of computer aided diagnosis (CAD) and computer aided surgery (CAS) technologies, allows the optimal use of the medical dataset and overcomes the above-cited limitations of current clinical practice. The 3D visualization of patient-specific virtual models of the anatomy [23; 24], extracted from the medical dataset, drastically simplifies the interpretation of exams and provides benefits both in the diagnostic and in the surgical planning phases. Computer assisted technologies also allow to augment real views of the patient, grabbed by means of cameras, with virtual information [26]. These augmented-reality, or more generally mixed-reality, techniques [20] introduce many advantages for each task where the physician has to interact with the patient (palpation, introduction of a biopsy needle, catheterization, intervention, etc.) [9; 10; 25].

The next figure shows a binocular see-through mixed reality system at work, implemented using a HMD (Head Mounted Display) and external cameras [8].

Figure 1: Stereoscopic video see-through in the operative room

To implement this kind of system it is generally required to localize the anatomy with respect to the real video source and to determine its projection model, in order to coherently mix the virtual and the real scenarios. Localization can be done using commercial tracking systems, which introduce additional costs and logistic troubles in the traditional clinical scenario and suffer from large errors on soft tissues, while the projection model of the video source can be calculated using theoretical algorithms that impose some constraints on the real camera. In the following, the problem is described in detail, together with possible solutions to avoid the need of the tracker or to improve the localization quality on soft tissues, taking into account the limits of the current image sources used in surgery.

HOW TO OBTAIN A MIXED REALITY VIEW
The following picture essentially describes the video see-through concept.

Figure 2: Video see-through concept
Figure 3: Functional scheme of a surgical see-through system

Real video frames, grabbed by the real camera(s), are mixed with virtual objects not visible in the real scene and shown on a display. This virtual information can be obtained using radiological images, as depicted in the next figure. The use of volumetric scanners, like CT (Computed Tomography) or MRI (Magnetic Resonance Imaging), allows to obtain a 3D virtual model of the anatomy [4; 6], which can be loaded in a virtual scene running on a computer and rendered from a point of view coherent with the real point of view.

The mixing of the real (2D) images with the virtual (2D) rendered images can be done using a hardware video mixer, or using the real images in the scene graph as foreground or background [19]. The concept and the work to do are similar: in the first case the mixing is done by external hardware after the rendering of the virtual scene, while in the second it is done by the GPU during the rendering. Figure 4 shows this concept: the real camera acquires video frames from the real environment (a spleen in this case); the video frames are shown as background of the virtual scene; virtual objects (green flashes in this case) are positioned in the scene and rendered from a virtual camera.

Figure 4: Implementation of mixed-reality in a virtual scene
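For illustration only, the following Python sketch mimics the second mixing option described above (real frames used as the background of the virtual scene) with a simple software alpha blend; none of this code comes from the systems discussed in this paper, and the rendering step is replaced by a mock overlay, whereas a real implementation would render the patient-specific models on the GPU with a virtual camera matched to the real one.

<syntaxhighlight lang="python">
import cv2
import numpy as np

def composite(real_frame, overlay_rgba):
    """Blend a rendered RGBA overlay (the virtual objects) over a real camera frame."""
    alpha = overlay_rgba[:, :, 3:4].astype(np.float32) / 255.0
    mixed = (1.0 - alpha) * real_frame.astype(np.float32) \
            + alpha * overlay_rgba[:, :, :3].astype(np.float32)
    return mixed.astype(np.uint8)

# Stand-in for the rendering step: a real system would render the patient-specific
# 3D models here, from a virtual camera aligned with the real one (scene graph / GPU).
def fake_render(h, w):
    overlay = np.zeros((h, w, 4), np.uint8)
    cv2.circle(overlay, (w // 2, h // 2), 60, (0, 255, 0, 180), -1)  # mock "virtual anatomy"
    return overlay

frame = np.full((480, 640, 3), 90, np.uint8)   # placeholder for a grabbed video frame
cv2.imwrite("mixed.png", composite(frame, fake_render(480, 640)))
</syntaxhighlight>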
In order to obtain a coherent fusion we have to obtain a virtual scene where:
virtual camera projection model ≈ the real one
virtual camera position ≈ the real one
virtual object positions ≈ the real ones
The following paragraphs describe how to obtain these three conditions.

How to determine camera projection model
Line scan and telecentric cameras are used for particular industrial applications, while for all visualization purposes, including laparoscopy, the perspective projective camera is the only one used, because it offers the images most similar to human vision. Regarding the sensor, two technologies are predominantly applied: CCD (Charge Coupled Device) and CMOS (Complementary Metal Oxide Semiconductor). In both cases the unitary elements (pixels) are disposed on a regular grid (with fixed resolution). Each camera, composed of a projective optics and a grid sensor, can be represented by the following model:

Figure 5: Schematic representation of the pinhole camera model: the generic point Pc is ideally projected on the image sensor of the camera (the plane with origin OI) through the projection center OC (where the origin of the camera reference frame is fixed)

The perspective projection matrix Mp maps a generic 3D point Pc = [x, y, z, 1]^T, expressed in the camera reference system, to the corresponding 2D point Pp = [u, v, 1]^T in the image reference system (fixed on the center of the sensor), i.e.:

Pp = Mp Pc    (1)

and is defined starting from the internal camera parameters (f, Cx, Cy) as follows:

Mp = [ f  0  Cx  0
       0  f  Cy  0     (2)
       0  0  1   0 ]

where f is the focal distance and (Cx, Cy) are the coordinates of the projection of OC on the image reference frame (with origin in OI).

Other internal camera parameters parameterize the model of the radial distortion introduced by common lenses, by means of which the projected point Pp is deviated to Pd. The pixelization process is defined by the pixel dimensions dx and dy and by the image sensor dimensions Dx and Dy. These internal parameters of the camera allow to convert measurements done on the image (in pixels) into real measurements (in millimeters) and vice versa.
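To make equations (1)-(2) and the pixelization step concrete, here is a minimal NumPy sketch. All numeric values (focal distance, principal point, pixel pitch, resolution) are invented, and the half-resolution offset used to move the origin from the sensor centre to the image corner is only one common convention, not necessarily the one adopted by the systems described here.

<syntaxhighlight lang="python">
import numpy as np

# Perspective projection matrix Mp built from the internal parameters (f, Cx, Cy),
# as in equation (2). Values are made up for illustration (units: mm).
f, Cx, Cy = 4.0, 0.05, -0.02
Mp = np.array([[f, 0, Cx, 0],
               [0, f, Cy, 0],
               [0, 0, 1,  0]], dtype=float)

# A generic 3D point in the camera reference frame (homogeneous coordinates).
Pc = np.array([10.0, -5.0, 120.0, 1.0])

# Equation (1): Pp = Mp * Pc; normalizing by the third coordinate gives the
# 2D point (u, v) on the image plane, expressed in millimetres.
u, v, w = Mp @ Pc
u, v = u / w, v / w

# Pixelization: the pixel pitch (dx, dy) converts millimetres on the sensor into
# pixels; adding half the resolution shifts the origin from the sensor centre OI
# to the image corner (one possible convention).
dx, dy = 0.006, 0.006          # pixel dimensions in mm
cols, rows = 640, 480          # sensor resolution in pixels
col = u / dx + cols / 2.0
row = v / dy + rows / 2.0
print(f"image-plane point: ({u:.3f}, {v:.3f}) mm -> pixel ({col:.1f}, {row:.1f})")
</syntaxhighlight>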
All internal camera parameters can be determined in a calibration phase, acquiring some images of a known object in different positions with a fixed camera configuration (in terms of diaphragm and camera focus) and using calibration routines like the ones described in [30]. These parameters have to be used to adjust the virtual camera to the real one.

Using traditional surgical endoscopes, a new camera calibration and virtual camera adjustment is required whenever either the optic zoom or the diaphragm opening is changed. Another important source of error is the mechanical joint between the optics and the camera body: their relative movements can determine a shift of the center of projection C of up to tens of pixels.

How to localize the camera
Camera position and orientation can be obtained using a tracker able to track a sensor mounted on the camera body, as shown in the following figure.

Figure 6: Camera localization and calibration process using an optical localizer and a sensor mounted on the camera body

The tracker offers in real time the transformation matrix T1 relative to the sensor. The calibration matrix Tc, representing the relative transformation of the camera viewpoint with respect to the sensorized frame, necessary to determine the position and orientation of the camera projection center OC, can be computed using a sensorized calibration grid. During the calibration, T1 and T2 are given by the localization system, while the transformation T3 is determined using computer vision methods that allow to localize, in the camera reference frame, objects with known geometry (the sensorized calibration grid).

Another approach could be to localize the camera directly from the video frames it acquires, as done in some applications. Several computer vision libraries (OpenCV, or Halcon by MVTec) offer many tools for this purpose. Using a single camera, we could localize objects with known geometry or texturing [11], as in the case of EasyOn by Seac02 (www.seac02.it). The localization accuracy is enough for many applications, but this requires knowing in advance the dimensions and the texture of a rigid object in the scene (or of different objects rigidly linked together). Interesting monoscopic solutions have been applied using laparoscopic images: see-through systems applying artificial markers on real organs [SOFT TISSUE], recovering the position of a needle [29] and the pose of surgical instruments [5].

How to register the patient
In surgical applications, the virtual objects representing patient anatomies are acquired in the reference frame of the radiological instrumentation, just before or days before the surgical procedure, whereas the intra-operative information is related to the reference frame of the surgical room (generally defined by means of a tracking system) during the intervention.

In case of rigid objects like bones, a change of reference frame, performed by aligning fiducial points or fiducial surfaces acquired in the radiology department and in the surgical room, can be enough [1; 3]. Deformations of the fiducial structure composed of elements, such as the points of a cloud or the points characterizing a surface, introduce systematic errors in the registration. In order to minimize the registration error, at least on the fiducial elements, each fiducial point (or fiducial surface) has to be chosen in the proximity of a steady element on the patient, and its configuration has to be as replicable as possible [19].

In case of soft tissues, beyond the change of reference frame, there are many deformation effects to avoid or compensate, due to: changes of patient decubitus, changes in bed configuration, physiological movements (breathing, heart beating, gastrointestinal movements, etc.), and constraints due to the radiological scanners (breath hold, limb positions, etc.).

To reduce these movement effects we can employ practical and useful artifices, used routinely by radiotherapists, who meticulously reproduce during the treatment the patient settings used in the planning room. Following their approach, the bed positioning and its shape during the acquisition of the medical datasets can be chosen according to the bed configuration used inside the surgical room for the specific intervention (considering the requirements of the radiological device used and the type of intervention to be performed). Furthermore, reproducing during the intervention the exact decubitus of the patient during the radiological scanning requires obtaining the same relative position of the pelvis and the thoracic cage. A realignment of these structures needs immobilization devices and/or additional iterative work in the surgical room in order to find a perfect correspondence between pre-operative and intra-operative patient positioning [15].
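As a concrete illustration of the point-based rigid registration mentioned above for bones and other rigid structures [1; 3], the following sketch computes the least-squares rotation and translation aligning corresponding fiducial points with the SVD approach of [1]; the fiducial coordinates are invented for illustration and do not come from any of the cited systems.

<syntaxhighlight lang="python">
import numpy as np

def rigid_registration(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ~= dst_i.

    src, dst: (N, 3) arrays of corresponding fiducials, e.g. points picked in the
    radiological dataset and the same points localized in the operating room.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)              # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Invented fiducials: points in the CT frame and the same points in the OR frame.
ct_points = np.array([[0, 0, 0], [100, 0, 0], [0, 80, 0], [0, 0, 60]], float)
true_R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)   # 90 deg about z
or_points = ct_points @ true_R.T + np.array([10.0, -5.0, 30.0])

R, t = rigid_registration(ct_points, or_points)
fre = np.linalg.norm(ct_points @ R.T + t - or_points, axis=1)  # fiducial registration error
print("t =", t.round(3), " FRE per fiducial:", fre.round(6))
</syntaxhighlight>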
The use of intra-operative imaging devices like 3D RA (Rotational Angiography), which could spread in the near future thanks to their decreasing price and the possibility to be portable (Ziehm Vision FD Vario 3D or Siemens ARCADIS Orbic 3D), allows to avoid the change of reference frame for each patient. These scanners, positioned in the operating room, can be easily and precisely calibrated with the localizer by means of sensors. Furthermore, the acquisition of the anatomy directly on the surgical bed dramatically simplifies the problem by removing the errors due to the change of bed and patient decubitus. This simplification will allow to obtain high precision also on soft tissues. As proven by experimental results, predictive models of organ motion due to breathing, driven by simple intra-operative parameters like the trajectory of a point on the patient skin or the time over the breathing cycle, can be applied in the real surgical scenario [14; 22].

ALTERNATIVE SOLUTIONS
Head mounted tracker-free stereoscopic video see-through
Depth perception can be drastically increased using head mounted stereoscopic devices [17], which allow to evaluate the depth dislocation of objects, as in natural binocular view. The use of localized head mounted displays (HMD), like the one shown in figure 1, allows to see the synthetic scene from a point of view aligned with the real user's point of view.

For the implementation of head mounted mixed reality systems, the video see-through approach, based on the acquisition of real images by means of external cameras, is preferable to the optical see-through approach, which projects the virtual information on semi-transparent glasses. This is due to the fact that the tracking of eye movements, strictly required for the optical see-through approach, is very difficult to perform with sufficient precision [16; 18]. On the contrary, head tracking, required for the video see-through approach, can be performed with high precision using external localizers based on different technologies [2; 12], as described before.

We implemented a head mounted stereoscopic video see-through system that does not require the use of an external localizer to track head movements [8]. Our system implements mixed reality by aligning in real time the virtual and the real scene, just using geometric information extracted directly from couples of camera images by segmenting coloured markers attached on the patient's skin.

Figure 7: Schematic representation of our stereoscopic mixed-reality system

Figure 7 shows the functional scheme of our system, where video frames are used not only as background of the virtual scene, but also to localize the cameras and to register the patient.

Epipolar geometry [13], using two or more cameras, allows to detect the 3D position of conjugate points identifiable in the images. In a stereoscopic configuration, knowing the internal camera parameters, for each marker position in the image plane the relative projection line in the 3D world, defined as the line l passing through the camera center of projection OC and lying on the point Pc, is determined. These steps, performed both on the left and on the right image, identify two projection lines ll and lr respectively. Knowing the relative pose of the right camera with respect to the left camera (expressed by a roto-translation matrix determinable in a calibration phase), the 3D position of each marker is then defined as the intersection point between ll and lr. Since ll and lr do not exactly intersect (due to the pixelization process and to noise in marker identification), the 3D marker position is approximated with the position of the point closest to both projection lines. After fiducial localization, a rigid registration is performed using a point-based approach.

Results demonstrate that the stereoscopic localization approach adopted in our system is sufficient for system usability.
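A minimal numeric sketch of the triangulation step described above: given the two projection lines ll and lr, each defined by a camera centre and a direction in a common reference frame, the marker position is approximated by the midpoint of the shortest segment between the two lines. The stereo rig geometry and the noise below are invented for illustration and are not data from our system.

<syntaxhighlight lang="python">
import numpy as np

def closest_point_between_lines(o1, d1, o2, d2):
    """Midpoint of the common perpendicular between lines o1 + s*d1 and o2 + t*d2.

    Used as the approximate 3D marker position when the left and right projection
    lines do not intersect exactly (pixelization and segmentation noise).
    """
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w0 = o1 - o2
    denom = a * c - b * b                      # nonzero unless the lines are parallel
    s = (b * (d2 @ w0) - c * (d1 @ w0)) / denom
    t = (a * (d2 @ w0) - b * (d1 @ w0)) / denom
    p1, p2 = o1 + s * d1, o2 + t * d2
    return (p1 + p2) / 2.0

# Invented stereo rig: left camera at the origin, right camera 60 mm to the right.
Ol, Or = np.zeros(3), np.array([60.0, 0.0, 0.0])
marker = np.array([20.0, 10.0, 300.0])                 # ground-truth marker position
ll = marker - Ol                                       # left projection-line direction
lr = (marker - Or) + np.array([0.0, 0.05, 0.0])        # right direction, slightly noisy
print(closest_point_between_lines(Ol, ll, Or, lr).round(2))
</syntaxhighlight>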
Laparoscope auto localization
As described before, localization using monoscopic cameras can be done in case of objects with known geometry or texturing. In case of laparoscopic interventions, the localization of the endoscopic camera can be determined using the information offered by the endoscopic video images, without the introduction of any artificial add-on in the scenario [7].

The position and orientation of the endoscopic camera can be determined, with respect to a reference frame fixed to the access-port configuration, by elaborating the video images and knowing the distances between the insertion points. During laparoscopic interventions, camera movements are minor with respect to instrument movements; therefore the laparoscope can be considered steady in a time interval, and a reference frame fixed on the camera can be used to perform measurements [21; 28].

The projections of the instrument axes on the image plane (projection lines), which can be simply determined using the HSV color space and the Hough transform [27], are constrained to pass through the projection of the insertion point on the image plane [28] (figure 8).

Figure 8: Image composed of 3 frames of a laparoscopic video with a fixed camera and a moving instrument. The projections of the instrument axes, represented with blue lines, are constrained to pass through a point representing the projection (on the image plane) of the insertion point (on the abdominal wall)

The projection of an insertion point on the image plane can be calculated as the barycentre of the intersections of couples of projection lines, for each instrument. This allows (after camera calibration) to determine the direction of the insertion point in the camera reference frame (Fig. 9, left). Therefore the versors Tl and Tr, representing respectively the directions of the left and right instrument insertion points, are determined. The versor Tc, representing the direction of the camera insertion point, lies on the Z axis of the camera reference frame (using a 0 degree optic).

Figure 9: (Left) The projections of the instrument axes (blue lines) allow to calculate the projection of the insertion point on the image plane P, which allows to determine the direction of the insertion point in the camera reference frame fixed on OC. (Right) Geometric relations involved in the insertion-point configuration that allow to localize the laparoscope

The geometrical relations between Tl, Tr, Tc and the insertion points are shown on the right of figure 9. In the figure, lc, ll and lr represent the distances of the insertion points from the camera origin, which have to be chosen in order to guarantee the distances between the access ports d1, d2 and d3. The tetrahedral configuration allows to determine univocally lc, ll and lr and consequently, having Tl, Tr and Tc, to localize the access ports with respect to the camera (and vice versa).

The localization accuracy depends on the instrument configuration and on their movements. The proposed solution provides a cheap and tracker-free implementation for the class of computer assisted surgical systems that do not require extremely accurate localization, for example systems offering 3D pre-operative model visualization with automatic point-of-view selection, or remote assistance using virtual objects on the laparoscopic monitor.
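As a toy illustration of the insertion-point construction described in this section, the sketch below intersects a few projected instrument-axis lines, takes the barycentre of the intersections as the projection P, and back-projects P to a direction versor in the camera frame. The line coefficients and the intrinsics (here expressed in pixel units) are invented, and the segmentation and Hough steps [27] are omitted; this is not the system's implementation.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

def intersect(l1, l2):
    """Intersection of two image lines in homogeneous form (a, b, c): a*x + b*y + c = 0."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Instrument-axis projections from three frames (coefficients invented for illustration).
axis_lines = [np.array([1.0, -0.2, -50.0]),
              np.array([0.6,  0.8, -180.0]),
              np.array([-0.3, 1.0, -120.0])]

# Barycentre of the pairwise intersections ~ projection P of the insertion point.
P = np.mean([intersect(a, b) for a, b in combinations(axis_lines, 2)], axis=0)

# Back-projection: with intrinsics in pixel units (fx, fy, cx, cy), the insertion-point
# direction in the camera frame is the unit vector through P (values invented).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
d = np.array([(P[0] - cx) / fx, (P[1] - cy) / fy, 1.0])
T = d / np.linalg.norm(d)      # versor analogous to Tl or Tr in figure 9
print("P =", P.round(1), " direction versor =", T.round(3))
</syntaxhighlight>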
CONCLUSIONS
The development of video see-through systems is useful and possible using various approaches.

In order to reduce the misalignment errors between the real and the virtual world when using commercial trackers, it will be necessary, in the future, to develop endoscopic cameras that take into account the previous considerations: endoscopes should natively integrate sensors for their localization, and manufacturers should take into account the stability of the joint between the optics and the camera body.

On the other hand, it is possible to develop tracker-free implementations that elaborate the camera images, reducing the costs and the logistic troubles related to the need of sensors and a tracker in the operating room.

The use of intra-operative imaging devices like 3D RA, which could spread in the near future thanks to their decreasing price and the possibility to be portable, will allow to obtain high precision in see-through systems also in case of soft tissues.

REFERENCES
1. Arun, K. S., Huang, T. S., & Blostein, S. D. (1987). Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell., 9(5), 698-700.
2. Baillot, Y., & Julier, S. J. (2003). A tracker alignment framework for augmented reality. In Proc. Second IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 142-150). IEEE.
3. Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 14(2), 239-256.
4. Coll, D. M., Uzzo, R. G., Herts, B. R., Davros, W. J., Wirth, S. L., & Novick, A. C. (1999). 3-dimensional volume rendered computerized tomography for preoperative evaluation and intraoperative treatment of patients undergoing nephron sparing surgery. J Urol, 161(4), 1097-1102.
5. Doignon, C., Nageotte, F., Maurin, B., & Krupa, A. (2007). Model-based 3-D pose estimation and feature tracking for robot assisted surgery with medical imaging. From Features to Actions - Unifying Perspectives in Computational and Robot Vision, Workshop at the IEEE Int. Conf. on Robotics and Automation.
6. Ferrari, V., Carbone, M., Cappelli, C., Boni, L., Cuschieri, A., Pietrabissa, A., et al. (2010). Improvements of MDCT images segmentation for surgical planning in general surgery - practical examples. Proceedings of the International Congress and Exhibition, IJCARS, Volume 5, Supplement 1, June.
7. Ferrari, V., Megali, G., Pietrabissa, A., & Mosca, F. (2009). Laparoscope 3D auto-localization. Proceedings of the International Congress and Exhibition, IJCARS, Volume 4, Supplement 1, June.
8. Ferrari, V., Megali, G., Troia, E., Pietrabissa, A., & Mosca, F. (2009). A 3-D mixed-reality system for stereoscopic visualization of medical dataset. IEEE Trans Biomed Eng, 56(11), 2627-2633.
9. Freschi, C., Ferrari, V., Porcelli, F., Peri, A., Pugliese, L., Morelli, L., et al. (2010). An augmented reality navigation guidance for high intensity focused ultrasound treatment. Conf Proc ICABB, International Conference on Applied Bionics and Biomechanics 2010, Venice, Italy.
10. Freschi, C., Troia, E., Ferrari, V., Megali, G., Pietrabissa, A., & Mosca, F. (2009). Ultrasound guided robotic biopsy using augmented reality and human-robot cooperative control. Conf Proc IEEE Eng Med Biol Soc, 2009, 5110-5113.
11. Gao, C., & Ahuja, N. (2004). Single camera stereo using planar parallel plate. Pattern Recognition, 17th International Conference on (ICPR'04), Volume 4 (pp. 108-111).
12. Genc, Y., Sauer, F., Wenzel, F., Tuceryan, M., & Navab, N. (2000). Optical see-through HMD calibration: A stereo method validated with a video see-through system. International Symposium for Augmented Reality.
13. Hartley, R., & Zisserman, A. (2004). Multiple View Geometry in Computer Vision.
14. Hawkes, D. J., Penney, G. P., Atkinson, D., Barratt, D. C., Blackall, J. M., Carter, T. J., et al. (2007). Motion and biomechanical models for image-guided interventions. ISBI (pp. 992-995).
15. Hinson, W. H., Kearns, W. T., Ellis, T. L., Sprinkle, D., Cullen, T., Smith, P. G., et al. (2007). Reducing set-up uncertainty in the Elekta stereotactic body frame using StealthStation software. Technology in Cancer Research & Treatment, 6(3), 181-186.
16. Hua, H., Krishnaswamy, P., & Rolland, J. P. Video-based eyetracking methods and algorithms in head-mounted displays. Optics Express, 14, 4328-4350.
17. Johnson, L., Philip, E., Lewis, G., & Hawkes, D. (2004). Depth perception of stereo overlays in image-guided surgery. Medical Imaging, Proceedings of the SPIE, Volume 5372, pp. 263-272.
18. Lee, E. C., & Park, K. R. (2008). A robust eye gaze tracking method based on a virtual eyeball model. Machine Vision and Applications.
19. Megali, G., Ferrari, V., Freschi, C., Morabito, B., Cavallo, F., Turini, G., et al. (2008). EndoCAS navigator platform: a common platform for computer and robotic assistance in minimally invasive surgery. The International Journal of Medical Robotics and Computer Assisted Surgery, 4, 242-251.
20. Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77(12), 1321-1329.
21. Nageotte, F., Zanne, P., Doignon, C., & De Mathelin, M. (2006). Visual servoing-based endoscopic path following for robot-assisted laparoscopic surgery. International Conference on Intelligent Robots and Systems (IROS).
22. Olbrich, B., Traub, J., Wiesner, S., Wichert, A., Feussner, H., & Navab, N. (2005). Respiratory motion analysis: towards gated augmentation of the liver. CARS 2005 Computer Assisted Radiology and Surgery, 19th International Congress and Exhibition.
23. Peters, T. M. (2006). Image-guidance for surgical procedures. Physics in Medicine and Biology, 51(14), R505-R540.
24. Peters, T. M. (2000). Image-guided surgery: from X-rays to virtual reality. Computer Methods in Biomechanics and Biomedical Engineering, 4(1), 27-57.
25. Pietrabissa, A., Morelli, L., Ferrari, M., Peri, A., Ferrari, V., Moglia, A., et al. (2010). Mixed reality for robotic treatment of a splenic artery aneurysm. Surg Endosc, 24(5), 1204.
26. Shuhaiber, J. H. (2004). Augmented reality in surgery. Archives of Surgery, 139, 170-174.
27. Tonet, O., Ramesh, T. U., Megali, G., & Dario, P. (2007). Tracking endoscopic instruments without localizer: a shape analysis-based approach. Computer Aided Surgery, 12(1), 35-42.
28. Voros, S., Long, J.-A., & Cinquin, P. (2006). Automatic localization of laparoscopic instruments for the visual servoing of an endoscopic camera holder. MICCAI (1) (pp. 535-542).
29. Wengert, C., Bossard, L., Baur, C., Szekely, G., & Cattin, P. C. (2008). Endoscopic navigation for minimally invasive suturing. Comput Aided Surg, 13(5), 299-310.
30. Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.