=Paper=
{{Paper
|id=None
|storemode=property
|title=Video see-through in the clinical practice
|pdfUrl=https://ceur-ws.org/Vol-727/eics4med4.pdf
|volume=Vol-727
|dblpUrl=https://dblp.org/rec/conf/eics/FerrariFM11
}}
==Video see-through in the clinical practice==
Vincenzo Ferrari, Mauro Ferrari, Franco Mosca
Centro EndoCAS, Università di Pisa
+39 (0) 50 995689
vincenzo.ferrari@endocas.org, name.surname@med.unipi.it
ABSTRACT
In this paper we discuss the potentialities, and the technological limits to overcome, for the introduction into clinical practice of useful functionalities based on video see-through visualizations, created by mixing virtual preoperative information, obtained from radiological images, with live images of the real patient, for procedures where the physician has to interact with the patient (palpation, percutaneous biopsy, catheterization, intervention, etc.).

Copyright © 2011 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by the editors of EICS4Med 2011.
Keywords
Mixed reality, surgical navigation, general surgery.
INTRODUCTION
Modern CT and MRI scanners, coupled with new contrast media, allow the acquisition of volumetric datasets describing human anatomy, function and pathology with a high level of detail.

The detailed information contained in a volumetric dataset is fully used during the diagnostic phase, but is partially lost in the passage from the radiological department to the surgical department. In fact, surgeons generally plan interventions using only the limited information provided by the radiologist, consisting of the textual diagnosis coupled with a few significant 2D images selected from the volumetric dataset.

The application of the "computer assisted" model to the patient workflow, consisting of computer aided diagnosis (CAD) and computer aided surgery (CAS) technologies, allows the optimal use of the medical dataset and overcomes the above-cited limitations of current clinical practice. The 3D visualization of patient-specific virtual models of the anatomy [23; 24], extracted from the medical dataset, drastically simplifies the interpretation of exams and provides benefits both in the diagnostic and in the surgical planning phases. Computer assisted technologies also allow to augment real views of the patient, grabbed by means of cameras, with virtual information [26]. These augmented-reality, or more generally mixed-reality, techniques [20] introduce many advantages for each task where the physician has to interact with the patient (palpation, introduction of a biopsy needle, catheterization, intervention, etc.) [9; 10; 25].

The next figure shows a binocular see-through mixed reality system at work, implemented using a HMD (Head Mounted Display) and external cameras [8].

Figure 1: Stereoscopic video see-through in the operative room

To implement this kind of system it is generally required to localize the anatomy with respect to the real video source and to determine its projection model, in order to coherently mix the virtual and the real scenarios. Localization can be done using commercial tracking systems, which introduce additional costs and logistic troubles in the traditional clinical scenario and suffer from large errors on soft tissues, while the projection model of the video source can be calculated using theoretical algorithms that impose some constraints on the real camera. In the following, the problem is described in detail, together with possible solutions to avoid the need of the tracker or to improve the localization quality on soft tissues, taking into account the limits of the current image sources used in surgery.

HOW TO OBTAIN A MIXED REALITY VIEW
The following picture essentially describes the video see-through concept.

Figure 2: Video see-through concept
Figure 3: Functional scheme of a surgical see-through system

Real video frames, grabbed by the real camera(s), are mixed with virtual objects not visible in the real scene and shown on a display. This virtual information can be obtained using radiological images, as depicted in the next figure. The use of volumetric scanners, like CT (Computed Tomography) or MRI (Magnetic Resonance Imaging), allows to obtain a 3D virtual model of the anatomy [4; 6], which can be loaded in a virtual scene running on a computer and rendered from a point of view coherent with the real point of view.

The mixing of the real (2D) images with the virtual (2D) rendered images can be done using a hardware video mixer, or using the real images in the scene graph as foreground or background [19]. The concept and the work to do are similar: in the first case the mixing is done by external hardware after the rendering of the virtual scene, while in the second it is done by the GPU during the rendering. Figure 4 shows this concept: the real camera acquires video frames from the real environment (a spleen in this case); the video frames are shown as background of the virtual scene; virtual objects (green flashes in this case) are positioned in the scene and rendered from a virtual camera.

Figure 4: Implementation of mixed-reality in a virtual scene
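For illustration only, the following Python sketch mimics the second mixing option described above (real frames used as the background of the virtual scene) with a simple software alpha blend; none of this code comes from the systems discussed in this paper, and the rendering step is replaced by a mock overlay, whereas a real implementation would render the patient-specific models on the GPU with a virtual camera matched to the real one.

<syntaxhighlight lang="python">
import cv2
import numpy as np

def composite(real_frame, overlay_rgba):
    """Blend a rendered RGBA overlay (the virtual objects) over a real camera frame."""
    alpha = overlay_rgba[:, :, 3:4].astype(np.float32) / 255.0
    mixed = (1.0 - alpha) * real_frame.astype(np.float32) \
            + alpha * overlay_rgba[:, :, :3].astype(np.float32)
    return mixed.astype(np.uint8)

# Stand-in for the rendering step: a real system would render the patient-specific
# 3D models here, from a virtual camera aligned with the real one (scene graph / GPU).
def fake_render(h, w):
    overlay = np.zeros((h, w, 4), np.uint8)
    cv2.circle(overlay, (w // 2, h // 2), 60, (0, 255, 0, 180), -1)  # mock "virtual anatomy"
    return overlay

frame = np.full((480, 640, 3), 90, np.uint8)   # placeholder for a grabbed video frame
cv2.imwrite("mixed.png", composite(frame, fake_render(480, 640)))
</syntaxhighlight>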
In order to obtain a coherent fusion we have to obtain a virtual scene where:
virtual camera projection model ≈ the real one
virtual camera position ≈ the real one
virtual object positions ≈ the real ones
The following paragraphs describe how to obtain these three conditions.

How to determine camera projection model
Line scan and telecentric cameras are used for particular industrial applications, while for all visualization purposes, including laparoscopy, the perspective projective camera is the only one used, because it offers the images most similar to human vision. Regarding the sensor, two technologies are predominantly applied: CCD (Charge Coupled Device) and CMOS (Complementary Metal Oxide Semiconductor). In both cases the unitary elements (pixels) are disposed on a regular grid (with fixed resolution). Each camera, composed of a projective optics and a grid sensor, can be represented by the following model:

Figure 5: Schematic representation of the pinhole camera model: the generic point Pc is ideally projected on the image sensor of the camera (the plane with origin OI) through the projection center OC (where the origin of the camera reference frame is fixed)

The perspective projection matrix Mp maps a generic 3D point Pc = [x, y, z, 1]^T, expressed in the camera reference system, to the corresponding 2D point Pp = [u, v, 1]^T in the image reference system (fixed on the center of the sensor), i.e.:

Pp = Mp Pc    (1)

and is defined starting from the internal camera parameters (f, Cx, Cy) as follows:

Mp = [ f  0  Cx  0
       0  f  Cy  0     (2)
       0  0  1   0 ]

where f is the focal distance and (Cx, Cy) are the coordinates of the projection of OC on the image reference frame (with origin in OI).

Other internal camera parameters parameterize the model of the radial distortion introduced by common lenses, by means of which the projected point Pp is deviated to Pd. The pixelization process is defined by the pixel dimensions dx and dy and by the image sensor dimensions Dx and Dy. These internal parameters of the camera allow to convert measurements done on the image (in pixels) into real measurements (in millimeters) and vice versa.
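To make equations (1)-(2) and the pixelization step concrete, here is a minimal NumPy sketch. All numeric values (focal distance, principal point, pixel pitch, resolution) are invented, and the half-resolution offset used to move the origin from the sensor centre to the image corner is only one common convention, not necessarily the one adopted by the systems described here.

<syntaxhighlight lang="python">
import numpy as np

# Perspective projection matrix Mp built from the internal parameters (f, Cx, Cy),
# as in equation (2). Values are made up for illustration (units: mm).
f, Cx, Cy = 4.0, 0.05, -0.02
Mp = np.array([[f, 0, Cx, 0],
               [0, f, Cy, 0],
               [0, 0, 1,  0]], dtype=float)

# A generic 3D point in the camera reference frame (homogeneous coordinates).
Pc = np.array([10.0, -5.0, 120.0, 1.0])

# Equation (1): Pp = Mp * Pc; normalizing by the third coordinate gives the
# 2D point (u, v) on the image plane, expressed in millimetres.
u, v, w = Mp @ Pc
u, v = u / w, v / w

# Pixelization: the pixel pitch (dx, dy) converts millimetres on the sensor into
# pixels; adding half the resolution shifts the origin from the sensor centre OI
# to the image corner (one possible convention).
dx, dy = 0.006, 0.006          # pixel dimensions in mm
cols, rows = 640, 480          # sensor resolution in pixels
col = u / dx + cols / 2.0
row = v / dy + rows / 2.0
print(f"image-plane point: ({u:.3f}, {v:.3f}) mm -> pixel ({col:.1f}, {row:.1f})")
</syntaxhighlight>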
All internal camera parameters can be determined in a calibration phase, acquiring some images of a known object in different positions with a fixed camera configuration (in terms of diaphragm and camera focus) and using calibration routines like the ones described in [30]. These parameters have to be used to adjust the virtual camera to the real one.

Using traditional surgical endoscopes, a new camera calibration and virtual camera adjustment is required whenever either the optic zoom or the diaphragm opening is changed. Another important source of error is the mechanical joint between the optics and the camera body: their relative movements can determine a shift of the center of projection C of up to tens of pixels.

How to localize the camera
Camera position and orientation can be obtained using a tracker able to track a sensor mounted on the camera body, as shown in the following figure.

Figure 6: Camera localization and calibration process using an optical localizer and a sensor mounted on the camera body

The tracker offers in real time the transformation matrix T1 relative to the sensor. The calibration matrix Tc, representing the relative transformation of the camera viewpoint with respect to the sensorized frame, necessary to determine the position and orientation of the camera projection center OC, can be computed using a sensorized calibration grid. During the calibration, T1 and T2 are given by the localization system, while the transformation T3 is determined using computer vision methods that allow to localize, in the camera reference frame, objects with known geometry (the sensorized calibration grid).

Another approach could be to localize the camera directly from the video frames it acquires, as done in some applications. Several computer vision libraries (OpenCV, or Halcon by MVTec) offer many tools for this purpose. Using a single camera, we could localize objects with known geometry or texturing [11], as in the case of EasyOn by Seac02 (www.seac02.it). The localization accuracy is enough for many applications, but this requires knowing in advance the dimensions and the texture of a rigid object in the scene (or of different objects rigidly linked together). Interesting monoscopic solutions have been applied using laparoscopic images: see-through systems applying artificial markers on real organs [SOFT TISSUE], recovering the position of a needle [29] and the pose of surgical instruments [5].

How to register the patient
In surgical applications, the virtual objects representing patient anatomies are acquired in the reference frame of the radiological instrumentation, just before or days before the surgical procedure, whereas the intra-operative information is related to the reference frame of the surgical room (generally defined by means of a tracking system) during the intervention.

In case of rigid objects like bones, a change of reference frame, performed by aligning fiducial points or fiducial surfaces acquired in the radiology department and in the surgical room, can be enough [1; 3]. Deformations of the fiducial structure composed of elements, such as the points of a cloud or the points characterizing a surface, introduce systematic errors in the registration. In order to minimize the registration error, at least on the fiducial elements, each fiducial point (or fiducial surface) has to be chosen in the proximity of a steady element on the patient, and its configuration has to be as replicable as possible [19].

In case of soft tissues, beyond the change of reference frame, there are many deformation effects to avoid or compensate, due to: changes of patient decubitus, changes in bed configuration, physiological movements (breathing, heart beating, gastrointestinal movements, etc.), and constraints due to the radiological scanners (breath hold, limb positions, etc.).

To reduce these movement effects we can employ practical and useful artifices, used routinely by radiotherapists, who meticulously reproduce during the treatment the patient settings used in the planning room. Following their approach, the bed positioning and its shape during the acquisition of the medical datasets can be chosen according to the bed configuration used inside the surgical room for the specific intervention (considering the requirements of the radiological device used and the type of intervention to be performed). Furthermore, reproducing during the intervention the exact decubitus of the patient during the radiological scanning requires obtaining the same relative position of the pelvis and the thoracic cage. A realignment of these structures needs immobilization devices and/or additional iterative work in the surgical room in order to find a perfect correspondence between pre-operative and intra-operative patient positioning [15].
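As a concrete illustration of the point-based rigid registration mentioned above for bones and other rigid structures [1; 3], the following sketch computes the least-squares rotation and translation aligning corresponding fiducial points with the SVD approach of [1]; the fiducial coordinates are invented for illustration and do not come from any of the cited systems.

<syntaxhighlight lang="python">
import numpy as np

def rigid_registration(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ~= dst_i.

    src, dst: (N, 3) arrays of corresponding fiducials, e.g. points picked in the
    radiological dataset and the same points localized in the operating room.
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)              # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                         # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Invented fiducials: points in the CT frame and the same points in the OR frame.
ct_points = np.array([[0, 0, 0], [100, 0, 0], [0, 80, 0], [0, 0, 60]], float)
true_R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)   # 90 deg about z
or_points = ct_points @ true_R.T + np.array([10.0, -5.0, 30.0])

R, t = rigid_registration(ct_points, or_points)
fre = np.linalg.norm(ct_points @ R.T + t - or_points, axis=1)  # fiducial registration error
print("t =", t.round(3), " FRE per fiducial:", fre.round(6))
</syntaxhighlight>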
The use of intra-operative imaging devices like 3D RA (Rotational Angiography), which could spread in the near future thanks to their decreasing price and the possibility to be portable (Ziehm Vision FD Vario 3D or Siemens ARCADIS Orbic 3D), allows to avoid the change of reference frame for each patient. These scanners, positioned in the operating room, can be easily and precisely calibrated with the localizer by means of sensors. Furthermore, the acquisition of the anatomy directly on the surgical bed dramatically simplifies the problem by removing the errors due to the change of bed and patient decubitus. This simplification will allow to obtain high precision also on soft tissues. As proven by experimental results, predictive models of organ motion due to breathing, driven by simple intra-operative parameters like the trajectory of a point on the patient skin or the time over the breathing cycle, can be applied in the real surgical scenario [14; 22].

ALTERNATIVE SOLUTIONS
Head mounted tracker-free stereoscopic video see-through
Depth perception can be drastically increased using head mounted stereoscopic devices [17], which allow to evaluate the depth dislocation of objects, as in natural binocular view. The use of localized head mounted displays (HMD), like the one shown in figure 1, allows to see the synthetic scene from a point of view aligned with the real user's point of view.

For the implementation of head mounted mixed reality systems, the video see-through approach, based on the acquisition of real images by means of external cameras, is preferable to the optical see-through approach, which projects the virtual information on semi-transparent glasses. This is due to the fact that the tracking of eye movements, strictly required for the optical see-through approach, is very difficult to perform with sufficient precision [16; 18]. On the contrary, head tracking, required for the video see-through approach, can be performed with high precision using external localizers based on different technologies [2; 12], as described before.

We implemented a head mounted stereoscopic video see-through system that does not require the use of an external localizer to track head movements [8]. Our system implements mixed reality by aligning in real time the virtual and the real scene, just using geometric information extracted directly from couples of camera images by segmenting coloured markers attached on the patient's skin.

Figure 7: Schematic representation of our stereoscopic mixed-reality system

Figure 7 shows the functional scheme of our system, where video frames are used not only as background of the virtual scene, but also to localize the cameras and to register the patient.

Epipolar geometry [13], using two or more cameras, allows to detect the 3D position of conjugate points identifiable in the images. In a stereoscopic configuration, knowing the internal camera parameters, for each marker position in the image plane the relative projection line in the 3D world, defined as the line l passing through the camera center of projection OC and lying on the point Pc, is determined. These steps, performed both on the left and on the right image, identify two projection lines ll and lr respectively. Knowing the relative pose of the right camera with respect to the left camera (expressed by a roto-translation matrix determinable in a calibration phase), the 3D position of each marker is then defined as the intersection point between ll and lr. Since ll and lr do not exactly intersect (due to the pixelization process and to noise in marker identification), the 3D marker position is approximated with the position of the point closest to both projection lines. After fiducial localization, a rigid registration is performed using a point-based approach.

Results demonstrate that the stereoscopic localization approach adopted in our system is sufficient for system usability.
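A minimal numeric sketch of the triangulation step described above: given the two projection lines ll and lr, each defined by a camera centre and a direction in a common reference frame, the marker position is approximated by the midpoint of the shortest segment between the two lines. The stereo rig geometry and the noise below are invented for illustration and are not data from our system.

<syntaxhighlight lang="python">
import numpy as np

def closest_point_between_lines(o1, d1, o2, d2):
    """Midpoint of the common perpendicular between lines o1 + s*d1 and o2 + t*d2.

    Used as the approximate 3D marker position when the left and right projection
    lines do not intersect exactly (pixelization and segmentation noise).
    """
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w0 = o1 - o2
    denom = a * c - b * b                      # nonzero unless the lines are parallel
    s = (b * (d2 @ w0) - c * (d1 @ w0)) / denom
    t = (a * (d2 @ w0) - b * (d1 @ w0)) / denom
    p1, p2 = o1 + s * d1, o2 + t * d2
    return (p1 + p2) / 2.0

# Invented stereo rig: left camera at the origin, right camera 60 mm to the right.
Ol, Or = np.zeros(3), np.array([60.0, 0.0, 0.0])
marker = np.array([20.0, 10.0, 300.0])                 # ground-truth marker position
ll = marker - Ol                                       # left projection-line direction
lr = (marker - Or) + np.array([0.0, 0.05, 0.0])        # right direction, slightly noisy
print(closest_point_between_lines(Ol, ll, Or, lr).round(2))
</syntaxhighlight>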
Laparoscope auto localization
As described before, localization using monoscopic cameras can be done in case of objects with known geometry or texturing. In case of laparoscopic interventions, the localization of the endoscopic camera can be determined using the information offered by the endoscopic video images, without the introduction of any artificial add-on in the scenario [7].

The position and orientation of the endoscopic camera can be determined, with respect to a reference frame fixed to the access-port configuration, by elaborating the video images and knowing the distances between the insertion points. During laparoscopic interventions, camera movements are minor with respect to instrument movements; therefore the laparoscope can be considered steady in a time interval, and a reference frame fixed on the camera can be used to perform measurements [21; 28].

The projections of the instrument axes on the image plane (projection lines), which can be simply determined using the HSV color space and the Hough transform [27], are constrained to pass through the projection of the insertion point on the image plane [28] (figure 8).

Figure 8: Image composed of 3 frames of a laparoscopic video with a fixed camera and a moving instrument. The projections of the instrument axes, represented with blue lines, are constrained to pass through a point representing the projection (on the image plane) of the insertion point (on the abdominal wall)

The projection of an insertion point on the image plane can be calculated as the barycentre of the intersections of couples of projection lines, for each instrument. This allows (after camera calibration) to determine the direction of the insertion point in the camera reference frame (Fig. 9, left). Therefore the versors Tl and Tr, representing respectively the directions of the left and right instrument insertion points, are determined. The versor Tc, representing the direction of the camera insertion point, lies on the Z axis of the camera reference frame (using a 0 degree optic).

Figure 9: (Left) The projections of the instrument axes (blue lines) allow to calculate the projection of the insertion point on the image plane P, which allows to determine the direction of the insertion point in the camera reference frame fixed on OC. (Right) Geometric relations involved in the insertion-point configuration that allow to localize the laparoscope

The geometrical relations between Tl, Tr, Tc and the insertion points are shown on the right of figure 9. In the figure, lc, ll and lr represent the distances of the insertion points from the camera origin, which have to be chosen in order to guarantee the distances between the access ports d1, d2 and d3. The tetrahedral configuration allows to determine univocally lc, ll and lr and consequently, having Tl, Tr and Tc, to localize the access ports with respect to the camera (and vice versa).

The localization accuracy depends on the instrument configuration and on their movements. The proposed solution provides a cheap and tracker-free implementation for the class of computer assisted surgical systems that do not require extremely accurate localization, for example systems offering 3D pre-operative model visualization with automatic point-of-view selection, or remote assistance using virtual objects on the laparoscopic monitor.
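As a toy illustration of the insertion-point construction described in this section, the sketch below intersects a few projected instrument-axis lines, takes the barycentre of the intersections as the projection P, and back-projects P to a direction versor in the camera frame. The line coefficients and the intrinsics (here expressed in pixel units) are invented, and the segmentation and Hough steps [27] are omitted; this is not the system's implementation.

<syntaxhighlight lang="python">
import numpy as np
from itertools import combinations

def intersect(l1, l2):
    """Intersection of two image lines in homogeneous form (a, b, c): a*x + b*y + c = 0."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Instrument-axis projections from three frames (coefficients invented for illustration).
axis_lines = [np.array([1.0, -0.2, -50.0]),
              np.array([0.6,  0.8, -180.0]),
              np.array([-0.3, 1.0, -120.0])]

# Barycentre of the pairwise intersections ~ projection P of the insertion point.
P = np.mean([intersect(a, b) for a, b in combinations(axis_lines, 2)], axis=0)

# Back-projection: with intrinsics in pixel units (fx, fy, cx, cy), the insertion-point
# direction in the camera frame is the unit vector through P (values invented).
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
d = np.array([(P[0] - cx) / fx, (P[1] - cy) / fy, 1.0])
T = d / np.linalg.norm(d)      # versor analogous to Tl or Tr in figure 9
print("P =", P.round(1), " direction versor =", T.round(3))
</syntaxhighlight>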
CONCLUSIONS
The development of video see-through systems is useful and possible using various approaches.

In order to reduce the misalignment errors between the real and the virtual world when using commercial trackers, it will be necessary, in the future, to develop endoscopic cameras that take into account the previous considerations: endoscopes should natively integrate sensors for their localization, and manufacturers should take into account the stability of the joint between the optics and the camera body.

On the other hand, it is possible to develop tracker-free implementations that elaborate the camera images, reducing the costs and the logistic troubles related to the need of sensors and a tracker in the operating room.

The use of intra-operative imaging devices like 3D RA, which could spread in the near future thanks to their decreasing price and the possibility to be portable, will allow to obtain high precision in see-through systems also in case of soft tissues.

REFERENCES
1. Arun, K. S., Huang, T. S., & Blostein, S. D. (1987). Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell., 9(5), 698-700.
2. Baillot, Y., & Julier, S. J. (2003). A tracker alignment framework for augmented reality. In Proc. Second IEEE and ACM International Symposium on Mixed and Augmented Reality (pp. 142-150). IEEE.
3. Besl, P. J., & McKay, N. D. (1992). A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell., 14(2), 239-256.
4. Coll, D. M., Uzzo, R. G., Herts, B. R., Davros, W. J., Wirth, S. L., & Novick, A. C. (1999). 3-dimensional volume rendered computerized tomography for preoperative evaluation and intraoperative treatment of patients undergoing nephron sparing surgery. J Urol, 161(4), 1097-1102.
5. Doignon, C., Nageotte, F., Maurin, B., & Krupa, A. (2007). Model-based 3-D pose estimation and feature tracking for robot assisted surgery with medical imaging. From Features to Actions - Unifying Perspectives in Computational and Robot Vision, Workshop at the IEEE Int. Conf. on Robotics and Automation.
6. Ferrari, V., Carbone, M., Cappelli, C., Boni, L., Cuschieri, A., Pietrabissa, A., et al. (2010). Improvements of MDCT images segmentation for surgical planning in general surgery - practical examples. Proceedings of the International Congress and Exhibition, IJCARS, Volume 5, Supplement 1, June.
7. Ferrari, V., Megali, G., Pietrabissa, A., & Mosca, F. (2009). Laparoscope 3D auto-localization. Proceedings of the International Congress and Exhibition, IJCARS, Volume 4, Supplement 1, June.
8. Ferrari, V., Megali, G., Troia, E., Pietrabissa, A., & Mosca, F. (2009). A 3-D mixed-reality system for stereoscopic visualization of medical dataset. IEEE Trans Biomed Eng, 56(11), 2627-2633.
9. Freschi, C., Ferrari, V., Porcelli, F., Peri, A., Pugliese, L., Morelli, L., et al. (2010). An augmented reality navigation guidance for high intensity focused ultrasound treatment. Conf Proc ICABB, International Conference on Applied Bionics and Biomechanics 2010, Venice, Italy.
10. Freschi, C., Troia, E., Ferrari, V., Megali, G., Pietrabissa, A., & Mosca, F. (2009). Ultrasound guided robotic biopsy using augmented reality and human-robot cooperative control. Conf Proc IEEE Eng Med Biol Soc, 2009, 5110-5113.
11. Gao, C., & Ahuja, N. (2004). Single camera stereo using planar parallel plate. Pattern Recognition, 17th International Conference on (ICPR'04), Volume 4 (pp. 108-111).
12. Genc, Y., Sauer, F., Wenzel, F., Tuceryan, M., & Navab, N. (2000). Optical see-through HMD calibration: A stereo method validated with a video see-through system. International Symposium for Augmented Reality.
13. Hartley, R., & Zisserman, A. (2004). Multiple View Geometry in Computer Vision.
14. Hawkes, D. J., Penney, G. P., Atkinson, D., Barratt, D. C., Blackall, J. M., Carter, T. J., et al. (2007). Motion and biomechanical models for image-guided interventions. ISBI (pp. 992-995).
15. Hinson, W. H., Kearns, W. T., Ellis, T. L., Sprinkle, D., Cullen, T., Smith, P. G., et al. (2007). Reducing set-up uncertainty in the Elekta stereotactic body frame using StealthStation software. Technology in Cancer Research & Treatment, 6(3), 181-186.
16. Hua, H., Krishnaswamy, P., & Rolland, J. P. Video-based eyetracking methods and algorithms in head-mounted displays. Optics Express, 14, 4328-4350.
17. Johnson, L., Philip, E., Lewis, G., & Hawkes, D. (2004). Depth perception of stereo overlays in image-guided surgery. Medical Imaging, Proceedings of the SPIE, Volume 5372, pp. 263-272.
18. Lee, E. C., & Park, K. R. (2008). A robust eye gaze tracking method based on a virtual eyeball model. Machine Vision and Applications.
19. Megali, G., Ferrari, V., Freschi, C., Morabito, B., Cavallo, F., Turini, G., et al. (2008). EndoCAS navigator platform: a common platform for computer and robotic assistance in minimally invasive surgery. The International Journal of Medical Robotics and Computer Assisted Surgery, 4, 242-251.
20. Milgram, P., & Kishino, F. (1994). A taxonomy of mixed reality visual displays. IEICE Transactions on Information and Systems, 77(12), 1321-1329.
21. Nageotte, F., Zanne, P., Doignon, C., & De Mathelin, M. (2006). Visual servoing-based endoscopic path following for robot-assisted laparoscopic surgery. International Conference on Intelligent Robots and Systems (IROS).
22. Olbrich, B., Traub, J., Wiesner, S., Wichert, A., Feussner, H., & Navab, N. (2005). Respiratory motion analysis: towards gated augmentation of the liver. CARS 2005 Computer Assisted Radiology and Surgery, 19th International Congress and Exhibition.
23. Peters, T. M. (2006). Image-guidance for surgical procedures. Physics in Medicine and Biology, 51(14), R505-R540.
24. Peters, T. M. (2000). Image-guided surgery: from X-rays to virtual reality. Computer Methods in Biomechanics and Biomedical Engineering, 4(1), 27-57.
25. Pietrabissa, A., Morelli, L., Ferrari, M., Peri, A., Ferrari, V., Moglia, A., et al. (2010). Mixed reality for robotic treatment of a splenic artery aneurysm. Surg Endosc, 24(5), 1204.
26. Shuhaiber, J. H. (2004). Augmented reality in surgery. Archives of Surgery, 139, 170-174.
27. Tonet, O., Ramesh, T. U., Megali, G., & Dario, P. (2007). Tracking endoscopic instruments without localizer: a shape analysis-based approach. Computer Aided Surgery, 12(1), 35-42.
28. Voros, S., Long, J.-A., & Cinquin, P. (2006). Automatic localization of laparoscopic instruments for the visual servoing of an endoscopic camera holder. MICCAI (1) (pp. 535-542).
29. Wengert, C., Bossard, L., Baur, C., Szekely, G., & Cattin, P. C. (2008). Endoscopic navigation for minimally invasive suturing. Comput Aided Surg, 13(5), 299-310.
30. Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), 1330-1334.