Combining Color with Spatial and Temporal Position of the Endoscopic Capsule for Improved Topographic Classification and Segmentation

M. Coimbra, J. Kustra, P. Campos, and J.P. Silva Cunha

Manuscript received July 11, 2006. This work was supported in part by the IEETA institute and the Fundação para a Ciência e Tecnologia (grant nr. SFRH/BPD/20479/2004/YPH2). The authors would like to thank Dr. José Soares of the gastroenterology department of Santo António General Hospital in Porto, Portugal (www.hgsa.pt) for providing and annotating all the anonymous data that has made this work possible and for contributions regarding the medical importance of capsule endoscopy.
M. Coimbra, J. Kustra, P. Campos are with the IEETA institute, Campus Universitário de Santiago, 3810-193 Aveiro – Portugal (email: {miguel.coimbra, jacek, pcampos}@ieeta.pt).
J.P. Silva Cunha is with the Departments of Electronics, Telecommunications and Informatics of the University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro – Portugal (email: jcunha@det.ua.pt).

Abstract: Capsule endoscopy is a recent technology with a clear need for automatic tools that reduce the long annotation times of exams. We have previously developed a topographic segmentation method, which is now improved by using spatial and temporal position information. Two approaches are studied: using this information as a confidence measure for our previous segmentation method, and directly integrating this data into the image classification process. These allow us not only to know automatically when we have obtained results with error magnitudes close to human errors, but also to reduce these automatic errors to much lower values. All the developed methods have been integrated in the CapView annotation software, currently used for clinical practice in hospitals responsible for over 250 capsule exams per year, where we estimate that the two-hour annotation times are reduced by around 15 minutes.

Index Terms: Endoscopic capsule, image classification, biomedical engineering, medical imaging

I. INTRODUCTION

The clinical importance of the endoscopic capsule is now solidly established in the literature: Iddan [1], Qureshi [2], etc. Due to space limitations, we refer to our previous work [3,4] for more extensive capsule details and clinical importance information. All of this work attempts to address an important limitation of the endoscopic capsule: excessively long annotation times. Currently it takes about 2 hours to fully view and annotate an exam and write its corresponding report. Our clinical studies show that the task of topographic segmentation is both difficult (the median error made by three senior capsule specialists was about 400 images) and time-consuming (around 15 minutes can be saved by automation).

The main contribution of this paper is the improvement of our previous topographic segmentation methods using color and texture, by incorporating not only temporal but also spatial position information in the image classification process.

II. METHODS

The ultimate objective of the presented methods is to reliably divide the video of the gastrointestinal tract into its 4 constituent parts (entrance, stomach, small intestine, large intestine) and thus determine its corresponding junctions (eso-gastric junction, pylorus, ileo-cecal valve).

A. Capsule Position and Velocity

We can theoretically estimate the spatial position of a capsule via antenna signal triangulation. We have selected 47 capsule exams where a clinical specialist manually annotated the temporal location of the pylorus (tPYL) and of the ileo-cecal valve, ICV (tICV), in the video using the CapView annotation software. We then used our automatic topographic segmentation algorithm to determine these same temporal locations. Using our 2D position information, we can then obtain the corresponding spatial locations: xPYL, yPYL, xICV, yICV, etc. For comparison purposes, these were normalized. Besides analysing 2D position information, we have looked at the average capsule displacement velocity (modulus of the displacement vector between two points with temporal references t and t+1).

B. Topographic Segmentation Algorithm

Our previously developed automatic topographic segmentation method, from now on referred to as TSA, is described in Coimbra [4].

C. Spatial Information as a Confidence Measure

Two high-confidence areas were defined, one for the pylorus and another for the ICV. We have measured the median segmentation error SEz for all marks (z12 – eso-gastric junction; z23 – pylorus; z34 – ileo-cecal valve), and for all exams SE, for junctions inside and outside these areas. The results presented in Section III show that this information is indeed useful as a confidence measure for automatic segmentation results.

D. Integrating Spatial and Temporal Information for Classification

An alternative way of using this information is to use it directly for individual image classification. Our previous method trained 4 SVM classifiers, one for each zone, and assigned each image to the topographic section whose classifier gave the highest positive distance to the SVM hyperplane (see [4] for details). We can, however, use these distances to build a feature vector for each image, along with additional information such as spatial and temporal location. Our new feature vector F is now defined as:

$$F = [x,\; y,\; Z_1,\; Z_2,\; Z_3,\; Z_4,\; V,\; t] \tag{1}$$

where x and y are the normalized spatial location coordinates (1,2), Z1, Z2, Z3, Z4 are the SVM classifier results [4] (distances to SVM hyperplanes), V is the spatial velocity, and t is the temporal location in number of frames. The combination of these different features into a single vector requires that all coefficients are normalized beforehand. A variety of well-known distances was used for classification (L1 norm, Euclidean, Mahalanobis). Finally, we have measured the relevance of each coefficient for the segmentation process using a step-wise elimination analysis.

III. RESULTS

[Figure 1: two scatter plots over the normalized 2D position space (both axes from -1 to 1), one for the pylorus (Pylorus-Error, Pylorus-Correct, High Confidence) and one for the ICV (ICV-Error, ICV-Correct, High Confidence).]

Fig. 1. Spatial distribution of correct (green) and incorrect (blue) estimations. Points in high-confidence areas are highlighted with a black bounding box. We can observe that most correct detections fall into high-confidence areas, while incorrect ones are more distributed over the whole 2D space.

Table 1. Numerical analysis of the spatial distribution of automatic topographic estimations. Accuracy = correct estimations / total estimations; recall = correct estimations / total annotations; mean and median segmentation errors are given in number of images.

                   Pylorus                   ICV
                   Accuracy    Recall        Accuracy    Recall
Correct            80 %        93 %          58 %        96 %
Incorrect          89 %        70 %          92 %        39 %

                   Mean Err.   Median Err.   Mean Err.   Median Err.
All                2157        158           3966        844
High-confidence    493         45            2096        246

An analysis of Table 1 and Figure 1 shows that high-confidence areas contain almost all correct estimations, and low-confidence areas mainly contain incorrect estimations.

Table 2. Segmentation results using various distances and classifiers for feature vector F. L1, L2, and Full Multivariate were previously defined. Max Z corresponds to our previously used classification method [7], which is the maximum positive distance to the SVM hyperplanes. Finally, Multivariate Color uses Mahalanobis distances on a reduced feature vector F = [Z1, Z2, Z3, Z4].

                     Accuracy   SE     SE-EGJ   SE-PYL   SE-ICV
L1                   82.2%      2285   7        50       2228
L2                   83.1%      1730   5        23       1702
Max Z                79.4%      3063   5        16       3042
Multivariate Color   77.4%      1052   5        22       1025
Full Multivariate    79.7%      2285   6        433      1846

Table 3. SE values as coefficients are removed from the feature vector F in a step-wise elimination process. In each step we eliminate the coefficient that produces the minimum SE when removed from the vector. These areas are marked in light grey in the table. The corresponding individual classification accuracy is presented instead. Discrepancies are highlighted in dark grey.

Median Segmentation Error
Maximum   83.6%   83.1%   83.5%   82.8%   82.2%   82.1%   79.2%
x         82.3%   82.3%   83.1%   82.8%   82.2%   82.1%   79.2%
y         83.6%   83.1%   83.1%   82.8%   82.2%   82.1%   79.2%
Z1        81.8%   81.2%   81.7%   80.8%   79.9%   79.2%   79.2%
Z2        81.9%   81.0%   82.8%   82.8%   82.2%   82.1%   79.2%
Z3        80.9%   82.6%   83.4%   79.3%   70.3%   69.8%   68.7%
Z4        80.8%   82.5%   83.5%   82.2%   82.2%   82.1%   79.2%
V         83.2%   82.4%   83.1%   82.8%   82.1%   82.1%   79.2%
t         80.4%   79.9%   79.9%   79.5%   66.4%   63.0%   54.2%

IV. DISCUSSION

Results show that doctors can trust that automatic segmentation errors in high-confidence areas are as low as human ones. Including other information has allowed us to improve segmentation results significantly. Step-wise elimination analysis has shown us that the most relevant features for segmentation are the capsule's temporal position and the color recognition of the entrance and the small intestine topographic sections. It has also shown us that spatial location is not a relevant factor for individual image classification.

REFERENCES

[1] G. Iddan, G. Meron, A. Glukhovsky, and P. Swain, “Wireless Capsule Endoscopy”, in Nature, 2000, 405, pp. 417.
[2] W.A. Qureshi, “Current and future applications of the capsule camera”, in Nature, vol. 3, 2004, pp. 447-450.
[3] M. Coimbra and J.P. Silva Cunha, “MPEG-7 visual descriptors – Contributions for automated feature extraction in capsule endoscopy”, in IEEE Transactions on Circuits and Systems for Video Technology, vol. 16/5, 2006, pp. 628-637.
[4] M. Coimbra, P. Campos, and J.P. Silva Cunha, “Topographic Segmentation and Transit Time Estimation for Endoscopic Capsule Exams”, in Proc. of IEEE ICASSP 2006, Toulouse, France, 2006.
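
As an illustration of Section II.D, the following minimal Python sketch builds the feature vector F of (1) for every frame and assigns each frame to the zone with the smallest Mahalanobis distance. It is not the authors' implementation: the function names, the z-score normalization, the padding of the last velocity value, and the per-zone means and covariances estimated from labeled frames are all assumptions made for illustration. The SVM distances Z1..Z4 are taken as given inputs.

```python
import numpy as np


def displacement_velocity(xy):
    """Modulus of the displacement vector between frames t and t+1."""
    xy = np.asarray(xy, dtype=float)
    v = np.linalg.norm(np.diff(xy, axis=0), axis=1)
    # Assumption: repeat the last value so every frame has a velocity.
    return np.append(v, v[-1])


def build_features(xy, z):
    """Stack F = [x, y, Z1, Z2, Z3, Z4, V, t] for every frame.

    xy: (n, 2) capsule positions; z: (n, 4) SVM hyperplane distances.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    t = np.arange(len(xy), dtype=float)  # temporal location in frames
    return np.column_stack([xy, z, displacement_velocity(xy), t])


def normalize(f):
    """Assumed normalization: zero mean, unit variance per coefficient."""
    sd = f.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (f - f.mean(axis=0)) / sd


def fit_zones(f, labels):
    """Per-zone mean and (pseudo-)inverse covariance from labeled frames."""
    model = {}
    for zone in np.unique(labels):
        rows = f[labels == zone]
        cov = np.cov(rows, rowvar=False)
        model[zone] = (rows.mean(axis=0), np.linalg.pinv(cov))
    return model


def classify(f, model):
    """Assign each frame to the zone with the smallest Mahalanobis distance."""
    zones = sorted(model)
    d2 = np.stack(
        [
            # Squared Mahalanobis distance of every frame to this zone.
            np.einsum("ij,jk,ik->i", f - model[k][0], model[k][1], f - model[k][0])
            for k in zones
        ],
        axis=1,
    )
    return np.asarray(zones)[d2.argmin(axis=1)]
```

Dropping columns of F before calling `fit_zones` and `classify` would give the reduced variants, for example keeping only Z1..Z4 for a color-only vector, or removing one coefficient at a time for a step-wise elimination analysis.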