             Combining Color with Spatial and Temporal
               Position of the Endoscopic Capsule for
              Improved Topographic Classification and
                            Segmentation
                                        M. Coimbra, J. Kustra, P. Campos, and J.P. Silva Cunha


   Abstract—Capsule endoscopy is a recent technology with a clear
need for automatic tools that reduce the long annotation times of
exams. We have previously developed a topographic segmentation
method, which is now improved by using spatial and temporal position
information. Two approaches are studied: using this information as a
confidence measure for our previous segmentation method, and directly
integrating this data into the image classification process. These
allow us not only to automatically know when we have obtained results
with error magnitudes close to human errors, but also to reduce these
automatic errors to much lower values. All the developed methods have
been integrated in the CapView annotation software, currently used
for clinical practice in hospitals responsible for over 250 capsule
exams per year, and where we estimate that the two-hour annotation
times are reduced by around 15 minutes.

   Index Terms—Endoscopic capsule, image classification, biomedical
engineering, medical imaging

                          I. INTRODUCTION
The clinical importance of the endoscopic capsule is now solidly
established in the literature: Iddan [1], Qureshi [2], etc. Due to
space limitations, we refer to our previous work [3,4] for more
extensive details on the capsule and its clinical importance. All of
this work attempts to address an important limitation of the
endoscopic capsule: excessively long annotation times. Currently it
takes about 2 hours to fully view and annotate an exam and write its
corresponding report. Our clinical studies show that the task of
topographic segmentation is both difficult (the median error made by
three senior capsule specialists was about 400 images) and
time-consuming (around 15 minutes can be saved by automation).
   The main contribution of this paper is the improvement of our
previous topographic segmentation methods using color and texture,
by incorporating not only temporal but also spatial position
information in the image classification process.

   Manuscript received July 11, 2006. This work was supported in part
by the IEETA institute and the Fundação para a Ciência e Tecnologia
(grant nr. SFRH/BPD/20479/2004/YPH2). The authors would like to thank
Dr. José Soares of the gastroenterology department of Santo António
General Hospital in Porto, Portugal (www.hgsa.pt) for providing and
annotating all the anonymous data that has made this work possible
and for contributions regarding the medical importance of capsule
endoscopy.
   M. Coimbra, J. Kustra, and P. Campos are with the IEETA institute,
Campus Universitário de Santiago, 3810-193 Aveiro, Portugal (email:
{miguel.coimbra, jacek, pcampos}@ieeta.pt).
   J.P. Silva Cunha is with the Department of Electronics,
Telecommunications and Informatics of the University of Aveiro,
Campus Universitário de Santiago, 3810-193 Aveiro, Portugal (email:
jcunha@det.ua.pt).

                            II. METHODS
   The ultimate objective of the presented methods is to reliably
divide the video of the gastrointestinal tract into its 4 constituent
parts (entrance, stomach, small intestine, large intestine) and thus
determine its corresponding junctions (eso-gastric junction, pylorus,
ileo-cecal valve).

  A. Capsule Position and Velocity
   We can theoretically estimate the spatial position of a capsule
via antenna signal triangulation. We have selected 47 capsule exams
where a clinical specialist manually annotated the temporal location
of the pylorus (tPYL) and of the ICV (tICV) in the video using the
CapView annotation software. We then used our automatic topographic
segmentation algorithm to determine these same temporal locations.
Using our 2D position information, we can then obtain the
corresponding spatial locations: xPYL, yPYL, xICV, yICV, etc. For
comparison purposes, these were normalized. Besides analysing 2D
position information, we have looked at the average capsule
displacement velocity (the modulus of the displacement vector between
two points with temporal references t and t+1).
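
   For illustration only, a minimal sketch of this step could look as
follows (Python; not the original implementation, and the per-axis
normalization of positions to [-1, 1] over the exam is an assumption
made here):

    # Given per-frame 2D capsule positions from antenna triangulation,
    # normalize them and compute the displacement velocity feature.

    def normalize_positions(positions):
        """Scale (x, y) positions so each axis spans [-1, 1] over the exam."""
        xs = [p[0] for p in positions]
        ys = [p[1] for p in positions]
        def scale(v, lo, hi):
            return 2.0 * (v - lo) / (hi - lo) - 1.0 if hi > lo else 0.0
        return [(scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
                for x, y in positions]

    def displacement_velocity(positions):
        """Modulus of the displacement vector between frames t and t+1."""
        v = [0.0]  # no displacement defined for the first frame
        for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
            v.append(((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5)
        return v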

  B. Topographic Segmentation Algorithm
   Our previously developed automatic topographic segmentation
method, from now on referred to as TSA, is described in Coimbra [4].

  C. Spatial Information as a Confidence Measure
   Two high-confidence areas were defined, one for the pylorus and
another for the ICV. We have measured the median segmentation error
SEz for all marks (z12 – eso-gastric junction; z23 – pylorus; z34 –
ileo-cecal valve), and for all exams SE, whose junctions are inside
and outside these areas. The results presented in Section III show
that this information is indeed useful as a confidence measure for
automatic segmentation results.
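
   As a minimal sketch of how such a confidence measure can be
applied (the rectangular bounds below are placeholders, not the
actual areas of Fig. 1):

    # Decide whether an automatic junction estimate falls inside the
    # high-confidence area defined for that junction.

    HIGH_CONFIDENCE_AREAS = {
        # junction: (x_min, x_max, y_min, y_max) in normalized coordinates
        "pylorus": (-0.5, 0.5, -0.5, 0.5),   # placeholder bounds
        "icv": (-0.5, 0.5, -0.5, 0.5),       # placeholder bounds
    }

    def is_high_confidence(junction, x, y):
        """True if the estimated junction position lies in its area."""
        x_min, x_max, y_min, y_max = HIGH_CONFIDENCE_AREAS[junction]
        return x_min <= x <= x_max and y_min <= y <= y_max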

  D. Integrating Spatial and Temporal Information for Classification
   An alternative way of using this information is to use it directly
for individual image classification. Our previous method trained 4
SVM classifiers, one for each zone, and determined the topographic
section each image belongs to as the classifier with the highest
positive distance to the SVM hyperplane (see [4] for details). We
can, however, use these distances to build a feature vector for each
image, along with additional information such as spatial and temporal
location. Our new feature vector F is now defined as:

                F = [x, y, Z1, Z2, Z3, Z4, V, t]                 (1)

where x and y are the normalized spatial location coordinates, Z1,
Z2, Z3, Z4 are the SVM classifier results [4] (distances to the SVM
hyperplanes), V is the spatial velocity, and t the temporal location
in number of frames. The combination of these different features into
a single vector requires that all coefficients are previously
normalized.
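
   A minimal sketch of how the feature vector of Eq. (1) can be
assembled and normalized (the z-score normalization of each
coefficient is an assumption made here for illustration):

    import numpy as np

    def build_feature_matrix(xy, svm_distances, velocity, frame_index):
        """Stack the coefficients of Eq. (1) for every image of an exam.

        xy: (N, 2) normalized positions; svm_distances: (N, 4) values Z1..Z4;
        velocity: (N,) displacement velocities; frame_index: (N,) frame numbers.
        """
        F = np.column_stack([xy, svm_distances, velocity,
                             frame_index]).astype(float)
        # Normalize each coefficient (column) to zero mean and unit variance
        # so that the different units become comparable.
        mean, std = F.mean(axis=0), F.std(axis=0)
        std[std == 0] = 1.0
        return (F - mean) / std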
   A variety of well-known distances was used for classification (L1
Norm, Euclidean, Mahalanobis).
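
   For illustration, a distance-based classification of this kind can
be sketched as follows (a nearest-zone-mean rule under the
Mahalanobis distance; this is an assumed formulation, not necessarily
the exact classifier used):

    import numpy as np

    def fit_zone_models(F_train, labels):
        """Per-zone mean and inverse covariance from training vectors."""
        labels = np.asarray(labels)
        models = {}
        for zone in np.unique(labels):
            Z = F_train[labels == zone]
            cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
            models[zone] = (Z.mean(axis=0), np.linalg.inv(cov))
        return models

    def classify(f, models):
        """Zone with minimal Mahalanobis distance to feature vector f."""
        def mahalanobis(mean, inv_cov):
            d = f - mean
            return float(d @ inv_cov @ d)
        return min(models, key=lambda z: mahalanobis(*models[z]))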
Finally, we have measured the relevance of each coefficient for the
segmentation process using a step-wise elimination analysis.
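
   The step-wise elimination can be sketched as a greedy backward
search (illustrative only; the paper removes the coefficient giving
the minimum SE, while this sketch maximizes accuracy for simplicity,
and the evaluate() callback is an assumed helper that retrains and
tests with a given coefficient subset):

    def stepwise_elimination(coeff_names, evaluate):
        """Greedy backward elimination over feature-vector coefficients.

        evaluate(subset) -> classification accuracy with those coefficients.
        Returns a list of (removed_coefficient, accuracy_without_it) per step.
        """
        remaining = list(coeff_names)
        history = []
        while len(remaining) > 1:
            candidates = [(evaluate([c for c in remaining if c != r]), r)
                          for r in remaining]
            best_acc, removed = max(candidates)   # least harmful removal
            remaining.remove(removed)
            history.append((removed, best_acc))
        return history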

                            III. RESULTS
   An analysis of Table 1 and Figure 1 shows that high-confidence
areas contain almost all correct estimations, and low-confidence
areas mainly contain incorrect estimations.

   [Figure 1: two scatter plots of normalized 2D capsule positions,
one for the pylorus and one for the ICV, with both axes ranging from
-1 to 1 and legend entries Pylorus-Error/Pylorus-Correct,
ICV-Error/ICV-Correct, and High Confidence.]
Fig. 1. Spatial distribution of correct (green) and incorrect (blue)
estimations. Points in high confidence areas are highlighted with a
black bounding box. We can observe that most correct detections fall
into high-confidence areas while incorrect ones are more distributed
over the whole 2D space.

Table 1. Numerical analysis of the spatial distribution of automatic
topographic estimations. Accuracy = correct estimations / total
estimations; recall = correct estimations / total annotations; mean
and median segmentation errors are given in number of images.

                          Pylorus                    ICV
                   Accuracy     Recall       Accuracy     Recall
  Correct            80%          93%          58%          96%
  Incorrect          89%          70%          92%          39%
                   Mean Err.  Median Err.    Mean Err.  Median Err.
  All                2157         158          3966         844
  High-confidence     493          45          2096         246

Table 2. Segmentation results using various distances and classifiers
for feature vector F. L1, L2, and Full Multivariate were previously
defined. Max Z corresponds to our previously used classification
method [4], which is the maximum positive distance to the SVM
hyperplanes. Finally, we use Mahalanobis distances on a reduced
feature vector F = [Z1, Z2, Z3, Z4] – Multivariate Color.

                        Accuracy     SE     SE-EGJ   SE-PYL   SE-ICV
  L1                     82.2%      2285       7        50     2228
  L2                     83.1%      1730       5        23     1702
  Max Z                  79.4%      3063       5        16     3042
  Multivariate Color     77.4%      1052       5        22     1025
  Full Multivariate      79.7%      2285       6       433     1846

Table 3. SE values as coefficients are removed from the feature
vector F in a step-wise elimination process. In each step we
eliminate the coefficient that produces the minimum SE when removed
from the vector; these areas are marked in light grey in the table,
and the corresponding individual classification accuracy is presented
instead. Discrepancies are highlighted in dark grey.

                      Median Segmentation Error
  Maximum   83.6%   83.1%   83.5%   82.8%   82.2%   82.1%   79.2%
  x         82.3%   82.3%   83.1%   82.8%   82.2%   82.1%   79.2%
  y         83.6%   83.1%   83.1%   82.8%   82.2%   82.1%   79.2%
  Z1        81.8%   81.2%   81.7%   80.8%   79.9%   79.2%   79.2%
  Z2        81.9%   81.0%   82.8%   82.8%   82.2%   82.1%   79.2%
  Z3        80.9%   82.6%   83.4%   79.3%   70.3%   69.8%   68.7%
  Z4        80.8%   82.5%   83.5%   82.2%   82.2%   82.1%   79.2%
  V         83.2%   82.4%   83.1%   82.8%   82.1%   82.1%   79.2%
  t         80.4%   79.9%   79.9%   79.5%   66.4%   63.0%   54.2%

                            IV. DISCUSSION
   Results show that doctors can trust that automatic segmentation
errors in high-confidence areas are as low as human ones. Including
other information has allowed us to improve segmentation results
significantly. Step-wise elimination analysis has shown us that the
most relevant features for segmentation are the capsule temporal
position and the color recognition of the entrance and small
intestine topographic sections. It has also shown us that spatial
location is not a relevant factor for individual image
classification.

                             REFERENCES
[1]  G. Iddan, G. Meron, A. Glukhovsky, and P. Swain, "Wireless
     Capsule Endoscopy", in Nature, vol. 405, 2000, p. 417.
[2]  W.A. Qureshi, "Current and future applications of the capsule
     camera", in Nature Reviews Drug Discovery, vol. 3, 2004,
     pp. 447-450.
[3]  M. Coimbra and J.P. Silva Cunha, "MPEG-7 visual descriptors –
     Contributions for automated feature extraction in capsule
     endoscopy", in IEEE Transactions on Circuits and Systems for
     Video Technology, vol. 16, no. 5, 2006, pp. 628-637.
[4]  M. Coimbra, P. Campos, and J.P. Silva Cunha, "Topographic
     Segmentation and Transit Time Estimation for Endoscopic Capsule
     Exams", in Proc. of IEEE ICASSP 2006, Toulouse, France, 2006.