Hierarchical Multi–structure Segmentation Guided by
               Anatomical Correlations

                 Oscar Alfonso Jiménez del Toro                     Henning Müller
                     oscar.jimenez@hevs.ch                       henningmueller@hevs.ch

                         University of Applied Sciences Western Switzerland
                      University and University Hospitals of Geneva, Switzerland


                                                  Abstract
                     Many medical image analysis techniques require an initial
                     localization and segmentation of anatomical structures. As
                     part of the VISCERAL benchmarks on Anatomy segmenta-
                     tion, a hierarchical multi–atlas multi–structure segmentation
                     approach guided by anatomical correlations is proposed. The
                     method begins with a global alignment of the volumes and
                     refines the alignment of the structures locally. The alignment
                     of the bigger structures is used as reference for the smaller
                     and harder-to-segment structures. The method is evaluated
                     on the ISBI VISCERAL test set on ten anatomical structures
                     in both contrast–enhanced and non–enhanced computed
                     tomography (CT) scans. The proposed method obtained the highest
                     DICE overlap score in the entire competition for some struc-
                     tures such as kidneys and gallbladder. Similar segmentation
                     accuracies compared to the highest results of the other meth-
                     ods proposed in the challenge are obtained for most of the
                     other structures segmented with the method.


1   Introduction
Anatomical structure segmentation in medical imaging is a fundamental step for further image
analysis and computer–aided diagnosis [Doi05]. With the ongoing increase in medical image data,
it is necessary to develop fast and automatic algorithms that can process a large quantity of images
with high accuracy and sufficient speed for clinical daily use. Although many different methods have
already been proposed [LSL+ 10, CRK+ 13], it is uncommon to test multiple approaches on the same
available dataset. The Visual Concept Extraction Challenge in Radiology (VISCERAL¹) benchmarks
have been organized with the objective of evaluating the available state-of-the-art segmentation

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes.
In: O. Goksel (ed.): Proceedings of the VISCERAL Organ Segmentation and Landmark Detection Benchmark at
the 2014 IEEE International Symposium on Biomedical Imaging (ISBI), Beijing, China, May 1st, 2014
published at http://ceur-ws.org
   ¹ http://www.visceral.eu/, as of 27 April 2014


Jiménez del Toro and Müller: Hierarchical Segmentation via Anatomical Correlations

approaches on a large public dataset. Twenty anatomical structures in four imaging modalities,
contrast–enhanced and non–enhanced magnetic resonance (MR) and computed tomography (CT) volumes,
are included in both the training and testing sets provided to the participants. The benchmarks are
run in a novel cloud environment that makes it possible to distribute large quantities of volumes and
to execute the algorithms of the research groups under the same conditions (computing power,
etc.) inside the cloud [LMMH13].
   Multi–atlas based segmentation is an approach that requires little or no interaction from the user.
It has been evaluated showing high accuracy and consistent reproducibility in different anatomical
structures [LSL+ 10, RBMMJ04]. In this method, an atlas includes a patient volume and a label
volume, created by manual annotation, that identifies the location of one or more structures in
the patient volume. The target is the query volume where the location of the structures is
unknown. Using image registration, the spatial relationship between the target and atlas volume is
estimated. The label volumes are transformed using the coordinate transformation obtained from
the registration. Afterwards, the labels are fused into a single label volume that provides
an estimated location of each structure in the target volume. When multiple atlases are used, local
registration errors of individual atlases can be corrected by the per–voxel classification.
   The proposed method was tested on computed tomography scans with ten different anatomical
structures. The method can be extended and applied to the other modalities and any of the
anatomical structures in the VISCERAL dataset.

2       Method
All volumes are resampled to obtain isotropic 1 mm voxels. Afterwards, they are down–sampled to
half their size in all three dimensions to speed up the registrations and resampled to their original
size for the label fusion.
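
The effect of this preprocessing on the volume dimensions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `resampled_shape`, the example grid size and the example voxel spacing are assumptions chosen for illustration, and the actual interpolation of intensities is left out.

```python
def resampled_shape(shape, spacing_mm, target_mm=1.0, downsample=2):
    """Return (isotropic_shape, registration_shape) for a volume.

    shape       -- number of voxels along each axis
    spacing_mm  -- voxel size in mm along each axis
    target_mm   -- isotropic voxel size after resampling (1 mm in the text)
    downsample  -- integer factor applied to every axis before registration
    """
    # Physical extent is preserved: extent = voxels * spacing.
    isotropic = tuple(max(1, round(n * s / target_mm))
                      for n, s in zip(shape, spacing_mm))
    # Halving every axis reduces the voxel count by roughly a factor of eight.
    registration = tuple(max(1, n // downsample) for n in isotropic)
    return isotropic, registration


# Hypothetical CT volume: 512 x 512 x 400 voxels of 0.7 x 0.7 x 1.5 mm.
iso, reg = resampled_shape((512, 512, 400), (0.7, 0.7, 1.5))
print(iso, reg)
```

Because all three axes are halved, each registration operates on about one eighth of the voxels of the isotropic volume, which is where the speed-up comes from.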

2.1        Image registration
The atlas patient volume, considered as the moving volume V_A(x), is registered to the fixed query
volume V_Q(x) using the image registration implementation of the Elastix software² [KSM+ 10]. At
every iteration of the optimization, the registration is evaluated by a cost function C of the
parameterized coordinate transformation T_µ from the moving atlas volume V_A to the query volume
V_Q. The adaptive stochastic gradient descent optimizer proposed in [KPSV09] is applied. A coordinate
transformation is obtained by minimizing the value of C with respect to the transformation parameters:

                    \[ \hat{\mu} = \arg\min_{\mu} C(T_{\mu}; V_Q, V_A), \tag{1} \]

where the subscript µ indicates that the transformation is parameterized by a vector µ that contains
the transformation parameters. Normalized Cross–Correlation (NCC) is selected as the similarity
metric for the cost function C.
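
For reference, NCC between two intensity samples can be computed as in the sketch below. It is a plain Python illustration on flat lists of intensities, not the Elastix implementation; the function names are hypothetical, and using the negated NCC as the cost to be minimized is an assumption made here so that the example matches the minimization in Eq. (1).

```python
import math


def ncc(fixed, moving):
    """Normalized cross-correlation of two equally long intensity samples."""
    if len(fixed) != len(moving) or not fixed:
        raise ValueError("samples must be non-empty and of equal length")
    mean_f = sum(fixed) / len(fixed)
    mean_m = sum(moving) / len(moving)
    num = sum((f - mean_f) * (m - mean_m) for f, m in zip(fixed, moving))
    den = math.sqrt(sum((f - mean_f) ** 2 for f in fixed)
                    * sum((m - mean_m) ** 2 for m in moving))
    # A constant sample has no variance; report zero correlation.
    return num / den if den > 0 else 0.0


def cost(fixed, moving):
    """Assumed cost convention: minimizing -NCC maximizes similarity."""
    return -ncc(fixed, moving)


print(ncc([1, 2, 3, 4], [10, 20, 30, 40]))
print(ncc([1, 2, 3], [3, 2, 1]))
```

NCC is invariant to linear intensity changes, so two samples that differ only by a scale and offset still correlate perfectly.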

2.2        Hierarchical anatomical structure alignment
The anatomy can differ considerably from patient to patient, particularly the spatial relations be-
tween the different structures in the same patient volume [JdTM13]. Since multiple structures
are segmentation targets in the VISCERAL benchmark, a hierarchical selection of the registra-
tions improves the segmentations of all the structures. A global affine registration is followed by
individual affine registrations using local binary masks to enforce the spatial correlation of each
    ² Elastix: http://elastix.isi.uu.nl, 2014. [Online; accessed 27 April 2014].



anatomical structure separately. These masks are obtained from the morphological dilation of the
output labels of the different atlases registered in the previous step. The registrations of the bigger
structures are used as a starting point for the closely related smaller structures, which are harder
to segment. Most of the registrations of the initial bigger structures (liver, lungs, urinary bladder)
are reused in the method, which makes it faster than segmenting each structure individually
from the start. The same procedure is repeated for the non–rigid registrations of all the target structures.
The creation of regions of interest (ROIs) with the local masks also speeds up the image registrations and
improves the output estimations.
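
The construction of a local mask from the transformed atlas labels can be sketched as below. This is a simplified illustration under stated assumptions: labels are represented as sets of (z, y, x) voxel coordinates, the labels of all atlases are combined by a union before dilation, and the structuring element is a cube. None of these choices is specified in the text, and the function names are hypothetical.

```python
from itertools import product


def dilate(voxels, shape, radius=1):
    """Binary dilation of a set of (z, y, x) voxels with a cubic element."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    dilated = set()
    for z, y, x in voxels:
        for dz, dy, dx in offsets:
            p = (z + dz, y + dy, x + dx)
            # Keep only voxels that fall inside the volume.
            if all(0 <= c < n for c, n in zip(p, shape)):
                dilated.add(p)
    return dilated


def roi_mask(atlas_labels, shape, radius=1):
    """Union of the transformed atlas labels, enlarged by dilation."""
    union = set().union(*atlas_labels)
    return dilate(union, shape, radius)


# Two hypothetical atlases that each mark one voxel of the same structure.
mask = roi_mask([{(1, 1, 1)}, {(1, 1, 2)}], shape=(3, 3, 4))
print(len(mask))
```

The dilation margin gives the following registration some room around the estimated structure, while voxels far from it are excluded from the cost computation.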


                                    Figure 1: Method Pipeline.


2.3   Non-rigid registration
After each anatomical structure has its own independent ROI mask, the volumes are registered
again, this time using a non–rigid B–spline transformation model. This non–rigid registration allows local
deformations, yielding a higher spatial similarity between the volumes. The B–spline registration
is also performed in a multi–resolution approach with an adaptive stochastic gradient descent
optimizer. This final registration step has a higher computational cost than the affine registration.
The transformed labels are updated using the coordinate transformation parameters from the
B–spline registration. The new transformed label volumes for each structure constitute the individual
votes that are used in the label fusion step.
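
The two registration stages differ mainly in the transformation model. The sketch below generates fragments of Elastix-style parameter files for an affine and a B–spline stage. It is only an illustration: the helper `elastix_parameters` is hypothetical, the numeric values (number of resolutions, iterations, grid spacing) are assumptions and not the settings used in this work, and a complete parameter file needs further entries that are omitted here.

```python
def elastix_parameters(transform, resolutions=4, iterations=500,
                       grid_spacing_mm=None):
    """Return a fragment of an Elastix parameter file as text."""
    params = [
        ("Registration", "MultiResolutionRegistration"),
        ("Metric", "AdvancedNormalizedCorrelation"),
        ("Optimizer", "AdaptiveStochasticGradientDescent"),
        ("Transform", transform),
        ("NumberOfResolutions", resolutions),        # assumed value
        ("MaximumNumberOfIterations", iterations),   # assumed value
    ]
    if grid_spacing_mm is not None:
        # Control point spacing, only meaningful for the B-spline model.
        params.append(("FinalGridSpacingInPhysicalUnits",
                       float(grid_spacing_mm)))
    lines = []
    for key, value in params:
        text = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"({key} {text})")
    return "\n".join(lines)


affine = elastix_parameters("AffineTransform")
bspline = elastix_parameters("BSplineTransform", grid_spacing_mm=16)
print(bspline)
```

Keeping the metric and optimizer identical in both stages mirrors the description above, where only the transformation model changes between the affine and the non–rigid step.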


2.4    Label fusion
A different label volume is obtained for every atlas registered to the target volume. To
combine the information obtained from the multiple registered atlases, the output labels are fused
into a single label volume for the target volume. Thresholded majority voting is a commonly used
label fusion method: a voxel is assigned to a structure when the number of atlases voting for it
reaches the threshold. With this approach, an optimal threshold is found for each of the different
structures and applied on a per–voxel basis. Majority voting also has the advantage of providing more
than one output segmentation by varying the threshold parameter, with no additional computations required.
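
A minimal sketch of thresholded voting is given below, assuming that each atlas contributes a set of voxel coordinates for one structure and that the threshold is expressed as a minimum number of agreeing atlases. The function name and the example votes are hypothetical.

```python
from collections import Counter


def fuse_labels(votes, threshold):
    """Keep the voxels labelled by at least `threshold` atlases.

    votes     -- list with one set of (z, y, x) voxels per atlas
    threshold -- minimum number of atlases that must agree on a voxel
    """
    counts = Counter(voxel for atlas in votes for voxel in atlas)
    return {voxel for voxel, n in counts.items() if n >= threshold}


# Three hypothetical atlases voting for the same structure.
votes = [{(0, 0, 0), (0, 0, 1)}, {(0, 0, 1)}, {(0, 0, 1), (0, 0, 2)}]
print(sorted(fuse_labels(votes, threshold=1)))
print(sorted(fuse_labels(votes, threshold=2)))
```

Since the vote counts are computed once, segmentations for different thresholds are obtained by filtering the same counts again, without repeating any registration.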

3     Experimental Setup
Ten CT volumes were used to evaluate the performance of the algorithm for the International
Symposium on Biomedical Imaging (ISBI) 2014 VISCERAL challenge. Five of them are contrast–
enhanced (ceCT) with a field–of–view from below the skull base to the pelvis. The other five are
non–enhanced whole body CT scans (wbCT). For the ten CT volumes, ten structures were included
in the proposed segmentation method: liver, 2 kidneys, 2 lungs, urinary bladder, spleen, trachea,
first lumbar vertebra and gallbladder.
   An initial global affine registration is followed by individual affine registrations of the
independent structures using local masks, as described in Section 2.2. The liver, both lungs, 1st lumbar
vertebra and urinary bladder were segmented with individual affine and non–rigid registrations.
The gallbladder and right kidney use the affine alignment of the volume obtained after the liver
registrations as a starting point. The left lung affine alignment is used for the spleen and the left kidney.
The right lung affine alignment is refined for the trachea segmentation. All structures are refined
with a non–rigid B–spline registration for the final estimation.
   Based on the results of the VISCERAL Benchmark 1, an individual majority vote threshold
was selected for each structure for the label fusion.
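
The reference structures described in this section form a small hierarchy, which can be written down and turned into a processing order as in the following sketch. The dictionary restates the dependencies given in the text; the data structure and the function `registration_order` are illustrative assumptions, not part of the original implementation.

```python
# Structure -> reference structure whose affine alignment is reused
# (None means the structure starts from the global affine registration).
REFERENCE = {
    "liver": None,
    "right lung": None,
    "left lung": None,
    "urinary bladder": None,
    "1st lumbar vertebra": None,
    "right kidney": "liver",
    "gallbladder": "liver",
    "trachea": "right lung",
    "spleen": "left lung",
    "left kidney": "left lung",
}


def registration_order(reference):
    """Order the structures so that every reference comes first."""
    order, done = [], set()

    def visit(name):
        if name in done:
            return
        parent = reference[name]
        if parent is not None:
            visit(parent)  # register the reference structure first
        done.add(name)
        order.append(name)

    for name in reference:
        visit(name)
    return order


print(registration_order(REFERENCE))
```

With this ordering, the affine alignment of each reference structure is available before any structure that depends on it is processed, so it only has to be computed once.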

4     Results
The method obtained a total average DICE of 0.789 for ten structures in ceCT and 0.694 for
the same ten structures in wbCT (Table 1). Most of the overlap scores were higher in ceCT (only the
lungs and the trachea scored slightly higher in wbCT) and were close to the results from the other
participants in the challenge for the same anatomical structures. The method obtained the best DICE
score of the ISBI VISCERAL challenge for the left kidney, right kidney and the gallbladder in ceCT.
For wbCT, the method had the best DICE for the 1st lumbar vertebra, gallbladder and trachea.
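
For readers unfamiliar with the overlap measure, the DICE coefficient between a segmentation A and a ground truth B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for voxel sets; returning 1.0 when both sets are empty is a convention assumed here, and the benchmark's own evaluation tool may handle that case differently.

```python
def dice(segmentation, ground_truth):
    """DICE overlap of two sets of voxels, between 0 and 1."""
    if not segmentation and not ground_truth:
        return 1.0  # assumed convention for two empty sets
    overlap = len(segmentation & ground_truth)
    return 2.0 * overlap / (len(segmentation) + len(ground_truth))


print(dice({1, 2, 3}, {2, 3, 4}))
```

The score penalizes small structures heavily: for a structure of only a few voxels, a boundary error of one voxel already removes a large share of the overlap.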

                             Table 1: Average Segmentation Accuracy

             Structure               Reference structure    DICE wbCT       DICE ceCT
             Liver                   none                         0.823          0.908
             Right lung              none                         0.967          0.963
             Left lung               none                         0.969          0.952
             Urinary bladder         none                         0.616           0.68
             1st lumbar vertebra     none                          0.44          0.472
             Right kidney            liver                        0.649          0.905
             Gallbladder             liver                        0.271            0.4
             Trachea                 right lung                   0.855           0.83
             Spleen                  left lung                    0.677          0.859
             Left kidney             left lung                    0.678          0.923
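
The per–modality averages can be recomputed from the rounded values in Table 1, as in the short sketch below. The variable names are illustrative. Because the table entries are rounded, the recomputed means can differ from the reported averages in the last digit.

```python
# (wbCT, ceCT) DICE values copied from Table 1.
TABLE_1 = {
    "Liver": (0.823, 0.908),
    "Right lung": (0.967, 0.963),
    "Left lung": (0.969, 0.952),
    "Urinary bladder": (0.616, 0.68),
    "1st lumbar vertebra": (0.44, 0.472),
    "Right kidney": (0.649, 0.905),
    "Gallbladder": (0.271, 0.4),
    "Trachea": (0.855, 0.83),
    "Spleen": (0.677, 0.859),
    "Left kidney": (0.678, 0.923),
}

mean_wbct = sum(wb for wb, _ in TABLE_1.values()) / len(TABLE_1)
mean_cect = sum(ce for _, ce in TABLE_1.values()) / len(TABLE_1)

# Structures whose overlap is higher without contrast enhancement.
higher_in_wbct = [name for name, (wb, ce) in TABLE_1.items() if wb > ce]

print(mean_wbct, mean_cect)
print(higher_in_wbct)
```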



5   Conclusions
The proposed method showed robustness in the segmentation of multiple structures from two
different modalities of the challenge (ceCT and wbCT) using a relatively small dataset. The overlap
accuracies are consistent for most of the evaluated anatomical structures, and the method obtained
some of the best overlap scores of the challenge when compared to the other methods evaluated on the
same test set.
   Thanks to the flexibility of the method for adding more structures, future work will
extend it to include all of the anatomical structures in the VISCERAL dataset. An evaluation
of the method on the other modalities (MR and contrast–enhanced MR) is also foreseen for the
VISCERAL Benchmark 2 Anatomy, which has a much bigger test set.

6   Acknowledgments
This work was supported by the EU/FP7 through VISCERAL (318068).

References
[CRK+ 13]    Antonio Criminisi, Duncan Robertson, Ender Konukoglu, Jamie Shotton, Sayan
             Pathak, Steve White, and Khan Siddiqui. Regression forests for efficient anatomy
             detection and localization in computed tomography scans. Medical Image Analysis,
             17(8):1293–1303, 2013.

[Doi05]      K Doi. Current status and future potential of computer–aided diagnosis in medical
             imaging. British Journal of Radiology, 78:3–19, 2005.

[JdTM13]     Oscar Alfonso Jiménez del Toro and Henning Müller. Multi–structure atlas–based
             segmentation using anatomical regions of interest. In MICCAI workshop on Medical
             Computer Vision, Lecture Notes in Computer Science. Springer, 2013.

[KPSV09]     Stefan Klein, Josien P.W. Pluim, Marius Staring, and Max A. Viergever. Adaptive
             stochastic gradient descent optimisation for image registration. International Journal
             of Computer Vision, 81(3):227–239, 2009.

[KSM+ 10]    Stefan Klein, Marius Staring, Keelin Murphy, Max A. Viergever, and Josien P.W.
             Pluim. Elastix: a toolbox for intensity–based medical image registration. IEEE
             Transactions on Medical Imaging, 29(1):196–205, 2010.

[LMMH13]     Georg Langs, Henning Müller, Bjoern H. Menze, and Allan Hanbury. Visceral: To-
             wards large data in medical imaging – challenges and directions. Lecture Notes in
             Computer Science, 7723:92–98, 2013.

[LSL+ 10]    Marius George Linguraru, Jesse K. Sandberg, Zhixi Li, Furhawn Shah, and Ronald M.
             Summers. Automated segmentation and quantification of liver and spleen from CT
             images using normalized probabilistic atlases and enhancement estimation. Medical
             Physics, 37(2):771–783, 2010.

[RBMMJ04]    Torsten Rohlfing, Robert Brandt, Randolf Menzel, and Calvin R. Maurer Jr.
             Evaluation of atlas selection strategies for atlas–based image segmentation with
             application to confocal microscopy images of bee brains. NeuroImage, 23(8):983–994,
             April 2004.

