Segmentation and Landmark Localization
                   Based on Multiple Atlases

                      Orcun Goksel              Tobias Gass             Gabor Szekely

                            Computer Vision Lab, ETH Zurich, Switzerland


                                                  Abstract

                     In this work, we present multi-atlas based techniques for both
                     segmentation and landmark detection. We focus on modality
                     and anatomy independent techniques to be applied to a wide
                     range of input images, in contrast to methods customized to
                     a specific anatomy or image modality. For segmentation, we
                     use label propagation from several atlases to a target image
                     via a Markov random field (MRF) based registration method,
                     followed by label fusion by majority voting weighted by lo-
                     cal cross-correlations. For landmark localization, we use a
                     consensus based fusion of location estimates from several at-
                     lases identified by a template-matching approach. Results in
                     IEEE ISBI 2014 VISCERAL challenge as well as VISCERAL
                     Anatomy1 challenge are presented herein.


1   Introduction
Segmentation and landmark detection are two very common problems in medical image analysis,
as they both pertain to several clinical applications. Although there exist methods customized
for specific anatomy and modality, generic methods are valuable as they are applicable in a wide
range of applications without much effort for customization. Regarding the two tasks above, in this
work we use modality and anatomy independent techniques to treat the diverse dataset from the
Anatomy challenge series of the VISCERAL (Visual Concept Extraction Challenge in Radiology)
Consortium. The methods are detailed below, also presenting our results from the said challenges.

2   Segmentation
For segmentation, a multi-atlas based technique is used by registering several atlases individually
to a target image using our implementation of the MRF-based deformable registration method
in [GKT+ 08]. These registrations are then used to propagate the anatomical labels (ground-truth
annotations) from each atlas image into the target coordinate frame. At voxel level, a majority

Copyright c by the paper’s authors. Copying permitted only for private and academic purposes.
In: O. Goksel (ed.): Proceedings of the VISCERAL Organ Segmentation and Landmark Detection Benchmark at
the 2014 IEEE International Symposium on Biomedical Imaging (ISBI), Beijing, China, May 1st , 2014
published at http://ceur-ws.org


                                                       37
Goksel et al: Multi-Atlas Segmentation and Localization

voting is held to decide the winning label where each label votes based on the locally-normalized
cross-correlation (LNCC) [CBD+ 03] of the registered atlas to the given target at that location.

2.1   Atlas-based segmentation using registration via MRF
Finding an optimal displacement vector field T̂ can be defined as the minimization of a functional:

                                    T̂ = arg min E (T, X, An )                                      (1)
                                              T

where X is a target image and An is an atlas image and E is the registration energy. MRFs provide
an efficient means for the minimization of such energy, with the main advantage being that it does
not rely on the gradient of the criterion and therefore is less prone to poor local optima. In order
to use MRFs for solving the minimization problem, this energy is decomposed into unary (ψ) and
pairwise (Ψ) potentials over discrete labels as follows:
                                                                           
                                      X                X
                      E (T, X, An ) =     ψp (lp ) +       λ Ψpq (lp , lq ) ,                  (2)
                                      p∈Ω              q∈N (p)

where Ω is the discretized image space. The continuous displacement space is sampled discretely,
so that each registration label lp and lq in the set of all registration labels LR maps to a unique
displacement vector d~p . The unary potentials are then a local similarity metric, measuring the fit
between the deformed atlas image and the target image at a location i. The pairwise potentials
correspond to prior assumptions over the displacement, which is often implemented as a smoothing
over the neighborhood N as justified by a first-order Markov assumption. λ is the pairwise weight.
   For robust and smooth solution of (2), an efficient method was proposed in [GKT+ 08] that seeks
the displacements d~ of control points in a multi-resolution cubic B-spline framework. We use our
implementation of this method with four levels of detail, where the coarsest grid resolution has
three nodes along the shortest edge of an input image and a spacing as isotropic as possible given
that a control-point is required on each corner of the image. Each following level of the resolution
hierarchy has twice the resolution compared to the previous step. At each level of detail, we sample
four displacements in each cardinal direction, yielding 25 displacement samples in total. Within
each direction, samples are equidistant, with the largest displacement set to 0.4 times the control
grid spacing. This was shown to guarantee diffeomorphic deformations [RAH+ 06]. For each level
of detail, we re-run the MRF registration with the displacements being re-scaled by the golden
ratio 0.618. The resulting displacements are then composed onto the previous deformation. This
guarantees that the result is still diffeomorphic and sub-pixel accuracy can be achieved. For the
unary potentials we use normalized cross correlation (NCC) of patches centered around each control
point with their radius equivalent to control grid spacing. Euclidean distance between displacements
of neighboring control grids is used to penalize non-smooth deformations. Tree-reweighted message
passing (TRW-S) [Kol06] is employed to find a solution to each energy minimization instance.
   A segmentation candidate of the target image X based on the atlas An is then obtained by
applying the resulting displacement field T̂ to the known segmentation SAn as follows:

                                         SX,n = SAn (T̂ ).                                          (3)

2.2   Label fusion via weighted majority voting
Although MRF-based registration is a relatively robust method, it can only guarantee a locally op-
timal solution and is therefore susceptible to poor initialization. Furthermore, for a grossly different
atlas, correspondences for registration may not be guaranteed. Accordingly, the segmentation from

                                                  38
Goksel et al: Multi-Atlas Segmentation and Localization

a single atlas may not be satisfactory. It was shown in different research fields that combination
of multiple weak information sources can surpass average accuracy. Multiple segmentations from
different atlases were combined in [HHA+ 06].
   Assume that N segmentation candidates of a target image X are computed from N atlases via
(3). Let final target segmentation SX be an image of the same size as X, where pixels take on values
from the set LS ={1, . . . , NS } such that each discrete value corresponds to an organ or anatomical
structure. An intuitive and straight-forward method to combine multiple segmentation estimates
is then to choose the most frequent segmentation label (majority voting, MV) at each location p:
                                                    X
                                 MV
                               SX   (p) = arg max      δ (lS , SX,n (p)) .                        (4)
                                              lS ∈LS
                                                       n

Such majority voting does not take into account the individual quality of each registration and
therefore the resulting segmentation. We assume that post-registration image similarity between
the deformed atlas and the target image is an indicator of segmentation reliabilty and can be used
to locally assign weights w to each individual segmentation. The resulting weighted majority vote
(wMV) can then be formalized as follows:
                                              X
                          wMV
                         SX    (p) = arg max      wn (p) δ (lS , SX,n (p)) .                   (5)
                                        lS ∈LS
                                                   n

To obtain the weights w, we use local normalized cross correlation (LNCC,[CBD+ 03]) between
image X and deformed atlas An (Tn ). The advantages of LNCC are its smoothness and fast com-
putation time due to convolution with Gaussian kernels:
                               hX, Y i(p)
           LNCC(X, Y, p) =                                  hX, Y i(p) = X · Y (p) − X(p) · Y (p)
                              σX (p) σY (p)
                                                                2                   2
                         X = G σG ∗ X                          σX (p) = X 2 (p) − X (p),            (6)

where ∗ is the convolution operator and GσG is a Gaussian kernel with standard deviation σG . From
the LNCC metric, we compute the weights:
                                        1 − LNCCσ (X, An (Tn ), p) γ
                                                                 
                             wn (p) =                                ,                          (7)
                                                    2

which normalizes LNCC to the range [0, 1]. γ is used to scale the similarity such that contributions
from individual segmentations are well spread [IK09].

3     Landmark Detection
For anatomical landmark detection, we use a template based approach from multiple atlases, the
location estimates from which are fused based on their consensus. We localize each landmark ℓ
separately from the others using the two stages below. To localize the unknown voxel coordinates
pℓ of landmark ℓ in the target image X, we perform the following template matching procedure
from each atlas An where n represents the atlas index.

3.1    Determining template and search regions
The template is set as a box-shaped image region Aℓn in the current atlas. Similarly, a box-shaped
search region X ℓ is defined in the target image. Both such regions are chosen targeting a physically
isotropic region of interest (ROI) in corresponding image, while limiting the maximum number of
ROI voxels to ensure efficient computation. Specifics of ROI selection are given in Table 1.

                                                       39
Goksel et al: Multi-Atlas Segmentation and Localization

Table 1: Specifics of template and search ROI, where | · | represents image dimensions (per axis).

      ROI box (cropped image)     Centered at     Targeted half-width (dHW )   Max size (nmax )
      Template image Aℓn              pℓAn                           20 mm           413 voxels
                                           |X|
      Search region X ℓ            pℓAn · |A n|
                                                                           –        2413 voxels


   When setting the template size, the trade-off between it containing sufficient image features and
final localization precision was considered. Template half-width was set to 20 mm empirically via
cross-validation in multiple modalities using different template sizes (e.g., 10, 20, 30, ...). The
search region is centered around a gross estimate of the landmark location, which is the normalized
voxel coordinates of the landmark from the atlas; using the fact that both the atlas and the target
have similar fields of view (ie. both abdomen, thorax, or whole body). Note that our large search
region covers most or all of the image in many modalities (e.g. in MRce) or at least a quadrant
thereof (e.g. in CT), such that the searched landmark can be guaranteed to exist therein.

3.2    Similarity metric
For template matching, the template is convolved over the search region by computing two in-
dependent similarity metrics, sum of squared differences (SSD) and normalized cross-correlation
(NCC), at each template location i with respect to search image. Both values are then normalized
linearly to [0, 1] such that they are both 1 at the best match location. A combined similarity metric
SSDa ·CORb is then computed, where the parameters a=2 and b=3 were determined empirically via
cross-validation with several powers. The maximum of this combined metric gives the best match
location estimate pℓn for landmark ℓ considered atlas An .

3.3    Statistical fusion of estimate location from atlases
From cross-validation trials with different techniques such as the mean and weighted average of
location estimates, the median operator was determined to be the best method for fusing location
estimates. Accordingly, each axis coordinate of the the target landmark location is found as the
median value of those axes from atlas estimates. The entire process can be summarized as:
• Repeat for each atlas An :
  → Crop landmark template Aℓn centered at given landmark location pℓAn in atlas image An
  → Crop a large search region X ℓ centered around a grossly approximated location in X
                       ℓ ℓ
  → Compute SSD An , X and COR Aℓn , X ℓ
                                            

  → pℓn = arg max SSD[i]2 · COR[i]3
             i
• pℓ = median pℓn | ∀n

4     Results and Discussion
Throughout the results, the following abbreviations are used for the image modalities: whole-body
CT images (CT), thorax+abdomen contrast-agent CT images (CTce), abdominal T1-weighted
contrast-agent MR images (MRce), and whole-body T1-weighted MR images (MR).
   We treated each modality separately. We used the training images from Anatomy1 benchmark as
atlases, ie. depending on the modality of the test image, six atlases for CTce and seven for CT, MR,
MRce. For convenience, we combined all organ segmentations for each atlas into a single multi-label

                                                  40
Goksel et al: Multi-Atlas Segmentation and Localization

segmentation image, which was then deformed using the atlas-to-target registration T̂ described in
Sec. 2.1. In Tab. 2, Dice overlap metric results regarding our segmentation approach reported by
VISCERAL for the test images can be seen both for the VISCERAL Anatomy1 benchmark and
the ISBI challenge.
   In order to compare our technique to other participants’ in the ISBI challenge, we computed an
average rank per organ per participant. For given test image, we assigned a rank to each method,
ie. {1,2,...,P } where P is the number of participants submitted a (non-blank) output for that test
image. In Fig. 1 the average of such ranks for all given test images is seen per anatomy. Our
submission was the only entrant that aimed to segment all images in all modalities and it achieved
competitive results for many organs as seen in the given figure. We did not plot ranks for the MR
modalities, since we were the only participant to submit such results.
   It is notable that the multi-atlas fusion has significantly lower segmentation accuracy for organs
which has low volume, e.g. urinary, gall bladder, and the adrenal glands. One possible explanation
is that such small structures are difficult to register using full-body images. Due to such misalign-
ments, overlap between multiple deformed atlas segmentations for such label can be small, resulting
in the weighted majority voting not selecting that label.

Table 2: Our segmentation overlap (Dice) results in VISCERAL Anatomy1 and ISBI challenges.

                                        Anatomy1                       ISBI challenge
        Modality               Ctce    MR       CT    MRce     Ctce     MR      CT      MRce
        Kidney (L)            0.903   0.730   0.805    0.782   0.885   0.548   0.756    0.888
        Kidney (R)            0.877   0.733   0.754    0.787   0.827   0.589   0.679    0.732
        Spleen                0.802   0.668   0.688    0.689   0.803   0.646   0.684    0.785
        Liver                 0.899   0.822   0.830    0.847   0.882   0.817   0.798    0.861
        Lung (L)              0.961   0.533   0.952    0.650   0.960   0.486   0.955
        Lung (R)              0.968   0.900   0.960    0.664   0.966   0.909   0.965
        Urinary bladder       0.676   0.656   0.640    0.280   0.657   0.577   0.636    0.334
        Lumbar Vertebra 1     0.604   0.396   0.350    0.060   0.548   0.623   0.333    0.084
        Thyroid               0.252   0.367   0.469            0.315   0.488   0.439
        Pancreas              0.465           0.438    0.356   0.442           0.466    0.356
        Psoas major (L)       0.811   0.801   0.772    0.644   0.797   0.765   0.773    0.654
        Psoas major (R)                       0.787                            0.780
        Gallblader            0.334   0.023   0.102    0.035   0.212   0.044   0.078    0.000
        Sternum               0.595   0.358   0.648            0.612   0.359   0.630
        Aorta                 0.785   0.744   0.723    0.616   0.787   0.783   0.724
        Trachea               0.847   0.736   0.822            0.839   0.747   0.837
        Adrenal gland (L)     0.204   0.109   0.165    0.000   0.099   0.144   0.282
        Adrenal gland (R)     0.164   0.215   0.138    0.107   0.019   0.268   0.133


5   Conclusions
In the Anatomy1 and ISBI challenges organized by VISCERAL project, our landmark localization
achieved in whole-body CT images an impressive 11 and 13 voxel average error, respectively in
these challenges. No comparison to alternatives was possible since ours was the only entrant in
landmark localization in both challenges. For segmentation, our multi-atlas based method that

                                                 41
Goksel et al: Multi-Atlas Segmentation and Localization

                     1

                     2
      average rank

                     3

                     4                   participant2
                                         participant3
                     5                   participant4
                                        our Approach


                     1

                     2
      average rank


                     3

                     4                  participant1
                                        participant2
                                        participant3
                     5                  participant4
                                       our Approach
                         Tr
                         Lu che
                         Lu g(r
                         P a g(l
                         G cre
                         U llbla s
                         S t n a r de
                         Lu rnu Bla
                         Ki ba
                         Ki ey e
                         Ad ney r) ebr
                         Ad en l)
                         ps en lGla
                         ps as lGla d(r
                         M as ajo d(l
                         M scle ajo Mu
                         Ao scle ec Mu le(
                         Li ta ec usA le(
                         Th er
                         Sp oi
                           ri d
                           a a


                           v
                            u M r )
                            u R r sc
                            dn rV
                            d ( rt
                            a


                            e y r


                            o a n
                            o M n )
                             n a
                             n )


                             m m dd


                             yr
                             n )


                             r (
                             r a


                             r R t sc l)


                             le d
                                en
                                      tu bd r)
                                        er


                                        sA o
                                           a


                                           bd mi
                                             om nis
                                               in (r)
                                               organ
                                                 is
                                                    (l)
Figure 1: Average segmentation rank (by Dice coefficient) for each participant and organ for the
CT (top) and CTce (bottom) modalities.
does not require any customisation to a specific modality or organ competed in all categories and
ranked satisfactorily compared to the results of other participants.

References
[CBD+ 03] Pascal Cachier, Eric Bardinet, Didier Dormont, Xavier Pennec, and Nicholas Ayache.
          Iconic feature based nonrigid registration: the PASHA algorithm. Computer Vision and
          Image Understanding, 89(2-3):272–298, February 2003.
[GKT+ 08] Ben Glocker, Nikos Komodakis, Georgios Tziritas, Nassir Navab, and Nikos Paragios.
          Dense image registration through MRFs and efficient linear programming. Medical
          Image Analysis, 12(6):731–741, December 2008.
[HHA+ 06] Rolf a Heckemann, Joseph V Hajnal, Paul Aljabar, Daniel Rueckert, and Alexander
          Hammers. Automatic anatomical brain MRI segmentation combining label propagation
          and decision fusion. NeuroImage, 33(1):115–26, October 2006.

                                               42
Goksel et al: Multi-Atlas Segmentation and Localization

[IK09]    Juan Eugenio Iglesias and Nico Karssemeijer. Robust initial detection of landmarks in
          film-screen mammograms using multiple FFDM atlases. IEEE Transactions on Medical
          Imaging, 28(11):1815–24, November 2009.

[Kol06]   Vladimir Kolmogorov. Convergent Tree-Reweighted Message Passing for Energy Mini-
          mization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28:1568–
          1583, 2006.

[RAH+ 06] Daniel Rueckert, Paul Aljabar, Rolf a Heckemann, Joseph V Hajnal, and Alexander
          Hammers. Diffeomorphic registration using B-splines. In Medical Image Computing
          and Computer-Assisted Intervention, pages 702–709, January 2006.


                                              43