VISCERAL – VISual Concept Extraction challenge in RAdioLogy: ISBI 2014 Challenge Organization

Oscar Alfonso Jiménez del Toro1, Orcun Goksel2, Bjoern Menze2, Henning Müller1, Georg Langs3, Marc-André Weber4, Ivan Eggel1, Katharina Gruenberg4, Markus Holzer3, András Jakab3, Georgios Kontokotsios5, Markus Krenn3, Tomàs Salas Fernandez6, Roger Schaer1, Abdel Aziz Taha5, Marianne Winterstein4, Allan Hanbury5

1 University of Applied Sciences Western Switzerland, Switzerland
2 Swiss Federal Institute of Technology (ETH) Zürich, Switzerland
3 Medical University of Vienna, Austria
4 University of Heidelberg, Germany
5 Vienna University of Technology, Austria
6 Catalan Agency for Health Information, Assessment and Quality, Spain

Abstract

The VISual Concept Extraction challenge in RAdioLogy (VISCERAL) project was developed as a cloud-based infrastructure for the evaluation of medical image analysis on large data sets. As part of this project, the ISBI 2014 (International Symposium on Biomedical Imaging) challenge was organized using the VISCERAL data set and shared cloud framework. Two tasks were selected to exploit and compare multiple state-of-the-art solutions designed for big data medical image analysis. Segmentation and landmark localization results from the submitted algorithms were compared to manually annotated ground truth in the VISCERAL data set. This paper presents an overview of the challenge setup and data set, as well as the evaluation metrics applied to the various results submitted to the challenge. The participants presented their algorithms during an organized session at ISBI 2014, with lively discussions in which the importance of comparing approaches on tasks sharing a common data set was highlighted.

Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: O. Goksel (ed.): Proceedings of the VISCERAL Organ Segmentation and Landmark Detection Benchmark at the 2014 IEEE International Symposium on Biomedical Imaging (ISBI), Beijing, China, May 1st, 2014, published at http://ceur-ws.org

1 Introduction

Computational approaches that can be scaled to large amounts of medical data are needed to tackle the ever-growing data resources produced daily by hospitals [Doi05]. For health professionals, handling this enormous amount of medical data during clinical routine is limited by its complexity and scale. It is also very time-consuming, and hence requires unsupervised and automatic methods to perform the necessary analysis and processing for data interpretation. Many algorithms and techniques for big data analysis already exist; however, most research groups do not have access to large-scale annotated medical data with which to develop such approaches for medical images. Distributing these big data sets (on the order of terabytes) requires efficient and scalable storage and computing capabilities. Evaluation campaigns and benchmarks can objectively compare multiple state-of-the-art algorithms to determine the optimal solution for a certain clinical task [HMLM14, GSdHKCDF+13].

The Visual Concept Extraction Challenge in Radiology (VISCERAL) project was developed as a cloud-based infrastructure for the evaluation of medical image analysis techniques on large data sets [LMMH13]. The shared cloud environment in which the VISCERAL project takes place allows these data to be accessed and processed without duplicating them or moving them to the participants' side.
Since the data are stored centrally and not distributed outside the cloud environment, the legal and ethical requirements attached to such data sets can also be satisfied; even confidential data sets can be benchmarked in this way, as participants can only access a small training data set [EILI+10]. The cloud infrastructure is provided and funded by the VISCERAL project. Participants receive computationally powerful virtual machines in the shared cloud infrastructure, which they access remotely while working on the training data and tuning their algorithms. Participant access is withdrawn during the evaluation phase and only the organizers access the machines. The algorithms are thus brought to the data to perform automated processing and data mining. The performance of these methods can therefore be evaluated on real clinical imaging data, and the outcomes can be reused to improve the methods.

The whole-body 3D medical imaging data provided by VISCERAL includes a small subset with ground truth annotated by experienced radiologists. Through evaluation campaigns, challenges, benchmarks and competitions, tasks of general interest can be selected to compare the algorithms on a large scale. This manually annotated gold corpus can be used to identify high-quality methods, which can then be combined to create a much larger "reasonably annotated" data set that is satisfactory but perhaps not as reliable as manual annotation. This silver corpus will be created using fusion techniques, based on the agreement between the segmentations produced by the algorithms on a large-scale data set. This maximizes the gain from manual annotation and also identifies strong differences between participating systems on the annotated organs.

2 ISBI Challenge Framework

The registration procedure for the ISBI challenge was that of the VISCERAL benchmark series, which includes several campaigns. Participants filled in their details and uploaded a signed participation agreement form, which covers the ethics requirements for usage of the data. Since the VISCERAL data set is stored on the Azure cloud, each participant then received access to an Azure virtual cloud-computing instance. There were five operating systems to choose from: Windows 2012, Windows 2008, Ubuntu Server 14.04 LTS, openSUSE 13.1 and CentOS 6.5. All cloud-computing instances have an 8-core CPU with 16 GB RAM, so that the different proposed solutions are given the same computing capabilities. Participants get administrator rights on their virtual machine (VM) and can access it remotely to deploy their algorithms and add any supporting libraries or applications. The VISCERAL training data set can then be accessed and downloaded securely within the VMs through secured URL links.

Figure 1: The ISBI training set.

2.1 Data Set

The medical images contained in the VISCERAL data set were acquired during daily clinical routine. Data sets of children (<18 years) were not included, following the recommendations of the ethics committee. In the provided data sets, multiple organs are visible and depicted at a resolution sufficient to reliably detect each organ and delineate its borders. This ensures that a large number of organs and structures can be segmented in a single data set.
The data set consists of computed tomography (CT) scans and magnetic resonance (MR) imaging with and without contrast enhancement, so that the participants' algorithms can be evaluated on several modalities, contrasts and MR sequence directions, making sure that algorithms are not optimized for one specific machine or protocol. The participants of the ISBI VISCERAL challenge used the training set available from the VISCERAL Anatomy2 benchmark. The contents of this data set are detailed below.

2.1.1 CT Scans

There are 15 unenhanced whole-body CT volumes acquired from patients with bone marrow neoplasms, such as multiple myeloma, to detect osteolysis. The field of view spans from (and including) the head to the knees (see Fig. 2, A). The in-plane resolution ranges from 0.977/0.977 mm to 1.405/1.405 mm, and the between-plane resolution is 3 mm or more.

15 contrast-enhanced CT scans of the trunk, acquired in patients with malignant lymphoma, are also included. They have a large field of view from the corpus mandibulae to the lower part of the pelvis (see Fig. 2, B). They have an in-plane resolution between 0.604/0.604 mm and 0.793/0.793 mm, and a between-plane resolution of at least 3 mm.

2.1.2 MR Scans

15 whole-body MR scans in two sequences (30 in total) are also part of the training set. They were acquired in patients with multiple myeloma to detect focal and/or diffuse bone marrow infiltration. Both a coronal T1-weighted sequence and a fat-suppressed T2-weighted or STIR (short tau inversion recovery) sequence of the whole body are available for each of the 15 patients. The field of view starts at (and includes) the head and ends at the feet (see Fig. 2, C). The in-plane resolution is 1.250/1.250 mm, and the between-plane resolution is 5 mm.

Figure 2: Sample data set volumes. A) Whole-body unenhanced CT; B) contrast-enhanced CT of the trunk; C) whole-body unenhanced MR; D) contrast-enhanced MR of the abdomen.

To improve the segmentation of smaller organs (such as the adrenal glands), 15 T1 contrast-enhanced fat-saturated MR scans of the abdomen are also included. They were acquired in oncological patients with likely metastases within the abdomen. The field of view starts at the top of the diaphragm and extends to the lower part of the pelvis (see Fig. 2, D). They have an in-plane resolution between 0.840/0.804 mm and 1.302/1.302 mm, and a between-plane resolution of 3 mm.

2.1.3 Annotated Structures and Landmarks

There are in total 60 manually annotated volumes in this ISBI challenge training set. The available data contain segmentations and landmarks of several anatomical structures in different imaging modalities, e.g. CT and MRI. The two categories of annotations and results are:

• Region segmentations: These regions correspond to anatomical structures (e.g. right lung) or sub-parts in volume data. The 20 anatomical structures that make up the training set are: trachea, left/right lungs, sternum, vertebra L1, left/right kidneys, left/right adrenal glands, left/right psoas major muscles, left/right rectus abdominis muscles, thyroid gland, liver, spleen, gallbladder, pancreas, urinary bladder and aorta. Not all structures are visible or within the field of view in all images, leading to varying numbers of annotations per structure (see Fig. 1 for a detailed breakdown).
• Landmarks: Anatomical landmarks are the locations of selected anatomical structures that should be identifiable in the different image sequences available in the data set. Up to 53 anatomical landmarks (see Fig. 1) are located in the data set volumes: left/right clavicles, left/right crista iliaca, symphysis, left/right trochanter major, left/right trochanter minor, aortic arch, trachea bifurcation, aorta bifurcation, vertebrae C2–C7, Th1–Th12, L1–L5, xyphoideus, aortic valve, left/right sternoclavicular, VCI bifurcation, left/right tuberculums, left/right renal pelvises, left/right bronchus, left/right eyes, left/right ventricles, left/right ischiadicum and coronaria.

In total, the training set comprises 60 volumes containing 890 manually segmented anatomical structures and 2420 manually located anatomical landmarks. Some of the anatomical structures in the volumes were not segmented when the annotators considered that there was insufficient tissue contrast to perform the segmentation or to locate the landmark. Other structures are missing or not included in the training set because of anatomical variations (e.g. a missing kidney) or radiologic pathological signs (e.g. an aortic aneurysm). Landmarks are easy and quick to annotate, whereas precise organ segmentation is time-consuming even when using automatic tools.

2.1.4 Test Set

The test set contains 20 manually annotated volumes. Each modality (whole-body CT, thorax and abdomen contrast-enhanced CT, whole-body MR and abdomen contrast-enhanced MR) is represented by 5 volumes. The anatomical structures and landmarks contained in the selected volumes were used to evaluate the participants' algorithms.

2.2 ISBI VISCERAL Challenge Submission

Participants can select the structures and modalities for which they choose to participate. The outputs are therefore evaluated per structure and per modality. The evaluation of the ISBI challenge was organized differently from the general VISCERAL evaluation framework so that the evaluation could be completed in the relatively short time frame available. For this challenge, the test set volumes were made available in the cloud some weeks ahead of the challenge. The participants themselves computed the annotations (segmentations and/or landmark locations) in their VMs and stored them on their VM storage. The files could then be submitted from within the VM through an uploading script provided to the participants. The script stored the output files in a cloud container created individually for each participant for the challenge. A list of the available ground truth segmentations of the test set was used to filter out duplicates and output files with incorrect file names, and to ensure that all files were consistent with the organizers' participant ID list.

2.3 Evaluation Software

The VISCERAL evaluation tool was used to evaluate the output segmentations and landmark locations against the ground truth. This software was also included in the VM assigned to each participant. The evaluation tool implements different evaluation metrics, such as (1) distance-based metrics, (2) spatial overlap metrics and (3) probabilistic and information-theoretic metrics. The most suitable subset of the metrics was used in the analysis of the results, and all metrics were made available to the participants.

For the output segmentations of the ISBI challenge, the following evaluation metrics were selected:

• DICE coefficient [ZWB+04]
• Adjusted Rand Index [VPYM11]
• Interclass Correlation [GJC01]
• Average distance [KCAB09]

Figure 3: Anatomical structure segmentation task: DICE coefficient results. Contrast-enhanced CT scans of the thorax and abdomen.

Only one label is considered per image: the voxel value is either zero (background) or one for voxels belonging to the segmentation. A threshold of 0.5 is applied to create binary images in case the output label is a fuzzy membership or a probability map.

For the landmark localization evaluation, the same VISCERAL tool measures the landmark-specific average error (Euclidean distance) between the submitted results and the manually located landmarks. The percentage of detected landmarks per volume (i.e. landmarks detected / landmarks present in the volume) is also computed.
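To make these two measures concrete, the following is a minimal sketch (Python/NumPy) of how a DICE overlap and a landmark error summary could be computed. It is an illustration only, not the VISCERAL evaluation tool itself; the array and dictionary inputs, the `spacing` parameter and the function names are assumptions made for the example.

```python
import numpy as np

def dice_coefficient(output, ground_truth, threshold=0.5):
    """DICE overlap between a (possibly fuzzy) output and a binary ground truth.

    Fuzzy memberships or probability maps are binarised at `threshold`,
    mirroring the 0.5 cut-off described above.
    """
    seg = np.asarray(output) >= threshold
    gt = np.asarray(ground_truth) >= threshold
    denom = seg.sum() + gt.sum()
    if denom == 0:                      # both masks empty: treat as perfect overlap
        return 1.0
    return 2.0 * np.logical_and(seg, gt).sum() / denom

def landmark_errors(detected, reference, spacing=(1.0, 1.0, 1.0)):
    """Average Euclidean error (in mm) and percentage of detected landmarks.

    `detected` and `reference` map landmark names to voxel coordinates;
    a landmark missing from `detected` counts as not found.  `spacing`
    converts voxel indices to millimetres.
    """
    spacing = np.asarray(spacing, dtype=float)
    errors = [
        np.linalg.norm((np.asarray(detected[name]) - np.asarray(pos)) * spacing)
        for name, pos in reference.items()
        if name in detected
    ]
    detection_rate = 100.0 * len(errors) / len(reference)
    average_error = float(np.mean(errors)) if errors else float("nan")
    return average_error, detection_rate
```

In the benchmark itself these numbers were produced by the evaluation software installed on each VM; the sketch only spells out the underlying definitions.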
2.4 Participation

The ISBI training and test set volumes were made available through the Azure cloud framework to all registered participants of the VISCERAL Anatomy2 benchmark. In total, 18 groups obtained access to the challenge training set and its 60 training volumes. The research groups that submitted working virtual machines had the chance to present their methods and results at the "VISCERAL Organ Segmentation and Landmark Detection Challenge" at the 2014 IEEE International Symposium on Biomedical Imaging (ISBI). A single-blind review process was applied to the initial abstract submissions. The authors of accepted abstracts were then invited to submit a short paper presenting their methods and results in the challenge. Five high-quality submissions were accepted and included in these proceedings.

Spanier et al. [SJ14] submitted segmentations for five organs in contrast-enhanced CT volumes. Their multi-step algorithm combines thresholding and region-growing techniques to segment each organ individually. It starts by locating a region of interest and identifying the largest axial cross-section slices of the selected structure. It then improves the initial segmentation with morphological operators, and a final step performs 3D region growing.

Huang et al. [HLJ14] proposed a coarse-to-fine liver segmentation using prior models for the shape, profile appearance and contextual information of the liver. An AdaBoost voxel-based classifier creates a liver probability map, which is refined in the last step with free-form deformation using a gradient appearance model.

Table 1: Anatomical structure segmentation task: DICE coefficient results. Contrast-enhanced and unenhanced CT scan submissions.

Wang et al. [WS14] segmented 10 anatomical structures in contrast-enhanced and unenhanced CT scans. Their multi-organ segmentation pipeline follows a top-down approach, starting with a model-based level-set segmentation of the ventral cavity. After dividing the cavity into the thoracic and abdominopelvic cavities, the major structures are segmented and their location information is passed on to the lower-level structures.

Jiménez del Toro et al. [JdTM14] segmented structures in CT and contrast-enhanced CT scans with a hierarchical multi-atlas approach. Based on the spatial anatomical correlations between the organs, the larger and higher-contrast organs are segmented first.
These then define the initial volume transformations for the smaller structures with less well-defined boundaries.

Goksel et al. [GGS14] submitted anatomical structure segmentations for both CT and MR, and also submitted results for the landmark localization task. For the segmentations they use a multi-atlas technique that implements Markov random fields to guide the registrations. A multi-atlas, template-based approach fuses the different estimations to detect the landmarks.

3 Results

Approximately 500 structure segmentations and 211 landmark locations were submitted to the VISCERAL ISBI challenge. Four participants submitted segmentation results for multiple organs in whole-body CT or contrast-enhanced CT scans, with results presented in Table 1 and Fig. 3. One participant contributed segmentations on both the whole-body MR scans and the contrast-enhanced MR abdomen volumes, with results presented in Table 3. Only one participant submitted landmark localization results; Table 4 shows their evaluation results.

Table 2: Anatomical structure segmentation task: Average Distance results. Contrast-enhanced and unenhanced CT scan submissions.

Table 3: Evaluation metrics for the MR scan submissions.

Table 4: Landmark results.

4 Conclusions and Future Work

The main objective of the VISCERAL project is the evaluation of algorithms on large data sets. The proposed VISCERAL infrastructure allows evaluations with private or restricted data, such as electronic health records, without giving participants access to the test data, by using a fully cloud-based approach. This infrastructure also avoids moving data, which can be difficult for very large data sets: the algorithms are brought to the data and not the data to the algorithms.

Both the gold corpus and the silver corpus will be made available as a resource to the community. The ISBI test set volumes and annotations are now available and are part of the VISCERAL Anatomy2 benchmark training set. So far, both past VISCERAL anatomy benchmarks have addressed organ segmentation and landmark localization tasks. Two more benchmarks are under development in the VISCERAL project, a retrieval benchmark and a detection benchmark. The retrieval benchmark will address the retrieval of similar cases based on both visual information and radiology reports. The detection benchmark will focus on the detection of lesions in MR and CT images.

In the future, the automation of the evaluation process is intended to reduce the need for intervention from the organizers to a minimum and to provide faster evaluation feedback to the participants. The participants will then be able to submit their algorithms through the cloud virtual machines and obtain the calculated metrics directly from the system. Such a system could also store the results from all submitted algorithms and perform an objective comparison with state-of-the-art algorithms. Through the involvement of the research community, the VISCERAL framework could produce novel tools for the clinical workflow that have a substantial impact on diagnosis quality and treatment success. Having all tools and algorithms in the same cloud environment can also help to combine tools and approaches with very little additional effort, which is expected to yield better results.
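As an illustration of the silver-corpus idea described in Section 1, where annotations are derived from the agreement between participant segmentations, the following is a minimal majority-voting label fusion sketch in Python/NumPy. It is an assumption-laden example rather than the actual VISCERAL fusion pipeline: it presumes co-registered binary masks of a single structure and a simple voxel-wise vote, whereas the project's fusion may weight algorithms differently.

```python
import numpy as np

def majority_vote_fusion(segmentations, min_agreement=0.5):
    """Fuse binary segmentations of one structure by voxel-wise voting.

    `segmentations` is a list of equally shaped binary arrays, one per
    participating algorithm, already resampled to a common reference space.
    A voxel is kept in the fused (silver-corpus style) label when at least
    `min_agreement` of the algorithms mark it as foreground.
    """
    stack = np.stack([np.asarray(s, dtype=bool) for s in segmentations])
    agreement = stack.mean(axis=0)        # fraction of algorithms that agree
    return agreement >= min_agreement

# Hypothetical usage with three synthetic "algorithm outputs" for one organ.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    masks = [rng.random((4, 4, 4)) > 0.4 for _ in range(3)]
    fused = majority_vote_fusion(masks)
    print(int(fused.sum()), "voxels retained in the fused label")
```

Voxel-wise voting of this kind also exposes where the algorithms disagree strongly, which matches the stated goal of identifying differences between participating systems on the annotated organs.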
5 Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 318068 (VISCERAL). We would also like to thank Microsoft Research for their financial and informational support in using the Azure cloud for the benchmark.

References

[Doi05] K. Doi. Current status and future potential of computer-aided diagnosis in medical imaging. British Journal of Radiology, 78:3–19, 2005.

[EILI+10] Bernice Elger, Jimison Iavindrasana, Luigi Lo Iacono, Henning Müller, Nicolas Roduit, Paul Summers, and Jessica Wright. Strategies for health data exchange for secondary, cross-institutional clinical research. Computer Methods and Programs in Biomedicine, 99(3):230–251, September 2010.

[GGS14] Orcun Goksel, Tobias Gass, and Gabor Szekely. Segmentation and landmark localization based on multiple atlases. In Orcun Goksel, editor, Proceedings of the VISCERAL Challenge at ISBI, CEUR Workshop Proceedings, pages 37–43, Beijing, China, May 2014.

[GJC01] Guido Gerig, Matthieu Jomier, and Miranda Chakos. A new validation tool for assessing and improving 3D object segmentation. In Wiro J. Niessen and Max A. Viergever, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2001, volume 2208 of Lecture Notes in Computer Science, pages 516–523. Springer Berlin Heidelberg, 2001.

[GSdHKCDF+13] Alba García Seco de Herrera, Jayashree Kalpathy-Cramer, Dina Demner Fushman, Sameer Antani, and Henning Müller. Overview of the ImageCLEF 2013 medical tasks. In Working Notes of CLEF 2013 (Cross Language Evaluation Forum), September 2013.

[HLJ14] Cheng Huang, Xuhui Li, and Fucang Jia. Automatic liver segmentation using multiple prior knowledge models and free-form deformation. In Orcun Goksel, editor, Proceedings of the VISCERAL Challenge at ISBI, CEUR Workshop Proceedings, pages 22–24, Beijing, China, May 2014.

[HMLM14] Allan Hanbury, Henning Müller, Georg Langs, and Bjoern H. Menze. Cloud-based evaluation framework for big data. In Alex Galis and Anastasius Gavras, editors, Future Internet Assembly (FIA) Book 2013, Springer LNCS, pages 104–114. Springer Berlin Heidelberg, 2014.

[JdTM14] Oscar Alfonso Jiménez del Toro and Henning Müller. Hierarchical multi-structure segmentation guided by anatomical correlations. In Orcun Goksel, editor, Proceedings of the VISCERAL Challenge at ISBI, CEUR Workshop Proceedings, pages 32–36, Beijing, China, May 2014.

[KCAB09] Hassan Khotanlou, Olivier Colliot, Jamal Atif, and Isabelle Bloch. 3D brain tumor segmentation in MRI using fuzzy classification, symmetry analysis and spatially constrained deformable models. Fuzzy Sets and Systems, 160(10):1457–1473, 2009. Special Issue: Fuzzy Sets in Interdisciplinary Perception and Intelligence.

[LMMH13] Georg Langs, Henning Müller, Bjoern H. Menze, and Allan Hanbury. VISCERAL: Towards large data in medical imaging – challenges and directions. Lecture Notes in Computer Science, 7723:92–98, 2013.

[SJ14] Assaf B. Spanier and Leo Joskowicz. Rule-based ventral cavity multi-organ automatic segmentation in CT scans. In Orcun Goksel, editor, Proceedings of the VISCERAL Challenge at ISBI, CEUR Workshop Proceedings, pages 16–21, Beijing, China, May 2014.

[VPYM11] Nagesh Vadaparthi, Suresh Varma Penumatsa, Srinivas Yarramalle, and P. S. R. Murthy. Segmentation of brain MR images based on finite skew Gaussian mixture model with fuzzy C-means clustering and EM algorithm. International Journal of Computer Applications, 28:18–26, 2011.
[WS14] Chunliang Wang and Örjan Smedby. Automatic multi-organ segmentation using fast model based level set method and hierarchical shape priors. In Orcun Goksel, editor, Proceedings of the VISCERAL Challenge at ISBI, CEUR Workshop Proceedings, pages 25–31, Beijing, China, May 2014.

[ZWB+04] Kelly H. Zou, Simon K. Warfield, Aditya Bharatha, Clare M. C. Tempany, Michael R. Kaus, Steven J. Haker, William M. Wells III, Ferenc A. Jolesz, and Ron Kikinis. Statistical validation of image segmentation quality based on a spatial overlap index. Academic Radiology, 11(2):178–189, 2004.