<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MULTI-ORGAN SEGMENTATION USING SIMPLIFIED DENSE V-NET WITH POST PROCESSING</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Feng</surname><given-names>Ming</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Huang</surname><given-names>Weiquan</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Wang</surname><given-names>Yin</given-names></name>
        </contrib>
        <contrib contrib-type="author">
          <name><surname>Xie</surname><given-names>Yuxia</given-names></name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tongji University</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With recent advances in the field of computer vision, Convolutional Neural Networks (CNNs) are widely used for organ segmentation of computed tomography (CT) images. Based on the Dense V-net model, this paper proposes a simplified version with postprocessing methods that help reduce fragments in organ segmentation results. Compared with the baseline method, which uses a sharpmask model with a conditional random field (SM+CRF), our model improves the Dice score of the Esophagus, Heart, Trachea, and Aorta by 10%, 4%, 7%, and 6%, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>CT Segmentation</kwd>
        <kwd>Dense V-net</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Organ segmentation of CT images is of great importance
in medical diagnosis. The identification and localization of
organs are part of the daily work of radiologists. Since CT
images are complex and three-dimensional (3D), distinguishing
organs manually is a difficult and tedious task. Therefore,
automatic segmentation using deep learning methods
has received a great deal of attention in medical imaging
research. In the field of 3D medical image segmentation, there
are two main approaches. The first is to segment each slice
independently, e.g., using the U-net model [1]. The other is to use
3D convolutions to aggregate inter-slice information and to
segment all slices of the CT image at once; V-net [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ] is
one of the 3D convolutional network models for this purpose.
Gibson et al. [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ] integrated the two-dimensional
segmentation model of Dense net [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ] into V-net and proposed a Dense
V-net architecture for multi-organ segmentation. Overall,
single-slice segmentation methods cannot utilize inter-slice
dependencies for better results but are computationally more
efficient. All-slice 3D segmentation can aggregate all layers
for better accuracy but is more expensive to compute.
      </p>
      <p>
        In this paper, we present our multi-organ segmentation
solution used in the SegTHOR challenge hosted at the ISBI’19
conference. Observing that the training set is relatively
small and that deep convolutional neural networks overfit it easily, we
simplify the Dense V-net model to achieve better results on
the test data. Our postprocessing method further reduces
fragments in the prediction mask. The overall improvement
over the SM+CRF baseline model [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ] is between 4 to 10
percent over different organs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. OUR MODEL</title>
      <p>The structure of our proposed model is shown in Fig. 1.
Compared with the original Dense V-net model, there are two
main differences. First, the input size is different: the input
size of the original model is 144<sup>3</sup>, but the number of
slices in part of our data is less than 144, so we set the input size to
128<sup>3</sup>. Second, the spatial prior block is discarded.</p>
      <p>The encoder block of the segmentation network generates
three sets of feature maps of different sizes. The decoder
block upsamples the smaller feature maps so the output mask
is of the same size as the input image. The output layer
generates the segmentation mask with the probability vector of
different segmentation classes at each pixel.</p>
    </sec>
    <sec id="sec-3">
      <title>3. IMPLEMENTATION</title>
      <p>This section discusses various optimization techniques to
reduce the Dice loss and to minimize the Hausdorff distance.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data preprocessing</title>
      <p>Preprocessing is part of our fully automated organ
segmentation method. By analyzing the training data provided, we find
the following issues.</p>
      <p>First, the dataset is small, making it easy to overfit our
deep neural networks. Second, within a single CT slice, the
proportions of pixels belonging to the various organs differ greatly. Fig. 2
shows the imbalance of different organs at different slices.
Last, depending on the relative position of the scanner and the
patient during scanning, the CT images can be scaled and
rotated. Based on these observations, we apply the following
techniques.</p>
      <p>Fig. 2. (a) The ratio of background to organs. (b) The ratio of organs.</p>
      <p>We ensure that each class is sampled with the same
probability. According to the slice range of the test dataset, the sample
block size is set to 128<sup>3</sup>.</p>
      <sec id="sec-4-1">
        <title>3.1.2. Data augmentation</title>
        <p>
          During the training stage, we randomly rotate the images (within
a -10° to 10° range) and randomly scale them (within a -10% to 10% range).
We implement the data augmentation on the NiftyNet
framework [
          <xref ref-type="bibr" rid="ref5">6</xref>
          ]. The data augmentation method used in the training
stage does not affect the structure of the Dense V-net.
        </p>
      </sec>
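      <p>As a rough sketch, the rotation and scaling augmentation described above can be implemented with SciPy. This is our own illustration, not the NiftyNet implementation; the function name and the corner-aligned crop/pad are our assumptions:</p>
      <preformat>
```python
import numpy as np
from scipy.ndimage import rotate, zoom

def augment(volume, rng=None):
    """Randomly rotate a CT volume in-plane within [-10, 10] degrees and
    scale it within [-10%, +10%], then crop/zero-pad back to the input shape."""
    rng = rng or np.random.default_rng()
    # in-plane rotation (axes 1 and 2 are the slice plane here)
    out = rotate(volume, angle=rng.uniform(-10.0, 10.0),
                 axes=(1, 2), reshape=False, order=1, mode="nearest")
    # isotropic scaling in [0.9, 1.1]
    out = zoom(out, 1.0 + rng.uniform(-0.10, 0.10), order=1, mode="nearest")
    # crop or zero-pad back to the original shape
    result = np.zeros_like(volume)
    idx = tuple(slice(0, min(a, b)) for a, b in zip(out.shape, volume.shape))
    result[idx] = out[idx]
    return result
```
      </preformat>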
    </sec>
    <sec id="sec-5">
      <title>3.2. Postprocessing</title>
      <p>By comparing the prediction results with the ground-truth
labels, we find the following issues.</p>
      <p>First, the organs are all connected in the training data, but
they are not connected in the predicted results. Second, some areas of the
predicted results are not smooth (Fig. 3). Third, multiple organ
inclusions appear in the same slice, which does not actually occur. Last,
in the prediction result an organ may be connected but contain
background noise inside.</p>
      <p>For the first issue above, we experimented with the
following method.</p>
      <p>The CT image is sliced along each of the three dimensions
in turn, and the number of connected blocks of each
organ is counted. For each dimension and each organ, the largest
connected block is retained; the other parts are considered
background noise and are removed. Experiments show
that this method achieves an obvious improvement; see Algorithm 1.</p>
      <p>Fig. 3. (a) Slice 119, label; (b) Slice 119, prediction;
(c) Slice 120, label; (d) Slice 120, prediction. The 119th and 120th
slices of patient 30 in the labeled data and the prediction result. The heart
disappears at the 120th slice in the labeled data; the sudden
disappearance of an organ often leads to incorrect predictions.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. DicePlusXEnt loss function</title>
      <p>
        The loss functions commonly used in segmentation are the
Cross-Entropy loss and the Dice loss. The Cross-Entropy loss
examines each pixel separately, comparing the prediction
with a one-hot encoded target vector. It does not consider the
imbalance between segmentation classes and can lead to
poor predictions for the minority classes. Imbalanced
classes are very common in medical image segmentation. The
Dice loss is essentially a measurement of the overlap between
the predicted mask and the ground-truth mask, calculated as
follows [
        <xref ref-type="bibr" rid="ref6">7</xref>
        ] :
        <disp-formula id="eq1">
          <tex-math><![CDATA[
l_{\mathrm{dice}} = \frac{2}{|K|} \sum_{k \in K} \frac{\sum_{i \in I} u_{ik} v_{ik}}{\sum_{i \in I} u_{ik} + \sum_{i \in I} v_{ik}} \qquad (1)
          ]]></tex-math>
        </disp-formula>
        where K is the set of segmentation classes, I is the entire
image, and u<sub>ik</sub>, v<sub>ik</sub> are the predicted and ground-truth values
of class k at pixel i, respectively. The Dice loss is better suited to
extremely imbalanced samples, but in our experience,
using the Dice loss alone adversely affects backpropagation,
making training extremely unstable.
      </p>
      <sec id="sec-6-1">
        <title>Algorithm 1 Axis-based denoise method</title>
        <p>Input: the prediction result from the model, T<sub>m</sub>;
Output: the denoised prediction result, Q<sub>m</sub>;
1: for all axis<sub>i</sub> of T<sub>m</sub> do
2:   for all slice<sub>j</sub> of axis<sub>i</sub> do
3:     for all category<sub>k</sub> of T<sub>m</sub> do
4:       Set slice[-1] and slice[max + 1] to 1;
5:       if the current slice contains category<sub>k</sub> and the
previous slice does not contain category<sub>k</sub> then
6:         add the current slice index to blockIn;
7:       end if
8:       if the current slice contains category<sub>k</sub> and the
next slice does not contain category<sub>k</sub> then
9:         add the current slice index to blockOut;
10:      end if
11:    end for
12:  end for
13:  Each element of blockIn corresponds to one element of
blockOut; each pair represents a contiguous block, and the
index difference gives the block length. The contiguous block
of maximum length is kept, and the other contiguous blocks
in Q<sub>m</sub> are set to the background class;
14: end for
15: return Q<sub>m</sub>;</p>
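        <p>A minimal NumPy sketch of the axis-based denoise in Algorithm 1, assuming the prediction is a 3D array of integer class labels (the function and variable names are our own):</p>
        <preformat>
```python
import numpy as np

def axis_denoise(pred, num_classes):
    """Algorithm 1: along each axis, keep only the longest contiguous run
    of slices containing each organ class; other runs become background."""
    out = pred.copy()
    for axis in range(3):
        moved = np.moveaxis(out, axis, 0)  # view: writes go through to `out`
        for k in range(1, num_classes):    # class 0 is background
            presence = (moved == k).any(axis=(1, 2))
            # run boundaries, using implicit empty sentinel slices at both
            # ends (the slice[-1] / slice[max + 1] trick in Algorithm 1)
            padded = np.concatenate(([False], presence, [False]))
            starts = np.flatnonzero(padded[1:] & ~padded[:-1])  # blockIn
            ends = np.flatnonzero(~padded[1:] & padded[:-1])    # blockOut (exclusive)
            if len(starts) <= 1:
                continue
            keep = np.argmax(ends - starts)  # longest contiguous block
            for r, (s, e) in enumerate(zip(starts, ends)):
                if r != keep:
                    block = moved[s:e]
                    block[block == k] = 0    # set to background class
    return out
```
        </preformat>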
        <p>
          We use DicePlusXEnt loss [
          <xref ref-type="bibr" rid="ref7">8</xref>
          ], which is the sum of the
Cross-Entropy loss and the Dice loss:
          <disp-formula id="eq2">
            <tex-math><![CDATA[
l_{\mathrm{total}} = l_{\mathrm{dice}} + l_{\mathrm{CE}} \qquad (2)
            ]]></tex-math>
          </disp-formula>
        </p>
        <p>This loss function alleviates the sample imbalance to a
certain extent and improves the stability of network training.</p>
        <p>Due to the imbalance of the samples, we set the per-class weights of
the Cross-Entropy loss in DicePlusXEnt as: w(Background) = 1,
w(Heart) = 2, w(Trachea) = 3, w(Aorta) = 4, and w(Esophagus) = 5.</p>
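        <p>A NumPy sketch of the combined loss, treating the Dice term as one minus the mean per-class overlap of Eq. (1), which is the common convention; the function name and the eps smoothing constant are our assumptions:</p>
        <preformat>
```python
import numpy as np

def dice_plus_xent(probs, onehot, class_weights, eps=1e-7):
    """l_total = l_dice + l_CE (Eq. 2). `probs` are softmax outputs and
    `onehot` the one-hot targets, both shaped (num_pixels, num_classes)."""
    # Dice term: one minus the mean per-class overlap of Eq. (1)
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    l_dice = 1.0 - (2.0 * inter / (denom + eps)).mean()
    # Weighted cross-entropy term (weights counteract class imbalance)
    w = np.asarray(class_weights, dtype=float)
    l_ce = -(w * onehot * np.log(probs + eps)).sum(axis=1).mean()
    return l_dice + l_ce
```
        </preformat>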
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. EXPERIMENTS</title>
      <p>
        Our experiment is conducted on the SegTHOR dataset [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ].
NiftyNet, which is implemented in TensorFlow, is used for our
model training. Based on the preprocessed data, the Dense
V-net network is trained and then fine-tuned with different
parameter configurations.
      </p>
      <p>The activation function used in the network is Leaky
ReLU. The batch size is four. We use the Adam optimizer
with an initial learning rate of 0.01.</p>
      <p>Fig. 4. From top to bottom: the main view and the left view of
the true label, the predicted result, and the 3D-denoised result. The small
fragments are significantly reduced.</p>
      <sec id="sec-7-1">
        <title>Algorithm 2 Training model</title>
        <p>
Input: the training data X and labels Y; the number of models to fuse, N; the learning-rate list L;
Output: segmentation result R;
1: for all n<sub>i</sub> in range(N) do
2:   for all l<sub>i</sub> ∈ L do
3:     while the loss decreases within 500 iterations do
4:       forward and backward pass;
5:     end while
6:   end for
7:   save the model with the lowest validation-set loss during this iteration;
8: end for
9: fuse the saved models to obtain R<sub>ori</sub>;
10: R ← Axis-based denoise(R<sub>ori</sub>);
11: return R;
* The Dense V-net used here is the simplified Dense V-net.</p>
        <p>If the loss does not decrease within 500 iterations, the learning rate
is divided by ten, down to a floor of 0.0001. When the learning rate is
0.0001 and the loss still does not change after 500 iterations,
the learning rate is reset to 0.1. This process is repeated seven
times, and the model with the lowest validation loss during
training is selected for comparison. In addition,
we take the parameters with the minimum validation-set loss
in each training cycle, seven models in total, and fuse their
results together for comparison [
          <xref ref-type="bibr" rid="ref8">9</xref>
          ]; see Algorithm 2. Table 1
shows the results with different settings.
        </p>
        <p>Overall, the fusion results are much better than the
single-model prediction. Denoising in postprocessing further improves
the accuracy. Heart and Aorta have much better segmentation
results than Esophagus and Trachea.</p>
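        <p>The paper does not spell out the fusion scheme for the seven saved models; a per-voxel majority vote is one plausible sketch (the function name is our own):</p>
        <preformat>
```python
import numpy as np

def fuse_masks(masks):
    """Fuse label masks from several models by per-voxel majority vote
    (one plausible fusion scheme; the paper does not detail its own)."""
    stack = np.stack(masks).astype(np.int64)   # (n_models, D, H, W)
    n_classes = stack.max() + 1
    # count votes for each class at every voxel, then take the argmax
    votes = np.zeros((n_classes,) + stack.shape[1:], dtype=np.int64)
    for k in range(n_classes):
        votes[k] = (stack == k).sum(axis=0)
    return votes.argmax(axis=0)
```
        </preformat>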
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5. CONCLUSION</title>
      <p>
        Based on the analysis of the training data, we simplified
Dense V-net to perform multi-organ segmentation effectively.
We use a variety of optimization techniques, such as
multi-scale prediction, data augmentation, and data postprocessing,
to improve the stability and performance of the model.
Compared to the baseline SM+CRF model [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ], the Dice score
of organ segmentation is improved by up to 10%. After our
optimization, there is still room for improvement for small
organs, and delineation algorithms could help to refine organ
boundaries.
      </p>
    </sec>
    <sec id="sec-9">
      <title>6. REFERENCES</title>
      <p>[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox,
“U-net: Convolutional networks for biomedical image
segmentation,” in International Conference on
Medical image computing and computer-assisted intervention.</p>
      <p>Springer, 2015, pp. 234–241.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Fausto</given-names>
            <surname>Milletari</surname>
          </string-name>
          , Nassir Navab, and
          <string-name>
            <given-names>Seyed-Ahmad</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          , “
          <article-title>V-net: Fully convolutional neural networks for volumetric medical image segmentation</article-title>
          ,
          <source>” in 2016 Fourth International Conference on 3D Vision (3DV)</source>
          . IEEE,
          <year>2016</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>571</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Eli</given-names>
            <surname>Gibson</surname>
          </string-name>
          , Francesco Giganti, Yipeng Hu, Ester Bonmati, Steve Bandula, Kurinchi Gurusamy, Brian Davidson, Stephen P Pereira,
          <string-name>
            <given-names>Matthew J</given-names>
            <surname>Clarkson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dean C</given-names>
            <surname>Barratt</surname>
          </string-name>
          , “
          <article-title>Automatic multi-organ segmentation on abdominal CT with dense V-networks</article-title>
          ,”
          <source>IEEE Transactions on Medical Imaging</source>
          , vol.
          <volume>37</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>1822</fpage>
          -
          <lpage>1834</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Gao</given-names>
            <surname>Huang</surname>
          </string-name>
          , Zhuang Liu,
          <string-name>
            <given-names>Laurens</given-names>
            <surname>Van Der Maaten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kilian Q</given-names>
            <surname>Weinberger</surname>
          </string-name>
          , “
          <article-title>Densely connected convolutional networks</article-title>
          ,
          <source>” in Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4700</fpage>
          -
          <lpage>4708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Roger</given-names>
            <surname>Trullo</surname>
          </string-name>
          , Caroline Petitjean, Su Ruan,
          <string-name>
            <given-names>Bernard</given-names>
            <surname>Dubray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D</given-names>
            <surname>Shen</surname>
          </string-name>
          , “
          <article-title>Segmentation of organs at risk in thoracic ct images using a sharpmask architecture and conditional random fields</article-title>
          ,” in
          <source>2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017)</source>
          . IEEE,
          <year>2017</year>
          , pp.
          <fpage>1003</fpage>
          -
          <lpage>1006</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Eli</given-names>
            <surname>Gibson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wenqi</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carole</given-names>
            <surname>Sudre</surname>
          </string-name>
          , Lucas Fidon,
          <string-name>
            <given-names>Dzhoshkun I</given-names>
            <surname>Shakir</surname>
          </string-name>
          , Guotai Wang,
          <string-name>
            <given-names>Zach</given-names>
            <surname>Eaton-Rosen</surname>
          </string-name>
          , Robert Gray, Tom Doel,
          <string-name>
            <given-names>Yipeng</given-names>
            <surname>Hu</surname>
          </string-name>
          , et al.,
          “
          <article-title>NiftyNet: a deep-learning platform for medical imaging</article-title>
          ,”
          <source>Computer Methods and Programs in Biomedicine</source>
          , vol.
          <volume>158</volume>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>122</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Carole H</given-names>
            <surname>Sudre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wenqi</given-names>
            <surname>Li</surname>
          </string-name>
          , Tom Vercauteren,
          <string-name>
            <given-names>Sebastien</given-names>
            <surname>Ourselin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M Jorge</given-names>
            <surname>Cardoso</surname>
          </string-name>
          , “
          <article-title>Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations</article-title>
          ,” in
          <source>Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support</source>
          , pp.
          <fpage>240</fpage>
          -
          <lpage>248</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Fabian</given-names>
            <surname>Isensee</surname>
          </string-name>
          , Jens Petersen, Andre Klein, David Zimmerer, Paul F Jaeger,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Kohl</surname>
          </string-name>
          , Jakob Wasserthal, Gregor Koehler, Tobias Norajitra,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Wirkert</surname>
          </string-name>
          , et al.,
          <article-title>“nnU-net: Self-adapting framework for U-net-based medical image segmentation</article-title>
          ,” arXiv preprint arXiv:1809.10486,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Leslie N</given-names>
            <surname>Smith</surname>
          </string-name>
          , “
          <article-title>Cyclical learning rates for training neural networks</article-title>
          ,” in
          <source>2017 IEEE Winter Conference on Applications of Computer Vision (WACV)</source>
          . IEEE,
          <year>2017</year>
          , pp.
          <fpage>464</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>