<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dilated U-Net based Segmentation of Organs at Risk in Thoracic CT Images</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>BML Munjal University</institution>
          ,
          <addr-line>Haryana - 122413</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CSIR-Central Scientific Instruments Organisation</institution>
          ,
          <addr-line>Chandigarh - 160030</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Manoj Satya Kumar Gali</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cancer is one of the leading causes of death across the globe, and projection of radiation towards the tumor is a standard treatment for cancer. The first step of irradiation is to delineate the tumor from the organs near it. Unlike previous methods of CT segmentation, this paper proposes a procedure that segments the organs of the thoracic region (Heart, Aorta, Trachea and Esophagus) individually in CT images and merges the results to form a multi-organ segmented image. The aim of this method is to avoid the coarse output of dilated U-Nets. The overlapping issue is addressed by calculating the mode of the eight-neighborhood of pixels. The performance of the proposed technique was tested on 60 CT scans collected from the SegTHOR challenge. The Dice ratio and Hausdorff distance were used as evaluation metrics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Index Terms— Hausdorff distance, dilated U-Nets, eight-neighborhood, coarse output, CT segmentation.</p>
      <p>Cancer is the second leading cause of death across
the globe, and projection of radiation is a crucial step in the
treatment of esophageal and lung cancer. The radiation dose
has to be delivered with great precision, because it delivers
packets of high energy (in general, X-rays) to kill
cancer cells while keeping side effects low. The success rate of
radiation therapy is determined by the cell growth of the
tumor-affected organs and of the organs near them
(called organs at risk) before and after the treatment.
Radiation does not kill a cell directly; it damages the DNA of
the cell. This disrupts the process of tumor cell division
(called abortive mitosis), and sometimes this can also happen
to organs near the tumor. Therefore, the delineation of organs
needs to be done carefully and accurately. Radiation therapy
is a treatment of choice for lung and esophageal cancer. The
irradiation of organs begins with the segmentation of the
target tumor and the organs near it in computed tomography
(CT) images.</p>
      <p>In general, experts perform segmentation manually using
intensity levels and anatomical knowledge, e.g. the Esophagus is
located behind the Heart, the Trachea is above the Spinal cord,
etc. This manual process is costly, time consuming and tedious,
which has led to the development of techniques for automatic
segmentation of organs to assist the doctor.</p>
      <p>Automatic segmentation of organs is quite challenging,
and achieving high accuracy is difficult due to several
factors, such as acquiring volumetric data, low-contrast
images, organ sizes that vary from patient to patient,
similarity between the shapes of organs, and overfitting
towards high-intensity or better-structured organs.</p>
      <p>
        Recently developed deep learning
architectures perform quite well compared to traditional
methods, especially when working with large volumes of data
and with varied data such as audio, video, medical, social
and sensor data. The development of parallel GPUs,
publicly labeled datasets, and powerful frameworks such as
TensorFlow and Theano has made deep learning accessible and
has sped up the training of deep learning models.
Deep learning became a fuel for many computer vision
problems such as moving object detection, segmentation and
motion tracking [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Several works have addressed the automatic
segmentation of organs at risk in CT/MRI scans of different
parts of the body using deep learning techniques. A
review paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] explains segmentation techniques and detailed
algorithms, such as region-based, clustering and
classification methods, and their applications to MRI and CT
scans. Litjens et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] conducted a
survey on deep learning in medical image analysis,
describing convolutional neural network architectures
and their applications in medical image
analysis. He et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] worked on pelvic
organ segmentation using distinctive curve guided fully
convolutional networks (FCNs) to segment the Rectum,
Prostate and Bladder. Segmentation of organs at risk in
thoracic CT images by applying a sharpmask architecture
with an FCN followed by conditional random fields (CRFs) was
proposed by Trullo et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, the results need to be
improved further to aid surgeons. Herein, the authors
experimentally show the success rate of the architecture with a
standard dilated U-Net. An atrous/dilated
convolutional network [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] model is applied to each organ,
using the Dice ratio and Hausdorff distance as evaluation
metrics. To overcome the challenges mentioned above, this
work proposes a procedure that segments the organs of the
thoracic region, i.e. Heart, Aorta, Trachea and Esophagus,
individually in CT images and merges the results to form a
multi-organ segmented image.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. METHOD</title>
      <p>To segment the organs (Esophagus, Trachea, Heart and
Aorta) accurately from the raw CT images, we use
dilated/atrous convolution; the schematic diagram is
shown in Fig. 1. The proposed architecture was selected for
two reasons: 1) the size of the organs varies from patient to
patient and from slice to slice, and 2) the output of a
convolutional neural network in multi-organ segmentation is
coarse. A detailed explanation of the Dilated U-Net is given in
Section 2.1. Thereafter, the final segmented output is formed
by summation of the individual outputs coming from each
model.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Dilated U-Net</title>
      <p>
        The use of deep convolutional neural networks for fully
convolutional segmentation has been addressed
successfully in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, the repeated use of average
pooling and striding at successive layers reduces the spatial
resolution of the output feature maps. One common
approach is to recover the spatial resolution with
de-convolutional layers, as used in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but this requires additional time and
memory. Papandreou et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] used dilation kernels to
generate feature maps of the desired resolution at any layer.
This can be applied to a network in two ways: 1) as a post-processing
technique once the network is trained, or 2) as an integrated
model during training. In our case, we followed the second
approach.
      </p>
      <p>Let x[i] be a 1-D input signal and w[k] a kernel of length K.
The output y[i] of the dilated convolution with dilation rate r is</p>
      <p>y[i] = Σ_{k=1}^{K} x[i + r·k] w[k]   (1)</p>
      <p>The parameter r corresponds to the stride with which we
sample the input. Standard convolution is a special case of
dilated convolution with r = 1. Dilated convolution helps in
enlarging the field of view of the kernels at any layer of the
Dilated U-Net. The Dilated U-Net uses a small kernel (typically
3 × 3) in order to control the computational time and the number
of parameters. Dilated convolution with rate r introduces
r − 1 zeros between successive values of the kernel, i.e. it
enlarges a k × k kernel into a k_e × k_e kernel with
k_e = k + (k − 1)(r − 1), without increasing the amount of
computation or the number of parameters. It offers a
mechanism to find a trade-off between a small field of
view and a large field of view.</p>
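      <p>As a minimal illustrative sketch (NumPy, our own simplification, not the implementation used in the paper), the dilated convolution of equation 1 and the effective kernel size can be written as:</p>

```python
import numpy as np

def dilated_conv1d(x, w, rate=1):
    """1-D dilated (atrous) convolution, following Eq. (1):
    y[i] = sum_k x[i + rate*k] * w[k], over valid positions only."""
    k = len(w)
    # Effective kernel length: k_e = k + (k - 1) * (rate - 1)
    k_e = k + (k - 1) * (rate - 1)
    n_out = len(x) - k_e + 1
    y = np.empty(n_out)
    for i in range(n_out):
        # Sample the input every `rate` positions under the kernel
        y[i] = sum(x[i + rate * j] * w[j] for j in range(k))
    return y
```

      <p>With rate = 1 this reduces to a standard (valid) convolution; with rate = 2 a 3-tap kernel already covers a span of 5 input samples, which is exactly k_e = 3 + (3 − 1)(2 − 1) = 5.</p>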
      <p>We used a Dilated U-Net architecture with 14 layers, in
which the first six layers involve convolution, dilation, ReLU
and batch normalization, followed by average pooling after
every two consecutive layers. The last six layers are
upsampled using bilinear interpolation and concatenated with
the corresponding earlier layers, again followed by
convolution, ReLU and batch normalization. Convolution, ReLU
and batch normalization are applied to the seventh layer, which
is then passed to the dilation box. The detailed
architecture is shown in Figure 2.</p>
      <p>The feature maps of the seventh layer, after convolution,
ReLU and batch normalization, are passed separately to each
layer in the dilation box. In the dilation box, the output of
the seventh layer is convolved with different dilation rates
from r = 2^0 to 2^5, and the summation of these six outputs is
passed on to the next layer of the Dilated U-Net architecture,
as shown in Figure 3.</p>
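      <p>The dilation box above can be sketched as follows (a NumPy toy version, assuming "same" zero padding and the six rates r = 2^0 to 2^5; the function names are illustrative, not from the paper's code):</p>

```python
import numpy as np

def dilated_conv2d_same(x, w, rate):
    """2-D dilated convolution with zero padding so the output
    keeps the input's spatial size ('same' padding)."""
    k = w.shape[0]                      # assume a square k x k kernel
    k_e = k + (k - 1) * (rate - 1)      # effective kernel size
    pad = k_e // 2
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for di in range(k):
        for dj in range(k):
            # Each kernel tap reads the padded input shifted by rate
            out += w[di, dj] * xp[di * rate: di * rate + x.shape[0],
                                  dj * rate: dj * rate + x.shape[1]]
    return out

def dilation_box(x, w, rates=(1, 2, 4, 8, 16, 32)):
    """Apply the same kernel at dilation rates 2^0..2^5 and sum
    the six responses, as in the dilation box of Figure 3."""
    return sum(dilated_conv2d_same(x, w, r) for r in rates)
```

      <p>Because every branch uses "same" padding, all six responses share the input's spatial size and can be summed directly.</p>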
    </sec>
    <sec id="sec-4">
      <title>2.2 Post processing</title>
      <p>The outputs for the Heart, Esophagus, Aorta and
Trachea are combined to form an overall segmented result.
The drawback of this process is the overlapping of individual
organ regions. To overcome this problem, the mode of the
eight-neighborhood intensities of a location (i, j) is used,
as shown in equation 2:</p>
      <p>(i − 1, j − 1)   (i, j − 1)   (i + 1, j − 1)
(i − 1, j)       (i, j)       (i + 1, j)
(i − 1, j + 1)   (i, j + 1)   (i + 1, j + 1)
(2)</p>
      <p>From these eight neighborhood locations, the mode is
calculated and replaces the value at location (i, j). By this
approach, the overlapping issue is resolved.</p>
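      <p>A minimal sketch of this post-processing step (NumPy, illustrative; border pixels are ignored for brevity, and the overlap mask and function names are our own):</p>

```python
import numpy as np

def mode_filter_8(label_img, i, j):
    """Replace the label at interior pixel (i, j) by the mode of
    its eight neighbours, as in Eq. (2)."""
    neigh = label_img[i - 1:i + 2, j - 1:j + 2].flatten()
    neigh = np.delete(neigh, 4)            # drop the centre pixel itself
    values, counts = np.unique(neigh, return_counts=True)
    return values[np.argmax(counts)]       # most frequent neighbour label

def resolve_overlaps(label_img, overlap_mask):
    """Apply the mode filter at every pixel flagged as overlapping."""
    out = label_img.copy()
    for i, j in zip(*np.nonzero(overlap_mask)):
        out[i, j] = mode_filter_8(label_img, i, j)
    return out
```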
    </sec>
    <sec id="sec-5">
      <title>3. EXPERIMENTAL RESULTS</title>
      <p>We performed the experiments with a standard dilated U-Net
architecture. Post-processing was then applied to the Dilated
U-Net output using the mode as a filter. The Hausdorff distance
and Dice ratio were used as quantitative evaluation measures.
The proposed algorithm was implemented in Python 3.5 on a 64-bit
Windows 8 platform with an Intel Xeon CPU @ 2.80 GHz, 64 GB of
RAM and 8 GB of GPU memory.</p>
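      <p>For reference, the two evaluation measures can be sketched in NumPy as follows (illustrative definitions, not the challenge's official evaluation code):</p>

```python
import numpy as np

def dice_ratio(pred, gt):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0

def hausdorff_distance(a_pts, b_pts):
    """Symmetric Hausdorff distance between two point sets
    (N x 2 and M x 2 arrays of pixel coordinates)."""
    # Pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=-1)
    # Max over each set of the distance to its nearest point in the other
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```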
    </sec>
    <sec id="sec-6">
      <title>3.1 Dataset</title>
      <p>
        The performance of the proposed method is evaluated on
the SegTHOR challenge dataset hosted on CodaLab [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The dataset consists of 60 CT scans: 40 scans for
training, along with manual segmentations of the heart, trachea,
aorta and esophagus as ground truth, and 20 scans for testing.
The size of each scan is 512 × 512 × (150~300) voxels, and the
resolution of each scan is 0.98 × 0.98 × 2.5 mm.
      </p>
    </sec>
    <sec id="sec-7">
      <title>3.2 Pre-Processing</title>
      <p>Each scan is normalized to zero mean and unit standard
deviation. A train-test split is performed with a test size of
0.2, resulting in 32 scans for training, 8 for validation, and
the remaining 20 CT scans for testing.</p>
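      <p>A minimal sketch of this pre-processing (NumPy; the shuffling and function names are illustrative, since the paper does not specify the exact split procedure):</p>

```python
import numpy as np

def normalize_scan(scan):
    """Normalize a CT scan to zero mean and unit standard deviation."""
    return (scan - scan.mean()) / scan.std()

def train_val_split(scan_ids, test_size=0.2, seed=0):
    """Shuffle the training ids and hold out a fraction for validation.
    With 40 training scans and test_size=0.2: 32 train / 8 validation."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(scan_ids)
    n_val = int(round(len(ids) * test_size))
    return ids[n_val:], ids[:n_val]
```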
      <p>Data augmentation needs to be applied when the training
data is limited, and it is necessary for the network to learn
the desired properties at a microscopic level. For images such
as CT and MRI scans, invariance to shift, rotation and
deformation is needed, as implemented in U-Net. We implemented
rotation and shift augmentation using the Keras
ImageDataGenerator class, focusing on the width-shift,
height-shift, rotation and zoom parameters.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Training</title>
      <p>The data is different for each organ, i.e. the number of
active voxels differs per organ, so to avoid overfitting
towards a dominant organ we trained each organ separately and
summed the intensities of the four models. We fine-tuned the
weights using the binary cross-entropy loss. Adam gradient
descent was used with a learning rate of 0.0001 for 75 epochs
with 300 steps per epoch.</p>
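      <p>The per-organ loss can be sketched as follows (a NumPy definition for illustration; in practice the framework's built-in binary cross-entropy would be used):</p>

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy averaged over pixels, as used to train
    each per-organ model against its binary ground-truth mask."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))
```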
    </sec>
    <sec id="sec-9">
      <title>3.4 Results</title>
      <p>The results of the proposed method are compared with the
U-Net architecture in Figure 4. The U-Net output is shown in
Figure 4(b); it can be noticed that the output is coarse in
nature. Figure 4(c) shows the results of the segmentation of
the Heart, Esophagus and Aorta, whereas Figure 4(d) presents
the output with the overlapping regions separated by applying
post-processing.</p>
      <p>Table 1 gives a comparative quantitative evaluation, in
terms of the Dice ratio, of the proposed U-Net against the
basic U-Net and the proposed U-Net with post-processing. The
proposed U-Net outperformed the basic U-Net on the Aorta,
Esophagus and Trachea, and ran on par with it on the Heart.
Post-processing helped to achieve higher results on the Trachea
than the proposed U-Net alone. A similar comparison in terms of
the Hausdorff distance shows that the proposed U-Net
outperformed the basic U-Net on the Esophagus, and that
post-processing achieved better results than both the basic
U-Net and the proposed U-Net on the Trachea.</p>
    </sec>
    <sec id="sec-10">
      <title>4. Conclusion In this work, a new framework has been developed for segmentation of Heart, Aorta, Esophagus and Trachea using</title>
      <p>dilated U-Net d) Output after Post processing
60 CT scans dataset from SegTHOR challenge. Individual
segmentation of organs from background and augmentation
helped to train models on low level and high level features.
The results were further improved with a deliberated post
processing. For performance evaluation, Dice ratio and</p>
    </sec>
    <sec id="sec-11">
      <title>Hausdorff</title>
      <p>Distance
metrics
were
used
wherein the
segmentation of Esophagus and Trachea shows significant
improvement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]. Voulodimos, Athanasios,
          <string-name>
            <given-names>N.</given-names>
            <surname>Doulamis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doulamis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Protopapadakis</surname>
          </string-name>
          .
          <article-title>"Deep learning for computer vision: a brief review." Computational intelligence</article-title>
          and neuroscience,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]. Norouzi, Alireza,
          <string-name>
            <given-names>M.S.M.</given-names>
            <surname>Rahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Altameem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Saba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.E.</given-names>
            <surname>Rad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rehman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Uddin</surname>
          </string-name>
          .
          <article-title>"Medical image segmentation methods, algorithms</article-title>
          , and applications.
          <source>" IETE Technical Review</source>
          , vol.
          <volume>31</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>213</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]. Litjens, Geert,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kooi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.E.</given-names>
            <surname>Bejnordi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.A.A.</given-names>
            <surname>Setio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ciompi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghafoorian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.A.V.D.</given-names>
            <surname>Laak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.V.</given-names>
            <surname>Ginneken</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.I.</given-names>
            <surname>Sánchez</surname>
          </string-name>
          .
          <article-title>"A survey on deep learning in medical image analysis." Medical image analysis</article-title>
          ,
          <source>no.42</source>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>88</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]. He, Kelei,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <article-title>"Pelvic organ segmentation using distinctive curve guided fully convolutional networks." IEEE transactions on medical imaging</article-title>
          , vol.
          <volume>38</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>585</fpage>
          -
          <lpage>595</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]. Trullo, Roger,
          <string-name>
            <given-names>C.</given-names>
            <surname>Petitjean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dubray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <article-title>"Segmentation of organs at risk in thoracic CT images using a sharpmask architecture and conditional random fields</article-title>
          .
          <source>" IEEE 14th International Symposium on Biomedical Imaging</source>
          , pp.
          <fpage>1003</fpage>
          -
          <lpage>1006</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]. Chen, Liang-Chieh, G. Papandreou, I. Kokkinos,
          <string-name>
            <given-names>K.</given-names>
            <surname>Murphy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.L.</given-names>
            <surname>Yuille</surname>
          </string-name>
          .
          <article-title>"Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." IEEE transactions on pattern analysis and machine intelligence</article-title>
          , vol.
          <volume>40</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>834</fpage>
          -
          <lpage>848</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]. Wu, Xundong.
          <article-title>"Fully convolutional networks for semantic segmentation</article-title>
          .
          <source>" Computer Science</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]. Long, Jonathan,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shelhamer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Darrell</surname>
          </string-name>
          .
          <article-title>"Fully convolutional networks for semantic segmentation."</article-title>
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pp.
          <fpage>3431</fpage>
          -
          <lpage>3440</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]. Papandreou, George,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kokkinos</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.A.</given-names>
            <surname>Savalle</surname>
          </string-name>
          .
          <article-title>"Modeling local and global deformations in deep learning: Epitomic convolution, multiple instance learning, and sliding window detection."</article-title>
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          , pp.
          <fpage>390</fpage>
          -
          <lpage>399</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]. Ronneberger, Olaf,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          .
          <article-title>"U-net: Convolutional networks for biomedical image segmentation." In International Conference on Medical image computing and computer-assisted intervention</article-title>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          . Springer, Cham,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>