<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SEGMENTATION OF THORACIC ORGANS AT RISK IN CT IMAGES COMBINING COARSE AND FINE NETWORK</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author"><string-name>Li Zhang</string-name></contrib>
        <contrib contrib-type="author"><string-name>Lishen Wang</string-name></contrib>
        <contrib contrib-type="author"><string-name>Yijie Huang</string-name></contrib>
        <contrib contrib-type="author"><string-name>Huai Chen</string-name></contrib>
        <aff>Shanghai Jiao Tong University</aff>
      </contrib-group>
      <abstract>
        <p>Segmentation of multiple thoracic organs from three-dimensional Computed Tomography (CT) images is challenging because of the heavy computation it requires and the hard-to-distinguish features of neighboring organs. In this paper, we propose a novel multi-task framework in which a coarse segmentation network and a fine segmentation network share the same encoder. We first use the coarse segmentation network to localize the regions of interest (ROIs), and then crop multi-level ROIs from the encoder to drive a detail-preserving fine segmentation decoder. We apply the proposed method to the test data set and achieve good results.</p>
      </abstract>
      <kwd-group>
        <kwd>Fine segmentation</kwd>
        <kwd>Multiple organs</kwd>
        <kwd>Coarse segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>Radiation therapy is the preferred treatment for lung and esophageal cancer. Physicians delineate the target tumor manually; the normal organs surrounding it are called Organs at Risk (OARs). The common way to avoid errors is to first segment these organs on Computed Tomography (CT) images, so this task is of great importance to treatment planning.</p>
      <p>However, the segmentation of multiple organs is challenging: (1) the amount of data to be processed is large and the computation is heavy, since medical images are three-dimensional; (2) some organs neighbor each other closely and their contours have low contrast in CT images; (3) some organs’ shapes and locations differ greatly between patients.</p>
      <p>In this paper, our goal is to automatically segment four kinds of thoracic organs at risk in CT images: the heart, aorta, trachea, and esophagus. We design a deep neural network that segments these organs simultaneously and achieves a relatively good result in ISBI 2019, Challenge 4: SegTHOR: Segmentation of THoracic Organs at Risk in CT images <xref ref-type="bibr" rid="ref1">[1]</xref>.</p>
    </sec>
    <sec id="sec-2">
      <title>2.1. Data preprocessing</title>
      <p>The CT scans all have the same in-plane resolution of 512 × 512 pixels, but their in-plane pixel spacing varies, and the z-spacing is also nonuniform. So the first step is to resample all samples to the same spacing. Considering the distribution of the samples, we set the uniform spacing to [1.0, 1.0, 3.0] along the x, y, and z axes.</p>
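      <p>The resampling step above can be sketched as follows; this is a minimal nearest-neighbor version in NumPy (the axis order and helper name are ours, and in practice a library such as SimpleITK with higher-order interpolation would typically be used):</p>

```python
import numpy as np

def resample_to_spacing(volume, old_spacing, new_spacing=(1.0, 1.0, 3.0)):
    """Nearest-neighbor resampling of a 3-D volume to a target voxel spacing.

    volume:      array of shape (X, Y, Z)
    old_spacing: per-axis voxel size of `volume`, e.g. (0.98, 0.98, 2.5)
    new_spacing: target per-axis voxel size
    """
    old_spacing = np.asarray(old_spacing, dtype=float)
    new_spacing = np.asarray(new_spacing, dtype=float)
    # Physical extent stays fixed, so the new grid size follows from the ratio.
    new_shape = np.round(np.array(volume.shape) * old_spacing / new_spacing).astype(int)
    new_shape = np.maximum(new_shape, 1)
    # For each output voxel, find the index of the nearest input voxel.
    idx = [np.minimum((np.arange(n) * new_spacing[a] / old_spacing[a]).round().astype(int),
                      volume.shape[a] - 1)
           for a, n in enumerate(new_shape)]
    return volume[np.ix_(idx[0], idx[1], idx[2])]
```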
      <p>However, due to the limitation of GPU memory, it is impossible to feed the data directly into the deep learning model. So the next step is to roughly crop the CT images using Otsu thresholding. We compute the threshold by applying Otsu's method to the middle slice of every sample. We then use this threshold for binary segmentation to remove background voxels, which greatly reduces the amount of data.</p>
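      <p>The Otsu-based cropping can be sketched as follows (a minimal NumPy version; the bin count and the assumption that foreground voxels lie above the threshold are ours):</p>

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist)                      # voxels at or below each bin
    w1 = w0[-1] - w0                          # voxels above each bin
    m0 = np.cumsum(hist * centers)
    mean0 = np.where(w0 > 0, m0 / np.maximum(w0, 1), 0.0)
    mean1 = np.where(w1 > 0, (m0[-1] - m0) / np.maximum(w1, 1), 0.0)
    between = w0 * w1 * (mean0 - mean1) ** 2  # between-class variance
    return centers[np.argmax(between)]

def crop_foreground(volume, threshold):
    """Crop the volume to the bounding box of voxels above the threshold."""
    mask = volume > threshold
    slices = []
    for axis in range(volume.ndim):
        other_axes = tuple(a for a in range(volume.ndim) if a != axis)
        hit = np.where(mask.any(axis=other_axes))[0]
        slices.append(slice(hit[0], hit[-1] + 1))
    return volume[tuple(slices)]
```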
    </sec>
    <sec id="sec-3">
      <title>2.2. Segmentation model</title>
      <p>Although we preprocess the CT images, the amount of data remains large. To obtain more accurate results, we adopt a two-stage method: we first use coarse segmentation to find the regions of interest (ROIs), and then apply fine segmentation within these regions to obtain the final results. Our experiments demonstrate that this method is robust and effective.</p>
      <p>Our overall network consists of two parts: an encoder and a decoder. Since the encoder's role is to extract features from the CT images, the coarse and fine segmentation networks share the same encoder but have different decoders.</p>
      <sec id="sec-3-1">
        <title>2.2.1. Coarse segmentation</title>
        <p>Suppose the input resolution is [X, Y, Z], corresponding to the x, y, and z axes of the CT images. As shown in Fig. 1, the feature maps output by the encoder have 256 channels at a resolution of [X/8, Y/8, Z/2]. The encoder consists of one separate convolution operation and four ResBlocks [2], each made of two or three convolution operations. Considering that some organs span a large region, we introduce dilated convolutions in the latter two ResBlocks to enlarge the receptive field: we set the dilation to d1 = 3 in the shallower layer and d2 = 6 in the deeper layer.</p>
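        <p>The [X/8, Y/8, Z/2] output resolution follows from the per-stage strides. A small helper makes the arithmetic explicit (the exact stride layout per ResBlock is our assumption, chosen only to reproduce the stated overall factors):</p>

```python
def encoder_output_shape(input_shape,
                         strides=((2, 2, 1), (2, 2, 2), (2, 2, 1), (1, 1, 1))):
    """Propagate an input resolution [X, Y, Z] through per-stage strides.

    The default strides downsample x and y by 8 overall and z by 2,
    matching the encoder's [X/8, Y/8, Z/2] feature-map resolution; the
    dilated convolutions (d1 = 3, d2 = 6) enlarge the receptive field
    without any further downsampling.
    """
    x, y, z = input_shape
    for sx, sy, sz in strides:
        x, y, z = x // sx, y // sy, z // sz
    return (x, y, z)
```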
        <p>In the decoder, for the purpose of coarse segmentation, we apply a convolution operation and a sigmoid function to the feature maps from the latter three ResBlocks [3], yielding F1, F2, and F3. We then compute the average of these three feature maps, Fave. Fave has five channels, denoting the background and the four organs respectively.</p>
        <p>Suppose the corresponding ground truth is Fori. We can then construct the loss function as a Dice-style overlap loss:</p>
        <disp-formula>
          <tex-math>\mathrm{Loss} = 1 - \frac{2\,\lvert F_{ave} \cap F_{ori} \rvert}{\lvert F_{ave} \cup F_{ori} \rvert + \lvert F_{ave} \cap F_{ori} \rvert}</tex-math>
        </disp-formula>
        <p>We train with this loss function for each kind of organ. In the end, the corresponding four channels give the coarse segmentation of the target organs.</p>
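        <p>A Dice-style overlap loss of this kind can be sketched in NumPy as follows (a soft version; the smoothing term eps is our addition for numerical stability):</p>

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a sigmoid probability map and a binary mask.

    pred, target: arrays of the same shape, values in [0, 1].
    For hard masks, using |A| + |B| = |A ∪ B| + |A ∩ B|, this equals
    1 - 2|A ∩ B| / (|A ∪ B| + |A ∩ B|).
    """
    intersection = np.sum(pred * target)
    total = np.sum(pred) + np.sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)
```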
      </sec>
      <sec id="sec-3-2">
        <title>2.2.2. Fine segmentation</title>
        <p>From the coarse segmentation output we can get the rough location of each kind of organ, so we do not need to pass the full information of the encoder to the decoder to obtain the fine segmentation. First, we compute the three-dimensional bounding box of each target organ at the original resolution. Then we scale the bounding box for each layer of the encoder according to that layer's resolution. We crop the voxels inside the corresponding bounding boxes to obtain the ROIs and feed them to the fine segmentation decoder.</p>
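        <p>Scaling a full-resolution bounding box to each encoder level can be sketched as follows (the per-level downsampling factors and helper names are illustrative assumptions):</p>

```python
def scale_bbox(bbox, factors):
    """Map a full-resolution bounding box to a coarser feature-map level.

    bbox:    ((x0, x1), (y0, y1), (z0, z1)) in input-voxel coordinates
    factors: per-axis downsampling factor of the target level, e.g. (8, 8, 2)
    """
    scaled = []
    for (lo, hi), f in zip(bbox, factors):
        # Floor the start and ceil the end so the ROI never shrinks.
        scaled.append((lo // f, -((-hi) // f)))
    return tuple(scaled)

def crop_level(features, bbox, factors):
    """Crop a feature map of shape (C, X, Y, Z) with a full-resolution bbox."""
    (x0, x1), (y0, y1), (z0, z1) = scale_bbox(bbox, factors)
    return features[:, x0:x1, y0:y1, z0:z1]
```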
        <p>As shown in the left part of Fig. 1, after a convolution and a ReLU function, the deeper features are added to the upper features before the ResBlock operation, which allows the network to propagate context information to higher-resolution layers [4]. It is therefore reasonable that we can capture features of various sizes and obtain a fine segmentation. It is worth noting that we use a different decoder path for each kind of organ, since: (1) the parameters and computation have already been greatly reduced by processing only the ROIs; (2) each kind of organ has distinctive features.</p>
        <p>The loss function is constructed in the same way as above. To make learning more effective, we adopt a hard voxel mining method, i.e., we assign higher weights to error-prone voxels. This leads to better results.</p>
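        <p>Hard voxel mining, i.e. up-weighting the voxels with the largest per-voxel error, can be sketched as follows (the mined fraction top_frac and the weight boost are illustrative assumptions):</p>

```python
import numpy as np

def hard_voxel_weights(per_voxel_loss, top_frac=0.1, boost=3.0):
    """Give the hardest `top_frac` of voxels a `boost`-times-higher weight.

    per_voxel_loss: array of per-voxel loss values (any shape).
    Returns weights of the same shape, normalized to mean 1 so the
    overall loss scale is unchanged.
    """
    flat = per_voxel_loss.ravel()
    k = max(1, int(round(flat.size * top_frac)))
    # Threshold separating the k hardest voxels from the rest.
    thresh = np.partition(flat, flat.size - k)[flat.size - k]
    weights = np.where(per_voxel_loss >= thresh, boost, 1.0)
    return weights / weights.mean()
```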
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. EXPERIMENTS</title>
    </sec>
    <sec id="sec-5">
      <title>3.1. Implementation details</title>
      <p>We distribute our model across two NVIDIA TITAN X GPUs, and our implementation is based on PyTorch. We adopt a 3 × 3 × 3 kernel in every convolution operation and use dilations d1 = 3 and d2 = 6 in the latter two layers of the encoder. During training, we first train the coarse segmentation network, including the encoder and its decoder, individually for 40 epochs. We then train the coarse and fine segmentation networks together for 50 epochs.</p>
    </sec>
    <sec id="sec-6">
      <title>3.2. Results on test data</title>
      <p>We apply the proposed method to the test data set; the results can be seen in Table 1.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Roger</given-names>
            <surname>Trullo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Petitjean</surname>
          </string-name>
          , Su Ruan, Bernard Dubray, Dong Nie, and Dinggang Shen, “
          <article-title>Segmentation of organs at risk in thoracic CT images using a sharpmask architecture and conditional random fields,”</article-title>
          <source>in IEEE 14th International Symposium on Biomedical Imaging (ISBI)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1003</fpage>
          -
          <lpage>1006</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “
          <article-title>Deep residual learning for image recognition,”</article-title>
          <source>in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>June 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Long</surname>
          </string-name>
          , Evan Shelhamer, and Trevor Darrell, “
          <article-title>Fully convolutional networks for semantic segmentation,”</article-title>
          <source>in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Olaf</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and Thomas Brox, “
          <article-title>U-net: Convolutional networks for biomedical image segmentation,”</article-title>
          <source>in Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015</source>
          , Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi, Eds., Cham,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          , Springer International Publishing.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>