=Paper= {{Paper |id=Vol-2349/paper-2 |storemode=property |title=Multi-task Learning for the Segmentation of Thoracic Organs at Risk in CT images |pdfUrl=https://ceur-ws.org/Vol-2349/SegTHOR2019_paper_2.pdf |volume=Vol-2349 |dblpUrl=https://dblp.org/rec/conf/isbi/HeGWX019 }} ==Multi-task Learning for the Segmentation of Thoracic Organs at Risk in CT images== https://ceur-ws.org/Vol-2349/SegTHOR2019_paper_2.pdf
 MULTI-TASK LEARNING FOR THE SEGMENTATION OF THORACIC ORGANS AT RISK
                             IN CT IMAGES

                            Tao He, Jixiang Guo, Jianyong Wang, Xiuyuan Xu and Zhang Yi

                                    Machine Intelligence Laboratory, Sichuan University


                           ABSTRACT

The automatic segmentation of thoracic organs has clinical significance. In this paper, we extend the U-Net architecture into a uniform U-like encoder-decoder segmentation architecture for the segmentation of thoracic organs. The encoder part of this architecture can directly incorporate widely used networks (DenseNet or ResNet) by omitting their last linear layers. We observe that individual organs do not appear independently in a CT slice. Therefore, we empirically propose to use multi-task learning for the segmentation of thoracic organs. The major task focuses on local pixel-wise segmentation and the auxiliary task focuses on global slice classification. Multi-task learning has two merits. Firstly, the auxiliary task can improve generalization performance by being learned concurrently with the main task. Secondly, the prediction accuracy of the auxiliary task reaches almost 98% on the validation set, so the predictions of the auxiliary task can be used to filter false positive segmentation results. The proposed method was tested on the Segmentation of THoracic Organs at Risk (SegTHOR) challenge (submitted name: MILab, till March 21, 2019, 8:44 a.m. UTC) and achieved the second place in the “All” rank and the second place in the “Esophagus” rank, respectively.

   Index Terms— Automatic segmentation, CT, U-Net, Multi-task learning

   This work was supported by the National Natural Science Foundation of China under Grant 61432012. The authors are with the College of Computer Science, Sichuan University, Chengdu 610065, China (e-mail: taohe@stu.scu.edu.cn; guojixiang@scu.edu.cn; wjy@scu.edu.cn; xuxiuyuan@stu.scu.edu.cn; zhangyi@scu.edu.cn).

                      1. INTRODUCTION

Contrast-enhanced Computed Tomography (CT) is a widely used clinical tool for diagnosing many thoracic diseases. The tedious manual segmentation of thoracic organs from CT images is very time-consuming. Automatic segmentation of CT images will help oncologists assess the thoracic organs at risk. In this paper, we focus on the automatic segmentation of thoracic organs, supported by the Segmentation of THoracic Organs at Risk (SegTHOR) [1] challenge. The segmentation task is challenging for the following reasons: (1) the shape and position of each organ on CT slices vary greatly between patients; (2) the contours in CT images have low contrast and can be absent. The challenge focuses on 4 organs at risk: heart, aorta, trachea, esophagus.
    Recently, the development of automatic segmentation based on deep learning has overtaken traditional feature extraction methods. The paragon of medical segmentation models is U-Net [2]. U-Net has carefully designed encoder and decoder parts with shortcut connections. The most significant advantage of shortcut connections is to combine low-level features with high-level features at different layers. In recent years, many similar models termed encoder-decoder architectures were proposed, for example, Seg-Net [3] and the DeepLab series of networks [4, 5].
    In [6], an H-DenseUNet was proposed for liver and tumor segmentation, where intra-slice and inter-slice features were extracted and jointly optimized through a hybrid feature fusion layer. In [7], a 3D Deeply Supervised Network (3D-DSN) was proposed to address the liver segmentation problem. The 3D-DSN injected additional supervision into hidden layers to counteract the adverse effects of gradient vanishing. This method achieved the state of the art on the MICCAI-SLiver07 dataset. V-Net [8] is much like a 3D version of U-Net, directly applied to volumetric segmentation of MRI volumes depicting the prostate.
    Admittedly, 3D-CNN based models fully exploit spatial features, but training a 3D-CNN based model is usually time-consuming and requires a large parameter capacity. Therefore, many previous works employed 2D-CNNs and trained them on 2.5D data, which consists of a stack of adjacent slices as input; the liver lesion regions were then predicted according to the center slice. To achieve accurate segmentation results with 2D-CNNs, the authors of [9] proposed a two-step segmentation framework. In the first step, an FCN was trained to segment the liver as ROI input for a second FCN. The second FCN segmented lesions solely from the predicted liver ROIs of step 1. This two-step segmentation framework has been widely adopted in many segmentation works [9, 6, 10].
    In this paper, we propose a uniform U-like encoder-decoder segmentation architecture. The previous U-Net and
      Slice Index   1–50   51–68   69–80   81–118   119   120–156   157–182   183–…
      Esophagus      N       N       Y        Y      Y       Y         Y        N
      Heart          N       N       N        Y      Y       N         N        N
      Trachea        N       N       N        N      Y       Y         Y        N
      Aorta          N       Y       Y        Y      Y       Y         Y        N

Fig. 1. The macro view of Patient01’s CT slices. ‘Y’ (YES) and ‘N’ (NO) indicate whether the corresponding organ appears in the ith slice. From the 51st to the 68th slice, only the aorta appears; from the 69th to the 80th slice, the esophagus appears; from the 81st to the 118th slice, the heart appears; in the 119th slice, all organs appear; from the 120th to the 156th slice, the heart disappears; from the 157th to the 182nd slice, the aorta disappears; from the 183rd slice to the end, all organs disappear.
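The per-slice organ-presence labels illustrated in Fig. 1 can be derived directly from the ground-truth segmentation masks; these become the targets of the auxiliary classification task. A minimal NumPy sketch (the function name and the label encoding are our own assumptions, not from the paper):

```python
import numpy as np

# Assumed label convention: 0 = background, 1 = esophagus,
# 2 = heart, 3 = trachea, 4 = aorta (the actual SegTHOR
# encoding may differ).
ORGANS = {1: "esophagus", 2: "heart", 3: "trachea", 4: "aorta"}

def slice_presence_labels(mask_volume):
    """mask_volume: (num_slices, H, W) integer array of organ labels.

    Returns an (num_slices, 4) binary array h where h[s, k-1] = 1
    iff organ k appears anywhere in slice s.
    """
    num_slices = mask_volume.shape[0]
    h = np.zeros((num_slices, len(ORGANS)), dtype=np.int64)
    for k in ORGANS:
        # organ k is "present" in slice s if any pixel carries label k
        h[:, k - 1] = (mask_volume == k).any(axis=(1, 2))
    return h
```

Applied to a whole patient volume, each row of the result reproduces one column of the ‘Y’/‘N’ grid in Fig. 1.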


its variants usually have symmetrical encoder and decoder parts. In the uniform U-like architecture, the encoder part can directly incorporate the widely used networks (ResNet or DenseNet) by omitting their last linear layers. The encoder has more non-linear mapping ability and can adopt transfer learning by initializing its parameters with the corresponding networks trained on image classification. The decoder part only works on enlarging the size of the feature maps and shrinking the number of channels. The uniform U-like architecture is trained under a multi-task learning scheme. The major task focuses on local pixel-wise segmentation and the auxiliary task focuses on global slice classification. Multi-task learning has two merits. Firstly, the auxiliary task can improve generalization performance by being learned concurrently with the main task. Secondly, the predictions of the auxiliary task are used to filter false positive segmentation results.

[Fig. 2 diagram: an encoder path (512 × 512 → 128 × 128 → 64 × 64 → 32 × 32 → 16 × 16) and a decoder path back up to 512 × 512, ending in a sigmoid head with 4 classification outputs and a softmax head with a 5 × 512 × 512 segmentation output.]

Fig. 2. The uniform U-like encoder-decoder architecture with multi-task learning, where the blue arrow indicates a convolutional layer, the dashed line indicates a copy operation, the solid line indicates a global average pooling layer, the green arrow indicates a bilinear upsample and the combined dashed block indicates a concatenation operation.

                        2. METHOD

In this section, we will introduce the multi-task learning scheme and the uniform U-like encoder-decoder architecture.

2.1. Multi-task Learning

During the automatic segmentation of thoracic organs on the SegTHOR challenge data, we found that individual organs do not appear independently in one slice. In Fig. (1), we give the detailed macro view of Patient01’s CT slices. All patients have similar macro appearance orders. In other words, the organs appear dependently. If we can learn the macro classification, we can use the classification results to filter the false positive predictions of each organ. This is all the more valuable since the organs appear dependently. We apply the multi-task learning scheme to concurrently learn the segmentation and classification tasks. The formulation of learning is as follows:

    D = 1 − (1/Ks) · Σ_{k=1}^{Ks} [ 2 Σ_{ij} p_{ij}^k · g_{ij}^k / ( Σ_{ij} (p_{ij}^k)^2 + Σ_{ij} (g_{ij}^k)^2 ) ]
          + α · Σ_{k=1}^{Kc} ( h^k · log q^k + (1 − h^k) · log(1 − q^k) ),          (1)

where Ks = 5 and Kc = 4 indicate the number of segmentation and classification categories, respectively. D is the combined cost function. The major segmentation task is trained with the dice loss and the auxiliary classification task is trained with multi-label logistic regression. In the dice loss part, p_{ij}^k and g_{ij}^k are the kth output produced by a softmax function and the kth one-hot target of pixel (i, j), respectively. In the multi-label logistic regression part, q^k and h^k are the kth output produced by the corresponding logistic function and the kth target, respectively. α is used to balance the two losses. In our experiments, we set α = 0.5.

2.2. Uniform U-like Encoder-Decoder Architecture

In most segmentation tasks, manual labelling is time-consuming, so the training sets are always limited. Transfer learning is a very useful strategy for training a network on a small data set. In order to apply transfer learning in the SegTHOR challenge, we abstract a uniform U-like encoder-decoder architecture, where the encoder part can directly incorporate the widely used ResNet or DenseNet by omitting their last linear layers. The encoder part can adopt transfer learning by initializing the encoder’s parameters with the corresponding networks trained on image classification. The decoder part only works on enlarging the size of the feature maps and shrinking the number of channels. The U-like architecture is depicted in Fig. (2).

                      3. EXPERIMENT

There are 40 and 20 3D thoracic CT scans for training and testing on the SegTHOR Challenge dataset, respectively. We randomly split the given 40 training CT volumes into 32 for training and 8 for validation. The 3D CT scans were cut into slices along the z-axis. Under the uniform U-like architecture, the encoder part is free for setting. We implemented 6 widely used networks as the encoder part, including ResNet-101, ResNet-152, DenseNet-121, DenseNet-161, DenseNet-169 and DenseNet-201. Their decoder part involved only one convolutional layer to shrink the number of channels.
    Training of a network stopped when the dice per case on the validation set did not improve for 10 epochs. In order to fully use the given data, we then reloaded the trained model and retrained it on the full 40 training volumes for a fixed 10 epochs. All networks were implemented in PyTorch [11] and trained using stochastic gradient descent with momentum of 0.9. All networks were trained on images at the original resolution and in the form of 2.5D data, which consists of 3 adjacent axial slices. The image intensity values of all scans were truncated to the range of [−128, 384] HU to omit irrelevant information. The initial learning rate was 0.01 and decayed by multiplying by 0.9. For data augmentation, we adopted random horizontal and vertical flipping and scaling between 0.6 and 1 to alleviate overfitting. The networks were trained on four NVIDIA Titan Xp GPUs, which took about 6 ∼ 8 hours. After each test, we used largest connected component labeling to refine the segmentation results for each organ. The final submitted result is the ensemble result of those 6 U-like networks. The experimental results are listed in Table 1. We achieved the second place in the “All” rank order and the second place in the “Esophagus” rank order, respectively.

                      4. CONCLUSION

The uniform U-like architecture is abstracted from the widely used U-Net. The encoder part of the uniform U-like architecture is free for setting different network structures, and transfer learning is easy to apply in this design. In our experimental observation, transfer learning accelerated the training of the networks and boosted their performance. Multi-task learning is helpful for discovering the organs’ dependence. However, we did not analyze its advantages in depth because of the time limit of the challenge.
    We need to emphasize that connected component labeling is very useful for the SegTHOR challenge, since all organs are indivisible and our method was based on 2D-CNNs. Since the amount of CT data in the SegTHOR data set is small compared with other segmentation tasks, the trained networks were prone to overfitting. Therefore, the ensemble strategy is also very necessary for the SegTHOR challenge.

                      5. REFERENCES

 [1] Roger Trullo, C. Petitjean, Su Ruan, Bernard Dubray, Dong Nie, and Dinggang Shen, “Segmentation of organs at risk in thoracic CT images using a sharpmask architecture and conditional random fields,” in IEEE 14th International Symposium on Biomedical Imaging (ISBI), 2017, pp. 1003–1006.

 [2] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.

 [3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

 [4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.
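The combined cost of Eq. (1) can be sketched in NumPy for a single slice (variable names follow the paper; batching, masking and autograd details of the actual PyTorch implementation are omitted). Note that Eq. (1) as printed adds the summed log-likelihood of the classifier; a loss to be minimized negates that sum, which is presumably the intended reading, and the small epsilon in the dice denominator is our addition for numerical stability:

```python
import numpy as np

def multi_task_loss(p, g, q, h, alpha=0.5, eps=1e-7):
    """Sketch of Eq. (1) for one slice.

    p: (Ks, H, W) softmax outputs per segmentation class
    g: (Ks, H, W) one-hot segmentation targets
    q: (Kc,) sigmoid outputs of the auxiliary slice classifier
    h: (Kc,) binary organ-presence targets
    """
    Ks = p.shape[0]
    # Dice term, averaged over the Ks segmentation classes.
    num = 2.0 * (p * g).sum(axis=(1, 2))
    den = (p ** 2).sum(axis=(1, 2)) + (g ** 2).sum(axis=(1, 2)) + eps
    dice = (num / den).sum() / Ks
    # Auxiliary term: multi-label binary cross-entropy over Kc organs
    # (the negated log-likelihood printed in Eq. (1)).
    bce = -(h * np.log(q) + (1.0 - h) * np.log(1.0 - q)).sum()
    return 1.0 - dice + alpha * bce
```

A perfect prediction (p equal to the one-hot g, q matching h) drives both terms toward zero, while a confident false-positive slice is penalized through the alpha-weighted classification term.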
                                     Table 1. The experimental results and ranks.

                         Rank                        Dice                                  Hausdorff
          User     All    Esophagus    Esophagus    Heart     Trachea   Aorta    Esophagus   Heart    Trachea   Aorta
          MILab    2.75       2         0.8594      0.9500    0.9201    0.9484    0.2743     0.1383   0.1824    0.1129


 [5] Liang-Chieh Chen, Yukun Zhu, George Papandreou,
     Florian Schroff, and Hartwig Adam, “Encoder-decoder
     with atrous separable convolution for semantic image
     segmentation,” in European Conference on Computer
     Vision (ECCV), 2018, pp. 833–851.
 [6] Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-
     Wing Fu, and Pheng-Ann Heng, “H-denseunet: Hybrid
     densely connected unet for liver and tumor segmenta-
     tion from CT volumes,” IEEE Transactions on Medical
     Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
 [7] Qi Dou, Hao Chen, Yueming Jin, Lequan Yu, Jing Qin,
     and Pheng-Ann Heng, “3d deeply supervised network
     for automatic liver segmentation from ct volumes,” in
     Medical Image Computing and Computer-Assisted In-
     tervention (MICCAI), 2016, pp. 149–157.
 [8] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ah-
     madi, “V-net: Fully convolutional neural networks for
     volumetric medical image segmentation,” in Proceed-
     ings of International Conference on 3D Vision, 2016,
     pp. 565–571.
 [9] Patrick Ferdinand Christ, Mohamed Ezzeldin A.
     Elshaer, Florian Ettlinger, Sunil Tatavarty, Marc Bickel,
     Patrick Bilic, Markus Rempfler, Marco Armbruster, Fe-
     lix Hofmann, Melvin D’Anastasi, Wieland H. Sommer,
     Seyed-Ahmad Ahmadi, and Bjoern H. Menze, “Auto-
     matic liver and lesion segmentation in ct using cascaded
     fully convolutional neural networks and 3d conditional
     random fields,” in Proceedings of Medical Image Com-
     puting and Computer-Assisted Intervention (MICCAI),
     2016, pp. 415–423.
[10] Yuyin Zhou, Lingxi Xie, Elliot K. Fishman, and Alan L.
     Yuille, “Deep supervision for pancreatic cyst segmenta-
     tion in abdominal CT scans,” in Proceedings of Medical
     Image Computing and Computer Assisted Intervention
     (MICCAI), 2017, pp. 222–230.
[11] Adam Paszke, Sam Gross, Soumith Chintala, Gregory
     Chanan, Edward Yang, Zachary DeVito, Zeming Lin,
     Alban Desmaison, Luca Antiga, and Adam Lerer, “Au-
     tomatic differentiation in pytorch,” in the Workshop of
     Conference on Neural Information Processing Systems
     (NIPS Workshop).