=Paper=
{{Paper
|id=Vol-2349/paper-2
|storemode=property
|title=Multi-task Learning for the Segmentation of Thoracic Organs at Risk in CT images
|pdfUrl=https://ceur-ws.org/Vol-2349/SegTHOR2019_paper_2.pdf
|volume=Vol-2349
|dblpUrl=https://dblp.org/rec/conf/isbi/HeGWX019
}}
==Multi-task Learning for the Segmentation of Thoracic Organs at Risk in CT images==
Tao He, Jixiang Guo, Jianyong Wang, Xiuyuan Xu and Zhang Yi
Machine Intelligence Laboratory, Sichuan University
ABSTRACT

The automatic segmentation of thoracic organs has clinical significance. In this paper, we extend the U-Net architecture into a uniform U-like encoder-decoder segmentation architecture for the segmentation of thoracic organs. The encoder part of this architecture can directly incorporate widely used networks (DenseNet or ResNet) by omitting their last linear connection layers. We observe that individual organs do not appear independently in a CT slice. Therefore, we empirically propose to use multi-task learning for the segmentation of thoracic organs. The major task focuses on the local pixel-wise segmentation and the auxiliary task focuses on the global slice classification. Multi-task learning has two merits. Firstly, the auxiliary task improves the generalization performance by being learned concurrently with the main task. Secondly, the prediction accuracy of the auxiliary task reaches almost 98% on the validation set, so its predictions can be used to filter false positive segmentation results. The proposed method was tested on the Segmentation of THoracic Organs at Risk (SegTHOR) challenge (submitted name: MILab, as of March 21, 2019, 8:44 a.m. UTC) and achieved second place in the "All" ranking and second place in the "Esophagus" ranking, respectively.

Index Terms— Automatic segmentation, CT, U-Net, Multi-task learning

1. INTRODUCTION

Contrast-enhanced Computed Tomography (CT) is a widely used clinical tool for diagnosing many thoracic diseases. The manual segmentation of thoracic organs from CT images is tedious and very time-consuming, so automatic segmentation from CT images will help oncologists diagnose the thoracic organs at risk. In this paper, we focus on the automatic segmentation of thoracic organ data provided by the Segmentation of THoracic Organs at Risk (SegTHOR) [1] challenge. The segmentation task is challenging for the following reasons: (1) the shape and position of each organ on CT slices vary greatly between patients; (2) the contours in CT images have low contrast and can be absent. The challenge focuses on four organs at risk: heart, aorta, trachea, and esophagus.

Recently, the development of automatic segmentation based on deep learning has overtaken traditional feature-extraction methods. The paragon of medical segmentation models is U-Net [2], which has carefully designed encoder and decoder parts with shortcut connections. The most significant advantage of shortcut connections is to combine low-level features with high-level features at different layers. In recent years, many similar models, termed encoder-decoder architectures, have been proposed, for example, SegNet [3] and the DeepLab series of networks [4, 5].

In [6], an H-DenseUNet was proposed for liver and tumor segmentation, where intra-slice and inter-slice features were extracted and jointly optimized through a hybrid feature fusion layer. In [7], a 3D Deeply Supervised Network (3D-DSN) was proposed to address the liver segmentation problem. The 3D-DSN injected additional supervision into hidden layers to counteract the adverse effects of gradient vanishing, and achieved the state of the art on the MICCAI-SLiver07 dataset. V-Net [8] is much like a 3D version of U-Net, and was directly applied to volumetric segmentation of the prostate from MRI volumes.

Admittedly, 3D-CNN-based models fully exploit spatial features, but training a 3D-CNN-based model is usually time-consuming and requires a large parameter capacity. Therefore, many previous works employed 2D-CNNs and trained them on 2.5D data, which consists of a stack of adjacent slices as input; the liver lesion regions were then predicted for the center slice. In order to achieve accurate segmentation results with 2D-CNNs, the authors of [9] proposed a two-step segmentation framework. In the first step, an FCN was trained to segment the liver as ROI input for a second FCN, which solely segmented lesions from the liver ROIs predicted in step 1. This two-step segmentation framework has been widely adopted in many segmentation works [9, 6, 10].

This work was supported by the National Natural Science Foundation of China under Grant 61432012. The authors are with the College of Computer Science, Sichuan University, Chengdu 610065, China (e-mail: taohe@stu.scu.edu.cn; guojixiang@scu.edu.cn; wjy@scu.edu.cn; xuxiuyuan@stu.scu.edu.cn; zhangyi@scu.edu.cn).

In this paper, we propose a uniform U-like encoder-decoder segmentation architecture. The previous U-Net and
Slice index   1-50   51-68   69-80   81-118   119   120-156   157-182   183-
Esophagus      N       N       Y       Y       Y       Y         Y        N
Heart          N       N       N       Y       Y       N         N        N
Trachea        N       N       N       N       Y       Y         Y        N
Aorta          N       Y       Y       Y       Y       Y         N        N

Fig. 1. The macro view of Patient01's CT slices. 'Y' (YES) and 'N' (NO) indicate whether the corresponding organ appears in the given slice range. From the 51st to the 68th slice, only the aorta appears; from the 69th to the 80th slice, the esophagus appears; from the 81st to the 118th slice, the heart appears; in the 119th slice, all organs appear; from the 120th to the 156th slice, the heart disappears; from the 157th to the 182nd slice, the aorta disappears; from the 183rd slice to the end, all organs disappear.
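A per-slice presence table like the one in Fig. 1 can be derived directly from ground-truth masks. The sketch below is illustrative, not the authors' code: the function name and the organ-to-label mapping are assumptions, and we assume the masks are stored as an integer-labeled NumPy volume with 0 as background and the z-axis first.

```python
import numpy as np

# Hypothetical organ label values in the ground-truth volume;
# the actual SegTHOR label encoding may differ.
ORGANS = {"esophagus": 1, "heart": 2, "trachea": 3, "aorta": 4}

def presence_table(gt_volume: np.ndarray) -> dict:
    """For each organ, mark 'Y'/'N' per axial slice (z-axis first)."""
    table = {}
    for name, label in ORGANS.items():
        # A slice counts as 'Y' when at least one pixel carries the organ's label.
        table[name] = ["Y" if (sl == label).any() else "N" for sl in gt_volume]
    return table

# Toy example: 3 slices of 4x4, aorta (label 4) present only in slice 1.
vol = np.zeros((3, 4, 4), dtype=np.uint8)
vol[1, 2, 2] = 4
print(presence_table(vol)["aorta"])  # ['N', 'Y', 'N']
```

Scanning such tables over the training patients is what reveals the consistent appearance order exploited by the auxiliary classification task.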
its variants usually have symmetrical encoder and decoder parts. In the uniform U-like architecture, the encoder part can directly incorporate widely used networks (ResNet or DenseNet) by omitting their last linear connection layers. The encoder thus has more non-linear mapping ability and can adopt transfer learning by initializing its parameters with the corresponding networks trained on image classification. The decoder part only enlarges the size of the feature maps and shrinks the number of channels. The uniform U-like architecture is trained under the multi-task learning scheme. The major task focuses on the local pixel-wise segmentation and the auxiliary task focuses on the global slice classification. There are two merits of multi-task learning. Firstly, the auxiliary task improves the generalization performance by being learned concurrently with the main task. Secondly, the predictions of the auxiliary task are used for filtering the false positive segmentation results.

[Figure 2 diagram: encoder and decoder feature maps run from 512 × 512 down through 128 × 128, 64 × 64, and 32 × 32 to 16 × 16 and back; a global-average-pooled sigmoid head produces the 4 classification outputs and a softmax head produces the 5 × 512 × 512 segmentation output.]

Fig. 2. The uniform U-like encoder-decoder architecture with multi-task learning, where the blue arrow indicates a convolutional layer, the dashed line indicates a copy operation, the solid line indicates a global average pooling layer, the green arrow indicates a bilinear upsampling, and the combined dashed block indicates a concatenation operation.

2. METHOD

In this section, we introduce the multi-task learning scheme and the uniform U-like encoder-decoder architecture.

2.1. Multi-task Learning
During the automatic segmentation of thoracic organs on the SegTHOR challenge data, we found that individual organs do not appear independently in one slice. In Fig. (1), we give a detailed macro view of Patient01's CT slices. All patients have similar macro appearance orders; in other words, the organs appear dependently. If we can learn this macro classification, we can use the classification results to filter the false positive predictions of each organ, which is all the more valuable since the organs appear dependently. We apply the multi-task learning scheme to concurrently learn the segmentation and classification tasks. The formulation of learning is as follows:

D = 1 - \frac{1}{K_s} \sum_{k=1}^{K_s} \frac{2 \cdot \sum_{ij} p_{ij}^{k} \cdot g_{ij}^{k}}{\sum_{ij} (p_{ij}^{k})^{2} + \sum_{ij} (g_{ij}^{k})^{2}} - \alpha \cdot \sum_{k=1}^{K_c} \left( h^{k} \cdot \log q^{k} + (1 - h^{k}) \cdot \log(1 - q^{k}) \right), \quad (1)

where K_s = 5 and K_c = 4 indicate the number of segmentation and classification categories, respectively.
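A combined soft-dice plus multi-label cross-entropy loss of this form can be sketched in NumPy as follows. This is our illustration, not the authors' implementation: the array shapes, the epsilon smoothing, and the sign convention (cross-entropy added as a negative log-likelihood) are assumptions.

```python
import numpy as np

def multitask_loss(p, g, q, h, alpha=0.5):
    """Soft dice + multi-label cross-entropy in the spirit of Eq. (1).

    p: softmax segmentation output, shape (Ks, H, W)
    g: one-hot segmentation target,  shape (Ks, H, W)
    q: sigmoid classification output, shape (Kc,)
    h: binary classification target,  shape (Kc,)
    """
    # Per-class soft dice, averaged over the Ks segmentation classes.
    inter = (p * g).sum(axis=(1, 2))
    denom = (p ** 2).sum(axis=(1, 2)) + (g ** 2).sum(axis=(1, 2))
    dice = (2.0 * inter / (denom + 1e-8)).mean()
    # Multi-label log-likelihood over the Kc organ-presence labels.
    eps = 1e-8  # numerical safety inside the logs
    ce = (h * np.log(q + eps) + (1.0 - h) * np.log(1.0 - q + eps)).sum()
    return (1.0 - dice) - alpha * ce

# Usage: perfect segmentation and classification give a near-zero loss,
# while a poor classification output raises it.
g = np.zeros((2, 2, 2))
g[0] = [[1, 0], [0, 1]]
g[1] = 1 - g[0]
h = np.array([1.0, 0.0])
loss = multitask_loss(g.copy(), g, h.copy(), h, alpha=0.5)
```

The α = 0.5 default mirrors the balancing weight chosen in the experiments.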
D is the combined cost function. The major segmentation task is trained with the dice loss and the auxiliary classification task is trained with multi-label logistic regression. In the dice loss part, p_ij^k and g_ij^k are the kth output produced by a softmax function and the kth one-hot target of pixel (i, j), respectively. In the multi-label logistic regression part, q^k and h^k are the kth output produced by the corresponding logistic function and the kth target, respectively. α is used to balance the two losses; in our experiments, we set α = 0.5.

2.2. Uniform U-like Encoder-Decoder Architecture

In most segmentation tasks, manual labelling is time-consuming, so training sets are always limited. Transfer learning is a very useful strategy for training a network on a small data set. In order to apply transfer learning to the SegTHOR challenge, we abstract a uniform U-like encoder-decoder architecture, where the encoder part can directly incorporate the widely used ResNet or DenseNet by omitting their last linear connection layers. The encoder part can adopt transfer learning by initializing the encoder's parameters with the corresponding networks trained on image classification. The decoder part only enlarges the size of the feature maps and shrinks the number of channels. The U-like architecture is depicted in Fig. (2).

3. EXPERIMENT

The SegTHOR Challenge dataset provides 40 3D thoracic CT scans for training and 20 for testing. We randomly split the given 40 training CT volumes into 32 for training and 8 for validation. The 3D CT scans were cut into slices along the z-axis. Under the uniform U-like architecture, the encoder part is free for setting. We implemented 6 widely used networks as the encoder part: ResNet-101, ResNet-152, DenseNet-121, DenseNet-161, DenseNet-169, and DenseNet-201. Their decoder parts involved only one convolutional layer to shrink the number of channels.

The training of each network stopped when the dice per case on the validation set did not improve for 10 epochs. In order to fully use the given data, we then reloaded the trained model and retrained it on the full 40 training volumes for a fixed 10 epochs. All networks were implemented in PyTorch [11] and trained using stochastic gradient descent with a momentum of 0.9. All networks were trained on images at the original resolution and in the form of 2.5D data, which consists of 3 adjacent axial slices. The image intensity values of all scans were truncated to the range of [−128, 384] HU to omit irrelevant information. The initial learning rate was 0.01 and decayed by a factor of 0.9. For data augmentation, we adopted random horizontal and vertical flipping and random scaling between 0.6 and 1 to alleviate overfitting. The networks were trained on four NVIDIA Titan Xp GPUs, and training took about 6 ∼ 8 hours. After each testing, we used largest-connected-component labeling to refine the segmentation results of each organ. The final submitted result is the ensemble of those 6 U-like networks. The experimental results are listed in Table 1. We achieved second place in the "All" ranking and second place in the "Esophagus" ranking, respectively.

Table 1. The experiment results and ranks.

            Rank                        Dice                                Hausdorff
User    All    Esophagus   Esophagus   Heart    Trachea   Aorta    Esophagus   Heart    Trachea   Aorta
MILab   2.75   2           0.8594      0.9500   0.9201    0.9484   0.2743      0.1383   0.1824    0.1129

4. CONCLUSION

The uniform U-like architecture is abstracted from the widely used U-Net. The encoder part of the uniform U-like architecture is free to take on different network structures, and transfer learning is easy to apply in this design. In our experimental observation, transfer learning accelerated the training of these networks and boosted their performance. Multi-task learning is helpful for discovering the organs' dependence; however, we did not analyze its advantages in depth because of the time limit of the challenge.

We need to emphasize that connected-component labeling is very useful for the SegTHOR challenge, since all organs are indivisible and our method is based on 2D-CNNs. Since the given CT data in the SegTHOR data set is small compared with other segmentation tasks, the trained networks easily overfit. Therefore, the ensemble strategy is also very necessary for the SegTHOR challenge.

5. REFERENCES

[1] Roger Trullo, C. Petitjean, Su Ruan, Bernard Dubray, Dong Nie, and Dinggang Shen, "Segmentation of organs at risk in thoracic CT images using a sharpmask architecture and conditional random fields," in IEEE 14th International Symposium on Biomedical Imaging (ISBI), 2017, pp. 1003–1006.

[2] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.

[3] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.

[4] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.
[5] Liang-Chieh Chen, Yukun Zhu, George Papandreou,
Florian Schroff, and Hartwig Adam, “Encoder-decoder
with atrous separable convolution for semantic image
segmentation,” in European Conference on Computer
Vision (ECCV), 2018, pp. 833–851.
[6] Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-
Wing Fu, and Pheng-Ann Heng, “H-denseunet: Hybrid
densely connected unet for liver and tumor segmenta-
tion from CT volumes,” IEEE Transactions on Medical
Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
[7] Qi Dou, Hao Chen, Yueming Jin, Lequan Yu, Jing Qin,
and Pheng-Ann Heng, “3d deeply supervised network
for automatic liver segmentation from ct volumes,” in
Medical Image Computing and Computer-Assisted In-
tervention (MICCAI), 2016, pp. 149–157.
[8] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ah-
madi, “V-net: Fully convolutional neural networks for
volumetric medical image segmentation,” in Proceed-
ings of International Conference on 3D Vision, 2016,
pp. 565–571.
[9] Patrick Ferdinand Christ, Mohamed Ezzeldin A.
Elshaer, Florian Ettlinger, Sunil Tatavarty, Marc Bickel,
Patrick Bilic, Markus Rempfler, Marco Armbruster, Fe-
lix Hofmann, Melvin D’Anastasi, Wieland H. Sommer,
Seyed-Ahmad Ahmadi, and Bjoern H. Menze, “Auto-
matic liver and lesion segmentation in ct using cascaded
fully convolutional neural networks and 3d conditional
random fields,” in Proceedings of Medical Image Com-
puting and Computer-Assisted Intervention (MICCAI),
2016, pp. 415–423.
[10] Yuyin Zhou, Lingxi Xie, Elliot K. Fishman, and Alan L.
Yuille, “Deep supervision for pancreatic cyst segmenta-
tion in abdominal CT scans,” in Proceedings of Medical
Image Computing and Computer Assisted Intervention
(MICCAI), 2017, pp. 222–230.
[11] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, "Automatic differentiation in PyTorch," in NIPS Autodiff Workshop, 2017.