MULTI-ORGAN SEGMENTATION USING SIMPLIFIED DENSE V-NET WITH POST PROCESSING

Ming Feng, Weiquan Huang, Yin Wang, Yuxia Xie
Tongji University, Shanghai, China
{1810865, 1730784, yinw, yuxia xie}@tongji.edu.cn

ABSTRACT

With the recent advances in the field of computer vision, Convolutional Neural Networks (CNNs) are widely used in organ segmentation of computed tomography (CT) images. Based on the Dense V-net model, this paper proposes a simplified version with postprocessing methods that reduce the fragments in organ segmentation results. Compared with the baseline method that uses a sharpmask model with conditional random fields (SM+CRF), our model improves the Dice ratio of Esophagus, Heart, Trachea, and Aorta by 10%, 4%, 7%, and 6%, respectively.

Index Terms— Convolutional Neural Networks, CT Segmentation, Dense V-net

1. INTRODUCTION

Organ segmentation of CT images is of great importance in medical diagnosis. The identification and localization of organs are part of the daily work of the radiologist. Since CT images are complex and three-dimensional (3D), distinguishing organs manually is a difficult and tedious task. Automatic segmentation using deep learning methods has therefore received a great deal of attention in medical imaging research. In the field of 3D medical image segmentation, there are two main approaches. The first is to segment each slice independently, e.g., using the U-net model [1]. The other is to use 3D convolutions to aggregate inter-slice information and segment all slices of the CT image at once; V-net [2] is one of the 3D convolutional network models built for this purpose. Gibson et al. [3] integrated the two-dimensional segmentation model of Dense net [4] into V-net and proposed the Dense V-net architecture for multi-organ segmentation. Overall, single-slice segmentation methods cannot exploit inter-slice dependencies for better results but are computationally more efficient, whereas all-slice 3D segmentation can aggregate all layers for better accuracy but is more expensive to compute.

In this paper, we present our multi-organ segmentation solution used in the SegTHOR challenge hosted at the ISBI'19 conference. Observing that the training data is relatively small and easy to overfit with deep convolutional neural networks, we simplify the Dense V-net model to achieve better results on the testing data. Our postprocessing method further reduces fragments in the prediction mask. The overall improvement over the SM+CRF baseline model [5] is between 4 and 10 percent, depending on the organ.

2. OUR MODEL

Fig. 1. Simplified Dense V-net model. (Legend: 128³ input volume; convolutional downsampling; 2x convolution; bilinear upsampling; 4x dense feature stack; activation.)

The structure of our proposed model is shown in Fig. 1. Compared with the original Dense V-net model, there are two main differences. First, the input size is different: the input size of the original model is 144³, but the number of slices in part of our data is less than 144, so we set the input size to 128³. Second, the spatial prior block is discarded.

The encoder block of the segmentation network generates three sets of feature maps of different sizes. The decoder block upsamples the smaller feature maps so that the output mask has the same size as the input image. The output layer generates the segmentation mask with a probability vector over the segmentation classes at each pixel.

3. IMPLEMENTATION

This section discusses various optimization techniques to reduce the Dice loss and to minimize the Hausdorff distance.

3.1. Data preprocessing

Preprocessing is part of our fully automated organ segmentation method. By analyzing the training data provided, we find the following issues. First, the dataset is small, and it is quite easy to overfit our deep neural networks. Second, for a single CT slice, the proportion of pixels belonging to the various organs differs greatly; Fig. 2 shows the imbalance of different organs at different slices. Last, considering the relative position of the scanner and the patient during scanning, the CT images can be scaled and rotated. Based on these observations, we apply the following techniques.

Fig. 2. Background and organ volume proportion in the training data. (a) The ratio of background to organs. (b) The ratio of organs.

Fig. 3. The 119th and 120th slices of patient 30 in the labeled data and the prediction result: (a) slice 119, label; (b) slice 119, prediction; (c) slice 120, label; (d) slice 120, prediction. The heart disappears at the 120th slice in the labeled data; such a sudden disappearance of an organ often leads to incorrect predictions.

3.1.1. Patch sampling

We ensure that each class is sampled with the same probability. According to the slice range of the test dataset, the sample block size is set to 128³.
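The equal-probability class sampling described above can be sketched as follows. This is a simplified NumPy illustration, not the NiftyNet sampler used in the paper; the function name `sample_patch_center` and the clipping behaviour at the volume border are our own assumptions.

```python
import numpy as np

def sample_patch_center(label_volume, num_classes, patch=128, rng=None):
    """Pick a patch center so that each present class is chosen with
    equal probability, then center the patch on a random voxel of that
    class (clipped so the patch stays inside the volume)."""
    rng = rng if rng is not None else np.random.default_rng()
    # Only classes that actually occur in this volume can be sampled.
    present = [c for c in range(num_classes) if np.any(label_volume == c)]
    cls = rng.choice(present)
    coords = np.argwhere(label_volume == cls)   # all voxels of that class
    center = coords[rng.integers(len(coords))]
    half = patch // 2
    # Clip the center so the patch fits inside the volume.
    center = np.minimum(np.maximum(center, half),
                        np.array(label_volume.shape) - half)
    return cls, center
```

Because the class is drawn uniformly before the voxel, small organs such as the esophagus are sampled as often as the background, which counters the volume imbalance shown in Fig. 2.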
3.1.2. Data augmentation

During the training stage, we randomly rotate the images (within -10° to 10°) and randomly scale them (within a -10% to 10% range). We implement the data augmentation on the Niftynet framework [6]. The augmentation used in the training stage does not affect the structure of the Dense V-net.

3.2. Postprocessing

By comparing the prediction results with the ground truth labels, we find the following issues. In the training data all organs are connected, but organs are not always connected in the predicted results, and some areas of the prediction are not smooth (Fig. 3). There are multiple inclusions of an organ in the same slice, which does not occur in reality. Finally, in the prediction result an organ may be connected but contain background noise inside.

For the first issue, we experimented with the following method. The CT image is sliced along each of the three dimensions in turn, and the number of connected blocks of each organ is counted. For each dimension and each organ, the largest connected block is retained; the other parts are considered background noise and are removed. Experiments show that this method achieves an obvious improvement; see Algorithm 1.

Algorithm 1 Axis-based denoise method
Input: the prediction result from the model, Tm;
Output: the denoised prediction result, Qm;
1: for all axis_i of Tm do
2:   for all slice_j along axis_i do
3:     for all category_k of Tm do
4:       set slice[-1] and slice[max+1] to -1;
5:       if the current slice contains category_k and the previous slice does not then
6:         add the current slice index to blockIn;
7:       end if
8:       if the current slice contains category_k and the next slice does not then
9:         add the current slice index to blockOut;
10:      end if
11:    end for
12:  end for
13:  pair each element of blockIn with the corresponding element of blockOut; each pair delimits a contiguous block whose length is the difference of the two indices. Keep the contiguous block of maximum length, and set all other contiguous blocks in Qm to the background class;
14: end for
15: return Qm;

Fig. 4 shows the prediction results with the removal of disconnected blocks. We also tried slicing the CT image along the depth direction and applying 5×5 average filtering to each layer, but this seriously degrades the segmentation of small organs such as the Esophagus and Trachea while having little effect on large organs such as the Heart and Aorta. Likewise, enlarging the organs of each class within each layer has little effect on the segmentation.

Fig. 4. From top to bottom: main view and left view of the true label, the predicted result, and the 3D denoise. The small fragments are significantly reduced.

3.3. DicePlusXEnt loss function

The loss functions commonly used in segmentation are the Cross-Entropy loss and the Dice loss. The Cross-Entropy loss examines each pixel separately and compares the prediction with a one-hot encoded target vector. It does not consider the imbalance between segmentation classes and can lead to poor predictions for minority classes; imbalanced classes are very common in medical image segmentation. The Dice loss is essentially a measurement of the overlap between the predicted mask and the ground truth mask, calculated as follows [7]:

l_{dice} = -\frac{1}{|K|} \sum_{k \in K} \frac{2 \sum_{i \in I} u_i^k v_i^k}{\sum_{i \in I} u_i^k + \sum_{i \in I} v_i^k}    (1)

where K is the set of segmentation classes, I is the set of pixels of the image, and u_i^k, v_i^k are the predicted and ground truth values of class k at pixel i, respectively. The Dice loss is better suited to extremely imbalanced samples, but in our experience, using the Dice loss alone adversely affects backpropagation and makes training extremely unstable.
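Equation (1) translates directly into code. The sketch below is a NumPy illustration, not the NiftyNet implementation; it assumes predictions and labels have been flattened to shape (num_voxels, num_classes), and the small `eps` term (our addition) only guards against empty classes.

```python
import numpy as np

def dice_loss(probs, onehot, eps=1e-7):
    """Multi-class Dice loss of Eq. (1).

    probs, onehot: arrays of shape (num_voxels, num_classes) holding
    the predicted probabilities u and the one-hot ground truth v.
    Returns the negative mean Dice score over the classes in K."""
    inter = (probs * onehot).sum(axis=0)            # sum_i u_i^k v_i^k
    denom = probs.sum(axis=0) + onehot.sum(axis=0)  # sum_i u_i^k + sum_i v_i^k
    return -np.mean(2.0 * inter / (denom + eps))
```

A perfect prediction gives a loss of -1, and a prediction with no overlap gives 0; the averaging over classes is what makes every organ count equally regardless of its volume.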
We use the DicePlusXEnt loss [8], which is the sum of the Cross-Entropy loss and the Dice loss:

l_{total} = l_{dice} + l_{CE}    (2)

This loss function alleviates the sample imbalance to a certain extent and improves the stability of network training. Because of the sample imbalance, we additionally weight the Cross-Entropy term of DicePlusXEnt per class: w(Background) = 1, w(Heart) = 2, w(Trachea) = 3, w(Aorta) = 4, w(Esophagus) = 5.

4. EXPERIMENTS

Our experiments are conducted on the SegTHOR dataset [5]. Niftynet, which is implemented on top of Tensorflow, is used for model training. Based on the preprocessed data, the Dense V-net is trained and then fine-tuned with different parameter configurations. The activation function used in the network is Leaky ReLU, the batch size is four, and we use the Adam optimizer with an initial learning rate of 0.01.

Algorithm 2 Training model
Input: the training data X and labels Y; the number of fused models N; the learning-rate list L;
Output: segmentation result R;
1: for all n_i in range(N) do
2:   for all l_i in L do
3:     while the loss has decreased within the last 500 iterations do
4:       forward and backward pass;
5:     end while
6:   end for
7:   save the model with the lowest validation-set loss during this iteration;
8: end for
9: fuse the saved models to obtain R_ori;
10: R ← Axis-based-denoise(R_ori);
11: return R;
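The learning-rate logic behind this training loop (decay tenfold after a 500-iteration plateau, then reset once the floor of 0.0001 is reached) can be sketched as below. The helper name and the plateau bookkeeping are our own simplification; the numeric values follow the text.

```python
def next_learning_rate(lr, iters_without_improvement, patience=500,
                       reset_rate=0.1, floor=1e-4):
    """One step of the plateau-driven schedule used in training.

    If the loss has not improved for `patience` iterations, divide the
    learning rate by ten; once it has already reached `floor`, reset it
    to `reset_rate` to start a new cycle."""
    if iters_without_improvement < patience:
        return lr            # still improving: keep the current rate
    if lr <= floor:
        return reset_rate    # bottomed out: restart the cycle
    return lr / 10.0         # plateau: decay tenfold
```

Each reset starts a new cycle whose best checkpoint is saved, which is what yields the seven models that are later fused; the cyclical idea follows [9].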
If the loss does not decrease within 500 iterations, the learning rate is decreased tenfold, down to 0.0001. When the learning rate is 0.0001 and the loss still does not change after 500 iterations, the learning rate is reset to 0.1. This process is repeated seven times, and the model with the lowest validation loss during the training process is selected for comparison.

Table 1. Performance of different methods.*

                                                        Dice                                        Hausdorff
Method                                                  Esophagus  Heart     Trachea   Aorta        Esophagus  Heart     Trachea   Aorta
Dense V-net (resize sampling)                           0.588862   0.906035  0.772924  0.780659     1.531403   0.598427  1.783999  0.997311
Dense V-net (balanced sampling)                         0.746470   0.937633  0.875301  0.914082     1.153503   0.221647  1.726525  0.402991
Dense V-net (balanced sampling and average filter)      0.490914   0.914966  0.589199  0.840300     3.246483   0.292705  2.417643  1.066558
Dense V-net (balanced sampling and organ enlargement)   0.486919   0.913697  0.575745  0.841042     4.128935   0.817668  5.587061  1.581914
7 Dense V-net fusion                                    0.763881   0.940254  0.883234  0.915550     0.771958   0.188203  0.597479  0.308775
7 Dense V-net fusion (1D denoise)                       0.763973   0.940255  0.885504  0.915673     0.766507   0.188183  0.330171  0.295968
7 Dense V-net fusion (3D denoise)                       0.765423   0.940225  0.885614  0.915954     0.661974   0.188183  0.325847  0.258024
7 Dense V-net fusion (3D denoise and weighted loss)     0.773450   0.941403  0.892730  0.923325     0.640093   0.182138  0.307711  0.235788
* "Dense V-net" denotes the simplified Dense V-net throughout.
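The "7 Dense V-net fusion" rows in Table 1 combine the predictions of the seven saved models. The paper does not spell out the fusion rule; averaging the per-class probability maps and taking the per-voxel argmax is one plausible reading, sketched here as a NumPy illustration with a hypothetical `fuse_predictions` helper.

```python
import numpy as np

def fuse_predictions(prob_maps):
    """Average the class-probability maps of several models and pick
    the most likely class per voxel.

    prob_maps: list of arrays, each shaped (..., num_classes), one per
    saved model. Returns the fused label map."""
    mean_probs = np.mean(prob_maps, axis=0)   # average over models
    return np.argmax(mean_probs, axis=-1)     # per-voxel class decision
```

Averaging probabilities rather than hard labels lets a confident minority model outvote several uncertain ones, which is one reason ensembles of checkpoints from different learning-rate cycles tend to beat any single member.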
In addition, we pick the parameters with the minimum validation-set loss in each training cycle, seven models in total, and fuse the results for comparison [9]; see Algorithm 2. Table 1 shows the results with different settings.

Overall, the fusion results are much better than the single-model predictions, and the denoising postprocessing further improves the accuracy. Heart and Aorta have much better segmentation results than Esophagus and Trachea.

5. CONCLUSION

Based on the analysis of the training data, we simplified the Dense V-net to perform multi-organ segmentation effectively. We use a variety of optimization techniques such as multi-scale prediction, data augmentation, and data postprocessing to improve the stability and performance of the model. Compared to the SM+CRF baseline model [5], the Dice rate of organ segmentation is improved by up to 10%. After our optimization, there is still room for improvement for small organs, and delineation algorithms could help to refine organ boundaries.

6. REFERENCES

[1] Olaf Ronneberger, Philipp Fischer, and Thomas Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.

[2] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi, "V-net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.

[3] Eli Gibson, Francesco Giganti, Yipeng Hu, Ester Bonmati, Steve Bandula, Kurinchi Gurusamy, Brian Davidson, Stephen P Pereira, Matthew J Clarkson, and Dean C Barratt, "Automatic multi-organ segmentation on abdominal CT with dense v-networks," IEEE Transactions on Medical Imaging, vol. 37, no. 8, pp. 1822–1834, 2018.

[4] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger, "Densely connected convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.

[5] Roger Trullo, Caroline Petitjean, Su Ruan, Bernard Dubray, D Nie, and D Shen, "Segmentation of organs at risk in thoracic CT images using a sharpmask architecture and conditional random fields," in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 2017, pp. 1003–1006.

[6] Eli Gibson, Wenqi Li, Carole Sudre, Lucas Fidon, Dzhoshkun I Shakir, Guotai Wang, Zach Eaton-Rosen, Robert Gray, Tom Doel, Yipeng Hu, et al., "Niftynet: a deep-learning platform for medical imaging," Computer Methods and Programs in Biomedicine, vol. 158, pp. 113–122, 2018.

[7] Carole H Sudre, Wenqi Li, Tom Vercauteren, Sebastien Ourselin, and M Jorge Cardoso, "Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations," in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248. Springer, 2017.

[8] Fabian Isensee, Jens Petersen, Andre Klein, David Zimmerer, Paul F Jaeger, Simon Kohl, Jakob Wasserthal, Gregor Koehler, Tobias Norajitra, Sebastian Wirkert, et al., "nnU-net: Self-adapting framework for U-net-based medical image segmentation," arXiv preprint arXiv:1809.10486, 2018.

[9] Leslie N Smith, "Cyclical learning rates for training neural networks," in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017, pp. 464–472.