<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>3D ENHANCED MULTI-SCALE NETWORK FOR THORACIC ORGANS SEGMENTATION</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Qin Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weibing Zhao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chunhui Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liyue Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Changmiao Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Zhen Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuguang Cui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guanbin Li</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Shenzhen Research Institute of Big Data</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sun Yat-Sen University</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
<institution>The Chinese University of Hong Kong (Shenzhen)</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We focus on the segmentation of four organs at risk (OARs), namely the heart, aorta, trachea, and esophagus, in thoracic Computed Tomography (CT) scans, where the esophagus is the most challenging to segment due to its volume imbalance and narrow shape. In this paper, a 3D Enhanced Multi-scale Network (EMSN), a refined variant of the 3D FCN, is proposed to improve segmentation performance. Specifically, an extra stage is added that refines the prediction map by concatenating a preliminary prediction map with the CT image to exploit auto-context. 3D dilated convolution is employed to enlarge the receptive field of the convolution kernels without loss of resolution, and additional residual connections are added to V-Net to avoid gradient degradation during back-propagation. For data preprocessing, a maximum bounding box is calculated and used to crop the raw data to reduce the computation cost. For data augmentation, registration aligns each CT image and its ground truth with the others in the training set to generate non-linearly transformed new CT images along with annotations. Model consensus is exploited by retaining the models from the last few epochs, whose predictions vote for the final segmentation. Experiments demonstrate that our EMSN achieves competitive performance on the SegTHOR dataset.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Index Terms</title>
      <p>CT segmentation, V-Net, multi-scale refinement, auto-context, image registration</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION</title>
<p>Cancer is among the most common causes of disease-related death, and screening with low-dose computed tomography (CT) is essential in this situation [1]. The target tumor and the healthy organs located near it, called Organs at Risk (OARs), need to be delineated for further irradiation planning.</p>
<p>Over the years, several studies have addressed automatic OAR segmentation. Recently, deep learning-based multi-organ segmentation of abdominal CT images has been explored, and deep learning methods have achieved state-of-the-art performance thanks to their strong non-linear modeling capability. In U-Net [2] for 2D medical image segmentation, skip connections between the downsampling and upsampling layers combine high-resolution features with the upsampled output. Milletari et al. then extended U-Net to the 3D V-Net [3]. Both have inspired many new frameworks [4]. For example, the Dense V-Net segmentation network enables high-resolution activation maps through feature reuse and achieves good accuracy on 3D CT images [5]. The multi-scale pyramid of 3D FCNs (MSN) shows good learning properties for end-to-end training and improves the segmentation accuracy of fine structures [6].</p>
<p>However, it is still challenging to segment organs with narrow structures, such as the esophagus, whose volume is also markedly imbalanced compared with larger organs like the heart. Inspired by the multi-scale pyramid of 3D FCNs [6], we propose a 3D Enhanced Multi-scale Network to overcome the difficulties mentioned above, in which a third stage is employed to enhance performance. In the last two stages, contextual information is fused with the high-resolution image as deep supervision: preliminary prediction maps are aggregated with image features to improve the overall segmentation accuracy. Our proposed EMSN is trained in an end-to-end fashion and achieves promising segmentation results on the SegTHOR dataset.</p>
<p>In summary, the key contributions are three-fold: (1) We enhance the multi-scale network by adding one more stage, increasing the number of parameters of each stage network, and employing dilated convolution layers in EMSN. (2) A preprocessing method crops the raw data according to a maximum bounding box, which reduces the heavy computation burden and improves accuracy. (3) We greatly enlarge the training set, from 40 to roughly 1600 samples, using the NiftyReg image registration software, for faster convergence and higher accuracy, and we initialize our model with weights pretrained on an abdomen dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>2. METHODS</title>
<p>Roth et al. [6] proposed the multi-scale pyramid of 3D FCNs shown in Figure 2. As shown in Figure 1, we propose an Enhanced Multi-scale Network (EMSN) that adds a third stage together with several other boosting methods, which are described below.</p>
    </sec>
    <sec id="sec-4">
<title>2.1. Enhanced Multi-scale Networks</title>
<p>In the first stage, the CT scan is downsampled and fed into a V-Net that is trained to produce a coarse, low-resolution segmentation map delineating the approximate locations of the organs. In general, lower-resolution prediction maps carry more contextual information, while high-resolution images support locally accurate segmentation. The coarse segmentation map is therefore upsampled to the original size and concatenated with the original input CT image to aggregate multi-scale contextual information. Assisted by the coarse prediction map, the V-Net in stage 2 outputs a better segmentation result.</p>
<p>We propose to add one more stage to refine the segmentation further. Since scale-based auto-context improves the segmentation performance to a large extent, the segmentation map from stage 2 is concatenated with the original input cuboid again to refine the prediction map. The architecture is illustrated in Figure 1.</p>
<p>Denote by V(·) the operation of V-Net, which segments a 3D input image X into a segmentation map S, i.e., S = V(X). Denote by ⊕ the concatenation operation, and let the superscripts ds and us indicate the downsampling and upsampling operations. The process of our model can then be written as</p>
      <p>S_1 = V(X^ds), (1)</p>
      <p>S_2 = V(X ⊕ S_1^us), (2)</p>
      <p>S_3 = V(X ⊕ S_2), (3)</p>
      <p>θ̂ = arg min_θ (1/3) Σ_{i=1}^{3} L(S_i; θ; L), (4)</p>
      <p>where θ denotes the network parameters, such as the convolutional kernel weights, L is the ground-truth label image, and L(·; ·; ·) is the loss function introduced in Section 2.2.</p>
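<p>The three-stage process above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: the VNet arguments are placeholder modules standing in for full V-Nets, and the factor-2 trilinear resampling is an assumption (the text does not specify the downsampling factor or interpolation mode).</p>
      <preformat>
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMSN(nn.Module):
    """Three-stage cascade: a coarse low-resolution map, then two
    auto-context refinements on the full-resolution volume."""
    def __init__(self, vnet1, vnet2, vnet3):
        super().__init__()
        self.vnet1, self.vnet2, self.vnet3 = vnet1, vnet2, vnet3

    def forward(self, x):
        # Stage 1: coarse map S1 on a downsampled volume (factor assumed 2)
        x_ds = F.interpolate(x, scale_factor=0.5, mode='trilinear',
                             align_corners=False)
        s1 = self.vnet1(x_ds)
        # Upsample S1 back to the input resolution
        s1_us = F.interpolate(s1, size=x.shape[2:], mode='trilinear',
                              align_corners=False)
        # Stage 2: image concatenated with the coarse map along channels
        s2 = self.vnet2(torch.cat([x, s1_us], dim=1))
        # Stage 3: one more auto-context refinement on the stage-2 map
        s3 = self.vnet3(torch.cat([x, s2], dim=1))
        return s1, s2, s3
```
</preformat>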
    </sec>
    <sec id="sec-5">
      <title>2.2. Loss function</title>
<p>The network makes voxel-to-voxel predictions. As in the binary case in [3], each stage can be trained by minimizing a multi-class Dice loss function:</p>
      <p>L_j(X_j; θ_j; L_j) = 1 − (1/K) Σ_{k=1}^{K} [ 2 Σ_{i}^{N} p_{i,k} l_{i,k} / ( Σ_{i}^{N} p_{i,k}² + Σ_{i}^{N} l_{i,k}² ) ], (5)</p>
      <p>where p_{i,k} ∈ [0, 1] represents the continuous value of the softmax 3D prediction map at voxel i for each class label k of K, and l_{i,k} represents the corresponding ground truth.</p>
<p>The final loss function joins the three stages' losses together, as follows:</p>
      <p>L(X; θ; L) = Σ_{j=1}^{3} L_j(X_j; θ_j; L_j), (6)</p>
      <p>where j denotes the index of the stage network in the EMSN. We train the 3-stage network end to end with the joint loss function (6).</p>
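<p>A minimal PyTorch sketch of the per-stage multi-class Dice loss and the joint three-stage loss. For brevity it assumes the predictions are already softmax probabilities and that all stages and the one-hot ground truth share the same resolution; in practice stage 1 works on a downsampled volume, and that resampling is omitted here.</p>
      <preformat>
```python
import torch

def dice_loss(probs, target_onehot, eps=1e-5):
    # probs, target_onehot: (batch, K, D, H, W); probs are softmax outputs
    dims = (0, 2, 3, 4)                        # sum over the voxels i
    inter = (probs * target_onehot).sum(dims)             # sum_i p_ik l_ik
    denom = (probs ** 2).sum(dims) + (target_onehot ** 2).sum(dims)
    dice_per_class = 2.0 * inter / (denom + eps)          # one term per k
    return 1.0 - dice_per_class.mean()         # 1 minus mean over K classes

def joint_loss(stage_probs, target_onehot):
    # Sum of the stages' Dice losses, as in the joint loss above
    return sum(dice_loss(p, target_onehot) for p in stage_probs)
```
</preformat>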
    </sec>
    <sec id="sec-6">
      <title>2.3. Residual V-net</title>
<p>V-Net [3] is a volumetric convolutional neural network that performs segmentation on CT images, as shown in Figure 1. Residual connections are applied in our enhanced model to avoid the gradient vanishing problem. The encoder and decoder both contain 4 blocks, and the architecture performs voxel-to-voxel predictions [7]. In the encoder, the first block has only one convolution layer with kernel size 3, the second block has 2 layers, and the third and fourth blocks each have 4 layers whose convolutions use a dilation of 4. Downscale convolution layers with stride 2 reduce the spatial size between adjacent blocks, and each of these layers has twice as many output channels as input channels. The decoder mirrors the encoder, except that transposed convolution layers upscale the feature maps in place of the encoder's downscale convolutions. To enhance the fitting ability while avoiding over-fitting, the dropout rate [8] is set to 0.3. Skip connections between the encoder and decoder, similar to U-Net [2], yield better optimization and faster convergence.</p>
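<p>The deep encoder blocks described above can be sketched as a residual group of dilated 3×3×3 convolutions. This is an illustrative sketch, not the exact architecture: the PReLU non-linearity and the fixed channel count within the block are assumptions.</p>
      <preformat>
```python
import torch
import torch.nn as nn

class ResBlock3d(nn.Module):
    """One deep block: several dilated 3x3x3 convolutions wrapped by a
    residual connection, so gradients can bypass the stacked layers."""
    def __init__(self, channels, n_layers=4, dilation=4):
        super().__init__()
        pad = dilation  # padding equal to dilation keeps size for kernel 3
        self.convs = nn.Sequential(*[
            nn.Sequential(
                nn.Conv3d(channels, channels, kernel_size=3,
                          padding=pad, dilation=dilation),
                nn.PReLU(),
            )
            for _ in range(n_layers)
        ])

    def forward(self, x):
        return x + self.convs(x)  # residual connection
```
</preformat>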
    </sec>
    <sec id="sec-7">
      <title>2.4. Increase the number of Parameters</title>
<p>The number of parameters is increased in order to capture more data features and enhance the fitting ability of the model. Taking the GPU memory limitation into account, we only deepen the deeper blocks of the encoder and decoder paths, because the deeper blocks' feature maps have relatively low resolution. We double the number of layers in the two deepest blocks from 3 to 6, splitting the 6 layers into two parts of 3 layers each, connected by a residual connection to avoid vanishing gradients. In addition, the new stage-3 network can itself be considered an increase in the number of parameters.</p>
    </sec>
    <sec id="sec-8">
      <title>2.5. 3D Dilated Convolution</title>
<p>In the compression path of V-Net, downsampling is performed after each block to enlarge the receptive field, while in the decompression path, de-convolution operations recover the size of the feature maps to realize pixel-wise segmentation. Internal data structure and spatial hierarchical information are lost during this series of downsampling and upsampling steps. We therefore employ 3D dilated convolution [9] to enlarge the receptive field while keeping the size of the feature map unchanged, which maintains both resolution and coverage.</p>
      <p>Fig. 1. Enhanced Multi-scale Network for thoracic organ segmentation in CT images. In the first level, the CT scan is downsampled to a low resolution and fed into V-Net 1 to predict a rough segmentation map, which is aggregated with the original CT image as deep supervision. V-Net 2 can thus learn the contextual information and the image features simultaneously to produce a more accurate, high-resolution segmentation map, which is concatenated with the input image again in level 3 to further refine the segmentation result.</p>
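<p>A small sanity check of the resolution-preserving property: with padding equal to the dilation rate, a 3×3×3 dilated convolution keeps the feature-map size while its receptive field grows from 3 to 9 voxels per axis. The specific sizes here are illustrative, not from the paper.</p>
      <preformat>
```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16, 16)
plain = nn.Conv3d(1, 1, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv3d(1, 1, kernel_size=3, padding=4, dilation=4)
# Both keep the 16x16x16 feature map, but the dilated kernel samples a
# 9x9x9 neighbourhood (receptive field 9 per axis) instead of 3x3x3.
assert plain(x).shape == x.shape
assert dilated(x).shape == x.shape
```
</preformat>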
    </sec>
    <sec id="sec-9">
      <title>2.6. Consensus</title>
<p>Loading only a single weights file from the final training epoch makes the segmentation biased. To reduce errors and improve accuracy, we preserve 5 model checkpoints from the very end of the training progress. These differently weighted models segment the CT image separately. Finally, we average the 5 probability maps voxel-wise and assign each voxel the class label with the largest probability. This multi-model consensus method largely reduces the prediction error.</p>
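<p>The voxel-wise averaging and voting step can be sketched as follows, assuming each checkpoint yields a softmax probability volume of shape (classes, depth, height, width):</p>
      <preformat>
```python
import numpy as np

def consensus(prob_maps):
    # prob_maps: list of (K, D, H, W) softmax outputs, one per retained
    # checkpoint; average voxel-wise, then pick the most likely class.
    mean_probs = np.mean(prob_maps, axis=0)
    return np.argmax(mean_probs, axis=0)      # (D, H, W) label volume
```
</preformat>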
    </sec>
    <sec id="sec-10">
      <title>3. EXPERIMENTS</title>
<p>The Multi-scale Pyramid 3D FCN Network (MSN), shown in Fig. 2, is chosen as the baseline for our experiments and is compared with our Enhanced Multi-scale Network (EMSN) to show the superiority of the proposed model.</p>
    </sec>
    <sec id="sec-11">
      <title>3.1. Dataset</title>
<p>The Abdomen Dataset [10], which contains 13 classes of organs, is used to pretrain our model. Since the pretrained model has learned a rough outline of the organs, the convolutional kernel parameters of its shallow layers are loaded to initialize our model for better prediction on the SegTHOR dataset.</p>
<p>SegTHOR dataset This dataset consists of 11084 slices from 60 patients and has been randomly split into two parts: 40 patients (7390 slices) for the training set and 20 patients (3694 slices) for the testing set. The CT scans have a size of 512 × 512 with an in-plane resolution of 0.90–1.37 mm, and the number of slices per patient varies between 150 and 284. Besides, the z-resolution varies from 0.90 mm to 1.37 mm. The patients have non-small cell lung cancer and were fully anonymized. The reference-standard ground truths were manually annotated.</p>
<p>Abdomen dataset This dataset consists of 50 abdominal CT scans randomly selected from an ongoing colorectal cancer chemotherapy trial. The 50 scans were captured during the portal venous contrast phase, with variable volume sizes (512 × 512 × 85 to 512 × 512 × 198). Thirteen abdominal organs were manually labeled and verified by a radiologist on a volumetric basis using the MIPAV software: (1) spleen, (2) right kidney, (3) left kidney, (4) gallbladder, (5) esophagus, (6) liver, (7) stomach, (8) aorta, (9) inferior vena cava, (10) portal vein and splenic vein, (11) pancreas, (12) right adrenal gland, and (13) left adrenal gland.</p>
    </sec>
    <sec id="sec-12">
      <title>3.2. Pre-processing</title>
<p>The internal organs of the human body lie roughly in the same area. Therefore, we calculate, from dataset statistics, a maximum bounding box that is guaranteed to contain all organs of interest in every sample of the dataset. Based on this bounding box, the raw data is cropped to obtain a smaller dataset. After the cropping pre-processing, the size of each CT volume becomes 304 × 224 × slices_gt, where slices_gt denotes the number of slices containing ground-truth organs. This cropping reduces the computation cost and removes noise, yielding better performance during training.</p>
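<p>A sketch of computing the maximum bounding box from the training label masks and cropping with it. The (D, H, W) axis order and the union-of-boxes formulation are assumptions for illustration; the paper only states that the box is derived from dataset statistics.</p>
      <preformat>
```python
import numpy as np

def max_bounding_box(masks):
    # Union of per-sample organ bounding boxes over the training set,
    # so every labelled organ in every sample fits inside one box.
    lo = np.array([np.inf, np.inf, np.inf])
    hi = np.array([-np.inf, -np.inf, -np.inf])
    for m in masks:                       # m: binary (D, H, W) organ mask
        idx = np.argwhere(m)              # coordinates of labelled voxels
        lo = np.minimum(lo, idx.min(axis=0))
        hi = np.maximum(hi, idx.max(axis=0))
    return lo.astype(int), hi.astype(int)

def crop(volume, lo, hi):
    # Crop a volume to the (inclusive) bounding box
    return volume[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1]
```
</preformat>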
<p>Data augmentation In addition, we augment the raw training data: rotation and scaling are each applied with probability 0.7. NiftyReg [11] is used to align each of the 40 training CT scans (with its ground truth) with every other scan in the training set. The training dataset is thereby enlarged to 40 × 39 CT images with annotations, which tremendously increases its scale.</p>
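<p>Enumerating the reference/floating pairs shows where the 40 × 39 figure (the roughly 1600 samples mentioned in the introduction) comes from; the actual non-linear registration of each pair is performed by NiftyReg and is not reproduced here.</p>
      <preformat>
```python
# Each of the 40 training scans serves once as the reference for every
# other scan, so registration yields 40 x 39 new (image, label) samples.
n = 40
pairs = [(ref, flo) for ref in range(n) for flo in range(n) if ref != flo]
print(len(pairs))  # 1560
```
</preformat>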
    </sec>
    <sec id="sec-13">
      <title>3.3. Post-processing</title>
<p>The original prediction results contain noise, such as many small dark blobs around the organs. To remove them, we apply the largest-connected-component algorithm, keeping only the main region of each predicted organ of interest.</p>
      <p>Because the datasets are cropped according to the bounding box during the preprocessing phase, the resolution of the prediction results is restored to the raw dataset's resolution by zero-padding.</p>
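<p>The largest-connected-component filtering can be sketched with SciPy as below, applied per organ class. The 26-connectivity (full 3×3×3 structuring element) is an assumption, since the text does not specify the connectivity used.</p>
      <preformat>
```python
import numpy as np
from scipy import ndimage

def largest_component(binary_mask):
    # Keep only the largest 26-connected component of one organ's mask,
    # removing the small noisy blobs around it.
    labeled, n = ndimage.label(binary_mask,
                               structure=np.ones((3, 3, 3)))
    if n == 0:
        return binary_mask
    sizes = np.bincount(labeled.ravel())
    sizes[0] = 0                      # ignore the background label
    return labeled == sizes.argmax()  # mask of the biggest component
```
</preformat>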
    </sec>
    <sec id="sec-14">
      <title>3.4. Training Details</title>
<p>Our networks are implemented in PyTorch 1.0.0. The maximum number of training epochs is set to 3000. We use Adam as the optimizer with a learning rate of 1e-4, decayed at epochs 2000 and 2500. We train our model end to end with the joint three-stage loss on 4 NVIDIA TITAN Xp (Pascal) GPUs. Due to the limitation of GPU memory, a patch of size 304 × 224 × 48 is randomly selected from the raw data and then fed into the network; in this way, the model is trained in a patch-based fashion.</p>
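<p>The random patch selection can be sketched as below; treating the volume axes as (H, W, slices) so that they match the 304 × 224 × 48 patch is an assumption for illustration.</p>
      <preformat>
```python
import numpy as np

def random_patch(volume, patch=(304, 224, 48)):
    # Randomly crop one training patch from a cropped CT volume, so a
    # whole scan never has to fit in GPU memory at once.
    starts = [np.random.randint(0, s - p + 1)
              for s, p in zip(volume.shape, patch)]
    return volume[starts[0]:starts[0] + patch[0],
                  starts[1]:starts[1] + patch[1],
                  starts[2]:starts[2] + patch[2]]
```
</preformat>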
      <sec id="sec-14-1">
        <title>3.5. Results</title>
        <p>[Table 1: Dice metric and Hausdorff distance for the Esophagus, Heart, Trachea, and Aorta, plus the mean, for each compared model.]</p>
        <p>The Dice metric (DM) [3] and the Hausdorff distance (HD) [12] serve as the standard evaluation metrics, and the best segmentation results obtained in our experiments are shown in Table 1. Notably, on the organ that is hardest to segment (the esophagus), our results outperform those of the other teams.</p>
      </sec>
    </sec>
    <sec id="sec-15">
<title>Individual Component Analysis</title>
      <p>Particularly, the segmentation performance on hard organs, such as the esophagus with its long and narrow shape, is improved significantly by the multi-scale refinement.</p>
    </sec>
    <sec id="sec-16">
<title>Comparison between Different Models</title>
      <p>As shown in Table 1, high performance on all organs is obtained by the Enhanced Multi-scale Network (EMSN). EMSN* denotes EMSN trained with data augmentation, which includes initializing the weights from the model pretrained on the Abdomen dataset and around 1600 new samples generated by NiftyReg. EMSN+ denotes EMSN with additional convolution layers. EMSN*+ combines both of these methods. The experiments show that the best performance on each organ is achieved by EMSN*+.</p>
    </sec>
    <sec id="sec-17">
      <title>4. CONCLUSIONS</title>
<p>In this paper, we proposed a 3-stage 3D Enhanced Multi-scale Network (EMSN) to address the segmentation of four organs at risk in 3D CT data. Our network refines the prediction through a progressive auto-context procedure. The experimental results demonstrate that, compared with the baseline, the overall performance on all classes improves considerably; in particular, the segmentation of the hard class (esophagus) is remarkably improved.</p>
      <p>Acknowledgments</p>
<p>This work is supported by the Shenzhen Fundamental Research Fund under grants No. KQTD2015033114415450, No. ZDSYS201707251409055, and No. 2017ZT07X152.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
<mixed-citation>[8] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “<article-title>Improving neural networks by preventing co-adaptation of feature detectors</article-title>,” <source>CoRR</source>, vol. abs/1207.0580, <year>2012</year>.</mixed-citation>
      </ref>
      <ref id="ref2">
<mixed-citation>[9] Fisher Yu and Vladlen Koltun, “<article-title>Multi-scale context aggregation by dilated convolutions</article-title>,” <source>arXiv preprint arXiv:1511.07122</source>, <year>2015</year>.</mixed-citation>
      </ref>
      <ref id="ref3">
<mixed-citation>[1] National Lung Screening Trial Research Team, “<article-title>Reduced lung-cancer mortality with low-dose computed tomographic screening</article-title>,” <source>New England Journal of Medicine</source>, vol. <volume>365</volume>, no. <issue>5</issue>, pp. <fpage>395</fpage>-<lpage>409</lpage>, <year>2011</year>.</mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>[2] O. Ronneberger, P. Fischer, and T. Brox, “<article-title>U-net: Convolutional networks for biomedical image segmentation</article-title>,” in <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>. Springer, <year>2015</year>, pp. <fpage>234</fpage>-<lpage>241</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
<mixed-citation>[3] F. Milletari, N. Navab, and S. A. Ahmadi, “<article-title>V-net: Fully convolutional neural networks for volumetric medical image segmentation</article-title>,” in <source>Fourth International Conference on 3D Vision</source>. Springer, <year>2016</year>.</mixed-citation>
      </ref>
      <ref id="ref6">
<mixed-citation>[4] Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger, “<article-title>3D U-net: learning dense volumetric segmentation from sparse annotation</article-title>,” in <source>International Conference on Medical Image Computing and Computer-Assisted Intervention</source>. Springer, <year>2016</year>, pp. <fpage>424</fpage>-<lpage>432</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
<mixed-citation>[5] Eli Gibson, Francesco Giganti, Yipeng Hu, Ester Bonmati, Steve Bandula, Kurinchi Gurusamy, Brian Davidson, Stephen P. Pereira, Matthew J. Clarkson, and Dean C. Barratt, “<article-title>Automatic multi-organ segmentation on abdominal CT with dense V-networks</article-title>,” <source>IEEE Transactions on Medical Imaging</source>. IEEE, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref8">
<mixed-citation>[6] Holger R. Roth, Chen Shen, Hirohisa Oda, Takaaki Sugino, Masahiro Oda, Yuichiro Hayashi, Kazunari Misawa, and Kensaku Mori, “<article-title>A multi-scale pyramid of 3D fully convolutional networks for abdominal multi-organ segmentation</article-title>,” in <source>International Conference on Medical Image Computing and Computer Assisted Intervention</source>. Springer, <year>2018</year>.</mixed-citation>
      </ref>
      <ref id="ref9">
<mixed-citation>[7] Jonathan Long, Evan Shelhamer, and Trevor Darrell, “<article-title>Fully convolutional networks for semantic segmentation</article-title>,” in <source>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>, <year>2015</year>, pp. <fpage>3431</fpage>-<lpage>3440</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
<mixed-citation>[11] The Centre for Medical Image Computing at University College London, “NiftyReg software,” http://cmictig.cs.ucl.ac.uk/wiki/index.php/NiftyReg, February 5, <year>2019</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
<mixed-citation>[12] Daniel P. Huttenlocher, William J. Rucklidge, and Gregory A. Klanderman, “<article-title>Comparing images using the Hausdorff distance under translation</article-title>,” in <source>Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>. IEEE, <year>1992</year>, pp. <fpage>654</fpage>-<lpage>656</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[10] Bennett Landman et al., “<article-title>Multi-atlas labeling beyond the cranial vault - workshop and challenge</article-title>,” https://www.synapse.org/#!Synapse:syn3193805/wiki/89480, February 14, <year>2015</year>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>