Robust 3D U-Net Segmentation of Macular Holes

Jonathan Frawley1,2[0000-0002-9437-7399], Chris G. Willcocks1[0000-0001-6821-3924], Maged Habib3,4[0000-0003-0931-3786], Caspar Geenen3[0000-0002-2778-6344], David H. Steel3,4[0000-0001-8734-3089], and Boguslaw Obara2,4,5[0000-0003-4084-7778]

1 Department of Computer Science, Durham University, Durham, UK
2 Gliff.ai, Durham, UK
3 Sunderland Eye Infirmary, Sunderland, UK
4 Bioscience Institute, Newcastle University, Newcastle Upon Tyne, UK
5 School of Computing, Newcastle University, Newcastle Upon Tyne, UK

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. Macular holes are a common eye condition which results in visual impairment. We look at the application of deep convolutional neural networks to the problem of macular hole segmentation. We use the 3D U-Net architecture as a basis and experiment with a number of design variants. Manually annotating and measuring macular holes is time consuming and error prone, taking dozens of minutes to annotate a single 3D scan. Previous automated approaches to macular hole segmentation take minutes to segment a single 3D scan. We found that, in less than one second, deep learning models generate significantly more accurate segmentations than previous automated approaches (Jaccard index boost of 0.08-0.09) and expert agreement (Jaccard index boost of 0.13-0.20). We also demonstrate that an approach of architectural simplification, by greatly reducing the network capacity and depth, results in a model which is competitive with state-of-the-art models such as residual 3D U-Nets.

Keywords: Machine learning · image processing and computer vision · medicine · segmentation · neural nets · retina · macular holes.

1 Introduction

Idiopathic full thickness macular holes (iFTMH) are a common and visually disabling condition, being bilateral in 10% of affected individuals. They occur at a prevalence of approximately 1 in 200 of the over-60-year-old population, with an incidence of approximately 4,000 per annum in the United Kingdom (UK) [1,13]. If left untreated, they result in visual acuity below the definition of blindness, typically greater than 1.0 logMAR (logarithm of the minimum angle of resolution), where 0.1 logMAR is classed as normal.

3D high-resolution images of the retina can be created using optical coherence tomography (OCT) [9]. It is now the standard tool for diagnosing macular holes [7]. Compared to previous imaging methods, OCT can more easily assist a clinician in differentiating a full-thickness macular hole from mimicking pathology, which is important in defining appropriate treatment [9]. An OCT scan of a macular hole is a 3D volume. Clinicians, however, typically view OCT scans as a series of 2D images, choose the central slice with maximum dimensions and perform measurements which are predictors of anatomical and visual success, such as base diameter, macular hole inner opening and minimum linear diameter [12,2,15,19]. This approach is limited as it assumes that the macular hole base is circular, and would give incorrect results when it is elliptical [16], which is typically the case [2]. With the advent of automated 3D approaches, it is possible to begin to look at measurements in 3D and how they might be predictors of anatomical and visual success.
Neural networks are an interconnected group of artificial neurons, which can be reconfigured to solve a problem based on data. Convolutional neural networks (CNNs) are a type of neural network inspired by how the brain processes visual information [11]. CNNs have been very successful in computer vision problems, such as automating the segmentation of medical images. For a CNN to learn to segment images in a supervised manner, it needs access to images with associated ground truth (GT) information which highlights the areas of the image relevant to the task at hand. This is often done manually, which is time consuming and requires expert knowledge.

The U-Net CNN architecture [18] is a widely used CNN architecture for biomedical image segmentation on 2D images. It has had success in segmentation to help diagnose other eye conditions such as macular edema, even when dataset sizes are limited [5]. We sought to examine the application of variants of the U-Net architecture to the problem of macular hole segmentation. Our proposed model is a smaller version of the model from the original 3D U-Net paper [3]. We also implemented and evaluated the proposed model with residual blocks added, similar to those described by He et al. [8]. In addition, we implemented a much more complex residual model, DeepMind's OCT segmentation model [4], and ran the same tests with it.

Alternatives to U-Net have been created, such as V-Net [14], which uses 3D convolutions and a Dice score-based loss. We use binary cross-entropy as our loss function, similar to the weighted cross-entropy used in the original 3D U-Net paper. Early experiments showed that binary cross-entropy outperformed a Dice score-based loss for our problem. Additionally, a study comparing multiple model architectures on another biomedical image segmentation problem showed that V-Net-based models did not outperform U-Net-based models [6]. For these reasons, we chose 3D U-Net as the basis of our model rather than V-Net.

Our contribution can be summarized as developing an automated approach to macular hole segmentation based on deep learning which yields significantly improved results compared to prior methods. We present a comparison of the above-mentioned models against the current state-of-the-art automated approach [16]. The state-of-the-art method is a level set approach which does not use deep learning. We show that simple low-capacity 3D U-Nets are capable of outperforming the state-of-the-art automated approach and that increasing the complexity of the architecture does not improve performance. The PyTorch-based code for this work has been released as an open-source project (https://github.com/gliff-ai/robust-3d-unet-macular-holes).

Fig. 1: Small 3D U-Net (M1). The proposed model is a cut-down version of 3D U-Net [3]. It has fewer levels and a carefully optimized capacity for our datasets. (The diagram shows Conv3d + BatchNorm3d + ReLU blocks, max pooling, up convolutions (ConvTranspose3d) and skip connections, with feature maps of 160x188x49x1 at the input and output, 80x94x24x64 and 80x94x24x128 at the middle level, and 40x47x12x128 and 40x47x12x256 at the bottom level.)

2 Materials and Methods

2.1 Materials

All patients had undergone spectral domain optical coherence tomography (SDOCT) imaging using the Heidelberg Spectralis (Heidelberg, Germany) as part of routine care, using the same imaging protocol. A high-density central horizontal scanning protocol with 29-30 micron line spacing was used in the central 15 by 5 degrees.
The individual OCT line scans were 768 × 496 pixels, with the scaling varying slightly between datasets but typically equating to 5.47 microns per pixel in the x (horizontal) axis and 3.87 microns per pixel in the y (vertical) axis. With 29-30 microns spacing between scans (z axis), there were 49 scans per dataset. All scans used an automatic real time (ART) setting of 16, enabling multisampling and noise reduction over 16 images. All scans collected were from unique patients and were stored using the uncompressed TIFF file format. All images were cropped to the same size and unnecessary information, such as the fundus image, was removed.

Annotations were created by a mixture of clinicians and image experts using a 3D image annotation tool. Pixels on each slice of the OCT scan which represented macular hole were highlighted. There were 85 (image, annotation) pairs in the training dataset, 56 after combining annotations from multiple authors. There were 22 pairs in the validation dataset and 9 in the unseen test set. Originally we had three annotations for each OCT image in the unseen test set. However, due to inconsistencies between authors, we combined all ground truths into a single ground truth per image. To do this, we used a voting system: if 2/3 of the authors had annotated a voxel, that voxel was annotated in the resultant ground truth. All images and ground truths at full size had dimensions 321 × 376 × 49. We did not augment our dataset as we found that augmentations did not improve the generalizability of our model. As we believe that our test and validation sets are large enough to be representative of the real-world problem, this was not deemed to be an issue.

2.2 Methods

Image segmentation involves the labelling of objects of interest in an image. For a 3D image, this is done by assigning voxels with shared characteristics to corresponding class labels. We wished to assign areas of the macular hole volume in an OCT image to white voxels and all other regions to black voxels. We used binary cross-entropy as our loss function, which tells us how close our predicted macular hole regions are to those in the ground truth:

\[
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, p_i \log q_i + (1 - p_i)\log(1 - q_i) \,\right], \tag{1}
\]

with N being the batch size, p_i being the ground truth and q_i being the output of our model. For images with multiple annotations in our training set, we trusted them with equal integrity and the target probabilities were averaged. The validation set had no samples with multiple annotations. As described in Section 2.1, for the unseen test set, we used a voting process to decide on the final target ground truth.
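As a concrete illustration of Eq. (1) with averaged multi-annotator targets, the following is a minimal PyTorch sketch. The tensor names and shapes are illustrative assumptions and are not taken from the released code.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration: a batch of N downscaled OCT volumes.
N, D, H, W = 2, 49, 188, 160

logits = torch.randn(N, 1, D, H, W)                       # raw model outputs (pre-sigmoid)
ann_a = torch.randint(0, 2, (N, 1, D, H, W)).float()      # annotation by author A
ann_b = torch.randint(0, 2, (N, 1, D, H, W)).float()      # annotation by author B

# For multiply-annotated training images the target probabilities are the
# average of the annotations (Section 2.2); singly-annotated images simply
# use their one annotation as the target.
target = (ann_a + ann_b) / 2.0

# Binary cross-entropy as in Eq. (1); PyTorch averages over all elements by
# default, and combining the sigmoid with the loss in one call is numerically
# stabler, which is why Section 3 uses BCEWithLogitsLoss.
loss = F.binary_cross_entropy_with_logits(logits, target)
print(loss.item())
```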
U-Net takes as input a 2D image and outputs a set of probabilities. Each entry in the output is the probability of each part of the image being a part of the segmented region. It is a U-shaped CNN architecture, consisting of a contracting path and an expansive path. The contracting path consists of 2D convolutions, ReLU activations and 2D max pooling at each level. The expansive path's levels use skip connections to their contracting path equivalents, along with 2D convolutions, ReLU activations and 2D up convolutions. Skip connections allow for high-resolution information to be captured by the model, while the contracting/expansive paths capture the abstract shape of the segmentation.

The 3D U-Net architecture [3] is a version of U-Net designed for use with 3D images, which uses 3D convolutions, up convolutions and max pooling layers. This allows for improved segmentation of 3D images, as the context from multiple slices is used to decide whether an individual voxel belongs to an object or not. A number of models based on the 3D U-Net architecture were compared:

M1: Small 3D U-Net (Proposal) [5,216,353 parameters],
M2: Small residual 3D U-Net (Residual) [13,928,833 parameters],
M3: Residual 3D U-Net for 2D slices (DeepMind) [4] [470,333,089 parameters].

A diagram of model M1 is shown in Fig. 1. Early on, versions of 2D U-Net were implemented as described in the original paper; however, performance was very poor for our dataset. The original 2D U-Net model has an input size of 572 × 572 and an output size of 388 × 388. The poor performance we noticed is likely due to a lack of context from multiple slices. In addition, the input to the original 2D U-Net model is of a higher resolution than our image slices, which have a resolution of 321 × 376. Therefore, we needed to upscale our images, which resulted in distortion and wasted memory usage. Similarly, we also implemented 3D U-Net as described in the original paper; however, this also performed poorly. Again, this is primarily due to the input and output sizes of the model being too dissimilar to our dataset's. The original 3D U-Net model has an input size of 132 × 132 × 116. Our images only have 49 slices, which needed to be upscaled to 116 slices for this. This resulted in significant distortion of the input. The output of the original 3D U-Net model is also of a very low resolution, 44 × 44 × 28, which would have been very coarse when upscaled to our image dimensions of 321 × 376 × 49.

As our images were of resolution 321 × 376 × 49, we aimed to keep the resolution of the input and output as close to this as possible. Unlike the original U-Net and 3D U-Net papers, we decided to keep the input and output dimensions equal to each other, to maximise the resolution of our output. We tweaked convolution sizes, padding and strides until we achieved this goal, while still fitting in available GPU memory. Our experiments showed that using three levels for this model resulted in the best performance, rather than the four levels that the original 3D U-Net paper used. A scaled-down input image of 160 × 188 × 49 yielded the best results for models M1 and M2. The output is of the same dimensions as the input. M2 is similar to M1 except that residual blocks have been added to each level. M3 is a very deep residual 3D U-Net architecture which takes nine slices of the OCT image as input and outputs a 2D probability map representing the segmentation of a single slice of the OCT image. For M3, the slice which we want to segment, along with 4 slices on either side, is input to the model as a 321 × 376 × 9 image. This is based on a model architecture developed by DeepMind for segmenting OCT images [4]. For slices near the boundaries, we use mirroring to handle slices that fall outside of the image. It outputs a set of 321 × 376 probabilities, corresponding to one slice of the 3D OCT. M3, therefore, requires 49 iterations to segment a whole 3D OCT image in our dataset. Model M3 has the most parameters of the models tested, with M1 having the fewest.
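The following is a minimal PyTorch sketch of a three-level 3D U-Net in the spirit of M1 (Fig. 1), with Conv3d + BatchNorm3d + ReLU blocks, max pooling, transposed-convolution upsampling and skip connections. The channel widths, layer counts and input size are illustrative assumptions and will not reproduce the reported parameter counts or the exact padding/stride choices used for the 160 × 188 × 49 input.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two Conv3d + BatchNorm3d + ReLU layers; padding keeps the spatial size."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )

class SmallUNet3D(nn.Module):
    """Three-level 3D U-Net sketch: two down/up steps around a bottleneck."""
    def __init__(self, in_ch=1, out_ch=1, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool3d(2)
        self.up2 = nn.ConvTranspose3d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose3d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv3d(base, out_ch, kernel_size=1)  # per-voxel logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)  # logits; pair with BCEWithLogitsLoss downstream

# Dummy volume with sides divisible by 4 so two 2x poolings divide evenly
# (the paper's 49-slice input instead relies on tuned padding and strides).
model = SmallUNet3D()
x = torch.randn(1, 1, 48, 96, 80)
print(model(x).shape)  # torch.Size([1, 1, 48, 96, 80]) - output matches input size
```

Keeping the output the same size as the input, as the sketch does, mirrors the design choice described above of maximising output resolution rather than following the original U-Net's smaller output window.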
The Jaccard index was used as the primary metric for measuring the performance of each method. This is one of the standard measures of the performance of image segmentation methods, especially in medical image segmentation [20]. The Dice similarity coefficient (DSC) is another commonly used metric and is closely related to the Jaccard index, with one being computable from the other. For completeness and ease of comparison with other results, we also provide the DSC for our proposed model in Section 4.2.

3 Implementation

Our experiments were all conducted using the Python programming language and the PyTorch [17] deep learning framework on NVIDIA Turing GPUs with 24 GB of memory. PyTorch is a state-of-the-art framework for building deep learning models which is highly optimized for modern GPU hardware. We trained each model for 500 epochs, where each epoch ran over 10 3D images, which was enough for all models to stop substantially improving. This means that the models which output a 3D segmentation (M1 and M2) had 10 iterations per epoch, and the slice-based model (M3) had 490 iterations per epoch. As source code was not released for DeepMind's model, M3 was implemented as closely as possible to the description provided in the original paper and slightly adapted to fit the binary classification problem.

In order to evaluate models M1 and M2, we scaled up the output probability map to its original size using trilinear interpolation and thresholded it at 0.5 to generate a binary segmentation. For model M3, we individually ran over all 49 slices of an image and recombined the 49 2D probability maps into a single 3D probability map. We then thresholded this combined map at 0.5 to generate a 3D binary segmentation.

The Adam optimization algorithm [10] was used to optimize the parameters of the models, with hyperparameters found by experimentation. The BCEWithLogitsLoss function in PyTorch was used for loss calculation, which combines a sigmoid activation and binary cross-entropy loss into one function. A similar number of experiments were conducted for each model. For model M1, a learning rate of 1e-4 and weight decay of 1e-6 were used. For model M2, a learning rate of 1e-4 and weight decay of 1e-5 were used. For model M3, a learning rate of 7.5e-5 was used and weight decay was disabled. The 3D OCT images were normalized to the [0, 1] range prior to scaling or slicing. Each model was trained and evaluated separately three times to assess the consistency of our results. We then calculated the Jaccard index, comparing each of the models' predictions with the ground truth. Because we only had a small number of images with multiple authors, we decided to keep the training, validation and unseen test sets static for all tests rather than using k-fold cross-validation. We reserved all images which had three annotations for the unseen test set, in order to be able to compare our results with expert agreement, which was a key goal of the research.
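As an illustration of the evaluation procedure for M1 and M2 described above (trilinear upscaling of the probability map, thresholding at 0.5, and comparison against the ground truth by Jaccard index), the following is a minimal sketch; the names, shapes and axis ordering are ours, not taken from the released code.

```python
import torch
import torch.nn.functional as F

def evaluate_prediction(logits, ground_truth):
    """Upscale a low-resolution probability map to the ground-truth resolution,
    binarize at 0.5 and return the Jaccard index (intersection over union)."""
    probs = torch.sigmoid(logits)                            # logits -> probabilities
    probs = F.interpolate(probs, size=ground_truth.shape[-3:],
                          mode="trilinear", align_corners=False)
    pred = (probs > 0.5).float()                             # binary segmentation
    gt = (ground_truth > 0.5).float()
    intersection = (pred * gt).sum()
    union = pred.sum() + gt.sum() - intersection
    return (intersection / union).item()

# Dummy tensors: model output at the downscaled resolution, ground truth at the
# full 321 x 376 x 49 resolution (depth, height, width ordering assumed here).
logits = torch.randn(1, 1, 49, 188, 160)
ground_truth = torch.randint(0, 2, (1, 1, 49, 376, 321)).float()
print(evaluate_prediction(logits, ground_truth))
```

The DSC reported later for the proposed model is related to this Jaccard index J by DSC = 2J / (1 + J), which is why one is computable from the other.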
4 Results

In this section, we evaluate our models both qualitatively and quantitatively. For qualitative results, we primarily present results in 2D for ease of comparison with other methods. We also present a sample of segmented macular hole volumes in 3D to demonstrate that our method captures the 3D shape of the volume. For quantitative results, we present an image-by-image comparison of each model's performance using the Jaccard index against the state-of-the-art method. We then present a variety of other image segmentation metrics on the proposed model for ease of comparison with other methods.

Fig. 2: Qualitative output on the unseen test set of our trained macular hole model (M1) compared with the ground truth, the state-of-the-art automated approach (Nasrulloh) [16], the residual model (M2) and DeepMind's model (M3). For clarity, we zoomed in on the predicted regions. (Columns: OCT scan, ground truth, Nasrulloh, M1 (Proposal), M2 (Residual), M3 (DeepMind).)

4.1 Qualitative Results

The qualitative results of running inference on the trained macular hole models are generally quite close to the ground truth, as seen in Fig. 2. In general, predictions from all of the models are closer to the ground truth than the state-of-the-art automated approach. We can see that the qualitative difference between the models tested is not hugely significant. This is surprising, as M3 has significantly more capacity than M1. This shows that adding more capacity to a model of a particular architecture does not necessarily yield an improvement in qualitative output.

3D visualizations of the output of our proposed model can be seen in Table 1. We can see how the 3D shape of the macular holes is preserved, and matches figures from similar works [16]. This type of view would allow the clinician to view the macular hole from every angle, rather than the 2D views which are currently widely used.

Table 1: 3D and 2D segmentation output of model M1 (Proposal) on the unseen test set along with ground truth. (Columns: 3D segmentation, 2D segmentation, 2D ground truth.)

4.2 Quantitative Results

Fig. 3 shows how the average Jaccard index on the unseen test set improved as M1 was trained, and we can see that after 200 epochs it had surpassed the performance of the state-of-the-art automated approach and expert agreement.

Fig. 3: Average Jaccard index of our proposed model (M1) over 3 runs on the unseen test set as the model was trained, plotted against training epoch and compared with the state-of-the-art automated approach (Nasrulloh) and pairwise expert agreement (Expert 1 vs Expert 2, Expert 1 vs Expert 3, Expert 2 vs Expert 3). We see that the model achieves significantly better results than the state-of-the-art automated approach and expert agreement. M1 exceeded expert agreement by a Jaccard index of 0.13-0.20.

All of the trained macular hole models perform very well compared to the state-of-the-art automated approach [16], as we see in Table 2. Despite model M1 having by far the fewest parameters, it achieves performance which is similar to the highest-capacity model, and in some cases surpasses it. Further results in Table 3 show that M1 performs consistently well under other standard segmentation quality measures.

5 Discussion

The results show that previous automated approaches to this problem cannot compete with deep learning methods. All of the models tested performed significantly better than the level set method.

If we examine the model results in isolation, we can see that the results can be divided into two categories: the high-capacity 3D U-Net model (M3) and the lower-capacity 3D U-Nets (M1 and M2). The low-capacity 3D U-Nets achieve the best results on the unseen test set. The high-capacity model, which has many times the number of parameters of the M1 model, does not have better generalizability. This is even more surprising given that the high-capacity model takes the full-resolution image as input and also outputs a full-resolution segmentation.
Given that the low-capacity 3D U-Nets use a downsized 3D image as input and output, we would expect them to perform worse due to not having the same amount of information available. The fact that this does not occur implies that the chosen models do not need very high-resolution input and output to make an accurate segmentation of macular holes in OCT images.

It is a counterintuitive finding that we do not see an improvement in performance for a model which takes a full-resolution image and which has a significantly higher capacity. In a similar problem, a high-profile study used this high-capacity model for their segmentation [4]. Since that work did not present results from different architectures as we have done, it is difficult to know whether our results would be replicated there. Our work clearly shows that for some biomedical segmentation problems, it is important to consider lower-capacity models in addition to higher-capacity models.

Table 2: Jaccard index comparison between the state-of-the-art automated approach (Nasrulloh) and tested models on the unseen test set (mean and standard deviation over three runs, except for the state-of-the-art approach, which is deterministic).

Image   | Nasrulloh | M1 (Proposal)  | M2 (Residual)  | M3 (DeepMind)
--------|-----------|----------------|----------------|---------------
Image 1 | 0.714     | 0.865 ± 0.009  | 0.868 ± 0.002  | 0.832 ± 0.006
Image 2 | 0.743     | 0.891 ± 0.02   | 0.887 ± 0.014  | 0.893 ± 0.012
Image 3 | 0.772     | 0.887 ± 0.004  | 0.885 ± 0.002  | 0.872 ± 0.006
Image 4 | 0.811     | 0.895 ± 0.012  | 0.884 ± 0.001  | 0.894 ± 0.006
Image 5 | 0.787     | 0.894 ± 0.005  | 0.901 ± 0.003  | 0.875 ± 0.014
Image 6 | 0.678     | 0.804 ± 0.008  | 0.815 ± 0.007  | 0.765 ± 0.006
Image 7 | 0.845     | 0.907 ± 0.002  | 0.905 ± 0.004  | 0.893 ± 0.009
Image 8 | 0.874     | 0.874 ± 0.012  | 0.862 ± 0.002  | 0.893 ± 0.006
Image 9 | 0.787     | 0.869 ± 0.019  | 0.853 ± 0.008  | 0.835 ± 0.007
Mean    | 0.779     | 0.876 ± 0.012  | 0.874 ± 0.006  | 0.861 ± 0.008

Table 3: Other metrics for model M1 (Proposal) on the unseen test set (mean and standard deviation over three runs; DSC refers to the Dice similarity coefficient, AVD to absolute volume difference and AP to average precision).

Image   | Precision      | Recall         | DSC            | AVD              | AP
--------|----------------|----------------|----------------|------------------|---------------
Image 1 | 0.93 ± 0.009   | 0.926 ± 0.012  | 0.928 ± 0.005  | 1352 ± 908.357   | 0.862 ± 0.01
Image 2 | 0.954 ± 0.008  | 0.931 ± 0.014  | 0.942 ± 0.011  | 1379 ± 369.396   | 0.889 ± 0.021
Image 3 | 0.949 ± 0.003  | 0.931 ± 0.003  | 0.94 ± 0.002   | 2308 ± 517.533   | 0.885 ± 0.004
Image 4 | 0.974 ± 0.005  | 0.917 ± 0.016  | 0.945 ± 0.007  | 5293 ± 1817.849  | 0.895 ± 0.011
Image 5 | 0.915 ± 0.012  | 0.974 ± 0.007  | 0.944 ± 0.003  | 2320 ± 763.08    | 0.892 ± 0.005
Image 6 | 0.848 ± 0.003  | 0.94 ± 0.007   | 0.891 ± 0.005  | 1911 ± 80.168    | 0.797 ± 0.009
Image 7 | 0.965 ± 0.002  | 0.938 ± 0.001  | 0.951 ± 0.001  | 1564 ± 157.11    | 0.906 ± 0.002
Image 8 | 0.898 ± 0.006  | 0.971 ± 0.008  | 0.933 ± 0.007  | 4544 ± 155.656   | 0.872 ± 0.013
Image 9 | 0.917 ± 0.016  | 0.943 ± 0.012  | 0.93 ± 0.011   | 1186 ± 906.832   | 0.865 ± 0.02

Our work concentrates on OCT images from a particular type of device, from a single manufacturer. For future work, other models of OCT device should be tested and compared with our results. It has been shown that models trained on one device can be relatively easily retrained to work with other devices [4]. Our data is from a particular population centre, namely North East England. For future work, it would be interesting to see if our results are replicated in other population centres, both nationally and internationally. As we have made our code available as an open-source project, it is hoped that this can be achieved.
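For reference, the per-image metrics reported in Table 3 can be computed from binary prediction and ground-truth volumes along the following lines. This is a sketch of the standard definitions under our reading of the table headers, not the authors' evaluation code; average precision is omitted because it additionally requires the continuous probability map, and the voxel unit for AVD is an assumption.

```python
import torch

def segmentation_metrics(pred, gt):
    """Standard metrics for binary volumes (pred and gt contain 0s and 1s)."""
    pred, gt = pred.float(), gt.float()
    tp = (pred * gt).sum()             # true positive voxels
    fp = (pred * (1 - gt)).sum()       # false positives
    fn = ((1 - pred) * gt).sum()       # false negatives
    dsc = 2 * tp / (2 * tp + fp + fn)  # Dice similarity coefficient
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    avd = (pred.sum() - gt.sum()).abs()  # absolute volume difference (in voxels, assumed)
    return {"precision": precision.item(), "recall": recall.item(),
            "dsc": dsc.item(), "avd": avd.item()}

# Dummy volumes at the dataset's full 321 x 376 x 49 resolution.
pred = torch.randint(0, 2, (49, 376, 321))
gt = torch.randint(0, 2, (49, 376, 321))
print(segmentation_metrics(pred, gt))
```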
6 Conclusions

All of the models tested exceeded the performance of the state-of-the-art automated approach, which is a level set method. It is clear that deep learning methods allow for the generation of segmentations that are closer to what humans provide. Despite M3 having 90 times the parameters of M1, M1 gives excellent qualitative and quantitative results which are of a similar quality to M3's. M1's performance exceeded expert agreement by a Jaccard index of 0.13-0.20. As M1 is the smallest model, it requires the least amount of resources to run. M1 is also a quick model to run, requiring only one pass through the whole 3D image, whereas M3 requires one pass per slice. Once trained, M1 is capable of segmenting an OCT image in less than one second. In contrast, the state-of-the-art automated method requires minutes to run [16]. For these reasons, M1 is the best candidate to form the basis of future studies in a clinical setting.

These findings show that careful tuning and, in some cases, architectural simplification can, for some simple task distributions, be as effective as very deep residual designs. The code is provided as an open-source project in order for future researchers to replicate our results and build upon this research. Training and testing on different populations with different demographics will be crucial to determine that our trained models do not exhibit any bias. The lack of large-scale open datasets of OCT imagery from different population centres makes this a significant challenge that needs to be overcome.

7 Conflicts of Interest

In accordance with his ethical obligation as a researcher, Jonathan Frawley reports that he received funding for his PhD from Gliff.ai. Some of the work described was developed as part of his work as an employee at Gliff.ai. Gliff.ai also provided annotations created by non-clinicians. Data and annotations by the clinicians for this project were kindly provided by Maged Habib, Caspar Geenen and David H. Steel of the Sunderland Eye Infirmary, South Tyneside and Sunderland NHS Foundation Trust, UK. All images were collected as part of routine care and anonymised.

References

1. Ali, F.S., Stein, J.D., Blachley, T.S., Ackley, S., Stewart, J.M.: Incidence of and risk factors for developing idiopathic macular hole among a diverse group of patients throughout the United States. JAMA Ophthalmology 135(4), 299–305 (2017)
2. Chen, Y., Nasrulloh, A.V., Wilson, I., Geenen, C., Habib, M., Obara, B., Steel, D.H.: Macular hole morphology and measurement using an automated three-dimensional image segmentation algorithm. BMJ Open Ophthalmology 5(1), e000404 (2020)
3. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 424–432 (2016)
4. De Fauw, J., Ledsam, J.R., Romera-Paredes, B., Nikolov, S., Tomasev, N., Blackwell, S., et al.: Clinically applicable deep learning for diagnosis and referral in retinal disease. Nature Medicine 24(9), 1342 (2018)
5. Frawley, J., Willcocks, C.G., Habib, M., Geenen, C., Steel, D.H., Obara, B.: Segmentation of macular edema datasets with small residual 3D U-Net architectures. In: 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE). pp. 582–587 (2020). https://doi.org/10.1109/BIBE50027.2020.00100
6.
Ghavami, N., Hu, Y., Gibson, E., Bonmati, E., Emberton, M., Moore, C.M., Barratt, D.C.: Automatic segmentation of prostate MRI using convolutional neural networks: Investigating the impact of network architecture on the accuracy of volume measurement and MRI-ultrasound registration. Medical Image Analysis 58, 101558 (2019)
7. Goldberg, R.A., Waheed, N.K., Duker, J.S.: Optical coherence tomography in the preoperative and postoperative management of macular hole and epiretinal membrane. British Journal of Ophthalmology 98(Suppl 2), ii20–ii23 (2014)
8. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision. pp. 630–645 (2016)
9. Hee, M.R., Puliafito, C.A., Wong, C., Duker, J.S., Reichel, E., Schuman, J.S., Swanson, E.A., Fujimoto, J.G.: Optical coherence tomography of macular holes. Ophthalmology 102(5), 748–756 (1995)
10. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio, Y., LeCun, Y. (eds.) International Conference on Learning Representations (2015)
11. Lindsay, G.W.: Convolutional neural networks as a model of the visual system: Past, present, and future. Journal of Cognitive Neuroscience pp. 1–15 (Feb 2020). https://doi.org/10.1162/jocn_a_01544
12. Madi, H.A., Masri, I., Steel, D.H.: Optimal management of idiopathic macular holes. Clinical Ophthalmology (Auckland, NZ) 10, 97 (2016)
13. McCannel, C.A., Ensminger, J.L., Diehl, N.N., Hodge, D.N.: Population-based incidence of macular holes. Ophthalmology 116(7), 1366–1369 (2009)
14. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). pp. 565–571 (2016)
15. Murphy, D.C., Nasrulloh, A.V., Lendrem, C., Graziado, S., Alberti, M., la Cour, M., Obara, B., Steel, D.H.: Predicting postoperative vision for macular hole with automated image analysis. Ophthalmology Retina (2020). https://doi.org/10.1016/j.oret.2020.06.005, http://www.sciencedirect.com/science/article/pii/S2468653020302311
16. Nasrulloh, A., Willcocks, C., Jackson, P.T., Geenen, C., Habib, M.S., Steel, D.H., Obara, B.: Multi-scale segmentation and surface fitting for measuring 3D macular holes. IEEE Transactions on Medical Imaging 37(2), 580–589 (2018)
17. Paszke, A., Gross, S., Massa, F., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32, pp. 8024–8035 (2019)
18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention. vol. 9351, pp. 234–241 (2015)
19. Steel, D.H., Donachie, P.H., Aylward, G.W., Laidlaw, D.A., Williamson, T.H., Yorston, D.: Factors affecting anatomical and visual outcome after macular hole surgery: findings from a large prospective UK cohort. Eye pp. 1–10 (2020)
20. Taha, A.A., Hanbury, A.: Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Medical Imaging 15(1), 1–28 (2015)