Learning to segment from object sizes
Denis Baručić1 , Jan Kybic1
1 Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague


Abstract

Deep learning has proved particularly useful for semantic segmentation, a fundamental image analysis task. However, the standard deep learning methods need many training images with ground-truth pixel-wise annotations, which are usually laborious to obtain and, in some cases (e.g., medical images), require domain expertise. Therefore, instead of pixel-wise annotations, we focus on image annotations that are significantly easier to acquire but still informative, namely the size of foreground objects. We define the object size as the maximum Chebyshev distance between a foreground pixel and the nearest background pixel. We propose an algorithm for training a deep segmentation network from a dataset of a few pixel-wise annotated images and many images with known object sizes. The algorithm minimizes a discrete (non-differentiable) loss function defined over the object sizes by sampling the gradient and then using the standard back-propagation algorithm. Experiments show that the new approach improves the segmentation performance.

Keywords

semantic segmentation, weakly-supervised learning, deep learning, distance transform



1. Introduction

Semantic segmentation is the process of associating a class label with each pixel of an image. With the advent of deep learning, deep networks have achieved incredible performance on many image processing tasks, including semantic segmentation. Deep learning for semantic segmentation has many benefits; for example, it is flexible w.r.t. the model architecture and scales particularly well [1, 2]. On the other hand, standard deep learning demands many ground-truth (GT) pixel-wise annotations to prevent overfitting. Since a human expert annotator must usually provide the GT annotations, acquiring a good-quality training dataset can be difficult. To combat this issue, we focus on learning from GT image annotations that are easier to produce but still informative enough, namely the sizes of foreground objects. In practice, our approach assumes a training dataset that consists of relatively few pixel-wise annotated images and many images with known object sizes. We present a work-in-progress solution.

1.1. Proposed approach

Suppose a standard convolutional network for image segmentation (e.g., a U-Net [3]). Given an input image, we feed it to the network and collect the output prediction. The prediction is then thresholded to obtain a binary mask, which is processed by a distance transform, assigning to each foreground pixel the shortest distance to the background. Finally, the object size is defined as double the maximum of the computed distances.

Due to the thresholding, the cost function is not differentiable, and it is therefore not possible to use standard gradient descent for learning. We overcome this obstacle by adding random noise to the output of our network. The predicted binary masks then become stochastic, and the gradient can be sampled. A detailed description of our method is given in Sec. 2 and 3.

1.2. Related work

Cano-Espinosa et al. [4] considered a similar learning problem. They proposed a network architecture that performs biomarker (fat content) regression and image segmentation after being trained directly on images annotated by biomarker values only. Similarly to ours, their method derives the biomarker value from the predicted segmentation deterministically. The difference is that their biomarker, equivalent to the foreground area, can be obtained by a simple summation. Furthermore, the method assumes that the foreground objects can be roughly segmented using thresholding. Pérez-Pelegrí et al. [5] took a similar approach. Although their method does not involve thresholding to produce an approximate segmentation, it was tailored explicitly for learning from images annotated by the foreground volume (as their images are 3D).

Karam et al. [6] implemented a differentiable distance transform via a combination of convolution operations. The method is fast but exhibits numerical instabilities for bigger images. Resolving the numerical instabilities, Pham et al. [7] later proposed a cascaded procedure with locally restricted convolutional distance transforms. Nonetheless, both methods substitute the minimum function with the log-sum-exp operation, which leads to inaccurate results.

The way our method deals with a non-differentiable cost function is borrowed from stochastic binary networks [8]. In a stochastic binary network, one needs to deal with zero gradients after each layer of the network. However, methods such as ARM [9] or PSA [10] are unnecessarily complex. Instead, we employ a single sample estimation, which has been discussed in [11].

Figure 1: Illustrative example of an object and its derived size. The object is outlined by the thick boundary line. The point 𝑖 denotes the foreground pixel whose shortest distance to the background, 𝑑𝑖, is the largest among all pixels. The derived object size is ŝ = 2𝑑𝑖.

2. Model

The proposed model consists of (1) a segmentation network 𝑓𝜃, parametrized by 𝜃, and (2) a deterministic algorithm 𝑔 that derives the object size based on a distance transform.

Given an input image 𝑥 = (𝑥1, . . . , 𝑥𝑉), the network produces a pixel-wise segmentation

    𝑎 = 𝑓𝜃(𝑥),    (1)

such that 𝑎𝑖 ∈ ℝ, 1 ≤ 𝑖 ≤ 𝑉, where 𝑉 is the number of pixels. The method does not make any assumptions about the network's technical details, except that it can be trained using the standard back-propagation algorithm and gradient descent. In our experiments, we always employed a U-Net [3] with a residual network encoder [12] and a mirroring decoder.

To obtain a binary mask ŷ ∈ {±1}^𝑉, the network response 𝑎 is thresholded,

    ŷ𝑖 = sign 𝑎𝑖.    (2)

2.1. Object size

We use a distance transform of the binary mask to define the object size (see Fig. 1). The distance transform assigns to each pixel the shortest distance to the background, i.e.,

    𝑑𝑖 = min_{𝑗 : ŷ𝑗 = −1} 𝛿(𝑖, 𝑗),    𝑖 = 1, . . . , 𝑉,    (3)

where 𝛿(𝑖, 𝑗) is the Chebyshev (ℓ∞) distance. After that, we take double the maximum distance to define the object size,

    ŝ = 2 max_𝑖 𝑑𝑖.    (4)

The composition of the distance transform and the maximum aggregation determines the object size function, denoted 𝑔 : {±1}^𝑉 → ℝ,

    𝑔(ŷ) = 2 max_𝑖 min_{𝑗 : ŷ𝑗 = −1} 𝛿(𝑖, 𝑗).    (5)

2.1.1. Implementation details

There is an efficient, two-pass algorithm that computes the distance transform in Θ(𝑉) time. Furthermore, when evaluating a batch of images, it is possible to compute the distance transform on all images in parallel. We have implemented a CPU version of this algorithm (https://github.com/barucden/chdt) that works with PyTorch tensors and is faster than, e.g., the SciPy implementation.
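For illustration, the size derivation (5) can be reproduced with off-the-shelf tools. The following minimal sketch uses the SciPy chamfer distance transform (the variant our implementation is compared against) with the chessboard metric, which coincides with the Chebyshev distance in (3); the function name object_size is ours, not part of any library:

    import numpy as np
    from scipy.ndimage import distance_transform_cdt

    def object_size(mask: np.ndarray) -> int:
        """Derive the object size (5) from a binary mask with values in
        {+1, -1}: double the largest Chebyshev distance from a foreground
        pixel to the nearest background pixel, cf. (3)-(4)."""
        # distance_transform_cdt assigns each non-zero pixel its chamfer
        # distance to the nearest zero; 'chessboard' gives the l-inf metric
        d = distance_transform_cdt(mask > 0, metric="chessboard")
        return 2 * int(d.max())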
                                                               We focus on situations when |𝒟𝑓 | ≪ |𝒟𝑤 |.
3. Learning

Suppose a training dataset 𝒟 = 𝒟𝑓 ∪ 𝒟𝑤 consisting of a fully-annotated subset 𝒟𝑓 and a weakly-annotated subset 𝒟𝑤. The fully-annotated subset 𝒟𝑓 contains pairs (𝑥, 𝑦), where 𝑥 is an input image and 𝑦 the corresponding GT pixel-wise segmentation, while 𝒟𝑤 comprises pairs (𝑥, 𝑠), where 𝑠 is the size of the object present in the image 𝑥. We focus on situations where |𝒟𝑓| ≪ |𝒟𝑤|.

3.1. Supervised pre-training

Our method starts by optimizing a pixel-wise loss w.r.t. the network parameters 𝜃 on the small subset 𝒟𝑓, as in standard supervised learning. For a particular training pair (𝑥, 𝑦) ∈ 𝒟𝑓 and the corresponding prediction 𝑎 ∈ ℝ^𝑉, the loss function reads

    ∑_{𝑖=1}^{𝑉} (𝑎𝑖 (1 − 𝑦𝑖) + log(1 + exp(−𝑎𝑖))),    (6)

which is known as the binary cross-entropy with logits loss. The optimization continues until convergence. Using proper data augmentation to extend the training dataset, the network tends to recognize useful features and produces decent predictions after this initial stage (see Sec. 4.2).
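For concreteness, the objective (6) coincides with the binary cross-entropy with logits available in PyTorch, assuming the GT mask 𝑦 is stored with values in {0, 1} (a mask stored as {±1} would first be mapped by 𝑦 → (𝑦 + 1)/2). A minimal sketch, with our own function name:

    import torch
    import torch.nn.functional as F

    def pretraining_loss(a: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        """Pixel-wise loss (6): sum_i a_i (1 - y_i) + log(1 + exp(-a_i)),
        for targets y_i in {0, 1}; identical to PyTorch's binary
        cross-entropy with logits under sum reduction."""
        return F.binary_cross_entropy_with_logits(a, y.float(), reduction="sum")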
Figure 2: An overview of the proposed probabilistic model. The input image 𝑥 is passed through the segmentation network 𝑓𝜃; noise 𝑍 is subtracted and the result is thresholded, yielding the stochastic mask 𝑌. The size derivation 𝑔 (a distance transform followed by a maximum) produces 𝑔(𝑌), which is compared with the GT size 𝑠 by the loss 𝑙(𝑠, 𝑔(𝑌)).



3.2. Weakly-supervised training

Consider a training pair (𝑥, 𝑠) ∈ 𝒟𝑤. As described in Sec. 2, one can obtain a prediction of the object size, ŝ = 𝑔(ŷ), from the thresholded network response ŷ. We penalize the prediction error by the square loss

    𝑙(𝑠, ŝ) = (𝑠 − ŝ)².    (7)

We propose to follow an approach similar to those used in binary neural networks [10] and subtract random noise 𝑍 from the real-valued predictions 𝑎𝑖 before thresholding. Consequently, the binary segmentation becomes a collection 𝑌 = (𝑌1, . . . , 𝑌𝑉) of 𝑉 independent Bernoulli variables,

    𝑌𝑖 = sign(𝑎𝑖 − 𝑍),    (8)

with

    Pr(𝑌𝑖 = +1 | 𝑥; 𝜃) = Pr(𝑍 ≤ 𝑎𝑖) = 𝐹𝑍(𝑎𝑖),    (9)

where 𝐹𝑍 is the cumulative distribution function (CDF) of the noise 𝑍 (see Fig. 2).

Then, instead of minimizing the loss 𝑙 (7), we minimize the expected loss ℒ = E𝑌[𝑙(𝑠, 𝑔(𝑌))],

    ℒ = ∑_{𝑦 ∈ {±1}^𝑉} Pr(𝑌 = 𝑦 | 𝑥; 𝜃) 𝑙(𝑠, 𝑔(𝑦)).    (10)

Contrary to (7), the expected loss (10) is differentiable, assuming a smooth 𝐹𝑍.
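A sketch of the noise injection (8)-(9): a logistic sample can be drawn as logit(𝑈) for 𝑈 ~ Uniform(0, 1), so the thresholded response becomes a pixel-wise Bernoulli mask with success probability 𝐹𝑍(𝑎𝑖) = sigmoid(𝑎𝑖). The helper below is illustrative, not part of the described implementation:

    import torch

    def sample_binary_mask(a: torch.Tensor) -> torch.Tensor:
        """Sample Y_i = sign(a_i - Z) with Z ~ Logistic(0, 1), cf. (8).
        Equivalently, Y_i = +1 with probability sigmoid(a_i), cf. (9)."""
        u = torch.rand_like(a)
        z = torch.log(u) - torch.log1p(-u)  # logistic(0, 1) noise
        return torch.sign(a - z)            # {+1, -1}; ties have probability 0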
3.2.1. Noise distribution

Following [10], we sample the noise 𝑍 from the logistic distribution with mean 𝜇 = 0 and scale 1. Hence, the CDF of 𝑍 is the smooth sigmoid function

    𝐹𝑍(𝑎) = 1 / (1 + exp(−𝑎)).    (11)

3.2.2. Exact gradient

To compute the gradient ∇𝜃 ℒ, we need to evaluate the derivative

    ∂ E𝑌[𝑙(𝑠, 𝑔(𝑌))] / ∂𝐹𝑍(𝑎𝑖)    (12)

for each pixel 𝑖 = 1, . . . , 𝑉. The gradient can then be computed automatically by the back-propagation algorithm. However, an exact computation of (12) leads to

    ∑_{𝑦 ∈ {±1}^𝑉} (Pr(𝑌 = 𝑦 | 𝑥; 𝜃) / Pr(𝑌𝑖 = 𝑦𝑖 | 𝑥; 𝜃)) 𝑙(𝑠, 𝑔(𝑦)) 𝑦𝑖,    (13)

which involves summing 2^𝑉 terms and is thus tractable only for very small images. Instead, we resort to a single sample estimator.

3.2.3. Single sample estimator

The single sample estimator is based on Lemma 1, which is, in fact, a specific form of [10, Lemma B.1].

Lemma 1. Let 𝑌 = (𝑌1, . . . , 𝑌𝑉) be a collection of 𝑉 independent {±1}-valued Bernoulli variables with probabilities Pr(𝑌𝑖 = +1) = 𝑝𝑖. Let ℎ be a function ℎ : {±1}^𝑉 → ℝ. Let 𝑦 = (𝑦1, . . . , 𝑦𝑉) denote a random sample of 𝑌 and 𝑦↓𝑖 = (𝑦1, . . . , 𝑦𝑖−1, −𝑦𝑖, 𝑦𝑖+1, . . . , 𝑦𝑉). Then

    𝑦𝑖 (ℎ(𝑦) − ℎ(𝑦↓𝑖))    (14)

is an unbiased estimate of ∂/∂𝑝𝑖 E_{𝑦∼𝑌}[ℎ(𝑦)].

Proof. We take the derivative of the expectation,

    ∂/∂𝑝𝑖 E_{𝑦∼𝑌}[ℎ(𝑦)] = ∑_𝑦 (Pr(𝑦) / Pr(𝑦𝑖)) ℎ(𝑦) 𝑦𝑖,    (15)

and write out the sum over 𝑦𝑖,

    ∑_{𝑦¬𝑖} ∑_{𝑦𝑖} Pr(𝑦¬𝑖) ℎ(𝑦) 𝑦𝑖 = ∑_{𝑦¬𝑖} Pr(𝑦¬𝑖) ∑_{𝑦𝑖} ℎ(𝑦) 𝑦𝑖,    (16)

where 𝑦¬𝑖 denotes the vector 𝑦 with the 𝑖-th component omitted. Notice that the inner sum simplifies and no longer depends on 𝑦𝑖,

    ∑_{𝑦¬𝑖} Pr(𝑦¬𝑖) (ℎ(𝑦_{𝑖=+1}) − ℎ(𝑦_{𝑖=−1})),    (17)

where 𝑦_{𝑖=𝑧} is the vector 𝑦 with the 𝑖-th component set to 𝑧. Then, we multiply the inner difference by the constant factor 1 = 𝑝𝑖 + (1 − 𝑝𝑖) = ∑_{𝑦𝑖} Pr(𝑦𝑖),

    ∑_{𝑦¬𝑖} Pr(𝑦¬𝑖) ∑_{𝑦𝑖} Pr(𝑦𝑖) (ℎ(𝑦_{𝑖=+1}) − ℎ(𝑦_{𝑖=−1})),    (18)

ultimately leading to the following expression for (15):

    ∑_𝑦 Pr(𝑦) (ℎ(𝑦_{𝑖=+1}) − ℎ(𝑦_{𝑖=−1})),    (19)

which can be written as

    ∑_𝑦 Pr(𝑦) 𝑦𝑖 [ℎ(𝑦) − ℎ(𝑦↓𝑖)].    (20)

Thus, (14) is a single sample unbiased estimate of (15). □

According to Lemma 1, an unbiased estimate of the derivative (12) is

    ∂ E𝑌[𝑙(𝑠, 𝑔(𝑌))] / ∂𝐹𝑍(𝑎𝑖) ≈ 𝑦𝑖 [𝑙(𝑠, 𝑔(𝑦)) − 𝑙(𝑠, 𝑔(𝑦↓𝑖))],    (21)

where 𝑦 is a random sample of Bernoulli variables with probabilities (9) (see a few examples of sampled derivatives in Fig. 3).
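To make the estimator concrete, the following sketch performs one weakly-supervised update: it samples a mask, forms the per-pixel estimates (21), and injects them into back-propagation through a surrogate product with 𝐹𝑍(𝑎), one common way of feeding a sampled gradient to autograd. The helper weak_update and its naive O(𝑉) loop over flips are ours and purely illustrative; object_size stands for a tensor-compatible version of the size derivation 𝑔 from Sec. 2.1:

    import torch

    def weak_update(a: torch.Tensor, s: float, object_size, optimizer):
        """One update minimizing (10) with the single-sample estimator (21).
        `a` holds the network logits (requires grad), `s` the GT object
        size, `object_size` computes g(.) on a {+1, -1} mask."""
        p = torch.sigmoid(a)                       # F_Z(a_i), cf. (9), (11)
        y = torch.where(torch.rand_like(p) < p,
                        torch.ones_like(p), -torch.ones_like(p))
        base = (s - object_size(y)) ** 2           # l(s, g(y)), cf. (7)
        grad = torch.zeros_like(p)
        y_flat, g_flat = y.view(-1), grad.view(-1)
        for i in range(y_flat.numel()):            # estimate (21) per pixel
            y_flip = y_flat.clone()
            y_flip[i] = -y_flip[i]
            flipped = (s - object_size(y_flip.view_as(y))) ** 2
            g_flat[i] = y_flat[i] * (base - flipped)
        # surrogate whose gradient w.r.t. p equals the sampled estimate;
        # back-propagation carries it through the sigmoid to the network
        optimizer.zero_grad()
        (grad.detach() * p).sum().backward()
        optimizer.step()

As discussed in Sec. 5, most terms of the loop vanish in practice, which is where the main speed-up would come from.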

Figure 3: Examples of derivatives (12) computed according to (21) for different numbers of samples 𝑛, given the output of 𝐹𝑍, for a small, 6 × 6 image. The red frame outlines the object.

Figure 4: Example of a hippocampus image [13] with the object outlined in red.

4. Experiments

The proposed method was implemented in the PyTorch Lightning framework (https://github.com/Lightning-AI/lightning) using a ResNet implementation from the Segmentation Models PyTorch library (https://github.com/qubvel/segmentation_models.pytorch). The presented experiments were performed on a server equipped with an Intel Xeon Silver 4214R (2.40 GHz) and an NVIDIA GeForce RTX 2080 Ti.

The data for our experiments was based on a dataset of 3D MRI images of the hippocampus [13]. The dataset consists of 394 volumes provided with GT segmentations of the classes hippocampus head, hippocampus body, and background. We decomposed the volumes into individual 2D slices of size 48 × 32 pixels and kept only those with at least 1% foreground, obtaining a total of 6093 images. Next, we merged the hippocampus classes to get a binary segmentation problem (see Fig. 4). Afterward, we derived the object sizes from the GT pixel-wise annotations to use in training. Finally, we randomly split the data into training, validation, and testing subsets containing 70%, 10%, and 20% of the images, respectively.

Given a GT segmentation 𝑦 and a predicted segmentation ŷ, we evaluate two metrics, the squared size prediction error 𝐸 and the intersection-over-union 𝐼𝑜𝑈:

    𝐸(𝑦, ŷ) = 𝑙(𝑔(𝑦), 𝑔(ŷ)),    (22)

    𝐼𝑜𝑈(𝑦, ŷ) = ∑_{𝑖=1}^{𝑉} (1 + 𝑦𝑖 + ŷ𝑖 + 𝑦𝑖 ŷ𝑖) / ∑_{𝑖=1}^{𝑉} (3 + 𝑦𝑖 + ŷ𝑖 − 𝑦𝑖 ŷ𝑖).    (23)

In the case of the standard supervised method, vertical and horizontal flipping was randomly applied to augment the training dataset. The proposed method did not apply any augmentation.
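For reference, the 𝐼𝑜𝑈 (23) follows directly from the {±1} mask convention: per pixel, (1 + 𝑦𝑖 + ŷ𝑖 + 𝑦𝑖 ŷ𝑖)/4 is 1 iff both masks are foreground, and (3 + 𝑦𝑖 + ŷ𝑖 − 𝑦𝑖 ŷ𝑖)/4 is 1 iff at least one is. A minimal sketch (the function name is ours):

    import torch

    def iou(y: torch.Tensor, y_hat: torch.Tensor) -> torch.Tensor:
        """Intersection-over-union (23) for masks with values in {+1, -1}.
        The common factor of 4 cancels between numerator and denominator."""
        inter = (1 + y + y_hat + y * y_hat).sum()   # 4 x |intersection|
        union = (3 + y + y_hat - y * y_hat).sum()   # 4 x |union|
        return inter / union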
4.1. Number of derivative samples

A toy example (see Fig. 3) indicated that taking more samples of the derivatives (21) might lead to better results than taking just one. This experiment investigates how the number of derivative samples 𝑛 impacts learning speed and prediction quality.

We considered four different numbers of samples, 𝑛 ∈ {1, 2, 4, 8}. For each 𝑛, the other parameters (such as the batch size or the learning rate) were the same, and the learning began with the same segmentation network 𝑓𝜃 that was pre-trained in the standard way on 85 pixel-wise annotated images from the training subset. The proposed method always ran until the squared error 𝐸 on the validation data stopped improving.

To assess the learning speed, we measured the duration of one learning epoch. For 𝑛 = 1, an epoch took ≈ 10× longer than with the standard supervised learning. Generally, the duration grew roughly exponentially with 𝑛 (see Fig. 5).

Figure 5: Average epoch duration for the proposed method with different numbers of gradient samples 𝑛. The duration of the standard method is given as a reference.

Higher values of 𝑛 did not lead to a lower 𝐸 or faster convergence (see Fig. 6). In fact, 𝑛 = 1 and 𝑛 = 2 achieved the lowest 𝐸, but not by a large margin. Given the speed benefits, we always use 𝑛 = 1. Interestingly, even though 𝐸 kept decreasing over the course of learning for all 𝑛, 𝐼𝑜𝑈 improved only slightly and started declining after ≈ 20 epochs. This observation suggests that the squared error of the object size is not a sufficient objective for learning the segmentation.

Figure 6: Development of the squared size prediction error 𝐸 and the intersection-over-union 𝐼𝑜𝑈 on the validation images over the course of learning for different numbers of derivative samples 𝑛.


4.2. Pre-training impact

This experiment tests the essential question: given a segmentation model trained on a few pixel-wise annotated images, can we improve its test performance by further learning from size annotations?

We trained different segmentation networks until convergence on randomly selected training subsets of size 𝑚. Then, we fine-tuned these networks on the whole training dataset using the proposed method. We measured the test performance in terms of 𝐼𝑜𝑈.

The proposed method led to a ≈ 5% increase of 𝐼𝑜𝑈 for small 𝑚 < 100 (see Fig. 7), improving the segmentation quality. For higher 𝑚, the effect was negligible, which complements the observation from the previous experiment that improving the size estimate does not necessarily improve the segmentation quality.

Figure 7: 𝐼𝑜𝑈 on the test data for different sizes 𝑚 of the pre-training dataset. The plot shows results achieved by a network after pre-training and after subsequent fine-tuning by the proposed method.
5. Discussion

The method is promising, but there is definitely potential for improvement in both speed and prediction performance.

The proposed method samples the derivatives according to (21) for each pixel 𝑖. However, flipping the prediction, 𝑦𝑖 ↦ −𝑦𝑖, changes the derived size only for some 𝑖, particularly those within and on the border of the predicted object. Therefore, given a sample 𝑦, 𝑙(𝑠, 𝑔(𝑦)) = 𝑙(𝑠, 𝑔(𝑦↓𝑖)) for many pixels 𝑖, and the sampled derivatives (21) are sparse. The method might sample only those derivatives that are potentially non-zero and set the rest to zero directly, which would save much computational time.

We have seen in the experiments that a lower size prediction error does not strictly imply a better segmentation. We need to investigate closely in which cases the size prediction loss is insufficient and adjust the objective. The adjustment might involve adding an L1 regularization (as in [4]) or drawing inspiration from unsupervised methods (e.g., demanding that the segmentation respect edges in the image).

The proposed approach entails some principled limitations. For example, it allows only a single object in an image. We also expect the method to be ill-suited for complex object shapes, but we have not performed any experiments in that regard yet.
                                                                 ume calculation with explainability through a deep
                                                                 learning weak-supervision methodology, Com-
6. Conclusion

We proposed a weakly-supervised method for training a segmentation network from a few pixel-wise annotated images and many images annotated by the object size. The key ingredients are a method for evaluating the object size from a probabilistic segmentation and a method for optimizing a deep network using a non-differentiable objective.

The achieved results seem promising. We believe the improvements suggested in the discussion will further improve performance, rendering the method valuable for training segmentation models for biomedical images.

Acknowledgments

The authors acknowledge the support of the OP VVV funded project "CZ.02.1.01/0.0/0.0/16_019/0000765 Research Center for Informatics", the Czech Science Foundation project 20-08452S, and the Grant Agency of the CTU in Prague, grant No. SGS20/170/OHK3/3T/13.
                                                                 ing Systems 33 (2020) 12884–12894.
References

[1] X. Liu, et al., A review of deep-learning-based medical image segmentation methods, Sustainability 13 (2021) 1224.
[2] S. Minaee, et al., Image segmentation using deep learning: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (2021).
[3] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.
[4] C. Cano-Espinosa, et al., Biomarker localization from deep learning regression networks, IEEE Transactions on Medical Imaging 39 (2020) 2121–2132.
[5] M. Pérez-Pelegrí, et al., Automatic left ventricle volume calculation with explainability through a deep learning weak-supervision methodology, Computer Methods and Programs in Biomedicine 208 (2021) 106275.
[6] C. Karam, K. Sugimoto, K. Hirakawa, Fast convolutional distance transform, IEEE Signal Processing Letters 26 (2019) 853–857.
[7] D. D. Pham, G. Dovletov, J. Pauli, A differentiable convolutional distance transform layer for improved image segmentation, in: DAGM German Conference on Pattern Recognition, Springer, 2020, pp. 432–444.
[8] T. Raiko, M. Berglund, G. Alain, L. Dinh, Techniques for learning binary stochastic feedforward neural networks, in: 3rd International Conference on Learning Representations, 2015.
[9] M. Yin, M. Zhou, ARM: Augment-REINFORCE-merge gradient for stochastic binary networks, in: 7th International Conference on Learning Representations, 2019.
[10] A. Shekhovtsov, V. Yanush, B. Flach, Path sample-analytic gradient estimators for stochastic binary networks, Advances in Neural Information Processing Systems 33 (2020) 12884–12894.
[11] Y. Cong, M. Zhao, K. Bai, L. Carin, GO gradient for expectation-based objectives, in: 7th International Conference on Learning Representations, 2019.
[12] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[13] A. L. Simpson, et al., A large annotated medical image dataset for the development and evaluation of segmentation algorithms, 2019. arXiv:1902.09063.