BioSegment: Active Learning segmentation for 3D electron microscopy imaging

Benjamin Rombaut1,2[0000-0002-4022-715X], Joris Roels2,3[0000-0002-2058-8134], and Yvan Saeys1,2[0000-0002-0415-1506]

1 Department of Applied Mathematics, Computer Science and Statistics, Faculty of Science, Ghent University, Ghent, Belgium
{benjamin.rombaut, yvan.saeys}@ugent.be
2 Data Mining and Modelling for Biomedicine, VIB-UGent Center for Inflammation Research, Ghent, Belgium
3 VIB Bioimaging Core, VIB-UGent Center for Inflammation Research, Ghent, Belgium

Abstract. Large 3D electron microscopy images require labor-intensive segmentation for further quantitative analysis. Recent deep learning segmentation methods automate this computer vision task, but require large amounts of labeled training data. We present BioSegment, a turnkey platform for experts to automatically process their imaging data and fine-tune segmentation models. It provides a user-friendly annotation experience, integration with familiar microscopy annotation software and a job queue for remote GPU acceleration. Various active learning sampling strategies are incorporated, with maximum entropy selection being the default. For mitochondrial segmentation, these strategies can improve segmentation quality by 10 to 15% in terms of intersection-over-union score compared to random sampling. Additionally, a segmentation of similar quality can be achieved using 25% of the total annotation budget required for random sampling. By comparing state-of-the-art human-in-the-loop annotation frameworks, we show that BioSegment is currently the only framework capable of employing deep learning and active learning for 3D electron microscopy data.

Keywords: Active learning · Electron microscopy · Computer vision.

© 2022 for this paper by its authors. Use permitted under CC BY 4.0.

1 Introduction

Volume electron microscopy (vEM or 3D EM) describes a set of high-resolution imaging techniques used in biomedical research to reveal the 3D structure of cells, tissues and small model organisms at nanometer resolution. These vEM techniques have emerged over the past 20 years, largely in response to the demands of the connectomics field in neuroscience, and vEM is expected to be adopted into mainstream biological imaging [23]. Generally, vEM data processing can be divided into four consecutive steps: preprocessing, segmentation, post-processing and downstream analysis.

An imaging experiment often includes metadata about the multiple samples, which need to be compared against each other in the downstream analysis. This is documented using a folder structure or a data table. Preprocessing steps to improve the imaging data include denoising [29,38], histogram equalization [39] and artifact removal. For the imaging data to be used by deep learning networks, additional preprocessing transformations include normalization and data augmentation. Usually, the imaging data is downsampled or binned in order to reduce data size and to speed up expert and model annotation, while still retaining enough resolution to allow correct segmentation.

Next is segmentation, the detection and delineation of structures of interest. Segmentation is required for the extraction of quantitative information from rich vEM data sets.
Non-discriminative contrast, diversity in the appearance of structures and large image volumes turn vEM segmentation into a highly non-trivial problem, where cutting-edge methods relying on state-of-the-art computer vision techniques are still far from reaching human parity in segmentation accuracy [23]. Here, we only consider segmentation of mitochondria, but other cellular components or tissue regions can also be of interest. Pretrained models can be applied to a small sample in order to evaluate segmentation quality. If no model of sufficient quality is available, a new model is created using some training data annotated by an expert (microscopist or biologist). Machine learning methods can be trained to produce different flavors of segmentation, labelling the pixels either by semantics (for example, label all mitochondria pixels as 1 and the rest as 0) or by the objects they belong to (for example, label all pixels of the first mitochondrion as 1, of the second mitochondrion as 2, of the nth mitochondrion as n, with non-mitochondrion pixels as 0). There are various post-processing steps to transform a semantic segmentation into an object instance segmentation, such as connected components and the watershed transform. To further clean up the segmentation, there is usually some filtering based on instance size. After processing all samples of the experiment, a research question is answered in a downstream analysis. Statistics of interest are calculated, such as the number of mitochondria and mitochondrial surface and volume. These statistics are summarized in a data table and combined with the experiment metadata to quantify effects.

Although significant progress has been made in recent years, largely owing to the introduction of deep learning-based methods, there is not yet a single reliable and easy-to-use solution for fully automated segmentation of vEM images. Imaging experts must choose between (or combine) manual, semi-automated and fully automated solutions based on the difficulty of the segmentation problem, the data size and the computational expertise and resources of their team or institution. Furthermore, almost all automated solutions rely on machine learning and may require large amounts of example segmentations to train a model, although in some cases models trained for the same task on similar data sets are available and can be applied directly [23].

Machine learning-based segmentation models can be divided into two categories: feature-based learning and deep learning. Feature-based learning methods use a set of predefined features (usually linear and non-linear image filters) as input to a non-linear classifier, such as a support vector machine or a random forest, that outputs the (semantic) segmentation. They need few examples and are available via user-friendly tools. Methods using deep learning do not rely on pre-computed features but, instead, learn features and segmentation jointly. They can solve more difficult segmentation problems, but their superior accuracy requires far more examples, and the training must be performed on graphics processing units (GPUs). Efficient training and post-processing procedures for deep learning methods in vEM constitute an active area of research [23]. For successful application, the deep learning model needs to be trained on data very similar to the data at hand, but annotated vEM training data is time-consuming to create.
Various approaches try to alleviate this problem: increasing annotator efficiency using professional annotation software (e.g. MIB or Imaris), sparse labeling [36] or refining model predictions using only point annotations [10]. Additionally, model performance can increase through self-supervised learning on large unlabeled and heterogeneous data sets [14] or through generalizability-enhancing tricks such as data augmentation and domain adaptation [27]. In any case, additional fine-tuning on some labeled domain-specific data will improve segmentation performance and may even be required [11]. When fine-tuning, model performance can be further increased by choosing the most interesting samples to annotate using active learning [24].

Active learning (AL) is a subdomain of machine learning that aims to minimize labeling effort without sacrificing model performance. This is achieved by iteratively querying a batch of samples to a label-providing oracle, adding them to the training set and retraining the predictor. The challenge is to come up with a smart selection criterion to query samples and maximize the steepness of the training curve [33]. In the setting of vEM segmentation, the oracle is a human imaging expert, such as a microscopist or biologist. This makes our application human- or expert-in-the-loop, as the expert is queried to provide labels through an annotation interface. We consider the total volume of EM data as an offline pool of unlabeled 2D training patches. A general overview of a human-in-the-loop annotation workflow using AL for semantic segmentation is given in Figure 1.

Fig. 1: Overview of active learning for image segmentation. A human imaging expert starts an AL iteration and, using the existing segmentation model and an active learning sampling strategy, ranks the unlabeled samples for labeling. Batches of the most informative samples are annotated by the expert and added to the labeled data pool. After enough new training data is created, the model is fine-tuned on the labeled pool and model performance is expected to improve. The expert can run subsequent AL iterations with the updated model on the remaining unlabeled data, or stop the iterations when model performance is sufficient or the annotation budget is spent.

To our knowledge, segmentation of vEM data in an AL setting is not an established practice; for example, the recent Empanada napari plugin [11] for vEM only supports random sampling. In other fields, various tools employ AL to great effect: Label Studio [35] is a flexible data annotation tool that supports semantic segmentation, AL and prediction refinement. MONAI Label [12] is an open-source image labeling and learning tool that helps researchers and clinicians to collaborate, create annotated datasets, and build AI models. It features 3D segmentation refinement using 3D Slicer and AL sample selection. Kaibu [21] is a web application for visualizing and annotating multidimensional images, featuring deep learning powered interactive segmentation. ilastik [7] is an easy-to-use interactive tool that brings machine-learning-based (bio)image analysis to end users without substantial computational expertise. It contains pre-defined workflows for image segmentation, object classification, counting and tracking.

In this paper, we propose three new contributions:
1. A comparison of five AL strategies for semantic segmentation on three vEM datasets, on which we previously reported in our preprint [28].
2. A feature comparison between current state-of-the-art software frameworks for human-in-the-loop active learning using deep learning segmentation models.
3. BioSegment, an integrated platform for imaging experts to process vEM datasets using AL strategies.

First, we describe the software architecture of an AL semantic segmentation framework in Section 2.1 and the deep learning models in Section 2.2. We continue with the AL strategies used in Section 2.3 and the validation datasets in Section 2.4. Our three contributions are presented and discussed in Section 3. Lastly, we envision future work in Section 4 and conclude in Section 5.

2 Methods

2.1 Software Architecture

Fig. 2: Flowchart of the BioSegment software stack. Users interact with a frontend using their browser. They can visualize a dataset, edit annotations and create segmentations using AI models. The BioSegment backend handles the tasks given by the frontend and fetches the datasets from disk storage. For long-running tasks like conversion, active learning, segmentation and fine-tuning, separate workers are used.

We give an overview of the BioSegment software architecture in Figure 2. A central database is managed by a backend, implemented using FastAPI. It features a documented REST API, database schemas for all modelled objects and a job queue using Celery and Redis. For long-running tasks like conversion and fine-tuning, separate workers are used, communicating via the messaging bus of the job queue. For data conversion and viewing, AICSImageIO [3] and BioFormats [18] are used. The only communication requirement for the workers is access to the Redis server port and the data storage. They can run on a different machine with GPU acceleration or in a network with access to secure and confidential imaging data. Segmentation models and tasks are implemented in PyTorch, and models are serialized to disk. Tensorboard is used to visualize training progression and predicted segmentation performance on selected image samples.

The BioSegment software stack is reproducible using conda environments and Docker containers. Staging and production deployments are managed using Docker Swarm. Restrictive enterprise firewalls can be overcome through the Traefik reverse proxy, which also provides security with automated HTTPS certificate management. Admin interfaces for network, user, database and job queue management are also implemented. Clients can communicate with the backend REST API to add imaging data, manage jobs and visualize results. Using a code generation tool like OpenAPI Generator, a client code library can be generated automatically from the documented REST API of the backend. This automated step improves the maintainability of multiple client interfaces and annotation software plugins. A JavaScript frontend implements most of the backend API and provides management of all data objects, such as users, datasets, segmentations, annotations and models. A Dash dashboard provides an interface for sparse semantic labelling. Datasets are accessed using file system paths in the backend and workers. These paths resolve to a local mount of the remote disk storage. The mount point is set up using sshfs.
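As a concrete illustration of this job-queue pattern, the sketch below shows how a backend endpoint could hand a long-running fine-tuning job to a remote GPU worker through Celery and Redis. It is a minimal sketch: the endpoint path, task name and parameters are illustrative assumptions, not BioSegment's actual API.

```python
# Minimal sketch of the backend/worker split, assuming FastAPI, Celery and Redis.
# Names such as finetune_model and /models/{id}/finetune are hypothetical.
from celery import Celery
from fastapi import FastAPI

celery_app = Celery("biosegment", broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
api = FastAPI()

@celery_app.task(name="finetune_model")
def finetune_model(model_id: int, dataset_path: str, epochs: int = 200) -> str:
    # Runs on a worker that only needs access to Redis and the shared data storage.
    # The actual training code (e.g. PyTorch Lightning) would be called here.
    return f"model {model_id} fine-tuned on {dataset_path} for {epochs} epochs"

@api.post("/models/{model_id}/finetune")
def start_finetune(model_id: int, dataset_path: str):
    # The backend only enqueues the job and returns a task id immediately,
    # which clients can poll for status and results.
    task = finetune_model.delay(model_id, dataset_path)
    return {"task_id": task.id}
```

A worker started separately with `celery -A <module> worker` on a GPU machine picks up such tasks; it only needs to reach the Redis broker and the data storage, matching the communication requirements described above.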
2.2 Deep learning methods

We build on the PyTorch Lightning framework, which provides high-level yet flexible training loops without boilerplate code. It supports different accelerator architectures and allows for reproducible and maintainable code. It also features fine-tuning strategies, automated learning rate and batch size finders, and support for multiple GPUs and mixed precision training. Various segmentation models are available: our own advanced U-Net implementations in the published neuralnets [26] package, and torchvision [5], which features pretrained model weights.

2.3 Active learning strategies

We implemented five AL strategies, which are explained below. We consider the task of semantic segmentation, i.e. given an image $x \in \mathcal{X} \subset \mathbb{R}^N$ with a total of $N$ pixels, we aim to compute a pixel-level labeling $y \in \mathcal{Y}$, where $\mathcal{Y} = \{0, \dots, C-1\}^N$ is the label space and $C$ is the number of classes. In particular, we focus on the case of binary segmentation, i.e. $C = 2$. Let $p_j(x) = [f_\theta(x)]_j$ be the probability class distribution of pixel $j$ of a parameterized segmentation algorithm $f_\theta$ (i.e. an encoder-decoder network, such as U-Net [30]). Consider a large pool of $n$ i.i.d. sampled data points over the space $\mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ as $\{x_i, y_i\}_{i \in [n]}$, where $[n] = \{1, \dots, n\}$, and an initial pool of $m$ randomly chosen distinct data points indexed by $S_0 = \{i_j \mid i_j \in [n]\}_{j \in [m]}$. An active learning algorithm initially only has access to $\{x_i\}_{i \in [n]}$ and $\{y_i\}_{i \in S_0}$ and iteratively extends the currently labeled pool $S_t$ by querying $k$ samples from the unlabeled set $\{x_i\}_{i \in [n] \setminus S_t}$ to an oracle. After iteration $t$, the predictor is retrained with the available samples $\{x_i\}_{i \in [n]}$ and labels $\{y_i\}_{i \in S_t}$, thereby improving the segmentation quality. Note that, without loss of generality, the active learning approaches below are described for $k = 1$. A batch of $k > 1$ samples can also be obtained by repeating the query step $k$ times without retraining. The complete active learning workflow is shown in Figure 1.

Maximum entropy sampling [16,17]: Maximum entropy is a straightforward selection criterion that aims to select samples for which the predictions are uncertain. Formally speaking, we adjust the selection criterion to a pixel-wise entropy calculation as follows:

$$x^*_{t+1} = \arg\max_{x \in [n] \setminus S_t} \sum_{j=0}^{N-1} - \sum_{c=0}^{C-1} [p_j(x)]_c \log [p_j(x)]_c . \quad (1)$$

In other words, the entropy is calculated for each pixel and summed up. Note that a high entropy is obtained when $[p_j(x)]_c = \frac{1}{C}$, which is exactly when there is no real consensus on the predicted class (i.e. high uncertainty).

Least confidence selection [9]: Similar to maximum entropy sampling, the least confidence criterion selects samples for which the predictions are uncertain:

$$x^*_{t+1} = \arg\min_{x \in [n] \setminus S_t} \sum_{j=0}^{N-1} \max_{c=0,\dots,C-1} [p_j(x)]_c . \quad (2)$$

As the name suggests, the least confidence criterion selects the probability that corresponds to the predicted class. Whenever this probability is small, the predictor is not confident about its decision. For image segmentation, we sum up the maximum probabilities in order to select the least confident samples.
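To make these two uncertainty criteria concrete, the sketch below scores a batch of unlabeled patches from the softmax output of a segmentation network. It is a minimal illustration assuming PyTorch tensors of shape (batch, classes, height, width); it is not code taken from BioSegment.

```python
# Sketch: scoring unlabeled patches with maximum entropy (Eq. 1) and least
# confidence (Eq. 2). Assumes `probs` has softmax already applied over dim=1.
import torch

def entropy_scores(probs: torch.Tensor) -> torch.Tensor:
    # Pixel-wise entropy, summed over all pixels of each patch (higher = more informative).
    pixel_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return pixel_entropy.flatten(1).sum(dim=1)

def least_confidence_scores(probs: torch.Tensor) -> torch.Tensor:
    # Sum over pixels of the maximum class probability (lower = less confident).
    max_prob, _ = probs.max(dim=1)
    return max_prob.flatten(1).sum(dim=1)

# Ranking the pool: highest entropy first, or lowest confidence first, e.g.
# probs = torch.softmax(model(batch), dim=1)
# query = entropy_scores(probs).argsort(descending=True)[:k]
```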
Bayesian active learning disagreement [13]: The Bayesian active learning disagreement (BALD) approach is specifically designed for convolutional neural networks (CNNs). It makes use of Bayesian CNNs in order to cope with the small amounts of training data that are usually available in active learning workflows. A Bayesian CNN assumes a prior probability distribution placed over the model parameters $\theta \sim p(\theta)$. The uncertainty in the weights induces prediction uncertainty by marginalizing over the approximate posterior:

$$[p_j(x)]_c \approx \frac{1}{T} \sum_{t=0}^{T-1} \left[ p_j(x; \hat{\theta}_t) \right]_c , \quad (3)$$

where $\hat{\theta}_t \sim q(\theta)$ is the dropout distribution, which approximates the posterior distribution over the model parameters. In other words, a CNN is trained with dropout and inference is obtained by leaving dropout on. This causes uncertainty in the outcome that can be used in existing criteria such as maximum entropy (Equation (1)).

K-means sampling [8]: Uncertainty-based approaches typically sample close to the decision boundary of the classifier. This introduces an implicit bias that does not allow for data exploration. Most explorative approaches that aim to solve this problem transform the input $x$ to a more compact and efficient representation $z = g(x)$ (e.g. the feature representation before the fully connected stage in a classification CNN). The representation that we used in our segmentation approach is the middle bottleneck representation of the U-Net, which is the learned encoded embedding of the model. The k-means sampling approach then finds $k$ clusters in this embedding using k-means clustering. The selected samples are the $k$ samples, one per cluster, that are closest to the $k$ centroids.

Core set active learning [32]: The core set approach is an active learning approach for CNNs that is not based on uncertainty or exploratory sampling. Similar to k-means, samples are selected from an embedding $z = g(x)$ in such a way that a model trained on the selection of samples would be competitive for the remaining samples. As before, the representation that we used in our segmentation approach is the bottleneck representation of the U-Net. In order to obtain such competitive samples, this approach aims to minimize the so-called core set loss. This is the difference between the average empirical loss over the set of labeled samples (i.e. $S_t$) and the average empirical loss over the entire dataset that includes the unlabeled points (i.e. $[n]$).

2.4 Validation datasets

Three public EM datasets were used to validate our approach:

– The EPFL dataset⁴ represents a 5 × 5 × 5 µm³ section taken from the CA1 hippocampus region of the brain, corresponding to a 2048 × 1536 × 1065 volume. Two 1048 × 786 × 165 subvolumes were manually labelled by experts for mitochondria. The data was acquired by a focused ion-beam scanning EM, and the resolution of each voxel is approximately 5 × 5 × 5 nm³.
– The VNC dataset⁵ represents two 4.7 × 4.7 × 1 µm³ sections taken from the Drosophila melanogaster third instar larva ventral nerve cord, corresponding to a 1024 × 1024 × 20 volume. One stack was manually labelled by experts for mitochondria. The data was acquired by a transmission EM, and the resolution of each voxel is approximately 4.6 × 4.6 × 45 nm³.
– The MiRA dataset⁶ [37] represents a 17 × 17 × 1.6 µm³ section taken from the mouse cortex, corresponding to a 8624 × 8416 × 31 volume. The complete volume was manually labelled by experts for mitochondria. The data was acquired by an automated tape-collecting ultramicrotome scanning EM, and the resolution of each voxel is approximately 2 × 2 × 50 nm³.

⁴ Data available at https://cvlab.epfl.ch/data/data-em/
⁵ Data available at https://github.com/unidesigner/groundtruth-drosophila-vnc/
⁶ Data available at http://95.163.198.142/MiRA/mitochondria31/
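To illustrate how 2D training patches are obtained from such labeled volumes in the experiments described next, the sketch below splits a volume along the y axis and draws random 128 × 128 patches. The file names and the use of tifffile are assumptions made for the example.

```python
# Sketch: sampling 128x128 2D training patches from a labeled EM volume stored
# as (Z, Y, X) TIFF stacks. File names are hypothetical placeholders.
import numpy as np
import tifffile

rng = np.random.default_rng(0)

volume = tifffile.imread("training_volume.tif")   # EM intensities, shape (Z, Y, X)
labels = tifffile.imread("training_labels.tif")   # binary mitochondria mask, same shape

# Train/test split halfway along the y axis (as done for VNC and MiRA below).
y_half = volume.shape[1] // 2
train_vol, train_lab = volume[:, :y_half], labels[:, :y_half]

def sample_patch(vol, lab, size=128):
    # Pick a random slice and a random (y, x) corner; return image and label patches.
    z = rng.integers(vol.shape[0])
    y = rng.integers(vol.shape[1] - size)
    x = rng.integers(vol.shape[2] - size)
    return vol[z, y:y + size, x:x + size], lab[z, y:y + size, x:x + size]

initial_pool = [sample_patch(train_vol, train_lab) for _ in range(20)]  # m = 20 patches
```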
In order to properly validate the discussed approaches, we split the available labeled data into a training and a testing set. In the case of a single labeled volume (VNC and MiRA), we split these datasets halfway along the y axis. A smaller U-Net (with 4 times fewer feature maps) was initially trained on $m = 20$ randomly selected 128 × 128 samples in the training volume (learning rate of 1e-3 for 500 epochs). Next, we consider a pool of $n = 2000$ samples in the training data to be queried. In each iteration, $k = 20$ samples are selected from this pool based on one of the discussed selection criteria and added to the labeled set $S_t$, after which the segmentation network is fine-tuned (learning rate of 5e-4 for 200 epochs). This procedure is repeated for $T = 25$ iterations, leading to a maximum training set size of 500 samples. We validate the segmentation performance using the intersection-over-union (IoU) metric, also known as the Jaccard score:

$$J(y, \hat{y}) = \frac{\sum_i [y \cdot \hat{y}]_i}{\sum_i [y]_i + \sum_i [\hat{y}]_i - \sum_i [y \cdot \hat{y}]_i} . \quad (4)$$

3 Results

3.1 Active Learning validation

We validated five AL strategies on three public EM datasets. The resulting learning curves of the discussed approaches on the three datasets are shown in Figure 3. We additionally show the performance obtained by full supervision (i.e. all labels are available during training), which is the maximum achievable segmentation performance. There is an indication that maximum entropy sampling, least confidence selection and BALD outperform the random sampling baseline. These methods obtain about 10 to 15% higher performance for the same amount of available labels on all datasets. Additionally, a segmentation of similar quality can be achieved using 25% of the total annotation budget required for random sampling. The core set approach performs similarly to, or slightly better than, the baseline. We expect that this method can be improved by considering alternative embeddings. Lastly, we see that k-means performs significantly worse than random sampling. Even though this could also be an embedding problem, as with the core set approach, we think that exploratory sampling alone will not allow the predictor to learn from challenging samples, which are usually outliers. We expect that a hybrid approach based on both exploration and uncertainty might lead to better results, and consider this future work.

Figure 4 shows qualitative segmentation results on the EPFL dataset. In particular, we show results of the random, k-means and maximum entropy sampling methods using 120 samples, and compare this to the fully supervised approach. The maximum entropy sampling technique improves on the others by a large margin and significantly closes the gap towards fully supervised learning.

Fig. 3: Learning curves (Jaccard index versus number of samples) for the five discussed active learning approaches, random sampling and full supervision on the three datasets: (a) EPFL, (b) VNC and (c) MiRA. Entropy sampling performs well across the datasets. Note that for entropy versus random sampling on EPFL, the difference in model performance for the same number of samples (difference along the y axis) is 15% and the difference in the number of samples needed for the same model performance (difference along the x axis) is 25%.
Fig. 4: Segmentation results obtained from an actively learned U-Net with 120 samples of the EPFL dataset based on random, k-means and maximum entropy sampling, and a comparison to the fully supervised approach. Jaccard scores are indicated between brackets: (a) input, (b) ground truth, (c) full supervision (0.857), (d) random sampling (0.733), (e) k-means (0.710), (f) maximum entropy (0.813).

Lastly, we are interested in what type of samples the active learning approaches select for training. Figure 5 shows the 4 highest-prioritized samples of the VNC dataset, according to the least confidence criterion, that were selected in the first 4 iterations. The top row illustrates the probability predictions of the network at that point in time, whereas the bottom row shows the pixel-wise uncertainty of the sample (i.e. the maximum in Equation (2)). Note that the initial predictions at t = 1 are of poor quality, as the network was only trained on 20 samples. Moreover, the uncertainty is high in regions where the network is undecided, but it is low in regions where the network is confidently wrong. The latter is a common issue in active learning and is related to the exploration vs. uncertainty trade-off. However, over time we see that the network performance improves and more challenging samples are queried to the oracle.

Fig. 5: Illustration of the samples selected in the VNC dataset over time (iterations t = 1 to t = 4) in the active learning process. The top row shows the pixel-wise prediction of the selected samples at iterations 1 through 4. The bottom row shows the pixel-wise least confidence score on the corresponding images.

3.2 Feature comparison

We define five software features of interest for an AL software framework for vEM data (Table 1):

Interactive fine-tuning: The expert should be able to fine-tune a segmentation model with their own newly annotated data. For deep learning models, this involves optional GPU acceleration and reporting on training status and accuracy. All considered frameworks have this feature.

Active learning: The framework should support sampling the unlabeled data using an AL strategy. Some frameworks have only proposed this feature for future work and have only implemented a random sampling strategy.

Large datasets: The expert should be able to apply existing and newly trained models to their whole dataset, no matter the size. This feature is the most lacking, as it requires support for tiled inference and long-running jobs.

3D support: The supported annotation interfaces of the framework should allow the expert to freely browse consecutive slices or volumes in 3D.
Remote resources: In order to process large datasets, large storage and computational resources such as workstations and GPUs are needed. This usually requires a flexible software architecture and communication over a network interface or a software worker queue.

Software framework      Interactive   Active     Large      3D        Remote
                        fine-tuning   learning   datasets   support   resources
Label Studio [35]       x             x                               x
Kaibu [21]              x                                   x
napari-empanada [11]    x                        x          x
ilastik [7]             x             x          x          x         x
MONAI Label [12]        x             x          x          x         x
BioSegment              x             x          x          x         x

Table 1: Comparison of open-source software frameworks for human-in-the-loop active learning using segmentation models.

BioSegment combines the desirable software features needed for analyzing vEM data in one framework and is the only AL framework currently used as such. ilastik is an established interactive annotation tool with support for standard ML segmentation. Recently, it has added beta support for a remote GPU task server (tiktorch [2]) and an active learning ML segmentation workflow [1] using SLIC features and supervoxels for vEM. All this functionality is however still in beta, sparsely documented and not yet applied to deep learning models or mitochondria segmentation. napari-empanada is the most recent development in vEM segmentation, but has no support for AL. Its lack of support for remote resources could however be solved by running napari remotely using VirtualGL [22] or by using a remote Dask cluster or data store [25]. Lastly, the recently developed feature set of MONAI Label is exciting. However, it is little over a year old, has no reported usage by the EM community, and mostly targets radiology and pathology use cases. Nevertheless, it can be adapted for EM and integrated in our BioSegment workflow, as shown in Figure 6c. We note that remote, GPU-accelerated model execution is a hallmark of most frameworks: the worker queue in BioSegment and MONAI Label, the Label Studio ML backend and the ilastik tiktorch server.

3.3 BioSegment workflow

After capturing the images and storing the raw microscopy data on disk, experts start the BioSegment workflow. Through a dedicated dashboard (Figure 6a), the expert can create a new dataset holder and import the imaging data directly by providing the folder path. This starts a new annotation workflow. The expert can start preprocessing and segmentation jobs for the whole dataset and visualize the result (Figure 6b). If no existing model has the desired quality, experts can choose a model to fine-tune.

Fig. 6: Three example BioSegment interfaces. (a) The BioSegment dashboard, where users can manage all settings. (b) The BioSegment model viewer, where models can be viewed and fine-tuned with training data. (c) An external viewer (3D Slicer): results can be exported and used in external programs such as 3D Slicer and MONAI Label.

A batch of sampled images from the unlabeled dataset is chosen for annotation. An interface for sparse semantic labelling is provided, and the subset can be exported to different bioimaging annotation software like 3D Slicer (Figure 6c), Amira (ThermoFisher Scientific), Imaris (Oxford Instruments), Fiji [31] or napari [34]. The chosen model can be fine-tuned on the created training data and model performance can again be evaluated.

The annotation workflow can be augmented using active learning loops: the subset of images to be sampled can be selected by one of the five implemented active learning strategies, informed by the chosen model. After annotation by the expert, this model is fine-tuned and again used for selecting the following batch of images, creating an active learning loop and immediately incorporating the expert feedback in the sampling process. By empowering imaging experts with a dashboard to run multiple active learning iterations and segmentation jobs on their datasets by themselves, active learning can be incorporated into their normal annotation workflow.
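Such a loop could, for illustration, also be driven programmatically against the backend's REST API. The endpoint paths, identifiers and payloads below are hypothetical stand-ins for the ranking and fine-tuning jobs described above, not BioSegment's documented API.

```python
# Sketch: one active learning iteration driven through a hypothetical BioSegment
# REST API using the requests library. All paths, ids and fields are illustrative.
import requests

BASE = "https://biosegment.example.org/api/v1"
HEADERS = {"Authorization": "Bearer <token>"}

# Ask the backend to rank unlabeled patches of a dataset with maximum entropy sampling.
job = requests.post(
    f"{BASE}/datasets/7/active-learning",
    json={"model_id": 3, "strategy": "max_entropy", "batch_size": 20},
    headers=HEADERS,
).json()

# The ranking runs as a long-running job on a worker; poll its status.
status = requests.get(f"{BASE}/jobs/{job['task_id']}", headers=HEADERS).json()

# After the expert has annotated the proposed batch in their annotation software,
# trigger fine-tuning of the model on the extended labeled pool.
requests.post(
    f"{BASE}/models/3/finetune",
    json={"dataset_id": 7, "epochs": 200},
    headers=HEADERS,
)
```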
The expert can stop the iterations when they are satisfied with the segmentation quality in the preview or when their annotation budget is depleted. The number of iterations is usually three or higher, but this depends strongly on the dataset and on the computer vision task. When a segmentation model of high enough quality is achieved, it can be applied to the whole dataset like the other pre-existing models. The labelled data can be added to a pool of general training data in order to train better-performing models for future fine-tuning tasks. Experts can download the segmented dataset for further downstream analysis.

The BioSegment software stack is deployed at biosegment.ugent.be and used internally at the Flemish Institute for Biotechnology (VIB) for annotating new vEM datasets. It automates the previously manual active learning loops between imaging experts at a partnering imaging facility and deep learning scientists in our computational lab. The code is available on GitHub and features a documentation site.

4 Future work

Computer vision is not limited to single-class semantic segmentation problems. Mitochondria form 3D shapes and networks, requiring 3D post-processing to achieve accurate instance segmentation. Other cell organelles are of equal interest, and large amounts of existing data are now available through the OpenOrganelle data portal [15]. Multi-class semantic segmentation is currently implemented, but the label map is not standardized. Interfacing with the BioImage Model Zoo [20] would help in this regard. We also plan to further integrate preprocessing steps like denoising, as these are still done with a separate script. Besides image enhancement, volume reconstruction and multimodal registration are two other EM data processing workflows that would be beneficial to implement.

Recent advances in tooling include napari, an interactive, multidimensional image viewer for Python, and the Java-based Paintera [4] for dense labeling of large 3D datasets. Together with cloud-based file formats like NGFF [19], these would facilitate annotating and processing large imaging experiments. Integration with Dask [25], a flexible open-source Python library for parallel computing, would allow immediate previews of complex workflows and scaling to the whole dataset using long-running jobs. These advances allow for new annotation experiences, for example a region-of-interest-free approach where the annotator freely browses the whole dataset while the current model prediction and uncertainty are lazily updated depending on the viewport (sketched below). By creating multi-resolution maps of the model uncertainty, the expert is informed about the model performance over the whole dataset and is free to choose which regions to annotate.

Complexity of the software stack can be outsourced to existing free software libraries. Lightning AI further removes boilerplate code in deep learning models by providing App and Flow interfaces. Data management and worker communication in BioSegment can be handled by Girder, which also utilizes the Celery job queue. By creating or integrating with plugins for already established annotation tools, adoption of the BioSegment workflow can be improved.
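The viewport-driven preview mentioned above could, for instance, build on Dask's lazy, chunked arrays. The sketch below is a minimal illustration of that idea; the zarr path and the per-chunk predict function are assumptions, not existing BioSegment functionality.

```python
# Sketch: lazily computed model predictions over a large volume with Dask, so
# that only the chunks needed for the current viewport are evaluated on demand.
import dask.array as da
import numpy as np

# Chunked volume stored on disk; the path is a placeholder.
volume = da.from_zarr("em_volume.zarr")

def predict(block: np.ndarray) -> np.ndarray:
    # Placeholder for per-chunk model inference returning a probability map.
    return np.zeros_like(block, dtype=np.float32)

# Overlap between chunks avoids seams at chunk borders; nothing is computed yet.
probabilities = volume.map_overlap(predict, depth=16, dtype=np.float32)

# A viewer such as napari slicing `probabilities` only triggers computation for
# the chunks needed to render the visible region.
preview = probabilities[0, :512, :512].compute()
```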
Active development in the 3D Slicer and napari communities on chunked and multidimensional file formats, instance segmentation and collaborative annotation proofreading tools will also improve the future BioSegment feature set. For AL research, it would be valuable to add instrumentation to these annotation tools in order to better capture the burden of the annotation work for the expert. Currently, the number of samples and the total number of annotated pixels can be measured, but actual annotation time and number of clicks would be more accurate metrics. BioSegment can be adapted to capture these metrics. Greater model performance can be achieved by including automated hyperparameter optimization such as Optuna [6]. This and other AutoML strategies would further automate model training.

5 Conclusions

We present BioSegment, a turnkey solution for active learning segmentation of vEM imaging. It provides a user-friendly annotation experience, integration with familiar microscopy annotation software and a job queue for remote GPU acceleration. Expert annotation is augmented using active learning strategies. For mitochondrial segmentation, these strategies can improve segmentation quality by 10 to 15% in terms of intersection-over-union score compared to random sampling. Additionally, a segmentation of similar quality can be achieved using 25% of the total annotation budget required for random sampling. The software stack is maintainable through various automated tests, and the code base is published under an open-source license. By comparing state-of-the-art human-in-the-loop annotation frameworks, we show that BioSegment is currently the only framework capable of employing deep learning and active learning for 3D electron microscopy data.

Acknowledgements The computational resources and services used in this work were provided by NVIDIA, VIB IRC IT and the VSC (Flemish Supercomputer Center), funded by the Research Foundation – Flanders (FWO) and the Flemish Government. Imaging data and feedback were provided by the VIB BioImaging Core. Funding was provided by the Flanders AI Research Program.

References

1. ilastik - Voxel Segmentation Workflow (beta), https://www.ilastik.org/documentation/voxelsegmentation/voxelsegmentation
2. tiktorch (Dec 2021), https://github.com/ilastik/tiktorch
3. AICSImageIO (Jun 2022), https://github.com/AllenCellModeling/aicsimageio
4. Paintera (Jun 2022), https://github.com/saalfeldlab/paintera
5. pytorch/vision (Jun 2022), https://github.com/pytorch/vision
6. Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A Next-generation Hyperparameter Optimization Framework (Jul 2019). https://doi.org/10.48550/arXiv.1907.10902, arXiv:1907.10902 [cs, stat]
7. Berg, S., Kutra, D., Kroeger, T., Straehle, C.N., Kausler, B.X., Haubold, C., Schiegg, M., Ales, J., Beier, T., Rudy, M., Eren, K., Cervantes, J.I., Xu, B., Beuttenmueller, F., Wolny, A., Zhang, C., Koethe, U., Hamprecht, F.A., Kreshuk, A.: ilastik: interactive machine learning for (bio)image analysis. Nature Methods 16(12), 1226–1232 (Dec 2019). https://doi.org/10.1038/s41592-019-0582-9
8. Bodó, Z., Minier, Z., Csató, L.: Active Learning with Clustering. In: Active Learning and Experimental Design workshop, in conjunction with AISTATS 2010. pp. 127–139. JMLR Workshop and Conference Proceedings (Apr 2011), https://proceedings.mlr.press/v16/bodo11a.html
9. Brinker, K.: Incorporating diversity in active learning with support vector machines. In: Proceedings of the Twentieth International Conference on Machine Learning. pp. 59–66. ICML'03, AAAI Press, Washington, DC, USA (Aug 2003)
10. Chen, X., Zhao, Z., Zhang, Y., Duan, M., Qi, D., Zhao, H.: FocalClick: Towards Practical Interactive Image Segmentation (Apr 2022), http://arxiv.org/abs/2204.02574, arXiv:2204.02574 [cs]
11. Conrad, R.W., Narayan, K.: Instance segmentation of mitochondria in electron microscopy images with a generalist deep learning model. bioRxiv (May 2022). https://doi.org/10.1101/2022.03.17.484806
12. Diaz-Pinto, A., Alle, S., Ihsani, A., Asad, M., Nath, V., Pérez-García, F., Mehta, P., Li, W., Roth, H.R., Vercauteren, T., Xu, D., Dogra, P., Ourselin, S., Feng, A., Cardoso, M.J.: MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images (Mar 2022), arXiv:2203.12362 [cs, eess]
13. Gal, Y., Islam, R., Ghahramani, Z.: Deep Bayesian Active Learning with Image Data (Mar 2017). https://doi.org/10.48550/arXiv.1703.02910, arXiv:1703.02910 [cs, stat]
14. Han, H., Dmitrieva, M., Sauer, A., Tam, K.H., Rittscher, J.: Self-Supervised Voxel-Level Representation Rediscovers Subcellular Structures in Volume Electron Microscopy. pp. 1874–1883 (2022), https://openaccess.thecvf.com/content/CVPR2022W/CVMI/html/Han_Self-Supervised_Voxel-Level_Representation_Rediscovers_Subcellular_Structures_in_Volume_Electron_Microscopy_CVPRW_2022_paper.html
15. Heinrich, L., Bennett, D., Ackerman, D., Park, W., Bogovic, J., Eckstein, N., Petruncio, A., Clements, J., Xu, C.S., Funke, J., Korff, W., Hess, H.F., Lippincott-Schwartz, J., Saalfeld, S., Weigel, A.V., Team, C.P.: Automatic whole cell organelle segmentation in volumetric electron microscopy. bioRxiv (Nov 2020). https://doi.org/10.1101/2020.11.14.382143
16. Joshi, A.J., Porikli, F., Papanikolopoulos, N.: Multi-class active learning for image classification. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2372–2379 (Jun 2009). https://doi.org/10.1109/CVPR.2009.5206627
17. Li, X., Guo, Y.: Adaptive Active Learning for Image Classification. In: 2013 IEEE Conference on Computer Vision and Pattern Recognition. pp. 859–866 (Jun 2013). https://doi.org/10.1109/CVPR.2013.116
18. Linkert, M., Rueden, C.T., Allan, C., Burel, J.M., Moore, W., Patterson, A., Loranger, B., Moore, J., Neves, C., MacDonald, D., Tarkowska, A., Sticco, C., Hill, E., Rossner, M., Eliceiri, K.W., Swedlow, J.R.: Metadata matters: access to image data in the real world. Journal of Cell Biology 189(5), 777–782 (May 2010). https://doi.org/10.1083/jcb.201004104
19. Moore, J., Allan, C., Besson, S., Burel, J.M., Diel, E., Gault, D., Kozlowski, K., Lindner, D., Linkert, M., Manz, T., Moore, W., Pape, C., Tischer, C., Swedlow, J.R.: OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies. Nature Methods pp. 1–3 (Nov 2021). https://doi.org/10.1038/s41592-021-01326-w
20. Ouyang, W., Beuttenmueller, F., Gómez-de Mariscal, E., Pape, C., Burke, T., Garcia-López-de Haro, C., Russell, C., Moya-Sans, L., de-la Torre-Gutiérrez, C., Schmidt, D., Kutra, D., Novikov, M., Weigert, M., Schmidt, U., Bankhead, P., Jacquemet, G., Sage, D., Henriques, R., Muñoz-Barrutia, A., Lundberg, E., Jug, F., Kreshuk, A.: BioImage Model Zoo: A Community-Driven Resource for Accessible Deep Learning in BioImage Analysis. bioRxiv (Jun 2022). https://doi.org/10.1101/2022.06.07.495102
21. Ouyang, W., Le, T., Xu, H., Lundberg, E.: Interactive biomedical segmentation tool powered by deep learning and ImJoy. F1000Research 10:142 (Feb 2021). https://doi.org/10.12688/f1000research.50798.1
22. Paradis, D.J., Segee, B.: Remote Rendering and Rendering in Virtual Machines. In: 2016 International Conference on Computational Science and Computational Intelligence (CSCI). pp. 218–221 (Dec 2016). https://doi.org/10.1109/CSCI.2016.0048
23. Peddie, C.J., Genoud, C., Kreshuk, A., Meechan, K., Micheva, K.D., Narayan, K., Pape, C., Parton, R.G., Schieber, N.L., Schwab, Y., Titze, B., Verkade, P., Weigel, A., Collinson, L.M.: Volume electron microscopy. Nature Reviews Methods Primers 2(1), 1–23 (Jul 2022). https://doi.org/10.1038/s43586-022-00131-9
24. Ren, P., Xiao, Y., Chang, X., Huang, P.Y., Li, Z., Gupta, B.B., Chen, X., Wang, X.: A Survey of Deep Active Learning. ACM Computing Surveys 54(9), 180:1–180:40 (Oct 2021). https://doi.org/10.1145/3472291
25. Rocklin, M.: Dask: Parallel Computation with Blocked algorithms and Task Scheduling. In: Proceedings of the 14th Python in Science Conference. pp. 126–132 (2015). https://doi.org/10.25080/Majora-7b98e3ed-013
26. Roels, J.: NeuralNets (May 2022), https://github.com/JorisRoels/neuralnets
27. Roels, J., Hennies, J., Saeys, Y., Philips, W., Kreshuk, A.: Domain Adaptive Segmentation In Volume Electron Microscopy Imaging. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). pp. 1519–1522. IEEE, Venice, Italy (Apr 2019). https://doi.org/10.1109/ISBI.2019.8759383
28. Roels, J., Saeys, Y.: Cost-efficient segmentation of electron microscopy images using active learning (Nov 2019), arXiv:1911.05548 [cs]
29. Roels, J., Vernaillen, F., Kremer, A., Gonçalves, A., Aelterman, J., Luong, H.Q., Goossens, B., Philips, W., Lippens, S., Saeys, Y.: An interactive ImageJ plugin for semi-automated image denoising in electron microscopy. Nature Communications 11(1), 771 (Feb 2020). https://doi.org/10.1038/s41467-020-14529-0
30. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation (May 2015). https://doi.org/10.48550/arXiv.1505.04597, arXiv:1505.04597 [cs]
31. Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., Tinevez, J.Y., White, D.J., Hartenstein, V., Eliceiri, K., Tomancak, P., Cardona, A.: Fiji: an open-source platform for biological-image analysis. Nature Methods 9(7), 676–682 (Jul 2012). https://doi.org/10.1038/nmeth.2019
32. Sener, O., Savarese, S.: Active Learning for Convolutional Neural Networks: A Core-Set Approach (Jun 2018), arXiv:1708.00489 [cs, stat]
33. Settles, B.: Active Learning Literature Survey. Technical Report, University of Wisconsin-Madison, Department of Computer Sciences (2009), https://minds.wisconsin.edu/handle/1793/60660
34. Sofroniew, N., Lambert, T., Evans, K., Nunez-Iglesias, J., Bokota, G., Winston, P., Peña-Castellanos, G., Yamauchi, K., Bussonnier, M., Doncila Pop, D., Can Solak, A., Liu, Z., Wadhwa, P., Burt, A., Buckley, G., Sweet, A., Migas, L., Hilsenstein, V., Gaifas, L., Bragantini, J., Rodríguez-Guerra, J., Muñoz, H., Freeman, J., Boone, P., Lowe, A., Gohlke, C., Royer, L., PIERRÉ, A., Har-Gil, H., McGovern, A.: napari: a multi-dimensional image viewer for Python (May 2022). https://doi.org/10.5281/zenodo.6598542
35. Tkachenko, M., Malyuk, M., Holmanyuk, A., Liubimov, N.: Label Studio: Data labeling software (2020), https://github.com/heartexlabs/label-studio
36. Wolny, A., Yu, Q., Pape, C., Kreshuk, A.: Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings (Apr 2022). https://doi.org/10.48550/arXiv.2103.14572, arXiv:2103.14572 [cs]
37. Xiao, C., Chen, X., Li, W., Li, L., Wang, L., Xie, Q., Han, H.: Automatic Mitochondria Segmentation for EM Data Using a 3D Supervised Convolutional Network. Frontiers in Neuroanatomy 12 (2018). https://doi.org/10.3389/fnana.2018.00092
38. Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., Timofte, R.: Plug-and-Play Image Restoration with Deep Denoiser Prior (Jul 2021). https://doi.org/10.48550/arXiv.2008.13751, arXiv:2008.13751 [cs, eess]
39. Zuiderveld, K.: VIII.5. Contrast Limited Adaptive Histogram Equalization. In: Heckbert, P.S. (ed.) Graphics Gems, pp. 474–485. Academic Press (Jan 1994). https://doi.org/10.1016/B978-0-12-336156-1.50061-6