<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1361-8415</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Semi-supervised Tissue Segmentation of Histological Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ove Nicolai Dalheim</string-name>
          <email>ove.nicolai@dalheim.as</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rune Wetteland</string-name>
          <email>rune.wetteland@uis.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vebjørn Kvikstad</string-name>
          <email>vebjorn.kvikstad@sus.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emiel A.M. Janssen</string-name>
          <email>emilius.adrianus.maria.janssen@sus.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kjersti Engan</string-name>
          <email>kjersti.engan@uis.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Chemistry, Bioscience and Environmental Engineering, University of Stavanger</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Electrical Engineering and Computer Science, University of Stavanger</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Pathology, Stavanger University Hospital</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>1</volume>
      <fpage>978</fpage>
      <lpage>989</lpage>
      <abstract>
        <p>Supervised learning of convolutional neural networks (CNN) used for image classification and segmentation has produced state-of-the-art results, including in many medical image applications. In the medical field, producing ground truth labels typically requires an expert opinion, and a common problem is the lack of labeled data. Consequently, the models might not be general enough. Digitized histological microscopy images of tissue biopsies are very large, and detailed truth markings for tissue-type segmentation are scarce or non-existent. However, in many cases, large amounts of unlabeled data that could be exploited are readily accessible. Methods for semi-supervised learning exist, but are hardly explored in the context of computational pathology. This paper deals with semi-supervised learning applied to tissue-type classification in histological whole-slide images of urinary bladder cancer. Two semi-supervised approaches that utilize the unlabeled data in combination with a small set of labeled data are presented. A multiscale, tile-based segmentation technique is used to classify tissue into six different classes by the use of three individual CNNs. Each CNN is presented tissue at a different magnification level in order to detect different feature types, later fused in a fully-connected neural network. The two self-training approaches are: using probabilities and using a clustering technique. The clustering method performed best and increased the overall accuracy of the tissue tile classification model from 94.6% to 96% compared to using supervised learning with labeled data only. In addition, the clustering method generated visually better segmentation images.</p>
      </abstract>
      <kwd-group>
        <kwd>CNN</kwd>
        <kwd>semi-supervised learning</kwd>
        <kwd>bladder cancer</kwd>
        <kwd>histological images</kwd>
        <kwd>tissue segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In Norway, 1 748 patients were diagnosed, and 319 people died from bladder
cancer in 2018. The majority of these, at 73%, were male, while the remaining
27% were female [
        <xref ref-type="bibr" rid="ref3">1</xref>
        ]. Worldwide in 2018, 199 922 people of both sexes died of
bladder cancer [
        <xref ref-type="bibr" rid="ref4">2</xref>
        ], and 549 393 new patients were diagnosed, placing bladder
cancer as the 10th most common cancer type in the world. Since 2001, bladder
cancer (including the urinary tract) has been the fourth most common cancer
diagnosis for men in Norway [
        <xref ref-type="bibr" rid="ref1 ref5">3</xref>
        ][
        <xref ref-type="bibr" rid="ref6">4</xref>
        ][
        <xref ref-type="bibr" rid="ref7">5</xref>
        ][
        <xref ref-type="bibr" rid="ref8">6</xref>
        ]. In addition, bladder cancer is known as
one of the most recurring cancer types, with the probability of recurrence for
high-risk patients after one year at 61% [
        <xref ref-type="bibr" rid="ref9">7</xref>
        ].
      </p>
      <p>
        An important step in determining the cancer stage and correct treatment
plan for bladder cancer patients is to examine the tissue samples that are
extracted during transurethral resection. The tissue samples contain large amounts
of information, from individual cell characteristics to specific cell quantities in
large tissue clusters. Scanning and digitization of the histological stains
produce whole-slide images (WSI), opening up the field of computational pathology.
A significant increase is seen in the number of tissue samples sent to pathology
labs, affecting the waiting time for patients [
        <xref ref-type="bibr" rid="ref10">8</xref>
        ]. This increase in the number of
specimens is unfortunately not matched by the number of pathologists. Another aspect
is that since the WSI is studied manually, pathologists' staging and grading of
bladder cancer may differ for the same tissue, as pathologists have
different sets of subjective expectations and experiences. With computational
pathology, computerized tools can aid the pathologist in diagnostic predictions,
localization of interesting regions, and segmentation, to name a few applications.
      </p>
      <p>
        During the last decade, convolutional neural networks (CNN) have proven
very useful in image processing and image classification tasks [
        <xref ref-type="bibr" rid="ref11">9</xref>
        ][
        <xref ref-type="bibr" rid="ref12">10</xref>
        ]. CNNs are
also gaining popularity in medical image processing and in computational
pathology. The most common way to train neural networks (NN) is by supervised
learning (SL) and backpropagation. This requires a large training set where
all samples have associated relevant ground truth labels. Labeled data within
medicine is often limited, and producing it is a time-consuming process that
requires annotations made by experts. A way around the lack of labels is
clustering or unsupervised learning. One method is the use of autoencoders, where a
compression-decompression setup is used, making the network try to reconstruct
the original input [
        <xref ref-type="bibr" rid="ref13">11</xref>
        ]. The learned features are found at the most compressed
state, and might ultimately be connected to a classification network. The
drawback here is that such models rarely perform as well as models trained with a supervised
method.
      </p>
      <p>
        CNNs are referred to as shift-invariant, meaning that a particular feature
can be detected wherever it is located in the image. Intuitively, the initial
layers of a CNN can be viewed as feature extraction, while the last layers can be
viewed as the most task-specific object detection or classification layers. There
are many parameters to tune when setting up a new CNN, and normally
large quantities of labeled data are needed to do so. Therefore, the first layers
can be inherited from a pre-trained network, and the last layers are trained from
scratch, a process known as transfer learning [
        <xref ref-type="bibr" rid="ref14">12</xref>
        ].
      </p>
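      <p>The division of labor described above can be illustrated with a minimal, self-contained sketch (not the paper's code): a frozen stand-in for the inherited early layers, and a small trainable head. The random-projection "extractor", the toy task, and all names here are assumptions for illustration only.</p>

```python
import numpy as np

# Illustrative transfer-learning sketch (assumption, not the paper's pipeline):
# a frozen "pretrained" feature extractor is reused, and only a small
# task-specific head is trained on the new data.
rng = np.random.default_rng(0)

# Frozen feature extractor: a fixed random projection stands in for the
# inherited early CNN layers; it is never updated during training.
W_frozen = rng.normal(size=(2, 8))

def extract_features(x):
    return np.tanh(x @ W_frozen)

# Toy binary task: the label is 1 when the two inputs sum to a positive number.
X = rng.normal(size=(200, 2))
y = (X.sum(axis=1) > 0).astype(float)

# Trainable head: logistic regression on top of the frozen features.
w = np.zeros(8)
for _ in range(300):
    F = extract_features(X)
    p = 1.0 / (1.0 + np.exp(-F @ w))
    w -= 0.5 * F.T @ (p - y) / len(y)   # gradient step on the head only

p_final = 1.0 / (1.0 + np.exp(-extract_features(X) @ w))
accuracy = ((p_final > 0.5) == y).mean()
```

Only the 8 head weights are learned here; in the paper's setting the frozen part is a pretrained VGG16 and the head is a fully-connected network.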
      <p>
        A consolidation of the above methods is semi-supervised learning (SSL),
where both labeled and unlabeled data are used to train a network. This can
be beneficial in cases where there are small amounts of labeled data, but large
quantities of unlabeled data. Different semi-supervised methods exist, such as
graph-based learning methods that often implement clustering algorithms to locate and
distinguish inputs in feature space [
        <xref ref-type="bibr" rid="ref15">13</xref>
        ]. Another semi-supervised method, called
self-training, first trains a NN on labeled data in a supervised manner.
Thereafter, predictions are made for new unlabeled data using the first model,
and finally, a new model can be trained on both the ground truth labels from
annotations and the weak labels from the predictions [14].
      </p>
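      <p>The generic self-training loop can be sketched as follows (an illustration with a trivial nearest-centroid classifier, not the TRI-CNN model; the confidence scoring and the toy data are assumptions):</p>

```python
import numpy as np

# Generic self-training sketch: 1) fit an initial model on the labeled set,
# 2) predict weak labels for the unlabeled set, 3) keep only confident
# predictions, 4) refit on the union of true and weak labels.
rng = np.random.default_rng(1)

def fit_centroids(X, y):
    # Trivial stand-in model: one centroid per class.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    probs = np.exp(-d) / np.exp(-d).sum(axis=0)   # soft scores from distances
    return np.array(classes)[d.argmin(axis=0)], probs.max(axis=0)

# Two Gaussian blobs: a few labeled points, many unlabeled ones.
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unl = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])

model = fit_centroids(X_lab, y_lab)            # initial supervised model
y_weak, conf = predict(model, X_unl)           # weak labels for unlabeled data
keep = conf > 0.6                              # confidence threshold
model2 = fit_centroids(np.vstack([X_lab, X_unl[keep]]),
                       np.concatenate([y_lab, y_weak[keep]]))
```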
      <p>In very recent years, we find some works on semi-supervised learning within
computational pathology. In Dercksen et al. [15], a method based on
autoencoders and k-means clustering of features is presented. A combination of
contrastive predictive coding and multiple instance learning on breast cancer data is
presented in Lu et al. [16]. In Peikari et al. [17], a cluster-then-label approach is
taken using SVM classifiers. Our group presented a method for multiclass tissue
classification of urothelial carcinoma in [18][19][20]. Encouraged by the results,
but challenged by the lack of labeled data needed to generalize the model further, and wishing to
utilize larger amounts of unlabeled data, we propose to combine the TRI-CNN
transfer-learning based architecture with semi-supervised learning.</p>
      <p>This paper presents two methods within self-training applied to tissue
segmentation of WSIs of urothelial carcinoma. The first method is a
probability-based method based on predicted probabilities from an initial model. The second
method is a cluster-based self-training method based on both the predicted
probability from the initial model and the local neighborhood in the predictions.</p>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <p>Data Material
The material used in this paper consists of tissue samples from tumors of
patients with bladder cancer in the form of urothelial carcinoma. The tumor is
removed from the patient through Transurethral Resection of Bladder Tumor
(TURBT) by the use of a resectoscope. The resectoscope holds a heated wire
loop for removing the tumors, and the resulting tissue will often bear marks of
burnt or torn tissue. After the tumor is removed, it is fixed in formalin before
being embedded in paraffin. When the paraffin is solidified, it has a similar
consistency to tissue and can more easily be cut into 4 µm thick slices with
a microtome. Variation in slice thickness can occur, in turn causing problems
like color variation and tissue folds in the resulting image, posing an extra
challenge to the classifier. The slices are then stained with Hematoxylin Eosin
Saffron [21] and further scanned with a digital slide scanner system, the Leica
SCN400, to produce the WSI. This, as well as previous work done on the same
dataset, leads to the six classes which can be seen in Fig. 1.</p>
      <p>The manually marked ground truth dataset, Dgt, consists of 37 patients, from
which 125,020 tiles have been extracted. The labels originate from annotations
made at 400x magnification by a pathologist at Stavanger University Hospital
(VK), illustrated in Fig. 2. It is a private dataset; however, reasonable requests
may be made to the corresponding author. The three extracted tiles have the
same size of 128x128 pixels, but are extracted at different magnification levels.
The lower magnification tiles (25x, 100x) have a larger field-of-view than the
high magnification tile (400x), allowing the multiscale model to capture both
details and context of the input images. The coordinates are then saved with
the three magnification levels, accompanied by the corresponding ground truth
label. The dataset was divided into Dgt{train}, consisting of 103,650 tiles from
29 patients, and Dgt{test}, consisting of 21,370 tiles from 8 patients.</p>
      <p>From the unlabeled dataset, 46 new patients were chosen to extract tiles from
with the two self-training methods. For the probability-based method, a total of
121,239 tiles were extracted from all 46 patients and formed the probability-weak
dataset, Dpw. For the cluster-based method, a total of 221,612 tiles were collected
from 44 patients and formed the cluster-weak dataset, Dcw. An overview is
presented in Table 1.
This section presents the original model, which originates from the framework
developed by Wetteland et al. [20]. Afterwards, the methods behind the two
self-training approaches within semi-supervised learning are explained.
Initial supervised approach The original model arises from a traditional
supervised learning method, using the ground truth labels presented in Table
1. The dataset, Dgt, is split with a train/test ratio of approximately 83/17,
taking into account that the same patient does not appear in both sets, regardless
of class. The individual per-class train/test split varies from an 86/14 ratio for
blood to a 74/26 ratio for stroma. All models trained using a SL approach are
referred to as TRI-SL.</p>
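      <p>The patient-wise constraint on the split can be expressed in a few lines (an illustrative sketch; the helper name and toy data are assumptions, not the paper's code):</p>

```python
# Patient-wise split sketch: tiles from the same patient must never appear in
# both the training and the test set, regardless of class.
def split_by_patient(tiles, test_patients):
    """tiles: list of (patient_id, label) pairs; test_patients: set of ids."""
    train = [t for t in tiles if t[0] not in test_patients]
    test = [t for t in tiles if t[0] in test_patients]
    return train, test

tiles = [("p1", "stroma"), ("p1", "blood"), ("p2", "muscle"), ("p3", "stroma")]
train, test = split_by_patient(tiles, {"p3"})
```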
      <p>Fig. 3: Illustration of the multiscale TRI-architecture used in all models.</p>
      <p>As illustrated in Fig. 3, the architecture of the TRI-CNN model utilizes
transfer learning by implementing three VGG16 models [22] in parallel that
operate individually. The VGG16 network converts a 128x128x3 input RGB
image into a feature vector of dimension 1x512. This is done by a sequence
of five CNN blocks that each consist of two or three CNN layers followed by
a rectified linear unit (ReLU) layer and finally an average pooling layer. The
three 1x512 outputs from the VGG16 models are then merged into a single
1x1536 layer followed by a fully-connected neural network (FCNN). The FCNN
consists of two layers with 4096 neurons each, with one dropout layer between
them. Thereafter, another dropout layer precedes the final output layer, which classifies
the tissue with a Softmax activation function. Each VGG16 network is fed the
input tiles at one of the three different magnifications, 25x, 100x and 400x, to allow for
different features to be detected at each level. The multiscale model is therefore
abbreviated with the name TRI-CNN, which originates from the nomenclature
in Wetteland et al. [20].</p>
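      <p>The shape bookkeeping of the fusion step above can be checked with zero-valued stand-ins (a sketch only; activations, dropout, and the trained weights are omitted, and all variable names are assumptions):</p>

```python
import numpy as np

# Multiscale fusion shapes: each of the three VGG16 branches maps a 128x128x3
# tile to a 1x512 feature vector; the vectors are concatenated before the FCNN.
f25, f100, f400 = (np.zeros((1, 512)) for _ in range(3))  # one branch per magnification
fused = np.concatenate([f25, f100, f400], axis=1)         # merged 1x1536 feature layer

W1 = np.zeros((1536, 4096))    # first fully-connected layer
W2 = np.zeros((4096, 4096))    # second fully-connected layer
W_out = np.zeros((4096, 6))    # output layer over the six tissue classes
logits = fused @ W1 @ W2 @ W_out
```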
      <p>Probability-based self-training The probability-based self-training method
is the most straightforward approach within self-training. Each of the 46 images
is split up into tiles of size 128x128 pixels, and each tile is classified by the original
model, TRI-SL-AF, which is trained on the ground truth labels. Every tile that
is classified with a minimum probability of 60% is saved, while tiles
classified with lower probability are discarded. The 60% threshold is a
trade-off between acquiring enough tiles and having a large enough probability. As
illustrated in Fig. 4, the saved tiles are then selected based on several criteria
given in Table 2. All models trained using the probability-based self-training
method are referred to as TRI-P-SSL.</p>
      <p>The method used to select tiles from the 46 patients is designed to select
tiles based only on their probability scores across all WSIs. First, a scan runs
through all the patients and counts the number of tiles per patient. Patients with
an insufficient number of tiles according to the minimum number of tiles per
WSI are discarded, and tiles are collected from the remaining patients. For each
patient, the tiles with the highest probability are collected first, until the maximum
number of tiles per WSI has been collected, or no more sufficient tiles remain.
All tiles from all WSIs are then appended to an array and sorted by
probability. The tiles with the highest probability are then selected from this
array according to the maximum total number of tiles. This is done for each
class and later saved to the probability-weak dataset Dpw, see Table 1.</p>
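      <p>The selection procedure above can be sketched for a single class as follows (the function name, toy data, and parameter values are illustrative; the real limits are those of Table 2):</p>

```python
# Sketch of the probability-based tile selection: discard WSIs with too few
# tiles, cap the per-WSI take, then keep the globally best tiles.
def select_by_probability(tiles, min_per_wsi, max_per_wsi, max_total):
    """tiles: dict mapping patient id to a list of tile probabilities for one class."""
    pool = []
    for patient, probs in tiles.items():
        if len(probs) < min_per_wsi:          # discard WSIs with too few tiles
            continue
        kept = sorted(probs, reverse=True)[:max_per_wsi]
        pool.extend((p, patient) for p in kept)
    pool.sort(reverse=True)                   # highest probability first, across all WSIs
    return pool[:max_total]

tiles = {"p1": [0.99, 0.95, 0.70], "p2": [0.80], "p3": [0.90, 0.85, 0.88, 0.60]}
chosen = select_by_probability(tiles, min_per_wsi=2, max_per_wsi=2, max_total=3)
```

In this toy example, "p2" is dropped for having too few tiles, and the three highest-probability tiles of the remaining patients are kept.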
      <p>Cluster-based self-training Similar to the probability-based method, the
cluster-based method uses model TRI-SL-AF to classify the WSIs. The tiles are
classified with a minimum of 60% probability, and tiles with a lower
probability are discarded. The classified tiles are then selected based on several criteria
listed in Table 3. A visual representation of this is illustrated in Fig. 5. All
models trained using the cluster-based self-training method are referred to as
TRI-C-SSL.</p>
      <p>An algorithm searches through the tiles and groups them into clusters. If, at
any point in the search, the maximum number of tiles per cluster is not reached,
the difference is appended to the limit of the next cluster in line. The average
cluster probability is calculated per cluster, and the clusters are sorted by
highest probability. Each cluster originating in the WSI is then sorted into
an array, and the program selects the clusters with the highest probability,
up to the maximum number of clusters. The labels are then saved to the
cluster-weak dataset Dcw, see Table 1.</p>
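      <p>The ranking step above, where clusters are scored by their average probability and the best ones kept, can be sketched as follows (clusters are assumed to be already formed; the 60% floor follows the text, while the other values are illustrative):</p>

```python
# Sketch of the cluster-based selection: rank clusters by average tile
# probability and keep the best ones above the minimum average.
def select_clusters(clusters, min_avg_prob, max_clusters):
    """clusters: list of lists of tile probabilities, one inner list per cluster."""
    scored = [(sum(c) / len(c), c) for c in clusters]
    scored.sort(key=lambda s: s[0], reverse=True)   # highest average first
    return [c for avg, c in scored if avg >= min_avg_prob][:max_clusters]

clusters = [[0.95, 0.9], [0.5, 0.55], [0.8, 0.85, 0.9]]
best = select_clusters(clusters, min_avg_prob=0.6, max_clusters=2)
```

Here the middle cluster falls below the 60% average and is discarded, while the two remaining clusters are kept in order of average probability.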
    </sec>
    <sec id="sec-3">
      <title>Experimental setup</title>
      <p>Six multiscale models are presented in this paper, and the following letters
are used to describe them: SL is short for supervised learning, and SSL for
semi-supervised learning. P indicates that the model is trained through the
probability-based self-training method, and C implies that the cluster-based
self-training method is used. A indicates that augmentation by rotation of tiles is
involved. F and U refer to the weights in the VGG16 models being frozen or
unfrozen during training, respectively. An overview is given in Table 4.
Models TRI-SL and TRI-SL-AU were trained through supervised learning
on dataset Dgt{train} and tested on Dgt{test}, see Table 1. The models based
on the probability-based self-training method, TRI-P-SSL and TRI-P-SSL-AU,
were trained on the labels in both Dgt{train} and Dpw. TRI-C-SSL and
TRI-C-SSL-AU were trained with the cluster-based self-training method on labels from
both datasets Dgt{train} and Dcw. The models TRI-SL-F, TRI-P-SSL-F, and
TRI-C-SSL-F were trained with VGG16 frozen, meaning only the FCNN and
output layer were trained. For models TRI-SL-AU, TRI-P-SSL-AU, and
TRI-C-SSL-AU, the VGG16 model was unfrozen during training, and the weights in the
whole network were adjusted.</p>
      <p>For the original model, TRI-SL-F, stroma and muscle tissue tiles were
augmented by rotation to produce twice as many tiles in an effort to equalize
the dataset with respect to tiles per class. For models TRI-P-SSL-F and
TRI-C-SSL-F, no augmentation was used. Models TRI-SL-AU, TRI-P-SSL-AU, and
TRI-C-SSL-AU were all trained with 3x augmentation of tiles in all classes
except background, as the background is filtered out such that only the foreground
is processed by the models. This is done to save processing time; however,
background tiles containing debris were not filtered out, and need to be processed.</p>
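      <p>The 3x rotation augmentation described above amounts to adding three rotated copies per tile, which can be sketched as (an illustration on a tiny array; the helper name is an assumption):</p>

```python
import numpy as np

# Rotation augmentation sketch: each tile yields three extra copies, rotated
# by 90, 180 and 270 degrees in the image plane.
def augment_by_rotation(tile):
    return [np.rot90(tile, k) for k in (1, 2, 3)]

tile = np.arange(12).reshape(2, 2, 3)   # tiny stand-in for a 128x128x3 tile
extra = augment_by_rotation(tile)
```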
      <p>During training of all six models, the learning rate was set to 1.5e-4 with a
batch size of 128. The stochastic gradient descent (SGD) backpropagation algorithm
was used as the optimizer, and the dropout rate was set to 20%. An early-stopping
criterion was set to end training when the change in validation loss was smaller
than 1e-6 for six consecutive epochs. No weighting of the different labels in the
datasets was used during training. All methods were implemented in Python
3.5, with TensorFlow 1.13 [23] and Keras 2.3 [24]. Scikit-learn [25] was used for
evaluation, and PyVips [26] was used to process the images.</p>
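      <p>The early-stopping criterion above can be expressed directly (a framework-free sketch; the function name and loss history are assumptions for illustration):</p>

```python
# Early-stopping sketch: training ends when the change in validation loss
# stays below 1e-6 for six consecutive epochs.
def should_stop(val_losses, min_delta=1e-6, patience=6):
    if len(val_losses) <= patience:
        return False                               # not enough epochs observed yet
    recent = val_losses[-(patience + 1):]          # six changes need seven values
    deltas = [abs(a - b) for a, b in zip(recent[1:], recent[:-1])]
    return all(d < min_delta for d in deltas)

history = [0.50, 0.40, 0.30] + [0.30] * 6          # loss flat for six epochs
```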
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>All six multiscale models were tested on dataset Dgt{test}, yielding the results
in Table 5. To further investigate individual model performance with
regard to segmentation, a new WSI was segmented by all six models by tile-wise
classification of all foreground regions without overlap. The WSI has been annotated
by a pathologist and had not been used during training before. This WSI is
referred to as WSI segment test, and the predictions for the WSI are compared to
the ground truth annotations in it. Fig. 6 shows a close-up 400x image of an
area in WSI segment test, where the whole foreground is labeled as blood, with
the corresponding predictions by all six models. A visual comparison of an area
in WSI segment test with multiple tissue classes is presented at a lower
magnification in Fig. 7a. Predictions of the corresponding area made by the
models with the lowest and highest accuracy are compared in Fig. 7b and 7c.</p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and limitations</title>
      <p>The most accurate model is the SSL-based model TRI-C-SSL-AU, which
improved the accuracy by 1.38% compared to the model from a purely supervised
approach, TRI-SL-AF. Through a comparison of the predictions of
TRI-C-SSL-AU with the other models, it also appears superior with regard to segmentation,
being the model with the fewest faulty predictions in the annotated regions in Fig.
7. In addition, the prediction map in Fig. 7c, produced with model TRI-C-SSL-AU,
appears to have less noise when compared to the others for WSI segment test.
Comparing the results in Table 5 with the different predictions in Fig. 6,
it would be reasonable to assume that the model with the highest F1-Score
for blood, TRI-SL-AU, would produce the most accurate prediction.
TRI-SL-AU is trained through a traditional supervised approach on dataset Dgt{train},
which contains a relatively large amount of urothelium tiles, and achieves the
second-highest F1-Score for urothelium. This is, however, quite the opposite of the
situation, as it is the model that predicted the most urothelium tiles in the blood
area in Fig. 6. This is most likely an outcome with several underlying factors:
the labeled training set Dgt{train} is quite small, with an even smaller test set
Dgt{test}. It is also possible that the area in Fig. 6 contains features not present
in the ground truth dataset.</p>
      <p>Each WSI will typically produce hundreds of thousands of tiles, posing a
challenge when selecting tiles through a probability-based self-training method.
A large number of tiles will have a high probability if the specific class is trained
with many labels in the original model, i.e., more features have been learned for
that class. To counter this, a minimum tiles-per-patient threshold was set to
discard WSIs containing a small number of tiles, as these tiles are most likely misclassified.
This does, however, not prevent over-representation of the top-left portion of the
WSIs, which will occur when a WSI contains large amounts of sufficient tiles of
a certain class. One might also argue that the model will not learn many
new features from tiles it is already 100% certain about, and that the method
becomes more of an alternative to augmentation.</p>
      <p>By using the cluster-based approach, it is safer to include tiles of lower
probability, as it is safe to assume that tiles close to each other are more likely to hold
the same label. Also, the method ensures that tiles are distributed more evenly
across the WSIs in comparison to the probability-based self-training method.
This can also be seen as augmentation, and an unfrozen VGG16 model yields a
significant improvement when comparing the two cluster-based models
TRI-C-SSL-F and TRI-C-SSL-AU, where accuracy increases from 95.12% to 95.99%,
respectively. The opposite effect is observed for augmenting and unfreezing with
the probability-based models, decreasing the accuracy from 95.19% for
TRI-P-SSL-F to 94.85% for TRI-P-SSL-AU. The SSL models without augmentation,
TRI-P-SSL-F and TRI-C-SSL-F, performed roughly equally with regard to
classification; however, TRI-C-SSL-F performs best with regard to segmentation.</p>
      <p>As the models are fed three levels of magnification, where the ground truth
marking is based on the 400x magnification, the corresponding 100x and 25x
images contain very little of the same tissue type in some cases. This causes
problems for the models, especially if the 100x and 25x images are both of a
different tissue class than the 400x image. An example of this is how several tiles
with ground truth label blood are predicted as background in Fig. 6, as this area
is rather isolated from nearby tissue.</p>
      <p>A limiting factor of this study is the small number of muscle and stroma tiles compared
to the other classes in the ground truth dataset. Augmentation techniques were
implemented to try to mitigate this issue, but still, the accuracy for muscle is
not as high as for the other classes.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future work</title>
      <p>The supervised model, TRI-SL-AF, trained only on the ground truth dataset,
Dgt{train}, achieved an accuracy of 94.61%, with 2x augmentation of the two
classes with the lowest representation. By including the cluster-weak dataset,
Dcw, the model TRI-C-SSL-AU improved the accuracy by 1.38%. Furthermore,
the F1-Score stayed the same or increased for every single class, and a distinct
improvement is seen when comparing the prediction maps in Fig. 6 and 7.</p>
      <p>The probability-based model TRI-P-SSL-AU saw a significant improvement
in classifying urothelium, with an increase of 1.44% in F1-Score, from an initial
98.08%. The accuracy was, however, only increased by 0.24%, as the model had
a large reduction in F1-Score for blood.</p>
      <p>The two different semi-supervised methods tested both outperformed the
supervised methods with regard to classification and segmentation. This shows
that the combination of clusters and probability is better than probability alone.
The lack of labeled data makes both methods well suited to increase the
training data; however, our experiments conclude that no augmentation and frozen
VGG16 weights are preferable to augmentation and unfrozen weights in a
purely probability-based approach.</p>
      <p>For the probability-based self-training method, a better distribution of tiles in
the WSI is needed for the method to be improved. This can be achieved by
implementing linear spacing between all tiles with a sufficient probability score per
WSI. For the cluster-based self-training method, several things can be considered
for future work: a) implementing a random selection of clusters with sufficient
average probability, b) selecting clusters more evenly spaced, or c) increasing the
criteria for the stroma and muscle tissue classes. Implementing mixup [27] to generate
more training data for under-represented classes could be a viable method for
improving segmentation capabilities with regard to tiles of several tissue types.</p>
      <p>A viable segmentation method for histological images can assist pathologists
by enabling faster evaluation, as pre-segmented images can immediately point out
regions of interest. In addition, the system could contribute to computer-aided
diagnosis systems, which can improve the rate of grading and staging of cancer
and result in a more uniform and objective diagnosis.</p>
      <p>(a) Region location in WSI segment test.
(b) TRI-SL-AF.
(c) TRI-SL-AU-F.
(d) TRI-P-SSL-F.
(e) TRI-P-SSL-AU.
(f) TRI-C-SSL-F.
(g) TRI-C-SSL-AU.</p>
      <p>Fig. 6: Predictions for a region in WSI segment test with ground truth label
blood. Color specifies predicted tile class: Blue = Urothelium tissue, Red =
Blood cells, Black = Background.
(a) Ground truth annotations. Colours represent ground truth annotated areas: Green
= Blood, Black = Urothelium, Cyan = Damaged.</p>
      <p>(b) TRI-SL-AF.</p>
      <p>(c) TRI-C-SSL-AU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>Table 3: Tile criteria for cluster-weak dataset Dcw</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Ba = Background tiles, Bl = Blood tiles, Da = Damaged tissue tiles, Mu = Muscle tissue tiles, St = Stroma tissue tiles, Ur = Urothelium tiles</article-title>
          .
          <source>Criteria Ba Bl Da Mu St Ur Min. tiles per WSI 50 20 50 20 50 50 Max. tiles per WSI 20 000 20 000</source>
          <volume>798 4 815 1 440 1</volume>
          <fpage>235</fpage>
          Max. clusters
          <source>per WSI 100 100 100 100 100 100 Min. cluster size 50 20 50 20 50 50 Max. tiles per cluster 20 500 20 500 20 500 20 500 20 500 20 500 Min. avg. cluster probability 60% 60% 60% 60% 60% 60%</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Kreftregisteret. BLÆREKREFT</surname>
          </string-name>
          . URL https://www.kreftregisteret.no/ Temasider/kreftformer/blarekreft/.
          <source>Last accessed 29.04</source>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>The</given-names>
            <surname>Global Cancer Observatory</surname>
          </string-name>
          ,
          <source>World Health Organization. Bladder, source: Globocan</source>
          <year>2018</year>
          . URL https://gco.iarc.fr/today/data/factsheets/ cancers/30-
          <string-name>
            <surname>Bladder-</surname>
          </string-name>
          fact-sheet.
          <source>pdf. Last accessed 15.04</source>
          .
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>[3] Cancer Registry of Norway. Cancer in norway 2005</article-title>
          .
          <article-title>Cancer incidence, mortality, survival and</article-title>
          prevalence in Norway, page
          <volume>18</volume>
          ,
          <year>2006</year>
          . ISSN 0332-
          <fpage>9631</fpage>
          . URL https://www.kreftregisteret.no/globalassets/ publikasjoner-og
          <article-title>-rapporter/cin2005 del1 web</article-title>
          .pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[4] Cancer Registry of Norway. <article-title>Cancer in Norway 2010: Cancer incidence, mortality, survival and prevalence in Norway</article-title>, page <fpage>26</fpage>, <year>2012</year>. ISSN 0332-9631. URL https://www.kreftregisteret.no/globalassets/cin/2010.pdf.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[5] Cancer Registry of Norway. <article-title>Cancer in Norway 2015: Cancer incidence, mortality, survival and prevalence in Norway</article-title>, page <fpage>28</fpage>, <year>2016</year>. ISSN 0332-9631. URL https://www.kreftregisteret.no/globalassets/cancer-in-norway/2015/cin-2015.pdf.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[6] Cancer Registry of Norway. <article-title>Cancer in Norway 2018: Cancer incidence, mortality, survival and prevalence in Norway</article-title>, page <fpage>20</fpage>, <year>2019</year>. ISSN 0806-3621. URL https://www.kreftregisteret.no/globalassets/cancer-in-norway/2018/cin2018.pdf.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[7] <string-name><given-names>A.</given-names> <surname>Anastasiadis</surname></string-name> and <string-name><given-names>T. M.</given-names> <surname>de Reijke</surname></string-name>. <article-title>Best practice in the treatment of nonmuscle invasive bladder cancer</article-title>. <source>Therapeutic Advances in Urology</source>, <volume>4</volume>:<fpage>13</fpage>–<lpage>32</lpage>, <year>2012</year>. https://doi.org/10.1177/1756287211431976.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[8] Stavanger Aftenblad. <article-title>Pasienter må vente åtte uker på prøvesvar</article-title> [Patients must wait eight weeks for test results]. <year>2020</year>. URL https://www.aftenbladet.no/lokalt/i/Wk332/pasienter-ma-vente-atte-uker-pa-prvesvar.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[9] <string-name><given-names>G.</given-names> <surname>Litjens</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Kooi</surname></string-name>, <string-name><given-names>B. E.</given-names> <surname>Bejnordi</surname></string-name> et al. <article-title>A survey on deep learning in medical image analysis</article-title>. <source>Medical Image Analysis</source>, <volume>42</volume>:<fpage>60</fpage>–<lpage>88</lpage>, <year>2017</year>. https://doi.org/10.1016/j.media.2017.07.005.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[10] <string-name><given-names>Z.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>L.</given-names> <surname>He</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Suzuki</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Nakamura</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Itoh</surname></string-name>. <article-title>Survey on neural networks used for medical image processing</article-title>. <source>International Journal of Computational Science</source>, <volume>3</volume>:<fpage>86</fpage>–<lpage>100</lpage>, <year>2009</year>. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4699299/.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[11] <string-name><given-names>C.</given-names> <surname>Tao</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Pan</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zou</surname></string-name>. <article-title>Unsupervised spectral–spatial feature learning with stacked sparse autoencoder for hyperspectral imagery classification</article-title>. <source>IEEE Geoscience and Remote Sensing Letters</source>, <volume>12</volume>(<issue>12</issue>):<fpage>2438</fpage>–<lpage>2442</lpage>, <year>2015</year>. https://doi.org/10.1109/LGRS.2015.2482520.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[12] <string-name><given-names>S. J.</given-names> <surname>Pan</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Yang</surname></string-name>. <article-title>A survey on transfer learning</article-title>. <source>IEEE Transactions on Knowledge and Data Engineering</source>, <volume>22</volume>(<issue>10</issue>):<fpage>1345</fpage>–<lpage>1359</lpage>, <year>2010</year>. https://doi.org/10.1109/TKDE.2009.191.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[13] <string-name><given-names>Y.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Lee</surname></string-name> et al. <article-title>Semi-supervised discriminative classification with application to tumorous tissues segmentation of MR</article-title></mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>