=Paper= {{Paper |id=Vol-3181/paper12 |storemode=property |title=Unsupervised Image Segmentation via Self-Supervised Learning Image Classification |pdfUrl=https://ceur-ws.org/Vol-3181/paper12.pdf |volume=Vol-3181 |authors=Andrea Storås |dblpUrl=https://dblp.org/rec/conf/mediaeval/Storas21 }} ==Unsupervised Image Segmentation via Self-Supervised Learning Image Classification== https://ceur-ws.org/Vol-3181/paper12.pdf
Unsupervised Image Segmentation via Self-Supervised Learning
                    Image Classification
                                                                      Andrea M. Storås1
                                                                       1 SimulaMet, Norway

                                                                         andrea@simula.no
ABSTRACT
This paper presents the submission of team Medical-XAI for the
Medico: Transparency in Medical Image Segmentation task held at
MediaEval 2021. We propose an unsupervised method that utilizes
tools from the field of explainable artificial intelligence to create
segmentation masks. We extract heat maps, which are useful in
order to explain how the ‘black box’ model predicts the category of
a certain image, and the segmentation masks are directly derived
from the heat maps. Our results show that the created masks can
capture the relevant findings to a certain extent using only a small
amount of image-level labeled data for the classification model and                  Figure 1: Overview of the complete pipeline of the pro-
no segmentation masks at all for the training. This is promising for                 posed solution. We start with a small number of labeled
addressing different challenges within the intersection of artificial                and large number of unlabeled images, which are clustered
intelligence for medicine such as availability of data, cost of labeling             using global image features. The clusters are used to label
and interpretable and explainable results.                                           unlabeled images. This is repeated until we have a sufficient
                                                                                     number of labeled images for training a deep neural network.
1    INTRODUCTION                                                                    From the resulting model, we extract the Grad-CAM repre-
                                                                                     sentation from the layers and use a threshold to obtain the
Medical image segmentation is one of the focus areas for researchers
                                                                                     segmentation mask.
working on artificial intelligence (AI) and medicine. Especially since
the release of U-Net [9], the field has somehow exploded, leading                    are clustered together with a high number of unlabeled images from
to a myriad of publications of different segmentation approaches                     the HyperKvasir data set [1]. Unlabeled images that fall into the
within different medical specialisations. One of the areas that get                  cluster with the highest number of labeled images get the same
most of the attention is the segmentation of polyps in the colon.                    label as the labeled images in the cluster. The process is repeated
An important motivation is that colon cancer is one of the most                      until a sufficient number of labeled images is reached. The k-means
prominent cancers worldwide and early detection by finding polyps                    algorithm from scikit-learn [6] is used for clustering, and different
is an efficient method to reduce mortality. The MediaEval Medico                     numbers of clusters are tested. The k-means algorithm is always
challenge 2021 [2] is using this important medical challenge as a                    initialized with init = ‘k-means++’, random_state = 0, max_iter =
task in addition to adding an extra challenge by asking participants                 300 and algorithm = ‘auto’. The rest of the hyperparameters are set
to provide as transparent solutions as possible. Transparency within                 to default values. An EfficientNet-b1 classifier [4] implemented in
AI is a rather new concept and has several sub-parts that contribute                 Pytorch [5] is trained on the labeled images to predict the correct
to it ranging from open data, over explainable and interpretable                     category.
results to open source code. Our solution for solving this year’s task                  The Adam optimizer and the BCEWithLogitsLoss from PyTorch
is going a step further by also addressing the problem of dealing                    with default hyperparameter settings are applied during model
with a low amount of labeled segmentation data because obtaining                     training. The classifier is evaluated on images with known labels
accurate labels for the medical data is often difficult due to the avail-            to measure model performance as well as evaluating the clustering
ability of medical experts. In the following, we provide a detailed                  technique for labeling images. Grad-CAM [10] is applied to create
description of our approach, followed by experimental results and                    heat maps highlighting which pixels the classifier focuses on during
a detailed discussion of the advantages and disadvantages of our                     classification. Since the model is trained to detect polyps, we expect
method.                                                                              the heat maps to highlight these as segments. Segmentation masks
                                                                                     are constructed from the heat maps. Several thresholds are tested
2    APPROACH                                                                        and the most promising values are selected based on visual inspec-
Our method consists of several steps, as illustrated in Figure 1. First,             tion. All created models and source code can be found publicly
global features are extracted from the images, and clustering is                     online1 .
applied for labeling unlabeled medical images. A few labeled images                     For each task we submitted five different runs with different
Copyright 2021 for this paper by its authors. Use permitted under Creative Commons   configurations of our system. Run 1 applied the development data
License Attribution 4.0 International (CC BY 4.0).                                   set provided in the challenge. In order to get images without polyps,
MediaEval’21, December 13-15 2021, Online
                                                                                     1 https://github.com/kelkalot/Medico-2021-Team-Medical-XAI
MediaEval’21, December 13-15 2021, Online                                                                                            A. Storås

                                                                               Table 1: Results for Subtask 1: polyp segmentation.

                                                                          Run     Accuracy    Jaccard   Dice      F1        Recall    Precision
                                                                          #1      0.6762      0.1115    0.1812    0.1812    0.3971    0.1465
                                                                          #2      0.6009      0.0991    0.1696    0.1696    0.4105    0.1316
       (a)              (b)               (c)              (d)            #3      0.6018      0.1075    0.1816    0.1816    0.4497    0.1391
                                                                          #4      0.6402      0.1175    0.1861    0.1861    0.3728    0.1505
                                                                          #5      0.5214      0.1388    0.2211    0.2211    0.6337    0.1572


                                                                         Table 2: Results for Subtask 2: inference speed. Abbreviations:
                                                                         Av: average, Mi: minimum, Ma: maximum.
       (e)              (f)               (g)              (h)

                                                                          Run     Av-time    Mi-time    Ma-time    Av-fps    Mi-fps     Ma-fps
Figure 2: (a, e) Original images, (b,f) ground truth segmenta-
tion masks, (c, g) heat maps and (d, h) generated segmentation            #1      0.11       0.09       0.12       9.34      10.72      8.05
masks for Run 1 and Run 3.                                                #2      0.11       0.10       0.13       8.98      9.97       7.75
                                                                          #3      0.11       0.10       0.13       8.95      10.10      7.94
                                                                          #4      0.09       0.09       0.09       11.00     11.19      10.54
                                                                          #5      0.09       0.09       0.09       10.97     11.14      10.71
each image was split into 4 tiles. The corresponding segmentation
masks were used to label 100 tiles as ‘polyps’ and 100 as ‘non-              In terms of how fast the inference is performed, we actually
polyps’. The rest of the tiles were unlabeled. For Runs 2 - 5 we         observe that the model based on the larger number of clusters is
used 100 images labeled as ‘polyps’ and 100 images labeled as            faster than the others. The reason for this is not clear, although
‘non-polyps’ from the Kvasir data set [7]. Unlabeled images from         it seems that higher performance is connected to faster inference
the HyperKvasir data set [1] were included for labeling using the        time. This needs to be investigated more in future work.
clustering technique described above.
   For all runs, the following global features were extracted using      4     DISCUSSION AND CONCLUSION
the LIRE [3, 8] library: EdgeHistogram, Tamura, LuminanceLayout          To cluster the medical images, we used global features, which are
and SimpleColorHistogram. The internal evaluations of the classi-        easy to interpret and increase the transparency of our system. The
fiers were performed on 60 images not used for training. All models      heat maps are useful in order to explain how the ‘black box’ model
were trained for 25 epochs on 1, 000 labeled images for a fair com-      predicts the category of a certain image, and the segmentation
parison. Heat maps were generated by passing the images through          masks are directly derived from the heat maps. We believe the level
the final model. We explored extracting heat maps from several           of transparency of our system is quite high. The overall performance
layers in order to identify the most appropriate model. The different    of the approach is for sure on the lower scale, but taking into
configurations are the following: Run 1: 50 clusters, layer 14; Run      account that it is completely unsupervised segmentation, it can
2: 200 clusters, layer 13; Run 3: 200 clusters, layer 14; Run 4: 250     still be considered as good. Overall the presented method seems
clusters; layer 20; Run 5: 250 clusters, layer 22. ‘Layer’ corresponds   promising for medical applications and opens up several directions
to the layer in the model that the heat maps were extracted from.        of future work.
The segmentation masks were constructed from the heat maps. The
thresholds were set as > 0.4 and < 0.7 for Runs 1 - 3, while the         5     FUTURE WORK
threshold was > 0.28 for Runs 4 and 5.                                   1, 000 images were used to train the classifiers. For future work, we
                                                                         will test how the performance changes with an increasing number
                                                                         of images. Moreover, we want to explore other global features for
3    RESULTS AND ANALYSIS                                                clustering the images as well as deep features. Other clustering
Segmentation masks for two of the runs are illustrated in Figures 2a     algorithms should also be tested. By connecting the knowledge we
- 2h. Table 1 shows the results for Subtask 1, which is polyp seg-       have about the global features with the deep features, we might get
mentation. Overall, we observe that our method does not reach            an interpretation about what features (color, texture) are important
perfect scores, but taking into account that the segmentation is         for a certain disease. We will also look into other techniques for
performed unsupervised, it still achieves acceptable results. We also    generating segmentation masks from heat maps. We plan to test the
observe that the number of clusters is connected to how good the         system on other types of medical data sets, including non-image
performance is. This is most probably due to the fact that a higher      data.
number of clusters leads to more specialized clusters, which then
have a positive effect on the performance of the following model.        6     ACKNOWLEDGEMENTS
However, the choice of layer for the heat maps and thresholds for        I thank Michael A. Riegler for the assistance with experiments,
the segmentations could also affect the performance.                     methodology and writing.
Medico: Transparency in Medical Image Segmentation                               MediaEval’21, December 13-15 2021, Online


REFERENCES
 [1] Hanna Borgli, Vajira Thambawita, Pia H Smedsrud, Steven Hicks,
     Debesh Jha, Eskeland Sigrun L, Kristin Ranheim Rand l, Konstantin
     Pogorelov, Mathias Lux, Duc Tien Dang Nguyen, Dag Johansen,
     Carsten Griwodz, Stensland Håkon K, Enrique Garcia-Ceja, Peter T
     Schmidt, Hugo L Hammer, Michael A Riegler, Pål Halvorsen, and
     Thomas de Lange. 2020. HyperKvasir, a comprehensive multi-class
     image and video dataset for gastrointestinal endoscopy. Scientific Data
     7, 1 (2020), 283. https://doi.org/10.1038/s41597-020-00622-y
 [2] Steven Hicks, Debesh Jha, Vajira Thambawita, Hugo Hammer, Thomas
     de Lange, Sravanthi Parasa, Michael Riegler, and Pål Halvorsen. 2021.
     Medico Multimedia Task at MediaEval 2021: Transparency in Medical
     Image Segmentation. In Proceedings of MediaEval 2021 CEUR Work-
     shop.
 [3] Mathias Lux, Michael Riegler, Pål Halvorsen, Konstantin Pogorelov,
     and Nektarios Anagnostopoulos. 2016. Lire: open source visual infor-
     mation retrieval. In Proceedings of the 7th International Conference on
     Multimedia Systems. 1–4.
 [4] Luke Melas-Kyriazi. 2019. EfficientNet-Pytorch. https://github.com/
     lukemelas/EfficientNet-PyTorch. (2019).
 [5] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James
     Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia
     Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward
     Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chil-
     amkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala.
     2019. PyTorch: An Imperative Style, High-Performance Deep Learn-
     ing Library. In Advances in Neural Information Processing Systems 32,
     H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and
     R. Garnett (Eds.). Curran Associates, Inc., 8024–8035.
 [6] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O.
     Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas,
     A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay.
     2011. Scikit-learn: Machine Learning in Python. Journal of Machine
     Learning Research 12 (2011), 2825–2830.
 [7] Konstantin Pogorelov, Kristin Ranheim Randel, Carsten Griwodz,
     Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Con-
     cetto Spampinato, Duc-Tien Dang-Nguyen, Mathias Lux, Peter The-
     lin Schmidt, Michael Riegler, and Pål Halvorsen. 2017. KVASIR: A
     Multi-Class Image Dataset for Computer Aided Gastrointestinal Dis-
     ease Detection. In Proceedings of the 8th ACM on Multimedia Sys-
     tems Conference (MMSys’17). ACM, New York, NY, USA, 164–169.
     https://doi.org/10.1145/3083187.3083212
 [8] Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler.
     2014. How ’how’ reflects what’s what: content-based exploitation
     of how users frame social images. In Proceedings of the 22nd ACM
     international conference on Multimedia. 397–406.
 [9] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net:
     Convolutional networks for biomedical image segmentation. In Inter-
     national Conference on Medical image computing and computer-assisted
     intervention. Springer, 234–241.
[10] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakr-
     ishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-cam: Visual
     explanations from deep networks via gradient-based localization. In
     Proceedings of the IEEE international conference on computer vision.
     618–626.