Mapping Slums with Deep Learning Feature Extraction
Agatha Mattos, Michela Bertolotto and Gavin McArdle
School of Computer Science, University College Dublin, Ireland


Abstract
Many real-world problems present challenges that have not yet been solved by the machine learning community, despite the wide availability of satellite imagery and recent advances in computer vision. In particular, techniques that are cheaper and less reliant on large data sets are needed to map slums in cities. This study presents preliminary results using deep learning feature extraction followed by k-means clustering, an unsupervised method, to detect slums in Sentinel-2 satellite imagery. The clusters that represent deprived areas in cities are identified using a data set containing information about the topology of urban areas, derived from crowd-sourced digital maps. Overall, the unsupervised method performed worse than the baseline, a fine-tuned ResNet18 model (a supervised approach). The mean Intersection over Union (IoU) for the two investigated locations (Mumbai and Cape Town) was 0.46 and 0.51 for the supervised model, and 0.27 and 0.31 for the unsupervised model. The results suggest that other strategies for dealing with such imbalanced data sets need to be investigated to improve the results obtained for the slum class, as do strategies to automatically identify the clusters that represent deprived areas/slums. The code used in this paper is available at: https://github.com/ml-labs-crt/slums-unsupervised.

Keywords
Deep Learning Feature Extraction, Slums, Deprived Areas, Machine Learning, Earth Observation


CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria
agatha.hennigendemattos@ucdconnect.ie (A. Mattos); michela.bertolotto@ucd.ie (M. Bertolotto); gavin.mcardle@ucd.ie (G. McArdle)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


1. Introduction

The last decade saw a surge in the availability of satellite imagery and in the development of image processing techniques. With this increase, it was expected that more societal challenges would be solved using remote sensing data and machine learning. However, many important societal problems have not yet fully benefited from the greater availability of imagery or from current developments in computer vision. Many factors contribute to this situation, especially the high cost of acquiring and processing very-high-resolution satellite imagery [1, 2] and the lack of the labelled data required to train supervised machine-learning models for many societal problems [3].

This work investigates the potential of employing freely available medium-resolution satellite imagery and deep learning feature extraction, an unsupervised approach that does not require labelled data, to detect deprived/slum areas in two cities (Mumbai and Cape Town). According to UN-Habitat, slums are locations where residents lack at least one of the following: water, sanitation, housing durability, security of tenure or sufficient living area [4]. UN-Habitat estimates that over one billion people live in such conditions, but because most of the information about these settlements comes from outdated census surveys [5], there is interest in exploring other forms of data collection and processing that could provide current estimates [5, 1, 3]. The next section outlines the literature pertinent to slum mapping and further details the motivations for this paper.

1.1. Related Work

Since 2012, deep learning architectures have become increasingly popular, and they have been shown to perform well in many classification tasks. In line with this trend, research on slum mapping has moved from traditional image processing approaches to supervised deep learning methods applied to high or very-high-resolution imagery [6]. In 2017, Mboga et al. [7] and Persello and Stein [8] demonstrated that convolutional neural networks outperformed traditional feature extraction methods, and since then many works employing neural networks to map slums have been published.

However, the great majority of studies to date rely on supervised learning and costly high or very-high-resolution satellite imagery [6], and hence consider only small areas [5, 2]. Additionally, many researchers have found that models developed for one city do not generalise well to other areas [9, 10, 1]. For a global slum inventory to be possible, these issues need to be tackled, and unsupervised learning may be a suitable alternative.

Nonetheless, the literature on mapping slums with unsupervised learning techniques is limited. To the best of our knowledge, [11] and [12] are currently the most representative works, though both have limitations: Block et al. [11] employ high-resolution imagery, and St. Amand [12] relies heavily on visual inspection for decision making. This paper presents our initial results in developing a pipeline to map slums using freely available medium-resolution satellite imagery, unsupervised learning and automated classification of slum clusters using topological information derived from crowd-sourced digital maps. The next section describes the methodology used in this study.

2. Methodology

Two locations were used to investigate the potential of deep learning feature extraction followed by clustering: Mumbai, in India, and Cape Town, in South Africa. The satellite imagery was collected by Gram-Hansen et al. [1] and consists of Sentinel-2 images with ten-metre resolution. These cities have been investigated by other researchers and are therefore well suited to testing the proposed unsupervised method. As in the experiments of Block et al. [11], three bands were used (blue, green and red), and the imagery was scaled from 16-bit to 8-bit. Figure 1 shows the satellite imagery of the two locations.

Figure 1: Left: Sentinel-2 image of Mumbai, India, at ten-metre resolution. The yellow polygons were annotated as slum areas and were used as ground truth in the supervised model. Right: Similar to the image on the left, but of Cape Town, South Africa.

The imagery was split into tiles of 20 by 20 pixels (approximately 200 × 200 metres), slightly larger than those used by Taubenböck et al. [13], who also adopted medium-resolution imagery in their research. The baseline model against which the unsupervised approach was compared was a supervised model trained with ground-truth data collected by Gram-Hansen et al. [1]. A tile was considered to belong to a class only if at least 50% of its pixels belonged to that class.

As expected, the resulting data set is highly imbalanced: only 3% of the tiles are slums in Mumbai and less than 1% in Cape Town. Table 1 presents an analysis of the areas covered in this paper.

Table 1
Analysis of the Areas

              Height     Width      Non-slum   Slum      Non-slum      Slum
Location      (pixels)   (pixels)   (tiles)    (tiles)   (% of tiles)  (% of tiles)
Mumbai        3920       1980       18831      573       97.0%         3.0%
Cape Town     6080       5300       79799      761       99.1%         0.9%

As suggested by other researchers [14], and to mimic a real-world scenario, only 20% of the slum tiles were used to train the model. In addition, the non-slum class was undersampled at a ratio of 4 non-slum tiles to 1 slum tile, in an attempt to account for the imbalance in the data set. The remaining 80% of the slum tiles and the remaining non-slum tiles were used to test the model. As a result, the baseline was trained with 399 tiles (80 slums and 319 non-slums) in the case of Mumbai, and with 532 tiles (106 slums and 426 non-slums) for Cape Town. The code used in this paper is available at https://github.com/ml-labs-crt/slums-unsupervised.

2.1. Baseline

The model adopted as the baseline was a ResNet18 initially trained on ImageNet and then fine-tuned. The choice of supervised model followed from the results obtained by Bell and Veeraraghavan [15], who tested ResNet models of different sizes. Both the supervised model (baseline) and the unsupervised model were implemented in PyTorch 1.10.2. The supervised model was trained with a batch size of 8 for 50 epochs. Early stopping was triggered when the average validation loss was 20% higher than the average over the last 10 epochs. The results were evaluated using Intersection over Union (IoU), as is common in the related literature.

2.2. Unsupervised Approach

For the unsupervised model, features were extracted using a ResNet18 model pre-trained on the ImageNet data set. Care was taken to use exactly the same tiles to train both models. The extracted features for each tile (a vector with 1000 elements) were then clustered with k-means, using scikit-learn's default initialisation and 100 repetitions; the number of repetitions followed Fränti and Sieranoja [16]. Seventeen clusters were used, a number selected based on the work of Taubenböck et al. [17], who analysed satellite imagery of 110 cities worldwide using the Local Climate Zones classification scheme, which has seventeen climate zones.

Lastly, to decide which clusters should be labelled as slums and which as non-slums, the complexity score designed by Soman et al. [18] was leveraged. Figure 2 shows the complexity score for the two areas investigated in this paper. The complexity score ranged between 0 and 20 for Mumbai and between 0 and 18 for Cape Town; lower scores denote less developed areas. Because the complexity score is derived from information available on OpenStreetMap, some locations within each city do not have one: in Mumbai, 41% of all pixels had no complexity score (mostly areas covered by water bodies), and in Cape Town this was the case for 63% of the pixels. The median complexity score of each cluster was calculated from the average complexity score of the pixels in each tile. Clusters with the lowest median complexity scores were then designated “slum clusters” (see Table 2 for details of each cluster). In the next section, the results are discussed.

Figure 2: Left: Complexity scores for Mumbai, India. Darker polygons denote smaller complexity scores. In the background, Sentinel-2 satellite imagery of the same city. Right: Similar to the image on the left, but of Cape Town, South Africa.
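The clustering step at the start of Section 2.2 fits k-means with 17 clusters and 100 repetitions to the 1000-element feature vector of each tile. The sketch below is a plain NumPy re-implementation of Lloyd's algorithm with random restarts, written only to make the restart logic visible; it is not the authors' code. Unlike scikit-learn's default (k-means++), it draws the initial centroids uniformly from the data, and the function names, `max_iter` and `seed` are assumptions. With scikit-learn, the roughly comparable call would be `KMeans(n_clusters=17, n_init=100)`.

```python
import numpy as np


def _nearest_centre(features, centres):
    """Index of the closest centre for every row, by squared Euclidean distance.

    Uses ||x||^2 - 2 x.c + ||c||^2 so no (n_tiles, k, dim) array is built.
    """
    d2 = (
        (features ** 2).sum(axis=1)[:, None]
        - 2.0 * features @ centres.T
        + (centres ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


def kmeans_with_restarts(features, n_clusters=17, n_restarts=100, max_iter=300, seed=0):
    """Lloyd's k-means repeated n_restarts times; returns the lowest-inertia run.

    Sketch only: initial centroids are drawn uniformly from the rows of
    `features`, whereas scikit-learn's KMeans defaults to k-means++.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    best = (None, None, np.inf)  # (labels, centres, inertia)
    for _ in range(n_restarts):
        start = rng.choice(len(features), size=n_clusters, replace=False)
        centres = features[start].copy()
        for _ in range(max_iter):
            labels = _nearest_centre(features, centres)
            new_centres = centres.copy()
            for k in range(n_clusters):
                members = features[labels == k]
                if len(members):  # an empty cluster keeps its previous centre
                    new_centres[k] = members.mean(axis=0)
            converged = np.allclose(new_centres, centres)
            centres = new_centres
            if converged:
                break
        # Final assignment consistent with the final centres of this restart.
        labels = _nearest_centre(features, centres)
        inertia = float(((features - centres[labels]) ** 2).sum())
        if inertia < best[2]:
            best = (labels, centres, inertia)
    return best
```

In the setting described above, `features` would be an array of shape (n_tiles, 1000) and the returned `labels` would give the cluster ID of each tile. Keeping the restart with the lowest inertia (sum of squared distances to the assigned centres) is the usual way repeated runs are combined.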
Table 2
Number of Tiles in Each Cluster and Median Complexity of Each Cluster. “Slum clusters” are the clusters whose median complexity is below the location's median cluster complexity (3.49 for Mumbai and 3.0 for Cape Town).

          Mumbai                                           Cape Town
Cluster   Non-slum   Slum     Tiles per     Median         Non-slum   Slum     Tiles per     Median
ID        (tiles)    (tiles)  cluster (%)   complexity     (tiles)    (tiles)  cluster (%)   complexity
0         1812       37       9.8%          3.01           1452       35       1.9%          2.87
1         698        18       3.8%          3.00           6553       72       8.3%          3.00
2         1252       25       6.8%          3.89           3760       2        4.7%          3.00
3         272        5        1.5%          2.95           1413       9        1.8%          2.21
4         1708       32       9.2%          3.49           5853       4        7.3%          3.00
5         1887       62       10.3%         3.00           5147       56       6.5%          2.77
6         6          0        0.0%          3.75           4588       6        5.8%          3.00
7         1461       69       8.1%          3.00           4276       42       5.4%          2.58
8         1922       28       10.4%         3.61           8238       54       10.4%         3.00
9         1038       21       5.6%          3.00           7036       26       8.8%          3.00
10        1951       22       10.5%         3.73           3121       37       4.0%          3.00
11        18         3        0.1%          3.77           6944       37       8.7%          2.91
12        1133       35       6.2%          4.00           5088       73       6.5%          2.49
13        1102       35       6.0%          3.60           3006       5        3.8%          3.00
14        556        9        3.0%          3.79           5808       32       7.3%          3.00
15        829        22       4.5%          3.07           5167       106      6.6%          2.16
16        730        36       4.1%          3.02           1741       13       2.2%          2.74
Total     18375      459      100%                         79191      609      100%
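The tiling and train/test split described in Section 2 can be sketched as follows. This is a hypothetical illustration, not the repository's code: `tile_grid` counts the non-overlapping 20 × 20-pixel tiles in an image, and `split_tiles` draws 20% of the slum tiles plus four times as many non-slum tiles for training, holding out everything else. Rounding the 20% down is an assumption. Under that assumption, the held-out counts for Mumbai (459 slum and 18,375 non-slum tiles) and Cape Town (609 and 79,191) coincide with the totals in Table 2.

```python
import random


def tile_grid(height, width, tile=20):
    """Number of non-overlapping tile x tile windows that fit in an image."""
    return (height // tile) * (width // tile)


def split_tiles(slum_ids, non_slum_ids, train_frac=0.2, ratio=4, seed=0):
    """Hypothetical split: train_frac of slum tiles plus ratio x as many non-slum tiles.

    Tile IDs are assumed to be unique across both classes. Every tile not
    drawn for training is held out for testing.
    """
    rng = random.Random(seed)
    n_slum_train = int(len(slum_ids) * train_frac)  # assumption: round down
    slum_train = rng.sample(slum_ids, n_slum_train)
    non_slum_train = rng.sample(non_slum_ids, ratio * n_slum_train)
    train = set(slum_train) | set(non_slum_train)
    test_slum = [t for t in slum_ids if t not in train]
    test_non_slum = [t for t in non_slum_ids if t not in train]
    return slum_train, non_slum_train, test_slum, test_non_slum


# Mumbai, using the image size and slum-tile count from Table 1.
n_tiles = tile_grid(3920, 1980)
slum = list(range(573))
non_slum = list(range(573, n_tiles))
s_tr, n_tr, s_te, n_te = split_tiles(slum, non_slum)
print(n_tiles, len(s_tr), len(n_tr), len(s_te), len(n_te))
# 19404 114 456 459 18375
```

The tile counts in Table 1 are consistent with this grid: 196 × 99 = 19,404 tiles for Mumbai and 304 × 265 = 80,560 for Cape Town.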

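The cluster-selection rule (Sections 2.2 and 3) can be checked against Table 2. The minimal sketch below uses only the per-cluster medians copied from Table 2; reading “below” as strictly less than is an assumption, and `slum_clusters` is a hypothetical helper. It recovers the location thresholds stated in the text (3.49 for Mumbai and 3.0 for Cape Town) and lists the clusters the rule would select.

```python
import statistics

# Median complexity of each cluster (Cluster_ID 0-16), copied from Table 2.
MEDIAN_COMPLEXITY = {
    "Mumbai": [3.01, 3.00, 3.89, 2.95, 3.49, 3.00, 3.75, 3.00, 3.61,
               3.00, 3.73, 3.77, 4.00, 3.60, 3.79, 3.07, 3.02],
    "Cape Town": [2.87, 3.00, 3.00, 2.21, 3.00, 2.77, 3.00, 2.58, 3.00,
                  3.00, 3.00, 2.91, 2.49, 3.00, 3.00, 2.16, 2.74],
}


def slum_clusters(cluster_medians):
    """Location threshold (median of cluster medians) and IDs strictly below it."""
    threshold = statistics.median(cluster_medians)
    selected = [cid for cid, m in enumerate(cluster_medians) if m < threshold]
    return threshold, selected


for city, medians in MEDIAN_COMPLEXITY.items():
    threshold, selected = slum_clusters(medians)
    print(f"{city}: threshold={threshold:.2f}, slum clusters={selected}")
# Mumbai: threshold=3.49, slum clusters=[0, 1, 3, 5, 7, 9, 15, 16]
# Cape Town: threshold=3.00, slum clusters=[0, 3, 5, 7, 11, 12, 15, 16]
```

Because nine of the seventeen Cape Town clusters have a median of exactly 3.00, the strict comparison matters there: a “less than or equal” rule would label every cluster as a slum cluster.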

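Table 3 reports a per-class IoU and the mean over the two classes. A minimal NumPy sketch of these metrics follows, assuming binary labels with 0 for non-slum and 1 for slum; the paper does not state whether IoU was computed over tiles or pixels, so the functions accept any flat label arrays. The Mean IoU column of Table 3 is consistent with an unweighted average of the two class IoUs, e.g. (0.84 + 0.08) / 2 = 0.46 for the supervised Mumbai model.

```python
import numpy as np


def per_class_iou(y_true, y_pred, classes=(0, 1)):
    """IoU = |truth AND prediction| / |truth OR prediction|, computed per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    ious = {}
    for c in classes:
        t, p = y_true == c, y_pred == c
        union = np.logical_or(t, p).sum()
        # A class absent from both arrays has an undefined IoU.
        ious[c] = float(np.logical_and(t, p).sum() / union) if union else float("nan")
    return ious


def mean_iou(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class IoUs, ignoring undefined classes."""
    return float(np.nanmean(list(per_class_iou(y_true, y_pred, classes).values())))


# 0 = non-slum, 1 = slum: non-slum IoU = 2/5, slum IoU = 1/4.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
print(per_class_iou(y_true, y_pred), mean_iou(y_true, y_pred))
```

The toy example shows why a low slum IoU drags the mean down even when non-slum tiles are mostly right: every false-positive slum tile enlarges the union for the slum class.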
3. Results and Discussion

Deep learning feature extraction was carried out for two locations (Mumbai and Cape Town). For Mumbai, the percentage of tiles assigned to each cluster ranged from 0.03% to 10.5%, and for Cape Town from 1.8% to 10.4%. The ground-truth data showed that some clusters held a considerably larger share of the slum tiles than others; for instance, clusters 5 and 7 for Mumbai contained 13.5% and 15.0% of all slum tiles. Similarly, clusters 1, 12 and 15 for Cape Town contained 11.8%, 12.0% and 17.4% of all slum tiles. Table 2 gives the number of tiles assigned to each cluster.

As mentioned in Section 2, the decision of which clusters would be considered “slum clusters” took into account the average complexity of the pixels of each tile in that cluster. Although Soman et al. [18] suggest that areas with a complexity score smaller than 5 or 6 could be considered informal settlements, in the cities covered in this study that threshold would result in all clusters being labelled as slums. For example, for Cape Town the median complexity of all clusters was in the range of 2.16 to 3.0, and for Mumbai it was in the range of 2.95 to 4.0. For this reason, only clusters with a median complexity below the median cluster complexity of each location were considered “slum clusters”. For Mumbai, this meant clusters with a median complexity below 3.49, and for Cape Town, clusters with a median complexity below 3.0 (see Table 2). All tiles in these “slum clusters” were then assigned a slum label and compared with the ground truth to obtain IoU scores that could be compared with the baseline results obtained with the supervised model. Because every cluster contained a non-negligible number of non-slum tiles (see Table 2), the unsupervised learning model performed worse overall than the supervised method. Figure 3 shows a visualisation of the clusters, and Table 3 reports the IoU for each class and each model.

Figure 3: Left: Areas in yellow were predicted as non-slums (unsupervised approach). Areas in red were labelled as slums (ground truth). In the background, satellite imagery of Mumbai. Right: Similar to the image on the left, but of Cape Town.

Table 3
Results of the Binary Classification of Urban Areas into Slum/Non-slum Classes Using Intersection over Union (IoU)

                 Supervised learning                   Unsupervised learning
Location         IoU non-slum  IoU slum  Mean IoU      IoU non-slum  IoU slum  Mean IoU
Mumbai           0.84          0.08      0.46          0.52          0.03      0.27
Cape Town        0.95          0.07      0.51          0.60          0.01      0.31
All locations    0.90          0.08      0.49          0.56          0.02      0.29

Both models had an IoU below 0.10 for the slum class, caused by tiles being classified as slums even though they were not labelled as such in the ground-truth data. These results suggest that undersampling the non-slum areas at a 4 to 1 ratio may not be an appropriate strategy for dealing with the severe imbalance in this problem. Moreover, the use of complexity scores needs further investigation to determine the best strategy for setting the complexity threshold for each location; as employed in this experiment, it did not help identify the less developed/slum clusters. Other parameters of the experiment, such as the tile dimension and the number of clusters, may also need to be reviewed to increase performance.

Nonetheless, the mean IoU of the unsupervised method exceeded that obtained by Gram-Hansen et al. [1] for Cape Town (0.31 versus 0.17), although it was lower for Mumbai (0.27 versus 0.40). The IoU for the slum class, however, was lower than that obtained by Gram-Hansen et al. [1] for both locations. Still, Gram-Hansen et al. [1] used convolutional neural networks and very-high-resolution imagery (30 cm per pixel) in their experiments, which indicates that unsupervised learning and freely available medium-resolution imagery can be promising for this real-world application.

4. Conclusions and Future Work

This experiment presents the initial results of an attempt to use deep learning feature extraction and unsupervised learning to map slums. The results show that the proposed method performed worse than the baseline, a supervised learning approach.

Looking to the future, it would be desirable to investigate strategies to improve the results for the slum class, such as oversampling the slum class to the point of eliminating the imbalance, as suggested in [19], or adopting more sophisticated sampling for the non-slum class. More traditional image processing techniques could also be used to mask out regions that are clearly not urban, such as water and vegetation. These changes would reduce the total number of non-slum tiles and potentially make the problem less imbalanced. Additionally, the use of block complexity derived from crowd-sourced digital maps requires further investigation to determine its usefulness as a tool to identify clusters that represent deprived areas/slums. Performing feature extraction with a deep learning model pre-trained on remote sensing data, as opposed to ImageNet, may also be beneficial. It would also be interesting to compare the deep features extracted from medium-resolution and very-high-resolution imagery of the same location, to confirm that the former can satisfactorily be employed for mapping slums using unsupervised learning. Lastly, to develop a global slum inventory, the analysis developed here could be extended to estimate the population living in the areas identified as deprived/slums.
Acknowledgments

This publication has emanated from research supported in part by a grant from Science Foundation Ireland under Grant number 18/CRT/6183. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

References

[1] B. J. Gram-Hansen, P. Helber, I. Varatharajan, F. Azam, A. Coca-Castro, V. Kopackova, P. Bilinski, Mapping Informal Settlements in Developing Countries using Machine Learning and Low Resolution Multi-spectral Data, in: Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 361–368. doi:10.1145/3306618.3314253.
[2] A. C. H. de Mattos, G. McArdle, M. Bertolotto, Mapping Slums with Medium Resolution Satellite Imagery: A Comparative Analysis of Multi-Spectral Data and Grey-level Co-occurrence Matrix Techniques, 2021. doi:10.48550/arXiv.2106.11395. arXiv:2106.11395.
[3] M. Burke, A. Driscoll, D. Lobell, S. Ermon, Using Satellite Imagery to Understand and Promote Sustainable Development, Working Paper 27879, National Bureau of Economic Research, 2020. doi:10.3386/w27879.
[4] UN-Habitat, Tracking progress towards inclusive, safe, resilient and sustainable cities and human settlements (2018).
[5] R. Mahabir, A. Croitoru, A. T. Crooks, P. Agouris, A. Stefanidis, A Critical Review of High and Very High-Resolution Remote Sensing Approaches for Detecting and Mapping Slums: Trends, Challenges and Emerging Opportunities, Urban Science 2 (2018) 8. doi:10.3390/urbansci2010008.
[6] M. Kuffer, K. Pfeffer, R. Sliuzas, Slums from Space—15 Years of Slum Mapping Using Remote Sensing, Remote Sensing 8 (2016) 455. doi:10.3390/rs8060455.
[7] N. Mboga, C. Persello, J. R. Bergado, A. Stein, Detection of informal settlements from VHR satellite images using convolutional neural networks, in: 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2017, pp. 5169–5172. doi:10.1109/IGARSS.2017.8128166.
[8] C. Persello, A. Stein, Deep Fully Convolutional Networks for the Detection of Informal Settlements in VHR Images, IEEE Geoscience and Remote Sensing Letters 14 (2017) 2325–2329. doi:10.1109/LGRS.2017.2763738.
[9] Y. Gao, Assessing the spatial transferability of fully convolutional networks for slum mapping (2020).
[10] T. Stark, M. Wurm, X. X. Zhu, H. Taubenböck, Satellite-Based Mapping of Urban Poverty With Transfer-Learned Slum Morphologies, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020) 5251–5263. doi:10.1109/JSTARS.2020.3018862.
[11] J. Block, M. Yazdani, M. Nguyen, D. Crawl, M. Jankowska, J. Graham, T. DeFanti, I. Altintas, An Unsupervised Deep Learning Approach for Satellite Image Analysis with Applications in Demographic Analysis, in: 2017 IEEE 13th International Conference on E-Science (e-Science), 2017, pp. 9–18. doi:10.1109/eScience.2017.13.
[12] F. St. Amand, Identification of Slums in Mumbai, India: Unsupervised Classification Techniques, Thinking Matters Symposium Archive (2014).
[13] H. Taubenböck, H. Debray, C. Qiu, M. Schmitt, Y. Wang, X. X. Zhu, Seven city types representing morphologic configurations of cities across the globe, Cities 105 (2020) 102814. doi:10.1016/j.cities.2020.102814.
[14] G. Leonita, M. Kuffer, R. Sliuzas, C. Persello, Machine Learning-Based Slum Mapping in Support of Slum Upgrading Programs: The Case of Bandung City, Indonesia, Remote Sensing 10 (2018) 1522. doi:10.3390/rs10101522.
[15] B. Bell, R. Veeraraghavan, Locating informal urban settlements, in: AI for Social Good Workshop, 2020.
[16] P. Fränti, S. Sieranoja, How much can k-means be improved by using better initialization and repeats?, Pattern Recognition 93 (2019) 95–112. doi:10.1016/j.patcog.2019.04.014.
[17] H. Taubenböck, M. Weigand, T. Esch, J. Staab, M. Wurm, J. Mast, S. Dech, A new ranking of the world’s largest cities—Do administrative units obscure morphological realities?, Remote Sensing of Environment 232 (2019) 111353. doi:10.1016/j.rse.2019.111353.
[18] S. Soman, A. Beukes, C. Nederhood, N. Marchio, L. M. A. Bettencourt, Worldwide Detection of Informal Settlements via Topological Analysis of Crowdsourced Digital Maps, ISPRS International Journal of Geo-Information 9 (2020) 685. doi:10.3390/ijgi9110685.
[19] M. Buda, A. Maki, M. A. Mazurowski, A systematic study of the class imbalance problem in convolutional neural networks, Neural Networks 106 (2018) 249–259. doi:10.1016/j.neunet.2018.07.011. arXiv:1710.05381.