Mapping Slums with Deep Learning Feature Extraction

Agatha Mattos, Michela Bertolotto and Gavin McArdle
School of Computer Science, University College Dublin, Ireland

Abstract
Many real-world problems present challenges that the machine learning community has not yet solved, despite the wide availability of satellite imagery and recent advances in computer vision. In particular, slum mapping in cities needs techniques that are cheaper and less reliant on large labelled data sets. This study presents preliminary results of using deep learning feature extraction followed by k-means clustering, an unsupervised method, to detect slums in Sentinel-2 satellite imagery. The clusters that represent deprived areas are identified using a data set containing information about the topology of urban areas, derived from crowd-sourced digital maps. Overall, the unsupervised method performed worse than the baseline, a fine-tuned ResNet18 model (a supervised approach). The mean Intersection over Union (IoU) for the two investigated locations (Mumbai and Capetown) was 0.46 and 0.51 for the supervised model, and 0.27 and 0.31 for the unsupervised model. The results suggest that other strategies for dealing with such imbalanced data sets need to be investigated to improve the results for the slum class, as do strategies to automatically identify the clusters that represent deprived areas/slums. The code used in this paper is available at: https://github.com/ml-labs-crt/slums-unsupervised.

Keywords
Deep Learning Feature Extraction, Slums, Deprived Areas, Machine Learning, Earth Observation

1. Introduction

The last decade saw a surge in the availability of satellite imagery and in the development of image processing techniques. With this increase, it was expected that more societal challenges would be solved using remote sensing data and machine learning. However, many important societal problems have not yet fully benefited from the greater availability of imagery or from current developments in computer vision. Many factors contribute to this situation, especially the high cost of acquiring and processing very-high-resolution satellite imagery [1, 2] and the lack of the labelled data that supervised machine learning models require for many societal problems [3].

This work investigates the potential of employing freely available medium-resolution satellite imagery and deep learning feature extraction, an unsupervised approach that does not require labelled data, to detect deprived/slum areas in two cities (Mumbai and Capetown). According to UN-Habitat, slums are locations where residents lack at least one of the following: water, sanitation, housing durability, security of tenure or sufficient living area [4]. UN-Habitat estimates that over one billion people live in such conditions, but because most of the information about these settlements comes from outdated census surveys [5], there is interest in exploring other forms of data collection and processing that could provide current estimates [5, 1, 3]. The next section outlines the literature pertinent to slum mapping and further details the motivations for this paper.

1.1. Related Work

Since 2012, deep learning architectures have become popular and have been shown to perform well in many classification tasks. In line with this trend, research on slum mapping moved from traditional image processing approaches to supervised deep learning methods applied to high- or very-high-resolution imagery [6]. In 2017, Mboga et al. [7] and Persello and Stein [8] demonstrated that convolutional neural networks outperformed feature extraction methods, and since then many works employing neural networks to map slums have been published.

However, the great majority of studies to date rely on supervised learning and costly high- or very-high-resolution satellite imagery [6], and hence consider only small areas [5, 2]. Additionally, many researchers have found that models developed for one city do not generalise well to other areas [9, 10, 1]. For a global slum inventory to be possible, these issues need to be tackled, and unsupervised learning may be a suitable alternative.

Nonetheless, the literature on mapping slums with unsupervised learning techniques is limited. To the best of our knowledge, [11] and [12] are currently the most representative works, though both have limitations: Block et al. [11] employ high-resolution imagery, and St. Amand [12] relies heavily on visual inspection for decision making. This paper presents our initial results in developing a pipeline to map slums using freely available medium-resolution satellite imagery, unsupervised learning and automated classification of slum clusters using topological information derived from crowd-sourced digital maps. The methodology used in this study is described in the next section.

CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria
agatha.hennigendemattos@ucdconnect.ie (A. Mattos); michela.bertolotto@ucd.ie (M. Bertolotto); gavin.mcardle@ucd.ie (G. McArdle)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Figure 1: Left: Sentinel-2 image of Mumbai, India, at ten metres resolution. The yellow polygons were annotated as slum areas and were used as ground truth for the supervised model. Right: Similar to the image on the left, but of Capetown, South Africa.

2. Methodology

Two locations were used to investigate the potential of deep learning feature extraction followed by clustering: Mumbai, in India, and Capetown, in South Africa. The satellite imagery was collected by Gram-Hansen et al. [1] and consists of Sentinel-2 images with ten metres resolution. These cities have been investigated by other researchers and are therefore well suited for testing the proposed unsupervised method. As in the experiments of Block et al. [11], three bands were used (blue, green and red), and the imagery was scaled from 16-bit to 8-bit. Figure 1 shows the satellite imagery of both locations.

The imagery was split into tiles of 20 by 20 pixels (approximately 200 x 200 metres), slightly larger than those used by Taubenböck et al. [13], who also used medium-resolution imagery in their research. The baseline against which the unsupervised approach was compared was a supervised model trained with the ground-truth data collected by Gram-Hansen et al. [1]. For a tile to be considered as belonging to a certain class, at least 50% of its pixels had to belong to that class.

As expected, the resulting data set is highly imbalanced: only 3% of the tiles are slums in Mumbai and less than 1% in Capetown. Table 1 presents an analysis of the areas covered in this paper.

Table 1
Analysis of the Areas

Location | Height (pixels) | Width (pixels) | Non-slum tiles | Slum tiles | Non-slum (% of total tiles) | Slum (% of total tiles)
Mumbai   | 3920            | 1980           | 18831          | 573        | 97.0%                       | 3.0%
Capetown | 6080            | 5300           | 79799          | 761        | 99.1%                       | 0.9%

As suggested by other researchers [14], and to mimic a real-world scenario, only 20% of the slum tiles were used to train the model. In addition, the non-slum class was undersampled at a ratio of 4 to 1 (four non-slum tiles per slum tile) to account for the imbalance in the data set. The remaining 80% of the slum tiles and the non-slum tiles were used to test the model. As a result, the baseline was trained with 399 tiles (80 slum and 319 non-slum) for Mumbai and with 532 tiles (106 slum and 426 non-slum) for Capetown. The code used in this paper is available at https://github.com/ml-labs-crt/slums-unsupervised.

2.1. Baseline

The model adopted as the baseline was a fine-tuned ResNet18, pre-trained on ImageNet. This choice followed from the results obtained by Bell and Veeeraraghavan [15], who tested ResNet models of different sizes. Both the supervised model (baseline) and the unsupervised model were implemented in PyTorch 1.10.2. The supervised model was trained with a batch size of 8 for 50 epochs. Early stopping was triggered when the average validation loss was 20% higher than the average of the last 10 epochs. The results were evaluated using Intersection over Union (IoU), as is common in the related literature.

2.2. Unsupervised Approach

The features for the unsupervised model were extracted using a ResNet18 model pre-trained on the ImageNet data set. Care was taken to ensure that exactly the same tiles were used to train both models. The features extracted for each tile (a vector of 1000 values) were then clustered with a k-means model, using scikit-learn's default (k-means++) initialisation and 100 repetitions. The number of repetitions was set following Fränti and Sieranoja [16]. The number of clusters was set to seventeen, based on the work of Taubenböck et al. [17], who analysed satellite imagery of 110 cities worldwide using the Local Climate Zones classification scheme, which has seventeen different zones.

Lastly, to decide which clusters should be considered slums and which should be labelled as non-slums, the complexity score designed by Soman et al. [18] was used. Figure 2 shows the complexity score for the two areas investigated in this paper. The complexity score ranged between 0 and 20 for Mumbai and between 0 and 18 for Capetown; lower scores denote less developed areas. Because the score is derived from information available on OpenStreetMap, some locations within the cities do not have a complexity score. In Mumbai, 41% of all pixels had no complexity score (mostly water bodies), and in Capetown this was the case for 63% of the pixels.

Figure 2: Left: Complexity scores for Mumbai, India. Darker polygons denote lower complexity scores. In the background, Sentinel-2 satellite imagery of the same city. Right: Similar to the image on the left, but of Capetown, South Africa.

The median complexity score of each cluster was calculated from the average complexity score of the pixels in each of its tiles. The clusters with the lowest median complexity scores were then designated as "slum clusters" (see Table 2 for details of each cluster). In the next section, the results are discussed.

Table 2
Number of Tiles in Each Cluster and Median Complexity of Each Cluster. Clusters identified as "slum clusters" are marked with an asterisk (*).

        | Mumbai                                         | Capetown
Cluster | Non-slum | Slum  | Tiles per   | Median     | Non-slum | Slum  | Tiles per   | Median
ID      | tiles    | tiles | cluster (%) | complexity | tiles    | tiles | cluster (%) | complexity
0       | 1812     | 37    | 9.8%        | 3.01*      | 1452     | 35    | 1.9%        | 2.87*
1       | 698      | 18    | 3.8%        | 3.00*      | 6553     | 72    | 8.3%        | 3.00
2       | 1252     | 25    | 6.8%        | 3.89       | 3760     | 2     | 4.7%        | 3.00
3       | 272      | 5     | 1.5%        | 2.95*      | 1413     | 9     | 1.8%        | 2.21*
4       | 1708     | 32    | 9.2%        | 3.49       | 5853     | 4     | 7.3%        | 3.00
5       | 1887     | 62    | 10.3%       | 3.00*      | 5147     | 56    | 6.5%        | 2.77*
6       | 6        | 0     | 0.0%        | 3.75       | 4588     | 6     | 5.8%        | 3.00
7       | 1461     | 69    | 8.1%        | 3.00*      | 4276     | 42    | 5.4%        | 2.58*
8       | 1922     | 28    | 10.4%       | 3.61       | 8238     | 54    | 10.4%       | 3.00
9       | 1038     | 21    | 5.6%        | 3.00*      | 7036     | 26    | 8.8%        | 3.00
10      | 1951     | 22    | 10.5%       | 3.73       | 3121     | 37    | 4.0%        | 3.00
11      | 18       | 3     | 0.1%        | 3.77       | 6944     | 37    | 8.7%        | 2.91*
12      | 1133     | 35    | 6.2%        | 4.00       | 5088     | 73    | 6.5%        | 2.49*
13      | 1102     | 35    | 6.0%        | 3.60       | 3006     | 5     | 3.8%        | 3.00
14      | 556      | 9     | 3.0%        | 3.79       | 5808     | 32    | 7.3%        | 3.00
15      | 829      | 22    | 4.5%        | 3.07*      | 5167     | 106   | 6.6%        | 2.16*
16      | 730      | 36    | 4.1%        | 3.02*      | 1741     | 13    | 2.2%        | 2.74*
Total   | 18375    | 459   | 100%        |            | 79191    | 609   | 100%        |
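The two rule-based steps of this section, labelling each 20 x 20 pixel tile with the 50% rule and selecting as "slum clusters" those whose median complexity lies strictly below the median over all clusters, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which is in the repository linked in the abstract): the function names are hypothetical, the ground-truth mask is assumed to be a binary NumPy array whose sides are multiples of the tile size, and the per-cluster medians are assumed to have already been computed from the k-means assignments (for example, those produced by scikit-learn's `KMeans(n_clusters=17, n_init=100)`). The usage lines reuse the Mumbai medians from Table 2.

```python
from __future__ import annotations

import statistics

import numpy as np

TILE = 20  # tile side in pixels (about 200 x 200 m at 10 m resolution)


def label_tiles(mask: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Return one label per tile: 1 (slum) if >= 50% of its pixels are slum.

    `mask` is a hypothetical binary ground-truth raster (1 = slum pixel)
    whose height and width are assumed to be multiples of `tile`.
    """
    h, w = mask.shape
    # View the raster as (tile rows, tile, tile cols, tile) blocks and
    # compute the fraction of slum pixels inside each block.
    blocks = mask.reshape(h // tile, tile, w // tile, tile)
    slum_fraction = blocks.mean(axis=(1, 3))
    return (slum_fraction >= 0.5).astype(np.uint8)


def select_slum_clusters(median_complexity: dict[int, float]) -> list[int]:
    """Return the IDs of clusters whose median complexity is strictly
    below the median of all cluster medians (hypothetical helper)."""
    threshold = statistics.median(median_complexity.values())
    return sorted(c for c, m in median_complexity.items() if m < threshold)


# Per-cluster median complexity for Mumbai (clusters 0-16), from Table 2.
mumbai = dict(enumerate([3.01, 3.00, 3.89, 2.95, 3.49, 3.00, 3.75, 3.00, 3.61,
                         3.00, 3.73, 3.77, 4.00, 3.60, 3.79, 3.07, 3.02]))
print(statistics.median(mumbai.values()))  # 3.49
print(select_slum_clusters(mumbai))        # [0, 1, 3, 5, 7, 9, 15, 16]
```

With a strict "below" comparison, cluster 4 (median 3.49, exactly the threshold) is excluded, which matches the rule stated in the text. Because the medians in Table 2 are rounded to two decimals, ties matter: in Capetown nine clusters sit exactly at the 3.00 threshold and are all excluded.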
Due to all clusters out for two locations (Mumbai and Cape Town). For having a non-negligible amount of non-slum tiles in them Mumbai, the percentage of tiles assigned to each cluster (see Table 2), overall, the unsupervised learning model was in the range of 0.03% to 10.5%, and for Capetown it performed worse than the supervised method. Figure 3 was in the range of 1.8% to 10.4%. Using the ground-truth shows a visualisation of the clusters and Table 3 has the data, it was possible to observe that some clusters did intersection over union (IoU) for each class and for each contain most of the slum tiles; for instance, clusters 5 model. and 7 for Mumbai contained 13.5% and 15% of the total Both models had an intersection over union (IoU) be- slums tiles. Similarly, clusters 1, 12 and 15 for Capetown low 0.10 for the slum class, caused by tiles being classified contained 11.8%, 12.0% and 17.4% of all slum tiles. Table as slums even when they were not labelled like that in 2 describes the number of tiles assigned to each cluster. the ground-truth data. The obtained results suggest that As mentioned in Section 2, the decision of which clus- oversampling the non-slum areas with a 4 to 1 ratio may ters would be considered “slum clusters” took into consid- not be an appropriate strategy for dealing with the huge eration the average complexity of the pixels of each tile imbalance in this problem. Moreover, the use of com- in that cluster. Though Soman et al. [18] suggests in their plexity scores needs further investigation to determine paper that areas with a complexity score smaller than 5 or the best strategy to set the complexity threshold for each 6 could be considered informal settlements, in the cities location. In the way that it was employed in this experi- covered in this study, this would result in all clusters ment, it did not help identify the less developed/slums being labelled as slums. For example, for Capetown the clusters. 
Other parameters set in the experiment may median complexity for all clusters was in the range of 2.16 need to be reviewed to increase performance, such as the to 3.0. In the case of Mumbai it was in the range of 2.95 to tile dimension and number of clusters. 4.0. For this reason, only clusters that had a complexity Nonetheless, the mean IoU of the unsupervised method below the median cluster complexity for each location outperformed the results obtained by Gram-Hansen et al. were considered ”slum clusters”. In the case of Mumbai, [1] in the case of Capetown (0.17 versus 0.31) and was it meant clusters with a median complexity below 3.49 only slightly worse than the case of Mumbai (0.40 versus and for Capetown clusters with a median complexity 0.27). The intersection over union (IoU) for the slum class, below 3.0 (see Table 2). All tiles in the so-called “slum however, was smaller than obtained by Gram-Hansen clusters” were then assigned a slum label and compared et al. [1] for both locations. Still, Gram-Hansen et al. Figure 3: Left: Areas in yellow were predicted as non-slums (unsupervised approach). Areas in red were labelled as slums (ground truth). In the background, satellite imagery of Mumbai. Right: Similar to the image on the left, but of Capetown. Table 3 Results of the Binary Classification of Urban Areas into Slum/Non-slum Classes Using Intersection over Union (Iou) Supervised learning Unsupervised learning Location IoU Non-slum IoU Slum mean IoU IoU Non-slum IoU Slum mean IoU Mumbai 0.84 0.08 0.46 0.52 0.03 0.27 Capetown 0.95 0.07 0.51 0.60 0.01 0.31 All locations 0.90 0.08 0.49 0.56 0.02 0.29 [1] used convolutional neural networks and very-high- techniques could be used to mask out regions that are resolution imagery (30cm per pixel) in their experiments, clearly not urban, such as water and vegetation. 
These which indicates that unsupervised learning and freely changes would reduce the total number of non-slum tiles available medium-resolution imagery can be promising and potentially make the problem less imbalanced. Addi- for this real-world application. tionally, the adoption of block complexity derived from crowd-sourced digital maps requires further investiga- tion to determine its usability as a tool to identify clusters 4. Conclusions and Future Work that represent deprived areas/slums. Performing feature extraction using a deep learning model pre-trained with This experiment presents the initial results of an attempt a remote sensing data, as opposed to ImageNet, may to use deep learning feature extraction and unsupervised also be beneficial. Also, it would be interesting to see a learning to map slums. Results demonstrate that the comparison of the deep features extracted from medium- proposed method performed worse than the baseline, a resolution satellite imagery and very-high-resolution im- supervised learning approach. agery for the same location with the intention of confirm- Looking to the future, it would be desirable to investi- ing that the former can satisfactorily be employed for gate strategies to improve the results for the slum class, mapping slums using unsupervised learning. Lastly, to such as oversampling the slum class to the point of elim- develop a global slum inventory, the analysis developed inating the imbalance, as suggested in [19], or adopting here could be extended to estimate the population living more sophisticated sampling for the non-slum class. It in the areas identified as deprived/slums. is also possible that more traditional image processing Acknowledgments ing Letters 14 (2017) 2325–2329. doi:10.1109/LGRS. 2017.2763738 . This publication has emanated from research supported [9] Y. 
Gao, Assessing the spatial transferability of fully in part by a grant from Science Foundation Ireland under convolutional networks for slum mapping (2020). Grant number 18/CRT/6183. For the purpose of Open [10] T. Stark, M. Wurm, X. X. Zhu, H. Taubenböck, Access, the author has applied a CC BY public copyright Satellite-Based Mapping of Urban Poverty With license to any Author Accepted Manuscript version aris- Transfer-Learned Slum Morphologies, IEEE Jour- ing from this submission. nal of Selected Topics in Applied Earth Observa- tions and Remote Sensing 13 (2020) 5251–5263. doi:10.1109/JSTARS.2020.3018862 . References [11] J. Block, M. Yazdani, M. Nguyen, D. Crawl, [1] B. J. Gram-Hansen, P. Helber, I. Varatharajan, M. Jankowska, J. Graham, T. DeFanti, I. Altintas, An F. Azam, A. Coca-Castro, V. Kopackova, P. Bilin- Unsupervised Deep Learning Approach for Satellite ski, Mapping Informal Settlements in Developing Image Analysis with Applications in Demographic Countries using Machine Learning and Low Res- Analysis, in: 2017 IEEE 13th International Con- olution Multi-spectral Data, in: Proceedings of ference on E-Science (e-Science), 2017, pp. 9–18. the 2019 AAAI/ACM Conference on AI, Ethics, and doi:10.1109/eScience.2017.13 . Society, AIES ’19, Association for Computing Ma- [12] F. St. Amand, Identification of Slums in Mum- chinery, New York, NY, USA, 2019, pp. 361–368. bai, India: Unsupervised Classification Techniques, doi:10.1145/3306618.3314253 . Thinking Matters Symposium Archive (2014). [2] A. C. H. de Mattos, G. McArdle, M. Bertolotto, [13] H. Taubenböck, H. Debray, C. Qiu, M. Schmitt, Mapping Slums with Medium Resolution Satel- Y. Wang, X. X. Zhu, Seven city types represent- lite Imagery: A Comparative Analysis of Multi- ing morphologic configurations of cities across the Spectral Data and Grey-level Co-occurrence Ma- globe, Cities 105 (2020) 102814. doi:10.1016/j. trix Techniques, 2021. doi:10.48550/arXiv.2106. cities.2020.102814 . 11395 . 
arXiv:2106.11395 . [14] G. Leonita, M. Kuffer, R. Sliuzas, C. Persello, Ma- [3] M. Burke, A. Driscoll, D. Lobell, S. Ermon, Using chine Learning-Based Slum Mapping in Support of Satellite Imagery to Understand and Promote Sus- Slum Upgrading Programs: The Case of Bandung tainable Development, Working Paper 27879, Na- City, Indonesia, Remote Sensing 10 (2018) 1522. tional Bureau of Economic Research, 2020. doi:10. doi:10.3390/rs10101522 . 3386/w27879 . [15] B. Bell, R. Veeeraraghavan, Locating informal urban [4] U. N. Habitat, Tracking progress towards inclusive, settlements, in: AI for Social Good Workshop, 2020. safe, resilient and sustainable cities and human set- [16] P. Fränti, S. Sieranoja, How much can k-means be tlements (2018). improved by using better initialization and repeats?, [5] R. Mahabir, A. Croitoru, A. T. Crooks, P. Agouris, Pattern Recognition 93 (2019) 95–112. doi:10.1016/ A. Stefanidis, A Critical Review of High and Very j.patcog.2019.04.014 . High-Resolution Remote Sensing Approaches for [17] H. Taubenböck, M. Weigand, T. Esch, J. Staab, Detecting and Mapping Slums: Trends, Challenges M. Wurm, J. Mast, S. Dech, A new ranking of the and Emerging Opportunities, Urban Science 2 world’s largest cities—Do administrative units ob- (2018) 8. doi:10.3390/urbansci2010008 . scure morphological realities?, Remote Sensing [6] M. Kuffer, K. Pfeffer, R. Sliuzas, Slums from of Environment 232 (2019) 111353. doi:10.1016/j. Space—15 Years of Slum Mapping Using Remote rse.2019.111353 . Sensing, Remote Sensing 8 (2016) 455. doi:10.3390/ [18] S. Soman, A. Beukes, C. Nederhood, N. Marchio, rs8060455 . L. M. A. Bettencourt, Worldwide Detection of Infor- [7] N. Mboga, C. Persello, J. R. Bergado, A. Stein, De- mal Settlements via Topological Analysis of Crowd- tection of informal settlements from VHR satellite sourced Digital Maps, ISPRS International Jour- images using convolutional neural networks, in: nal of Geo-Information 9 (2020) 685. 
doi:10.3390/ 2017 IEEE International Geoscience and Remote ijgi9110685 . Sensing Symposium (IGARSS), 2017, pp. 5169–5172. [19] M. Buda, A. Maki, M. A. Mazurowski, A system- doi:10.1109/IGARSS.2017.8128166 . atic study of the class imbalance problem in con- [8] C. Persello, A. Stein, Deep Fully Convolutional Net- volutional neural networks, Neural Networks 106 works for the Detection of Informal Settlements in (2018) 249–259. doi:10.1016/j.neunet.2018.07. VHR Images, IEEE Geoscience and Remote Sens- 011 . arXiv:1710.05381 .