
Machine Learning Algorithms Evaluated for Urban Land Use and Land Cover Classification Using Sentinel 2 Data

Liliia Hebryn-Baidy

Gareth Rees

Scott Polar Research Institute, University of Cambridge, Lensfield Road, Cambridge, CB2 1ER, United Kingdom


Machine learning algorithms (MLAs) are used to solve a variety of problems that arise when processing satellite images obtained by remote sensing. This makes it difficult to choose the most appropriate MLA for land use and land cover (LULC) classification, especially in complex urban areas. The goal of this study was therefore to evaluate the effectiveness of different MLAs in improving the accuracy of land cover classification by comparing the performance of five algorithms: Random Forest (RF), Classification and Regression Trees (CART), Gradient Tree Boosting (GTB), Naive Bayes (NB), and K-nearest Neighbors (KNN). The study used Sentinel 2 satellite images, which are characterized by high spatial resolution. Google Earth Engine (GEE) was used for pre-processing, for collecting training samples and training the algorithms, and for generating a validation sample. The thematic accuracy of the algorithms was then evaluated and compared. The findings indicate that RF achieves the highest overall accuracy (OA), 94%, although CART, GTB, and KNN also performed well, with accuracies exceeding 90%. Averaged over these four algorithms, classification is most accurate for bare land (CA 94%, PA 97%) and also good for water bodies (CA 97%, PA 88%) and urban zones (CA 95%, PA 93%). Forest areas are harder to classify (CA 76%, PA 94%) because they are often confused with other classes, and vegetation is the weakest class (CA 88%, PA 73%), with a higher misclassification rate. NB showed a markedly lower accuracy of 77%. The study identifies RF as the best choice for land cover classification, in particular for urban surfaces.

Keywords: machine learning algorithm; supervised classification; Sentinel 2; data processing; data analysis

1. Introduction

Satellite imagery, especially the extensive datasets from the Sentinel and Landsat missions, has dramatically changed how we observe and analyze the Earth's surface. This high-resolution satellite data is crucial for monitoring environmental and land cover changes both globally and locally [[ 1 ],[ 2 ],[ 2 ]]. Open data policies adopted by the United States Geological Survey (USGS) and the European Space Agency (ESA), together with tools such as GEE, have greatly eased data acquisition and made sophisticated LULC analyses more attainable [[ 8 ],[ 9 ],[ 11 ]]. However, the effectiveness of these analyses relies heavily on the selection of appropriate classification algorithms, which range from basic unsupervised methods to sophisticated machine learning techniques [[ 13 ],[ 14 ],[25]].

Each algorithm has its strengths and weaknesses and has been extensively tested across various landscapes and conditions, leading to diverse outcomes and discussions regarding their comparative performance [[ 11 ],[18],[24]]. Numerous LULC classification studies have employed MLAs, and a considerable body of literature identifies the RF algorithm as superior in terms of overall accuracy (OA) and the Kappa coefficient [[ 1 ],[ 2 ],[ 5 ],[19]]. These investigations also reveal differences in algorithm sensitivity: RF, CART, and GTB show heightened sensitivity to agricultural land, whereas NB is notably more effective in forest cover classification [[ 2 ],[20]]. Corroborating these findings, research [[ 8 ],[ 9 ]] confirms the suitability of RF and GTB for wetland classification. Moreover, a study [[ 9 ],[ 11 ]] shows that RF and Support Vector Machine (SVM) classifiers are only minimally sensitive to training sample size, unlike KNN. For urban areas, research employing band combinations and various vegetation indices [[16]] reports that RF gives superior classification accuracy in urbanized regions. The efficacy of SVM and NB in dense urban locales diverges from that of CART and KNN, underscoring the significant influence of classifier parameters and sample size on accuracy [[17]]; the performance of NB and KNN in particular was found to be highly dependent on sample size. A subsequent study [[23]] identifies RF and SVM as the best classifiers for settlements and vegetation, respectively, with RF and CART excelling in bare land classification, SVM and GTB being optimal for water bodies, and GTB for forests. Together, these results affirm the critical role of choosing appropriate machine learning algorithms for specific LULC classifications.

Building on these findings, this study focuses on urban LULC classification within Kharkiv, assessing the effectiveness of five MLAs (RF, CART, GTB, NB, and KNN) using Sentinel imagery. The primary objective is to compare these MLAs and identify the most efficient classifier, thereby contributing to land cover assessment in urban settings. The research underscores the critical role of selecting appropriate MLAs for accurate LULC classification and seeks to advance our understanding of their application in urban analysis.

2. Materials and Methods

2.1. Data

The Sentinel-2A and Sentinel-2B satellites are multispectral optical imaging systems. Launched in June 2015 and March 2017, respectively, they are operated by ESA as part of the land monitoring component of the Copernicus Program, the European Union's Earth observation initiative. The primary aim of Sentinel-2 is to provide continuous, free access to high-resolution satellite imagery for various applications, with a swath width of 290 km and a revisit time of 5 days. The Sentinel-2 satellites carry 13 high-resolution spectral bands: three in the visible spectrum (B2, B3, and B4) and one near-infrared band (B8), all with a spatial resolution of 10 meters, intended for primary land-cover classification tasks. In addition, the vegetation red-edge bands (B5, B6, B7, B8A) and the two short-wave infrared (SWIR) bands (B11, B12), with a 20-meter resolution, support more advanced land-cover classification. The remaining bands (B1, B9, B10), with a 60-meter resolution, are primarily used for atmospheric correction and cirrus cloud detection [[27]]. The main characteristics of the satellite system are shown in Table 1.


For the classification with the selected MLAs, a cloud-free Sentinel-2 image from June 20, 2022, expressed in spectral reflectance units, was used. Its identifier, COPERNICUS/S2_SR/20220620T083611_20220620T084448_T36UYA, indicates a capture time between 08:36:11 and 08:44:48 UTC, and the Military Grid Reference System tile code T36UYA places the image over the Kharkiv region. Metadata analysis gave the cloud cover metrics CLOUDY_PIXEL_OVER_LAND_PERCENTAGE = 0.000572 and CLOUDY_PIXEL_PERCENTAGE = 0.00076.

2.2. Land cover classification.

The training and validation samples were gathered by manual interpretation of the original Sentinel 2 data and of high-resolution imagery from Google Earth. The number of training and validation samples per class is shown in Table 2.

hydrographic features 250 75

Adhering to established guidelines, a minimum of 50 training samples per class was generated [[ 1 ],[ 7 ],[ 9 ]]. The samples were allocated in proportion to the prevalence of each LULC class within the study area and distributed evenly across the territory of Kharkiv. To evaluate the accuracy of the resulting LULC maps, a confusion matrix was compiled, and key descriptive statistics were calculated from it to assess classification efficacy. These statistics were the overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA; also called consumer's accuracy, CA), and the Kappa coefficient, which together provide a robust measure of the classification's reliability and precision [[22]].
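The statistics named above can be derived from a confusion matrix in a few lines. The following is a minimal sketch, not the authors' GEE workflow: the function name `accuracy_metrics`, the matrix orientation (rows as reference classes, columns as predicted classes), and the 3-class example values are all assumptions made for illustration.

```python
def accuracy_metrics(matrix):
    """Return OA, per-class PA, per-class UA, and Cohen's kappa.

    Assumption: rows are reference (validation) classes and columns are
    predicted classes; the paper does not state its matrix orientation.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(n)]
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    oa = sum(diag) / total                      # correctly classified share
    pa = [diag[i] / row_sums[i] if row_sums[i] else 0.0 for i in range(n)]
    ua = [diag[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
    # Agreement expected by chance, from the row and column marginals.
    pe = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa


# Hypothetical 3-class matrix; the values are NOT taken from the paper.
example = [[45, 3, 2],
           [4, 40, 6],
           [1, 2, 47]]
oa, pa, ua, kappa = accuracy_metrics(example)
print(f"OA={oa:.2f}  kappa={kappa:.2f}")
```

With this orientation, PA measures omission error per reference class and UA measures commission error per mapped class; transposing the matrix swaps the two.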

Figure 2 displays the reflectance of water, bare land, forest, vegetation, and urban areas across the spectral bands. It shows how each surface type reflects light differently from band to band, so that each land cover class has its own brightness characteristics.

2.2.1. Machine Learning Algorithms for Classification

Random Forest Classifier.

The RF algorithm is a widely used ML method for land cover classification based on satellite imagery. The effectiveness of RF hinges on the size of the training dataset and the number of trees generated. RF builds many decision trees, each trained on a random subset of the data and of the variables, and its performance is gauged using out-of-bag samples to ensure a robust evaluation. The optimal tree count typically ranges from 100 to 500, and the number of variables sampled is usually set to the square root of the total number of variables [[ 4 ],[ 5 ]]. In our study, we employed the ee.Classifier.smileRandomForest method from the GEE platform with 150 trees.
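Two of the ingredients described above, bootstrap sampling with its out-of-bag remainder and the square-root rule for the number of variables, can be illustrated without any remote sensing library. This is a sketch under stated assumptions rather than the GEE implementation: the helper names, the sample size of 1000, and the use of 13 input variables are hypothetical; only the tree count of 150 comes from the text.

```python
import math
import random


def variables_to_sample(n_variables):
    """Square-root rule for the number of randomly sampled variables."""
    return max(1, math.isqrt(n_variables))


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns (in_bag, out_of_bag): the drawn indices and the indices that
    were never drawn, which can serve as a per-tree validation set.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_trees = 150            # tree count reported in the text
n_samples = 1000         # hypothetical training-set size
bags = [bootstrap_sample(n_samples, rng) for _ in range(n_trees)]

# On average roughly a third of the samples are out-of-bag for each tree.
mean_oob = sum(len(oob) for _, oob in bags) / (n_trees * n_samples)
print(variables_to_sample(13), round(mean_oob, 2))
```

Because each tree leaves out a different part of the training data, the out-of-bag samples give an error estimate without a separate hold-out set.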

Classification and Regression Tree.

Research has demonstrated that the CART technique and its associated software can handle large datasets. Using a decision tree, a widely recognized decision support tool in machine learning, the CART classifier splits each node into sub-nodes according to a threshold value, and the process continues until terminal nodes are reached. CART partitions the input data into groups and constructs trees using all of them. The robustness of the algorithm is bolstered by the sample size within each group [[ 6 ],[ 12 ]]. We used the ee.Classifier.smileCart method from the GEE platform.
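The threshold-based node split can be sketched for a single feature as follows. The choice of Gini impurity as the split criterion is an assumption (it is the usual criterion for CART classification, but the text does not name one), and the function names and reflectance values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct
    feature values; (None, parent_gini) is returned if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Hypothetical near-infrared reflectances: water is dark, vegetation bright.
nir = [0.02, 0.03, 0.05, 0.30, 0.35, 0.40]
cls = ["water"] * 3 + ["vegetation"] * 3
threshold, impurity = best_threshold(nir, cls)
print(threshold, impurity)
```

A full tree repeats this search over every feature at every node and recurses into the two sub-nodes until a stopping rule is met.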

Naive Bayes.

NB classifiers are based on the Bayesian probability theorem and are known for their simplicity and efficiency, especially in classification tasks involving large datasets. The 'lambda' parameter in this method allows for tuning the classifier's smoothing parameter, which is crucial for handling features that may not be present in the training set but appear in the testing set [[27]]. In our study, we employed the ee.Classifier.smileNaiveBayes method from the GEE platform.
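The role of a smoothing parameter can be shown with additive (Laplace/Lidstone) smoothing of a class-conditional probability. This is a generic sketch: whether GEE's smileNaiveBayes applies its 'lambda' in exactly this form is an assumption, and the function name, the 250-pixel class size, and the 10 feature levels are hypothetical.

```python
def smoothed_likelihood(count, class_total, n_values, lam):
    """Additive-smoothing estimate of P(feature value | class).

    count       : times the value was seen with this class in training
    class_total : number of training samples of this class
    n_values    : number of distinct values the feature can take
    lam         : smoothing strength (0 disables smoothing)
    """
    return (count + lam) / (class_total + lam * n_values)


# Hypothetical case: 250 training pixels of one class, a feature binned
# into 10 levels, and one level never observed for this class.
unsmoothed = smoothed_likelihood(0, 250, 10, lam=0.0)
smoothed = smoothed_likelihood(0, 250, 10, lam=1.0)
print(unsmoothed, smoothed)
```

Without smoothing, the unseen level receives probability zero and vetoes the class regardless of the other features; with lam > 0 it receives a small positive probability instead.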

Gradient Tree Boosting.

The GTB method creates multiple trees sequentially, with each subsequent tree attempting to correct the errors of the previous ones. The key parameters that control its behavior are the number of trees; the shrinkage, also known as the learning rate, which scales the contribution of each tree; the sampling rate; and the maximum number of nodes [[ 7 ]]. The loss function 'deviance' is used, which is suitable for classification problems as it aims to improve the model's predictive accuracy. Setting these parameters carefully helps optimize the performance of the GTB model, balancing the trade-off between model complexity and generalization ability [[ 9 ]]. We employed ee.Classifier.smileGradientTreeBoost with number of trees = 100, shrinkage = 0.1, sampling rate = 0.8, max nodes = 20, loss = 'deviance', and seed = 123.
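The sequential error-correcting idea and the effect of shrinkage can be sketched with one-split trees (stumps) on a single feature. This is a deliberate simplification and not the configuration used in the study: it fits squared-error residuals rather than the 'deviance' loss, omits the sampling rate and node limit, and uses hypothetical function names and data.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, n_trees, shrinkage):
    """Fit stumps one after another to the residuals of the ensemble so far."""
    preds = [0.0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrinkage scales how much each new tree changes the prediction.
        preds = [p + shrinkage * stump(x) for x, p in zip(xs, preds)]
    return stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]  # hypothetical feature values
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # hypothetical binary target
_, few = boost(xs, ys, n_trees=5, shrinkage=0.1)
_, many = boost(xs, ys, n_trees=100, shrinkage=0.1)
print(mse(ys, few), mse(ys, many))
```

With a shrinkage of 0.1, each stage removes only a tenth of the remaining error, which is why a small learning rate is normally paired with a larger number of trees.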

K-nearest neighbours.

The KNN method classifies an object by the majority vote of its neighbors, assigning it to the class most common among its k nearest neighbors. "Nearest" is determined using a distance metric, such as Euclidean distance [[ 9 ]]. To create a KNN classifier on the GEE platform, we used ee.Classifier.smileKNN with k = 5, the number of nearest neighbors considered in the classification; search method = COVER_TREE, which is efficient when working with large datasets; and metric = EUCLIDEAN, the distance metric used to determine the "closeness" of neighbors. Euclidean distance is the straight-line distance between points in feature space. These parameters helped the KNN classifier perform well on our dataset, ensuring efficient and accurate classification.
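The majority-vote rule with Euclidean distance and k = 5 can be written directly. The sketch below uses a brute-force neighbor search rather than a cover tree, which changes only the speed and not the result; the function name and the two-band reflectance values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Label a query point by majority vote of its k nearest training points."""
    # Brute-force search: rank every training point by Euclidean distance.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (red, near-infrared) reflectances; NOT taken from the paper.
train_x = [(0.05, 0.02), (0.06, 0.03), (0.04, 0.02),
           (0.05, 0.40), (0.06, 0.45), (0.04, 0.38),
           (0.20, 0.25), (0.22, 0.27), (0.21, 0.24)]
train_y = ["water"] * 3 + ["vegetation"] * 3 + ["urban"] * 3

label = knn_predict(train_x, train_y, (0.05, 0.41), k=5)
print(label)
```

In this toy set the five nearest neighbors of the query are three vegetation and two urban points, so the vote goes to vegetation; a pixel of mixed cover could tip the other way, which is one way mixed urban pixels end up mislabeled.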

3. Results and discussion.

Following the classification of the Kharkiv city territory with the various MLAs, we produced maps to depict the efficacy of these methods, as illustrated in Figure 3. The maps delineate the primary surface classes that were identified. The outcomes indicate that RF, CART, and GTB achieved the highest precision, with the corresponding accuracies detailed in Table 3. The visualizations also allow a clear comparison of the classification results for each class and an evaluation of each algorithm's precision. Notably, although the KNN algorithm exhibited relatively high accuracy, it predominantly misclassified urban areas as bare land and vegetation. The NB algorithm performed worst, both in accuracy and in the resulting map, often misidentifying urban regions as water bodies.

In terms of the average CA and PA per class, the RF, CART, GTB, and KNN algorithms excel in classifying bare land, achieving a CA of 94% and a PA of 97%. They also perform well in recognizing water bodies, with a CA of 97% and a PA of 88%, and urban areas, with a CA of 95% and a PA of 93%. However, challenges arise in the identification of forests, which have a CA of 76% and a PA of 94% and are often confused with other classes. In this study, such confusion occurred predominantly with the vegetation class, which has a CA of 88% and a PA of 73%, resulting in a higher rate of misclassification for this category. The NB algorithm shows significant confusion between the forest (CA 59%, PA 79%) and vegetation (CA 55%, PA 44%) classes. All algorithms to some extent mistakenly classify water pixels as forest or urban, while urban pixels are slightly misclassified as bare land. The confusion matrix is shown in Table 4.

At a larger scale, individual misclassified pixels in the classification maps become clearly visible, as shown in Figure 4. The RF, CART, and GTB classifications are similar to one another and highly precise, comparable to Google Earth imagery. The KNN algorithm is notably more sensitive when classifying urban areas, particularly sparsely built-up ones where vegetation predominates. In the map generated by the NB algorithm, incorrectly classified pixels are clearly identifiable, for example urban territories misclassified as water or vegetation. It is plausible that in some instances this algorithm's performance was affected by pixels corresponding to shadows cast by buildings.

Based on our results, the main recommendation for reducing the misclassification rate is to further refine the classification algorithms, especially in distinguishing between urban areas and natural features such as water and vegetation [[ 2 ],[ 8 ]]. This could involve adjusting the parameters or incorporating more sophisticated feature extraction techniques. Moreover, advanced preprocessing techniques, such as shadow correction and spectral unmixing, could mitigate the impact of shadows and mixed pixels, particularly in urban areas [[ 9 ],[ 11 ],[24]]. This may improve the OA of classification, especially for algorithms such as NB. Additionally, incorporating supplementary data layers, such as vegetation indices, could enhance the classification accuracy by providing additional context that helps differentiate between classes [[ 2 ],[ 14 ],[16],[18]]. For algorithms such as KNN, which show higher sensitivity in specific contexts, adjusting the sensitivity settings or employing contextual filters could optimize performance, particularly in classifying urban areas with varying degrees of development.

4. Conclusion

This study evaluated the efficacy of various MLAs in addressing the complexities of LULC classification, with a special focus on urban environments. Using Sentinel 2 data acquired via GEE, we compared the performance of RF, CART, GTB, NB, and KNN in classifying different land covers. The results highlight the RF algorithm's superior accuracy, with an OA of 94%. CART, GTB, and KNN also demonstrated significant efficacy, with accuracies surpassing 90%, underscoring the potential of MLAs in high-precision land classification tasks in complex urban environments. The assessment of thematic accuracy, including CA and PA, provides a granular understanding of each MLA's performance, revealing their capabilities in identifying specific land covers while highlighting areas of confusion, such as between forests and other classes or the misclassification of urban areas as bare land and vegetation. Furthermore, the findings emphasize the need for algorithm refinement and the integration of sophisticated preprocessing techniques to enhance classification accuracy, especially in complex urban landscapes.

Acknowledgements

The author Liliia Hebryn-Baidy expresses gratitude to the British Academy and the Council for At-Risk Academics for supporting this research through the Researchers at Risk Research Support Grants, and to the Department of Geography at the University of Cambridge and the Scott Polar Research Institute for their support throughout the research process.

References

[1] Oo, T.K., Arunrat, N., Sereenonchai, S., Ussawarujikulchai, A., Chareonwong, U., Nutmagul, W. Comparing Four Machine Learning Algorithms for Land Cover Classification in Gold Mining: A Case Study of Kyaukpahto Gold Mine, Northern Myanmar. Sustainability, 14, 10754, 2022. https://doi.org/10.3390/su141710754.

[2] Loukika, K.N., Keesara, V.R., Sridhar, V. Analysis of Land Use and Land Cover Using Machine Learning Algorithms on Google Earth Engine for Munneru River Basin, India. Sustainability, 13, 13758, 2021. https://doi.org/10.3390/su132413758.

[3] Patil, A., Panhalkar, S. A comparative analysis of machine learning algorithms for land use and land cover classification using google earth engine platform. Journal of Geomatics, 17(2), 2023. https://doi.org/10.58825/jog.2023.17.2.96.

[4] Breiman, L. Random Forests. Machine Learning, 45, 5-32, 2001. https://doi.org/10.1023/A:1010933404324.

[5] Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens., 26, 217-222, 2005. https://doi.org/10.1080/01431160412331269698.

[6] Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. Classification and Regression Trees (1st ed.). London, UK: Routledge, 1984.

[7] Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics, 1189-1232, 1999.

[8] Salas, E.A.L., Kumaran, S., Bennett, R., Willis, L., Mitchell, K. Machine Learning-Based Classification of Small-Sized Wetlands Using Sentinel-2 Images. AIMS Geosciences, 10(1), 62-79, 2024. https://doi.org/10.3934/geosci.2024005.

[9] Friedman, J.H. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367-378, 2002.

[10] Thanh Noi, P., Kappas, M. Comparison of Random Forest, k-Nearest Neighbor, and Support Vector Machine Classifiers for Land Cover Classification Using Sentinel-2 Imagery. Sensors, 18(18), 2018. https://doi.org/10.3390/s18010018.

[11] Li, C., Wang, J., Wang, L., Hu, L., Gong, P. Comparison of Classification Algorithms and Training Sample Sizes in Urban Land Classification with Landsat Thematic Mapper Imagery. Remote Sens., 6, 964-983, 2014. https://doi.org/10.3390/rs6020964.

[12] Friedl, M.A., Brodley, C.E. Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ., 61, 399-409, 1997. https://doi.org/10.1016/S0034-4257(97)00049-7.

[13] Pacheco, A.d.P., Junior, J.A.d.S., Ruiz-Armenteros, A.M., Henriques, R.F.F. Assessment of k-Nearest Neighbor and Random Forest Classifiers for Mapping Forest Fire Areas in Central Portugal Using Landsat-8, Sentinel-2, and Terra Imagery. Remote Sens., 13, 1345, 2021. https://doi.org/10.3390/rs13071345.

[14] Talukdar, S., Singha, P., Mahato, S., Shahfahad, Pal, S., Liou, Y.-A., Rahman, A. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations-A Review. Remote Sens., 12, 1135, 2020. https://doi.org/10.3390/rs12071135.

[15] Ghayour, L., Neshat, A., Paryani, S., Shahabi, H., Shirzadi, A., Chen, W., Al-Ansari, N., Geertsema, M., Pourmehdi Amiri, M., Gholamnia, M., et al. Performance Evaluation of Sentinel-2 and Landsat 8 OLI Data for Land Cover/Use Classification Using a Comparison between Machine Learning Algorithms. Remote Sens., 13, 1349, 2021. https://doi.org/10.3390/rs13071349.

[16] Belenok, V., Hebryn-Baidy, L., Bielousova, N., Gladilin, V., Kryachok, S., Tereshchenko, A., Alpert, S., Bodnar, S. Machine learning based combinatorial analysis for land use and land cover assessment in Kyiv City (Ukraine). J. Appl. Remote Sens., 17(1), 2023. https://doi.org/10.1117/1.JRS.17.014506.

[17] Qian, Y., Zhou, W., Yan, J., Li, W., Han, L. Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery. Remote Sens., 7, 153-168, 2015. https://doi.org/10.3390/rs70100153.

[18] Abdi, M. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GIScience & Remote Sensing, 57(1), 1-20, 2020. https://doi.org/10.1080/15481603.2019.1650447.

[19] Nery, R. Sadler, M. Solis-Aulestia, B. White, M. Polyakov and M. Chalak. Comparing supervised algorithms in Land Use and Land Cover classification of a Landsat time-series. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 2016, pp. 5165-5168. https://doi.org/10.1109/IGARSS.2016.7730346.

[20] Delalay, M., Tiwari, V., Ziegler, A., Gopal, V., Passy, P. Land-use and land-cover classification using Sentinel-2 data and machine-learning algorithms: operational method and its implementation for a mountainous area of Nepal. Journal of Applied Remote Sensing, 13(1), 014530, 2019. https://doi.org/10.1117/1.JRS.13.014530.

[21] McCarty, D.A., Kim, H.W., Lee, H.K. Evaluation of Light Gradient Boosted Machine Learning Technique in Large Scale Land Use and Land Cover Classification. Environments, 7, 84, 2020. https://doi.org/10.3390/environments7100084.

[22] Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ., 37, 35–46, 1991. https://doi.org/10.1016/0034-4257(91)90048-B.

[23] Ouma, Y., Nkwae, B., Moalafhi, D., Odirile, P., Parida, B., Anderson, G., Qi, J. Comparison of Machine learning classifiers for multitemporal and multisensor mapping of urban LULC features. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B3-2022, 681–689. https://doi.org/10.5194/isprs-archives-XLIII-B3-2022-681-2022.

[24] Shih, H.C., Stow, D.A., Tsai, Y.H. Guidance on and comparison of machine learning classifiers for Landsat-based land cover and land use mapping. Int. J. Remote Sens., 40, 1248–1274, 2019. https://doi.org/10.1080/01431161.2018.1524179.

[25] Camargo, F.F., Sano, E.E., Almeida, C.M., Mura, J.C., Almeida, T. A Comparative Assessment of Machine-Learning Techniques for Land Use and Land Cover Classification of the Brazilian Tropical Savanna Using ALOS-2/PALSAR-2 Polarimetric Images. Remote Sens., 11, 1600, 2019. https://doi.org/10.3390/rs11131600.

[26] Jamali, A. Evaluation and comparison of eight machine learning models in land use/land cover mapping using Landsat 8 OLI: A case study of the northern region of Iran. SN Appl. Sci., 1, 1448, 2019. https://doi.org/10.1007/s42452-019-1527-8.

[27] Revel, C., Lonjou, V., Marcq, S., Desjardins, C., Fougnie, B., Luche, C., Guilleminot, N., Lacamp, A., Lourme, E., Miquel, C., Lenot, X. Sentinel-2A and 2B absolute calibration monitoring. European Journal of Remote Sensing, 52:1, 122-137, 2019. https://doi.org/10.1080/22797254.2018.1562311.

[28] Domingos, P., Pazzani, M. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–130, 1997.