Pixel-based forest classification of Sentinel-2 images using automatically generated datasets

Arminas Šidlauskas¹, Andrius Kriščiūnas¹
¹ Kaunas University of Technology, K. Donelaičio str. 73, LT-44249 Kaunas, Lithuania

Abstract
Remote sensing tools are becoming popular for gathering information about changes in forest areas. The European Space Agency has launched multiple Sentinel satellites for land and marine monitoring. The Sentinel-2 (S2) satellite offers strong forest monitoring capabilities with its 13 high-resolution spectral bands, which enable high-accuracy pixel-based classification. To train a model well suited to recognizing forested areas in S2 images, a solid training dataset must be provided. In this study, two different information sources, Copernicus High Resolution Layers (HRL) and OpenStreetMap (OSM), were used to automatically create datasets. Models were trained and evaluated using the same artificial neural network architecture. Analysis showed that the OSM-trained and HRL-trained models yielded similar numerical evaluation results: both adjusted well to their data source and reached around 0.92 pixel accuracy. Closer visual inspection showed that OSM-trained models produced more false negatives, missing small forest patches and forest areas along rivers and lakes, whereas HRL-trained models produced more false positives, at times classifying not only the areas along rivers but the rivers themselves as forest. All models failed to properly identify forest clearings in large forest areas, although the HRL-trained models provided slightly better results.

Keywords
Forest classification, Sentinel-2 imagery, fully convolutional network, Copernicus High Resolution Layers, OpenStreetMap

IVUS 2022: 27th International Conference on Information Technology
EMAIL: sidlauskasarminas@gmail.com (A. Šidlauskas); andrius.krisciunas@ktu.lt (A. Kriščiūnas)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Monitoring of forest areas is carried out continuously at global and national scales. Field monitoring methods are not sufficient to track changes on a continuous basis, so there is a need to automate the process while achieving the highest possible accuracy. The use of remote sensing tools to monitor forest cover is increasing worldwide [1]. Among the major drivers for frequent forest monitoring are deforestation [2] and illegal logging [3].

The European Space Agency's Sentinel satellites are well suited for global forest observation. Their main advantages for forest monitoring are the long-term delivery of satellite imagery, global and frequent coverage, good data accessibility for the general public, and a wide variety of observation methods (radar, spectral bands) [4].

Sentinel-2 (S2) mission satellites provide 13 high-resolution spectral bands for land and sea monitoring. These bands and their combinations have already been used in various ways to classify forests [5, 6, 7, 8]. Reference [5] evaluates S2 capabilities to classify forest categories and European Forest Types in the Mediterranean area, [6] evaluates the performance of dense S2 time series for forest species mapping in a challenging mountainous environment, [7] investigates the use of multi-temporal S2 data to identify tree species, and [8] assesses the suitability of S2 data for typical land cover classifications (crop and forest). Often these classification tasks are completed using machine learning.
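To make the idea of pixel-based classification concrete, a toy sketch follows. It is not the method used in this study, and the reflectance values and the 0.6 threshold are made up for illustration: each pixel is classified independently by thresholding the NDVI computed from the red and near-infrared bands.

```python
import numpy as np

# Toy 2x2 scene: red and near-infrared reflectance (values are illustrative)
red = np.array([[0.05, 0.30],
                [0.04, 0.25]])
nir = np.array([[0.45, 0.32],
                [0.50, 0.28]])

# NDVI = (NIR - Red) / (NIR + Red); dense vegetation has a high NDVI
ndvi = (nir - red) / (nir + red)

# Per-pixel decision: 1 = forest, 0 = non-forest (threshold is illustrative)
forest_mask = (ndvi > 0.6).astype(np.uint8)
print(forest_mask)  # [[1 0]
                    #  [1 0]]
```

A learned classifier (e.g., the random forest or FCN models discussed below) replaces this fixed threshold with a decision rule fitted over many bands, but the per-pixel structure of the problem is the same.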
In the referenced studies, a supervised random forest (RF) algorithm has been applied for pixel-based classification. To train a precise model, a good dataset is required; failure to prepare one can result in an inaccurate classification model. Studies often use national data provided by forest or statistics agencies [5, 6, 7], which can then be further processed manually [6]. This data is provided in polygon form, and the polygons are used to classify an area as forest or as a specific forest species. Information about land use classification can also be obtained from OSM [9], which likewise provides forest polygons, similar to the national data. In other cases, Copernicus High Resolution Layers are used [10]; these layers, provided in raster form, are processed to act as pixel-based masks. The model accuracy in these papers varies from 83% to 95% when evaluated using pixel accuracy metrics.

Preparation of a precise dataset can take a long time if classification is done by hand or if institutional data is used. The latter can be outdated or incomplete, which can severely restrict the ability to create a good dataset for certain areas; additionally, different states may restrict access to such data. From this arises the need for open access data that can always be accessed and is constantly updated. In this work, two open access data sources suitable for automatic ground-truth mask generation are compared to evaluate their applicability for direct use with the selected machine learning model. Both HRL and OSM data sources are used as ground truth during evaluation. Accuracy has been tested using Copernicus S2 True Color Images (TCI) collected in the summertime. Pixel-based classification was performed with a fully convolutional network (FCN) model with a ResNet-50 backbone.

2. Materials

2.1. Study area

Lithuania has been selected as the study area. The territory of Lithuania consists mostly of flatlands with lakes, swamps, and forests; the dominant species are pine, spruce, and birch. Lithuania covers an area of 65 300 km². The main reason for limiting the study area to one country is to avoid introducing new forest types during training and evaluation.

2.2. OpenStreetMap polygons

OpenStreetMap is a free, editable geographic database of the world. For this research, data from the OSM database was taken for the year 2020. The database contains polygons for various areas – buildings, rivers, lakes, states, forests, etc. Forest polygons from the database can easily be converted to shapefile, GeoJSON, or any other geospatial vector file format. This is administrative information: if the database returns a polygon with a forest, it does not necessarily mean that there is a forest in that area, only that there should be one. The opposite holds as well – small patches of forest might not be marked with polygons, which again introduces ambiguity. Since OSM is massive in scope, small inaccuracies inevitably arise, and data takes longer to be updated. This becomes especially apparent with forest clearings that are still officially marked as forest areas, as shown in Figure 1.

Figure 1: Forest with clearing and mask generated from OSM polygons

2.3. Copernicus pan-European layers

The Copernicus pan-European HRL portfolio provides detailed land monitoring information, including the HRL Forest layers. The approach to constructing the HRL Forest layers is based on a random forest classifier and is able to handle outliers in forest classification problems to a certain extent, achieving an accuracy of more than 98%. Unfortunately, implementing such an approach requires intensive initial data preparation from different sources, including the Sentinel missions and ancillary data sources such as the land parcel identification system (LPIS), OSM data, and other local data sources. Validation likewise requires semi-automatic steps [11]. HRL data is produced only every three years, and the latest available forest coverage data is from 2018. The HRL forest coverage data can be obtained as raster files separated by European country. This information can be used to create forest/non-forest pixel-based masks for training datasets, which may be treated as valid information and used as ground truth labels for the periods of layer construction. Copernicus provides three main forest layers – tree cover density (TCD), dominant leaf type (DLT), and forest type product (FTY). In this work, the TCD layer is used.

2.4. Mosaic of the study area

The mosaic of the study area is a single raster image merged from multiple S2 images after preprocessing, which includes cropping the S2 images into small parts and merging them. Although a single S2 image covers only a part of the study area, it may contain clouds. Areas of an image that contain clouds are unusable – no forest can be classified over them – hence the need to "remove" these clouds from the study area. The removal method consists of cropping a single S2 image into small parts and then, using the cloud mask provided with S2 products, ignoring the parts that contain clouds. If all images covering a given area contain clouds, the image with the lowest amount of cloudiness is selected. In this study, the mosaic was created from images from the summer of 2018; this year was chosen to align with the latest HRL data. Since S2 images are heavily impacted by clouds and cloud shadows, priority was given to the month whose images had the lowest percentage of cloud cover. June provided the most images with a low distribution of clouds; hence, the study area mosaic is comprised of images from June. In total, 21 S2 images were used to create the single raster of the study area. The created mosaic is shown in Figure 2.

Figure 2: The study area of Lithuania. Image comprised of S2 data

2.5. Randomly generating points

One of the main advantages of automatically generating datasets is the ability to change the size of the dataset easily. Additionally, specific areas of interest can be selected from which to generate datasets. Within these areas, points can be specified manually or selected at random. In the present case, points were generated randomly within the entire study area. The raster of the territory of Lithuania is georeferenced, so its latitude and longitude boundaries can be extracted. These boundaries are used to generate two random floating-point numbers, one for latitude and one for longitude, which together make up a candidate point. It is then checked whether the generated point lies within the study area polygon. After generating the required number of random points inside the area of interest, these points are used to crop fixed-size images out of the study area. Using this method, a subset of random images can be created; the subsets are then used as the basis for new datasets. The selected points in the study area are presented in Figure 3.

Figure 3: Randomly generated points within the polygon of the study area. The image displays 1600 generated points

2.6. Generating datasets

In the scope of this paper, three random subsets of points were generated, consisting of 800, 1600, and 3200 points respectively. For each point, an image sized 200x200 pixels is generated. The S2 TCI images have 10 m spatial resolution, so a single image forms a square with a side of 2000 meters and an area of 4 km². To create the datasets, each subset of images is duplicated so that mirrored datasets can be created; the only difference between the mirrored datasets is their masks. Every image in a dataset has a mask image, which contains the classification of every pixel of the original image. For example, from the 800 randomly generated points, two datasets were generated – 800 images with OSM masks and 800 images with HRL masks. Finally, each dataset was split into 9/10 training images and 1/10 validation images. Figure 4 provides several examples of the most noticeable differences among the generated masks. In the first example, the OSM database provides a generalized forest area that does not take any forest clearings into account, whereas HRL does. The second example shows that OSM fails to precisely identify forests along the river. The last example contains not a single larger forest area but small patches of forest, and again OSM is at a disadvantage, lacking a substantial number of polygons to identify the small forest patches.

Figure 4: S2 image and generated forest (green) and non-forest (black) masks; a) True Color Image (TCI), 10 m spatial resolution; b) masks generated from OSM data; c) masks generated from HRL data

2.7. Evaluation dataset

For evaluation, two new unique datasets were generated, one based on HRL data and the other on OSM data. These datasets were created using the same principle as the training datasets; a single dataset is made up of 200 images. After training, all models are additionally evaluated using these datasets, which means that both HRL and OSM are regarded as ground truth during evaluation. The evaluation datasets introduce new images that the models have not processed, to test their accuracy. Additionally, both datasets allow evaluating the models against the same data, since during training each model has its own validation subset.

2.8. Training model

The main goal of this paper is to evaluate the differences between two pixel-based classification datasets, which means that during training the same model has to be used with all datasets. A fully convolutional network model with a ResNet-50 backbone has been selected. The model distinguishes itself by its speed, which is convenient when training multiple pixel-based classification models on different datasets.

3. Methods

3.1. Overview

To compare the two datasets and their precision, pixel-based classification is performed. Figure 5 provides the general workflow.

Figure 5: Workflow of automatically creating datasets and testing their performance

The workflow consists of:
1. Gathering S2 images for the study area from the Copernicus Open Access Hub.
2. Preprocessing, which involves cloud removal and reprojecting to the WGS84 coordinate system.
3. Forming a cloudless single raster mosaic of the study area.
4. Generating a specified number of random images from the study area.
5. Gathering HRL images from the Copernicus Land Monitoring Service.
6. Converting the pan-European raster into a pixel-based forest/non-forest mask.
7. Generating a complete dataset from the HRL raster and the list of random images.
8. Gathering forest polygons from the OSM database.
9. Generating a GeoJSON file that contains the required forest polygons.
10. Generating a complete dataset from the OSM polygons and the list of random images.
11. Feeding the datasets to an FCN model.
12. Obtaining a trained FCN model.
13. Generating a validation-only dataset from a new list of random images in the study area and the HRL raster.
14. Testing the trained FCN model's accuracy against the validation dataset.
15. Checking the evaluation results.

3.2. Calculating accuracy

Accuracy during training and evaluation is calculated using pixel accuracy and mean intersection over union (MIoU). Although pixel accuracy is the more common metric, it suffers when the predicted images have a class imbalance. For example, suppose an image consists of 100 pixels, 90 of which are non-forest and 10 forest, and a trained model predicts that all 100 pixels are non-forest. Pixel accuracy will be 90%. However, the intersection over union of the forest and non-forest classes will be 0% and 90% respectively, and the mean over both classes drops to 45%. In this setting MIoU is the more informative metric, since the datasets contain randomly generated images, which can lead to a severe class imbalance in a single image. Both accuracy metrics are therefore reported in this research.

Pixel accuracy equation:

a = (TP + TN) / (TP + TN + FP + FN)    (1)

where TP, TN – correctly classified forest and non-forest pixels, and FP, FN – incorrectly classified forest and non-forest pixels.

Intersection over union equation:

IoU = |P ∩ A| / |P ∪ A|    (2)

Mean intersection over union equation:

mIoU = (IoU_forest + IoU_non-forest) / 2    (3)

where P – predicted pixels of a class, A – actual pixels of that class.

4. Results

4.1. Training results

Each model was trained for 1000 epochs on its own dataset. Validation masks were created from the model's own data source (OSM models used OSM polygons, pan-European models the HRL raster). In Figure 6 we can see that pixel accuracy is generally similar across all datasets. MIoU, however, varies more for OpenStreetMap. The lower MIoU can be attributed to inaccuracies in OSM: validation data from this source can contain forested areas that are not marked as forest, which impacts the validation results. Increasing the size of the training dataset also produced better overall validation results during training.

Figure 6: Validation results with the 800-, 1600-, and 3200-point datasets: a) using pixel accuracy; b) using MIoU

4.2. Evaluation results

All trained models were evaluated using two different datasets, one based on HRL data and the other on OSM data. Table 1 provides evaluation results for the HRL-based testing dataset, whereas Table 2 provides evaluation results for the OSM-based dataset. Based on the results, both kinds of models have adjusted well to their training datasets.
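The two metrics defined in Section 3.2 (Eqs. 1–3) can be sketched in a few lines of plain Python. The masks below are illustrative and reproduce the 100-pixel class-imbalance example: 90 non-forest pixels, 10 forest pixels, and a model that predicts non-forest everywhere.

```python
# Pixel accuracy and mean IoU for binary forest/non-forest masks.
# Masks are flat lists; 1 = forest, 0 = non-forest.

def pixel_accuracy(pred, truth):
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def mean_iou(pred, truth, classes=(0, 1)):
    ious = []
    for c in classes:
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        # A class absent from both masks contributes a perfect score.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

# Worked example from Section 3.2.
truth = [0] * 90 + [1] * 10
pred = [0] * 100

print(pixel_accuracy(pred, truth))  # 0.9
print(mean_iou(pred, truth))        # (0.9 + 0.0) / 2 = 0.45
```

As in the worked example, a 90% pixel accuracy can coexist with a 45% MIoU, which is why both metrics are reported side by side in the tables below.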
When evaluating the HRL-trained models on the newly created HRL evaluation dataset, they perform better than the models trained with OSM data; when the evaluation is done the other way around, the OSM-trained models perform better. Additionally, models with larger training datasets had slightly better accuracy, especially when evaluated against a dataset from the same source. Both kinds of models reach a similar accuracy ceiling of ~0.92 pixel accuracy and ~0.84 MIoU when tested against the evaluation dataset from their own source. Based on these evaluation results, it cannot be stated that either HRL or OSM is the better source of ground truth. A direct comparison of these results with the referenced papers cannot be conducted, because different data is regarded as ground truth.

Table 1
HRL-based evaluation results

Data source   Dataset size   Pixel accuracy   MIoU
HRL           800            0.891            0.798
HRL           1600           0.907            0.828
HRL           3200           0.917            0.843
OSM           800            0.844            0.717
OSM           1600           0.859            0.742
OSM           3200           0.854            0.734

Table 2
OSM-based evaluation results

Data source   Dataset size   Pixel accuracy   MIoU
HRL           800            0.872            0.758
HRL           1600           0.861            0.741
HRL           3200           0.859            0.738
OSM           800            0.902            0.802
OSM           1600           0.910            0.819
OSM           3200           0.921            0.838

4.3. Noticeable differences

Although the evaluation results are very similar, certain differences can be identified by visually inspecting how the models predict edge cases. Figure 7 shows how the trained models compare. The first example shows that the OSM-trained model ignores forest clearings, while the HRL-trained model recognizes clearings, albeit not very precisely. Both models still suffered heavy inaccuracies when recognizing forest clearings in large forested areas: the models would simply mark the entire area as forest and ignore the clearings. The second example provides evidence of the pan-European data being better at recognizing forest areas along rivers. Since OSM rarely provides forest polygons for areas along rivers and lakes, the HRL-trained models are better at recognizing them. When it comes to small rivers, OSM-trained models tend to completely ignore the forest areas around the river, while HRL-trained models have a recurring issue of identifying the river itself as forest. The last example shows how OSM struggles to recognize small forest patches; this is probably the most noticeable difference of all. The pan-European data, on the other hand, is very good at identifying these patches, although it can at times mark larger areas that extend beyond the bounds of the small forest patches.

Figure 7: Examples of trained model classification: a) S2 images, 10 m spatial resolution; b) classification by the model trained with OSM data; c) classification by the model trained with HRL data

5. Conclusion

Six pixel-based forest/non-forest classification datasets were generated, three based on OSM data and three on HRL data, in order to evaluate the applicability of open access data for dataset generation. Each dataset was used to train a model representing it, and after training the models were evaluated using additional evaluation datasets. The evaluation showed that both data sources yielded similar numerical accuracy results: both provided data accurate enough for the models to reach ~0.92 pixel accuracy and ~0.84 MIoU when evaluated with datasets from their own data source. It was also noted that increasing the training dataset size increased the accuracy in the same-source evaluation. Additional visual inspection of edge cases showed that models trained with OSM datasets tend to produce false negative classifications of forest areas along rivers and of small forest patches scattered in an area. Models trained with HRL datasets were better at classifying forest clearings, forest areas along rivers, and small scattered forest patches; however, HRL-trained models could produce false positive classifications, identifying parts of rivers as forest. The numerical differences between the two data sources proved to be negligible, so neither source can be regarded as worse than the other. Although HRL data is produced only once every three years, visual inspection of the generated dataset masks and of the model-classified masks shows that it is better at capturing fine details in remote sensing images. Taking this into account, a pixel-based classification model can be trained using the 2018 data and then used to classify newer or older remote sensing data by year, which is especially important for the HRL dataset, which is expensive to prepare and is provided only once every three years.

6. Data availability statement

The datasets generated during this research, both training and evaluation, together with the complete study area mosaic and the HRL raster of the study area, can be found at https://zenodo.org/record/6548615 (accessed on 20 May 2022). OSM data can be found at https://planet.openstreetmap.org (accessed on 20 May 2022).

7. References

[1] M. K. Nesha et al., "An assessment of data sources, data quality and changes in national forest monitoring capacities in the Global Forest Resources Assessment 2005-2020," Environmental Research Letters, vol. 16, no. 5, IOP Publishing Ltd, May 2021, doi: 10.1088/1748-9326/abd81b.
[2] M. A. Zambrano-Monserrate, C. Carvajal-Lara, R. Urgilés-Sanchez, and M. A. Ruano, "Deforestation as an indicator of environmental degradation: Analysis of five European countries," Ecological Indicators, vol. 90, pp. 1–8, Jul. 2018, doi: 10.1016/j.ecolind.2018.02.049.
[3] S. T. Thompson and W. B. Magrath, "Preventing illegal logging," Forest Policy and Economics, vol. 128, Elsevier B.V., Jul. 2021, doi: 10.1016/j.forpol.2021.102479.
[4] Z. Malenovský et al., "Sentinels for science: Potential of Sentinel-1, -2, and -3 missions for scientific observations of ocean, cryosphere, and land," Remote Sensing of Environment, vol. 120, pp. 91–101, May 2012, doi: 10.1016/j.rse.2011.09.026.
[5] N. Puletti, F. Chianucci, and C. Castaldi, "Use of Sentinel-2 for forest classification in Mediterranean environments," Annals of Silvicultural Research, vol. 42, no. 1, pp. 32–38, 2018, doi: 10.12899/ASR-1463.
[6] E. Grabska, P. Hostert, D. Pflugmacher, and K. Ostapowicz, "Forest stand species mapping using the Sentinel-2 time series," Remote Sensing, vol. 11, no. 10, May 2019, doi: 10.3390/rs11101197.
[7] M. Persson, E. Lindberg, and H. Reese, "Tree species classification with multi-temporal Sentinel-2 data," Remote Sensing, vol. 10, no. 11, Nov. 2018, doi: 10.3390/rs10111794.
[8] M. Immitzer, F. Vuolo, and C. Atzberger, "First experience with Sentinel-2 data for crop and tree species classifications in central Europe," Remote Sensing, vol. 8, no. 3, 2016, doi: 10.3390/rs8030166.
[9] J. Estima and M. Painho, "Exploratory analysis of OpenStreetMap for land use classification," in GEOCROWD 2013 – Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, 2013, pp. 39–46, doi: 10.1145/2534732.2534734.
[10] A. Dostálová, M. Lang, J. Ivanovs, L. T. Waser, and W. Wagner, "European wide forest classification based on Sentinel-1 data," Remote Sensing, vol. 13, no. 3, pp. 1–27, Feb. 2021, doi: 10.3390/rs13030337.
[11] European Environment Agency (EEA), "Copernicus Land Monitoring Service User Manual," 2018. [Online]. Available: https://land.copernicus.eu/