Pixel-based forest classification of Sentinel-2 images using automatically generated datasets

Arminas Šidlauskas¹, Andrius Kriščiūnas¹
¹ Kaunas University of Technology, K. Donelaičio str. 73, LT-44249 Kaunas, Lithuania

Abstract
Remote sensing tools are becoming popular for gathering information about changes in forest areas. The European Space Agency has launched multiple Sentinel satellites for land and marine monitoring. The Sentinel-2 (S2) satellite offers strong forest monitoring capabilities with its 13 high-resolution spectral bands, which enable high-accuracy pixel-based classification. To train a model well suited to recognizing forested areas in S2 images, a solid training dataset must be provided. In this study, two different information sources, Copernicus High Resolution Layers (HRL) and OpenStreetMap (OSM), were used to automatically create datasets. Models were trained and evaluated using the same artificial neural network architecture. Analysis showed that the OSM-trained and HRL-trained models yielded similar numerical evaluation results: both adjusted well to their data source and reached around 0.92 pixel accuracy. Closer visual inspection showed that OSM-trained models produced more false negatives, missing small forest patches and forest areas along rivers and lakes, whereas HRL-trained models produced more false positives, at times classifying not only the areas along rivers but the rivers themselves as forest. All models failed to properly identify forest clearings in large forest areas, although the HRL-trained models provided slightly better results.

Keywords
Forest classification, Sentinel-2 imagery, fully convolutional network, Copernicus High Resolution Layers, OpenStreetMap

IVUS 2022: 27th International Conference on Information Technology
EMAIL: sidlauskasarminas@gmail.com (A. Šidlauskas); andrius.krisciunas@ktu.lt (A. Kriščiūnas)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Monitoring of forest areas is carried out continuously at global and national scales. Field monitoring methods are not sufficient to track changes on a continuous basis, so there is a need to automate the process while achieving the highest possible accuracy. The use of remote sensing tools to monitor forest cover is increasing worldwide [1]. Among the major drivers for frequent forest monitoring are deforestation [2] and illegal logging [3].

The European Space Agency's Sentinel satellites are well suited for global forest observation. Their main advantages for forest monitoring are the long-term delivery of satellite imagery, global and frequent coverage, good data accessibility for the general public, and a wide variety of observation methods (radar, spectral bands) [4].

Sentinel-2 (S2) mission satellites provide 13 high-resolution spectral bands for land and sea monitoring. These bands and their combinations have already been used in various ways to classify forests [5, 6, 7, 8]. Reference [5] evaluates S2 capabilities to classify forest categories and European Forest Types in the Mediterranean area, [6] evaluates the performance of dense S2 time series for forest species mapping in a challenging mountainous environment, [7] investigates the use of multi-temporal S2 data to identify tree species, and [8] assesses the suitability of S2 data for typical land cover classifications (crop and forest). Often these classification tasks are completed using machine learning.
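To make the idea of pixel-based classification concrete, a toy sketch follows. It is not the method used in this study, and the reflectance values and the 0.6 threshold are made up for illustration: each pixel is classified independently by thresholding the NDVI computed from the red and near-infrared bands.

```python
import numpy as np

# Toy 2x2 scene: red and near-infrared reflectance (values are illustrative)
red = np.array([[0.05, 0.30],
                [0.04, 0.25]])
nir = np.array([[0.45, 0.32],
                [0.50, 0.28]])

# NDVI = (NIR - Red) / (NIR + Red); dense vegetation has a high NDVI
ndvi = (nir - red) / (nir + red)

# Per-pixel decision: 1 = forest, 0 = non-forest (threshold is illustrative)
forest_mask = (ndvi > 0.6).astype(np.uint8)
print(forest_mask)  # [[1 0]
                    #  [1 0]]
```

A learned classifier (e.g., the random forest or FCN models discussed below) replaces this fixed threshold with a decision rule fitted over many bands, but the per-pixel structure of the problem is the same.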
In the referenced studies, a supervised random forest (RF) algorithm has been applied for pixel-based classification. To train a precise model, a good dataset is required; failure to prepare one can result in an inaccurate classification model. Studies often use national data provided by forest or statistics agencies [5, 6, 7], which can then be further processed manually [6]. This data is provided in polygon form, and the polygons are used to classify an area as forest or as a specific forest species. Information about land use classification can also be obtained from OSM [9], which likewise provides forest polygons, similar to the national data. In other cases, Copernicus High Resolution Layers are used [10]; these layers, provided in raster form, are processed to act as pixel-based masks. The model accuracy in these papers varies from 83% to 95% when evaluated using pixel accuracy metrics.

Preparation of a precise dataset can take a long time if classification is done by hand or if institutional data is used. The latter can be outdated or incomplete, which can severely restrict the ability to create a good dataset for certain areas; additionally, different states may restrict access to such data. From this arises the need for open access data that can always be accessed and is constantly updated. In this work, two open access data sources suitable for automatic ground-truth mask generation are compared to evaluate their applicability for direct use with the selected machine learning model. Both HRL and OSM data sources are used as ground truth during evaluation. Accuracy has been tested using Copernicus S2 True Color Images (TCI) collected in the summertime. Pixel-based classification was performed with a fully convolutional network (FCN) model with a ResNet-50 backbone.

2. Materials

2.1. Study area

Lithuania has been selected as the study area. The territory of Lithuania consists mostly of flatlands with lakes, swamps, and forests; the dominant species are pine, spruce, and birch. Lithuania covers an area of 65 300 km². The main reason for limiting the study area to one country is to avoid introducing new forest types during training and evaluation.

2.2. OpenStreetMap polygons

OpenStreetMap is a free, editable geographic database of the world. For this research, data from the OSM database was taken for the year 2020. The database contains polygons for various areas – buildings, rivers, lakes, states, forests, etc. Forest polygons from the database can easily be converted to shapefile, GeoJSON, or any other geospatial vector file format. This is administrative information: if the database returns a polygon with a forest, it does not necessarily mean that there is a forest in that area, only that there should be one. The opposite holds as well – small patches of forest might not be marked with polygons, which again introduces ambiguity. Since OSM is massive in scope, small inaccuracies inevitably arise, and data takes longer to be updated. This becomes especially apparent with forest clearings that are still officially marked as forest areas, as shown in Figure 1.

Figure 1: Forest with clearing and mask generated from OSM polygons

2.3. Copernicus pan-European layers

The Copernicus pan-European HRL portfolio provides detailed land monitoring information, including the HRL Forest layers. The approach to constructing the HRL Forest layers is based on a random forest classifier and is able to handle outliers in forest classification problems to a certain extent, achieving an accuracy of more than 98%. Unfortunately, implementing such an approach requires intensive initial data preparation from different sources, including the Sentinel missions and ancillary data sources such as the land parcel identification system (LPIS), OSM data, and other local data sources. Validation likewise requires semi-automatic steps [11]. HRL data is produced only every three years, and the latest available forest coverage data is from 2018. The HRL forest coverage data can be obtained as raster files separated by European country. This information can be used to create forest/non-forest pixel-based masks for training datasets, which may be treated as valid information and used as ground truth labels for the periods of layer construction. Copernicus provides three main forest layers – tree cover density (TCD), dominant leaf type (DLT), and forest type product (FTY). In this work, the TCD layer is used.

2.4. Mosaic of the study area

The mosaic of the study area is a single raster image merged from multiple S2 images after preprocessing, which includes cropping the S2 images into small parts and merging them. Although a single S2 image covers only a part of the study area, it may contain clouds. Areas of an image that contain clouds are unusable – no forest can be classified over them – hence the need to "remove" these clouds from the study area. The removal method consists of cropping a single S2 image into small parts and then, using the cloud mask provided with S2 products, ignoring the parts that contain clouds. If all images covering a given area contain clouds, the image with the lowest amount of cloudiness is selected. In this study, the mosaic was created from images from the summer of 2018; this year was chosen to align with the latest HRL data. Since S2 images are heavily impacted by clouds and cloud shadows, priority was given to the month whose images had the lowest percentage of cloud cover. June provided the most images with a low distribution of clouds; hence, the study area mosaic is comprised of images from June. In total, 21 S2 images were used to create the single raster of the study area. The created mosaic is shown in Figure 2.

Figure 2: The study area of Lithuania. Image comprised of S2 data

2.5. Randomly generating points

One of the main advantages of automatically generating datasets is the ability to change the size of the dataset easily. Additionally, specific areas of interest can be selected from which to generate datasets. Within these areas, points can be specified manually or selected at random. In the present case, points were generated randomly within the entire study area. The raster of the territory of Lithuania is georeferenced, so its latitude and longitude boundaries can be extracted. These boundaries are used to generate two random floating-point numbers, one for latitude and one for longitude, which together make up a candidate point. It is then checked whether the generated point lies within the study area polygon. After generating the required number of random points inside the area of interest, these points are used to crop fixed-size images out of the study area. Using this method, a subset of random images can be created; the subsets are then used as the basis for new datasets. The selected points in the study area are presented in Figure 3.

Figure 3: Randomly generated points within the polygon of the study area. The image displays 1600 generated points

2.6. Generating datasets

In the scope of this paper, three random subsets of points were generated, consisting of 800, 1600, and 3200 points respectively. For each point, an image sized 200x200 pixels is generated. The S2 TCI images have 10 m spatial resolution, so a single image forms a square with a side of 2000 meters and an area of 4 km². To create the datasets, each subset of images is duplicated so that mirrored datasets can be created; the only difference between the mirrored datasets is their masks. Every image in a dataset has a mask image, which contains the classification of every pixel of the original image. For example, from the 800 randomly generated points, two datasets were generated – 800 images with OSM masks and 800 images with HRL masks. Finally, each dataset was split into 9/10 training images and 1/10 validation images. Figure 4 provides several examples of the most noticeable differences among the generated masks. In the first example, the OSM database provides a generalized forest area that does not take any forest clearings into account, whereas HRL does. The second example shows that OSM fails to precisely identify forests along the river. The last example contains not a single larger forest area but small patches of forest, and again OSM is at a disadvantage, lacking a substantial number of polygons to identify the small forest patches.

Figure 4: S2 image and generated forest (green) and non-forest (black) masks; a) True Color Image (TCI), 10 m spatial resolution; b) masks generated from OSM data; c) masks generated from HRL data

2.7. Evaluation dataset

For evaluation, two new unique datasets were generated, one based on HRL data and the other on OSM data. These datasets were created using the same principle as the training datasets; a single dataset is made up of 200 images. After training, all models are additionally evaluated using these datasets, which means that both HRL and OSM are regarded as ground truth during evaluation. The evaluation datasets introduce new images that the models have not processed, to test their accuracy. Additionally, both datasets allow evaluating the models against the same data, since during training each model has its own validation subset.

2.8. Training model

The main goal of this paper is to evaluate the differences between two pixel-based classification datasets, which means that during training the same model has to be used with all datasets. A fully convolutional network model with a ResNet-50 backbone has been selected. The model distinguishes itself by its speed, which is convenient when training multiple pixel-based classification models on different datasets.

3. Methods

3.1. Overview

To compare the two datasets and their precision, pixel-based classification is performed. Figure 5 provides the general workflow.

Figure 5: Workflow of automatically creating datasets and testing their performance

The workflow consists of:
1. Gathering S2 images for the study area from the Copernicus Open Access Hub.
2. Preprocessing, which involves cloud removal and reprojecting to the WGS84 coordinate system.
3. Forming a cloudless single raster mosaic of the study area.
4. Generating a specified number of random images from the study area.
5. Gathering HRL images from the Copernicus Land Monitoring Service.
6. Converting the pan-European raster into a pixel-based forest/non-forest mask.
7. Generating a complete dataset from the HRL raster and the list of random images.
8. Gathering forest polygons from the OSM database.
9. Generating a GeoJSON file that contains the required forest polygons.
10. Generating a complete dataset from the OSM polygons and the list of random images.
11. Feeding the datasets to an FCN model.
12. Obtaining a trained FCN model.
13. Generating a validation-only dataset from a new list of random images in the study area and the HRL raster.
14. Testing the trained FCN model's accuracy against the validation dataset.
15. Checking the evaluation results.

3.2. Calculating accuracy

Accuracy during training and evaluation is calculated using pixel accuracy and mean intersection over union (MIoU). Although pixel accuracy is the more common metric, it suffers when the predicted images have a class imbalance. For example, suppose an image consists of 100 pixels, 90 of which are non-forest and 10 forest, and a trained model predicts that all 100 pixels are non-forest. Pixel accuracy will be 90%. However, the intersection over union of the forest and non-forest classes will be 0% and 90% respectively, and the mean over both classes drops to 45%. In this setting MIoU is the more informative metric, since the datasets contain randomly generated images, which can lead to a severe class imbalance in a single image. Both accuracy metrics are therefore reported in this research.

Pixel accuracy equation:

a = (TP + TN) / (TP + TN + FP + FN)    (1)

where TP, TN – correctly classified forest and non-forest pixels, and FP, FN – incorrectly classified forest and non-forest pixels.

Intersection over union equation:

IoU = |P ∩ A| / |P ∪ A|    (2)

Mean intersection over union equation:

mIoU = (IoU_forest + IoU_non-forest) / 2    (3)

where P – predicted pixels of a class, A – actual pixels of that class.

4. Results

4.1. Training results

Each model was trained for 1000 epochs on its own dataset. Validation masks were created from the model's own data source (OSM models used OSM polygons, pan-European models the HRL raster). In Figure 6 we can see that pixel accuracy is generally similar across all datasets. MIoU, however, varies more for OpenStreetMap. The lower MIoU can be attributed to inaccuracies in OSM: validation data from this source can contain forested areas that are not marked as forest, which impacts the validation results. Increasing the size of the training dataset also produced better overall validation results during training.

Figure 6: Validation results with the 800-, 1600-, and 3200-point datasets: a) using pixel accuracy; b) using MIoU

4.2. Evaluation results

All trained models were evaluated using two different datasets, one based on HRL data and the other on OSM data. Table 1 provides evaluation results for the HRL-based testing dataset, whereas Table 2 provides evaluation results for the OSM-based dataset. Based on the results, both kinds of models have adjusted well to their training datasets.
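The two metrics defined in Section 3.2 (Eqs. 1–3) can be sketched in a few lines of plain Python. The masks below are illustrative and reproduce the 100-pixel class-imbalance example: 90 non-forest pixels, 10 forest pixels, and a model that predicts non-forest everywhere.

```python
# Pixel accuracy and mean IoU for binary forest/non-forest masks.
# Masks are flat lists; 1 = forest, 0 = non-forest.

def pixel_accuracy(pred, truth):
    correct = sum(p == t for p, t in zip(pred, truth))
    return correct / len(truth)

def mean_iou(pred, truth, classes=(0, 1)):
    ious = []
    for c in classes:
        inter = sum(p == c and t == c for p, t in zip(pred, truth))
        union = sum(p == c or t == c for p, t in zip(pred, truth))
        # A class absent from both masks contributes a perfect score.
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

# Worked example from Section 3.2.
truth = [0] * 90 + [1] * 10
pred = [0] * 100

print(pixel_accuracy(pred, truth))  # 0.9
print(mean_iou(pred, truth))        # (0.9 + 0.0) / 2 = 0.45
```

As in the worked example, a 90% pixel accuracy can coexist with a 45% MIoU, which is why both metrics are reported side by side in the tables below.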
When evaluating the HRL-trained models on the newly created HRL evaluation dataset, they perform better than the models trained with OSM data; when the evaluation is done the other way around, the OSM-trained models perform better. Additionally, models with larger training datasets had slightly better accuracy, especially when evaluated against a dataset from the same source. Both kinds of models reach a similar accuracy ceiling of ~0.92 pixel accuracy and ~0.84 MIoU when tested against the evaluation dataset from their own source. Based on these evaluation results, it cannot be stated that either HRL or OSM is the better source of ground truth. A direct comparison of these results with the referenced papers cannot be conducted, because different data is regarded as ground truth.

Table 1
HRL-based evaluation results

Data source   Dataset size   Pixel accuracy   MIoU
HRL           800            0.891            0.798
HRL           1600           0.907            0.828
HRL           3200           0.917            0.843
OSM           800            0.844            0.717
OSM           1600           0.859            0.742
OSM           3200           0.854            0.734

Table 2
OSM-based evaluation results

Data source   Dataset size   Pixel accuracy   MIoU
HRL           800            0.872            0.758
HRL           1600           0.861            0.741
HRL           3200           0.859            0.738
OSM           800            0.902            0.802
OSM           1600           0.910            0.819
OSM           3200           0.921            0.838

4.3. Noticeable differences

Although the evaluation results are very similar, certain differences can be identified by visually inspecting how the models predict edge cases. Figure 7 shows how the trained models compare. The first example shows that the OSM-trained model ignores forest clearings, while the HRL-trained model recognizes clearings, albeit not very precisely. Both models still suffered heavy inaccuracies when recognizing forest clearings in large forested areas: the models would simply mark the entire area as forest and ignore the clearings. The second example provides evidence of the pan-European data being better at recognizing forest areas along rivers. Since OSM rarely provides forest polygons for areas along rivers and lakes, the HRL-trained models are better at recognizing them. When it comes to small rivers, OSM-trained models tend to completely ignore the forest areas around the river, while HRL-trained models have a recurring issue of identifying the river itself as forest. The last example shows how OSM struggles to recognize small forest patches; this is probably the most noticeable difference of all. The pan-European data, on the other hand, is very good at identifying these patches, although it can at times mark larger areas that extend beyond the bounds of the small forest patches.

Figure 7: Examples of trained model classification: a) S2 images, 10 m spatial resolution; b) classification by the model trained with OSM data; c) classification by the model trained with HRL data

5. Conclusion

Six pixel-based forest/non-forest classification datasets were generated, three based on OSM data and three on HRL data, in order to evaluate the applicability of open access data for dataset generation. Each dataset was used to train a model representing it, and after training the models were evaluated using additional evaluation datasets. The evaluation showed that both data sources yielded similar numerical accuracy results: both provided data accurate enough for the models to reach ~0.92 pixel accuracy and ~0.84 MIoU when evaluated with datasets from their own data source. It was also noted that increasing the training dataset size increased the accuracy in the same-source evaluation. Additional visual inspection of edge cases showed that models trained with OSM datasets tend to produce false negative classifications of forest areas along rivers and of small forest patches scattered in an area. Models trained with HRL datasets were better at classifying forest clearings, forest areas along rivers, and small scattered forest patches; however, HRL-trained models could produce false positive classifications, identifying parts of rivers as forest. The numerical differences between the two data sources proved to be negligible, so neither source can be regarded as worse than the other. Although HRL data is produced only once every three years, visual inspection of the generated dataset masks and of the model-classified masks shows that it is better at capturing fine details in remote sensing images. Taking this into account, a pixel-based classification model can be trained using the 2018 data and then used to classify newer or older remote sensing data by year, which is especially important for the HRL dataset, which is expensive to prepare and is provided only once every three years.

6. Data availability statement

The datasets generated during this research, both training and evaluation, together with the complete study area mosaic and the HRL raster of the study area, can be found at https://zenodo.org/record/6548615 (accessed on 20 May 2022). OSM data can be found at https://planet.openstreetmap.org (accessed on 20 May 2022).

7. References

[1] M. K. Nesha et al., "An assessment of data sources, data quality and changes in national forest monitoring capacities in the Global Forest Resources Assessment 2005-2020," Environmental Research Letters, vol. 16, no. 5, IOP Publishing Ltd, May 2021, doi: 10.1088/1748-9326/abd81b.
[2] M. A. Zambrano-Monserrate, C. Carvajal-Lara, R. Urgilés-Sanchez, and M. A. Ruano, "Deforestation as an indicator of environmental degradation: Analysis of five European countries," Ecological Indicators, vol. 90, pp. 1–8, Jul. 2018, doi: 10.1016/j.ecolind.2018.02.049.
[3] S. T. Thompson and W. B. Magrath, "Preventing illegal logging," Forest Policy and Economics, vol. 128, Elsevier B.V., Jul. 2021, doi: 10.1016/j.forpol.2021.102479.
[4] Z. Malenovský et al., "Sentinels for science: Potential of Sentinel-1, -2, and -3 missions for scientific observations of ocean, cryosphere, and land," Remote Sensing of Environment, vol. 120, pp. 91–101, May 2012, doi: 10.1016/j.rse.2011.09.026.
[5] N. Puletti, F. Chianucci, and C. Castaldi, "Use of Sentinel-2 for forest classification in Mediterranean environments," Annals of Silvicultural Research, vol. 42, no. 1, pp. 32–38, 2018, doi: 10.12899/ASR-1463.
[6] E. Grabska, P. Hostert, D. Pflugmacher, and K. Ostapowicz, "Forest stand species mapping using the Sentinel-2 time series," Remote Sensing, vol. 11, no. 10, May 2019, doi: 10.3390/rs11101197.
[7] M. Persson, E. Lindberg, and H. Reese, "Tree species classification with multi-temporal Sentinel-2 data," Remote Sensing, vol. 10, no. 11, Nov. 2018, doi: 10.3390/rs10111794.
[8] M. Immitzer, F. Vuolo, and C. Atzberger, "First experience with Sentinel-2 data for crop and tree species classifications in central Europe," Remote Sensing, vol. 8, no. 3, 2016, doi: 10.3390/rs8030166.
[9] J. Estima and M. Painho, "Exploratory analysis of OpenStreetMap for land use classification," in GEOCROWD 2013 – Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information, 2013, pp. 39–46, doi: 10.1145/2534732.2534734.
[10] A. Dostálová, M. Lang, J. Ivanovs, L. T. Waser, and W. Wagner, "European wide forest classification based on Sentinel-1 data," Remote Sensing, vol. 13, no. 3, pp. 1–27, Feb. 2021, doi: 10.3390/rs13030337.
[11] European Environment Agency (EEA), "Copernicus Land Monitoring Service User Manual," 2018. [Online]. Available: https://land.copernicus.eu/