                               Double-Step deep learning framework
                             to improve wildfire severity classification
                Simone Monaco                                              Andrea Pasini                               Daniele Apiletti
           Politecnico di Torino                                       Politecnico di Torino                          Politecnico di Torino
                Torino, Italy                                               Torino, Italy                                  Torino, Italy
     simone.monaco@studenti.polito.it                                 andrea.pasini@polito.it                       daniele.apiletti@polito.it

                  Luca Colomba                                         Alessandro Farasin                                 Paolo Garza
              Politecnico di Torino                                    Politecnico di Torino                         Politecnico di Torino
                   Torino, Italy                                            Torino, Italy                                 Torino, Italy
             luca.colomba@polito.it                                alessandro.farasin@polito.it                      paolo.garza@polito.it

                                                                            Elena Baralis
                                                                        Politecnico di Torino
                                                                             Torino, Italy
                                                                       elena.baralis@polito.it

ABSTRACT
Wildfires are dangerous events which cause huge losses from natural, humanitarian, and economic perspectives. To counter their impact, restoration can be made faster and more accurate through an automatic census of the event in terms of (i) delineation of the affected areas and (ii) estimation of the damage severity, using satellite images. This work extends the state-of-the-art approach, named Double-Step U-Net (DS-UNet), which automatically detects wildfires in satellite acquisitions and associates a damage index on a defined scale. As a deep learning network, the DS-UNet model performance strongly depends on many factors. We focus on alternatives to its main architecture by designing a configurable Double-Step Framework, which allows inspecting the prediction quality with different loss functions and convolutional neural networks used as backbones. Experimental results show that the proposed framework yields better performance, with up to 6.1% lower RMSE than the current state of the art.

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1    INTRODUCTION
In recent years, European countries have witnessed an increasing trend in the occurrence of wildfires. According to the annual report of the European Forest Fire Information System, in 2019 more than 1,600 wildfires were recorded in the European Union: about three times more than the average over the past decade [2, 3]. These events cause large losses not only to forests and animals, but also to human lives and cities. The geographical delineation of the affected regions and the estimation of the damage severity are fundamental for planning a proper environment restoration.
   The European Union is active in natural disaster monitoring and risk management through the Copernicus Emergency Management Service (EMS) platform [1], which provides data about past disasters such as forest wildfires and floods. The census of a hazard is usually performed either manually or semi-automatically using in-situ information, images captured from aircraft, and remote-sensing sensors such as satellites. The latter two data sources can be used to develop computer vision systems, mainly based on neural networks, to automate the entire detection and damage estimation process.
   For this purpose, we use satellite images acquired by the Copernicus Sentinel-2 mission to automatically identify burnt areas [25] and to assess the damage severity without requiring human effort. We can identify two different approaches to address this task: (i) assigning a class label to each pixel of the satellite image (i.e., burnt or unburnt), or (ii) assigning to each pixel an increasing number representing the damage intensity. The former can be modeled with the well-known computer vision task called semantic segmentation, while the latter requires a regression methodology.
   The current state of the art proposes a solution based on Convolutional Neural Networks (CNNs), called Double-Step U-Net (DS-UNet) [8], which involves both binary semantic segmentation and regression to obtain a damage-severity map. Specifically, each pixel is labeled with a numerical value representing the damage level: 0 - No damage, 1 - Negligible to slight damage, 2 - Moderately damaged, 3 - Highly damaged, and 4 - Completely destroyed. The network is trained according to the official hazard annotations, named grading maps, publicly available on Copernicus EMS [1].
   Previous works on semantic segmentation showed that the appropriate configuration of the CNN structure and the choice of the loss functions have significant impacts on the final results [13, 16]. In this work we aim to improve the performance of the Double-Step U-Net by maintaining its base architecture, composed of two separate CNN modules, while assessing different CNNs and loss functions. Hence, we propose the Double-Step Framework (DSF), a configurable architecture whose modules allow an in-depth analysis of the effects of different loss functions and CNNs, comparing the results with the baseline in [8]. Based on this result, we train all our models on portions of satellite images containing burnt areas only.
   Our contribution can be summarized as follows: (i) we define the Double-Step Framework, inspired by the DS-UNet neural network, and (ii) we show detailed experimental results on classification and regression tasks, comparing the different configurations.
   Our paper is organized as follows. Section 2 presents the related works, while Section 3 discusses the neural network model and the proposed variations in terms of deep network backbones and loss functions. Finally, Section 4 shows the experimental results and Section 5 draws conclusions.
                                          Figure 1: Double-Step Framework architecture.


2   RELATED WORK
In this section, we first review previous works on wildfire prediction and severity classification, then we analyze the state-of-the-art architecture addressing the semantic segmentation problem. Finally, we focus on the adopted loss functions, highlighting the differences with the proposed techniques.
   Several previous works monitor the evolution of wildfires during the event to support domain experts. Some of these techniques are implemented via deep learning models [6, 19]. Differently, in this paper we focus on the automatic detection of the involved areas and on damage estimation after the event, by exploiting post-event satellite images only.
   The burnt area identification problem is well known in the remote sensing literature: many different approaches have been proposed and, recently, machine learning and deep learning-based approaches are being considered, such as [10, 20]. Some mapping operations are performed based on in-situ information, such as the Composite Burned Index (CBI) [15], which are time-consuming and require evaluations of the soil and vegetation conditions for the entire area of interest (AoI). Other approaches exploit remote sensing techniques and burnt area indexes: satellites collect information across different bandwidths, some of which are sensitive to water and vegetation. Specifically, we consider the 12 bandwidths available from Sentinel-2 L2A products. Burnt area indexes highlight burnt regions by combining specific bandwidths and possibly comparing pre-fire and post-fire acquisitions: the Normalized Burn Ratio (NBR) [18], the delta Normalized Burn Ratio (dNBR) [17], and the Burned Area Index for Sentinel-2 (BAIS2) [9] are some examples. Different approaches use such indexes to identify damaged areas and eventually assess the severity level [22].
   The methodologies presented so far suffer from a strong dependence on the weather conditions of the satellite acquisitions. Moreover, the usage of indexes to estimate the damage severity level typically requires the manual or semi-manual definition of predefined thresholds that are usually soil-dependent and cannot be easily set. The solutions adopted in this work address the previously mentioned issues by only using post-fire images and applying a supervised prediction approach on pre-labelled severity maps. Specifically, we apply a semantic segmentation model, combined with a regression one, to derive the final result. Many different semantic segmentation architectures have been proposed in the literature [5, 7, 26], but the work in [8] shows that U-Net [21] is a valuable choice for addressing the wildfire damage-severity estimation task.
   The state-of-the-art solution proposes a Double-Step U-Net architecture. This double-step configuration relies on the Dice loss function to learn to predict the boundaries of wildfires, and on the Mean Squared Error (MSE) function to estimate the final severity level. Many other loss functions have been proposed in the literature [12], and several works showed that a correct choice typically makes a real difference in the results [13].

3    DOUBLE-STEP FRAMEWORK
In this section we define the Double-Step Framework (DSF), with the aim of obtaining a configurable architecture based on the Double-Step U-Net working principles. The proposed framework allows a complete customization of both the training loss functions and the backbone neural networks. The main building blocks of the DSF are depicted in Figure 1 and their functionalities are described in the following paragraphs.
   Binary class backbone. This building block has the task of assigning a binary label (i.e., burnt or unburnt) to each pixel of the input image. Its output is a probability map with values in the range [0, 1].
   Binary threshold. The output probabilities of the Binary class backbone are thresholded to obtain the final binary mask, highlighting the regions affected by wildfires. The value of the threshold is fixed to 0.5 in all the experiments.
   Regression backbone. This step aims at deriving a severity map that specifies the damage intensity in the range [0, 4] for each pixel. It takes as input the product between the binary mask and the original input image, in order to consider the satellite image information only for the regions that have been classified as burnt by the Binary class backbone. Indeed, accurate binary masks are fundamental to provide only the information related to regions affected by wildfires. False positives (i.e., unburnt areas classified as burnt) have been shown to negatively affect the regression quality.
   Binary loss. This loss function is exploited to train the Binary class backbone, by comparing its output with the ground-truth binary masks.
   Regression loss. After the training of the Binary class backbone is completed, this loss function is used to train the Regression backbone. During this training phase, the weights of the Binary class backbone are kept constant.
   The Binary class backbone, the Regression backbone, and the two loss functions defined for the DSF can be customized to obtain several configurations. In the following, we present the different options available for these configurable modules, dividing the analysis in two parts: (i) backbone architectures, and (ii) loss functions.
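As an illustration of the data flow described above, the following is a minimal PyTorch-style sketch of the double-step forward pass. The module and variable names are placeholders chosen for this sketch, not the authors' code, and any generic encoder-decoder network can play the role of the two backbones.

    import torch
    import torch.nn as nn

    class DoubleStepSketch(nn.Module):
        """Sketch of the DSF data flow: binary segmentation -> threshold -> masking -> regression."""

        def __init__(self, binary_backbone: nn.Module, regression_backbone: nn.Module,
                     threshold: float = 0.5):
            super().__init__()
            self.binary_backbone = binary_backbone          # outputs burnt probabilities in [0, 1]
            self.regression_backbone = regression_backbone  # outputs severity values (no activation)
            self.threshold = threshold

        def forward(self, x: torch.Tensor):
            # x: (batch, 12, H, W) Sentinel-2 L2A tile
            prob_map = self.binary_backbone(x)                  # (batch, 1, H, W)
            binary_mask = (prob_map > self.threshold).float()   # fixed 0.5 threshold
            masked_input = x * binary_mask                      # keep image content on burnt pixels only
            severity_map = self.regression_backbone(masked_input)  # values expected in [0, 4]
            return prob_map, binary_mask, severity_map

In the two-stage training described above, the Binary class backbone would be optimized first with the Binary loss and then frozen while the Regression backbone is optimized with the Regression loss.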
       Table 1: Loss function selection experiments.

    Config. name     Binary loss               Regression loss
    BCE-MSE          BCE                       MSE
    Dice-MSE         Dice                      MSE
    B+D-MSE          Compound BCE, Dice        MSE
    B+S-MSE          Compound BCE, sIoU        MSE
    BCE-MSE·F1       BCE                       MSE·F1
    sIoU-sIoU        sIoU                      sIoU
    sIoU-MSE         sIoU                      MSE

3.1    Backbone architectures
The Binary class and the Regression backbones can be implemented with a custom encoder-decoder neural network. We propose three different DSF configurations for these modules, obtained by changing the backbone architecture. Specifically, we selected the following models: U-Net [21], U-Net++ [27], and SegU-Net [14]. When choosing one among the proposed backbone architectures, we use the same one for both the Binary class and the Regression backbone. In the next sections of this paper we refer to these configurations as Double-Step U-Net (DS-UNet), Double-Step U-Net++ (DS-UNet++), and Double-Step SegU-Net (DS-SegU).
   The state-of-the-art Double-Step U-Net [8] is exactly reproduced by our framework when choosing the DS-UNet configuration. The U-Net in the Binary class backbone is set up with a sigmoid activation function to generate the probability map, while for the Regression backbone we do not use any activation function, since the output values may range in [0, 4].
   The DS-UNet++ follows the same working principles and differs only in the selected neural network. Specifically, U-Net++ enhances the structure of the standard U-Net by adding convolutional layers at the skip connections between the encoder and the decoder.
   Finally, the DS-SegU configuration exploits another variation of the standard U-Net. In particular, with the SegU-Net network, the skip connections typical of the U-Net are integrated into SegNet [5], which relies on pooling indices to provide information from the encoder to the decoder.
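For illustration only, U-Net and U-Net++ backbones with the input/output shapes used in this work can be instantiated, for example, with the segmentation_models_pytorch library. Note that this library builds U-Nets on top of generic encoders, unlike the vanilla U-Net of [21], and SegU-Net would require a custom implementation, so this is an indicative sketch rather than the exact configuration used in the paper.

    import segmentation_models_pytorch as smp

    # 12 Sentinel-2 bands in input, one output map per backbone.
    # Sigmoid activation for the Binary class backbone, no activation for the Regression one.
    binary_unet = smp.Unet(encoder_name="resnet18", encoder_weights=None,
                           in_channels=12, classes=1, activation="sigmoid")
    regression_unet = smp.Unet(encoder_name="resnet18", encoder_weights=None,
                               in_channels=12, classes=1, activation=None)

    binary_unetpp = smp.UnetPlusPlus(encoder_name="resnet18", encoder_weights=None,
                                     in_channels=12, classes=1, activation="sigmoid")
    regression_unetpp = smp.UnetPlusPlus(encoder_name="resnet18", encoder_weights=None,
                                         in_channels=12, classes=1, activation=None)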
3.2    Loss functions
This section describes the different loss functions that we propose for training the Binary class and the Regression backbones. The complete list of configurations is specified in Table 1. The first column of the table provides the configuration name, used in the experiments in Section 4, while the other two columns specify the corresponding Binary and Regression loss.
   Binary loss. For the Binary loss function we consider Binary Cross Entropy (BCE), Dice, sIoU, and two compound loss functions (i.e., B+D, B+S). In the following we describe the main characteristics of these loss functions.
   The sIoU (soft Intersection over Union) is defined through a per-pixel AND-like operation applied between the ground-truth image and the network estimation to obtain the Intersection, and a per-pixel OR-like operation to obtain the Union. Differently from the standard IoU, the sIoU is computed directly on the probability map predicted by the neural network, without discretizing the values to a binary mask. This allows evaluating the actual distance between the prediction and the ground truth, for a more effective computation of the gradients. The sIoU loss function can be formalized as follows:

   L_sIoU = 1 − I_soft / U_soft,

where I_soft and U_soft are the soft intersection and the soft union, respectively.
   Compound loss functions have been shown to be an effective way of training neural networks [16]. They are typically defined as a weighted sum of standard loss functions. In this work we inspected the effectiveness of B+D, defined as B+D = 0.5 · BCE + 0.5 · Dice, and B+S, defined as B+S = 0.5 · BCE + 0.5 · L_sIoU.
   Regression loss. Since the output values of the Regression backbone span 5 severity levels, for the regression loss we considered a second set of functions. Specifically, we inspected the results obtained with the Mean Squared Error (MSE), a generalization of the sIoU to the multiclass case, and a combination of the MSE and the F1 score.
   In the case of the sIoU, predictions and ground truth are compared by considering separately the pixels corresponding to each severity level. The division of the pixels based on the severity level is made by applying rectangular functions to the matrices. In the case of the network prediction matrix, to avoid a sharp selection of the severity levels (i.e., losing important information for the gradient), we applied smooth rectangular functions. After computing the intersections and the unions between ground truth and predictions, the contribution of each severity level is summed up in the final sIoU function.
   Let Π_c be a sharp rectangular function that takes the value 1 when the input pixel belongs to class c and 0 otherwise. Let σ̃_c(x) be a smooth rectangular function, defined as σ̃_c(x) = σ(ε − |x − c|), where ε = 0.5 and σ is the sigmoid function. The sIoU loss function for the regression case is defined as:

   L_sIoU,reg = Σ_c |Π_c(Y_GT) ∘ σ̃_c(Y_PR)| / Σ_c |Π_c(Y_GT) + σ̃_c(Y_PR) − Π_c(Y_GT) ∘ σ̃_c(Y_PR)|,

where Y_GT is the ground-truth matrix, Y_PR is the prediction matrix, and the symbol ∘ represents the element-wise product between two matrices. Given this definition, for each class, the intersection is represented by the product between the two matrices and the union is given by their sum minus the intersection.
   The last loss function we considered is inspired by the fact that the second network is designed for a regression task, but the final result actually admits a set of classes. Hence, we built a function that both penalizes the distance from the ground truth and favours consistency with the real classes. The two contributions are provided by the MSE loss and by the F1 score obtained on the 5 classes, multiplied together as follows:

   L_MSE·F1 = L_MSE · (1 − F1).
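To make the per-class construction concrete, the following is a small PyTorch sketch of the multiclass sIoU and of the MSE·F1 combination, written from the definitions above. It is an illustrative re-implementation, not the authors' code; in particular, returning 1 minus the soft IoU as the value to minimize and pre-computing the F1 score on the discretized classes are assumptions of this sketch.

    import torch

    def soft_iou_regression_loss(y_pred, y_true, num_classes=5, eps=0.5, smooth=1e-6):
        """Per-class soft IoU built from the sharp (Π_c) and smooth (σ̃_c) rectangular functions."""
        inter, union = 0.0, 0.0
        for c in range(num_classes):
            hard = (y_true == c).float()                    # Π_c: 1 where the ground truth is class c
            soft = torch.sigmoid(eps - (y_pred - c).abs())  # σ̃_c(x) = σ(ε − |x − c|), with ε = 0.5
            inter = inter + (hard * soft).sum()
            union = union + (hard + soft - hard * soft).sum()
        iou = (inter + smooth) / (union + smooth)
        return 1.0 - iou  # assumption: the soft IoU score is turned into a loss as 1 − IoU

    def mse_f1_loss(y_pred, y_true, f1_score):
        """L_MSE·F1 = L_MSE · (1 − F1); f1_score is assumed to be computed on the 5 discretized classes."""
        mse = torch.mean((y_pred - y_true.float()) ** 2)
        return mse * (1.0 - f1_score)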
4    EXPERIMENTAL RESULTS
In this section we provide the evaluation of the proposed Double-Step Framework, inspecting the results of all the configurations described in Section 3. We also show a detailed comparison with other standard encoder-decoder architectures.
   The next subsections are organized as follows. Section 4.1 describes the analyzed dataset, Section 4.2 outlines the experimental setting, while Section 4.3 provides the results to assess the modules of the DSF. Finally, Section 4.4 compares our framework with other single-step architectures. All the final results are obtained using the HPC resources at HPC@PoliTO [4], on a single NVIDIA Tesla V100 SXM2 GPU. The full dataset occupies approximately 5 GB.
                                     Figure 2: Distribution of the 5 severity levels for each fold.

                                 Table 2: IoU of burnt class for the Binary classification backbone.

                                            Model           BCE     Dice ([8])        B+D    B+S    sIoU
                                            DS-UNet         0.80         0.58         0.58   0.38   0.39
                                            DS-UNet++       0.79         0.47         0.50   0.37   0.30
                                            DS-SegU         0.63         0.24         0.19   0.15   0.14


4.1    Dataset analysis
The experimental setting adopted in this paper follows the same dataset preparation as in [8]. Specifically, the satellite images are extracted from the Copernicus Emergency Management Service dataset (Copernicus EMS) [1], focusing on the samples acquired by Sentinel-2 (L2A products). The satellite acquisitions represent terrain areas with matrices of variable size (approximately 5000 × 5000) and 12 channels (one for each acquisition bandwidth). Each sample is manually annotated with pixel-wise ground-truth severity levels corresponding to the damage intensity caused by the wildfire. The number of severity levels is 5 (i.e., from 0 for no damage to 4 for completely destroyed).
   The images are provided to the neural networks under analysis by tiling them into squares of size 480 × 480 and using a batch size of 8. Indeed, their original size is too large to be consumed by these deep learning models. After excluding the samples without burnt regions, the dataset contains a total of 135 tiles. These data are then distributed into 7 different folds based on the geographical proximity of the analyzed regions (i.e., close regions typically share the same morphology).
   The percentage of pixels of the 5 severity levels in the 7 dataset folds is provided in Figure 2. The plot shows that class 0 (i.e., no damage) is predominant over all the others. Moreover, different folds present significantly different distributions of the severity levels, which confirms the difficulty of the prediction task.

4.2    Experimental setting
Motivated by the small dataset size and the unbalanced classes, data augmentation techniques have been applied to increase the variability of the training data at each epoch, applying random rotations, horizontal/vertical flips, and random shears.
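As an example, this kind of pipeline can be written with torchvision transforms; the rotation and shear ranges below are illustrative assumptions, not values reported in the paper, and in practice the same geometric transform must also be applied to the corresponding ground-truth maps.

    import torchvision.transforms as T

    # Illustrative augmentation for the 12-band training tiles: random rotations,
    # horizontal/vertical flips, and random shears. Parameter ranges are assumptions.
    train_augmentation = T.Compose([
        T.RandomHorizontalFlip(p=0.5),
        T.RandomVerticalFlip(p=0.5),
        T.RandomAffine(degrees=90, shear=10),
    ])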
   After applying data augmentation, we run a cross-validation for each model under analysis. At each iteration, five folds out of seven are used for training, 1 for validation (i.e., to enable early stopping), and 1 for testing. The early stopping process is configured with patience 5 and a tolerance of 0.01 on the loss function.
   To enhance the reliability of the results, cross-validation is run 5 times for each model configuration. All the evaluation metrics are computed separately for each run and each cross-validation iteration, and then averaged to obtain the final scores.
   The output of the analyzed neural networks is evaluated in (i) a regression fashion and (ii) a classification fashion. The first case exploits the Root Mean Squared Error (RMSE) to verify the quality of the predictions. Due to the dataset imbalance, the RMSE is computed separately for each severity level for a proper evaluation. Specifically, given a severity level, we compute the RMSE between all the ground-truth pixels with that value and the corresponding neural network predictions.
   Since severity levels in the ground-truth annotations are provided in the form of discrete numbers, we also applied a classification metric for the evaluation. Specifically, we computed the Intersection over Union (IoU) between the ground truth and the predictions discretized to integer values. Similarly to the RMSE evaluation, the IoU is computed separately for each severity level.
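A minimal sketch of the two per-level metrics, written directly from the description above (illustrative code, not the authors' evaluation script; here the predictions are discretized by rounding and clipping to [0, 4]):

    import torch

    def per_level_rmse(y_pred, y_true, level):
        """RMSE restricted to the ground-truth pixels of a given severity level."""
        mask = (y_true == level)
        if mask.sum() == 0:
            return float("nan")  # level not present in this sample
        diff = y_pred[mask] - float(level)
        return torch.sqrt(torch.mean(diff ** 2)).item()

    def per_level_iou(y_pred, y_true, level):
        """IoU between the ground truth and the predictions discretized to integer severity levels."""
        pred_cls = torch.clamp(torch.round(y_pred), 0, 4)
        pred_mask = (pred_cls == level)
        true_mask = (y_true == level)
        intersection = (pred_mask & true_mask).sum().item()
        union = (pred_mask | true_mask).sum().item()
        return intersection / union if union > 0 else float("nan")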
4.3     Loss function selection
We begin the assessment of the Double-Step Framework by focusing on the Binary classification backbone. To this aim, Table 2 evaluates the Binary classification backbone by providing the IoU of the burnt class. This phase inspects the ability of the network to distinguish between burnt and undamaged areas, regardless of the severity levels. The results clearly show that the BCE loss function brings an important advantage with respect to the others, reaching 0.80 IoU for the DS-UNet and 0.79 for the DS-UNet++.
        Table 3: Results on burnt areas only, with different loss functions.

        Metric        Model            BCE-MSE     Dice-MSE [8]     B+D-MSE      B+S-MSE      BCE-MSE·F1           sIoU-sIoU   sIoU-MSE
                     DS-UNet           1.08            1.15            1.13        1.27             1.12             1.64       1.31
       avg RMSE      DS-UNet++         1.10            1.28            1.19        1.35             1.14             2.31       1.28
                     DS-SegU           1.45            1.60            1.73        1.79             1.38             2.50       1.79
                     DS-UNet           0.16            0.13            0.14        0.14             0.13             0.10       0.12
       avg IoU       DS-UNet++         0.16            0.11            0.13        0.13             0.14             0.15       0.11
                     DS-SegU           0.12            0.14            0.14        0.15             0.14             0.14       0.13

                                           Table 4: Architecture selection results (RMSE).

                                        DS-UNet      DS-UNet++      DS-SegU         UNet++     PSPNet       SegU-Net
                           Severity
                                        BCE-MSE      BCE-MSE        BCE-MSE·F1      MSE        MSE          MSE
                          0              0.30           0.33            0.23         1.04       1.14          0.39
                          1              1.09           1.00            0.79         1.16       1.37          0.91
                          2              1.04           0.95            1.09         0.93       1.21          1.11
                          3              0.96           0.97            1.33         0.91       1.09          1.44
                          4              1.25           1.50            2.33         1.35       1.38          2.14
                          avg (1-4)      1.08           1.10            1.38         1.09       1.26          1.40

                                              Table 5: Architecture selection results (IoU).

                                        DS-UNet      DS-UNet++      DS-SegU      UNet++      PSPNet     SegU-Net
                          Severity
                                       BCE-MSE      BCE-MSE        B+S-MSE      MSE         MSE        MSE
                          0              0.95           0.94           0.68       0.00       0.00          0.82
                          1              0.11           0.13           0.08       0.01       0.01          0.09
                          2              0.22           0.21           0.07       0.19       0.11          0.14
                          3              0.03           0.07           0.28       0.01       0.01          0.08
                          4              0.28           0.21           0.14       0.14       0.16          0.06
                           avg (1-4)      0.16           0.16           0.15       0.09       0.07          0.09


The Dice loss function, exploited in the original Double-Step U-Net, obtains moderately lower results than BCE for the DS-UNet and the DS-UNet++ (0.58 and 0.47, respectively) and a very low score (i.e., 0.24) for the DS-SegU.
   Motivated by these results, we inspect the ability of the Double-Step Framework to distinguish the different severity levels of burnt regions. To this aim, we computed the RMSE and the IoU, averaged over the levels in the range [1, 4]. Level 0 is excluded from the average, since it represents the majority class, describing unburnt regions. Table 3 provides the results for all the configurations proposed in Section 3. Both the loss functions and the Binary/Regression backbones are evaluated at this step.
   The results clearly show that the BCE-MSE loss function configuration achieves the best results according to the RMSE. The only exception is the DS-SegU, which reaches a better result with the BCE-MSE·F1 loss function. As far as the IoU is concerned, BCE-MSE confirms its first place for the DS-UNet and the DS-UNet++, while a loss function including the sIoU for the Binary classification backbone, namely the compound loss B+S-MSE, achieves a better score for the DS-SegU. Among the three proposed DSF architectures, the DS-UNet with BCE-MSE presents the best result in terms of average RMSE, while the DS-UNet and the DS-UNet++ with BCE-MSE achieve the best IoU.

4.4    Architecture comparison
We complete our experimental results by comparing the prediction quality of the Double-Step Framework with other single-step neural networks. In the following, for the DS-UNet, the DS-UNet++, and the DS-SegU, we only show the results with the best overall loss function configuration for each network. The other neural networks analyzed in this section are UNet++, PSPNet, and SegU-Net, all trained by means of the MSE loss function. PSPNet is considered as an example of a more complex neural network with respect to the other ones. Indeed, this model exploits multiple pyramidal pooling filters to capture features at different resolutions. In our case we used a PSP layer including pooling kernels with sizes 1, 2, 3, and 6, and ResNet18 [11] as backbone. We did not use deeper ResNet models due to possible underfitting issues (caused by the small size of the analyzed dataset).
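For reference, a comparable single-step PSPNet baseline with a ResNet18 encoder can be instantiated, for example, with the segmentation_models_pytorch library; this is only an indicative configuration, not necessarily the implementation used for the experiments.

    import segmentation_models_pytorch as smp

    # Indicative single-step baseline: ResNet18 encoder, 12 Sentinel-2 input bands,
    # one output severity map, trained directly with the MSE loss (no double step).
    pspnet = smp.PSPNet(
        encoder_name="resnet18",
        encoder_weights=None,   # assumption: trained from scratch on 12-band inputs
        in_channels=12,
        classes=1,
    )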
   Tables 4 and 5 show the complete set of results, analyzed with RMSE and IoU, respectively. The first five lines of the two tables present the scores separately for each severity level. The final line provides the average score excluding level 0 (i.e., undamaged regions).
   According to the average RMSE (Table 4), the best model is the DS-UNet, with a value of 1.08. Despite this result, it reaches the best per-level score only for level 4 regions with respect to the other models. Looking at the average IoU scores (Table 5), the DS-UNet again obtains the best result, together with the DS-UNet++, when considering the average score.
   For the majority class (i.e., level 0), SegNet achieves a very low RMSE (0.01). However, this model makes important errors on all the other severity levels, probably because it is inclined to predict class 0 most of the time.
5   CONCLUSIONS AND FUTURE WORKS
The objective of this paper was to define a complete experimental setting to compare different architectures for wildfire severity prediction. We defined a Double-Step Framework, with customizable loss functions and network backbones. Different backbones and loss functions have been evaluated according to RMSE and IoU, showing that the Double-Step Framework tends to give more accurate results with respect to single-step neural networks. To improve and consolidate these solutions, we intend to apply explainability algorithms (such as Grad-CAM [24]) to detect correlations in the network errors and improve the models accordingly. Another possible next step is to measure the carbon footprint of our approach [23].

ACKNOWLEDGMENT
The research leading to these results has been partially supported by the SmartData@PoliTO center for Big Data and Machine Learning technologies, and by the HPC@PoliTO center for High Performance Computing. The authors are grateful to Moreno La Quatra for his help in exploiting the HPC resources.

REFERENCES
 [1] European Union. Copernicus Emergency Management Service. https://emergency.copernicus.eu/. Accessed: November 9, 2020.
 [2] 2019. Euronews. https://www.euronews.com/2019/08/15/there-have-been-three-times-more-wildfires-in-the-eu-so-far-this-year. Accessed: 2020-12-03.
 [3] 2019. European Forest Fire Information System (EFFIS) - Annual Reports. https://effis.jrc.ec.europa.eu/reports-and-publications/annual-fire-reports/. Accessed: 2020-12-03.
 [4] 2019. HPC@POLITO. https://hpc.polito.it/legion_cluster.php.
 [5] Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2017), 2481–2495.
 [6] Yifang Ban, Puzhao Zhang, Andrea Nascetti, Alexandre R Bevington, and Michael A Wulder. 2020. Near real-time wildfire progression monitoring with Sentinel-1 SAR time series and deep learning. Scientific Reports 10, 1 (2020), 1–15.
 [7] Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam. 2017. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017).
 [8] Alessandro Farasin, Luca Colomba, and Paolo Garza. 2020. Double-Step U-Net: A deep learning-based approach for the estimation of wildfire damage severity through Sentinel-2 satellite data. Applied Sciences 10, 12 (2020), 4332.
 [9] Federico Filipponi. 2018. BAIS2: Burned Area Index for Sentinel-2. In Multidisciplinary Digital Publishing Institute Proceedings, Vol. 2. 364.
[10] Leonardo A Hardtke, Paula D Blanco, Hector F del Valle, Graciela I Metternicht, and Walter F Sione. 2015. Semi-automated mapping of burned areas in semi-arid ecosystems using MODIS time-series imagery. International Journal of Applied Earth Observation and Geoinformation 38 (2015), 25–35.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[12] Shruti Jadon. 2020. A survey of loss functions for semantic segmentation. arXiv preprint arXiv:2006.14822 (2020).
[13] Shruti Jadon, Owen P Leary, Ian Pan, Tyler J Harder, David W Wright, Lisa H Merck, and Derek L Merck. 2020. A comparative study of 2D image segmentation algorithms for traumatic brain lesions using CT data from the ProTECTIII multicenter clinical trial. In Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications, Vol. 11318. International Society for Optics and Photonics, 113180Q.
[14] Uday Kamal, Thamidul Islam Tonmoy, Sowmitra Das, and Md Kamrul Hasan. 2019. Automatic traffic sign detection and recognition using SegU-Net and a modified Tversky loss function with L1-constraint. IEEE Transactions on Intelligent Transportation Systems 21, 4 (2019), 1467–1479.
[15] Carl H Key and Nathan C Benson. 2006. Landscape assessment (LA). In: Lutes, Duncan C.; Keane, Robert E.; Caratti, John F.; Key, Carl H.; Benson, Nathan C.; Sutherland, Steve; Gangi, Larry J. FIREMON: Fire effects monitoring and inventory system. Gen. Tech. Rep. RMRS-GTR-164-CD. Fort Collins, CO: US Department of Agriculture, Forest Service, Rocky Mountain Research Station. LA-1-55 (2006).
[16] Jun Ma. 2020. Segmentation loss odyssey. arXiv preprint arXiv:2005.13449 (2020).
[17] Jay D Miller and Andrea E Thode. 2007. Quantifying burn severity in a heterogeneous landscape with a relative version of the delta Normalized Burn Ratio (dNBR). Remote Sensing of Environment 109, 1 (2007), 66–80.
[18] Gabriel Navarro, Isabel Caballero, Gustavo Silva, Pedro-Cecilio Parra, Águeda Vázquez, and Rui Caldeira. 2017. Evaluation of forest fire on Madeira Island using Sentinel-2A MSI imagery. International Journal of Applied Earth Observation and Geoinformation 58 (2017), 97–106.
[19] Miguel M Pinto, Renata Libonati, Ricardo M Trigo, Isabel F Trigo, and Carlos C DaCamara. 2020. A deep learning approach for mapping and dating burned areas using temporal sequences of satellite images. ISPRS Journal of Photogrammetry and Remote Sensing 160 (2020), 260–274.
[20] Ruben Ramo and Emilio Chuvieco. 2017. Developing a random forest algorithm for MODIS global burned area classification. Remote Sensing 9, 11 (2017), 1193.
[21] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 234–241.
[22] David P Roy, Luigi Boschetti, and Simon N Trigg. 2006. Remote sensing of fire severity: assessing the performance of the normalized burn ratio. IEEE Geoscience and Remote Sensing Letters 3, 1 (2006), 112–116.
[23] Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. 2020. Green AI. Commun. ACM 63, 12 (2020), 54–63.
[24] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 2017. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618–626.
[25] Dimitris Stavrakoudis, Thomas Katagis, Chara Minakou, and Ioannis Z Gitas. 2019. Towards a fully automatic processing chain for operationally mapping burned areas countrywide exploiting Sentinel-2 imagery. In RSCy2019, Vol. 11174. 1117405.
[26] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. 2017. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2881–2890.
[27] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. 2018. UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, 3–11.