=Paper= {{Paper |id=Vol-2744/paper49 |storemode=property |title=Analysis of the Influence of Vegetation Index Choice on the Classification of Satellite Images for Monitoring Forest Pathology |pdfUrl=https://ceur-ws.org/Vol-2744/paper49.pdf |volume=Vol-2744 |authors=Evgeniy Trubakov,Olga Trubakova }} ==Analysis of the Influence of Vegetation Index Choice on the Classification of Satellite Images for Monitoring Forest Pathology== https://ceur-ws.org/Vol-2744/paper49.pdf
    Analysis of the Influence of Vegetation Index Choice on
     the Classification of Satellite Images for Monitoring
                        Forest Pathology*

      Evgeniy Trubakov 1[0000-0002-8381-9737] and Olga Trubakova 1[0000-0003-4057-5362]
                  1 Bryansk State Technical University, Bryansk, Russia

               trubakoveo@gmail.com, trubakovaor@gmail.com



       Abstract. Rational use of natural resources and control over their recovery, as
       well as over their destruction due to natural and technogenic causes, is currently
       one of the most urgent problems facing humanity, and forests are no exception.
       Multispectral images from Earth satellites are most often used to monitor changes
       in forest plantings, because combining images taken in certain spectral bands
       makes it possible to recognize chlorophyll-containing vegetation quite well. It
       also makes it possible to detect changes in chlorophyll content, which reveal the
       differences between healthy and damaged plants. The large areas of planted
       forests create the need to process huge amounts of data, which is difficult to do
       manually. One of the most important stages of image processing is the
       classification of objects in these images. This paper considers various
       classification methods used for classifying Earth remote sensing images and
       evaluates their accuracy on different vegetation indices. An evaluation algorithm
       was defined, along with one way of analyzing the results obtained, and
       conclusions were drawn about the performance of the classification methods on
       different vegetation indices.

       Keywords: Remote Sensing of the Earth, Forest Pathology Monitoring, Vege-
       tation Indices, Image Processing, Methods of Image Classification.


1      Introduction

Today, wood remains a very valuable material in many industries, so logging has become
a profitable business. It often takes place illegally and without control, disregarding
the damage to forest plantings and the environment. Major damage to forests is also
caused by natural phenomena, such as droughts and windfalls, and, to an even greater
extent, by forest pathologies such as tree diseases and insect pests. In addition, forest
fires cause great damage, destroying more than a million

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).

* Publication supported by RFBR grant № 19-07-00844
2 E.Trubakov, O. Trubakova


hectares of forest per year. For this reason, it is necessary to monitor the state of the
forest constantly [1].
    The main source of data for monitoring the state of forests is digital images obtained
by artificial Earth satellites. Because forest territories are vast, dozens of images have
to be tracked for a single region, and since the images are updated frequently (for
example, every 2-3 days for the Sentinel-2 satellite system), the volume of information
to be processed increases tenfold [2].
    At present, the operation of the monitoring system can be divided into three parts. The
first part is selecting a suitable satellite image that will serve as the reference. The
goal of this stage is to find an image in which the region of interest is not obscured by
interference such as clouds, cloud shadows, and so on [3]. The next stage is processing
the satellite images. Because the images are taken in different spectral bands that are
difficult for humans to interpret, they must be pre-processed, i.e. a vegetation index
must be constructed. Then, to search for objects of interest in the image, a training
sample is prepared, the satellite image is classified, and the classification results are
vectorized. The same actions are then performed for another image of the same territory
acquired some time later. The final step is to compare the results obtained for the
reference image and the new one. This algorithm is iterative and is repeated throughout
the growing season [4]. Its complexity lies in the need to involve experts to process the
images, since monitoring systems are not able to identify problem regions automatically.
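The final comparison step can be illustrated with a minimal sketch. It assumes both classification maps are co-registered integer label arrays of equal shape and uses hypothetical class codes (1 = forest, 2 = deforestation). The function `new_clearcuts` is an illustrative name, not part of any monitoring system described here. It flags pixels that were forest on the reference image and are classified as deforestation on the new one.

```python
import numpy as np

# Hypothetical class codes; a real classification map would define its own.
FOREST, CLEARCUT = 1, 2

def new_clearcuts(reference_map: np.ndarray, new_map: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels that changed from forest to deforestation."""
    if reference_map.shape != new_map.shape:
        raise ValueError("classification maps must be co-registered and equally sized")
    return (reference_map == FOREST) & (new_map == CLEARCUT)

# Toy 2x3 maps (illustrative values, not from the paper)
ref_map = np.array([[1, 1, 2], [1, 1, 1]])
new_map = np.array([[1, 2, 2], [2, 1, 1]])
mask = new_clearcuts(ref_map, new_map)
print(mask.sum())   # 2 pixels became deforestation
```

In practice the mask would then be vectorized and passed to an expert for verification.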
     The relevance of this topic stems from the fact that forest monitoring involves checking
large amounts of data received from satellites. At the same time, most of the work is
performed manually and takes a long time, so some stages of data processing need to be
executed semi-automatically or automatically. For example, the region to be monitored may
be obscured by clouds or other interference, yet the operator still spends time checking
the image. Therefore, methods for identifying satellite images suitable for monitoring
need to be investigated, and this stage needs to be automated. Moreover, since remote
detection of forest pathologies relies on the fact that a stressed tree gradually dries
out, it is crucial to identify pathologies and eliminate them as quickly as possible. The
operator's unproductive working time should therefore be reduced as much as possible.


2      Vegetation indices

Almost all satellite systems provide medium- and high-resolution images in the form of
multispectral images. This makes it possible to select the channels that carry the most
information about the objects under study, i.e. to suppress information about extraneous
objects and emphasize the data relevant to the task being solved. The selected channels
are combined according to certain rules into a single image. This procedure is the first
step of satellite image processing, so it is performed on all images
      Analysis of the Influence of Vegetation Index Choice on the Classification of Satellite… 3


used in monitoring [5]. Since vegetation must be recognized in the images to monitor the
forest, specialized methods of combining image channels, namely vegetation indices, are
used.
   A vegetation index is an indicator calculated from operations on different spectral
ranges (channels) of remote sensing data. It is related to the vegetation parameters in a
given pixel of the image.
   Let us consider the most common and well-established indices used in research.


2.1    Normalized Difference Vegetation Index
Normalized Difference Vegetation Index (NDVI) is the most popular and frequently used
vegetation index. It takes positive values for vegetation, and the larger the green
phytomass, the higher the index [6]. The index values are also affected by the species
composition of the vegetation, its density, condition, slope exposure and inclination, and
the color of the soil under sparse vegetation. NDVI is often used as one of the tools for
complex types of analysis, which can result in maps of forest and agricultural
productivity, maps of landscapes and natural zones, soil, arid, phyto-hydrological,
phenological, and other ecological and climatic maps.
   The index is calculated using the formula:
$$\mathrm{NDVI} = \frac{NIR - RED}{NIR + RED}, \qquad (1)$$
where NIR is the pixel value in the near-infrared region; RED stands for the pixel value
in the red region. The NDVI itself varies between -1.0 and +1.0.
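The following is a minimal NumPy sketch of Eq. (1). It assumes that the red and near-infrared bands have already been loaded as reflectance arrays of the same shape. For Sentinel-2 these would typically be bands B4 and B8, but band selection and file loading are outside this sketch. Pixels where NIR + RED = 0 are set to NaN instead of raising a division error.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Eq. (1): (NIR - RED) / (NIR + RED), NaN where the denominator is zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Toy 2x2 reflectances (illustrative, not from the paper)
nir = np.array([[0.45, 0.30], [0.05, 0.0]])
red = np.array([[0.05, 0.10], [0.05, 0.0]])
print(ndvi(nir, red))
```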


2.2    Infrared Percentage Vegetation Index
Unlike NDVI, the Infrared Percentage Vegetation Index (IPVI) does not subtract the red
component in the numerator, which makes it faster to compute [7].
   The index is calculated using the formula:
$$\mathrm{IPVI} = \frac{NIR}{NIR + RED}, \qquad (2)$$
where NIR is the pixel value in the near-infrared region; RED stands for the pixel value
in the red region. The index varies between 0 and 1.
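Algebraically, NIR/(NIR + RED) = (1 + NDVI)/2, so IPVI is a linear rescaling of NDVI from [-1, 1] onto [0, 1] and orders pixels identically. The sketch below computes Eq. (2) and checks that identity numerically. The random reflectances are synthetic and serve only as test input.

```python
import numpy as np

def ipvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Eq. (2): NIR / (NIR + RED); lies in [0, 1] for non-negative inputs."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return nir / (nir + red)

rng = np.random.default_rng(0)
nir = rng.uniform(0.01, 0.6, size=1000)   # synthetic, strictly positive reflectances
red = rng.uniform(0.01, 0.3, size=1000)
ndvi = (nir - red) / (nir + red)
# Linear relation between the two indices: IPVI = (NDVI + 1) / 2
print(np.allclose(ipvi(nir, red), (ndvi + 1) / 2))  # True
```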


2.3    Atmospherically Resistant Vegetation Index
Atmospherically Resistant Vegetation Index (ARVI) was developed by Kaufman and
Tanré [8]. This index is an improvement on NDVI that corrects for the influence of the
atmosphere. It is most useful in regions with high atmospheric aerosol content, including
tropical areas contaminated with soot.
    The index is calculated using the formula:
$$\mathrm{ARVI} = \frac{NIR - R_b}{NIR + R_b}, \qquad (3)$$
where Rb = RED − α ∗ (BLUE − RED); as a rule, α = 1 (α = 0.5 if the vegetation cover is
small and the type of atmosphere is unknown); NIR is the pixel value in the near-infrared
region; RED stands for the pixel value in the red region; BLUE is the pixel value in the
blue region. The index varies between -1 and 1.
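Here is a sketch of Eq. (3). It assumes the red-blue term follows Kaufman and Tanré's formulation Rb = RED − α(BLUE − RED), with α = 1 by default. With α = 1 the term reduces to 2·RED − BLUE, and with α = 0 ARVI reduces to NDVI. The function name and the toy reflectances are illustrative.

```python
import numpy as np

def arvi(nir, red, blue, alpha: float = 1.0):
    """Eq. (3) with Rb = RED - alpha * (BLUE - RED)."""
    nir, red, blue = (np.asarray(b, dtype=np.float64) for b in (nir, red, blue))
    rb = red - alpha * (blue - red)          # atmospherically corrected red term
    return (nir - rb) / (nir + rb)

# With alpha = 1, Rb = 2*0.05 - 0.04 = 0.06, so ARVI = 0.34 / 0.46
print(arvi(0.40, 0.05, 0.04))   # ≈ 0.739
```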


2.4     Enhanced Vegetation Index
Enhanced Vegetation Index (EVI) is an optimized version of NDVI. It has advantages for
assessing the state of plants, since the influence of soil and atmosphere on its values is
minimized [9]. The index makes it possible to assess the state of plants under both dense
and sparse vegetation cover.
   The index is computed following this equation:

$$\mathrm{EVI} = \frac{NIR - RED}{NIR + C_1 \cdot RED - C_2 \cdot BLUE + L} \cdot (1 + L), \qquad (4)$$

where BLUE stands for the pixel value in the blue region; RED is the pixel value in the
red region; NIR is the pixel value in the near-infrared region; the coefficients C1, C2 and
L are empirically set to 6.0, 7.5 and 1.0, respectively. The index varies between -1 and 1.
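Below is a sketch of Eq. (4) as written here, with C1 = 6.0, C2 = 7.5, L = 1.0 and the gain (1 + L). The widely used MODIS formulation of EVI applies a gain factor G = 2.5 instead of 1 + L. For that reason, the sketch exposes a `gain` parameter (an illustrative addition) so that either variant can be reproduced. Inputs are assumed to be surface reflectances in [0, 1].

```python
import numpy as np

def evi(nir, red, blue, c1=6.0, c2=7.5, l_coef=1.0, gain=None):
    """Eq. (4): gain * (NIR - RED) / (NIR + C1*RED - C2*BLUE + L).

    gain defaults to (1 + L) as in Eq. (4); gain=2.5 gives the MODIS-style variant.
    """
    nir, red, blue = (np.asarray(b, dtype=np.float64) for b in (nir, red, blue))
    if gain is None:
        gain = 1.0 + l_coef
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + l_coef)

# Denominator: 0.40 + 6*0.05 - 7.5*0.04 + 1 = 1.4; numerator 0.35
print(evi(0.40, 0.05, 0.04))             # ≈ 0.5  (gain 1 + L = 2)
print(evi(0.40, 0.05, 0.04, gain=2.5))   # ≈ 0.625
```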


2.5     Soil-Adjusted Vegetation Index
Soil-Adjusted Vegetation Index (SAVI) is a vegetation index that tries to minimize the
impact of soil brightness by using a soil brightness correction factor [10].
  The index is calculated using the formula:
$$\mathrm{SAVI} = \frac{NIR - RED}{NIR + RED + L} \cdot (1 + L), \qquad (5)$$

where NIR is the pixel value in the near-infrared region; RED stands for the pixel value
in the red region; L is a canopy background adjustment factor. The index varies be-
tween -1 and 1.
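The paper does not state which value of L was used. The sketch below therefore assumes L = 0.5, the commonly used default for intermediate vegetation density. Setting L = 0 reduces Eq. (5) to NDVI, and the last line checks this.

```python
import numpy as np

def savi(nir, red, l_coef: float = 0.5):
    """Eq. (5): (NIR - RED) / (NIR + RED + L) * (1 + L); L = 0.5 is an assumed default."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    return (nir - red) / (nir + red + l_coef) * (1.0 + l_coef)

nir = np.array([0.40, 0.25])   # toy reflectances
red = np.array([0.05, 0.12])
print(savi(nir, red))                                                # L = 0.5
print(np.allclose(savi(nir, red, 0.0), (nir - red) / (nir + red)))  # True: L = 0 gives NDVI
```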


3       Image classification

The next stage of monitoring, after the vegetation index has been created, is the search
for objects in the image, i.e. image classification. Currently, the most commonly used
approach to thematic processing is relative classification, based on widely available
multispectral images and additionally collected data, which are needed to establish a
correspondence between groups of pixels with similar characteristic values and classes of
the Earth's surface. These data can be collected in field studies, which are more limited
in scope than classical field methods, since classes need to be identified for only a
small number of pixels [11].
   There are two types of relative classification: supervised classification (with train-
ing) and unsupervised classification (without training).
   The essence of the supervised classification is to assign each of the image pixels to
a specific class of objects on the ground, which corresponds to a certain area in the
characteristics space.
   Supervised classification includes several stages. The first step is to determine which
object classes are to be distinguished as a result of the entire procedure. These may
include vegetation types, agricultural crops, forest species, hydrographic objects, and so
on. At the second stage, typical pixels are selected for each object class, i.e. a training
sample is formed. The third stage is the calculation of the parameters, the "spectral
image" (signature), of each class from its set of reference pixels. The set of parameters
depends on the algorithm to be used for classification. The fourth stage is to scan the
entire image and assign each pixel to a particular class. The result of this stage is an
image (classification map), as well as a table that gives the coordinates of each pixel and
the name of the class it belongs to.
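The third stage can be sketched as follows. The sketch assumes the training sample is stored as a label image in which unlabeled pixels are 0 and class k is marked with the integer k. This encoding and the function name `class_signatures` are hypothetical. The "spectral image" of each class is taken here to be its mean vector and per-band standard deviation, although the parameters actually required depend on the classifier.

```python
import numpy as np

def class_signatures(image: np.ndarray, training: np.ndarray) -> dict:
    """Per-class mean vector and per-band standard deviation of the training pixels.

    image:    (rows, cols, bands) array, e.g. a stack of vegetation-index layers
    training: (rows, cols) integer labels, 0 = not in the training sample (assumed)
    """
    signatures = {}
    for k in np.unique(training):
        if k == 0:
            continue
        pixels = image[training == k]          # (n_k, bands) reference pixels of class k
        signatures[int(k)] = {"mean": pixels.mean(axis=0), "std": pixels.std(axis=0)}
    return signatures

img = np.array([[[0.8], [0.7]], [[0.1], [0.2]]])   # 2x2 single-band toy image
lab = np.array([[1, 1], [2, 0]])                    # two training classes, one unlabeled pixel
sig = class_signatures(img, lab)
print(sig[1]["mean"], sig[2]["mean"])
```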
   Unsupervised classification is based on a fully automatic distribution of pixels into
classes using statistics of the pixel brightness distribution. This type of classification
is used if it is not known in advance which objects are present in the image, or if the
number of objects is large. In this case, the algorithm itself determines the resulting
classes.
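The paper does not name a specific unsupervised algorithm. As one common example, the sketch below applies k-means clustering (Lloyd's iterations) to per-pixel feature vectors in plain NumPy. The number of classes k, the random initialization from k pixels, and the function name are all assumptions of this illustration.

```python
import numpy as np

def kmeans(pixels: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Plain k-means on an (n_pixels, bands) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean distance).
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its pixels; keep it if the cluster is empty.
        new = np.array([pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

# Two well-separated groups of single-band values (synthetic)
x = np.array([[0.10], [0.12], [0.11], [0.80], [0.82], [0.79]])
labels, centers = kmeans(x, k=2)
print(labels, centers.ravel())
```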
   Let us consider the most common classification methods used in research.


3.1    Minimum distance method
This method is used when the spectral characteristics of different classes are similar
and the ranges of their brightness values overlap. In this method, the brightness values
of each pixel are treated as a vector in the space of spectral characteristics. The
spectral distance between the reference vectors and the brightness vectors of all image
pixels is calculated, and the pixels are then distributed into classes: if the distance from
a pixel vector to a reference vector is less than a predetermined threshold, the pixel is
assigned to that class. If the distance is greater than the threshold, the pixel is assigned
to another class or to none of the classes.
    The minimum distance method calculates the spectral distance between the pixel vector
and the mean vector of each class signature.
Euclidean distance. Euclidean distance is a common distance function. It represents a
geometric distance in a multidimensional space:
$$E = \sqrt{\sum_{i=1}^{n} |t_i - x_i|^2}, \qquad (6)$$

where n is the number of spectral ranges (bands); i is the index of a range; t is the
unknown spectrum; x is the reference spectrum; E is the Euclidean distance.
   Manhattan distance. Manhattan distance is the sum of the absolute differences in
coordinates. In most cases, this measure leads to the same results as the usual
Euclidean distance. However, with this measure the impact of individual large differences
is reduced, because they are not squared. Manhattan distance is calculated as follows:
$$M = \sum_{i=1}^{n} |t_i - x_i|, \qquad (7)$$
where n is the number of spectral ranges (bands); i is the index of a range; t is the
unknown spectrum; x is the reference spectrum; M is the Manhattan distance.
   The disadvantage of this method is that it does not take into account the distribution
(dispersion) of the pixel brightness in the reference areas. This can lead to errors during
classification.
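The following sketch implements the minimum distance rule with both metrics (Eqs. 6 and 7). It assumes that the class reference vectors are the class means from the training sample. It also assumes that a pixel whose nearest distance exceeds an optional threshold is left unclassified with label -1, which is a hypothetical convention. The function name and toy vectors are illustrative.

```python
import numpy as np

def min_distance_classify(pixels, means, metric="euclidean", threshold=None):
    """Assign each pixel to the class with the nearest mean vector.

    pixels: (n, bands) array; means: (k, bands) array of class mean vectors.
    metric: "euclidean" (Eq. 6) or "manhattan" (Eq. 7).
    threshold: pixels farther than this from every mean get label -1.
    """
    diff = np.asarray(pixels, float)[:, None, :] - np.asarray(means, float)[None, :, :]
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"unknown metric: {metric}")
    labels = dist.argmin(axis=1)
    if threshold is not None:
        labels[dist.min(axis=1) > threshold] = -1   # too far from every class
    return labels

means = np.array([[0.8, 0.6], [0.2, 0.1]])              # e.g. forest, clear-cut (toy values)
pixels = np.array([[0.75, 0.55], [0.25, 0.2], [0.5, 0.9]])
print(min_distance_classify(pixels, means, "euclidean"))
print(min_distance_classify(pixels, means, "manhattan", threshold=0.4))
```

The third pixel is nearest to the first mean under both metrics, but its Manhattan distance (0.6) exceeds the threshold, so it is left unclassified in the second call.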


3.2    Method of spectral angle
Classification by the method of spectral angle is used to compare the spectral charac-
teristics of an image with the spectral characteristics of references. The algorithm de-
termines the proximity between these two characteristics by calculating the spectral
angle between them. To do this, they are represented as vectors in n-dimensional space,
where n is the number of spectral channels.
   Since the spectral angle method uses only the direction of the vectors, it is not
sensitive to the absolute brightness of pixels, which is determined by the length of the
vector. All brightness levels are treated in the same way, since pixels with lower
brightness are simply located closer to the origin of the scatterplot. The class of a
pixel (and hence its color on the classification map) is determined by the direction of
its radius vector in the n-dimensional feature space.
   The following formula is used to calculate the spectral angle:
$$\alpha = \cos^{-1}\left(\frac{\vec{t} \cdot \vec{x}}{\|\vec{t}\| \cdot \|\vec{x}\|}\right), \qquad (8)$$
where α is the spectral angle between vectors x and t; t is an unknown spectrum; x is a
reference spectrum.
   The expression can also be represented as:

$$\alpha = \cos^{-1}\left(\frac{\sum_{i=1}^{nb} t_i \, x_i}{\left(\sum_{i=1}^{nb} t_i^2\right)^{1/2} \left(\sum_{i=1}^{nb} x_i^2\right)^{1/2}}\right), \qquad (9)$$
where nb is the number of image spectral channels.
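Here is a minimal sketch of Eqs. (8) and (9) together with nearest-angle classification. The 4-band spectra are toy values, and `sam_classify` is a hypothetical helper name. The first call illustrates the brightness invariance described above: doubling a spectrum leaves its angle to the original at (numerically) zero.

```python
import numpy as np

def spectral_angle(t, x) -> float:
    """Eqs. (8)/(9): angle in radians between pixel spectrum t and reference spectrum x."""
    t = np.asarray(t, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    cos_a = t.dot(x) / (np.linalg.norm(t) * np.linalg.norm(x))
    return float(np.arccos(np.clip(cos_a, -1.0, 1.0)))  # clip guards against rounding

def sam_classify(pixels, references):
    """Label each pixel with the reference spectrum at the smallest spectral angle."""
    angles = np.array([[spectral_angle(p, r) for r in references] for p in pixels])
    return angles.argmin(axis=1)

ref = np.array([0.05, 0.08, 0.04, 0.45])     # toy 4-band reference spectrum
print(spectral_angle(ref, 2 * ref))          # ~0: brightness scaling does not change the angle
print(sam_classify([2 * ref, [0.3, 0.3, 0.3, 0.3]], [ref, [0.2, 0.2, 0.2, 0.2]]))
```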


4        Assessment of classification accuracy

An important step of classification is assessing the accuracy of the results obtained.
This is done by comparing the image resulting from the classification with reference
data: field measurements, a set of points studied in the field, relevant thematic maps,
and so on. The comparison is possible because each pixel of the resulting image has
geographical coordinates, so the surface type assigned to the pixel by the classification
can be compared with the actual surface type known from other sources. Points are
selected on the classification result, and the corresponding points in the reference data
are examined. The comparison results are recorded in a table called the matrix of errors
(Table 1), which contains the numbers of correctly classified points (on the diagonal) and
misclassified points [12].
   The reliability of the accuracy estimates is ensured by selecting a sufficient number
of points for each of the classes obtained during classification. Ideally, every point of
the classification result is compared with the reference data.
   Adding the diagonal elements (correctly recognized image points) and dividing this sum
by the total number of points involved in the assessment gives the overall classification
accuracy. For each class, two values are computed: the ratio of correctly recognized
pixels either to the row sum (the number of points assigned to this class) or to the column
sum (the number of points of this class in the reference data). The first of these, user's
accuracy, indicates the probability that a point labeled as class 2 in the classification
result actually belongs to class 2; the second is known as producer's accuracy. The Kappa
parameter is also calculated from the matrix of errors. It compares the observed
distribution of pixels among the matrix cells with the distribution expected if pixels were
assigned to classes at random.
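For an error matrix laid out as in Table 1 (rows: classification result, columns: reference data), the three accuracy measures described above can be computed as shown below. The counts are toy values standing in for the symbols a, b, c, d and are not taken from the paper. The function name is illustrative.

```python
import numpy as np

def accuracy_measures(m):
    """Overall, user's and producer's accuracy from an error matrix laid out as in
    Table 1: rows = classes in the classification result, columns = reference classes."""
    m = np.asarray(m, dtype=np.float64)
    diag = np.diag(m)
    overall = diag.sum() / m.sum()
    users = diag / m.sum(axis=1)       # correct / row total (points assigned to the class)
    producers = diag / m.sum(axis=0)   # correct / column total (reference points of the class)
    return overall, users, producers

# Toy counts: a = 90, b = 10, c = 5, d = 45 (illustrative only)
m = [[90, 10],
     [5, 45]]
overall, users, producers = accuracy_measures(m)
print(overall, users, producers)
```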

                                     Table 1. Matrix of errors
 Classes                              Classes according to reference data    Number of
                                      Class 1       Class 2                  reference pixels
 Classes in              Class 1      a             b                        e
 classification results  Class 2      c             d                        f
 Total                                a+c           b+d                      e+f
    Kappa parameter is defined as follows:

$$\kappa = \frac{N \sum_{i=j=1}^{m} D_{ij} - \sum_{i=j=1}^{m} R_i C_j}{N^2 - \sum_{i=j=1}^{m} R_i C_j}, \qquad (10)$$

where κ is the Kappa parameter; N is the total number of pixels in the matrix of errors;
m is the total number of classes; ∑ Dij is the sum of the diagonal elements of the matrix
of errors (the total number of correctly classified pixels); Ri is the total number of
pixels in row i (the row sum), and Cj is the total number of pixels in column j (the
column sum).
    Kappa statistics can be calculated for each selected class. For a qualitative assessment
of map matching based on Kappa statistics, the following ranges are used: poor or very poor
matching if κ < 0.4, satisfactory if 0.4 < κ < 0.55, good if 0.55 < κ < 0.7, very good if
0.7 < κ < 0.85, and excellent if κ > 0.85.
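As a worked illustration of Eq. (10), consider the NDVI error matrix for the minimum distance method with Euclidean distance from Table 3. Here N = 12100, the correctly classified pixels total 10486 + 1370 = 11856, the row sums are 10597 and 1503, and the column sums are 10619 and 1481:

$$\begin{aligned}
\sum_{i} R_i C_i &= 10597 \cdot 10619 + 1503 \cdot 1481 = 114\,755\,486,\\
\kappa &= \frac{12100 \cdot 11856 - 114\,755\,486}{12100^2 - 114\,755\,486}
        = \frac{28\,702\,114}{31\,654\,514} \approx 0.907.
\end{aligned}$$

Table 4 lists this value as 0.9, which falls in the excellent range (κ > 0.85).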


5      Results

At the initial stage of supervised classification of a satellite image, all classes of the
underlying surface present in the territory must be identified. The aim of the
classification in this study was to identify deforestation.
   Classification was performed with three methods: the minimum distance method using
Euclidean distance, the minimum distance method using Manhattan distance, and the
spectral angle method. NDVI, IPVI, ARVI, EVI, and SAVI were used as the vegetation indices
for preprocessing the satellite images.
   As a result of classifying the image fragment, four types of underlying surface (classes)
were defined: deforestation (red), coniferous forests (dark green), deciduous forests
(light green), and lakes (blue).
   The results of the classification methods on the selected vegetation indices are shown
in Table 2.
   After the results were obtained, the classification accuracy was assessed using the
matrix of errors and Kappa statistics.
   An image provided by experts was used as reference data. A matrix of classification
errors was formed for the deforestation class (Table 3). The coniferous forest, deciduous
forest, and lake classes were combined into a single background class, and deforestation
was treated as a separate class.
   Based on the Kappa statistics, the following conclusions were drawn for a qualitative
assessment of map matching:
• To detect deforestation, the minimum distance method (Euclidean distance), the
   minimum distance method (Manhattan distance), and the spectral angle method all
   showed excellent classification results with NDVI, ARVI, and EVI as the vegetation
   indices.
• For the IPVI and SAVI indices, only two methods showed excellent results: the minimum
   distance method (Euclidean distance) and the minimum distance method (Manhattan
   distance).
• The spectral angle method performed poorly with the IPVI vegetation index (κ = 0.4)
   and very well, but not excellently, with SAVI (κ = 0.75).

             Table 2. Result of classification methods on various vegetation indices
[Classification maps. Rows: NDVI, IPVI, ARVI, EVI, SAVI. Columns: minimum distance method (Euclidean distance), minimum distance method (Manhattan distance), spectral angle method.]

                             Table 3. Matrix of classification errors

               Minimum distance           Minimum distance           Spectral angle             Number of
               (Euclidean distance)       (Manhattan distance)       method                     reference
               Reference data classes     Reference data classes     Reference data classes     pixels
Classes        Background  Deforestation  Background  Deforestation  Background  Deforestation
NDVI
Background     10486       111            10486       111            10481       116            10597
Deforestation  133         1370           138         1365           118         1385           1503
∑              10619       1481           10624       1476           10599       1501           12100
IPVI
Background     10485       112            10468       129            9458        1139           10597
Deforestation  196         1307           161         1342           640         863            1503
∑              10681       1419           10629       1471           10098       2002           12100
ARVI
Background     10526       71             10521       76             10522       75             10597
Deforestation  104         1399           94          1409           96          1407           1503
∑              10630       1470           10615       1485           10618       1482           12100
EVI
Background     10547       50             10547       50             10545       52             10597
Deforestation  78          1425           78          1425           78          1425           1503
∑              10625       1475           10625       1475           10623       1477           12100


   Kappa statistics were calculated from the results in Table 3. Table 4 gives the
calculation results.
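The Kappa values in Table 4 can be recomputed from the error matrices in Table 3 using Eq. (10). The sketch below transcribes four of those matrices. The recomputed values, about 0.907, 0.911, 0.409 and 0.951, agree with the corresponding entries of Table 4 to within 0.01. The dictionary layout and function name are illustrative. Because the Kappa formula is symmetric in rows and columns, the orientation of the matrix does not affect the result.

```python
import numpy as np

def kappa(m) -> float:
    """Eq. (10): Kappa statistic from a square error matrix of pixel counts."""
    m = np.asarray(m, dtype=np.float64)
    n = m.sum()
    observed = np.trace(m)                           # sum of diagonal elements
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum()   # sum over i of R_i * C_i
    return float((n * observed - chance) / (n ** 2 - chance))

# Error matrices transcribed from Table 3 (rows: background, deforestation)
table3 = {
    ("NDVI", "Euclidean"):      [[10486, 111], [133, 1370]],
    ("NDVI", "Spectral angle"): [[10481, 116], [118, 1385]],
    ("IPVI", "Spectral angle"): [[9458, 1139], [640, 863]],
    ("EVI", "Euclidean"):       [[10547, 50], [78, 1425]],
}
for key, m in table3.items():
    print(key, round(kappa(m), 3))
```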

                                    Table 4. Kappa statistics

               Kappa statistics
Index name     Minimum distance         Minimum distance         Spectral angle
               (Euclidean distance)     (Manhattan distance)     method
NDVI           0.9                      0.9                      0.91
IPVI           0.88                     0.88                     0.4
ARVI           0.93                     0.94                     0.93
EVI            0.95                     0.95                     0.95
SAVI           0.9                      0.89                     0.75


6      Conclusion

The paper analyzes the monitoring of forest pathologies and identifies the need to
automate some stages of the forest monitoring algorithm. An empirical study of vegetation
indices and classification methods for forests in satellite images was conducted.
   The study reveals a relationship between the choice of vegetation index and the
classification method. Depending on the area under study, it is proposed to use a suitable
index (for example, in areas with a tropical climate it is better to use an index that
accounts for high atmospheric aerosol content, such as ARVI) together with an appropriate
classification method to improve the effectiveness of the results.


References
 1. Forest Code of the Russian Federation, as amended on December 27, 2018 (part 4,
    article 60.5).
 2. Earth Observing System: Sentinel-2 Homepage, https://eos.com/sentinel-2/c, last
    accessed 2020/05/15.
 3. Trubakov, E., Trubakov, A., Korostelyov, D., Titarev, D.: Selection of Satellite Image
    Series for the Determination of Forest Pathology Dynamics Taking Into Account Cloud
    Coverage and Image Distortions Based on the Data Obtained from the Key Point Detector.
    In: Proceedings of the 29th International Conference on Computer Graphics and Vision,
    Moscow, pp. 159–163 (2019). DOI: 10.30987/graphicon-2019-2-159-163
 4. Order of April 5, 2017 No. 156 «On approval of the state forest pathology monitoring
    procedure».
 5. Schowengerdt, R.: Remote Sensing. Models and Methods of Image Processing. Moscow
    (2010), 560 p.
 6. Pettorelli, N., Vik, J.O., Mysterud, A., Gaillard, J.-M., Tucker, C.J., Stenseth, N.C.:
    Using the satellite-derived NDVI to assess ecological responses to environmental change.
    Trends in Ecology and Evolution 20, 503–510 (2005). DOI: 10.1016/j.tree.2005.05.011
 7. Crippen, R.E.: Calculating the vegetation index faster. Remote Sensing of Environment
    34, 71–73 (1990).
 8. Kaufman, Y.J., Tanré, D.: Atmospherically resistant vegetation index (ARVI). In: Proc.
    IEEE Int. Geosci. and Remote Sensing Symp., pp. 261–270. IEEE, New York (1992).
 9. Skakun, R.S., Wulder, M.A., Franklin, S.E.: Sensitivity of the thematic mapper enhanced
    wetness difference index to detect mountain pine beetle red-attack damage. Remote Sensing
    of Environment 86, 433–443 (2003).
10. Mozgovoy, D.K., Kravets, O.V.: Using multispectral images for classification of
    agricultural crops. Ekologiya i Noosfera (1-2), 54–58 (2019).
11. Oreshkina, L.V., Shidlovsky, A.V., Kovalenok, V.G.: Comparison of classification methods
    for multi-zone satellite images. In: Proceedings of the Second Belarusian Space Congress,
    25–27 October, Minsk, Belarus, pp. 205–208. OIPI NAS of Belarus (2015).
12. Foody, G.M.: Status of land cover classification accuracy assessment. Remote Sensing of
    Environment 80, 185–201 (2002).