=Paper=
{{Paper
|id=Vol-3041/251-255-paper-46
|storemode=property
|title=Detection of Fertile Soils Based on Satellite Imagery Processing
|pdfUrl=https://ceur-ws.org/Vol-3041/251-255-paper-46.pdf
|volume=Vol-3041
|authors=Valery Grishkin,Evgeniy Zhivulin,Anastasiia Khokhriakova,Sardor Karimov
}}
==Detection of Fertile Soils Based on Satellite Imagery Processing==
<pdf width="1500px">https://ceur-ws.org/Vol-3041/251-255-paper-46.pdf</pdf>
<pre>
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021


   DETECTION OF FERTILE SOILS BASED ON SATELLITE
               IMAGERY PROCESSING
            V. Grishkin (1, a), E. Zhivulin (1), A. Khokhriakova (1), S. Karimov (1)
  (1) Saint Petersburg State University, 7–9 Universitetskaya nab., Saint Petersburg, 199034,
                                              Russia

                                  E-mail: (a) valery-grishkin@yandex.ru
The paper proposes a method for detecting fertile soils based on the processing of satellite images. As
a result of its application, a map of the location of fertile and infertile soils for a given region of the
earth's surface is formed and the corresponding areas are calculated. The method for detecting fertile
soils is based on the fact that fertile soil includes areas covered with vegetation in the spring-summer
period. Therefore, by measuring the spectral characteristics of these areas in the late autumn period,
when there is no vegetation on them, it is possible to obtain objective parameters of fertile soils. For
detection, several classifiers are built that recognize two classes, fertile soil and sand, which is
especially important when monitoring areas prone to desertification. The feature vector used
for classification is a set of indices similar to the well-known NDVI index. This set of indices is
calculated for each pixel of the image by its values in different spectral channels. Classifiers are
implemented using CUDA parallel computing technology on a GPU. Based on the results of the
experimental study, a classifier is selected that has shown the best characteristics of the recognition
quality.

Keywords: satellite image segmentation, multispectral images, scene classification


                         Valery Grishkin, Evgeniy Zhivulin, Anastasiia Khokhriakova, Sardor Karimov


                                                             Copyright © 2021 for this paper by its authors.
                    Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




1. Introduction
        Land desertification is a pressing problem. To combat it effectively, it is necessary
to know the current situation in a particular region and to be able to track the dynamics of
desertification. One way to do this is to monitor the area using satellites. This method
covers the entire territory while providing sufficient accuracy at that scale. In
addition, it does not require large financial costs and makes it possible to collect and process
information about the state of the soil over a long period of time.
         When monitoring the state of land, it is necessary to distinguish between fertile and barren
soil. Since land desertification occurs mainly due to the advance of sand, this paper considers a method
that detects areas occupied by fertile soils and sandy areas on satellite images. We propose a method
for detecting fertile soils in specified regions based on processing multispectral satellite images. These
images from various satellites are publicly available on their respective websites. In this work, we use
open data obtained by the Sentinel satellites within the European Copernicus program [1].


2. Data acquisition and preprocessing
         Multispectral imagery from the Sentinel-2 satellites, together with the results of its
preprocessing, is used as input. These data are freely available on the Copernicus Open Access Hub [2].
Images of the earth's surface for the region of interest are taken at intervals of 5 days. The
satellites survey the surface in various spectral ranges with a resolution of 20 by 20 meters per
image pixel or finer. Table 1 lists the spectral and additional channels used to detect
fertile and sandy soils.
                                      Table 1. Wavelength and maximum resolution of the channels used.
                 Channel        Description                  Wavelength     Resolution
                 B02            Blue                          492.4 nm      10 m
                 B03            Green                         559.8 nm      10 m
                 B04            Red                           664.6 nm      10 m
                 B05            Vegetation red edge           704.1 nm      20 m
                 B06            Vegetation red edge           740.5 nm      20 m
                 B07            Vegetation red edge           782.8 nm      20 m
                 B08            NIR                           832.8 nm      20 m
                 B08A           Narrow NIR                    864.7 nm      20 m
                 B09            Water vapor                   945.1 nm      20 m
                 B11            SWIR                         1613.7 nm      20 m
                 SNW            Snow probability                     -      20 m
                 CLD            Cloud probability                    -      20 m
                 SCL            Scene classification data            -      20 m

         The additional channels contain the results of image preprocessing performed at the Sentinel
data center. It should be noted that for the detection of fertile and sandy soils, we use only 13
channels out of the 23 available.
        Channels B02, B03, and B04 represent reflected light in the optical range. Channels B05, B06, and
B07 represent infrared radiation. The two channels B08 and B08A belong to the near infrared range. The
next spectral range, B09, is associated with reflections from water vapor. Three channels then follow
that contain the results of data processing by the Sentinel-hub service itself. The first one (SNW) is the
probability that a given pixel in the image is part of a snow-covered area, and the second one (CLD) is
the probability that it belongs to cloud cover. The last of these channels (SCL) is an image mask
that assigns each pixel to one of 12 classes. Among these classes are the following:
vegetation, bare soil, water, clouds, cloud shadows, snow.
        Interaction with the Copernicus Open Access Hub website is carried out using requests that
are supported by the corresponding Sentinel-hub service. The request specifies the date range (the
service provides the latest snapshot from the specified range), the coordinates of the area of interest,
the size of the resulting image in pixels, the list of channels from which the data is requested, and the
format in which it will be downloaded. The requested area is about 10 x 10 km, which
corresponds to a resulting image of 512 x 512 pixels at a
channel resolution of 20 meters.
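
The relation between image size, channel resolution, and ground extent can be checked with a short
calculation. The following is a minimal Python sketch; the function name is illustrative and does not
come from the paper.

```python
def ground_extent_km(pixels: int, resolution_m: float) -> float:
    """Side length on the ground, in km, covered by `pixels` pixels."""
    return pixels * resolution_m / 1000.0


# A 512-pixel side at 20 m per pixel spans slightly more than 10 km.
extent = ground_extent_km(512, 20.0)
```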
        Before downloading, the data can be preprocessed directly on the site by performing
arithmetic operations on it. These operations make it possible to normalize the values in
different channels. Data from the channels is requested in the form of standard PNG images. Each image
includes 3 channels of data displayed as red, green and blue.
         Exploring large areas requires a large number of downloads. Therefore, a specially
developed script is used to load the data. This script generates a request for satellite data for a
specified location within a specified date range. Each request to the service carries a previously
obtained access key. This key is updated hourly, and a new one can be obtained through a separate
request to the service. Therefore, when exchanging data with the hub, the size of each downloaded
image is checked. If it is too small, the service did not provide data, so a new key must be
requested; the script does this automatically. For interaction with the site, the standard cURL
utility [3] is used, which is part of almost all operating systems. After the request is
generated, it is passed to this utility as a parameter.
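
The size check and key renewal described above can be sketched as a small retry loop. This is a
minimal Python sketch, not the authors' script: the callables `fetch` and `request_key`, the
`min_size` threshold, and the `max_attempts` limit are all hypothetical and stand in for the actual
cURL invocation and the key request.

```python
from typing import Callable, Tuple


def download_with_key_refresh(
    fetch: Callable[[str], bytes],
    request_key: Callable[[], str],
    key: str,
    min_size: int = 1024,
    max_attempts: int = 3,
) -> Tuple[bytes, str]:
    """Download one image, requesting a new access key when the reply is too small.

    `fetch(key)` performs the transfer (for example by invoking cURL) and
    returns the raw response body; `request_key()` asks the service for a
    fresh key. Both callables and the size threshold are assumptions.
    """
    for _ in range(max_attempts):
        data = fetch(key)
        if len(data) >= min_size:
            return data, key  # large enough to be treated as image data
        key = request_key()  # undersized reply: assume the key is no longer valid
    raise RuntimeError("service returned no image data after key renewal")
```

Returning the key together with the data lets the caller reuse a renewed key for the next request.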

3. Fertile soil detection
         The algorithm for detecting fertile soils is based on a fairly simple idea of comparing images
of a region in the spring-summer and late autumn periods. The first period was chosen because it is in
this period that the vegetation cover is densest and is reliably determined from satellite images using
standard methods. The second period is characterized by almost no vegetation, and in these images,
the surface that was covered with vegetation looks like soil. Thus, by comparing these images, one can
draw conclusions about the presence and location of productive soils in the observed area, since
soil on which nothing grows is either degraded or unsuitable for agricultural purposes.
         In addition to spectral images, the input data contains the results of a preliminary
classification of images of the region, obtained by processing the data of the Copernicus Open Access
Hub [2]. These classification results are tied to each requested area of the earth's surface and are
represented as a pixel mask Mscl of the image of this area. The value of each pixel in this mask
corresponds to a specific type of ground surface or type of cloud. This classification mask is used to
segment areas containing agricultural soils. These areas are covered with vegetation in spring and
summer. Therefore, by segmenting such areas in the images of the late autumn period, we obtain data on
the distribution of fertile soils. To do this, two new masks are formed from the original summer mask:
the Msummer_veg mask represents the vegetation cover only, while the Msummer_soil mask represents the
soil. Two similar masks are formed from the autumn classification mask. The first, Mautumn_soil,
represents the soil cover, and the second, Mautumn_veg, represents the vegetation cover. The Mgoodsoil
fertile soil mask is calculated by applying a bitwise AND to the summer vegetation mask and the
autumn soil mask.
                          Mgoodsoil = Msummer_veg & Mautumn_soil                                        (1)
The second autumn mask, Mautumn_veg, carries additional information about the location of fertile soil
hidden under vegetation. Combined with the summer soil mask, it yields the complete fertile soil mask
Mf_goodsoil.
                      Mf_goodsoil = Mgoodsoil | (Msummer_soil & Mautumn_veg)                            (2)
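
Equations (1) and (2) can be illustrated on small classification masks. The following is a minimal
Python sketch using plain nested lists so that it has no dependencies. The SCL codes 4 (vegetation)
and 5 (bare soil) are assumed to follow the Sentinel-2 Level-2A convention, the function names are
illustrative, and the area helper assumes square pixels of 20 m.

```python
import operator

# Assumed SCL codes (Sentinel-2 Level-2A convention).
SCL_VEGETATION = 4
SCL_BARE_SOIL = 5


def class_mask(scl, code):
    """Binary mask of the pixels whose SCL value equals `code`."""
    return [[value == code for value in row] for row in scl]


def combine(a, b, op):
    """Apply a pixel-wise boolean operation to two masks of equal shape."""
    return [[op(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fertile_soil_masks(scl_summer, scl_autumn):
    """Return (Mgoodsoil, Mf_goodsoil) following equations (1) and (2)."""
    summer_veg = class_mask(scl_summer, SCL_VEGETATION)
    summer_soil = class_mask(scl_summer, SCL_BARE_SOIL)
    autumn_veg = class_mask(scl_autumn, SCL_VEGETATION)
    autumn_soil = class_mask(scl_autumn, SCL_BARE_SOIL)
    goodsoil = combine(summer_veg, autumn_soil, operator.and_)   # equation (1)
    hidden = combine(summer_soil, autumn_veg, operator.and_)
    full = combine(goodsoil, hidden, operator.or_)               # equation (2)
    return goodsoil, full


def mask_area_km2(mask, resolution_m=20.0):
    """Area covered by the True pixels of a mask, in square kilometers."""
    pixels = sum(sum(row) for row in mask)
    return pixels * resolution_m ** 2 / 1e6
```

Counting the set pixels of the resulting mask and multiplying by the pixel footprint gives the area
of fertile soil in the region.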




         Figure 1 shows satellite images of the same region in the visible range in the spring-summer
and autumn periods, the corresponding classification masks and the resulting masks of fertile soils.
The last mask shows the distribution of fertile soils in the area not covered by clouds, and it is used
later for labeling when training the recognition system.


      Figure 1. Images of the region and corresponding masks: (a) image of the region, August;
(b) classification mask, August; (c) image of the region, October; (d) classification mask, October;
(e) the fertile soil mask without autumn vegetation; (f) the complete fertile soil mask. The mask
legend distinguishes vegetation, bare soil, and clouds.
4. Experimental results
       For training and testing the classifiers that recognize fertile soils, a set of multispectral
images and calculated masks of fertile soils was used. This dataset was formed from 379 pairs of sets of
combined spectral images of subregions. Each pair of sets for the same subregion consists of four
combined images taken in August and four combined images taken at the end of October. For these pairs
of images, masks of fertile soils were calculated, which were also included in the dataset.
         We investigated the following classifiers: a Bayesian classifier, Random forest, and an SVM
classifier [4-6]. All these classifiers are based on heuristic features. We used a vector of normalized
difference indices as a feature vector. Each index is calculated from the values of two different
spectral channels. The corresponding index Vij is calculated in the same way as
the well-known vegetation index NDVI.
                                 Vij = (Bi - Bj ) / (Bi + Bj )                                              (3)
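
Equation (3) can be computed per pixel as follows. This is a minimal Python sketch with illustrative
function names; returning 0.0 when both channel values are zero is an assumption, since the paper
does not say how an empty denominator is handled. The sample reflectance values are made up.

```python
def normalized_difference(bi: float, bj: float) -> float:
    """Index V_ij = (B_i - B_j) / (B_i + B_j) for one pixel."""
    total = bi + bj
    # Assumption: treat a zero denominator as a neutral index value.
    return (bi - bj) / total if total != 0 else 0.0


def feature_vector(pixel: dict, pairs) -> list:
    """Build the feature vector of one pixel from (channel_i, channel_j) pairs."""
    return [normalized_difference(pixel[i], pixel[j]) for i, j in pairs]


# With B_i = NIR (B08) and B_j = red (B04), the formula reduces to NDVI.
pixel = {"B04": 0.1, "B08": 0.5}
ndvi = feature_vector(pixel, [("B08", "B04")])[0]
```

Passing a longer list of channel pairs yields a feature vector with one index per pair.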
         As a result of the experiments, the best accuracy was achieved with the dimension of the
feature vector equal to 8. In this case, the following indices were used: V47, V46, V89, V38, V92, V01,
V48, V57. These indices are calculated using the following channel pairs: (B06, B8A), (B06, B08), (B09,
B11), (B05, B09), (B8A, B04), (B02, B03), (B06, B09), (B07, B8A). The results of soil type recognition
for the studied classifiers are shown in Table 2.
                                                                   Table 2. Results of soil type recognition
            Classifier          Precision, fertile soil      Precision, sand
            Bayesian                     0.834                    0.933
            Random forest                0.945                    0.958
            SVM                          0.929                    0.963
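
Per-class precision values such as those in Table 2 can be computed from true and predicted pixel
labels. The following is a minimal Python sketch that assumes the standard definition of precision
(correct predictions of a class divided by all predictions of that class); the paper does not state
the formula it used, and the labels below are made up for illustration.

```python
def precision(y_true, y_pred, label):
    """Fraction of pixels predicted as `label` that truly belong to it."""
    truths = [t for t, p in zip(y_true, y_pred) if p == label]
    if not truths:
        return 0.0  # assumption: a class that is never predicted scores 0
    return sum(t == label for t in truths) / len(truths)


y_true = ["soil", "soil", "sand", "sand"]
y_pred = ["soil", "sand", "sand", "sand"]
soil_precision = precision(y_true, y_pred, "soil")
sand_precision = precision(y_true, y_pred, "sand")
```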




5. Conclusion
         We propose a method for detecting fertile soils in specified regions based on processing
multispectral satellite images. The data required for detection is requested from the hub service
that provides access to a remote database. In addition to spectral images, we also use the results of
their preprocessing, which are requested from the same database. The feature vector used for
classification is a set of indices similar to the well-known NDVI index. The paper presents
experimental estimates of the recognition quality for the classifiers under study, which are used to
detect fertile soils. Based on the analysis of these estimates, the type of classifier
for the detection of fertile soils is chosen.


References
[1] ESA Earth Observation Portal. Available at:
https://directory.eoportal.org/web/eoportal/satellite-missions/c-missions/copernicus-sentinel-2/
(accessed 30.06.2021)
[2] Copernicus Open Access Hub. Available at: https://scihub.copernicus.eu/dhus/ (accessed
02.07.2021)
[3] Command line tool and library for transferring data with URLs. Available at: https://curl.se/
(accessed 14.05.2021)
[4] Cutler A., Cutler D.R. and Stevens J.R. Random Forests // Ensemble Machine Learning: Methods
and Applications, pp. 157-176, Chapter 5, Springer, 2011
[5] Cristianini N., Shawe-Taylor J. An Introduction to Support Vector Machines and other kernel-based
learning methods. Cambridge University Press, UK: Cambridge, 2000
[6] Domingos P., Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss
// Machine Learning, V. 29 (2/3), pp. 103–137, 1997. - DOI:10.1023/A:1007413511361.



</pre>