=Paper=
{{Paper
|id=Vol-3006/19_short_paper
|storemode=property
|title=Selection of features system and network parameters for hyperspectral images classification using convolutional neural networks
|pdfUrl=https://ceur-ws.org/Vol-3006/19_short_paper.pdf
|volume=Vol-3006
|authors=Victor I. Kozik,Evgeniy S. Nezhevenko
}}
==Selection of features system and network parameters for hyperspectral images classification using convolutional neural networks==
<pdf width="1500px">https://ceur-ws.org/Vol-3006/19_short_paper.pdf</pdf>
<pre>
Selection of features system and network parameters
for hyperspectral images classification using
convolutional neural networks
Victor I. Kozik, Evgeniy S. Nezhevenko
Institute of Automation and Electrometry of SB RAS, Novosibirsk, Russia


Abstract
A classification system for hyperspectral images using convolutional neural networks is described. A
specific network was selected and analyzed. The network parameters that ensure the maximum
classification accuracy are determined experimentally: the dimension of the input layer, the number
of layers, the size of the fragments into which the classified image is divided, and the number of
learning epochs. High percentages of correct classification were obtained on a large-format
hyperspectral image, even though some of the classes into which the image is divided are very close
to each other and, accordingly, are difficult to distinguish by their hyperspectra.

Keywords
Hyperspectral images, convolutional neural networks, deep learning, principal components, probability
of correct classification.


1. Introduction
Classification of land areas is gaining importance for a wide variety of applications, and
hyperspectral data are one of the most effective systems of classification features. It is known that
the greatest advances in recognition in recent years have been obtained using deep
learning and convolutional neural networks. This report examines exactly this problem. The
most important question in this case is which features to use at the input of the neural network.
It was shown earlier that a high probability of correct classification is possible only with
spatial-spectral features. The dimension of the input image of a convolutional neural network is
limited; therefore, a shortened system of features, the principal components, is formed from
the spectral components. Spatial features are obtained by forming fragments from the resulting
system of spectral features, and the methods of this formation significantly affect the quality of
the classification. The analysis of these methods is the main subject of this report.


2. Description of the analyzed object
The object investigated in this report has appeared in many publications (Figure 1) [1,
2, 3]. The reason is its unique properties: it is a satellite image of a sufficiently large size
(1408×614 pixels), the pixel size is 20 m, and each pixel is characterized by 220 spectral components
in the range of 0.4–2.5 𝜇m. The hyperspectral image of the site was obtained within the framework of
the AVIRIS program (Airborne Visible Infrared Imaging Spectrometer) at the Indian Pine test
site (Indiana, USA) [4]. Figure 2 shows in pseudo colors the markup of this hyperspectral image (GSI)
into classes. There are 57 classes in total. However, the specificity of the spatial processing method
we have chosen is such that in some areas a classification cannot be formed because these areas are
too small.

SDM-2021: All-Russian conference, August 24–27, 2021, Novosibirsk, Russia
kozik@iae.nsk.su (V. I. Kozik); nejevenko@iae.nsk.su (E. S. Nezhevenko)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Victor I. Kozik et al. CEUR Workshop Proceedings, 152–160


              Figure 1:                                  Figure 2:


3. Convolutional network used for classification
Neural networks have already been applied to GSI classification [5, 6, 7]; in our report a
different object is processed and a different method of element extraction is used. A huge number
of networks designed to classify a wide variety of objects have been published to date.
   The use of networks pre-trained on millions of samples is usually recommended. However, ours is a
special case: the training images and the images to be recognized are small terrain fragments that
cover the marked (i.e. classified) areas of the GSI. Therefore, we use a neural network that is not too
complex and has no layers such as Max Pooling, Dropout, etc. [8]. This network is shown in
Figure 3. It contains an input layer, convolutional layers, and a fully connected layer.




Figure 3:


We will not describe the functioning of the network, as it is covered in detail in the literature.
Let us define the parameters of the network, the most important of which is the nature of
the input signal. The input is a cube 𝑀×𝑁×𝐹, where 𝑀×𝑁 is the size of the fragment
cut out from the 1408×614 image and shifted across this image, and 𝐹 is the number of
spectral features characterizing each pixel. As mentioned in the description of the object, the
number of spectral components is 220; however, some of them are highly correlated.
It is known from the theory of pattern recognition that using correlated features reduces
recognition accuracy; therefore, features are usually decorrelated for effective recognition.
   The most effective way to do this is to convert the array of spectral features to
principal components. The number of principal components is determined by analyzing
the scree plot (“rocky talus”), i.e. the graph of decreasing eigenvalues, which is presented in [1].
It shows that the 5th eigenvalue is already 1/500 of the first one, so it accounts for only about 0.2%
of the variance of the spectral components; therefore, most of the experiments are carried out
with 5 principal components. Thus, our classification system is a 3D convolutional neural network
in which the dimension of the input layer is 5, the dimension of the input signal is 𝑀×𝑁×5,
and the total number of layers is 13. Subsampling is not used in our network, since the classified
images are already relatively small. The dimension of the output layer is equal to the number of
classes.
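
The decorrelation step can be illustrated with a minimal, dependency-free sketch. It is not the
authors' implementation: the function names (covariance, top_eigenpairs, components_to_keep) are
hypothetical, power iteration with deflation stands in for whatever eigen-solver was actually used,
and the stopping rule (keep components up to the first eigenvalue that drops to 1/500 of the largest)
is only an assumption modeled on the observation above.

```python
import math

def covariance(pixels):
    """Sample covariance matrix of pixels given as rows of spectral values."""
    n, bands = len(pixels), len(pixels[0])
    mean = [sum(p[b] for p in pixels) / n for b in range(bands)]
    centered = [[p[b] - mean[b] for b in range(bands)] for p in pixels]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(bands)] for i in range(bands)]

def top_eigenpairs(matrix, k, iters=500):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix
    (power iteration with deflation; an assumed stand-in solver)."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        v = [1.0 / (i + 1) for i in range(size)]   # fixed, non-uniform start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                       # nothing left to extract
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
        value = sum(v[i] * work[i][j] * v[j]
                    for i in range(size) for j in range(size))
        pairs.append((value, v))
        # Deflation: remove the found component from the matrix.
        work = [[work[i][j] - value * v[i] * v[j] for j in range(size)]
                for i in range(size)]
    return pairs

def components_to_keep(eigenvalues, ratio=1 / 500):
    """1-based index of the first eigenvalue that falls to `ratio` of the largest."""
    for index, value in enumerate(eigenvalues, start=1):
        if value / eigenvalues[0] <= ratio:
            return index
    return len(eigenvalues)
```

For a made-up eigenvalue sequence such as [500, 50, 10, 3, 1, 0.5], components_to_keep returns 5,
because the fifth value is the first one that is 1/500 of the largest. On real 220-band data a
numerical library would normally replace the pure-Python loops.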
   The most important role is played by the 𝑀×𝑁 parameter, the dimension of the fragment cut out
from the input layer. A fragment that is too small will not reveal its spatial features.
A large fragment size reduces the number of fragments in a class: the areas belonging to the
classes have an arbitrary, as a rule curvilinear, shape, while the fragments are rectangular, so
too few fragments (and sometimes none) fall on some classes. Thus, it is necessary to find a
compromise between the size of the fragments and their number in an area belonging to one class,
and this is the main theme of this report.
   Let us explain how the training sample is formed in our case. Its elements are fragments of
the GSI, which is divided into marked areas. Each area contains its own number of elements, depending
on the size of the area and its configuration. When a sample is formed, the entire GSI is covered with
square fragments of a given size, and if all the pixels of a fragment belong to the same class,
this fragment is considered an element of the corresponding class.
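
The fragment selection rule can be sketched as follows. This is an illustration under stated
assumptions rather than the code used in the experiments: extract_fragments is a hypothetical name,
the label map is a plain list of rows, the value 0 is assumed to mark unlabeled pixels, and the
shift_m and shift_n arguments mirror the shift_M and shift_N window shifts of the procedure.

```python
def extract_fragments(labels, m, n, shift_m, shift_n, background=0):
    """Return {class_id: [(row, col), ...]} with the top-left corners of all
    m x n windows whose pixels belong to one and the same class."""
    rows, cols = len(labels), len(labels[0])
    fragments = {}
    for r in range(0, rows - m + 1, shift_m):
        for c in range(0, cols - n + 1, shift_n):
            first = labels[r][c]
            if first == background:        # assumed marker for unlabeled pixels
                continue
            pure = all(labels[i][j] == first
                       for i in range(r, r + m)
                       for j in range(c, c + n))
            if pure:
                fragments.setdefault(first, []).append((r, c))
    return fragments


# A tiny made-up label map: class 3 is a one-pixel-high strip, so no 2x2
# window fits inside it and the class receives no fragments at all.
demo_labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 0, 2],
    [3, 3, 3, 3],
]
demo_fragments = extract_fragments(demo_labels, 2, 2, 1, 1)
```

In this toy example class 1 yields two fragments, class 2 yields one, and class 3 yields none, which
is the effect described above for areas that are too small or too narrow for the chosen window.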




4. GSI classification experiments
The classification stages are as follows.
   1. The principal components of the GSI are calculated.
   2. Directories for classes 1 to 57 are created.
   3. From the file containing the division of the GSI into classes (Figure 2), fragments whose
      elements all belong to the same class are selected using a sliding window of size 𝑀×𝑁
      with shifts shift_M and shift_N. Classes whose number of fragments exceeds the specified
      threshold participate in training.
   4. Network parameters are set: number of layers, kernel size, number of feature maps,
      number of classes.
   5. Parameters of the training procedure are set: number of classes, number of training
      epochs, and the division of the objects of each class into training and validation samples
      (as a rule, in a ratio of 7 : 3).
   6. The training procedure is started.
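
The class threshold of stage 3 and the 7 : 3 division of stage 5 can be sketched as below. The
function name select_and_split, the random shuffling, and the fixed seed are assumptions made for
the illustration; the source does not state how the objects of a class are assigned to the two
samples.

```python
import random

def select_and_split(fragments, min_count, train_share=0.7, seed=0):
    """Keep classes with more than min_count fragments and split each of them
    into training and validation parts (train_share of the items go to training)."""
    rng = random.Random(seed)              # fixed seed keeps the sketch reproducible
    train, validation = {}, {}
    for cls, items in fragments.items():
        if len(items) <= min_count:        # the class does not exceed the threshold
            continue
        shuffled = list(items)
        rng.shuffle(shuffled)              # assumed: random assignment within a class
        cut = round(train_share * len(shuffled))
        train[cls] = shuffled[:cut]
        validation[cls] = shuffled[cut:]
    return train, validation


# Made-up input: class "a" has 10 fragments, class "b" only 3.
demo_train, demo_validation = select_and_split(
    {"a": list(range(10)), "b": list(range(3))}, min_count=5)
```

With a threshold of 5, class "b" is excluded, and the 10 fragments of class "a" are divided into
7 training and 3 validation items.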
   The trained network is visualized in Figure 4.
   The number of weights adjusted as a result of training is 158184. The network contains 3
convolution layers; 3 normalization layers (batchnorm layers), which speed up the learning
procedure; 3 activation layers (ReLU layers), which perform a nonlinear transformation; and
softmax and classoutput layers, which provide the recognition procedure. The training images and
the images to be recognized are fed in through the input layer (imageinput).
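
The layer count can be checked with a short sketch. The names imageinput, softmax and classoutput
come from the description above; the names conv_i, batchnorm_i, relu_i and fc, the ordering inside
each block, and the helper build_layer_names are assumptions for illustration only.

```python
def build_layer_names(conv_blocks=3):
    """List the layers of the network: an input layer, conv_blocks groups of
    (convolution, batch normalization, ReLU), a fully connected layer,
    softmax and the classification output."""
    layers = ["imageinput"]
    for i in range(1, conv_blocks + 1):
        layers += [f"conv_{i}", f"batchnorm_{i}", f"relu_{i}"]
    layers += ["fc", "softmax", "classoutput"]
    return layers


# Three blocks give 1 + 3 * 3 + 3 = 13 layers in total.
demo_layers = build_layer_names(3)
```

Under these assumptions the total number of layers is 3n + 4 for n convolutional blocks, so the
13-layer network corresponds to n = 3.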


Figure 4:




   Let us consider the results of the experiments. Note that the only criterion of classification
effectiveness is the classification accuracy, defined as the ratio of the number of correctly
classified objects to the total number of objects (the term “accuracy” is used along with
the term “probability of correct classification”). Let us note a feature of our method of
forming a training sample, training and classification. With different sizes of the fragments into
which the class areas are divided, and with a limit on the number of elements in a class, the number
of classes will differ. This would not allow determining the actual dependence of the
classification accuracy on the fragment size, since two factors act at once: the size of the
fragment and the number of classes. Therefore, we calculated the number of classes (18) for
the maximum fragment size of 16×16 and then trained the network for all fragment sizes with this
number of classes. The dependence of the classification accuracy on the fragment size with
18 classes is shown in Figure 5.
   It can be seen from this graph that the optimal fragment size is near 14×14.
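
The accuracy criterion defined above can be written out as a short sketch. The function names are
hypothetical; the per-class variant corresponds to the per-class values that the paper reports in
Table 2.

```python
from collections import Counter

def accuracy(true_labels, predicted_labels):
    """Ratio of correctly classified objects to the total number of objects."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for the objects of each true class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Made-up labels: one object of class 1 is mistaken for class 2.
demo_true = [1, 1, 2, 2, 2]
demo_pred = [1, 2, 2, 2, 2]
demo_overall = accuracy(demo_true, demo_pred)
demo_by_class = per_class_accuracy(demo_true, demo_pred)
```

In this toy case the overall accuracy is 0.8, while class 1 alone reaches only 0.5, which shows why
a small class can have a low accuracy without noticeably affecting the integral figure.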
   A very important factor affecting the classification accuracy is the dimension of the input
layer, which is equal to the number of principal components used at the input. This dependence
is presented in Table 1.
   It can be seen that beyond 5 principal components the classification accuracy increases only
insignificantly.
   Another very important parameter of the network is the number of learning epochs. The dependence
of the classification accuracy on this parameter for a 14×14 fragment and 5 principal
components is shown in Figure 6.
   The classification accuracy increases monotonically with the number of epochs, with a
sharp jump between 20 and 30 epochs, although this function depends on the size of the fragments.

Figure 5:


Table 1
                          Number of principal components     Accuracy
                                            1                0.253509
                                            5                0.995038
                                           10                0.997448
                                           20                0.99844




Figure 6:                                    Figure 7:


Figure 8:


Figure 9:




Table 2
           Class number             Class name            Number of elements  Accuracy
                 2                   Buildings                  2621           0.9987
                 4                     Corn                     2269           0.9927
                 7                   Corn-EW                    169               1
                 8                   Corn-NS                    368               1
                 9                 Corn-CleanTill               2481            0.997
                 10             Corn-CleanTill-EW               4241           0.9914
                 12         Corn-CleanTill-NS-Irrigated          45               1
                 14                 Corn-MinTill                896            0.9963
                 15              Corn-MinTill-EW                1099            0.997
                 16              Corn-MinTill-NS                 93               1
                 17                 Corn-NoTill                  30               1
                 18               Corn-NoTill-EW                1304              1
                 21                    Grass                     91             0.963
                 26                     Hay                     443            0.9925
                 27                 Hay-Alfalfa                 191               1
                 30                 Not cropped                  70               1
                 33                   Orchard                   1996              1
                 35                    Pond                     326               1
                 38                Soybeans-NS                  227               1
                 39             Soybeans-CleanTill              239            0.9722
                 40             Soybeans-CleanTill              2000            0.99
                 41           Soybeans-CleanTill-EW             1057           0.9905
                 42           Soybeans-CleanTill-NS              40            0.6667
                 44         Soybeans-CleanTill Weedy            1046           0.9777
                 45              Soybeans-Drilled                50               1
                 46              Soybeans-MinTill                65               1
                 47            Soybeans-MinTill-EW              689               1
                 48          Soybeans-MinTill-Drilled           721               1
                 49            Soybeans-MinTill-NS              185               1
                 50               Soybeans-NoTill               356               1
                 52             Soybeans-NoTill-NS              436               1
                 56                    Trees                    636               1
                 57                   Wheat                     8115           0.9988


   For a 14×14 fragment, the accuracy practically saturates at 30 epochs, whereas for a
5×5 fragment it continues to grow even at 50 epochs, as can be seen from
Figure 7.
   The classification accuracy also depends on the number of layers. In the previous work [1],
only the convolutional, input and output layers were considered. Since the entire network
is shown here, let us consider the dependence on the total number of layers. This dependence for a
14×14 fragment is shown in Figure 8. As follows from the graph, the optimal number of layers
is 13.
   So, we have chosen the following network parameters: fragment size 14×14, number
of principal components 5, number of network layers 13, number of learning epochs 50.




With 33 classes, the classification accuracy (as seen in the learning function shown in
Figure 9) is 99.72%, which, in our opinion, is a very good result.
   The number of elements in each class and the probability of recognition for each
class are shown in Table 2.
   It may be suggested that the high values of classification accuracy are due to overfitting
of the neural network. This is an undesirable phenomenon in learning
from precedents, when the average error of the trained algorithm on the test
sample is significantly higher than the error on the training sample. The bottom part of Figure 9,
which characterizes the behavior of the error during learning, shows that the error on the
test sample is only very slightly (by a fraction of a percent) higher than the error on the training
sample, which means that there is no overfitting in this case.
   The class names show that we did not merge closely related classes into one (for
example, all corn crops or all soybean crops), as was done in other publications [9, 10]. It is
clear that it is much easier to distinguish corn crops from buildings than to distinguish between
different planting options for corn or soybeans under different types of plowing. Note that
with 14×14 fragments, almost indistinguishable objects (crops of corn, crops of soybeans) are
classified with a very high (often 100%) probability.
   It should be said that the results obtained in this work significantly exceed the results of [11],
with one caveat: the latter does not face the problem of covering an area belonging to a class
with rectangular windows; therefore, regions with a complex configuration can be classified
there.


5. Conclusion
Thus, in this report we have analyzed the influence of the neural network parameters on
the accuracy of hyperspectral image classification. The network parameters and the method
of forming a training sample were selected so as to provide a very high classification accuracy
(the integral accuracy is 99.72%), and this accuracy is achieved on close classes (11 types of
corn plowing, 14 types of soybean plowing). Regarding such a high classification accuracy, the
following should be said. It is largely due to the way the training and validation samples
are formed, which leaves them very closely intermixed. At the same time, this method evidently
gives an upper limit of the classification accuracy, from which the results may deviate if,
for example, the fragment coverage step is increased or the training and validation
samples are spatially separated.
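
One possible way to form spatially separated samples is sketched below. It is purely an
illustration of the idea mentioned in the conclusion, not a procedure evaluated in this work:
the function spatial_split, the vertical boundary, and the rule of discarding fragments that
straddle it are all assumptions.

```python
def spatial_split(corners, n, boundary_col):
    """Split fragment top-left corners (row, col) of width n by a vertical
    boundary: fragments entirely to the left go to training, fragments
    entirely to the right go to validation, straddling ones are dropped."""
    train, validation = [], []
    for row, col in corners:
        if col + n <= boundary_col:        # the whole window lies left of the boundary
            train.append((row, col))
        elif col >= boundary_col:          # the whole window lies right of the boundary
            validation.append((row, col))
        # otherwise the window crosses the boundary and is discarded
    return train, validation


# Made-up corners of 6-pixel-wide fragments and a boundary at column 8.
demo_split = spatial_split([(0, 0), (0, 4), (0, 8), (0, 12)], n=6, boundary_col=8)
```

In this toy example the fragment starting at column 4 crosses the boundary and is dropped, so no
pixel is shared between the training and validation fragments.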


References
 [1] Kozik V.I., Nezhevenko E.S. Classification of hyperspectral images using convolutional
     neural networks // Avtometriya. 2021. No. 1. P. 13–21.
 [2] Borzov S.M., Potaturkin O.I. Spectral-spatial methods of classification of hyperspectral
     images, a review // Avtometriya. 2018. Vol. 54. No. 6. P. 64–86.




 [3] Nezhevenko E.S., Feoktistov A.S. Investigation of the efficiency of neural network
     classification of hyperspectral images using the Hilbert–Huang transform // Collection of
     Articles Based on the Materials of the International Scientific Congress “Interexpo Geo-Siberia”.
     Novosibirsk, April 18–22, 2016. Vol. 1. P. 60–64.
 [4] Nezhevenko E.S., Feoktistov A.S., Dashevsky O.Yu. Neural network classification of
     hyperspectral images based on the Hilbert–Huang transform // Avtometriya. 2017. Vol. 53. No. 2.
     P. 79–85.
 [5] Baumgardner M. F., Biehl L. L., Landgrebe D. A. 220 Band AVIRIS Hyperspectral Image
     Data Set: June 12, 1992 Indian Pine Test Site 3. Purdue University Research Repository.
     2015. doi:10.4231/R7RX991C.
 [6] Audebert N., Le Saux B., Lefèvre S. Deep learning for classification of hyperspectral data: A
     comparative review // Geoscience and Remote Sensing Magazine. IEEE, 2019. Vol. 7. No. 2.
     P. 159–173.
 [7] Li Y., Zhang H., Shen Q. Spectral–spatial classification of hyperspectral imagery with
     3D convolutional neural network // Remote Sensing. 2017. Vol. 9. No. 67. P. 1–21,
     DOI:10.3390/rs9010067.
 [8] Krizhevsky A. Learning multiple layers of features from tiny images. Master’s Thesis,
     Department of Computer Science, University of Toronto, 2009.
 [9] Borzov S.M., Potaturkin O.I. Research of the efficiency of spectral-spatial classification of
     hyperspectral observation data // Avtometriya. 2017. Vol. 53. No. 1. P. 32–42.
[10] Borzov S.M., Potaturkin O.I. Classification of hyperspectral images with different methods
     of forming training samples // Avtometriya. 2018. Vol. 54. No. 1. P. 89–97.
[11] Nezhevenko E.S. Neural network classification of difficult to distinguish types of vegetation
     by hyperspectral features // Avtometriya. 2019. No. 3. P. 62–70.



</pre>