<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hyperspectral data dimensionality reduction using nonlinear autoencoders</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgeny Myasnikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Geoinformatics and Information Security department Samara National Research University; Image Processing Systems Institute of RAS - Branch of the FSRC "Crystallography and Photonics" RAS Samara</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>33</fpage>
      <lpage>36</lpage>
      <abstract>
        <p>A well-known feature of hyperspectral images is their high spectral resolution, which makes it possible to identify materials and classify objects in images with high accuracy. However, hyperspectral images contain substantial redundancy, which can be eliminated with the aid of dimensionality reduction techniques. In this paper, we propose and study several dimensionality reduction techniques based on pretraining an encoder-decoder neural network with the results of the nonlinear mapping and principal component analysis techniques. Experiments performed on an open dataset show that the proposed techniques both provide discriminative low-dimensional features and allow the source hyperspectral data to be reconstructed with little error.</p>
      </abstract>
      <kwd-group>
        <kwd>autoencoder</kwd>
        <kwd>hyperspectral images</kwd>
        <kwd>nonlinear mapping</kwd>
        <kwd>principal component analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>Hyperspectral images are widely used nowadays in fields such as agriculture, medicine, biology, and chemistry. A distinctive feature of hyperspectral images is their high spectral resolution, which allows materials to be identified and depicted objects to be classified with high accuracy.</p>
      <p>
        However, hyperspectral images contain substantial redundancy, which can be eliminated with the aid of dimensionality reduction techniques. The images obtained after the dimensionality reduction stage can be processed efficiently, as a much smaller data volume is involved in the processing. It is worth noting that dimensionality reduction techniques are often used in various image analysis problems (see [
        <xref ref-type="bibr" rid="ref1 ref3">1-3</xref>
        ], for example). The key requirement for dimensionality reduction procedures is that they preserve the quality of the solution of applied problems such as classification, segmentation, and material detection.
      </p>
      <p>
        The most commonly used techniques for the dimensionality reduction of hyperspectral data are linear techniques such as Principal Component Analysis (PCA). While a number of general-purpose nonlinear dimensionality reduction procedures exist [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], their use in hyperspectral image analysis is limited, as many of them provide only a one-way mapping and thus no ability to restore the source hyperspectral data.
      </p>
      <p>In recent years, neural network approaches have become increasingly popular. In particular, autoencoder neural networks [5] have been used for the dimensionality reduction of hyperspectral images. Such neural networks perform nonlinear dimensionality reduction and also provide the inverse mapping, which allows the source hyperspectral data to be restored up to some reconstruction error.</p>
      <p>
        Recently, it was shown [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] that an autoencoder network can be pretrained using the principal component analysis technique, and that its use for dimensionality reduction outperformed the PCA technique both in terms of reconstruction error and classification accuracy.
      </p>
      <p>
        However, it was also shown [
        <xref ref-type="bibr" rid="ref7 ref8">7,8</xref>
        ] that the nonlinear mapping technique [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] has advantages over PCA in terms of the classification and segmentation quality of hyperspectral images. For this reason, in this paper, we study the possibility of training an autoencoder-like architecture to capture the nonlinear mapping. In particular, we split the autoencoder into an encoder and a decoder, train both parts separately using the results of the nonlinear mapping, and investigate the effect of subsequently fine-tuning the whole network.
      </p>
      <p>The structure of the paper is as follows. In Section II, we give the necessary theoretical background on the neural network architecture and the nonlinear mapping algorithm. In Section III, we describe the training procedures used in the experimental study and present the results of the experiments. The conclusions and the list of references are given at the end of the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>METHOD</title>
      <sec id="sec-2-1">
        <title>A. Autoencoder Neural Network</title>
        <p>The autoencoder neural network proposed in [5] was
earlier referred to as the autoassociative neural network. It
consists of two consecutive parts called the encoder and
decoder.</p>
        <p>The encoder part takes a multidimensional vector x ∈ R^M as input and produces a corresponding low-dimensional representation y ∈ R^m, where m &lt; M. The encoder consists of at least two fully connected layers. The first layer contains some number of neurons (defined by the parameters of the neural network architecture) connected to all the components of the input vector. The last layer of the encoder contains a number of neurons equal to the desired dimensionality of the reduced space.</p>
        <p>The decoder usually has a mirrored architecture: it has the same number of layers with the same numbers of neurons, although this is not a strict requirement. In any case, the input layer of the decoder takes the reduced representation y ∈ R^m from the output of the encoder and restores the multidimensional vector x̃ ∈ R^M. Accordingly, the output layer of the decoder has a number of neurons equal to the input dimensionality M. The number of hidden layers and neurons is defined by the parameters of the neural network architecture.</p>
        <p>As the number of neurons in the output layer of the
encoder is less than the number of neurons in the input and
hidden layers, this layer is often referred to as a bottleneck
layer, and the whole network architecture is often referred to
as a bottleneck architecture.</p>
        <p>The autoencoder architecture is usually trained in a self-learning mode by applying the same multidimensional vectors x ∈ R^M to both the input and output layers of the autoencoder. The training process itself is based on the minimization of the following cost function:</p>
        <p>E = (1/N) Σ_{i=1}^{N} ‖x_i − x̃_i‖²   (1)</p>
        <p>where N is the number of samples, and x_i ∈ R^M, x̃_i ∈ R^M are the inputs and outputs of the network. After training, the encoder can be used to perform the dimensionality reduction of the source data (direct mapping), and the decoder can be used to restore the source data from its reduced representation (inverse mapping).</p>
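        <p>As an illustration of the direct and inverse mappings and of the cost function (1), the following minimal NumPy sketch builds a four-layer encoder-decoder with randomly initialized weights; the layer sizes, weight scales, and function names here are illustrative only, not those of the networks used in the experiments:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

M, m, H = 8, 2, 16   # input dim, bottleneck dim, hidden width (illustrative)

# Randomly initialized weights stand in for a trained network.
W1, b1 = rng.normal(size=(M, H)) * 0.1, np.zeros(H)   # encoder hidden layer
W2, b2 = rng.normal(size=(H, m)) * 0.1, np.zeros(m)   # bottleneck layer
W3, b3 = rng.normal(size=(m, H)) * 0.1, np.zeros(H)   # decoder hidden layer
W4, b4 = rng.normal(size=(H, M)) * 0.1, np.zeros(M)   # output layer

relu = lambda a: np.maximum(a, 0.0)

def encode(x):
    # Direct mapping: x in R^M  ->  y in R^m (linear bottleneck activation)
    return relu(x @ W1 + b1) @ W2 + b2

def decode(y):
    # Inverse mapping: y in R^m  ->  x~ in R^M (linear output activation)
    return relu(y @ W3 + b3) @ W4 + b4

def cost(X):
    # E = (1/N) * sum_i ||x_i - x~_i||^2, as in (1)
    X_rec = decode(encode(X))
    return np.mean(np.sum((X - X_rec) ** 2, axis=1))

X = rng.normal(size=(100, M))
E = cost(X)
```

        <p>Training would adjust the weight matrices to minimize E over the sample set; here the sketch only shows the forward passes and the cost evaluation.</p>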
        <p>In this paper, we study whether the encoder and decoder parts can be trained separately to force the neural network to perform a mapping with the desired properties. It was shown earlier that separately pretraining the encoder and decoder with the PCA results helps to perform the training more efficiently than the standard training scheme.</p>
        <p>
          In particular, the approach proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] consists of the following steps: perform PCA on the input dataset; pretrain the encoder to produce the PCA results for the input data; pretrain the decoder to reproduce the input data from the encoded data; fine-tune the whole network according to the standard scheme.
        </p>
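        <p>The first step, computing the PCA projections that serve as pretraining targets, can be sketched as follows (an assumed NumPy implementation via SVD; the function name pca_targets and the toy data are illustrative):</p>

```python
import numpy as np

def pca_targets(X, m):
    """Project X onto its first m principal components.
    The projections serve as training targets for the encoder;
    the (projection, X) pairs serve as training data for the decoder."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))      # toy stand-in for hyperspectral pixels
Y = pca_targets(X, 3)               # 200 x 3 reduced representation
```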
        <p>
          In this paper, we follow a similar scheme but use the results of the nonlinear mapping algorithm instead of PCA, and perform the fine-tuning optionally, in order to study whether such an approach can be more efficient than standard PCA, nonlinear mapping, or the recently proposed autoencoder pretrained with PCA [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>B. Nonlinear Mapping</title>
        <p>
          The nonlinear mapping is a numerical procedure that performs a (non-functional) mapping of data into a low-dimensional space so that the data structure is preserved (see [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], for example). In nonlinear mapping, this structure is defined by all the pairwise distances between the points in the dataset. The Euclidean distance d(·,·) is usually used to measure the distances.
        </p>
        <p>As the pairwise distances cannot, in general, be preserved exactly, the so-called data mapping error is introduced:</p>
        <p>ε = μ Σ_{i,j=1; i≠j}^{N} ω_{ij} (d(x_i, x_j) − d(y_i, y_j))²   (2)</p>
        <p>Here N is the number of data points, d(x_i, x_j) is the distance between points x_i and x_j in the multidimensional space, d(y_i, y_j) is the distance between the corresponding points y_i, y_j in the reduced space, and μ and ω_{ij} are constants. Usually, μ is taken as the inverse of the sum of squared distances between all possible pairs of data points in the multidimensional space, and the ω_{ij} are set equal to one.</p>
        <p>The minimization of the data mapping error is usually performed using the gradient descent technique. The coordinates of the data points y_i ∈ R^m are the tunable parameters.</p>
        <p>In this paper, we use stochastic gradient descent based on mini-batches to minimize the data mapping error. The overall algorithm for dimensionality reduction using the nonlinear mapping consists of initializing the coordinates y_i with the results of principal component analysis, followed by refinement of y_i using stochastic gradient descent. The optimization (refinement) stops when the coordinates of the data points y_i in the reduced space become stable.</p>
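        <p>The refinement step can be sketched as follows: a mini-batch gradient descent on the data mapping error in NumPy. The constant μ only rescales the gradient, so it is folded into the learning rate here, and the learning rate, iteration count, and toy data are illustrative, not the settings used in the paper:</p>

```python
import numpy as np

def dist_matrix(Z):
    # Full matrix of Euclidean distances between the rows of Z
    return np.sqrt(((Z[:, None] - Z[None]) ** 2).sum(-1))

def mapping_error(Dx, Y):
    # Data mapping error (omega_ij = 1, mu folded in as a normalization)
    Dy = dist_matrix(Y)
    iu = np.triu_indices(len(Y), k=1)
    return ((Dx[iu] - Dy[iu]) ** 2).sum() / (Dx[iu] ** 2).sum()

def nlm_refine(X, Y0, iters=500, lr=0.02, batch=10, seed=0):
    """Refine low-dimensional coordinates Y0 (e.g. a PCA projection of X)
    by mini-batch gradient descent on the data mapping error."""
    rng = np.random.default_rng(seed)
    Y = Y0.astype(float).copy()
    Dx_full = dist_matrix(X)
    for _ in range(iters):
        idx = rng.choice(len(X), size=batch, replace=False)
        diff = Y[idx, None, :] - Y[None, :, :]        # batch x N x m
        Dy = np.sqrt((diff ** 2).sum(-1)) + 1e-12     # batch x N
        coef = (Dy - Dx_full[idx]) / Dy               # omega_ij = 1
        # Gradient of the error w.r.t. the sampled coordinates
        grad = 2.0 * (coef[:, :, None] * diff).sum(axis=1) / len(X)
        Y[idx] -= lr * grad
    return Y

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))
Y0 = X[:, :2]                    # stand-in for a PCA initialization
Y = nlm_refine(X, Y0)            # refined 2-D coordinates
```

        <p>In practice, the loop would terminate when the coordinates stabilize between iterations rather than after a fixed number of steps.</p>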
      </sec>
      <sec id="sec-2-3">
        <title>C. The methods used in the study</title>
        <p>As it was outlined in the introduction, in this paper, we
study several variants of training the autoencoder-like
encoder-decoder network. In particular, we consider the
following techniques:</p>
        <p>
          - The autoencoder network pretrained with the results of the PCA technique (AE-PCA), as described in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ];
- The neural network with an encoder and a decoder trained separately using the results of the nonlinear mapping technique (ED-NLM);
        </p>
        <p>- The same autoencoder network pretrained with the results of the nonlinear mapping technique and fine-tuned using the standard approach (AE-NLM).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>EXPERIMENTS</title>
      <p>
        In this section, we describe the results of the experiments, which were performed using the Indian Pines dataset. This dataset was acquired with the AVIRIS hyperspectral sensor and contains 145 × 145 pixels and 224 spectral components [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Due to high noise and water absorption in the source image, we used the version containing 204 spectral channels.
      </p>
      </p>
      <p>In all the described experiments, the neural networks were implemented using the Keras framework and the Python language. The experiments were carried out on a GeForce GTX 1070 Ti GPU.</p>
      <p>For each considered neural network technique, we varied the number of hidden layers in the encoder and decoder, performing experiments for one and two hidden layers, which correspond to four and six layers in the resulting autoencoder networks.</p>
      <p>The number of neurons in the input layer of the encoder
and the output layer of the decoder was defined by the
dimensionality of the input space that is the number of
channels in the hyperspectral image. The number of neurons
in the bottleneck layer varied from 1 to 10 according to the
dimensionality of the reduced space. We also varied the
number of neurons in the hidden layers. In particular, we
used 64, 128, and 256 neurons in hidden layers.</p>
      <p>
        Following the recommendations given in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we used ReLU activation functions for the hidden layers and linear activations in the output layers of the encoder and decoder. Analogously, we used the Adam optimizer [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] with its default parameters. The batch size was set to 16; however, we suppose that a larger batch size could also be used.
      </p>
      <p>
        To measure the effectiveness of each particular approach,
we estimated both the reconstruction error as it is defined in
(1) and the classification accuracy using the reduced
representation. The latter indicator plays an important role in
hyperspectral image analysis problems, for example, in
vegetation type recognition [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>For the latter indicator, we used the overall accuracy of
the one nearest neighbor (1-NN) classifier. The accuracy
itself was measured as a fraction of correctly classified image
pixels. To measure the accuracy, at first, we performed
dimensionality reduction using one of the studied techniques
for all the pixels in the considered image. Then we split all
the ground truth pixels into training and testing sets in the
proportion 60/40. After that, we trained the classifier using
the training set and estimated its accuracy using the test one.</p>
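      <p>This evaluation protocol can be sketched as follows: a NumPy 1-NN classifier with a 60/40 split, applied here to toy labeled features standing in for the reduced hyperspectral pixels (the function name and data are illustrative):</p>

```python
import numpy as np

def one_nn_accuracy(features, labels, train_frac=0.6, seed=0):
    """Split the labeled samples 60/40, classify each test sample with
    the label of its nearest training sample, and report overall accuracy."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    n_train = int(train_frac * len(labels))
    tr, te = idx[:n_train], idx[n_train:]
    # Squared Euclidean distances from every test to every training sample
    d = ((features[te][:, None, :] - features[tr][None, :, :]) ** 2).sum(-1)
    pred = labels[tr][d.argmin(axis=1)]
    return (pred == labels[te]).mean()

# Two well-separated toy classes in a 3-D "reduced" space
rng = np.random.default_rng(4)
f = np.vstack([rng.normal(0.0, 0.3, (50, 3)), rng.normal(3.0, 0.3, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
acc = one_nn_accuracy(f, y)
```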
      <p>In our first experiment, we compared the different techniques described in Subsection II.C and different architectures from the viewpoint of the reconstruction error (1). The results of this experiment are shown in Fig. 1. In particular, we pretrained the encoder and decoder of the AE-PCA network for 50 iterations, fine-tuned the entire network for 50 iterations, and then measured the reconstruction quality. For the AE-NLM network, we trained the network with the same strategy but used the NLM results instead of the PCA results at the pretraining stage. For the ED-NLM network, we trained the encoder and decoder separately for 100 epochs. After the training, we measured the error (1) as the quality indicator. The experiment was carried out for different numbers of layers and neurons.</p>
      <p>As can be seen in the figure, the reconstruction error decreases with the growth of the dimensionality m of the reduced space, defined by the number of neurons in the bottleneck layer, which is an expected result.</p>
      <p>While we cannot highlight a single winning technique in this experiment, we should note that the AE-NLM technique often shows better results. This means that the nonlinear mapping result used for training preserves the ability to restore the source data with quite good quality. It also means that the decoder trained on the NLM data can be used as an inverse mapping for the NLM.</p>
      <p>In our second experiment, we compared the considered
techniques from the viewpoint of the classification accuracy.
The results of this experiment are shown in Fig. 2. In this
figure, we added the results for the classical linear (PCA) and
nonlinear (NLM) dimensionality reduction techniques.</p>
      <p>As can be seen, the proposed techniques provided better results than the classical approaches in most cases. Again, it is difficult to single out any one approach. Nevertheless, we do not observe any substantial advantage of fine-tuning the NLM-initialized network over the version with a separately trained encoder and decoder.</p>
    </sec>
    <sec id="sec-conclusion">
      <title>CONCLUSION</title>
      <p>In this paper, we studied several dimensionality reduction neural network techniques based on the autoencoder architecture. We compared the proposed techniques from the viewpoint of the reconstruction error and the accuracy of per-pixel classification.</p>
      <p>We showed that the proposed techniques outperformed the baseline (PCA and NLM) approaches in terms of classification accuracy in almost all the considered cases. The decoder trained using the results of the NLM can be successfully used as an inverse mapping for hyperspectral image analysis.</p>
    </sec>
    <sec id="sec-4">
      <title>ACKNOWLEDGMENT</title>
      <p>The work was partly funded by RFBR according to the
research project 18-07-01312-a in parts of «2. Method» - «3.
Experiments» and by the Russian Federation Ministry of
Science and Higher Education within a state contract with
the «Crystallography and Photonics» Research Center of the
RAS in parts «1. Introduction» and «4. Conclusion».</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Dmitriev</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.V.</given-names>
            <surname>Myasnikov</surname>
          </string-name>
          , “
          <article-title>Comparative study of description algorithms for complex-valued gradient fields of digital images using linear dimensionality reduction methods</article-title>
          ,” Computer Optics, vol.
          <volume>42</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>822</fpage>
          -
          <lpage>828</lpage>
          ,
          <year>2018</year>
          . DOI: 10.18287/2412-6179-2018-42-5-822-828.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.V.</given-names>
            <surname>Gashnikov</surname>
          </string-name>
          , “
          <article-title>Optimization of the multidimensional signal interpolator in a lower dimensional space</article-title>
          ,”
          <source>Computer Optics</source>
          , vol.
          <volume>43</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>653</fpage>
          -
          <lpage>660</lpage>
          ,
          <year>2019</year>
          . DOI: 10.18287/2412-6179-2019-43-4-653-660.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.V.</given-names>
            <surname>Myasnikov</surname>
          </string-name>
          , “
          <article-title>The study of dimensionality reduction methods in the task of browsing of digital image collections</article-title>
          ,” <source>Computer Optics</source>
          , vol.
          <volume>32</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>301</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.A.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Verleysen</surname>
          </string-name>
          , “Nonlinear Dimensionality Reduction,” Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Kramer</surname>
          </string-name>
          , “
          <article-title>Nonlinear principal component analysis using autoassociative neural networks</article-title>
          ,” <source>AIChE J.</source>
          , vol.
          <volume>37</volume>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>243</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Myasnikov</surname>
          </string-name>
          , “
          <article-title>Dimensionality Reduction of Hyperspectral Images using Autoassociative Neural Networks</article-title>
          ,”
          <source>IEEE Proc. of International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON)</source>
          , pp.
          <fpage>0591</fpage>
          -
          <lpage>0595</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Myasnikov</surname>
          </string-name>
          , “
          <article-title>Evaluation of nonlinear dimensionality reduction techniques for classification of hyperspectral images</article-title>
          ,
          <source>” CEUR Workshop Proceedings</source>
          , vol.
          <volume>2268</volume>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Bibikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. L.</given-names>
            <surname>Kazanskiy</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Fursov</surname>
          </string-name>
          , “
          <article-title>Vegetation type recognition in hyperspectral images using a conjugacy indicator</article-title>
          ,” <source>Computer Optics</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>846</fpage>
          -
          <lpage>854</lpage>
          ,
          <year>2018</year>
          . DOI: 10.18287/2412-6179-2018-42-5-846-854.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.W.</given-names>
            <surname>Sammon</surname>
          </string-name>
          , “
          <article-title>A nonlinear mapping for data structure analysis</article-title>
          ,
          <source>” IEEE Transactions on Computers</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>409</lpage>
          ,
          <year>1969</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.F.</given-names>
            <surname>Baumgardner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.L.</given-names>
            <surname>Biehl</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.A.</given-names>
            <surname>Landgrebe</surname>
          </string-name>
          , “
          <article-title>220 Band AVIRIS Hyperspectral Image Data Set: June 12, 1992 Indian Pine Test Site 3</article-title>
          ,” Purdue University Research Repository,
          <year>2015</year>
          . DOI: 10.4231/R7RX991C.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          , “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980v8,
          <year>2017</year>
          .
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.A.</given-names>
            <surname>Bibikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.L.</given-names>
            <surname>Kazanskiy</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.A.</given-names>
            <surname>Fursov</surname>
          </string-name>
          , “
          <article-title>Vegetation type recognition in hyperspectral images using a conjugacy indicator</article-title>
          ,” <source>Computer Optics</source>
          , vol.
          <volume>42</volume>
          , no.
          <issue>5</issue>
          , pp.
          <fpage>846</fpage>
          -
          <lpage>854</lpage>
          ,
          <year>2018</year>
          . DOI: 10.18287/2412-6179-2018-42-5-846-854.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>