<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>This work was supported by the Russian Science Foundation, project No.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Optimization of the Composition of Standards in Recognition and Compression Algorithms of Hyperspectral Images*</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonid Lebedev</string-name>
          <email>lebedev@pmk.unn.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Lobachevsky State University of Nizhny Novgorod</institution>
          ,
          <addr-line>Nizhny Novgorod</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <volume>1</volume>
      <issue>16</issue>
      <fpage>6</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>The paper proposes a solution to the problem of minimizing the number of standards in order to increase both the compression ratio of hyperspectral images (HSI) and the speed of correlation extreme methods (CEM) of compression. Two modifications of the CEM are proposed: a randomized and a difference compression algorithm. Both rest on the hypothesis of spatial compactness of pixels located in local regions of the image matrix. Under this hypothesis, when a new standard is formed from an unrecognized pixel, that pixel is likely to lie near the boundaries of the coverage areas of the existing standards, which increases the number of standards. To reduce the influence of spatial compactness on the formation of standards, a technique based on changing the order in which pixels are recognized is proposed. In the randomized algorithm, a row of the matrix is chosen at random, and the sequence of pixels to be recognized in that row is produced by a random column generator. In the difference algorithm, the row number is determined by the rule for the members of an arithmetic progression with a given difference, and the sequence of pixels within the selected row is formed on the same principle. Line-by-line pixel recognition in the self-learning mode makes it possible to compress HSI of almost any size. The effectiveness of the algorithms is demonstrated on two fragments of real HSI, and all three compression algorithms are compared in terms of the number of standards obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>Hyperspectral Image</kwd>
        <kwd>Correlation Extreme Methods</kwd>
        <kwd>Similarity Estimates</kwd>
        <kwd>Recognition</kwd>
        <kwd>Compression</kwd>
        <kwd>Random Pixel Samples</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the basics of correlation extreme methods of recognition and compression in
self-learning modes that are invariant to a given type of transformation are presented for
the case in which hyperspectral images (HSI) act as the source information. The HSI is represented as a
three-dimensional cube, each pixel of which is described by the response values in the
corresponding spectral channels. Representing HSI pixels as vectors (points) <inline-formula><tex-math>y = (y_1, y_2, \ldots, y_n)</tex-math></inline-formula>
in a multidimensional linear space <inline-formula><tex-math>\mathbb{R}^n</tex-math></inline-formula>, where n is the number of
channels of the spectrometer, allows us to obtain similarity estimates for recognition and
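compression methods that are invariant, respectively, to the identity transformation, <inline-formula><tex-math>\rho_m^I</tex-math></inline-formula>
and <inline-formula><tex-math>\hat{\rho}_m^I</tex-math></inline-formula>; to the similarity transformation, <inline-formula><tex-math>\rho_m^S</tex-math></inline-formula> and <inline-formula><tex-math>\hat{\rho}_m^S</tex-math></inline-formula>; to the offset, <inline-formula><tex-math>\rho_m^D</tex-math></inline-formula> and <inline-formula><tex-math>\hat{\rho}_m^D</tex-math></inline-formula>; and to the
scaling and offset, <inline-formula><tex-math>\rho_m^T</tex-math></inline-formula> and <inline-formula><tex-math>\hat{\rho}_m^T</tex-math></inline-formula>. The difference in similarity estimates for recognition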
and compression methods is that, during recognition, the transformation operators map the source pixel y
to the reference pixel <inline-formula><tex-math>y^e = (y_1^e, y_2^e, \ldots, y_n^e)</tex-math></inline-formula>, whereas compression,
on the contrary, returns the restored value of y by applying the corresponding transformation to the
reference pixel <inline-formula><tex-math>y^e</tex-math></inline-formula>. Note that, for the identity transformation
and the offset, the similarity estimates for the recognition and compression methods are the
same for any pair of pixels y and <inline-formula><tex-math>y^e</tex-math></inline-formula>. The compression ratio of the HSI can be increased in two ways:
first, through the choice of compression method, and second, by reducing the
number of standards. When compression is based on recognition with self-learning,
reducing the number of standards also increases the speed of the compression algorithms. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
to reduce the number of standards, an algorithm is proposed that is based on the idea of solid stacking,
in which the coverage zones of the standards cover the entire set of pixels. For the
identity transformation, a new standard is formed from an unrecognized
pixel in accordance with formula (1)
   =  + ∑ ∈ ⃗ ,

⃗ = ( −    ) ⋅ ( ⋅ √

̂ ,
− 1),
(1)
where Q is the set of standards for which the pixel lies in the region of their solid
stacking;  is the compression threshold; and  is a parameter whose value depends
on the space dimension n and is responsible for the implementation of the regions of
solid stacking.
      </p>
      <p>In a space of dimension n = 2, the formation of a new standard <inline-formula><tex-math>y_3^e</tex-math></inline-formula>, whose location
ensures optimal solid stacking in accordance with formula (1), is illustrated in Fig. 1.
Using this approach in the compression algorithms reduces the number of standards by up to 5%.
However, its practical application is complicated by the need to choose the value of the
parameter h that provides optimal solid stacking. Obviously, violating the principle of solid
stacking, or letting the coverage areas of the standards intersect excessively, reduces the
efficiency of the optimization method. In turn, an analysis of how standards are formed shows that,
when pixels are enumerated sequentially (television scan), the intersection of the
coverage areas of the standards is a significant factor in the excessive growth of their number.</p>
      <p>
        Therefore, in order to favor pixels that are more distant in space from the
reference signatures, it is proposed to select pixels for compression at random. Random
pixel selection can thus provide continuous coverage with fewer
standards. The same idea has been applied to other problems in [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods of minimizing the number of standards</title>
      <p>We consider two algorithms, a randomized one and a difference one, that reduce the number of
standards relative to the original HSI compression algorithm, which is implemented on the basis
of the correlation-extreme recognition method with self-learning.</p>
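      <p>The self-learning compression that both algorithms modify can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the author's implementation: the function names relative_error and compress are hypothetical, the Euclidean distance divided by the pixel norm stands in for the similarity estimate of the identity transformation, and the default threshold of 0.02 merely mirrors the restoration error of 2% of the pixel norm used in the experiments. Each pixel is coded by its closest standard when the error is within the threshold; otherwise the pixel itself becomes a new standard, so the number of standards depends on the order in which pixels arrive.</p>
      <preformat><![CDATA[
```python
import math
from typing import List, Sequence, Tuple

Pixel = Sequence[float]


def relative_error(y: Pixel, ref: Pixel) -> float:
    """Euclidean distance between y and ref, relative to the norm of y (assumed estimate)."""
    norm = math.sqrt(sum(v * v for v in y))
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, ref)))
    return dist / norm if norm > 0 else dist


def compress(pixels: Sequence[Pixel], threshold: float = 0.02) -> Tuple[List[Pixel], List[int]]:
    """Greedy self-learning pass: returns the standards and one standard index per pixel."""
    standards: List[Pixel] = []
    codes: List[int] = []
    for y in pixels:
        errors = [relative_error(y, ref) for ref in standards]
        # Index of the closest standard, or -1 while no standard exists yet.
        best = min(range(len(errors)), key=errors.__getitem__, default=-1)
        if best == -1 or errors[best] > threshold:
            # Unrecognized pixel: it becomes a new standard (self-learning step).
            standards.append(y)
            best = len(standards) - 1
        codes.append(best)
    return standards, codes
```
]]></preformat>
      <p>Passing the same pixels to compress in a different order can change the length of the returned list of standards, which is the effect the randomized and difference algorithms exploit.</p>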
      <sec id="sec-3-1">
        <title>The randomized compression algorithm for hyperspectral images</title>
        <p>The main difficulty in forming a random pixel recognition sequence for the compression
algorithm is the large size of the HSI, which in many programming environments makes it
impossible to keep all the data in RAM. Therefore, in the proposed randomized compression
algorithm, a pixel is selected at random in two stages. First, a row of the image matrix is
chosen by a random number generator; then a random sequence of the pixels of this row is
generated. At each stage, the random numbers are checked so that no value is
repeated. Since the second stage selects pixels
within a single row of the matrix, the dimensions of the HSI are not critical
for this implementation of the algorithm.</p>
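        <p>The two-stage selection described above can be sketched in Python as follows. This is a minimal illustration rather than the author's code: the function name randomized_pixel_order and the seed parameter are hypothetical, and repetition is excluded by drawing random permutations instead of checking each generated number.</p>
        <preformat><![CDATA[
```python
import random
from typing import Iterator, Tuple


def randomized_pixel_order(n_rows: int, n_cols: int, seed: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield every (row, column) index exactly once in a two-stage random order."""
    rng = random.Random(seed)
    # Stage 1: random order of the rows, without repetition.
    for row in rng.sample(range(n_rows), n_rows):
        # Stage 2: random order of the columns within the selected row.
        for col in rng.sample(range(n_cols), n_cols):
            yield row, col
```
]]></preformat>
        <p>Because the columns are shuffled within one row at a time, a caller only needs the current row of the HSI in memory while iterating.</p>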
      </sec>
      <sec id="sec-3-2">
        <title>The difference algorithm of compression for hyperspectral images</title>
        <p>An alternative to the randomized algorithm is the difference algorithm of HSI compression.
In this algorithm, rows are selected as the successive members of
an arithmetic progression with a given difference d. Line
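numbers for compression in this algorithm are formed by the formula
<disp-formula><label>(2)</label><tex-math>r_i = (i \cdot d) \bmod K_r + \left\lfloor \frac{i \cdot d}{K_r} \right\rfloor, \quad i \in \{0, 1, \ldots, K_r - 1\},</tex-math></disp-formula>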
where <inline-formula><tex-math>K_r</tex-math></inline-formula> is the number of rows of the HSI matrix. The sequence of pixel
numbers within a row can be determined with the same formula (2) by replacing
<inline-formula><tex-math>K_r</tex-math></inline-formula> with <inline-formula><tex-math>K_s</tex-math></inline-formula>, the number of columns, and, if necessary, changing
the difference d. Changing the order of the lines and of the pixels within them in
this way reduces the correlation between successively processed
pixels, which in turn implies that fewer standards are needed for continuous coverage.</p>
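        <p>The ordering produced by formula (2) can be explored with the short Python sketch below. It assumes that formula (2) reads: index number i equals (i·d) mod K plus the integer part of i·d / K, for i = 0, ..., K − 1, where K is the number of rows (or columns). The function names difference_order and is_permutation are hypothetical. Under this reading the result visits every index exactly once when d divides K, as with the 300 rows of the first fragment and d = 10; for other combinations the check may fail, so it is worth running before an ordering is used.</p>
        <preformat><![CDATA[
```python
from typing import List


def difference_order(count: int, d: int) -> List[int]:
    """Indices given by the assumed reading of formula (2) for i = 0 .. count - 1."""
    return [(i * d) % count + (i * d) // count for i in range(count)]


def is_permutation(order: List[int]) -> bool:
    """True if every index 0 .. len(order) - 1 occurs exactly once."""
    return sorted(order) == list(range(len(order)))


rows = difference_order(300, 10)
print(rows[:4], rows[30:33])  # start of the first and of the second pass
print(is_permutation(rows))
```
]]></preformat>
        <p>For 300 rows and d = 10 the order starts 0, 10, 20, ... and, after reaching 290, continues with 1, 11, 21, ..., so neighbouring rows are processed far apart in time.</p>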
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental research</title>
      <p>Experiments on minimizing the number of standards with the proposed randomized and
difference algorithms were carried out on two HSI fragments whose
object composition differs significantly.</p>
      <sec id="sec-4-1">
        <title>Experiments on the HSI fragment f100520t01p00r12</title>
        <p>The first fragment consisted of lines 251 through 550 of the f100520t01p00r12
HSI file acquired by the AVIRIS spectrometer in 224 spectral channels. The fragment matrix
had 813 columns. The spatial resolution was 17.3 m. The image
of this fragment is shown in Fig. 2. The fragment was compressed under various conditions
for all four types of similarity estimates. The original compression algorithm was
tested on the fragment with various lines taken as the start line;
in most runs, the start line was changed in steps of 10. After the start line, the subsequent
lines were selected sequentially. As a result, for each similarity
estimate a set of values was obtained, each being the number
of standards formed during compression of the fragment. The boundary values, the
mean, and the variance of the number of standards were found for this set.</p>
        <p>Similar characteristics were obtained for the randomized algorithm. For the difference
algorithm, the number of generated standards was obtained with the difference value
d = 10. The results are summarized in Table 1.</p>
        <p>[Table 1. Column headings: compression with the similarity estimate for the identity transformation, the similarity transformation, the offset, and the scaling with offset.]</p>
        <p>The data in the table show that the randomized and difference
algorithms outperform the original compression algorithm in minimizing the
number of standards needed to restore a fragment of the HSI with an error not exceeding
2% of the pixel norm. The difference algorithm is, in turn, more efficient than the
randomized algorithm by the same criterion, although mainly for the similarity estimate
that is invariant to the identity transformation. However, the boundaries obtained
for the number of standards in the experiments show that in some cases
the randomized algorithm outperforms the difference algorithm, even though on average
it may be inferior.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiments on the HSI fragment MoffettField</title>
        <p>The second fragment was formed from lines 101 to 600 of the MoffettField
HSI. The matrix had 753 columns, and the AVIRIS spectrometer
used 224 channels. This fragment is shown in Fig. 3. It differs from
the previous fragment in the greater diversity of the underlying surface
and, therefore, in the need to form a much larger number of standards.</p>
        <p>The experiments on this fragment were carried out according to the same scheme.
The results are shown in Table 2. As follows from the results in both tables,
when the HSI is compressed using a similarity estimate that is invariant with respect to
the identity transformation, <inline-formula><tex-math>\hat{\rho}_m^I</tex-math></inline-formula>, or the offset, <inline-formula><tex-math>\hat{\rho}_m^D</tex-math></inline-formula>, preference should be given to
the difference algorithm. For the similarity estimates that are invariant with respect to the
similarity transformation, <inline-formula><tex-math>\hat{\rho}_m^S</tex-math></inline-formula>, or the similarity transformation with offset, <inline-formula><tex-math>\hat{\rho}_m^T</tex-math></inline-formula>, both
compression algorithms showed almost identical results in minimizing the number of standards.</p>
        <p>In this case, however, the randomized algorithm should be preferred for HSI compression
because it has no parameters to tune. In the difference
algorithm, the value of the parameter d should be estimated mainly from the
spatial resolution of the spectrometer used, with the aim of obtaining a sequence
of less correlated pixels.</p>
        <p>[Table 2. Recovered labels: initial algorithm; randomized algorithm; adaptive compression algorithm, σ = 5%; columns for the similarity estimates for the identity transformation, the similarity transformation, the offset, and the scaling with offset.]</p>
        <p>The hypothesis that changing the sequence of pixels during HSI compression can reduce
the number of standards was confirmed by the experiments.
As a result, two algorithms based on the correlation extreme method were
created, a randomized and a difference compression algorithm, which reduce the number of
standards by 2-5% and thereby increase both the compression ratio of the HSI and
the speed of the procedures.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lebedev</surname>
            ,
            <given-names>L.I.</given-names>
          </string-name>
          :
          <article-title>Geometrical aspects of correlation-extreme methods of object recognition and HSI compression</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Information Technology and Nanotechnology (ITNT-2020)</source>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>238</lpage>
          . Samara National University, Samara (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Svirnov</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikhailov</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ostrikov</surname>
            ,
            <given-names>V.N.</given-names>
          </string-name>
          :
          <article-title>Application randomized method of principal components for hyperspectral data compression</article-title>
          .
          <source>J. Modern problems of Earth remote sensing from cosmos</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ),
          <fpage>9</fpage>
          -
          <lpage>17</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Borzov</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guryanov</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potaturkin</surname>
            ,
            <given-names>O.I.</given-names>
          </string-name>
          :
          <article-title>Study of the classification efficiency of difficult-to-distinguish vegetation types using hyperspectral data</article-title>
          .
          <source>J. Computer Optics</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ),
          <fpage>464</fpage>
          -
          <lpage>473</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bibikov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kazanskiy</surname>
            ,
            <given-names>N.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fursov</surname>
            ,
            <given-names>V.A.</given-names>
          </string-name>
          :
          <article-title>Vegetation type recognition in hyperspectral images using a conjugacy indicator</article-title>
          .
          <source>J. Computer Optics</source>
          ,
          <volume>42</volume>
          (
          <issue>5</issue>
          ),
          <fpage>846</fpage>
          -
          <lpage>854</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>