<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Feature Selection Methods for Remote Sensing Images Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>E. Goncharova</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Gaidel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Image Processing Systems Institute - Branch of the Federal Scientific Research Centre “Crystallography and Photonics” of Russian Academy of Sciences</institution>
          ,
          <addr-line>151 Molodogvardeyskaya st., 443001, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Samara National Research University</institution>
          ,
          <addr-line>34 Moskovskoe Shosse, 443086, Samara</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>86</fpage>
      <lpage>91</lpage>
      <abstract>
        <p>Feature selection methods are used to improve the performance of remote sensing image classification. In this work, two feature selection methods are examined. The first is based on discriminant analysis, and the second on building a regression model. Histogram and textural features are considered as image characteristics. Experiments on the UC Merced Land Use remote sensing dataset show the effectiveness of these methods. The largest fraction of correctly classified images is 95%. The dimension of the initial feature space was reduced from 18 features to 3.</p>
      </abstract>
      <kwd-group>
        <kwd>Feature selection</kwd>
        <kwd>classification</kwd>
        <kwd>remote sensing images</kwd>
        <kwd>discriminant analysis</kwd>
        <kwd>regression analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. The object of the study</title>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Feature extraction</title>
        <p>where R, G, and B are the intensities of the red, green, and blue components, respectively, of the image resolution cell with coordinates (m, n).</p>
        <p>I(m, n) takes values from 0 to L − 1, where L is the number of gray levels.</p>
        <p>
          A large number of different features can characterize an image. In this work we use histogram features, which describe the spatial distribution of gray values. If the discrete image is considered as a two-dimensional stochastic process, we can estimate its spatial distribution of gray values and, therefore, its raw moments (2) and central moments (3).
        </p>
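        <disp-formula><label>(2)</label><tex-math>\nu_k = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I^k(i, j).</tex-math></disp-formula>
        <disp-formula><label>(3)</label><tex-math>\mu_k = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i, j) - \nu_1 \right)^k.</tex-math></disp-formula>
        <p>The calculated features are:</p>
        <list list-type="bullet">
          <list-item><p>mean intensity: <inline-formula><tex-math>\bar{I} = \nu_1</tex-math></inline-formula>, and also <inline-formula><tex-math>\bar{I}_R, \bar{I}_G, \bar{I}_B</tex-math></inline-formula> (the mean intensities of the red, green, and blue components, respectively);</p></list-item>
          <list-item><p>second raw moment (mean energy): <inline-formula><tex-math>\nu_2</tex-math></inline-formula>;</p></list-item>
          <list-item><p>standard deviation: <inline-formula><tex-math>s = \sqrt{\mu_2}</tex-math></inline-formula>;</p></list-item>
          <list-item><p>skewness: <inline-formula><tex-math>\gamma_1 = \mu_3 / s^3</tex-math></inline-formula>;</p></list-item>
          <list-item><p>kurtosis (a measure of the “tailedness” of the probability distribution): <inline-formula><tex-math>\gamma_2 = \mu_4 / s^4 - 3</tex-math></inline-formula>.</p></list-item>
        </list>
        <p>The autocorrelation matrix (4) describes the dependence among the pixels of an image [4]:</p>
        <disp-formula><label>(4)</label><tex-math>R(m, n) = \frac{1}{(M - |m|)(N - |n|)} \sum_{i} \sum_{j} I(i, j) \, I(i + m, j + n).</tex-math></disp-formula>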
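        <p>The histogram features and the autocorrelation estimate described above can be illustrated with a minimal NumPy sketch. The function names, the toy 4×4 array, and the restriction to non-negative shifts are assumptions made for this illustration only; the sketch also assumes a non-constant image, since the skewness and kurtosis divide by the standard deviation.</p>
        <preformat>
```python
import numpy as np


def histogram_features(gray):
    """Histogram features of a 2-D grayscale array (assumed non-constant)."""
    values = np.asarray(gray, dtype=np.float64).ravel()
    nu1 = values.mean()              # first raw moment: mean intensity
    nu2 = np.mean(values ** 2)       # second raw moment: mean energy
    centered = values - nu1
    mu2 = np.mean(centered ** 2)     # central moments of orders 2, 3, 4
    mu3 = np.mean(centered ** 3)
    mu4 = np.mean(centered ** 4)
    s = np.sqrt(mu2)                 # standard deviation
    return {
        "mean": nu1,
        "energy": nu2,
        "std": s,
        "skewness": mu3 / s ** 3,
        "kurtosis": mu4 / s ** 4 - 3.0,
    }


def autocorrelation(gray, m, n):
    """Autocorrelation estimate R(m, n) for non-negative shifts m and n."""
    img = np.asarray(gray, dtype=np.float64)
    rows, cols = img.shape
    shifted_product = img[: rows - m, : cols - n] * img[m:, n:]
    return shifted_product.sum() / ((rows - m) * (cols - n))


# Toy 4x4 "image" used only for illustration.
image = np.array([[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]])
features = histogram_features(image)
r00 = autocorrelation(image, 0, 0)   # zero shift equals the mean energy
r01 = autocorrelation(image, 0, 1)
```
        </preformat>
        <p>For a color image, the same mean computation would be applied to each of the R, G, and B channels separately to obtain the three channel means.</p>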
        <disp-formula><tex-math>P_{d_1 d_2}(i, j) = \left| \left\{ (m, n) \in \{1, 2, \ldots, M\} \times \{1, 2, \ldots, N\} \mid I(m, n) = i,\ I(m + d_1, n + d_2) = j \right\} \right|, \quad i, j = 0, 1, \ldots, L - 1.</tex-math></disp-formula>
        <p>Textural features are extracted from the spatial dependence matrices, which are calculated for eight different displacements (d<sub>1</sub>, d<sub>2</sub>): (1, 0), (0, 1), (1, ±1), (2, 0), (0, 2), (2, ±2). To make the features invariant under rotation, they are extracted from the averaged matrices. Thus, eight more textural features can be defined as follows:</p>
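        <p>A spatial dependence (co-occurrence) matrix and its average over several displacements can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the first displacement component is taken along rows and the second along columns, and the displacement list assumes that the diagonal (1, 1) and anti-diagonal (1, -1) directions are both included.</p>
        <preformat>
```python
import numpy as np


def cooccurrence(gray, d1, d2, levels):
    """Count pixel pairs with I(m, n) = i and I(m + d1, n + d2) = j."""
    img = np.asarray(gray, dtype=np.intp)
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=np.int64)
    # Only visit pixels whose displaced neighbor stays inside the image.
    for m in range(max(0, -d1), min(rows, rows - d1)):
        for n in range(max(0, -d2), min(cols, cols - d2)):
            P[img[m, n], img[m + d1, n + d2]] += 1
    return P


def averaged_cooccurrence(gray, displacements, levels):
    """Average the matrices over a list of (d1, d2) displacements."""
    mats = [cooccurrence(gray, d1, d2, levels) for d1, d2 in displacements]
    return np.mean(mats, axis=0)


# Assumed set of four directions at distance 1.
near = [(1, 0), (0, 1), (1, 1), (1, -1)]
image = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 1]])
P_horizontal = cooccurrence(image, 0, 1, levels=2)
P_near = averaged_cooccurrence(image, near, levels=2)
```
        </preformat>
        <p>The explicit loops keep the correspondence with the set definition visible; for full-size 256×256 images a vectorized or library implementation would be preferable.</p>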
        <p>
          To extract the subset of informative features, two methods were examined. The first belongs to discriminant analysis theory. According to this method, we choose the set of features that provides the largest value of the criterion J(Q) [6].
        </p>
        <list list-type="bullet">
          <list-item><p>angular second moment:</p></list-item>
        </list>
        <p>2
 M X MY</p>
        <p>,</p>
        <p>R is the number of neighboring pixel pairs; M<sub>X</sub>, M<sub>Y</sub> are the row and column means; D<sub>X</sub>, D<sub>Y</sub> are the row and column variances.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature selection methods</title>
        <p>where P(i, j) is an element of the matrix averaged over the four directions (1, 0), (0, 1), (1, ±1) and (2, 0), (0, 2), (2, ±2).</p>
        <p>Let Ω be a set of objects for recognition. In this work, a feature vector x<sub>k</sub> ∈ ℝ<sup>K</sup>, where K is the number of features, is considered an element of this set. The set is divided into two classes Δ = {Ω<sub>0</sub>, Ω<sub>1</sub>} with the following properties:</p>
        <p>1) Ω<sub>0</sub> ∪ Ω<sub>1</sub> = Ω;
2) Ω<sub>0</sub> ∩ Ω<sub>1</sub> = ∅.</p>
        <p>Let Φ(x<sub>k</sub>): Ω → Δ be the ideal operator that maps an object to its class. Since the ideal operator is unknown, another operator Φ̃(x<sub>k</sub>): Ω → Δ can be constructed. Φ̃(x<sub>k</sub>) tries to predict the class of an input object from the information contained in a training set U ⊂ Ω, for which the class of each object is known.</p>
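        <p>Since the features can be measured in different units, they should first be standardized to zero mean and unit variance. For this purpose, the expected value</p>
        <disp-formula><tex-math>M(i) = \frac{1}{|U|} \sum_{k=1}^{|U|} x_k(i), \quad i = 1, \ldots, K, \quad M \in \mathbb{R}^{K},</tex-math></disp-formula>
        <p>and the variance</p>
        <disp-formula><tex-math>R(i, i) = \frac{1}{|U|} \sum_{k=1}^{|U|} \left( x_k(i) - M(i) \right)^2, \quad i = 1, \ldots, K, \quad R \in \mathbb{R}^{K \times K},</tex-math></disp-formula>
        <p>should be estimated for each feature.</p>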
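        <p>The estimation of the per-feature mean and variance, and the standardization they are used for, can be sketched in a few lines of NumPy. The function name and the toy training matrix are assumptions for illustration; rows are feature vectors and columns are features, and the variance uses the 1/|U| normalization.</p>
        <preformat>
```python
import numpy as np


def standardize(X):
    """Scale each feature (column) to zero mean and unit variance."""
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)     # M(i), one value per feature
    var = X.var(axis=0)       # R(i, i), normalized by the number of rows
    return (X - mean) / np.sqrt(var), mean, var


# Toy training set: three objects, two features in very different units.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
Z, mean, var = standardize(X_train)
```
        </preformat>
        <p>In a train/test setting, the mean and variance estimated on the training set would typically be reused to scale the test vectors.</p>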
        <p>
          Therefore, the feature vectors can be standardized by applying formula (5):
        </p>
        <disp-formula><label>(5)</label><tex-math>\tilde{x}_k(i) = \frac{x_k(i) - M(i)}{\sqrt{R(i, i)}}, \quad k = 1, \ldots, |U|, \quad i = 1, \ldots, K.</tex-math></disp-formula>
        <p>Image Processing, Geoinformation Technology and Information Security / E. Goncharova, A. Gaidel</p>
        <p>tr R,</p>
        <p>where Q is the current set of features; R is the mixture covariance matrix; R<sub>j</sub> is the within-class covariance matrix; P(Ω<sub>j</sub>) is the prior probability of class Ω<sub>j</sub>, here P(Ω<sub>j</sub>) = 1/2.</p>
        <p>Thus, the more the scatter between the two classes exceeds the average within-class scatter, the better the selected set of features.</p>
        <p>To form the set of the most informative descriptors, a greedy strategy of adding features was applied. Let the initial feature set be empty, Q<sub>0</sub> = ∅. At step i we consider all sets of the form Q<sub>i,j</sub> = Q<sub>i−1</sub> ∪ {j} and calculate the criterion J<sub>i,j</sub> = J(Q<sub>i,j</sub>).</p>
        <sec id="sec-3-2-1">
          <title>Then choose the set that maximizes the criterion:</title>
          <p>Qi  Qi1   arg max Ji, j   Qi1   arg max J Qi1  j .</p>
          <p> j1;KZ\Qi1   j1;KZ\Qi1 
These steps are iterated until a required number of features are obtained.</p>
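          <p>The greedy adding strategy can be sketched as follows. The exact form of the criterion J(Q) is not recoverable from this text, so the sketch assumes a common discriminant-analysis ratio: the trace of the mixture covariance divided by the prior-weighted sum of the within-class covariance traces. That ratio, the function names, and the toy data are all assumptions for illustration and should not be read as the authors' formula.</p>
          <preformat>
```python
import numpy as np


def criterion(X, y, subset):
    """Assumed criterion: total scatter over prior-weighted within-class scatter."""
    Xs = X[:, subset]
    total = Xs.var(axis=0).sum()                 # trace of the mixture covariance
    within = 0.0
    for label in np.unique(y):
        Xc = Xs[y == label]
        prior = len(Xc) / len(Xs)                # class prior estimated from the data
        within += prior * Xc.var(axis=0).sum()   # prior times within-class trace
    return total / within


def forward_selection(X, y, n_features):
    """Greedily add the feature index that maximizes the criterion."""
    selected = []
    remaining = list(range(X.shape[1]))
    while n_features > len(selected):
        best = max(remaining, key=lambda j: criterion(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: column 0 separates the classes strongly, column 2 weakly,
# column 1 not at all.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([
    [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
    [10.0, 1.0, 2.0], [11.0, 0.0, 3.0], [10.0, 1.0, 2.0], [11.0, 0.0, 3.0],
])
chosen = forward_selection(X, y, n_features=2)
```
          </preformat>
          <p>The trace of a covariance matrix equals the sum of the per-feature variances, which is why the sketch never forms the full matrices. Any other criterion with the same signature could be substituted without changing the selection loop.</p>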
          <p>The second approach is based on regression analysis, which estimates the relationship between a dependent variable and one or more independent variables.</p>
          <p>
            We assume that the number of the class to which x<sub>k</sub> belongs is the dependent variable y(x<sub>k</sub>). This implies that the feature vector x<sub>k</sub> influences y(x<sub>k</sub>), and the regression model (6) can be built as follows:
          </p>
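          <disp-formula><label>(6)</label><tex-math>y = X\theta + \varepsilon,</tex-math></disp-formula>
          <p>where <inline-formula><tex-math>y = (y_1 \; y_2 \; \ldots \; y_n)^T</tex-math></inline-formula> is the output vector; X is the feature matrix; <inline-formula><tex-math>\theta = (\theta_0 \; \theta_1 \; \ldots \; \theta_Q)^T</tex-math></inline-formula> is the vector of regression weights; <inline-formula><tex-math>\varepsilon = (\varepsilon_1 \; \varepsilon_2 \; \ldots \; \varepsilon_n)^T</tex-math></inline-formula> is the error vector.</p>
          <p>
            The unknown coefficients of the vector θ are determined from the training set data via the ordinary least squares method:
          </p>
          <disp-formula><tex-math>(y - X\theta)^T (y - X\theta) \to \min_{\theta}.</tex-math></disp-formula>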
          <p>
            The importance of each feature is directly related to its weight in the regression equation (6). Based on this assumption, a greedy strategy of removing features can be applied to form the set of informative descriptors.
          </p>
          <p>Let the initial feature set Q<sub>0</sub> = Q contain all the analyzed features. At each step i the linear regression model y<sub>i</sub> = X<sub>i</sub>θ<sub>i</sub> is built in the corresponding feature space. Then the feature with the minimal coefficient is removed from the set according to the following rule:</p>
          <disp-formula><tex-math>Q_{i+1} = Q_i \setminus \left\{ \arg\min_{j \in [1; K] \cap \mathbb{Z} \cap Q_i} \left| \theta_i(j) \right| \right\}.</tex-math></disp-formula>
          <p>As in the previous case, these steps are iterated until the required number of features is obtained.</p>
          <p>To estimate the classification power of the obtained feature subsets, nearest-neighbor classification is carried out. The Euclidean distance in feature space is defined as follows:</p>
          <disp-formula><tex-math>\rho(x, y) = \sqrt{\sum_{i=1}^{K} \left( x(i) - y(i) \right)^2}.</tex-math></disp-formula>
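          <p>The regression-based removal strategy described earlier in this section can be sketched with ordinary least squares from NumPy. The function name and toy data are assumptions for illustration. The sketch also assumes that the features are already standardized, that the weight magnitude (absolute value) is what is compared, and that the intercept weight is never a candidate for removal.</p>
          <preformat>
```python
import numpy as np


def backward_elimination(X, y, n_features):
    """Repeatedly fit least squares and drop the feature with the smallest |weight|."""
    selected = list(range(X.shape[1]))
    while len(selected) > n_features:
        # Design matrix: a column of ones for the intercept, then the kept features.
        design = np.column_stack([np.ones(len(X)), X[:, selected]])
        theta, *_ = np.linalg.lstsq(design, y, rcond=None)
        weakest = int(np.argmin(np.abs(theta[1:])))   # ignore the intercept weight
        del selected[weakest]
    return selected


# Standardized toy features; only column 0 carries the class information.
X = np.array([
    [-1.0, -1.0, -1.0], [-1.0, -1.0, 1.0], [-1.0, 1.0, -1.0], [-1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, 1.0, 1.0],
])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
kept = backward_elimination(X, y, n_features=1)
```
          </preformat>
          <p>Comparing weight magnitudes is only meaningful when the features share a common scale, which is why standardization precedes this step.</p>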
          <p>The classifier assigns to the vector x the class of its closest point in the training set. In terms of computational complexity, this method is rather simple in comparison with others. However, since this classifier is memory-based, its computational requirements may become excessive when the number of objects in the training set is large. Asymptotically, the nearest-neighbor misclassification rate is no more than twice the Bayes error rate [7].</p>
          <p>The nearest-neighbor error rate is assessed as follows:</p>
          <disp-formula><tex-math>\frac{\left| \left\{ x_k \in U \mid \tilde{\Phi}(x_k) \neq \Phi(x_k) \right\} \right|}{|U|}, \quad k = 1, \ldots, |U|,</tex-math></disp-formula>
          <p>where U is the test set.</p>
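          <p>Nearest-neighbor classification with the Euclidean distance, together with the error rate as the fraction of misclassified test objects, can be sketched as follows. The function names and the toy training and test points are assumptions made for illustration.</p>
          <preformat>
```python
import numpy as np


def nearest_neighbor_predict(X_train, y_train, X_test):
    """Assign to each test vector the class of its closest training vector."""
    diffs = X_test[:, None, :] - X_train[None, :, :]
    distances = np.sqrt(np.sum(diffs ** 2, axis=2))   # Euclidean distances
    return y_train[np.argmin(distances, axis=1)]


def error_rate(y_true, y_pred):
    """Fraction of objects whose predicted class differs from the true class."""
    return float(np.mean(y_true != y_pred))


X_train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.5, 0.2], [5.5, 4.0], [2.0, 2.0]])
y_test = np.array([0, 1, 1])

predicted = nearest_neighbor_predict(X_train, y_train, X_test)
rate = error_rate(y_test, predicted)
```
          </preformat>
          <p>The full distance matrix is held in memory here, which illustrates why the cost of this memory-based classifier grows with the size of the training set.</p>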
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>To assess the performance of the proposed approaches, two image sets from the UC Merced Land Use remote sensing dataset were used. This dataset includes aerial optical images belonging to different classes (agricultural field, forest, beach, etc.), 100 images per class. Each image measures 256×256 pixels (RGB color space). Two classes of images (agricultural fields and forest) are examined in this work. Figure 1 shows sample images from the two classes.</p>
      <p>[Figure 1, panels a) and b): sample images of the two classes.]</p>
      <p>To carry out the experiments we used 5-fold cross-validation. The results obtained with the discriminant and regression
analysis methods are shown in tables 1 and 2 respectively.</p>
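      <p>A 5-fold cross-validation of a nearest-neighbor classifier can be sketched as follows. How the folds were formed in the experiments is not stated, so the random shuffling, the seed, the function names, and the toy two-cluster data are all assumptions for illustration.</p>
      <preformat>
```python
import numpy as np


def nn_predict(X_train, y_train, X_test):
    """Nearest-neighbor prediction with squared Euclidean distances."""
    diffs = X_test[:, None, :] - X_train[None, :, :]
    return y_train[np.argmin(np.sum(diffs ** 2, axis=2), axis=1)]


def cross_validated_error(X, y, n_folds=5, seed=0):
    """Average misclassification rate over n_folds held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        predicted = nn_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean(predicted != y[test_idx]))
    return float(np.mean(errors))


# Two well-separated toy clusters of ten points each.
offsets = 0.1 * np.arange(10)
X = np.concatenate([
    np.column_stack([offsets, np.zeros(10)]),
    np.column_stack([10.0 + offsets, np.zeros(10)]),
])
y = np.array([0] * 10 + [1] * 10)
cv_error = cross_validated_error(X, y)
```
      </preformat>
      <p>If feature selection is part of the pipeline, running it inside each training fold rather than on the full data avoids an optimistic bias in the estimated error.</p>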
      <p>Having analyzed the results, we can conclude that the discriminant analysis method performed best on this classification task. The lowest classification error rate of 0.05 was achieved in a three-dimensional feature space consisting of Ī<sub>R</sub>, Ī, and s. The studied textural features have no significant effect on the quality of this classification. Including more textural characteristics that capture the correlation of features at various distances may improve the performance of this feature group.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Thus, a subset of informative features was extracted for the task of remote sensing image classification. On the images from the UC Merced Land Use dataset, the histogram features produced the best outcome. Note that the images were represented in the RGB color space; hence the mean intensities of the three color components had a considerable impact on the discriminatory power.</p>
      <p>The feature vector selected with the discriminant analysis method produced the best classification performance (using the nearest-neighbor classifier) on the images from the UC Merced Land Use dataset. The minimal classification error rate was 0.05, so the proportion of correctly classified images was 95%. This rate was achieved in the reduced three-dimensional feature space consisting of the descriptors Ī<sub>R</sub>, Ī, and s.</p>
      <p>Thus, applying the feature selection methods improves image classification performance. In this study, a combination of three of the 18 initial descriptors proved informative, while the other features increased the misclassification rate.</p>
      <p>The method based on the discriminant analysis criterion provided good results and can be applied to the task of feature selection. In future work we are interested in considering more features that can characterize an image, as well as multiclass classification, which would yield more general results.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>The work was partially supported by the Russian Foundation for Basic Research (grant 16-41-630761 р_а), the Russian Federation Ministry of Education and Science as a part of Samara University's competitiveness enhancement program in 2013–2020, and the RAS based research program “Bioinformatics, modern information technologies and mathematical methods in medicine”.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Guofeng</given-names>
            <surname>Sheng</surname>
          </string-name>
          , Wen Yang, Tao Xu,
          <string-name>
            <given-names>Hong</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>High-resolution satellite scene classification using a sparse coding based multiple feature combination</article-title>
          .
          <source>International Journal of Remote Sensing</source>
          <year>2012</year>
          ;
          <volume>33</volume>
          (
          <issue>8</issue>
          ):
          <fpage>2395</fpage>
          -
          <lpage>2412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Glumov</surname>
            <given-names>NI</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myasnikov</surname>
            <given-names>EV</given-names>
          </string-name>
          .
          <article-title>Method of the informative features selection on the digital images</article-title>
          .
          <source>Computer Optics</source>
          <year>2007</year>
          ;
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>73</fpage>
          -
          <lpage>76</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gaidel</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zelter</surname>
            <given-names>PM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapishnikov</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khramov</surname>
            <given-names>AG</given-names>
          </string-name>
          .
          <article-title>Computed tomography texture analysis capabilities in diagnosing a chronic obstructive pulmonary disease</article-title>
          .
          <source>Computer Optics</source>
          <year>2014</year>
          ;
          <volume>38</volume>
          (
          <issue>4</issue>
          ):
          <fpage>843</fpage>
          -
          <lpage>850</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gaidel</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pervushkin</surname>
            <given-names>SS</given-names>
          </string-name>
          .
          <article-title>Research of the textural features for the bony tissue diseases diagnostics using the roentgenograms</article-title>
          .
          <source>Computer Optics</source>
          <year>2013</year>
          ;
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>113</fpage>
          -
          <lpage>119</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Haralick</surname>
            <given-names>RM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shanmugam</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinstein</surname>
            <given-names>I.</given-names>
          </string-name>
          <article-title>Textural features for image classification</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          <year>1973</year>
          ;
          <volume>3</volume>
          :
          <fpage>610</fpage>
          -
          <lpage>621</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Goncharova</surname>
            <given-names>EF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaidel</surname>
            <given-names>AV</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khramov</surname>
            <given-names>AG</given-names>
          </string-name>
          .
          <article-title>Statistical study of the factors affecting the cardiovascular disease</article-title>
          .
          <source>Information Technology and Nanotechnology</source>
          <year>2016</year>
          ;
          <fpage>1020</fpage>
          -
          <lpage>1025</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Fukunaga</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Introduction to statistical pattern recognition</article-title>
          . San Diego: Academic Press,
          <year>1990</year>
          ; 592 p.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>