Graph Neural Sparsification for Hyperspectral Image Classification with Local and Global Consistency

Haojie Hu¹, Minli Yao¹, Fang He¹ and Fenggan Zhang¹
¹ Xi'an Research Institute of Hi-Tech, Xi'an, 710025, China.

Abstract
Recently, graph neural networks (GNNs) have drawn increasing attention for hyperspectral image (HSI) classification, and many related works have been proposed. However, previous works focus on either superpixel-level local spatial information or pixel-level global spectral information. In this paper, a novel Graph Neural Sparsification Network (GNSN) is proposed for HSI classification. In our method, a fully connected graph is adaptively constructed to make full use of local spatial information and global spectral information. Besides, we apply a neural sparsification technique to remove potentially task-irrelevant edges that could mislead message propagation. The resulting network (GNSN) is end-to-end trainable. We conduct experiments on three popular benchmarks, including Indian Pines, Pavia University, and Kennedy Space Center, and achieve state-of-the-art performance on all three datasets.

Keywords
Graph neural network (GNN), hyperspectral image (HSI), superpixel, sparsification

1. Introduction

HSI classification is a fundamental and challenging problem in HSI processing [1], which aims to assign a specific label to each pixel in the image. It has been widely applied in many scenarios, such as military target detection, vegetation monitoring, and disaster prevention and control.

Convolutional neural networks (CNNs) have proven their effectiveness in image processing by automatically extracting discriminative features. Many CNN-based variants have been designed for HSI classification, from the 1D-CNN to the 3D-CNN and from the single CNN to the hybrid CNN. Although existing CNN-based methods have achieved good performance to some extent, they need many training parameters and tend to overfit due to the scarcity of training samples. Meanwhile, CNNs tend to blur classification boundaries, since they use a fixed-shape kernel around the central pixel to extract features. Therefore, accurate classification of HSI remains challenging.

To build more flexible models, many GNN-based methods have been proposed for HSI classification. Generally, they fall into two categories according to the type of adjacency graph. One is the pixel-level adjacency graph [2, 3]. This type of graph can directly propagate information between nearby and distant regions, but it takes each pixel as a node of the graph, which leads to a massive amount of computation and limits its applicability. The superpixel-level adjacency graph is another commonly used graph for HSI classification [4, 5], which is usually built with a superpixel segmentation algorithm. However, this graph only focuses on the local structure of the data. To capture long-range contextual information, multiple convolution layers have to be stacked, which is highly inefficient and raises potential concerns of over-smoothing.

Aiming to address the above issues of the superpixel-level adjacency graph, we propose a novel GNN-based HSI classification algorithm (GNSN), in which a fully connected graph is adaptively constructed to make full use of local spatial and global spectral information. Furthermore, in order to avoid potentially misleading message propagation, we apply the neural sparsification technique to remove potentially task-irrelevant edges. The main contributions of this paper are summarized as follows:

• We incorporate both local and global information through a learnable graph, which automatically decides whether the model pays more attention to nearby nodes or to nodes far away;
• GNSN applies a sparsification strategy that favors the subsequent classification task by removing potentially task-irrelevant edges;
• Experiments on three benchmark HSI datasets show that GNSN outperforms other compared HSI classification methods.

CDCEO 2021: 1st Workshop on Complex Data Challenges in Earth Observation, November 1, 2021, Virtual Event, QLD, Australia.
haojiehu705@gmail.com (H. Hu); yaominli66@163.com (M. Yao); fanghe1107@gmail.com (F. He); zfg417@163.com (F. Zhang)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Figure 1: Schematic depiction of the proposed Graph Neural Sparsification Network. GNSN first assigns nearby pixels with similar features to the same vertex via graph projection. Then, a learned adjacency matrix that captures both local and global structure is obtained through adaptive graph construction. Under the guidance of sparsification, the graph drops most task-irrelevant edges and is subsequently fed into a two-layer GCN to obtain the high-level feature map. After the graph reprojection, the corresponding coarse segmentation result is obtained. Finally, the refined feature map is fused with the original features to obtain the final classification result.
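As a concrete illustration of the projection and reprojection steps sketched in Figure 1, the following is a minimal NumPy sketch, not the authors' implementation: it assumes a SLIC label map has already been computed (e.g., by `skimage.segmentation.slic`), and the function names `graph_projection` and `graph_reprojection` are our own.

```python
import numpy as np

def graph_projection(X, segments):
    """Project pixel features onto superpixel vertices.

    X        : (H, W, B) hyperspectral cube
    segments : (H, W) integer SLIC labels in [0, N)
    Returns V (N, B), the mean spectral signature per superpixel,
    and Q (H*W, N), the hard pixel-to-superpixel assignment matrix.
    """
    H, W, B = X.shape
    N = int(segments.max()) + 1
    flat = X.reshape(H * W, B)                        # Flatten(X)
    Q = np.zeros((H * W, N))
    Q[np.arange(H * W), segments.reshape(-1)] = 1.0   # Q_ij = 1 iff pixel i lies in superpixel j
    Q_hat = Q / Q.sum(axis=0, keepdims=True)          # column-normalize Q
    V = Q_hat.T @ flat                                # V = Q_hat^T Flatten(X)
    return V, Q

def graph_reprojection(V_tilde, Q, H, W):
    """Scatter vertex features back onto the original pixel grid."""
    return (Q @ V_tilde).reshape(H, W, -1)
```

Because every pixel belongs to exactly one superpixel, `Q` has one-hot rows, so the reprojection simply copies each vertex feature to all pixels of that superpixel.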
2. Approach

The task of hyperspectral classification is to predict the labels of all the pixels in the HSI with a classification model, given the supervised data set. As illustrated in Fig. 1, we present a Graph Neural Sparsification Network (GNSN) to adaptively capture both local and global contextual information. GNSN takes as input the original image data X ∈ R^{H×W×B} and the association matrix Q ∈ R^{HW×N} produced by a superpixel segmentation algorithm, where H, W, B, and N represent the height, the width, the number of bands, and the number of superpixels, respectively. By treating superpixels as the vertices of the graph, we transform the image into the graph representation V ∈ R^{N×B}. Applying the adaptive graph construction discussed below, we obtain the similarity matrix A ∈ R^{N×N}, which captures both local and global structure in the image data. Subsequently, the sparsification operation removes most task-irrelevant edges in the graph. Following the paradigm of GCN, a two-layer GCN is applied on the graph to further propagate information, resulting in a high-level feature map Ṽ ∈ R^{N×C}, where C is the dimension of the feature map. After the graph reprojection, the corresponding coarse segmentation result X̃ ∈ R^{H×W×C} is obtained in the original grid form. Finally, the refined feature map is concatenated with the original features and passed through a conventional 1×1 convolutional layer to obtain the final classification result. GNSN consists of four main blocks: Graph Projection and Reprojection, Graph Construction, Sparsification, and GCN. In the following, we describe each block in detail.

Graph Projection and Reprojection. To reduce the computational complexity while preserving the local structure of HSI, GNN-based models usually work on superpixel-based nodes rather than pixel-based nodes. Similarly, we first apply the simple linear iterative clustering (SLIC) method to partition the entire image into many spatially connected superpixels. By treating each superpixel as a graph node and taking the average spectral signature of each superpixel as the node feature, the graph projection assigns X to a set of vertices and obtains the feature matrix V of the vertices in the graph via matrix multiplication:

V = Q̂^T Flatten(X),  (1)

where Q̂ is Q normalized by column, i.e., Q̂_{i,j} = Q_{i,j} / Σ_m Q_{m,j}, Flatten(·) denotes flattening the HSI data along the spatial dimension, and the association matrix Q ∈ R^{HW×N} introduced by SLIC is defined as

Q_{i,j} = 1 if x_i ∈ S_j, and 0 otherwise,  (2)

where S_j is the j-th superpixel, which consists of several homogeneous pixels.

We perform graph reprojection to transform the resulting features back to the original coordinate space. The graph reprojection operation takes as inputs the transformed vertex features Ṽ and the assignment matrix Q, and produces the corresponding 2D feature map X̃. This operation is defined as

X̃ = Reshape(Q Ṽ),  (3)

where Reshape(·) denotes restoring the spatial dimension of the flattened data.

Graph Construction. The graph (which is the adjacency matrix in our formulation) is constructed according to the similarity between different nodes. Denoting A_{ij} as the (i, j)-th element of the adjacency matrix A, we have

A_{ij} = w^T [W^T v_i || W^T v_j] + b_{(v_i, v_j)},  (4)

where b_{(v_i, v_j)} = 1 if the i-th region and the j-th region are spatially related, and b_{(v_i, v_j)} = 0 otherwise. Compared with the adjacency graph described in [5, 6], where the receptive field is restricted to the neighbors, Eq. (4) provides both global and local information, and the learnable graph automatically decides whether the model pays more attention to nearby nodes or to nodes far away.

Sparsification Network. We focus on the k-neighbor graph for graph neural sparsification. The k-neighbor graph is obtained by repeatedly sampling k edges for each node in the original graph. To make the samples differentiable, we apply Gumbel-Softmax to generate differentiable discrete samples. First, the softmax function is employed to compute the probability of sampling each edge:

π_{v_i, v_j} = exp(A_{v_i, v_j}) / Σ_{w ∈ N_{v_i}} exp(A_{v_i, w}).  (5)

Then we generate differentiable samples using Gumbel-Softmax as follows:

x_{v_i, v_j} = exp((log(π_{v_i, v_j}) + ε_{v_j}) / τ) / Σ_{w ∈ N_{v_i}} exp((log(π_{v_i, w}) + ε_w) / τ),  (6)

where x_{v_i, v_j} is a scalar that indicates whether to sample the edge between v_i and v_j, ε_{v_j} = −log(−log(s)) with s randomly drawn from Uniform(0, 1), and τ is a hyperparameter that controls the interpolation between the discrete distribution and continuous categorical densities.

Graph Convolutional Network (GCN). We apply a two-layer GCN from [7] to further propagate information on the graph. Specifically, for a single graph convolution with parameter W ∈ R^{d×d'}, the operation is defined as

Ṽ = f(A V W),  (7)

where f can be any nonlinear activation function. GCN computes the new features of a vertex as a weighted average of the features of its neighbors on the graph, which encourages vertices in the same cluster to have similar features, making the subsequent classification much easier.

3. Experiments

In this section, we conduct experiments to validate the effectiveness of the proposed GNSN on three real-world benchmark datasets, namely Indian Pines (IP), Pavia University (PU), and Kennedy Space Center (KSC). The Indian Pines dataset was captured over the agricultural test site in Indian Pines; it contains 200 spectral bands with a size of 145 × 145 and 16 terrain classes for research and analysis. The Pavia University dataset is a continuous imaging of 115 spectral bands with a size of 610 × 340 and includes nine land-cover classes. The KSC dataset contains 176 bands with a size of 512 × 614; for classification purposes, 13 classes representing the various land-cover types that occur in this environment were defined for the site.

3.1. Experimental settings

To quantitatively evaluate the different models for hyperspectral image classification from various aspects, three metrics, including overall accuracy (OA), average accuracy (AA), and the kappa coefficient, are used to evaluate the performance of all compared methods. For all datasets, we randomly select 30 labeled pixels per class for training, and 15 labeled pixels if the corresponding class has fewer than 30 samples. Besides, the learning rate and the number of training epochs are set to 0.001 and 2000, respectively. All reported results are averaged over ten training sessions to obtain stable results, and the best results are highlighted in bold.

3.2. Classification results

In our experiments, we compare our method with several HSI classification methods: SVM, Multilayer Perceptron (MLP), 3DCNN [8], NLGCN [9], and GSAGE [10]. The quantitative results of the different methods on all three datasets are shown in Table 1. From the results, we can see that the GNN-based methods, including NLGCN, GSAGE, and the proposed GNSN, show better performance than traditional machine learning and deep learning models (i.e., SVM, MLP, 3DCNN), demonstrating that GNNs are more helpful for extracting discriminative information. Note that NLGCN and GSAGE neglect fusing local and global information, which results in lower performance than our GNSN.
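To make the sparsification and propagation steps concrete, the following is a minimal NumPy sketch of Eqs. (5)–(7) under our own naming (`gumbel_softmax_edges`, `gcn_layer` are not from the paper); in practice these operations would run in an autodiff framework so that the relaxed samples stay differentiable, and NumPy is used here only for illustration.

```python
import numpy as np

def gumbel_softmax_edges(A_row, tau=0.5, k=4, rng=None):
    """Relaxed sampling of k edges for one node from its learned score row.

    A_row : (n,) similarity scores A_{v_i, w} over the node's candidate neighbors
    tau   : temperature; samples approach one-hot vectors as tau -> 0
    k     : number of edges to sample for this node (k-neighbor graph)
    Returns a (k, n) matrix of soft edge-selection weights, one row per draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Eq. (5): softmax over the node's candidate edges
    pi = np.exp(A_row - A_row.max())
    pi = pi / pi.sum()
    samples = np.empty((k, len(A_row)))
    for t in range(k):
        # Eq. (6): perturb log-probabilities with Gumbel noise
        # eps = -log(-log(s)), s ~ Uniform(0, 1)
        s = rng.uniform(size=len(A_row))
        eps = -np.log(-np.log(s))
        logits = (np.log(pi) + eps) / tau
        logits -= logits.max()              # numerical stability
        x = np.exp(logits)
        samples[t] = x / x.sum()            # soft, differentiable "one-hot" draw
    return samples

def gcn_layer(A, V, W):
    """One graph convolution, Eq. (7), with ReLU as the nonlinearity f."""
    return np.maximum(A @ V @ W, 0.0)
```

Each row of the returned sample matrix sums to one and concentrates on a single edge as τ shrinks, which is exactly the interpolation between discrete samples and a continuous relaxation that Eq. (6) describes.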
4. Conclusion

In this article, we propose a novel Graph Neural Sparsification Network (GNSN) for hyperspectral image classification. Different from prior ways of constructing the adjacency graph, the proposed GNSN constructs a learnable graph that reveals both the local and global structure of HSI. Besides, a differentiable sparsification technique is applied that favors the subsequent classification task by removing potentially task-irrelevant edges. Experimental results reveal that our method significantly outperforms its counterparts in terms of OA, AA, and Kappa.

Table 1
Classification results (in percent) of different methods on three real datasets

Dataset  Metric  SVM    MLP    3DCNN  NLGCN  GSAGE  GNSN
IP       OA      69.72  70.07  79.19  83.83  85.36  94.08
         AA      81.01  80.78  87.88  91.17  91.13  96.20
         Kappa   66.00  66.31  76.55  81.70  83.35  93.24
PU       OA      77.54  80.59  83.35  87.11  89.41  97.72
         AA      85.16  86.40  88.45  91.95  92.62  97.33
         Kappa   71.49  75.25  78.59  83.35  86.21  96.97
KSC      OA      86.70  87.72  91.47  93.76  96.78  99.61
         AA      81.92  83.81  88.11  90.78  95.00  99.50
         Kappa   85.16  86.31  90.48  93.02  96.40  99.56

References

[1] D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, X. X. Zhu, Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 3791–3808. doi:10.1109/TGRS.2019.2957251.
[2] A. Qin, Z. Shang, J. Tian, Y. Wang, T. Zhang, Y. Y. Tang, Spectral spatial graph convolutional networks for semisupervised hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters 16 (2019) 241–245. doi:10.1109/LGRS.2018.2869563.
[3] D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, B. Zhang, More diverse means better: Multimodal deep learning meets remote-sensing imagery classification, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4340–4354. doi:10.1109/TGRS.2020.3016820.
[4] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, J. Chanussot, Graph convolutional networks for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 5966–5978. doi:10.1109/TGRS.2020.3015157.
[5] Y. Ding, X. Zhao, Z. Zhang, W. Cai, N. Yang, Graph sample and aggregate-attention network for hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters (2021) 1–5. doi:10.1109/LGRS.2021.3062944.
[6] Y. Ding, X. Zhao, Z. Zhang, W. Cai, N. Yang, Multiscale graph sample and aggregate network with context-aware learning for hyperspectral image classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 4561–4572. doi:10.1109/JSTARS.2021.3074469.
[7] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[8] Y. Li, H. Zhang, Q. Shen, Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network, Remote Sensing 9 (2017) 67.
[9] L. Mou, X. Lu, X. Li, X. X. Zhu, Nonlocal graph convolutional networks for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 8246–8257. doi:10.1109/TGRS.2020.2973363.
[10] P. Yang, L. Tong, B. Qian, Z. Gao, J. Yu, C. Xiao, Hyperspectral image classification with spectral and spatial graph using inductive representation learning network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 791–800. doi:10.1109/JSTARS.2020.3042959.