Graph Neural Sparsification for Hyperspectral Image Classification with Local and Global Consistency

Haojie Hu¹, Minli Yao¹, Fang He¹ and Fenggan Zhang¹
¹ Xi'an Research Institute of Hi-Tech, Xi'an, 710025, China.

Abstract
Recently, graph neural networks (GNNs) have drawn increasing attention for hyperspectral image (HSI) classification, and many related works have been proposed. However, previous works focus on either superpixel-level local spatial information or pixel-level global spectral information. In this paper, a novel Graph Neural Sparsification Network (GNSN) is proposed for HSI classification. In our method, a fully connected graph is adaptively constructed to make full use of local spatial information and global spectral information. Besides, we apply a neural sparsification technique to remove potentially task-irrelevant edges that could mislead message propagation. The resulting network (GNSN) is end-to-end trainable. We conduct experiments on three popular benchmarks, including Indian Pines, Pavia University, and Kennedy Space Center, and achieve state-of-the-art performance on all three datasets.

Keywords
Graph neural network (GNN), hyperspectral image (HSI), superpixel, sparsification

1. Introduction

HSI classification is a fundamental and challenging problem in HSI processing [1], which aims to assign a specific label to each pixel in the image. It has been widely applied in many scenarios, such as military target detection, vegetation monitoring, and disaster prevention and control.

Convolutional neural networks (CNNs) have proven their effectiveness in image processing by automatically extracting discriminative features. Many CNN-based variants have been designed for HSI classification, from the 1D-CNN to the 3D-CNN and from the single CNN to the hybrid CNN. Although existing CNN-based methods have achieved good performance to some extent, they need many training parameters and tend to overfit due to the scarcity of training samples. Meanwhile, CNNs tend to blur classification boundaries, since they use a fixed-shape kernel around the central pixel to extract features. Therefore, accurate classification of HSI remains challenging.

To build more flexible models, many GNN-based methods have been proposed for HSI classification. Generally, they fall into two categories according to the type of adjacency graph. One is the pixel-level adjacency graph [2, 3]. This type of graph can directly propagate information between nearby and distant regions, but it takes each pixel as a node of the graph, which leads to a massive amount of computation and limits its applicability. The superpixel-level adjacency graph is another commonly used graph for HSI classification [4, 5], which is usually built with a superpixel segmentation algorithm. However, this graph only focuses on the local structure of the data. To capture long-range contextual information, multiple convolution layers have to be stacked, which is highly inefficient and raises potential concerns of over-smoothing.

Aiming to address the above issues of the superpixel-level adjacency graph, we propose a novel GNN-based HSI classification algorithm (GNSN), in which a fully connected graph is adaptively constructed to make full use of local spatial and global spectral information. Furthermore, in order to avoid potentially misleading message propagation, we apply the neural sparsification technique to remove potentially task-irrelevant edges. The main contributions of this paper are summarized as follows:

• We incorporate both local and global information through a learnable graph, which automatically decides whether the model pays more attention to nearby nodes or to nodes far away;
• GNSN applies a sparsification strategy that favors the subsequent classification task by removing potentially task-irrelevant edges;
• Experiments on three benchmark HSI datasets show that GNSN outperforms other compared HSI classification methods.

CDCEO 2021: 1st Workshop on Complex Data Challenges in Earth Observation, November 1, 2021, Virtual Event, QLD, Australia.
haojiehu705@gmail.com (H. Hu); yaominli66@163.com (M. Yao); fanghe1107@gmail.com (F. He); zfg417@163.com (F. Zhang)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Figure 1: Schematic depiction of the proposed Graph Neural Sparsification Network. GNSN first assigns nearby pixels with similar features to the same vertex via graph projection. Then, a learned adjacency matrix that captures both local and global structure is obtained through adaptive graph construction. Under the guidance of sparsification, the graph drops most task-irrelevant edges and is subsequently fed into a two-layer GCN to obtain the high-level feature map. After the graph reprojection, the corresponding coarse segmentation result is obtained. Finally, the refined feature map is fused with the original features to obtain the final classification result.
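As a concrete illustration of the projection and reprojection steps sketched in Figure 1, the following is a minimal NumPy sketch, not the authors' implementation: it assumes a SLIC label map has already been computed (e.g., by `skimage.segmentation.slic`), and the function names `graph_projection` and `graph_reprojection` are our own.

```python
import numpy as np

def graph_projection(X, segments):
    """Project pixel features onto superpixel vertices.

    X        : (H, W, B) hyperspectral cube
    segments : (H, W) integer SLIC labels in [0, N)
    Returns V (N, B), the mean spectral signature per superpixel,
    and Q (H*W, N), the hard pixel-to-superpixel assignment matrix.
    """
    H, W, B = X.shape
    N = int(segments.max()) + 1
    flat = X.reshape(H * W, B)                        # Flatten(X)
    Q = np.zeros((H * W, N))
    Q[np.arange(H * W), segments.reshape(-1)] = 1.0   # Q_ij = 1 iff pixel i lies in superpixel j
    Q_hat = Q / Q.sum(axis=0, keepdims=True)          # column-normalize Q
    V = Q_hat.T @ flat                                # V = Q_hat^T Flatten(X)
    return V, Q

def graph_reprojection(V_tilde, Q, H, W):
    """Scatter vertex features back onto the original pixel grid."""
    return (Q @ V_tilde).reshape(H, W, -1)
```

Because every pixel belongs to exactly one superpixel, `Q` has one-hot rows, so the reprojection simply copies each vertex feature to all pixels of that superpixel.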
2. Approach

The task of hyperspectral classification is to predict the labels of all the pixels in the HSI with a classification model, given the supervised data set. As illustrated in Fig. 1, we present a Graph Neural Sparsification Network (GNSN) to adaptively capture both local and global contextual information. GNSN takes as input the original image data X ∈ R^{H×W×B} and the association matrix Q ∈ R^{HW×N} produced by a superpixel segmentation algorithm, where H, W, B, and N represent the height, the width, the number of bands, and the number of superpixels, respectively. By treating superpixels as the vertices of the graph, we transform the image into the graph representation V ∈ R^{N×B}. Applying the adaptive graph construction discussed below, we obtain the similarity matrix A ∈ R^{N×N}, which captures both local and global structure in the image data. Subsequently, the sparsification operation removes most task-irrelevant edges in the graph. Following the paradigm of GCN, a two-layer GCN is applied on the graph to further propagate information, resulting in a high-level feature map Ṽ ∈ R^{N×C}, where C is the dimension of the feature map. After the graph reprojection, the corresponding coarse segmentation result X̃ ∈ R^{H×W×C} is obtained in the original grid form. Finally, the refined feature map is concatenated with the original features and passed through a conventional 1×1 convolutional layer to obtain the final classification result. GNSN consists of four main blocks: Graph Projection and Reprojection, Graph Construction, Sparsification, and GCN. In the following, we describe each block in detail.

Graph Projection and Reprojection. To reduce the computational complexity while preserving the local structure of HSI, GNN-based models usually work on superpixel-based nodes rather than pixel-based nodes. Similarly, we first apply the simple linear iterative clustering (SLIC) method to partition the entire image into many spatially connected superpixels. By treating each superpixel as a graph node and taking the average spectral signature of each superpixel as the node feature, the graph projection assigns X to a set of vertices and obtains the feature matrix V of the vertices in the graph via matrix multiplication:

V = Q̂^T Flatten(X),  (1)

where Q̂ is Q normalized by column, i.e., Q̂_{i,j} = Q_{i,j} / Σ_m Q_{m,j}, Flatten(·) denotes flattening the HSI data along the spatial dimension, and the association matrix Q ∈ R^{HW×N} introduced by SLIC is defined as

Q_{i,j} = 1 if x_i ∈ S_j, and 0 otherwise,  (2)

where S_j is the j-th superpixel, which consists of several homogeneous pixels.

We perform graph reprojection to transform the resulting features back to the original coordinate space. The graph reprojection operation takes as inputs the transformed vertex features Ṽ and the assignment matrix Q, and produces the corresponding 2D feature map X̃. This operation is defined as

X̃ = Reshape(Q Ṽ),  (3)

where Reshape(·) denotes restoring the spatial dimension of the flattened data.

Graph Construction. The graph (which is the adjacency matrix in our formulation) is constructed according to the similarity between different nodes. Denoting A_{ij} as the (i, j)-th element of the adjacency matrix A, we have

A_{ij} = w^T [W^T v_i || W^T v_j] + b_{(v_i, v_j)},  (4)

where b_{(v_i, v_j)} = 1 if the i-th region and the j-th region are spatially related, and b_{(v_i, v_j)} = 0 otherwise. Compared with the adjacency graph described in [5, 6], where the receptive field is restricted to the neighbors, Eq. (4) provides both global and local information, and the learnable graph automatically decides whether the model pays more attention to nearby nodes or to nodes far away.

Sparsification Network. We focus on the k-neighbor graph for graph neural sparsification. The k-neighbor graph is obtained by repeatedly sampling k edges for each node in the original graph. To make the samples differentiable, we apply Gumbel-Softmax to generate differentiable discrete samples. First, the softmax function is employed to compute the probability of sampling each edge:

π_{v_i, v_j} = exp(A_{v_i, v_j}) / Σ_{w ∈ N_{v_i}} exp(A_{v_i, w}).  (5)

Then we generate differentiable samples using Gumbel-Softmax as follows:

x_{v_i, v_j} = exp((log(π_{v_i, v_j}) + ε_{v_j}) / τ) / Σ_{w ∈ N_{v_i}} exp((log(π_{v_i, w}) + ε_w) / τ),  (6)

where x_{v_i, v_j} is a scalar that indicates whether to sample the edge between v_i and v_j, ε_{v_j} = −log(−log(s)) with s randomly drawn from Uniform(0, 1), and τ is a hyperparameter that controls the interpolation between the discrete distribution and continuous categorical densities.

Graph Convolutional Network (GCN). We apply a two-layer GCN from [7] to further propagate information on the graph. Specifically, for a single graph convolution with parameter W ∈ R^{d×d'}, the operation is defined as

Ṽ = f(A V W),  (7)

where f can be any nonlinear activation function. GCN computes the new features of a vertex as a weighted average of the features of its neighbors on the graph, which encourages vertices in the same cluster to have similar features, making the subsequent classification much easier.

3. Experiments

In this section, we conduct experiments to validate the effectiveness of the proposed GNSN on three real-world benchmark datasets, namely Indian Pines (IP), Pavia University (PU), and Kennedy Space Center (KSC). The Indian Pines dataset was captured over the agricultural test site in Indian Pines; it contains 200 spectral bands with a size of 145 × 145 and 16 terrain classes for research and analysis. The Pavia University dataset is a continuous imaging of 115 spectral bands with a size of 610 × 340 and includes nine land-cover classes. The KSC dataset contains 176 bands with a size of 512 × 614; for classification purposes, 13 classes representing the various land-cover types that occur in this environment were defined for the site.

3.1. Experimental settings

To quantitatively evaluate the different models for hyperspectral image classification from various aspects, three metrics, including overall accuracy (OA), average accuracy (AA), and the kappa coefficient, are used to evaluate the performance of all compared methods. For all datasets, we randomly select 30 labeled pixels per class for training, and 15 labeled pixels if the corresponding class has fewer than 30 samples. Besides, the learning rate and the number of training epochs are set to 0.001 and 2000, respectively. All reported results are averaged over ten training sessions to obtain stable results, and the best results are highlighted in bold.

3.2. Classification results

In our experiments, we compare our method with several HSI classification methods: SVM, Multilayer Perceptron (MLP), 3DCNN [8], NLGCN [9], and GSAGE [10]. The quantitative results of the different methods on all three datasets are shown in Table 1. From the results, we can see that the GNN-based methods, including NLGCN, GSAGE, and the proposed GNSN, show better performance than traditional machine learning and deep learning models (i.e., SVM, MLP, 3DCNN), demonstrating that GNNs are more helpful for extracting discriminative information. Note that NLGCN and GSAGE neglect fusing local and global information, which results in lower performance than our GNSN.
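To make the sparsification and propagation steps concrete, the following is a minimal NumPy sketch of Eqs. (5)–(7) under our own naming (`gumbel_softmax_edges`, `gcn_layer` are not from the paper); in practice these operations would run in an autodiff framework so that the relaxed samples stay differentiable, and NumPy is used here only for illustration.

```python
import numpy as np

def gumbel_softmax_edges(A_row, tau=0.5, k=4, rng=None):
    """Relaxed sampling of k edges for one node from its learned score row.

    A_row : (n,) similarity scores A_{v_i, w} over the node's candidate neighbors
    tau   : temperature; samples approach one-hot vectors as tau -> 0
    k     : number of edges to sample for this node (k-neighbor graph)
    Returns a (k, n) matrix of soft edge-selection weights, one row per draw.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Eq. (5): softmax over the node's candidate edges
    pi = np.exp(A_row - A_row.max())
    pi = pi / pi.sum()
    samples = np.empty((k, len(A_row)))
    for t in range(k):
        # Eq. (6): perturb log-probabilities with Gumbel noise
        # eps = -log(-log(s)), s ~ Uniform(0, 1)
        s = rng.uniform(size=len(A_row))
        eps = -np.log(-np.log(s))
        logits = (np.log(pi) + eps) / tau
        logits -= logits.max()              # numerical stability
        x = np.exp(logits)
        samples[t] = x / x.sum()            # soft, differentiable "one-hot" draw
    return samples

def gcn_layer(A, V, W):
    """One graph convolution, Eq. (7), with ReLU as the nonlinearity f."""
    return np.maximum(A @ V @ W, 0.0)
```

Each row of the returned sample matrix sums to one and concentrates on a single edge as τ shrinks, which is exactly the interpolation between discrete samples and a continuous relaxation that Eq. (6) describes.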
4. Conclusion

In this article, we propose a novel Graph Neural Sparsification Network (GNSN) for hyperspectral image classification. Different from prior ways of constructing the adjacency graph, the proposed GNSN constructs a learnable graph that reveals both the local and global structure of HSI. Besides, a differentiable sparsification technique is applied that favors the subsequent classification task by removing potentially task-irrelevant edges. Experimental results reveal that our method significantly outperforms its counterparts in terms of OA, AA, and Kappa.

Table 1
Classification results (in percent) of different methods on three real datasets

Dataset  Metric  SVM    MLP    3DCNN  NLGCN  GSAGE  GNSN
IP       OA      69.72  70.07  79.19  83.83  85.36  94.08
         AA      81.01  80.78  87.88  91.17  91.13  96.20
         Kappa   66.00  66.31  76.55  81.70  83.35  93.24
PU       OA      77.54  80.59  83.35  87.11  89.41  97.72
         AA      85.16  86.40  88.45  91.95  92.62  97.33
         Kappa   71.49  75.25  78.59  83.35  86.21  96.97
KSC      OA      86.70  87.72  91.47  93.76  96.78  99.61
         AA      81.92  83.81  88.11  90.78  95.00  99.50
         Kappa   85.16  86.31  90.48  93.02  96.40  99.56

References

[1] D. Hong, X. Wu, P. Ghamisi, J. Chanussot, N. Yokoya, X. X. Zhu, Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 3791–3808. doi:10.1109/TGRS.2019.2957251.
[2] A. Qin, Z. Shang, J. Tian, Y. Wang, T. Zhang, Y. Y. Tang, Spectral spatial graph convolutional networks for semisupervised hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters 16 (2019) 241–245. doi:10.1109/LGRS.2018.2869563.
[3] D. Hong, L. Gao, N. Yokoya, J. Yao, J. Chanussot, Q. Du, B. Zhang, More diverse means better: Multimodal deep learning meets remote-sensing imagery classification, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4340–4354. doi:10.1109/TGRS.2020.3016820.
[4] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, J. Chanussot, Graph convolutional networks for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 5966–5978. doi:10.1109/TGRS.2020.3015157.
[5] Y. Ding, X. Zhao, Z. Zhang, W. Cai, N. Yang, Graph sample and aggregate-attention network for hyperspectral image classification, IEEE Geoscience and Remote Sensing Letters (2021) 1–5. doi:10.1109/LGRS.2021.3062944.
[6] Y. Ding, X. Zhao, Z. Zhang, W. Cai, N. Yang, Multiscale graph sample and aggregate network with context-aware learning for hyperspectral image classification, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 4561–4572. doi:10.1109/JSTARS.2021.3074469.
[7] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[8] Y. Li, H. Zhang, Q. Shen, Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network, Remote Sensing 9 (2017) 67.
[9] L. Mou, X. Lu, X. Li, X. X. Zhu, Nonlocal graph convolutional networks for hyperspectral image classification, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 8246–8257. doi:10.1109/TGRS.2020.2973363.
[10] P. Yang, L. Tong, B. Qian, Z. Gao, J. Yu, C. Xiao, Hyperspectral image classification with spectral and spatial graph using inductive representation learning network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 791–800. doi:10.1109/JSTARS.2020.3042959.