=Paper=
{{Paper
|id=Vol-3052/short11
|storemode=property
|title=Large-Scale Hyperspectral Image Clustering Using Contrastive Learning
|pdfUrl=https://ceur-ws.org/Vol-3052/short11.pdf
|volume=Vol-3052
|authors=Yaoming Cai,,Yan Liu,,Zijia Zhang,,Zhihua Cai,,Xiaobo Liu
|dblpUrl=https://dblp.org/rec/conf/cikm/CaiLZCL21
}}
==Large-Scale Hyperspectral Image Clustering Using Contrastive Learning==
Yaoming Cai (1), Yan Liu (1), Zijia Zhang (1), Zhihua Cai (1,4) and Xiaobo Liu (2,3)

(1) School of Computer Science, China University of Geosciences, 430074 Wuhan, China
(2) School of Automation, China University of Geosciences, 430074 Wuhan, China
(3) Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, China
(4) Corresponding author

caiyaom@cug.edu.cn (Y. Cai); yanliu@cug.edu.cn (Y. Liu); zhangzijia@cug.edu.cn (Z. Zhang); zhcai@cug.edu.cn (Z. Cai); xbliu@cug.edu.cn (X. Liu)

CDCEO 2021: 1st Workshop on Complex Data Challenges in Earth Observation, November 1, 2021, Virtual Event, QLD, Australia.

Abstract. Unsupervised hyperspectral image (HSI) classification is an important but challenging task in the hyperspectral processing community. Despite great success, previous HSI clustering approaches belong to offline clustering, which is often performed in a transductive scheme and thus fails to generalize to large-scale and unseen scenes. In this paper, we propose an online, deep clustering model for large-scale HSI clustering, termed spectral-spatial contrastive clustering (SSCC). Specifically, SSCC performs contrastive learning based on a series of semantics-preserving spectral-spatial augmentations to simultaneously maximize intra-class agreement and inter-class variation, implemented by an instance-level contrastive loss and a cluster-level contrastive loss, respectively. The SSCC model is trained in an end-to-end fashion with minibatches, allowing it to efficiently handle large-scale HSIs. We assess the performance of SSCC on a real HSI and show that SSCC significantly advances the state-of-the-art results, with an 8.41-point improvement in accuracy.

Keywords: Hyperspectral image, clustering, self-supervised learning, contrastive learning

1. Introduction

Hyperspectral image (HSI) consists of hundreds of narrow bands with rich spectral and spatial information, revealing the spectral property of the area or object of interest at a nanometer resolution [1]. HSI intelligent interpretation is one of the hot spots in the current remote sensing community. With the development of deep learning techniques, great progress has been made by training expressive networks with massive labelled data [2]. However, current human-annotated datasets require a large amount of manpower, which limits their availability and applicability [3].

Without label information, unsupervised HSI classification becomes a challenging task, thus leading to uncompetitive accuracy. Many efforts have been devoted to bridging the gap between supervised and unsupervised models [4]. More recently, subspace clustering (SC) [5, 6] and non-negative matrix factorization (NNMF) [7, 8] were frequently adopted for HSI clustering. Despite their promising performance, these approaches collectively suffer from two drawbacks. First, they are based on shallow feature representations and fail to capture high-level spectral-spatial information, which results in poorer robustness and generalization ability. Second, they focus on offline HSI clustering tasks, i.e., the clustering depends on the whole dataset, which limits their application in large-scale online learning scenarios.

To address the first drawback, some attempts, e.g., [3, 9, 10], have been made to use deep clustering networks to learn cluster-friendly deep representations. Such approaches can usually improve the clustering performance over the shallow ones by significant margins. However, the second drawback remains an open problem. Zhai et al. [11] showed that sparse representation helps alleviate this issue. Nonetheless, the procedure of constructing a suitable dictionary is often heuristic and suboptimal, and in particular cannot be implemented end-to-end. As a result, most current works compromise on this problem by validating on smaller scenes, lacking dependable performance evidence from large-scale HSI data.

Fortunately, self-supervised learning (SSL) has emerged as a powerful paradigm to circumvent human annotation [12]. The core idea is to learn to solve a label-free pretext task, such as colorization [13] and inpainting [14], enabling the model to capture semantic information. A downstream task will then benefit from the pre-trained model via fine-tuning and transfer learning. According to their objectives, pretext tasks can be broadly classified into three categories [15]: generative, contrastive, and adversarial. The tremendous success of recent contrastive learning models, including SimCLR [16], BYOL [17], MoCo [18], and Barlow Twins [19], has proven that contrastive learning tends to be the more promising branch. The pretext task in contrastive learning is to maximize the similarity between two positive views of every sample that are automatically generated by data augmentation.

In this paper, we propose a spectral-spatial contrastive clustering (SSCC) approach for large-scale HSI. The approach takes ResNet-18 as the backbone and consists of an instance-level contrastive head and a cluster-level contrastive head. Considering the inherent spectral-spatial properties of HSI, we introduce several semantics-preserving augmentation strategies, including ResizedCrop, Horizontal/Vertical Flip, and GroupBandShuffle. The proposed approach has some unique advantages: 1) SSCC performs clustering and deep feature learning simultaneously; 2) SSCC adopts minibatch training in an end-to-end fashion, so it is inherently suitable for large-scale HSI scenes; 3) SSCC is an online clustering model and can easily generalize to unseen data.

2. Methodology

Motivated by recent contrastive clustering developments in visual representation learning [20], which perform clustering jointly with contrastive learning, we introduce a novel SSCC approach for large-scale HSI clustering.

2.1. Overall

The core of SSCC is to maximize the similarity between representations of positive pairs in both the instance space and the cluster space, as shown in Fig. 1. SSCC first applies spectral-spatial augmentations; the augmented pairs are then forwarded through a weight-sharing backbone encoder $f(\cdot)$, resulting in deep representations $h_a$ and $h_b$. Behind the encoder, two projection heads, an instance projection head and a cluster projection head, are applied to maximize the similarity between prediction pairs. More specifically, we adopt ResNet-18 as the backbone and MLPs as the projection heads, in which the instance and cluster projection heads transform the data into 128 and $C$ dimensions respectively, where $C$ denotes the number of targets. We use the cluster projection head to perform clustering at the inference stage.

Figure 1: The overall framework of our proposed SSCC. Two augmentations are sampled from an augmentation pool $\mathcal{T}$ and applied to input patches. A shared backbone encoder $f(\cdot)$ and two projection heads, i.e., the instance-level projection head $g_I(\cdot)$ and the cluster-level projection head $g_C(\cdot)$, are trained to simultaneously maximize the agreement between instance representations $z_a$ and $z_b$ and cluster representations $y_a$ and $y_b$ via a contrastive loss.

2.2. Spectral-Spatial Augmentation

Formally, let $x \in \mathbb{R}^{s_1 \times s_2 \times b}$ be an HSI sample, where $s_1 \times s_2$ is the spatial size and $b$ denotes the number of spectral bands. We construct a positive pair by forwarding $x$ to two augmentations $\mathcal{T}_a$ and $\mathcal{T}_b$ sampled from an augmentation pool $\mathcal{T}$, i.e., $x_a = \mathcal{T}_a(x)$ and $x_b = \mathcal{T}_b(x)$, where $\mathcal{T}_a, \mathcal{T}_b \in \mathcal{T}$.

Based on the characteristics of HSI, the augmentation pool consists of spectral augmentations and spatial augmentations. The spectral augmentations include group band permutation and band random drop, and the spatial augmentations include random crop with resize and random horizontal/vertical flip. Precisely, group band permutation divides the $b$ bands into $k$ adjacent groups and randomly permutes the spectral bands within each group; band random drop masks each spectral band with probability $p$. The spatial augmentations are the same as the pipelines defined in torchvision (https://pytorch.org/).
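To make the spectral augmentations concrete, the following is a minimal PyTorch sketch of group band permutation and band random drop as described above. It is a reading aid, not the authors' released code: the function names, the group count $k=4$, the drop probability $p=0.1$, and the crop size are our own illustrative choices.

```python
import torch
from torchvision import transforms

def group_band_permutation(x: torch.Tensor, num_groups: int = 4) -> torch.Tensor:
    """Divide the b bands of a (b, H, W) patch into `num_groups` adjacent
    groups and randomly permute the band order within each group."""
    band_ids = torch.arange(x.shape[0])
    perm = torch.cat([g[torch.randperm(g.numel())]
                      for g in torch.chunk(band_ids, num_groups)])
    return x[perm]

def band_random_drop(x: torch.Tensor, p: float = 0.1) -> torch.Tensor:
    """Mask (zero out) each spectral band independently with probability p."""
    keep = (torch.rand(x.shape[0]) >= p).to(x.dtype).view(-1, 1, 1)
    return x * keep

# The spatial augmentations reuse the standard torchvision pipelines;
# size=17 matches the largest patch size explored in Fig. 2.
spatial_augment = transforms.Compose([
    transforms.RandomResizedCrop(size=17),   # random crop with resize
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
])
```

Sampling two transforms from this pool and applying them to the same patch yields the positive pair $(x_a, x_b)$.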
2.3. Projection Heads

SSCC contains two projection heads. We use $g_I(\cdot)$ and $g_C(\cdot)$ to denote the instance-level projection head and the cluster-level projection head, respectively. Each head takes $x_a$ and $x_b$ as inputs and produces a pair of predictions, denoted $z_a$ and $z_b$ for $g_I(\cdot)$ and $y_a$ and $y_b$ for $g_C(\cdot)$. The goal of $g_I(\cdot)$ is to encourage intra-class agreement, while $g_C(\cdot)$ aims to encourage inter-class variation. Specifically, we achieve these goals by defining the following contrastive losses. Let $\{x^{(1)}, \ldots, x^{(N)}, x^{(N+1)}, \ldots, x^{(2N)}\}$ be the $2N$ augmented samples of a minibatch with batch size $N$. The instance-level contrastive loss over sample $x_a^{(i)}$ is given by

$$\ell_a^{(i)} = -\log \frac{\exp\left(\mathrm{sim}\left(z_a^{(i)}, z_b^{(i)}\right)/\tau_I\right)}{\sum_{j=1}^{N}\left[\exp\left(\mathrm{sim}\left(z_a^{(i)}, z_a^{(j)}\right)/\tau_I\right) + \exp\left(\mathrm{sim}\left(z_a^{(i)}, z_b^{(j)}\right)/\tau_I\right)\right]} \qquad (1)$$

Here, $\tau_I$ denotes a temperature parameter and $\mathrm{sim}$ is a similarity function. We adopt the cosine similarity in this paper, i.e.,

$$\mathrm{sim}\left(z^{(i)}, z^{(j)}\right) = \frac{z^{(i)} \left(z^{(j)}\right)^{\top}}{\|z^{(i)}\| \, \|z^{(j)}\|} \qquad (2)$$

Similarly, we calculate the loss $\ell_b^{(i)}$ of $x_b^{(i)}$. The batched instance-level contrastive loss is defined as $\mathcal{L}_I = \frac{1}{2N}\sum_{i=1}^{N}\left(\ell_a^{(i)} + \ell_b^{(i)}\right)$.
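As a sketch of Eqs. (1)-(2), the batched instance-level loss can be written in the NT-Xent style of [16, 20]. We implement the denominator by masking each anchor's self-similarity, which the compact sum over $j$ leaves implicit, and the temperature value is a placeholder, not one reported by the paper.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                              tau_i: float = 0.5) -> torch.Tensor:
    """z_a, z_b: (N, 128) instance-head outputs for the two augmented views.
    Returns L_I = (1/2N) * sum_i (l_a^(i) + l_b^(i))."""
    n = z_a.shape[0]
    z = F.normalize(torch.cat([z_a, z_b]), dim=1)   # unit vectors -> Eq. (2)
    logits = z @ z.t() / tau_i                      # (2N, 2N) similarity matrix
    logits.fill_diagonal_(float('-inf'))            # exclude each anchor itself
    # the positive of sample i in view a is sample i in view b, and vice versa
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(logits, targets)         # mean over 2N = 1/(2N) sum
```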
In contrast, the cluster-level contrastive loss is defined in an inter-class space. Let $y^{(\cdot i)}$ denote the $i$-th column of the $N \times C$ cluster-assignment matrix of a view, i.e., the assignment probabilities of the whole minibatch for cluster $i$. The loss over cluster $i$ is

$$\tilde{\ell}_a^{(i)} = -\log \frac{\exp\left(\mathrm{sim}\left(y_a^{(\cdot i)}, y_b^{(\cdot i)}\right)/\tau_C\right)}{\sum_{j=1}^{C}\left[\exp\left(\mathrm{sim}\left(y_a^{(\cdot i)}, y_a^{(\cdot j)}\right)/\tau_C\right) + \exp\left(\mathrm{sim}\left(y_a^{(\cdot i)}, y_b^{(\cdot j)}\right)/\tau_C\right)\right]} \qquad (3)$$

where $\tau_C$ is another temperature parameter. Furthermore, the batched cluster-level contrastive loss is defined as

$$\mathcal{L}_C = \frac{1}{2C}\sum_{i=1}^{C}\left(\tilde{\ell}_a^{(i)} + \tilde{\ell}_b^{(i)}\right) - H(Y), \qquad (4)$$

where $H(Y)$ denotes the entropy of the cluster assignment probabilities across the whole augmented minibatch, which is used to avoid the trivial solution, and can be computed by

$$H(Y) = -\sum_{i=1}^{C}\left[P\left(y_a^{(\cdot i)}\right)\log P\left(y_a^{(\cdot i)}\right) + P\left(y_b^{(\cdot i)}\right)\log P\left(y_b^{(\cdot i)}\right)\right] \qquad (5)$$

Finally, the complete training loss function of SSCC is

$$\mathcal{L} = \mathcal{L}_I + \mathcal{L}_C. \qquad (6)$$
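Under the same reading, Eqs. (3)-(5) contrast the columns of the soft-assignment matrices, with the entropy term penalizing degenerate (single-cluster) assignments. The sketch below follows that interpretation, in the spirit of the contrastive clustering formulation of [20]; $\tau_C$ and the numerical epsilon are illustrative.

```python
import torch
import torch.nn.functional as F

def cluster_contrastive_loss(y_a: torch.Tensor, y_b: torch.Tensor,
                             tau_c: float = 1.0, eps: float = 1e-8) -> torch.Tensor:
    """y_a, y_b: (N, C) softmax outputs of the cluster head for the two views.
    Returns L_C = (1/2C) * sum_i (l~_a^(i) + l~_b^(i)) - H(Y), Eqs. (3)-(5)."""
    c = y_a.shape[1]
    cols = F.normalize(torch.cat([y_a.t(), y_b.t()]), dim=1)  # (2C, N) columns
    logits = cols @ cols.t() / tau_c
    logits.fill_diagonal_(float('-inf'))
    # column i of view a pairs with column i of view b, and vice versa
    targets = torch.cat([torch.arange(c) + c, torch.arange(c)])
    contrast = F.cross_entropy(logits, targets)
    # H(Y): entropy of the per-view cluster assignment probabilities, Eq. (5)
    p_a = y_a.sum(0) / y_a.sum()
    p_b = y_b.sum(0) / y_b.sum()
    entropy = -(p_a * (p_a + eps).log()).sum() - (p_b * (p_b + eps).log()).sum()
    return contrast - entropy
```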
2.4. Training and Predicting

The proposed SSCC can be trained in an end-to-end fashion. Specifically, we adopt the widely used Adam optimizer with a learning rate of 0.00002, a batch size of 128, and an $L_2$ regularizer of 0.00005. During the inference stage, we feedforward any given sample with the spectral-spatial augmentation process locked, and the output of the cluster-level projection head is regarded as the prediction for the sample.
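The pieces of Sections 2.1-2.4 might be wired together as below, reusing the two loss sketches above. This is a sketch under stated assumptions: the MLP hidden width and the replacement of the ResNet-18 stem so that it accepts $b$-band patches are our assumptions (the paper does not spell these out); the optimizer settings are the ones stated in Section 2.4.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SSCC(nn.Module):
    """Shared ResNet-18 encoder f(.) with instance head g_I and cluster head g_C."""
    def __init__(self, bands: int, num_clusters: int, feat_dim: int = 128):
        super().__init__()
        backbone = resnet18()
        # Assumption: adapt the stem to small b-band patches instead of RGB images.
        backbone.conv1 = nn.Conv2d(bands, 64, kernel_size=3, stride=1,
                                   padding=1, bias=False)
        backbone.fc = nn.Identity()                      # expose 512-d features
        self.f = backbone
        self.g_i = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                                 nn.Linear(512, feat_dim))        # 128-d instance space
        self.g_c = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                                 nn.Linear(512, num_clusters),
                                 nn.Softmax(dim=1))               # C-d cluster space

    def forward(self, x):
        h = self.f(x)
        return self.g_i(h), self.g_c(h)

# Indian Pines in its commonly used form: 200 bands, 16 classes.
model = SSCC(bands=200, num_clusters=16)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=5e-5)

def train_step(x_a, x_b):
    """One minibatch step on a pre-augmented positive pair, total loss of Eq. (6)."""
    z_a, y_a = model(x_a)
    z_b, y_b = model(x_b)
    loss = instance_contrastive_loss(z_a, z_b) + cluster_contrastive_loss(y_a, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def predict(x):
    """Inference: augmentations are locked; the cluster head gives the label."""
    return model(x)[1].argmax(dim=1)
```

Because only a minibatch is needed per step and prediction is a single forward pass, the model scales to large scenes and generalizes to unseen pixels, which is the online property claimed above.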
3. Experiments

3.1. Datasets and Setup

In this paper, we conduct experiments on the widely used Indian Pines dataset. We follow the training settings suggested in [20] and the baselines reported in [11]. Several clustering metrics are used to quantify the clustering performance, including producer's accuracy, overall accuracy (OA), Kappa coefficient (Kappa), and purity. It should be noted that more datasets and more extensive experiments will be provided in our future work.

3.2. Comparisons with the State of the Art

Table 1 reports the comparative results of different HSI clustering approaches. We can see that SSCC achieves state-of-the-art clustering results in terms of OA, Kappa, and purity. In particular, SSCC (OA=57.07) improves upon JSSC (OA=48.66) by a margin of 8.41 points. Furthermore, SSCC accurately distinguishes class Nos. 4, 8, 13, and 15, which is remarkably better than the other baselines. This demonstrates that SSL-based HSI clustering not only has obvious theoretical edges but also significant practical effectiveness.

Table 1: Comparative results on the Indian Pines dataset.

Class No. | FCM | FCM-S1 | SSC-S | L2-SSC | LRSC | SGCNR | FSCAG | SCC | JSSC | SSCC
1 | 39.13 | 15.22 | 0 | 0 | 52.17 | 26.52 | 9.57 | 8.26 | 13.48 | 0
2 | 25.63 | 26.58 | 26.75 | 42.58 | 27.31 | 36.13 | 23.85 | 27.87 | 33.40 | 29.48
3 | 24.58 | 26.10 | 36.87 | 18.80 | 1.81 | 22.80 | 34.36 | 41.47 | 30.17 | 57.71
4 | 6.33 | 13.92 | 10.13 | 0 | 13.50 | 13.00 | 24.47 | 9.62 | 16.12 | 100
5 | 44.31 | 46.29 | 62.73 | 62.32 | 57.97 | 44.68 | 40.08 | 46.00 | 57.93 | 65.84
6 | 26.99 | 29.78 | 72.05 | 80.41 | 32.47 | 44.08 | 39.45 | 52.41 | 55.97 | 95.34
7 | 0 | 0 | 0 | 89.29 | 0 | 14.29 | 0 | 12.86 | 0.71 | 0
8 | 86.61 | 98.83 | 100 | 54.39 | 28.03 | 66.78 | 92.59 | 78.62 | 72.51 | 100
9 | 20.00 | 29.00 | 0 | 25.00 | 25.00 | 12.00 | 9.00 | 5.00 | 15.00 | 0
10 | 23.15 | 25.60 | 35.08 | 46.81 | 17.18 | 34.75 | 29.12 | 27.65 | 48.66 | 48.05
11 | 28.19 | 26.66 | 37.68 | 37.19 | 69.86 | 34.60 | 30.22 | 39.41 | 58.71 | 34.22
12 | 23.61 | 24.72 | 30.69 | 31.53 | 19.90 | 15.82 | 22.56 | 12.65 | 19.26 | 75.04
13 | 99.02 | 98.73 | 99.02 | 95.61 | 23.90 | 74.54 | 87.71 | 87.12 | 61.27 | 100
14 | 34.16 | 32.98 | 49.33 | 45.69 | 41.03 | 44.36 | 37.45 | 38.62 | 70.31 | 69.33
15 | 17.62 | 18.81 | 15.54 | 15.28 | 19.17 | 15.60 | 17.67 | 19.07 | 25.49 | 100
16 | 59.14 | 58.06 | 97.85 | 94.62 | 0 | 59.14 | 65.81 | 69.03 | 37.63 | 0
OA(%) | 31.35 | 32.70 | 43.37 | 43.11 | 36.68 | 36.31 | 34.70 | 37.76 | 48.66 | 57.07
Kappa | 0.2561 | 0.2695 | 0.3757 | 0.3667 | 0.2713 | 0.2946 | 0.2887 | 0.3091 | 0.4254 | 0.5390
Purity | 0.5015 | 0.5082 | 0.5588 | 0.5670 | 0.4571 | 0.5105 | 0.5137 | 0.5222 | 0.5689 | 0.7475

3.3. Ablation Studies

In Fig. 2, we show the effect of patch size and training epoch.

Figure 2: Loss/ACC curves along with model training, where ACC is obtained under varying patch sizes (7×7 to 17×17).

From the curves, we can conclude that: 1) the clustering ACC of SSCC increases dramatically along with training; 2) SSCC achieves considerable clustering ACC at a completely random initial status (epoch=0), signifying the feature representation power of SSL; 3) a larger patch size is often more beneficial to the SSCC model, especially when using a small training epoch.

We further present the evolution of feature representations across the training process of SSCC, as shown in Fig. 3.

Figure 3: The evolution of feature representations across the training process (t-SNE of backbone features at epochs 0, 100, 200, 300, 400, and 500).

It can be seen that the features tend to become more compact within a class and more separable from the other classes. This proves that our SSCC can capture the intrinsic spectral-spatial information of HSI and obtain superior clustering performance and generalization ability.

4. Conclusion

This paper presented a novel SSCC model for the large-scale HSI online clustering task. The SSCC model follows a contrastive learning pipeline and consists of two projection heads associated with instance-level and cluster-level contrasting. Furthermore, we introduced a semantics-preserving augmentation pool based on the characteristics of HSI. SSCC features online clustering with minibatch, end-to-end training, making it easy to deal with large-scale HSI. Experimental results on a real HSI show that SSCC can achieve state-of-the-art clustering performance with significant margins over previous works. The success of SSCC offers a powerful alternative for unsupervised HSI classification. It should be noted that this is a preliminary work, and further analysis of the proposed method will be conducted in our future work.

5. Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61773355 and 61973285, the Fundamental Research Funds for National Universities, China University of Geosciences (Wuhan) under Grants CUGGC03 and 1910491T06, and the National Scholarship for Building High Level Universities, China Scholarship Council (No. 202006410044).
References

[1] Y. Cai, X. Liu, Z. Cai, BS-Nets: An end-to-end framework for band selection of hyperspectral image, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 1969-1984.
[2] P. Ghamisi, B. Rasti, N. Yokoya, Q. Wang, B. Hofle, L. Bruzzone, F. Bovolo, M. Chi, K. Anders, R. Gloaguen, P. M. Atkinson, J. A. Benediktsson, Multisource and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art, IEEE Geoscience and Remote Sensing Magazine 7 (2019) 6-39.
[3] Y. Cai, Z. Zhang, Z. Cai, X. Liu, X. Jiang, Q. Yan, Graph convolutional subspace clustering: A robust subspace clustering framework for hyperspectral image, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4191-4202.
[4] H. Zhai, H. Zhang, P. Li, L. Zhang, Hyperspectral image clustering: Current achievements and future lines, IEEE Geoscience and Remote Sensing Magazine (2021).
[5] H. Zhang, H. Zhai, L. Zhang, P. Li, Spectral-spatial sparse subspace clustering for hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 54 (2016) 3672-3684.
[6] H. Zhai, H. Zhang, L. Zhang, P. Li, Nonlocal means regularized sketched reweighted sparse and low-rank subspace clustering for large hyperspectral images, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4164-4178.
[7] L. Zhang, L. Zhang, B. Du, J. You, D. Tao, Hyperspectral image unsupervised classification by robust manifold matrix factorization, Information Sciences 485 (2019) 154-169.
[8] Y. Qin, B. Li, W. Ni, S. Quan, P. Wang, H. Bian, Affinity matrix learning via nonnegative matrix factorization for hyperspectral imagery clustering, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 402-415.
[9] Y. Cai, M. Zeng, Z. Cai, X. Liu, Z. Zhang, Graph regularized residual subspace clustering network for hyperspectral image clustering, Information Sciences 578 (2021) 85-101.
[10] J. Lei, X. Li, B. Peng, L. Fang, N. Ling, Q. Huang, Deep spatial-spectral subspace clustering for hyperspectral image, IEEE Transactions on Circuits and Systems for Video Technology 31 (2021) 2686-2697.
[11] H. Zhai, H. Zhang, L. Zhang, P. Li, Sparsity-based clustering for large hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing (2020) 1-15. doi:10.1109/TGRS.2020.3032427.
[12] L. Jing, Y. Tian, Self-supervised visual feature learning with deep neural networks: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (2020) 1-1. doi:10.1109/TPAMI.2020.2992393.
[13] R. Zhang, P. Isola, A. A. Efros, Colorful image colorization, in: European Conference on Computer Vision, Springer, 2016, pp. 649-666.
[14] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, A. A. Efros, Context encoders: Feature learning by inpainting, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536-2544.
[15] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, J. Tang, Self-supervised learning: Generative or contrastive, IEEE Transactions on Knowledge and Data Engineering (2021) 1-1. doi:10.1109/TKDE.2021.3090866.
[16] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: H. D. III, A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 1597-1607.
[17] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, M. Valko, Bootstrap Your Own Latent: A new approach to self-supervised learning, in: Neural Information Processing Systems, Montréal, Canada, 2020.
[18] K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[19] J. Zbontar, L. Jing, I. Misra, Y. LeCun, S. Deny, Barlow twins: Self-supervised learning via redundancy reduction, in: Proceedings of the 38th International Conference on Machine Learning, 2021.
[20] Y. Li, P. Hu, Z. Liu, D. Peng, J. T. Zhou, X. Peng, Contrastive clustering, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, AAAI Press, 2021, pp. 8547-8555.