Large-Scale Hyperspectral Image Clustering Using
Contrastive Learning
Yaoming Cai1 , Yan Liu1 , Zijia Zhang1 , Zhihua Cai1,4 and Xiaobo Liu2,3
1 School of Computer Science, China University of Geosciences, 430074 Wuhan, China
2 School of Automation, China University of Geosciences, 430074 Wuhan, China
3 Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, China
4 Corresponding author


Abstract
Unsupervised hyperspectral image (HSI) classification is an important but challenging task in the hyperspectral processing community. Despite great success, previous HSI clustering approaches belong to offline clustering, which is often performed in a transductive scheme and thus fails to generalize to large-scale and unseen scenes. In this paper, we propose an online and deep clustering model for large-scale HSI clustering, termed spectral-spatial contrastive clustering (SSCC). Specifically, SSCC performs contrastive learning based on a series of semantics-preserving spectral-spatial augmentations to simultaneously maximize intra-class agreement and inter-class variation, which are implemented by an instance-level contrastive loss and a cluster-level contrastive loss, respectively. The SSCC model is trained in an end-to-end fashion with minibatches, allowing it to efficiently handle large-scale HSIs. We assess the performance of SSCC on a real HSI and show that SSCC significantly advances the state-of-the-art results with an 8.41% improvement in accuracy.

                                             Keywords
                                             Hyperspectral image, clustering, self-supervised learning, contrastive learning



1. Introduction

Hyperspectral image (HSI) consists of hundreds of narrow bands with rich spectral and spatial information, revealing the spectral properties of the area or object of interest at a nanometer resolution [1]. Intelligent HSI interpretation is one of the hot spots in the current remote sensing community. With the development of deep learning techniques, great progress has been made by training expressive networks with massive labelled data [2]. However, current human-annotated datasets require a large amount of manpower, limiting their availability and applicability [3].

Without label information, unsupervised HSI classification becomes a challenging task, thus leading to uncompetitive accuracy. Many efforts have been devoted to bridging the gap between supervised and unsupervised models [4]. More recently, subspace clustering (SC) [5, 6] and non-negative matrix factorization (NNMF) [7, 8] have frequently been adopted for HSI clustering. Despite their promising performance, these approaches collectively suffer from two drawbacks. First, they are based on shallow feature representations and fail to capture high-level spectral-spatial information, which results in poorer robustness and generalization ability. Second, they focus on offline HSI clustering tasks, i.e., the clustering depends on the whole dataset, which limits their application in large-scale online learning scenarios.

To address the first drawback, some attempts, e.g., [3, 9, 10], have been made to use deep clustering networks to learn cluster-friendly deep representations. Such approaches can usually improve the clustering performance over the shallow ones by significant margins. However, the second drawback remains an open problem. Zhai et al. [11] showed that sparse representation is useful to alleviate the issue. Nonetheless, the procedure of constructing a suitable dictionary is often heuristic and suboptimal, and in particular cannot be implemented end-to-end. As a result, most current works compromise on this problem by verifying on smaller scenes, lacking dependable performance evidence from large-scale HSI data.

Fortunately, self-supervised learning (SSL) has emerged as a powerful paradigm to circumvent human annotation [12]. The core idea is to learn to solve a label-free pretext task, such as colorization [13] and inpainting [14], enabling the model to capture semantic information. A downstream task will benefit from the pre-trained model via fine-tuning and transfer learning. According to their objectives, pretext tasks can be broadly classed into three categories [15]: generative, contrastive, and adversarial. The tremendous success of recent contrastive learning models, including SimCLR [16], BYOL [17], MoCo [18], and Barlow Twins [19], has proven that contrastive learning tends to be the more promising branch. The pretext task in contrastive learning is to maximize the similarity between two positive views of every sample that are automatically generated by data augmentation.

In this paper, we propose a spectral-spatial contrastive clustering (SSCC) approach for large-scale HSI. The approach takes ResNet-18 as the backbone and consists of an instance-level contrastive head and a cluster-level contrastive head. Considering the inherent spectral-spatial properties of HSI, we introduce several semantics-preserving augmentation strategies, including ResizedCrop, Horizontal/Vertical Flip, and GroupBandShuffle. The proposed approach has some unique advantages: 1) SSCC performs clustering and deep feature learning simultaneously; 2) SSCC adopts minibatch training in an end-to-end fashion, so it is inherently suitable for large-scale HSI scenes; 3) SSCC is an online clustering model and can easily generalize to unseen data.

CDCEO 2021: 1st Workshop on Complex Data Challenges in Earth Observation, November 1, 2021, Virtual Event, QLD, Australia.
caiyaom@cug.edu.cn (Y. Cai); yanliu@cug.edu.cn (Y. Liu); zhangzijia@cug.edu.cn (Z. Zhang); zhcai@cug.edu.cn (Z. Cai); xbliu@cug.edu.cn (X. Liu)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Figure 1: The overall framework of our proposed SSCC. Two augmentations are sampled from an augmentation pool T and applied to input patches. A shared backbone encoder f(·) and two projection heads, i.e., the instance-level projection head g_I(·) and the cluster-level projection head g_C(·), are trained to simultaneously maximize the agreement between instance representations z_a and z_b and cluster representations y_a and y_b via a contrastive loss.

2. Methodology

Motivated by recent contrastive clustering developments in visual representation learning [20], which perform clustering jointly with contrastive learning, we introduce a novel SSCC approach for large-scale HSI clustering.

2.1. Overall

The core of SSCC is to maximize the similarity between representations of positive pairs in both the instance space and the cluster space, as shown in Fig. 1. SSCC first conducts spectral-spatial augmentations; the augmented pairs are then forwarded into a weight-sharing backbone encoder, f(·), resulting in deep representations h_a and h_b. Behind the encoder, two projection heads, an instance projection head and a cluster projection head, are used to maximize the similarity between prediction pairs. More specifically, we adopt ResNet-18 and MLPs as the backbone and the projection heads, respectively, in which the instance/cluster projection head transforms the data into 128 and C dimensions, where C denotes the number of targets. We use the cluster projection head to perform clustering at the inference stage.

2.2. Spectral-Spatial Augmentation

Formally, let x be an HSI sample in R^{n1 x n2 x m}, where n1 x n2 is the spatial size and m denotes the number of spectral bands. We construct a positive pair by forwarding x to two augmentations T_a and T_b sampled from an augmentation pool T. Formally, x_a = T_a(x) and x_b = T_b(x), where T_a, T_b ∈ T.

Based on the characteristics of HSI, the augmentation pool consists of spectral augmentations and spatial augmentations. The spectral augmentations include group band permutation and band random drop, and the spatial augmentations include random crop with resize and random horizontal/vertical flip. Precisely, group band permutation divides the m bands into k adjacent groups and randomly permutes the spectral bands within each group. Band random drop masks a spectral band with a probability of p. The spatial augmentations are the same as the pipelines defined in torchvision^1.

2.3. Projection Heads

SSCC contains two projection heads. We use g_I(·) and g_C(·) to denote the instance-level and cluster-level projection heads, respectively. Each head takes x_a and x_b as inputs and produces a pair of predictions, denoted as z_a and z_b for g_I(·) and y_a and y_b for g_C(·). The goal of g_I(·) is to encourage intra-class agreement, whereas g_C(·) aims to encourage inter-class variation. Specifically, we achieve these goals by defining the following contrastive losses. Let {x_a^(1), ..., x_a^(N), x_b^(N+1), ..., x_b^(2N)} be 2N augmented samples with a batch size of N. The instance-level contrastive loss over sample x_a^(i) is given by

    L_a^(i) = -log [ exp(sim(z_a^(i), z_b^(i)) / T_I) / Σ_{j=1}^{N} ( exp(sim(z_a^(i), z_a^(j)) / T_I) + exp(sim(z_a^(i), z_b^(j)) / T_I) ) ]    (1)

Here, T_I denotes a temperature parameter and sim is a similarity function. We adopt cosine similarity in this paper, i.e.,

    sim(z^(i), z^(j)) = z^(i) (z^(j))^T / (‖z^(i)‖ ‖z^(j)‖)    (2)

^1 https://pytorch.org/
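To make Eqs. (1)-(2) concrete, the instance-level loss can be sketched in plain NumPy. This is an illustrative, loop-based sketch rather than the authors' implementation; in particular, the self-pair sim(z_a^(i), z_a^(i)) is excluded from the denominator, following common contrastive-learning practice.

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two vectors, Eq. (2)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def instance_contrastive_loss(za, zb, temp=0.5):
    """Batched instance-level contrastive loss L_I built from Eq. (1).

    za, zb: (N, d) instance projections of the two augmented views.
    temp:   temperature T_I.
    """
    N = za.shape[0]
    total = 0.0
    # Symmetric over the two views: the L_a^(i) and L_b^(i) terms.
    for (p, q) in [(za, zb), (zb, za)]:
        for i in range(N):
            pos = np.exp(cosine_sim(p[i], q[i]) / temp)
            denom = 0.0
            for j in range(N):
                if j != i:  # exclude the trivial self-pair
                    denom += np.exp(cosine_sim(p[i], p[j]) / temp)
                denom += np.exp(cosine_sim(p[i], q[j]) / temp)
            total += -np.log(pos / denom)
    return total / (2 * N)
```

In practice one would vectorize this with a single normalized matrix product; the loops above simply mirror the sums in Eq. (1).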
Similarly, we calculate the loss of x_b^(i) by L_b^(i). The batched instance-level contrastive loss is defined as

    L_I = (1 / 2N) Σ_{i=1}^{N} ( L_a^(i) + L_b^(i) ).

The cluster-level contrastive loss is instead defined in the inter-class space, i.e.,

    L̂_a^(i) = -log [ exp(sim(y_a^(·i), y_b^(·i)) / T_C) / Σ_{j=1}^{C} ( exp(sim(y_a^(·i), y_a^(·j)) / T_C) + exp(sim(y_a^(·i), y_b^(·j)) / T_C) ) ]    (3)

where T_C is another temperature parameter and y^(·i) denotes the i-th column of the soft cluster-assignment matrix. Furthermore, the batched cluster-level contrastive loss is defined as

    L_C = (1 / 2C) Σ_{i=1}^{C} ( L̂_a^(i) + L̂_b^(i) ) - H(Y),    (4)

where H(Y) denotes the entropy of the cluster assignment probabilities across the whole augmented minibatch, which is used to avoid the trivial solution, and can be computed as

    H(Y) = - Σ_{i=1}^{C} [ P(y_a^(·i)) log P(y_a^(·i)) + P(y_b^(·i)) log P(y_b^(·i)) ].    (5)

Finally, the complete training loss function of SSCC is

    L = L_I + L_C.    (6)

2.4. Training and Predicting

The proposed SSCC can be trained in an end-to-end fashion. Specifically, we adopt the widely used Adam optimizer with a learning rate of 0.00002, a batch size of 128, and an L2 regularizer of 0.00005. During the inference stage, we feedforward any given sample with the spectral-spatial augmentation disabled; the output of the cluster-level projection head is regarded as the prediction for the sample.

3. Experiments

3.1. Datasets and Setup

In this paper, we conduct experiments on the widely used Indian Pines dataset. We follow the training settings suggested in [20] and the baselines reported in [11]. Several clustering metrics are used to quantify the clustering performance, including producer's accuracy, overall accuracy (OA), Kappa coefficient (Kappa), and purity. It should be noted that more datasets and more extensive experiments will be provided in our future work.

Figure 2: Loss/ACC curves along with model training, where ACC is obtained under varying patch sizes (7x7, 9x9, 11x11, 13x13, 15x15, and 17x17).

3.2. Comparisons with the State of the Art

Table 1 reports the comparative results of different HSI clustering approaches. We can see that SSCC achieves state-of-the-art clustering results in terms of OA, Kappa, and purity. In particular, SSCC (OA=57.07) improves upon JSSC (OA=48.66) by a margin of 8.41 points. Furthermore, SSCC accurately distinguishes class Nos. 4, 8, 13, and 15, which is remarkably better than the other baselines. This demonstrates that SSL-based HSI clustering not only has obvious theoretical edges but also significant practical effectiveness.

3.3. Ablation Studies

In Fig. 2, we show the effect of patch size and training epochs. From the curves, we can conclude that: 1) the clustering ACC of SSCC increases dramatically along with training; 2) SSCC achieves considerable clustering ACC at a completely random initial status (epoch=0), signifying the feature representation power of SSL; 3) a larger patch size is often more beneficial to the SSCC model, especially when using a small number of training epochs.

We further present the evolution of feature representations across the training process of SSCC, as shown in Fig. 3. It can be seen that the features tend to become more compact within a class and more separable from other classes. This shows that SSCC can capture the intrinsic spectral-spatial information of HSI and obtain superior clustering performance and generalization ability.

4. Conclusion

This paper presented a novel SSCC model for the large-scale HSI online clustering task. The SSCC model follows a
Table 1
Comparative results on the Indian Pines dataset.
         Class No.                    FCM         FCM-S1             SSC-S         L2-SSC           LRSC             SGCNR              FSCAG             SCC              JSSC           SSCC
         1                            39.13            15.22              0                 0           52.17             26.52           9.57             8.26            13.48              0
         2                            25.63            26.58          26.75             42.58           27.31             36.13          23.85            27.87            33.40          29.48
         3                            24.58            26.10          36.87             18.80            1.81             22.80          34.36            41.47            30.17          57.71
         4                             6.33            13.92          10.13                 0           13.50             13.00          24.47             9.62            16.12           100
         5                            44.31            46.29          62.73             62.32           57.97             44.68          40.08            46.00            57.93          65.84
         6                            26.99            29.78          72.05             80.41           32.47             44.08          39.45            52.41            55.97          95.34
         7                                0                0              0             89.29               0             14.29              0            12.86             0.71              0
         8                            86.61            98.83           100              54.39           28.03             66.78          92.59            78.62            72.51           100
         9                            20.00            29.00              0             25.00           25.00             12.00           9.00             5.00            15.00              0
         10                           23.15            25.60          35.08             46.81           17.18             34.75          29.12            27.65            48.66          48.05
         11                           28.19            26.66          37.68             37.19           69.86             34.60          30.22            39.41            58.71          34.22
         12                           23.61            24.72          30.69             31.53           19.90             15.82          22.56            12.65            19.26          75.04
         13                           99.02            98.73          99.02             95.61           23.90             74.54          87.71            87.12            61.27           100
         14                           34.16            32.98          49.33             45.69           41.03             44.36          37.45            38.62            70.31          69.33
         15                           17.62            18.81          15.54             15.28           19.17             15.60          17.67            19.07            25.49           100
         16                           59.14            58.06          97.85             94.62               0             59.14          65.81            69.03            37.63              0
         OA(%)                        31.35         32.70             43.37             43.11       36.68              36.31             34.70        37.76             48.66             57.07
         Kappa                       0.2561        0.2695            0.3757            0.3667      0.2713             0.2946            0.2887       0.3091            0.4254            0.5390
         Purity                      0.5015        0.5082            0.5588            0.5670      0.4571             0.5105            0.5137       0.5222            0.5689            0.7475
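The cluster-level objective of Eqs. (3)-(5) can likewise be sketched in NumPy: the columns of the N x C soft-assignment matrices act as cluster representations, and maximizing the entropy of the batch-averaged assignments guards against collapsing all samples into one cluster. This is an illustrative sketch under that reading of the equations, not the authors' code; as in common practice, the self-pair is excluded from the denominator.

```python
import numpy as np

def cluster_contrastive_loss(ya, yb, temp=1.0):
    """Cluster-level loss of Eqs. (3)-(5).

    ya, yb: (N, C) soft cluster assignments of the two augmented views.
    temp:   temperature T_C.
    """
    N, C = ya.shape
    cols_a, cols_b = ya.T, yb.T  # (C, N): one vector per cluster

    def sim(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    total = 0.0
    for (p, q) in [(cols_a, cols_b), (cols_b, cols_a)]:
        for i in range(C):
            pos = np.exp(sim(p[i], q[i]) / temp)
            denom = sum(np.exp(sim(p[i], p[j]) / temp) for j in range(C) if j != i)
            denom += sum(np.exp(sim(p[i], q[j]) / temp) for j in range(C))
            total += -np.log(pos / denom)

    # Entropy H(Y) of the batch-averaged cluster assignments, Eq. (5);
    # subtracting it in Eq. (4) rewards balanced cluster sizes.
    eps = 1e-12
    h = 0.0
    for y in (ya, yb):
        pr = y.mean(axis=0)
        h -= np.sum(pr * np.log(pr + eps))
    return total / (2 * C) - h
```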



Figure 3: The evolution of feature representations across the training process, where the features for t-SNE are computed from the backbone. Panels (a)-(f) show epochs 0, 100, 200, 300, 400, and 500.



contrastive learning pipeline and consists of two projection heads associated with instance-level and cluster-level contrasting, respectively. Furthermore, we introduce a semantics-preserving augmentation pool based on the characteristics of HSI. SSCC features online clustering, minibatch processing, and end-to-end training, making it easy to scale to large HSI. Experimental results on real HSI show that SSCC achieves state-of-the-art clustering performance with significant margins over previous works. The success of SSCC offers a powerful alternative for unsupervised HSI classification. It should be noted that this is a preliminary work; further analysis of the proposed method will be conducted in future work.
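Both contrastive objectives instantiate the NT-Xent loss popularized by SimCLR [16]: the instance-level head contrasts the feature rows of two augmented views, while the cluster-level head applies the same loss to the columns of the soft cluster-assignment matrix, as in contrastive clustering [20]. The following is a minimal pure-Python sketch of the instance-level form; the function name, temperature value, and toy 2-D features are illustrative, not taken from the paper:

```python
import math

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style instance-level contrastive (NT-Xent) loss over two
    augmented views z1 and z2 of the same minibatch (lists of vectors).
    For sample i, its feature from the other view is the positive;
    the remaining 2N - 2 features in the batch act as negatives."""
    def normalize(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]

    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b))

    feats = [normalize(v) for v in z1 + z2]   # 2N L2-normalized features
    n, total = len(z1), 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)               # index of i's positive pair
        denom = sum(math.exp(cosine(feats[i], feats[j]) / temperature)
                    for j in range(2 * n) if j != i)
        numer = math.exp(cosine(feats[i], feats[pos]) / temperature)
        total += -math.log(numer / denom)
    return total / (2 * n)

# The loss is low when the two views of each sample agree and rises
# as they drift apart (toy 2-D features, batch size N = 2).
loss_aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]],
                            [[1.0, 0.0], [0.0, 1.0]])
loss_mixed = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]],
                          [[0.0, 1.0], [1.0, 0.0]])
```

For the cluster-level counterpart, the same loss would be applied with `z1` and `z2` replaced by the columns of the two views' cluster-assignment matrices, so that clusters rather than instances are contrasted [20].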
5. Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61773355 and 61973285, the Fundamental Research Funds for National Universities, China University of Geosciences (Wuhan) under Grant CUGGC03 and 1910491T06, and the National Scholarship for Building High Level Universities, China Scholarship Council (No. 202006410044).


References
 [1] Y. Cai, X. Liu, Z. Cai, Bs-nets: An end-to-end framework for band selection of hyperspectral image, IEEE Transactions on Geoscience and Remote Sensing 58 (2020) 1969–1984.
 [2] P. Ghamisi, B. Rasti, N. Yokoya, Q. Wang, B. Hofle, L. Bruzzone, F. Bovolo, M. Chi, K. Anders, R. Gloaguen, P. M. Atkinson, J. A. Benediktsson, Multisource and multitemporal data fusion in remote sensing: A comprehensive review of the state of the art, IEEE Geoscience and Remote Sensing Magazine 7 (2019) 6–39.
 [3] Y. Cai, Z. Zhang, Z. Cai, X. Liu, X. Jiang, Q. Yan, Graph convolutional subspace clustering: A robust subspace clustering framework for hyperspectral image, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4191–4202.
 [4] H. Zhai, H. Zhang, P. Li, L. Zhang, Hyperspectral image clustering: Current achievements and future lines, IEEE Geoscience and Remote Sensing Magazine (2021).
 [5] H. Zhang, H. Zhai, L. Zhang, P. Li, Spectral-spatial sparse subspace clustering for hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing 54 (2016) 3672–3684.
 [6] H. Zhai, H. Zhang, L. Zhang, P. Li, Nonlocal means regularized sketched reweighted sparse and low-rank subspace clustering for large hyperspectral images, IEEE Transactions on Geoscience and Remote Sensing 59 (2021) 4164–4178.
 [7] L. Zhang, L. Zhang, B. Du, J. You, D. Tao, Hyperspectral image unsupervised classification by robust manifold matrix factorization, Information Sciences 485 (2019) 154–169.
 [8] Y. Qin, B. Li, W. Ni, S. Quan, P. Wang, H. Bian, Affinity matrix learning via nonnegative matrix factorization for hyperspectral imagery clustering, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 402–415.
 [9] Y. Cai, M. Zeng, Z. Cai, X. Liu, Z. Zhang, Graph regularized residual subspace clustering network for hyperspectral image clustering, Information Sciences 578 (2021) 85–101.
[10] J. Lei, X. Li, B. Peng, L. Fang, N. Ling, Q. Huang, Deep spatial-spectral subspace clustering for hyperspectral image, IEEE Transactions on Circuits and Systems for Video Technology 31 (2021) 2686–2697.
[11] H. Zhai, H. Zhang, L. Zhang, P. Li, Sparsity-based clustering for large hyperspectral remote sensing images, IEEE Transactions on Geoscience and Remote Sensing (2020) 1–15. doi:10.1109/TGRS.2020.3032427.
[12] L. Jing, Y. Tian, Self-supervised visual feature learning with deep neural networks: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (2020) 1–1. doi:10.1109/TPAMI.2020.2992393.
[13] R. Zhang, P. Isola, A. A. Efros, Colorful image colorization, in: European Conference on Computer Vision, Springer, 2016, pp. 649–666.
[14] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, A. A. Efros, Context encoders: Feature learning by inpainting, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2536–2544.
[15] X. Liu, F. Zhang, Z. Hou, L. Mian, Z. Wang, J. Zhang, J. Tang, Self-supervised learning: Generative or contrastive, IEEE Transactions on Knowledge and Data Engineering (2021) 1–1. doi:10.1109/TKDE.2021.3090866.
[16] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A simple framework for contrastive learning of visual representations, in: H. D. III, A. Singh (Eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 1597–1607.
[17] J.-B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, M. Valko, Bootstrap Your Own Latent: A new approach to self-supervised learning, in: Neural Information Processing Systems, Montréal, Canada, 2020.
[18] K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum contrast for unsupervised visual representation learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[19] J. Zbontar, L. Jing, I. Misra, Y. LeCun, S. Deny, Barlow twins: Self-supervised learning via redundancy reduction, in: Proceedings of the 38th International Conference on Machine Learning, 2021.
[20] Y. Li, P. Hu, Z. Liu, D. Peng, J. T. Zhou, X. Peng, Contrastive clustering, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, AAAI Press, 2021, pp. 8547–8555.