Hyperspectral Anomaly Detection Based on Low-Rank Structure Exploration

Shizhen Chang 1, Pedram Ghamisi 1,2

1 Institute of Advanced Research in Artificial Intelligence (IARAI), Landstraßer Hauptstraße 5, 1030 Vienna, Austria.
2 Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Machine Learning Group, Chemnitzer Str. 40, D-09599 Freiberg, Germany

Abstract
As one of the typical research areas in unsupervised hyperspectral image learning, anomaly detection must separate abnormal pixels without prior spectral knowledge. Recently, representation-based detectors, which can find the spectral similarity between pixels without any statistical distribution assumption, have attracted extensive attention and been frequently used. To this end, low-rank regularization methods can approximately decompose the hyperspectral data into a low-rank background part and a sparse anomaly part. Based on the theory of representation and self-representation, this paper proposes a double low-rank regularization (DLRR) model for hyperspectral anomaly detection. To further explore the reconstructed structural differences between the original data and the assumed background, the residual of their corresponding low-rank coefficient matrices is computed and used as part of the detection output, together with the column-wise ℓ2 norm of the sparse matrix. Experiments carried out on two real-world hyperspectral datasets show promising performance compared with other state-of-the-art detectors.

Keywords
Hyperspectral imagery, anomaly detection, low-rank representation, sketched-subspace clustering

1. Introduction
With continuous and redundant spectral bands, hyperspectral images (HSIs) carry a wealth of spectral and spatial information about land covers [1, 2]. This promotes military and civilian applications that utilize the spectral characteristics of different materials, and much research has been conducted on topics such as feature extraction [3], noise reduction [4], unmixing [5], classification [6], and detection [7]. As a special branch of HSI research, anomaly detection aims to extract potential abnormal pixels without any prior knowledge [8]. Therefore, suitable methods need to be designed.

Traditionally, classic anomaly detectors have been developed mostly based on assumptions about the statistical distribution of the data. The benchmark RX detector, the cluster-based anomaly detection (CBAD) algorithm [7], the blocked adaptive computationally efficient outlier nominators (BACON) [9], and the random selection-based anomaly detector (RSAD) [10] assume that the data follow a Gaussian or Gaussian mixture distribution, and then perform the detection task according to the Mahalanobis distance between the pixel under test and the background. However, this hypothesis has obvious limitations in practical applications.

To overcome the insufficient accuracy of distribution-based models, representation-based detectors have been proposed and have shown promising performance. Representative methods include the collaborative representation detector (CRD) [11] and the background joint sparse representation detector (BJSRD) [12]. A dual concentric window is utilized to extract possible background information as confidence dictionaries of the background at each test pixel, and the detection result is approximately derived by calculating the representation residual of the pixel. Nowadays, low-rank representation, which takes advantage of the repeatability of the background spectrum and decomposes the original data matrix, is widely used in hyperspectral anomaly detection. Considering that anomalies are usually rare and sparse, Chen et al. [13] first utilized the low-rank decomposition model for anomaly detection. Later, many studies were carried out based on the low-rankness of the background subspace and the sparsity of the anomaly subspace [14, 15]. However, after the low-rank decomposition, the detection decision of these methods either focuses on analyzing the sparse matrix or falls back on statistical estimation; a better combination of the assumed background component and the anomaly component may make the detection more reasonable.

As is well known, subspace clustering (SC) has gradually been developed for unsupervised HSI interpretation [16]. It can learn the similarity between pixels through self-dictionary learning. Inspired by sketched-SC and representation theory, we propose a double low-rank regularization model for hyperspectral anomaly detection. The proposed method is solved via the alternating direction method of multipliers (ADMM) [17], and the anomalies are finally detected by computing a background-related residual matrix and an anomaly-related distance matrix.

CDCEO 2021: 1st Workshop on Complex Data Challenges in Earth Observation, November 1, 2021, Virtual Event, QLD, Australia.
szchang@whu.edu.cn (S. Chang); pedram.ghamisi@iarai.ac.at (P. Ghamisi)
0000-0001-2345-6789 (S. Chang); 0000-0003-1203-741X (P. Ghamisi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
2. Proposed method

2.1. Related works

Let X = {x_i}_{i=1}^N ∈ R^{L×N} denote the data collection, where L is the number of bands and N is the total number of pixels in the image. The low-rank representation model aims to decompose the data into a lowest-rank matrix L ∈ R^{L×N} and a sparse matrix E ∈ R^{L×N}. The optimization problem is given by:

    min_{L,E} ||L||_* + α||E||_0   s.t.  X = L + E,   (1)

where α is the regularization parameter, and ||·||_* and ||·||_0 denote the nuclear norm and the ℓ0 norm, respectively.
For a given dictionary D, the low-rank matrix L can be rewritten as a linear combination of D and its corresponding coefficient matrix Z, and the NP-hard problem (1) can be relaxed to:

    min_{Z,E} ||Z||_* + α||E||_{2,1}   s.t.  X = DZ + E,   (2)

where D ∈ R^{L×n} contains n dictionary samples, Z ∈ R^{n×N}, and ||·||_{2,1} is the ℓ2,1 norm of the matrix, i.e., the ℓ1 norm of the ℓ2 norms of the matrix columns.

2.2. Problem formulation

Assume that the data can be self-represented:

    X = XC,

where C is the coefficient matrix. Then, for the sketched data X̃ = XT, we also have:

    X = X̃A,

where T ∈ R^{N×t} is a random projection matrix that compresses X while preserving its main information, and A ∈ R^{t×N}.
By means of the sketched data X̃, the proposed double low-rank regularization (DLRR) model based on sketched-SC can be formulated as:

    min_{A,Z,E} ||A||_* + β||Z||_* + λ||E||_{2,1}   s.t.  X = X̃A, X = DZ + E,   (3)

where β and λ are the regularization parameters, A is the self-representation coefficient matrix, Z is the assumed background coefficient matrix, and E denotes the sparse part indicating the anomalies.
Considering that the difference between A and Z may reflect the abnormal information of the image, the residual of these two matrices can be used as a reference for the detection output when t = n. The final detection result is formulated as the sum of the column-wise ℓ2 norms of the low-rank coefficient residual and of the sparse matrix:

    D_DLRR(x_i) = ||A_{:,i} − Z_{:,i}||_2 + ||E_{:,i}||_2.   (4)
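Once the optimized A, Z, and E are obtained, the detection output of Eq. (4) reduces to column-wise ℓ2 norms. A minimal NumPy sketch (the function name is ours; the paper defines only the formula):

```python
import numpy as np

def dlrr_score(A, Z, E):
    """Detection output of Eq. (4), assuming t = n so that A - Z is defined.

    A : (n, N) self-representation coefficient matrix
    Z : (n, N) assumed background coefficient matrix
    E : (L, N) sparse anomaly part
    Returns a length-N vector; larger values indicate likelier anomalies.
    """
    residual = np.linalg.norm(A - Z, axis=0)  # ||A_{:,i} - Z_{:,i}||_2
    sparsity = np.linalg.norm(E, axis=0)      # ||E_{:,i}||_2
    return residual + sparsity
```

Reshaping the returned vector back to the scene size (e.g., 100 × 100) gives the 2-D detection map.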
2.3. Problem optimization

To solve the proposed DLRR model, the ADMM method is employed, and the detailed optimization process is described as follows.
First, B and H are introduced as auxiliary variables for the coefficient matrices A and Z, respectively:

    min_{B,A,H,Z,E} ||B||_* + β||H||_* + λ||E||_{2,1}
    s.t.  X = X̃A, X = DZ + E, A = B, Z = H.   (5)

Then, the augmented Lagrangian function of (5) can be constructed:

    min_{B,A,H,Z,E,Y1,Y2,Y3,Y4} ||B||_* + β||H||_* + λ||E||_{2,1}
    + (ρ/2)||X − X̃A + Y1/ρ||_F² + (ρ/2)||X − DZ − E + Y2/ρ||_F²
    + (ρ/2)||A − B + Y3/ρ||_F² + (ρ/2)||Z − H + Y4/ρ||_F²,   (6)

where Y1 ∈ R^{L×N}, Y2 ∈ R^{L×N}, Y3 ∈ R^{t×N}, and Y4 ∈ R^{n×N} are the Lagrangian multipliers, and ρ > 0 is the penalty parameter.
Equation (6) can then be divided into five optimization subproblems that are updated one by one in an iterative procedure. The updating rules of these variables are:

1) B step with fixed A and Y3:

    min_B ||B||_* + (ρ/2)||A − B + Y3/ρ||_F².   (7)

2) H step with fixed Z and Y4:

    min_H β||H||_* + (ρ/2)||Z − H + Y4/ρ||_F².   (8)

3) E step with fixed Z and Y2:

    min_E λ||E||_{2,1} + (ρ/2)||X − DZ − E + Y2/ρ||_F².   (9)

4) A step with fixed B, Y1 and Y3:

    min_A (ρ/2)||X − X̃A + Y1/ρ||_F² + (ρ/2)||A − B + Y3/ρ||_F².   (10)

5) Z step with fixed H, E, Y2 and Y4:

    min_Z (ρ/2)||X − DZ − E + Y2/ρ||_F² + (ρ/2)||Z − H + Y4/ρ||_F².   (11)

6) The Lagrangian multipliers and the penalty parameter are updated as:

    Y1 = Y1 + ρ(X − X̃A),   (12)
    Y2 = Y2 + ρ(X − DZ − E),   (13)
    Y3 = Y3 + ρ(A − B),   (14)
    Y4 = Y4 + ρ(Z − H),   (15)
    ρ = min{1.1ρ, ρ_max}.   (16)
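Subproblems (7)–(9) admit standard closed-form proximal solutions: singular value thresholding for the nuclear-norm terms and column-wise shrinkage for the ℓ2,1 term. A minimal NumPy sketch of the two operators (function names are ours, not from the paper):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding Theta_tau: prox of tau * ||.||_* ."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(M, tau):
    """Column-wise l2 shrinkage S_tau: prox of tau * ||.||_{2,1}.

    Columns with norm below tau are zeroed; others are scaled toward zero.
    """
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return M * scale
```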
The solutions of (7) and (8) are calculated as B = Θ_{1/ρ}(A + Y3/ρ) and H = Θ_{β/ρ}(Z + Y4/ρ), respectively, where Θ is the singular value thresholding (SVT) operator. E is then updated by S_{λ/ρ}(X − DZ + Y2/ρ), where S is an ℓ2,1-minimization thresholding operator [18]. A and Z are solved by setting the corresponding partial derivatives to zero; their optimized solutions are A = (I + X̃ᵀX̃)⁻¹(X̃ᵀX + X̃ᵀY1/ρ + B − Y3/ρ) and Z = (I + DᵀD)⁻¹(DᵀX − DᵀE + DᵀY2/ρ + H − Y4/ρ).
The initial settings of this optimization process are A0 = B0 = Z0 = H0 = 0, E0 = 0, Y1 = Y2 = Y3 = Y4 = 0, ρ0 = 0.01, and ρ_max = 10⁶.
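The quadratic A and Z steps reduce to linear systems, which can be solved with np.linalg.solve rather than forming the inverses explicitly. A sketch under the notation above (function names and argument order are ours):

```python
import numpy as np

def update_A(Xt, X, Y1, B, Y3, rho):
    """A step: solve (I + Xt^T Xt) A = Xt^T X + Xt^T Y1/rho + B - Y3/rho."""
    lhs = np.eye(Xt.shape[1]) + Xt.T @ Xt
    rhs = Xt.T @ (X + Y1 / rho) + B - Y3 / rho
    return np.linalg.solve(lhs, rhs)

def update_Z(D, X, E, Y2, H, Y4, rho):
    """Z step: solve (I + D^T D) Z = D^T X - D^T E + D^T Y2/rho + H - Y4/rho."""
    lhs = np.eye(D.shape[1]) + D.T @ D
    rhs = D.T @ (X - E + Y2 / rho) + H - Y4 / rho
    return np.linalg.solve(lhs, rhs)
```

Since X̃ᵀX̃ and DᵀD are fixed across iterations, a Cholesky factorization of each left-hand side could also be cached and reused.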
The convergence conditions are ||X − X̃A||_F < ε, ||X − DZ − E||_F < ε, ||A − B||_F < ε, and ||Z − H||_F < ε, or the number of iterations exceeding a predefined upper limit. Empirically, the error tolerance is set to ε = 10⁻⁶ and the maximum number of iterations to 100.

3. Experiments

In this section, the performance of the proposed DLRR method is assessed on two real-world HSI scenes: the San Diego dataset and the Urban dataset. Four classical anomaly detection methods, namely RX, BACON, Kernel-RX (KRX) [19], and the low probability anomaly detector (LPAD) [20], are applied for comparative analysis. The regularization parameters β and λ are set to 5 and 10, respectively. The background dictionaries are collected as the mean vectors of the K-means clusters, and the dictionary size n is set to 400.

1) San Diego dataset: This dataset was captured by the AVIRIS sensor, which has a spatial resolution of 3.5 m and a spectral resolution of 10 nm. The dataset has 224 original spectral bands in total, and 189 bands are utilized for the detection task after eliminating the noisy bands. It records the area of the San Diego airport, CA, USA, in 100 × 100 pixels; three aircraft, comprising 58 pixels, are selected as the anomaly targets. The visualized 2-D image scene and the ground-truth map of this dataset are shown in Figure 1.

Figure 1: The San Diego dataset. (a) Image scene. (b) Ground-truth.

2) Urban dataset: This dataset was collected by the HYDICE airborne sensor, which has a spatial resolution of 1 m and a spectral resolution of 10 nm. After removing low-quality bands, 162 bands are left for anomaly detection. The image scene contains 80 × 100 pixels, and 17 small objects are considered as anomaly targets. The 2-D visualization map and the ground-truth of this dataset are shown in Figure 2.

Figure 2: The Urban dataset. (a) Image scene. (b) Ground-truth.
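The background dictionary D described above (mean vectors of n = 400 K-means clusters) can be assembled, for instance, with a plain NumPy k-means. This is only a sketch with a hypothetical function name; a library implementation such as scikit-learn's KMeans would serve equally:

```python
import numpy as np

def kmeans_dictionary(X, n_atoms=400, n_iter=20, seed=0):
    """Build a background dictionary D (L x n_atoms) from K-means cluster means.

    X is the L x N data matrix with pixels in columns. Plain Lloyd iterations;
    empty clusters simply keep their previous center.
    """
    rng = np.random.default_rng(seed)
    pixels = X.T                                        # N x L, one pixel per row
    centers = pixels[rng.choice(len(pixels), n_atoms, replace=False)]
    for _ in range(n_iter):
        # squared distances of every pixel to every center: N x n_atoms
        d2 = ((pixels[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for k in range(n_atoms):
            members = pixels[labels == k]
            if len(members):
                centers[k] = members.mean(0)
    return centers.T                                    # L x n_atoms
```

For a full 100 × 100 scene the N × n distance matrix fits comfortably in memory; for much larger images a chunked assignment step would be preferable.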
The proposed DLRR model and the comparison algorithms are run on the aforementioned San Diego and Urban datasets, and the detection performance, evaluated by ROC curves and the normalized background-anomaly statistical range, is shown in Figure 3 and Figure 4, respectively. On the San Diego dataset, the proposed DLRR model has the smallest false alarm rate when the detection probability reaches 1; on the Urban dataset, our method has the largest separation range between the background and the anomalies. For further evaluation, the area under the ROC curve (AUC) values are computed, and the results are shown in Table 1. The proposed DLRR method attains the largest AUC values among the compared methods on both datasets.

Figure 3: The San Diego dataset. (a) ROC curves. (b) Normalized background-anomaly statistical range.

Figure 4: The Urban dataset. (a) ROC curves. (b) Normalized background-anomaly statistical range.

Table 1
The AUC Values of Five Algorithms in Two Datasets

Methods   San Diego Dataset   Urban Dataset
RX        0.8742              0.9919
BACON     0.8768              0.9720
KRX       0.7490              0.7836
LPAD      0.8973              0.8137
DLRR      0.9261              0.9927
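For reference, AUC values like those in Table 1 can be computed from a detection map and a ground-truth mask with a rank-based (Mann–Whitney) routine. This is a generic sketch, not the authors' evaluation code:

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum statistic.

    scores : detection map values (any shape, flattened)
    labels : ground truth, truthy for anomaly pixels
    Ties receive average ranks, matching the trapezoidal ROC area.
    """
    scores = np.asarray(scores, float).ravel()
    labels = np.asarray(labels, bool).ravel()
    n_pos, n_neg = labels.sum(), (~labels).sum()
    order = scores.argsort()
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for v in np.unique(scores):        # average ranks over tied scores
        tied = scores == v
        ranks[tied] = ranks[tied].mean()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```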
References

[1] P. Ghamisi, M. Dalla Mura, J. A. Benediktsson, A survey on spectral–spatial classification techniques based on attribute profiles, IEEE Trans. Geosci. Remote Sens. 53 (2015) 2335–2353.
[2] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, J. Chanussot, Graph convolutional networks for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2020).
[3] Y. Xu, L. Zhang, B. Du, F. Zhang, Spectral–spatial unified networks for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 56 (2018) 5893–5909.
[4] B. Rasti, P. Scheunders, P. Ghamisi, G. Licciardi, J. Chanussot, Noise reduction in hyperspectral imagery: Overview and application, Remote Sens. 10 (2018) 482.
[5] R. Heylen, M. Parente, P. Gader, A review of nonlinear hyperspectral unmixing methods, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 7 (2014) 1844–1868.
[6] Y. Xu, B. Du, L. Zhang, Beyond the patchwise classification: Spectral-spatial fully convolutional networks for hyperspectral image classification, IEEE Trans. Big Data 6 (2020) 492–506.
[7] M. J. Carlotto, A cluster-based approach for detecting man-made objects and changes in imagery, IEEE Trans. Geosci. Remote Sens. 43 (2005) 374–387.
[8] S. Chang, B. Du, L. Zhang, A subspace selection-based discriminative forest method for hyperspectral anomaly detection, IEEE Trans. Geosci. Remote Sens. 58 (2020) 4033–4046.
[9] N. Billor, A. S. Hadi, P. F. Velleman, BACON: blocked adaptive computationally efficient outlier nominators, Comput. Statist. Data Anal. 34 (2000) 279–298.
[10] B. Du, L. Zhang, Random-selection-based anomaly detector for hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 49 (2010) 1578–1589.
[11] W. Li, Q. Du, Collaborative representation for hyperspectral anomaly detection, IEEE Trans. Geosci. Remote Sens. 53 (2014) 1463–1474.
[12] J. Li, H. Zhang, L. Zhang, L. Ma, Hyperspectral anomaly detection by the use of background joint sparse representation, IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 8 (2015) 2523–2533.
[13] S.-Y. Chen, S. Yang, K. Kalpakis, C.-I. Chang, Low-rank decomposition-based anomaly detection, in: Proc. SPIE, volume 8743, 2013, p. 87430N.
[14] Y. Zhang, B. Du, L. Zhang, S. Wang, A low-rank and sparse matrix decomposition-based Mahalanobis distance method for hyperspectral anomaly detection, IEEE Trans. Geosci. Remote Sens. 54 (2015) 1376–1389.
[15] L. Li, W. Li, Q. Du, R. Tao, Low-rank and sparse decomposition with mixture of Gaussian for hyperspectral anomaly detection, IEEE Trans. Cybern. (2020).
[16] H. Zhai, H. Zhang, L. Zhang, P. Li, Nonlocal means regularized sketched reweighted sparse and low-rank subspace clustering for large hyperspectral images, IEEE Trans. Geosci. Remote Sens. 59 (2020) 4164–4178.
[17] S. Boyd, N. Parikh, E. Chu, Distributed optimization and statistical learning via the alternating direction method of multipliers, Now Publishers Inc, 2011.
[18] L. Zhang, L. Peng, T. Zhang, S. Cao, Z. Peng, Infrared small target detection via non-convex rank approximation minimization joint ℓ2,1 norm, Remote Sens. 10 (2018) 1821.
[19] H. Kwon, N. M. Nasrabadi, Kernel RX-algorithm: A nonlinear anomaly detector for hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 43 (2005) 388–397.
[20] Z. Li, L. Wang, S. Zheng, Applied low dimension linear manifold in hyperspectral imagery anomaly detection, in: Proc. SPIE, volume 9142, 2014, p. 91421P.