Unsupervised Learning Method for mmWave MIMO Channel Estimation with Mutual Information GAN

Congrong Dong¹,*, Zhongliang Deng¹,*, Enwen Hu¹, Wen Liu¹, Xudong Song¹ and Licheng Wei¹

¹ Beijing University of Posts and Telecommunications (BUPT), 10 Xitucheng Road, Haidian District, Beijing, China

Abstract
Current wireless systems are evolving towards higher frequencies, specifically millimeter waves (mmWave). To exploit the wide bandwidth of mmWave and overcome its high path loss, massive MIMO antenna arrays have emerged, which makes the wireless channel "high-dimensional". Traditional channel estimation methods do not perform well when extended to mmWave massive MIMO channel estimation. Meanwhile, deep learning (DL)-based channel estimation methods are limited by their need for a large labeled dataset, since the high dimensionality of future communication channels drastically increases the cost of channel measurement and labeling. This significantly impedes the application of DL to channel estimation. This paper proposes an unsupervised channel estimation method based on a mutual information maximization generative adversarial network (InfoGAN). It performs unsupervised learning and classification of various clustered delay line (CDL) channels and automatically estimates and reconstructs channels for different CDL channels. Additionally, it integrates the training method of WGAN to ensure the stability and convergence of the training process. The proposed method outperforms Orthogonal Matching Pursuit (OMP), EM-GM-AMP (an approximate message passing algorithm), and LOS/NLOS conditional GAN (CGAN) across all CDL channels.

Keywords
channel estimation, generative adversarial network (GAN), millimeter wave (mmWave), massive MIMO

Proceedings of the Work-in-Progress Papers at the 14th International Conference on Indoor Positioning and Indoor Navigation (IPIN-WiP 2024), October 14 - 17, 2024, Hong Kong, China
* Corresponding author.
2022110363@bupt.cn (C. Dong); dengzhl@bupt.edu.cn (Z. Deng); owen.hu@bupt.edu.cn (E. Hu); liuwen@bupt.edu.cn (W. Liu); xd@bupt.edu.cn (X. Song); 2022110364@bupt.cn (L. Wei)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
The large-scale deployment and continuous evolution of global 5G networks have led to significant changes in wireless channels in terms of frequency, antenna configuration, and scenario. These changes have introduced new characteristics, such as non-stationarity in the space-time-frequency domain, posing challenges to traditional channel modeling and estimation. Compared to traditional microwave communication technologies, mmWave communication offers abundant spectrum resources, higher data rates, and greater spectral efficiency. Due to its shorter wavelength, mmWave has a high spatial resolution, enabling precise localization of mobile devices. To compensate for the significant path loss of mmWave, employing large-scale MIMO technology for high-precision directional beamforming [1] has become the mainstream approach. The short wavelength of mmWave also reduces the size of antenna elements, which facilitates the integration and miniaturization of antennas and allows larger-scale antenna arrays to be built.

Accurate channel state information must be obtained through channel estimation to leverage the advantages of mmWave massive MIMO for high-precision localization in complex LOS and NLOS scenarios. In 6G and future communication systems, the scale of antenna arrays at base stations (BS) and user equipment (UE) will continue to expand, significantly increasing the dimension and complexity of channel estimation. Traditional channel estimation methods for sub-6 GHz bands, such as least squares (LS) or minimum mean squared error (MMSE) estimators, cannot be directly extended to mmWave MIMO [2]. Recent methods commonly use the sparsity of mmWave MIMO channels in the beamspace [3] to design channel estimation schemes, transforming the problem into a compressed sensing problem. This involves leveraging signal sparsity and a small number of pilot sequences for channel reconstruction. Additionally, DL has emerged as a promising approach [4], with methods based on image reconstruction or deep learning networks being proposed. DL can implicitly learn complex channel distributions and reconstruct unknown signals or images from observations. However, DL for channel estimation typically employs supervised learning and relies on large labeled channel datasets. The increasing diversity of channel types in future mmWave communication scenarios and the need to measure various channel parameters make labeling data prohibitively expensive, significantly limiting the application of DL to mmWave MIMO channel estimation.

This paper proposes an unsupervised DL method for mmWave MIMO channel estimation, introducing mutual information for unsupervised classification of CDL channels and using a GAN to generate diverse channel data for estimation.

2. Related Work
Research has demonstrated that wireless channels exhibit sparsity [3] in the delay and angle domains. Leveraging this sparsity, compressed sensing (CS)-based channel estimation techniques have gained significant attention. These techniques represent high-dimensional channels using sparse bases, transforming the channel estimation problem into a sparse signal recovery problem [5]. However, CS relies on the assumption that the channel is sparse under a specific basis (usually DFT), which is challenging to satisfy in practical scenarios.
An inappropriate sparse basis can cause grid mismatch, reducing the accuracy of the channel's sparse representation [6, 7]. Consequently, achieving accurate signal reconstruction with effective sparse representations remains a key focus. Algorithms such as Orthogonal Matching Pursuit (OMP) [8] and Approximate Message Passing (AMP) [9] are commonly studied for this purpose.

DL-based methods for channel estimation have also gained popularity. With the rapid development of deep learning, techniques such as Generative Adversarial Networks (GAN) offer new possibilities for mmWave MIMO channel estimation. The adversarial learning mechanism of a GAN can learn complex distributions and generate diverse data samples, which suits channel estimation well given the high cost and difficulty of mmWave MIMO channel measurements. Many researchers have developed supervised learning frameworks for channel estimation, using pilot signals as inputs or conditions to train neural networks that output channel matrices [10, 11, 12]. For instance, in [10], pilot signals are used as conditional information, and received signals as inputs, to train a Conditional GAN (CGAN) that outputs the channel matrix. However, this approach does not use random noise vectors as input, so the GAN merely learns the mapping between pilots and channels and loses the ability to generate diverse samples. In [12], the sparsity of mmWave MIMO channels in the beamspace domain is exploited by first classifying channels into LOS/NLOS and then using the classifier's output to train a CGAN for estimating five types of CDL channels. However, relying solely on the LOS/NLOS condition may lead to performance losses, because the LOS component varies across different CDL channels.

Based on the above discussion, to avoid the high cost and difficulty of mmWave MIMO channel data measurement and labeling, we propose an unsupervised channel classification and estimation framework that uses a GAN to achieve high-dimensional channel estimation from a small number of pilot signals, obtain sparse representations of the channel, and reconstruct the original channel. The main contributions are as follows:
• Transform the channel estimation problem into a sparse signal reconstruction problem in the beamspace domain, training a GAN to directly generate beamspace domain channel matrices without assuming a channel model or LOS/NLOS condition.
• Use the differences in the sparsity of different channel models in the beamspace domain as a feature to classify the channels.
• Apply InfoGAN to channel estimation. By incorporating interpretable latent variables into the random input vector of the GAN and maximizing the mutual information between these latent variables and the generated channel types, we achieve controllable generation of different channel models and thereby unsupervised classification of channel models.
• Combine the unsupervised learning capability of InfoGAN with the stable training of WGAN to form the InfoWGAN model, ensuring training stability and accelerating model convergence. To the best of our knowledge, this is the first time that InfoGAN and WGAN have been integrated for channel estimation.

3. Proposed Approach

3.1. System Model
We consider a downlink single-user narrowband mmWave MIMO communication scenario, where the transmitter is equipped with $N_t$ antennas and the receiver with $N_r$ antennas, using a fully connected phase shifting network. The hybrid precoder and combiner at the transmitter and receiver are denoted by $\mathbf{F} \in \mathbb{C}^{N_t \times N_s}$ and $\mathbf{W} \in \mathbb{C}^{N_r \times N_s}$ respectively, where $N_s$ is the number of data streams transmitted.
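The MIMO channel between BS and UE is represented by $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, and the pilot signal sent by the transmitter is denoted by $\mathbf{S} \in \mathbb{C}^{N_s \times N_p}$. The received signal can be expressed as

$$\mathbf{Y} = \mathbf{W}^H \mathbf{H} \mathbf{F} \mathbf{S} + \mathbf{W}^H \mathbf{N}, \tag{1}$$

where $\mathbf{N} \in \mathbb{C}^{N_r \times N_p}$ is a noise matrix whose entries are independent and identically distributed (i.i.d.) complex Gaussian random variables with mean 0 and variance $\sigma^2$. It is assumed that the channel follows a block fading model, where $\mathbf{H}$ remains constant over $N_p$ time slots. Using the Kronecker product identity $\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{B})$, we obtain

$$\mathbf{y} = (\mathbf{S}^T \mathbf{F}^T \otimes \mathbf{W}^H)\,\mathrm{vec}(\mathbf{H}) + (\mathbf{I}_{N_p} \otimes \mathbf{W}^H)\,\mathbf{n}, \tag{2}$$

where $\mathbf{y} = \mathrm{vec}(\mathbf{Y}) \in \mathbb{C}^{N_s N_p \times 1}$, $\mathrm{vec}(\mathbf{H}) \in \mathbb{C}^{N_r N_t \times 1}$ and $\mathbf{n} = \mathrm{vec}(\mathbf{N}) \in \mathbb{C}^{N_r N_p \times 1}$. Denote $\mathbf{A} = (\mathbf{S}^T \mathbf{F}^T \otimes \mathbf{W}^H)$, which has dimensions $N_s N_p \times N_t N_r$. Assuming that both the transmitter and receiver use uniformly spaced linear arrays, under the virtual channel model the array response matrices can be represented by the unitary DFT matrices $\mathbf{A}_T \in \mathbb{C}^{N_t \times N_t}$ and $\mathbf{A}_R \in \mathbb{C}^{N_r \times N_r}$ respectively. Thus, we can express $\mathbf{H}$ as

$$\mathbf{H} = \mathbf{A}_R \mathbf{H}_v \mathbf{A}_T^H, \qquad \mathrm{vec}(\mathbf{H}) = \left( (\mathbf{A}_T^H)^T \otimes \mathbf{A}_R \right) \mathrm{vec}(\mathbf{H}_v). \tag{3}$$

Unlike traditional DL-based channel estimation models, we directly train the GAN to output samples of the sparse representation $\mathbf{H}_v$ in the beamspace domain rather than the original channel $\mathbf{H}$. That is, we train the generator G to learn the distribution of $\mathbf{H}_v$. Moreover, we do not impose any sparsity constraint on $\mathbf{H}_v$. This approach is more flexible than CS-based channel estimation methods, which can only reconstruct the signal successfully when its sparse representation has just a few non-zero values.

After GAN training is completed, we extract the trained generator G and use the received pilot signal $\mathbf{y}$ from (2) and the corresponding channel type latent variable $\mathbf{c}$ to search the latent space of the input noise variable $\mathbf{z}$ for the optimal input $\mathbf{z}^*$:

$$\mathbf{z}^* = \arg\min_{\mathbf{z} \in \mathbb{R}^d} \left\| \mathbf{y} - \mathbf{A}_{sp}\, G(\mathbf{z}, \mathbf{c}) \right\|_2^2 + \lambda_{reg} \|\mathbf{z}\|_2^2, \tag{4}$$

where $\mathbf{A}_{sp} = (\mathbf{A}_T^H \mathbf{F} \mathbf{S})^T \otimes \mathbf{W}^H \mathbf{A}_R$, the generator output $G(\mathbf{z}, \mathbf{c})$ is taken in vectorized form, and $\lambda_{reg}$ is the regularization coefficient used to impose an L2 norm constraint on the noise variable $\mathbf{z}$.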
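
The measurement model in (1) to (3) and the latent search in (4) can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the array sizes, the noise level, the value of `lam`, and above all the linear map `B` that stands in for the trained generator are hypothetical and not taken from the paper. With a real neural generator the gradient would come from automatic differentiation instead of the analytic expression used here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy sizes (hypothetical; the paper uses a 64 x 16 antenna configuration).
Nt, Nr, Ns, Np, d = 8, 4, 4, 8, 4
lam = 1e-3  # regularization weight lambda_reg (assumed value)

def crandn(*shape):
    """Complex Gaussian samples with unit variance per entry."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def unitary_dft(n):
    """Unitary n x n DFT matrix (array response under the virtual model)."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def vec(M):
    """Column-major vectorization, so that vec(ABC) = (C^T kron A) vec(B)."""
    return M.reshape(-1, order="F")

F, W, S = crandn(Nt, Ns), crandn(Nr, Ns), crandn(Ns, Np)
A_T, A_R = unitary_dft(Nt), unitary_dft(Nr)

# A_sp = (A_T^H F S)^T kron (W^H A_R), the matrix used in Eq. (4).
A_sp = np.kron((A_T.conj().T @ F @ S).T, W.conj().T @ A_R)

# Hypothetical linear stand-in for the trained generator: G(z) = B z = vec(H_v).
B = crandn(Nr * Nt, d)

def G(z):
    return B @ z

# Simulate one pilot observation through Eqs. (3) and (1).
z_true = rng.standard_normal(d)
Hv_true = G(z_true).reshape(Nr, Nt, order="F")
H = A_R @ Hv_true @ A_T.conj().T
N = 0.01 * crandn(Nr, Np)
y_clean = vec(W.conj().T @ H @ F @ S)
y = y_clean + vec(W.conj().T @ N)

# The matrix form and the vectorized form should agree up to rounding.
vectorization_error = np.linalg.norm(y_clean - A_sp @ vec(Hv_true))

def objective(z):
    return np.linalg.norm(y - A_sp @ G(z)) ** 2 + lam * np.linalg.norm(z) ** 2

# Gradient descent on Eq. (4); the gradient is analytic because G is linear here.
M = A_sp @ B
step = 1.0 / (2.0 * (np.linalg.norm(M, 2) ** 2 + lam))
z = np.zeros(d)
initial_objective = objective(z)
for _ in range(2000):
    grad = 2.0 * (M.conj().T @ (M @ z - y)).real + 2.0 * lam * z
    z = z - step * grad
z_star = z

# For this linear toy model Eq. (4) also has a closed-form ridge solution.
z_closed = np.linalg.solve((M.conj().T @ M).real + lam * np.eye(d),
                           (M.conj().T @ y).real)

# Relative squared error of the recovered beamspace channel.
Hv_est = G(z_star)
relative_error = np.linalg.norm(vec(Hv_true) - Hv_est) ** 2 / np.linalg.norm(Hv_true) ** 2
```

The closed-form solution exists only because the stand-in generator is linear; it serves as a reference point for the iterative search, which is the part that would carry over to a nonlinear G.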
After optimizing the noise variable, we obtain the estimated beamspace channel $\mathbf{H}_{v,est} = G(\mathbf{z}^*, \mathbf{c})$. We use the normalized mean square error (NMSE) as the metric to evaluate the quality of $\mathbf{H}_{v,est}$, defined as

$$\mathrm{NMSE} = \mathbb{E}\left[ \frac{\|\mathbf{H}_v - \mathbf{H}_{v,est}\|_2^2}{\|\mathbf{H}_v\|_2^2} \right]. \tag{5}$$

3.2. GAN Architectures
In this section, we first introduce the principles and structure of WGAN-GP, followed by the InfoWGAN model proposed in this paper. We then detail how WGAN and InfoGAN are integrated into InfoWGAN and describe its training process.

Figure 1: InfoWGAN Network Architecture

3.2.1. Wasserstein GAN With Gradient Penalty
A Wasserstein Generative Adversarial Network (WGAN) is a variant of the classic GAN. Its optimization goal is to solve the min-max problem

$$\min_{G} \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{z \sim P_g}[D(G(z))], \tag{6}$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions and $P_r$ is the real data distribution. As pointed out in [13], the original GAN suffers from mode collapse and instability during training due to the use of the KL or JS divergence, which is one of its major drawbacks. In contrast, WGAN uses the Wasserstein-1 distance to measure the distance between two distributions, making its training more robust and stable. Theoretically, WGAN requires the discriminator to be 1-Lipschitz continuous. The original WGAN enforces this through weight clipping, but clipping can limit the capacity of the discriminator network and reduce the model's performance. The gradient penalty instead enforces 1-Lipschitz continuity by directly penalizing the norm of the discriminator's gradient in the loss function. The classic gradient penalty term is defined as

$$\lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}} \left[ \left( \|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1 \right)^2 \right], \tag{7}$$

where $\hat{x}$ is sampled along the straight line between a sample from the data distribution and a sample from the generator distribution, and $D(\hat{x})$ is the output of the discriminator on $\hat{x}$.

3.2.2. InfoWGAN
The input of the original GAN is an unconstrained random vector $\mathbf{z}$, so the outputs of G lack semantic features that can be tied to specific dimensions of $\mathbf{z}$, which leads to poor interpretability. InfoGAN [14] introduces a latent variable $\mathbf{c}$ into the input vector $\mathbf{z}$ and incorporates the mutual information $I(X; Y)$ between $\mathbf{c}$ and the output of G to control the generated data. Mutual information is defined as

$$I(X; Y) = H(X) - H(X \mid Y). \tag{8}$$

To encourage the categorical latent variable $\mathbf{c}$ to be associated with meaningful semantic features and to better align with the WGAN loss function, we introduce an auxiliary network Q that maximizes the mutual information between the latent variable $\mathbf{c}$ and the generated samples $G(\mathbf{z}, \mathbf{c})$.

Algorithm 1 InfoWGAN Training Process
for number of training iterations do
    for $n_d$ iterations do
        Sample minibatch of $m$ beamspace channel realizations $\{\mathbf{H}_v^{(i)}\}_{i=1}^{m} \sim \mathbb{P}_{\mathbf{H}_v}$, latent variables $\{\mathbf{z}^{(i)}\}_{i=1}^{m} \sim \mathbb{P}_{\mathbf{z}}$, and random numbers $\{\epsilon^{(i)}\}_{i=1}^{m} \sim U[0, 1]$
        Sample categorical latent codes $\{\mathbf{c}^{(i)}\}_{i=1}^{m} \sim \mathrm{Cat}(K = 5, p = 0.1)$
        $\hat{\mathbf{H}}_v = G(\mathbf{z}, \mathbf{c}; \theta_g)$
        $\tilde{\mathbf{H}}_v = \epsilon \mathbf{H}_v + (1 - \epsilon) \hat{\mathbf{H}}_v$
        $\theta_d = \mathrm{Update\_D}(\hat{\mathbf{H}}_v, \mathbf{H}_v, \tilde{\mathbf{H}}_v, m, \gamma, \beta; \theta_d)$
    end for
    Sample minibatch of $m$ latent variables $\{\mathbf{z}^{(i)}\}_{i=1}^{m} \sim \mathbb{P}_{\mathbf{z}}$
    Sample categorical latent codes $\{\mathbf{c}^{(i)}\}_{i=1}^{m} \sim \mathrm{Cat}(K = 5, p = 0.1)$
    $\theta_g = \mathrm{Update\_G}(G(\mathbf{z}, \mathbf{c}; \theta_g), m, \gamma; \theta_g)$
    $\theta_q = \mathrm{Update\_Q}(\mathbf{c}, G(\mathbf{z}, \mathbf{c}; \theta_g), m, \gamma; \theta_q)$
end for

Subroutine 1: $\theta_d = \mathrm{Update\_D}(\mathbf{x}_G, \mathbf{x}_r, \mathbf{x}_{rG}, m, \gamma, \beta; \theta_d)$
    $L(\theta_d) = \frac{1}{m} \sum_{i=1}^{m} \left[ D(\mathbf{x}_G^{(i)}; \theta_d) - D(\mathbf{x}_r^{(i)}; \theta_d) \right]$
    $L(\theta_d) = L(\theta_d) + \frac{\beta}{m} \sum_{i=1}^{m} \left( \|\nabla_{\mathbf{x}_{rG}^{(i)}} D(\mathbf{x}_{rG}^{(i)}; \theta_d)\|_2 - 1 \right)^2$
    $\theta_d = \theta_d - \gamma\, \mathrm{Adam}(\nabla_{\theta_d} L(\theta_d))$

Subroutine 2: $\theta_g = \mathrm{Update\_G}(\mathbf{x}_G, m, \gamma; \theta_g)$
    Input: $\mathbf{x}_G$ is a function of $\theta_g$
    $L(\theta_g) = \frac{1}{m} \sum_{i=1}^{m} -D(\mathbf{x}_G^{(i)})$
    $\theta_g = \theta_g - \gamma\, \mathrm{Adam}(\nabla_{\theta_g} L(\theta_g))$

Subroutine 3: $\theta_q = \mathrm{Update\_Q}(\mathbf{c}, \mathbf{x}_G, m, \gamma; \theta_q)$
    $L(\theta_q) = -\frac{1}{m} \sum_{i=1}^{m} \log Q(\mathbf{c}^{(i)} \mid \mathbf{x}_G^{(i)})$
    $\theta_q = \theta_q - \gamma\, \mathrm{Adam}(\nabla_{\theta_q} L(\theta_q))$
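
The three losses evaluated in Subroutines 1 to 3 of Algorithm 1 can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the linear critic `D`, the linear-softmax head `log_Q`, the batch and feature sizes, the value of `beta`, and the uniform sampling of the codes are all assumptions made for the example, and real-valued vectors stand in for the complex beamspace matrices. A linear critic is chosen so that the input gradient needed by the penalty is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, K = 8, 6, 5   # batch size, flattened sample size, number of codes (toy values)
beta = 10.0         # gradient-penalty weight (hypothetical value)

# Hypothetical stand-ins for the networks: a linear critic D(x) = w . x
# and a linear-softmax auxiliary head Q(c | x) = softmax(V x).
w = rng.standard_normal(n)
V = rng.standard_normal((K, n))

def D(x):
    return x @ w

def grad_D(x):
    # For a linear critic the gradient with respect to the input is w for every sample.
    return np.broadcast_to(w, x.shape)

def log_Q(x):
    # Numerically stable log-softmax over the K channel classes.
    logits = x @ V.T
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

x_r = rng.standard_normal((m, n))          # stand-in for real samples
x_G = rng.standard_normal((m, n)) + 0.5    # stand-in for generated samples
c = rng.integers(0, K, size=m)             # categorical codes, uniform over K (assumption)
eps = rng.uniform(0.0, 1.0, size=(m, 1))
x_rG = eps * x_r + (1.0 - eps) * x_G       # interpolation step of Algorithm 1

# Subroutine 1: Wasserstein term plus gradient penalty on the interpolates.
wasserstein_term = np.mean(D(x_G) - D(x_r))
grad_norm = np.linalg.norm(grad_D(x_rG), axis=1)
penalty = beta * np.mean((grad_norm - 1.0) ** 2)
loss_d = wasserstein_term + penalty

# Subroutine 2: the generator tries to raise the critic score of its samples.
loss_g = np.mean(-D(x_G))

# Subroutine 3: negative log-likelihood of the sampled codes under Q.
loss_q = -np.mean(log_Q(x_G)[np.arange(m), c])
```

In an actual training loop each loss would be followed by an Adam step on the corresponding parameters, with the critic updated $n_d$ times per generator update.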
The mutual information term is approximated by the negative log-likelihood of $\mathbf{c}$ under the predictions of the auxiliary network, $Q(\mathbf{c} \mid G(\mathbf{z}, \mathbf{c}))$, which Q is trained to minimize. Combined with the previously mentioned WGAN-GP, the objective function of the proposed InfoWGAN is defined as follows:

$$\min_{G, Q} \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{z \sim P_g,\, c \sim P_c}[D(G(z, c))] + \lambda_1 \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}} \left[ \left( \|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1 \right)^2 \right] - \lambda_2 \mathbb{E}_{c \sim P_c,\, z \sim P_g} \left[ -\log Q(c \mid G(z, c)) \right]. \tag{9}$$

This objective function consists of three parts: the adversarial game term, the gradient penalty term, and the mutual information maximization term, with $\lambda_1$ and $\lambda_2$ being the regularization hyperparameters for the latter two terms, respectively.

To reduce the number of network parameters, the auxiliary network Q shares all network parameters with the discriminator D of the WGAN, except for the last layer. Specifically, since D outputs a single scalar score indicating real or fake, the final layer of D is a linear layer. The output of Q, on the other hand, classifies the input channel matrix, so Q ends with a linear layer followed by a softmax layer. The complete network architecture is shown in Figure 1.

3.2.3. Training Process
In Algorithm 1, we detail the training process of the proposed InfoWGAN model. First, we sample real beamspace channel matrices $\mathbf{H}_v$, noise vectors $\mathbf{z}$, and categorical latent variables $\mathbf{c}$, and call Subroutine 1 to update the discriminator's parameters $\theta_d$ for $n_d$ consecutive iterations. Subsequently, we sample another batch of $(\mathbf{z}, \mathbf{c})$ and sequentially call Subroutines 2 and 3 to update the parameters of the generator G and the auxiliary network Q. The detailed training parameters can be found in Table 1.
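
As a rough illustration of the parameter sharing between D and Q described in Section 3.2.2, the sketch below builds a shared trunk with two separate output heads and counts the parameters saved by sharing. The layer sizes, the single hidden layer, the leaky ReLU activation, and all variable names are assumptions made for the example; the paper does not specify them here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, K = 32, 16, 5   # toy layer sizes and number of channel classes

# Shared trunk (hypothetical single hidden layer) used by both D and Q.
W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
# Separate last layers: a scalar linear head for D, a K-way head for Q.
w_d = rng.standard_normal(n_hidden) * 0.1
W_q = rng.standard_normal((n_hidden, K)) * 0.1

def trunk(x):
    h = x @ W1 + b1
    return np.where(h > 0, h, 0.2 * h)   # leaky ReLU (an assumed activation)

def critic(x):
    # Linear output layer: an unbounded score, as a WGAN critic requires.
    return trunk(x) @ w_d

def q_probs(x):
    # Linear layer followed by softmax over the K channel classes.
    logits = trunk(x) @ W_q
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

x = rng.standard_normal((4, n_in))       # stand-in for flattened channel samples
scores = critic(x)
probs = q_probs(x)
predicted_class = probs.argmax(axis=1)   # unsupervised channel-type prediction

# Parameter count with and without sharing the trunk.
shared = W1.size + b1.size
with_sharing = shared + w_d.size + W_q.size
without_sharing = 2 * shared + w_d.size + W_q.size
```

Because both heads read the same features, a gradient step on the Q loss also moves the trunk used by the critic, which is one reason the two updates are usually scheduled together.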

Table 1
Training Parameters

Parameter        Value
Dataset Size     Train: 6000 × 5; Test: 50 × 5
Optimizer        Adam
Learning Rate    0.0005
Batch Size       64
Epochs           200

Table 2
Channel Parameters

Parameter             Value
Delay Profile         CDL-A, B, C, D, E
Subcarrier Spacing    15 kHz
$N_t \times N_r$      64 × 16
Antenna Array Type    ULA
Antenna Spacing       $\lambda_c / 2$
Carrier Frequency     40 GHz
Delay Spread          30 ns
Doppler Shift         5 Hz

4. Simulation Details

4.1. Data Generation
Following 3GPP TR 38.901 [15], we used MATLAB to generate five types of CDL channel data for model training and testing. CDL-A, B, and C are NLOS channels, while CDL-D and E are LOS channels. Sorted by the proportion of the LOS component, the channels are ordered B < C < A < D < E. The detailed channel generation parameters are shown in Table 2.

4.2. Baselines
We compare the proposed InfoWGAN model with two mainstream classes of algorithms in the current field of channel estimation: CS-based methods, such as Orthogonal Matching Pursuit (OMP) and Approximate Message Passing (AMP), and DL-based methods.
• Orthogonal Matching Pursuit (OMP): A greedy algorithm for sparse signal reconstruction. We use the OMP algorithm for channel estimation as described in [16], where OMP minimizes $\|\mathbf{H}_v\|_0$ subject to $\|\mathbf{y} - \mathbf{A}_{sp} \mathbf{H}_v\|_2 \le \sigma$. The stopping criterion of the algorithm is based on the power of the residual error.
• EM-GM-AMP: Combining Expectation-Maximization (EM) and Approximate Message Passing (AMP) techniques, EM-GM-AMP is used for sparse signal reconstruction and handles noise effectively. The algorithm takes the received pilot signal $\mathbf{y}$ and the measurement matrix $\mathbf{A}_{sp}$ as inputs, and the channel matrix $\mathbf{H}$ is solved through AMP iterations. The specific number of iterations depends on the SNR level.
• Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP): We adopt the CGAN model proposed in [2]. First, a LOS predictor is trained to perform binary classification (LOS/NLOS) on the input signal. The classification result is used as conditional information and fed to both G and D to control the type of channel model generated by G. Classifying the channels narrows the range of channels generated by G, which speeds up the estimation process and improves the accuracy of channel estimation. The model's training data and conditions are the same as those used for the proposed InfoWGAN.

4.3. Simulation Results & Analysis

Figure 2: The trend of changes in channel images generated by InfoWGAN at epoch = 1, 80, 160

During the iterative training of InfoWGAN, we use fixed input variables to sample the model at different stages and evaluate its training performance. Taking the CDL-B channel model as an example, we visualized the fake samples generated by G at epochs 1, 80, and 160, as shown in Figure 2. As the number of training iterations increases, G captures the sparse characteristics of the CDL-B channel, and the generated channel samples increasingly resemble the real channel samples.

We compared the proposed InfoWGAN with the aforementioned baselines on five different CDL channels and plotted NMSE versus SNR for the five channel models, as shown in Figure 3. The following conclusions can be drawn from Figure 3. CS-based methods demonstrate good robustness under both LOS and NLOS conditions. The NMSE of EM-GM-AMP remains within the interval [−2, 2] under both conditions, consistently outperforming OMP. In contrast, LOS conditions significantly impact the performance of DL-based models, with an average difference of 6 dB.
Additionally, the performance order of DL-based models across the five CDL channel models is B < C < A < D < E, which corresponds to an increasing LOS component in the channel. Thus, DL-based channel estimation methods clearly outperform CS-based methods under LOS conditions.

Under NLOS channel conditions, the practical performance of CWGAN, proposed in [2], is slightly inferior to EM-GM-AMP. This is because CWGAN uses LOS/NLOS as conditional information without further distinguishing the three NLOS channel models, which have different levels of sparsity. Our proposed InfoWGAN model, on the other hand, captures the differences in sparsity among the CDL-A, B, and C channels, allowing channel estimation tailored to each channel. As a result, it performs better than EM-GM-AMP. Additionally, since CDL-B and CDL-C are fully NLOS channels while CDL-A is a mixed LOS/NLOS channel, CDL-A exhibits more pronounced sparsity. Therefore, among these three channels, InfoWGAN performs best on CDL-A, with an average performance improvement of approximately 1.6 dB over EM-GM-AMP.

Figure 3: Comparison of NMSE for InfoWGAN, OMP, EM-GM-AMP, and CWGAN-GP on Five CDL Channels

For the LOS channels CDL-D and CDL-E, our proposed InfoWGAN is more robust to noise variations than CWGAN, especially under low SNR conditions. Under high SNR conditions, the performance improvement of InfoWGAN is not significant: the CDL-D and CDL-E channels are both highly sparse, with only a few non-zero values and no distinct sparse features, which makes them highly similar to each other. Additionally, since there are only two LOS channel types in the CDL model, this can lead to a loss in classification performance. Under low SNR conditions, the higher noise power reduces the similarity between the CDL-D and CDL-E channels, thus improving the model's performance. Overall, our proposed InfoWGAN model is more suitable for channel estimation under low SNR conditions.

5. Conclusions
In mmWave MIMO communication scenarios, the diversity of channel types and the high cost of channel measurement pose challenges to channel estimation. Moreover, traditional channel estimation algorithms do not scale well. To address this, this paper proposes a model integrating InfoGAN and WGAN, called InfoWGAN. By transforming the mmWave channel matrix into the beamspace and leveraging the sparsity of mmWave channels in the beamspace, InfoWGAN is trained for unsupervised channel classification and subsequently performs channel estimation and reconstruction. Simulation results show that, compared to traditional CS-based channel estimation methods and CGAN, the proposed InfoWGAN demonstrates better performance in both LOS and NLOS scenarios and exhibits greater robustness under low SNR conditions.

Acknowledgments
This work was supported by the National Key Research and Development Program of China under Grant No. 2022YFB3904702. This work was also financially supported by the National Natural Science Foundation of China under Grant No. 62372049.

References
[1] T. L. Marzetta, Noncooperative cellular wireless with unlimited numbers of base station antennas, IEEE Transactions on Wireless Communications 9 (2010) 3590–3600.
[2] A. S. Doshi, M. Gupta, J. G. Andrews, Over-the-air design of GAN training for mmWave MIMO channel estimation, IEEE Journal on Selected Areas in Information Theory 3 (2022) 557–573.
[3] Y. Zhou, M. Herdin, A. M. Sayeed, E. Bonek, Experimental study of MIMO channel statistics and capacity via the virtual channel representation, Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. 5 (2007) 10–15.
[4] S. Gao, P. Dong, Z. Pan, G. Y. Li, Deep learning based channel estimation for massive MIMO with mixed-resolution ADCs, IEEE Communications Letters 23 (2019) 1989–1993.
[5] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing, Springer New York, 2013.
[6] H. Liu, X. Yuan, Y. J. Zhang, Super-resolution blind channel-and-signal estimation for massive MIMO with one-dimensional antenna array, IEEE Transactions on Signal Processing 67 (2019) 4433–4448.
[7] J. Dai, A. Liu, V. K. Lau, FDD massive MIMO channel estimation with arbitrary 2D-array geometry, IEEE Transactions on Signal Processing 66 (2018) 2584–2599.
[8] A. Alkhateeb, O. El Ayach, G. Leus, R. W. Heath, Channel estimation and hybrid precoding for millimeter wave cellular systems, IEEE Journal of Selected Topics in Signal Processing 8 (2014) 831–846.
[9] S. Rangan, Generalized approximate message passing for estimation with random linear mixing, in: 2011 IEEE International Symposium on Information Theory Proceedings, IEEE, 2011, pp. 2168–2172.
[10] Y. Dong, H. Wang, Y.-D. Yao, Channel estimation for one-bit multiuser massive MIMO using conditional GAN, IEEE Communications Letters 25 (2020) 854–858.
[11] Y. Yang, F. Gao, X. Ma, S. Zhang, Deep learning-based channel estimation for doubly selective fading channels, IEEE Access 7 (2019) 36579–36589.
[12] H. He, C.-K. Wen, S. Jin, G. Y. Li, Deep learning-based channel estimation for beamspace mmWave massive MIMO systems, IEEE Wireless Communications Letters 7 (2018) 852–855.
[13] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein generative adversarial networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 214–223.
[14] X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, Advances in Neural Information Processing Systems 29 (2016).
[15] 3GPP TR 38.901, Study on channel model for frequencies from 0.5 to 100 GHz, 2017.
[16] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, R. W. Heath, Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?, IEEE Access 4 (2016) 247–267.