=Paper=
{{Paper
|id=Vol-3189/paper_06
|storemode=property
|title=Joint Message Passing and Auto-Encoder for Deep Learning
|pdfUrl=https://ceur-ws.org/Vol-3189/paper_06.pdf
|volume=Vol-3189
|authors=Yiqun Ge,Wuxian Shi,Jian Wang,Rong Li,Wen Tong
|dblpUrl=https://dblp.org/rec/conf/wcci/GeSWLT22
}}
==Joint Message Passing and Auto-Encoder for Deep Learning==
Yiqun Ge 1, Wuxian Shi 1, Jian Wang 2, Rong Li 2 and Wen Tong 1

1 Wireless Technology Laboratory, Huawei Technologies Co., Ltd., Ottawa K0A3M0, Canada
2 Wireless Technology Laboratory, Huawei Technologies Co., Ltd., Hangzhou 310051, China

Abstract

Autoencoders (AE) are emerging artificial neural networks that learn efficient embeddings of unlabeled data and have been considered in the design of end-to-end transceivers. However, AE-based end-to-end transceivers face a major challenge of poor generalization ability due to communication channel dynamics. In this paper, a message-passing algorithm (MPA) layer is incorporated into an AE to simultaneously enable coarse learning in the training phase and adaptive reasoning in the inference phase. A theoretical analysis is also conducted to demonstrate the effectiveness of the MPA layer, suggesting that the proposed model is applicable to more general systems.

Keywords

Autoencoder, end-to-end communication, transceiver, MPA, back-propagation.

AI6G’22: First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks, July 21, 2022, Padua, Italy
EMAIL: {yiqun.ge, wuxian.shi, wangjian23, lirongone.li, tongwen}@huawei.com
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Machine Learning (ML) is envisioned as a promising means to enable an intelligent sixth-generation (6G) network and has attracted extensive interest in both academia and industry. As a popular ML model, the Autoencoder (AE) learns hidden but efficient data representations by combining two neural networks, one acting as an encoder and the other as a decoder, a structure that fits well with classic transceivers in wireless communications. An AE-based end-to-end (E2E) transceiver aims to extract the minimum essential information for a specific goal. It naturally requires a joint design of learning algorithms and communication techniques. Specifically, an encoder neural network learns to act as a transmitter, while a decoder network learns to act as a receiver. Through iterative data sensing and model training, an AE-based E2E transceiver integrates three factors, namely the data source distribution, the goal orientation, and the radio channel, into one framework. In recent years, there has been much research interest in this topic [1, 2, 3, 4, 5].

Despite the AE-based end-to-end transceiver's better performance compared with classical ones, the framework still suffers from the following three challenges:

● Inaccurate Gradient Transmission: Training an AE E2E transceiver needs a channel model that is mathematically differentiable to support the backward propagation (BP) of gradients from the receiver side to the transmitter side. Nevertheless, a realistic channel model must include non-linear components such as digital/analog pre-distortion and other non-differentiable stages such as up/down sampling. Therefore, the channel model used in AE E2E transceiver training tends to be oversimplified, which leads to inaccurate gradient transmission.

● Excessive and Dynamic Channel Distortion: Essentially, learning on a hidden layer, or an intermediate layer, is to react or adapt to the posterior probability of its input signal. In an AE-based E2E transceiver framework, the first layer of the receiver is such an intermediate layer, and its input signal varies with the dynamic channel distortion. Furthermore, the channel variation penetrates forward into the entire receiver neural network. Under a fast-varying channel, a well-trained receiver may quickly become obsolete, and its receiving performance soon degrades.
● Loss of Important Connectivity: According to [6], some outputs of a DNN-based classifier rely on shortcuts (localities) inside the deep neural network; some parts of the network are more important than others, and "shortcuts" refers to these important parts. If these shortcuts were perturbed, the classification performance would degrade significantly. The path loss and thermal random noise in communication channels may perturb the critical shortcuts with some probability, largely undermining the AE-based E2E transceiver's performance.

These three issues result in the poor generalization ability of the AE-based transceiver over the dynamic and time-varying wireless environment. Some prior works have developed solutions. In [7], a two-phase training strategy is proposed, where the AE-based transceiver is first trained on a stochastic channel model offline and then fine-tuned when it is used on the real channel. To obtain a differentiable channel, [8, 9] proposed to approximate the unknown real channel through generative adversarial networks (GANs). With a trained GAN model connecting the encoder and decoder, both forward inference and backward propagation can be conducted. Since the obstacle caused by the unknown channel is that the back-propagated gradients are not easy to obtain at the encoder side, methods for gradient estimation are introduced in [10, 11]. Although the aforementioned works can overcome the three issues to some extent, all of them demand high energy consumption and large controlling overhead, and none of them can meet the real-time latency requirements of future wireless communications. These issues motivated us to incorporate an MPA layer into an AE. Its presence can simultaneously solve the out-of-distribution (OOD) and outlier problems that usually arise in the inference stage and reduce the communication and computation costs.

2. Autoencoder-based Transceiver Design with MPA

In this section, we first propose a new AE-based transceiver by inserting an MPA layer at the transmitter, i.e., MPA-AE. Then, we introduce the detailed training process, including the forward sub-iteration and the backward sub-iteration.

2.1. Autoencoder-based transceiver with MPA

Figure 1: AE-based transceiver with an MPA layer

As illustrated in Fig. 1, we consider an AE-based transceiver that consists of a DNN-based transmitter and a DNN-based receiver, connected by wireless channels. To adapt to the dynamic channel conditions, we insert an MPA layer between the transmitter and the communication channel. Without loss of generality, we assume that the channel state information is available at the transmitter, which can be realized by concurrent channel feedback or uplink/downlink channel reciprocity. The MPA layer conducts a linear dimension-reduction transformation, whose coefficients are fine-tuned by an iterative algorithm with two sub-iterations. The first is the forward sub-iteration, which passes messages from the DNN transmitter to the DNN receiver through the channel. The other is the backward sub-iteration, which passes messages from the channel layer back to the output layer of the DNN transmitter.
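As a concrete illustration, the following is a minimal NumPy sketch of this data flow: the MPA layer's linear transformation of the transmitter features followed by the per-measurement channel. The function names, shapes (which follow the notation summarized in Table 1 below), and the scalar-gain channel model are our own illustrative assumptions, not the authors' code.

```python
import numpy as np

def mpa_forward(F, alpha):
    """MPA layer: linear transformation of the transmitter features.

    F     : (L, K) feature matrix, rows f_1..f_L
    alpha : (L, N) MPA coefficients alpha_{l,i}
    Returns T : (N, K), rows t_i = sum_l alpha_{l,i} * f_l.
    """
    return alpha.T @ F

def channel(T, h, noise_std=0.1):
    """Channel: r_i = t_i * h_i + n_i, with scalar gains h_i (an assumed
    simplification of the paper's channel measurements)."""
    return T * h[:, None] + noise_std * np.random.randn(*T.shape)

# Toy dimensions: L transmitter features mapped onto N channel measurements.
L, N, K = 4, 8, 16
F = np.random.randn(L, K)
alpha = np.full((L, N), 1.0 / L)   # uniform initialization, alpha_{l,i} = 1/L
R = channel(mpa_forward(F, alpha), np.random.randn(N))   # (N, K) received signals
```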
To better describe the working mechanism of the introduced transceiver, we first summarize the key parameters used throughout the paper in Table 1.

Table 1: System Parameters

$L$: dimension of the transmitter's output
$N$: dimension of the communication channel measurement
$\boldsymbol{h}_k$: the k-th channel measurement
$\boldsymbol{n}_k$: the k-th additive noise measurement
$\boldsymbol{f}_i$: the i-th feature of the transmitter's output
$\boldsymbol{t}_k$: the k-th feature vector of the MPA layer's output
$\boldsymbol{r}_k$: the k-th received signal
$\mathbf{F}$: input feature matrix $[\boldsymbol{f}_1, \ldots, \boldsymbol{f}_L]$
$\mathbf{H}$: channel vector $[\boldsymbol{h}_1, \ldots, \boldsymbol{h}_N]$
$\mathbf{N}$: noise vector $[\boldsymbol{n}_1, \ldots, \boldsymbol{n}_N]$
$\mathbf{R}$: received signal vector $[\boldsymbol{r}_1, \ldots, \boldsymbol{r}_N]$
$\mathbf{T}$: output feature matrix $[\boldsymbol{t}_1, \ldots, \boldsymbol{t}_N]$

Based on these notations, we elaborate the detailed training process in the sequel.

2.2. Forward Sub-iteration with Support Vector Machine

The support vector machine (SVM) is a supervised machine learning model used for data classification, regression, and outlier detection. In general, an SVM model is composed of a non-linear dimension-extension function $\varphi(\cdot)$, a linear combination function $f(\mathbf{x}) = \mathbf{w} \cdot \varphi(\mathbf{x}) + \mathbf{b}$, and a binary classification function $\mathrm{sign}(\cdot)$, where $\mathbf{x}$ is the input data, $\mathbf{w}$ is the weight coefficient vector, and $\mathbf{b}$ is the bias vector. The objective of SVM is to find a maximum-margin hyperplane that divides the data samples into classes.

Figure 2: MPA layer with SVM

Taking advantage of the dimension transformation of SVM, we use it to transform the dimension of the transmitter's output, L, to the dimension of the communication channel measurement, N. Fig. 2 shows the detailed forward sub-iteration with SVM. Specifically, the input of the MPA layer is an L-dimensional feature matrix $\mathbf{F} = [\boldsymbol{f}_1, \boldsymbol{f}_2, \ldots, \boldsymbol{f}_L]$, where $\boldsymbol{f}_i$ is the i-th input feature vector of dimension K. The output of the MPA layer is an N-dimensional feature matrix $\mathbf{T} = [\boldsymbol{t}_1, \boldsymbol{t}_2, \ldots, \boldsymbol{t}_N]$, where $\boldsymbol{t}_i$ is the i-th output feature vector of dimension K. When the output feature vectors are transmitted over the communication channels, the received signal is given by

$$\boldsymbol{r}_i = \sum_{l=1}^{L} \alpha_{l,i} \cdot \boldsymbol{f}_l \cdot \boldsymbol{h}_i + \boldsymbol{n}_i, \quad i = 1, \ldots, N,$$

where $\alpha_{l,i}$ is the coefficient of the connection between neuron l and neuron i. Based on the above description, we can conclude that the forward sub-iteration keeps fine-tuning the hyperplane of the SVM model in both the training and inference phases, given the transmitter's feature matrix $\mathbf{F}$, the channel state information $\mathbf{H}$, the noise vector $\mathbf{N}$, and the received signal $\mathbf{R}$. Note that the MPA layer is mathematically differentiable: once the coefficients $\alpha_{l,i}$ are fixed, it can pass the BP gradients from the receiver side to the transmitter side during the training stage.

2.3. Backward Sub-iteration with Attention-DNN

As discussed earlier, the MPA layer needs to be trained in a standalone mode rather than in a connected mode with back-propagation from the receiver. In this regard, we use an attention-DNN in the backward sub-iteration.

Figure 3: The structure of attention-DNN

The attention-DNN is an efficient approach that measures the similarity of two features with different dimensions. Fig. 3 depicts the structure of the attention-DNN. The input is the received signal $\mathbf{R}$. The attention operation is conducted by computing the inner product of each $\boldsymbol{r}_i$ with an attention coefficient $\boldsymbol{c}_l$, i.e., $\langle \boldsymbol{r}_i, \boldsymbol{c}_l \rangle$. This inner product reflects the similarity between the signal $\boldsymbol{r}_i$ and the attention coefficient $\boldsymbol{c}_l$, and is normalized by a softmax layer as

$$\alpha_{l,i} = \frac{e^{\langle \boldsymbol{r}_i, \boldsymbol{c}_l \rangle}}{\sum_{n=1}^{N} e^{\langle \boldsymbol{r}_n, \boldsymbol{c}_l \rangle}}, \quad i = 1, \ldots, N.$$

Then, the output of the attention-DNN is given by

$$\boldsymbol{z}_l = \sum_{i=1}^{N} \alpha_{l,i} \cdot \boldsymbol{r}_i, \quad l = 1, \ldots, L.$$
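To make the normalization direction explicit, here is a small NumPy sketch of the attention operation above: a softmax over the received signals for each attention coefficient. The matrix shapes are our illustrative assumptions; in the backward sub-iteration described next, the feature vectors $\boldsymbol{f}_l$ take the role of the attention coefficients $\boldsymbol{c}_l$.

```python
import numpy as np

def attention_dnn(R, C):
    """Attention over received signals.

    R : (N, K) received signals r_1..r_N
    C : (L, K) attention coefficients c_1..c_L
    Returns alpha : (L, N), alpha_{l,i} = softmax over i of <r_i, c_l>,
    and Z : (L, K) with z_l = sum_i alpha_{l,i} * r_i.
    """
    S = C @ R.T                                   # S[l, i] = <r_i, c_l>
    S = S - S.max(axis=1, keepdims=True)          # shift for numerical stability
    E = np.exp(S)
    alpha = E / E.sum(axis=1, keepdims=True)      # normalize over i = 1..N
    Z = alpha @ R                                 # z_l = sum_i alpha_{l,i} r_i
    return alpha, Z
```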
We note that the number of attention coefficients is smaller than the number of received signals, i.e., $L < N$. The attention-DNN can be employed in the MPA layer for back-propagation. Specifically, each extracted feature vector $\boldsymbol{f}_l$ can be used as an attention coefficient. Then, in the backward sub-iteration, the coefficient $\alpha_{l,i}$ is given by

$$\alpha_{l,i} = \frac{e^{\langle \boldsymbol{r}_i, \boldsymbol{f}_l \rangle}}{\sum_{n=1}^{N} e^{\langle \boldsymbol{r}_n, \boldsymbol{f}_l \rangle}}, \quad i = 1, \ldots, N, \; l = 1, \ldots, L.$$

3. Global Tandem Learning

In this section, we propose two algorithms for the AE-based transceiver: one for the training phase and one for the inference phase.

3.1. Coarse Learning

The training of the AE-based transceiver includes two parts: one for the MPA layer in a standalone mode, and the other for the DNNs in the transmitter and receiver by BP. The detailed procedure is summarized in Algorithm 1.

Algorithm 1. The training algorithm
1. Initialize the coefficients of the MPA layer: $\alpha_{l,i} = \frac{1}{L}$.
2. Initialize the batch size $b$.
3. For step = 1 to T do
4.   Tandem stage 1:
5.     Sample a batch of training messages $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_b]$.
6.     The DNN-based transmitter computes $\mathbf{F} = [\boldsymbol{f}_1, \ldots, \boldsymbol{f}_L]$ from the training messages $\mathbf{X}$.
7.     Compute $\boldsymbol{t}_i = \sum_{l=1}^{L} \alpha_{l,i} \cdot \boldsymbol{f}_l$, $i = 1, \ldots, N$.
8.     Send $\mathbf{T} = [\boldsymbol{t}_1, \ldots, \boldsymbol{t}_N]$ to the DNN-based receiver via the communication channels, as
9.     $\boldsymbol{r}_i = \boldsymbol{t}_i \cdot \boldsymbol{h}_i + \boldsymbol{n}_i$, $i = 1, \ldots, N$.
10.    The DNN-based receiver feeds the received signals $\mathbf{R} = [\boldsymbol{r}_1, \ldots, \boldsymbol{r}_N]$ into its DNN and computes the decoded message.
11.    Update the transmitter and the receiver by back-propagation.
12.  Tandem stage 2:
13.    For iteration = 1 to M do
14.      Compute $\boldsymbol{r}_i = \sum_{l=1}^{L} \boldsymbol{h}_i \alpha_{l,i} \cdot \boldsymbol{f}_l$, $i = 1, \ldots, N$.
15.      Compute $\|\boldsymbol{r}\| = \sqrt{\boldsymbol{r}_1^2 + \boldsymbol{r}_2^2 + \cdots + \boldsymbol{r}_N^2}$.
16.      Update $\beta_{l,i} = \langle \boldsymbol{r}_i / \|\boldsymbol{r}\|, \boldsymbol{f}_l \rangle$, $i = 1, \ldots, N$, $l = 1, \ldots, L$.
17.      Update the coefficients of the MPA layer by $\alpha_{l,i} = \mathrm{softmax}(\beta_{l,i})$, $i = 1, \ldots, N$, $l = 1, \ldots, L$.
18.    Endfor
19. Endfor
20. Output $\alpha_{l,i}$, $l = 1, \ldots, L$; $i = 1, \ldots, N$.

3.2. Inference Cycle Adaptation

Once the training is finished, the AE-based transceiver is used for inference. In particular, the MPA layer helps the transceiver adapt to the channel dynamics: while the neurons of the encoder and decoder neural networks are fixed, the coefficients $\alpha_{l,i}$ of the MPA layer continue to adapt themselves to the current physical channel condition. The detailed procedure is summarized in Algorithm 2.

Algorithm 2. The inference algorithm
1. Input: new messages $\mathbf{X}$.
2. The DNN-based transmitter computes $\mathbf{F} = [\boldsymbol{f}_1, \ldots, \boldsymbol{f}_L]$ from the new messages $\mathbf{X}$.
3. For iteration = 1 to M do
4.   Compute $\boldsymbol{r}_i = \sum_{l=1}^{L} \boldsymbol{h}_i \alpha_{l,i} \cdot \boldsymbol{f}_l$, $i = 1, \ldots, N$.
5.   Compute $\|\boldsymbol{r}\| = \sqrt{\boldsymbol{r}_1^2 + \boldsymbol{r}_2^2 + \cdots + \boldsymbol{r}_N^2}$.
6.   Update $\beta_{l,i} = \langle \boldsymbol{r}_i / \|\boldsymbol{r}\|, \boldsymbol{f}_l \rangle$, $i = 1, \ldots, N$, $l = 1, \ldots, L$.
7.   Update the coefficients of the MPA layer by $\alpha_{l,i} = \mathrm{softmax}(\beta_{l,i})$, $i = 1, \ldots, N$, $l = 1, \ldots, L$.
8. Endfor
9. Compute $\boldsymbol{t}_i = \sum_{l=1}^{L} \alpha_{l,i} \cdot \boldsymbol{f}_l$, $i = 1, \ldots, N$.
10. Compute $\boldsymbol{r}_i = \boldsymbol{t}_i \cdot \boldsymbol{h}_i + \boldsymbol{n}_i$, $i = 1, \ldots, N$.
11. The DNN-based receiver feeds the received signals $\mathbf{R} = [\boldsymbol{r}_1, \ldots, \boldsymbol{r}_N]$ into the DNN and computes the decoded message $\hat{\mathbf{X}}$.
12. Output $\hat{\mathbf{X}}$.
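A compact sketch of the shared adaptation loop (tandem stage 2 of Algorithm 1 and steps 3-8 of Algorithm 2) is given below. We read $\mathrm{softmax}(\beta_{l,i})$ as a normalization over the index i for each l, consistent with the attention normalization in Section 2.3, and $\|\boldsymbol{r}\|$ as the norm of the stacked received signals; these readings, like the shapes, are our own assumptions.

```python
import numpy as np

def adapt_mpa_coefficients(F, h, alpha, M=10):
    """Iteratively fine-tune the MPA coefficients to the current channel.

    F : (L, K) transmitter feature matrix, h : (N,) channel gains,
    alpha : (L, N) current MPA coefficients, M adaptation iterations.
    """
    for _ in range(M):
        R = (alpha.T @ F) * h[:, None]        # r_i = h_i * sum_l alpha_{l,i} f_l
        R_unit = R / np.linalg.norm(R)        # r_i / ||r||, Frobenius norm of R
        beta = F @ R_unit.T                   # beta[l, i] = <r_i/||r||, f_l>
        beta = beta - beta.max(axis=1, keepdims=True)
        E = np.exp(beta)
        alpha = E / E.sum(axis=1, keepdims=True)   # softmax over i = 1..N
    return alpha
```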
4. Simulations

We consider the following simulation settings. The transmitter sends a block of 256 bits in each time slot using a 16QAM modulation scheme without channel coding. The channel gain changes every 200 time slots by a random distortion, i.e., $h_{t+1} = h_t + \Delta h_d$, where the random distortion follows $\Delta h_d \sim \mathcal{CN}(0, 0.3)$. The proposed MPA-AE is first pre-trained at the channel condition $h_0$ with a fixed SNR of 10 dB to learn all the neurons of the encoding and decoding neural networks and the coefficients of the MPA layer. Each time the channel varies, only the coefficients $\alpha_{l,i}$ of the MPA layer are fine-tuned, while the remaining neurons are fixed. For comparison, we simulate two baselines: a pre-trained AE without the MPA layer, whose neurons stay fixed even when the channel changes, and a retrained AE without the MPA layer, which is completely retrained whenever the channel varies. The three frameworks use the same AE structure, in which a fully connected neural network with one hidden layer of 16 neurons is used for both the encoder and the decoder, ReLU is used as the activation function of the hidden layer, and the Adam optimizer is used for training. The only difference between MPA-AE and the other two is the inserted MPA layer.
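As a reproducibility aid, the channel evolution described above can be sketched as follows; the convention that $\mathcal{CN}(0, 0.3)$ splits its variance evenly between the real and imaginary parts is our assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_channel(h, var=0.3):
    """One channel update h_{t+1} = h_t + dh with dh ~ CN(0, var),
    applied once every 200 time slots in the simulation."""
    dh = rng.normal(0.0, np.sqrt(var / 2), size=np.shape(h)) \
         + 1j * rng.normal(0.0, np.sqrt(var / 2), size=np.shape(h))
    return h + dh

# Example: evolve a scalar channel gain through 5 variation events.
h = np.array(1.0 + 0.0j)
for _ in range(5):
    h = next_channel(h)
```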
Figure 4: Performance comparison

Fig. 4 shows the block error rate performance of the proposed MPA-AE together with the two baselines, i.e., the pre-trained AE that is not updated as the channel changes and the retrained AE. It can be seen that, without being updated accordingly, the pre-trained AE fails to work. The proposed MPA-AE and the retrained AE show almost the same performance. However, we emphasize that retraining the whole AE took almost 14 times longer than fine-tuning only the MPA layer; the retrained AE is too time-consuming to be implemented in real application scenarios.

The improvement in generalization is attributed to the MPA layer being tuned to the dynamic channel condition during the inference stage. In the classic AE architecture, all the neuron layers, in both the transmitter and receiver parts, are frozen. If the channel goes out of distribution in the inference stage (highly likely in a wireless system), the AE-based transceiver suffers from these outliers. The MPA layer provides some resilience against these dynamic changes.

The DNN part is a key component against channel uncertainty. Different SNRs represent different white-noise levels. In reality, a time-varying dynamic channel is a fading channel that includes path-loss changes, phase changes, multipath changes, and so on. It is hard for a transmitter to perfectly know the current channel conditions: both the receiver's channel feedback and UL/DL reciprocity introduce some uncertainty or bias about the ground-truth channel condition. This uncertainty includes both shifts and rotations, which are well addressed by the non-linearity of the DNNs at both the transmitter and the receiver and by the MPA layer iteration at the transmitter. In this sense, it seems indispensable to couple MPA with DNN to enable the wireless transceiver to profit from the DNN-based autoencoder.

5. Conclusions and Future Directions

In this paper, we proposed the MPA-AE structure and its corresponding algorithms to train an end-to-end transceiver in scenarios with time-varying channels. The MPA layer inserted between the encoder and decoder of a traditional AE can be fine-tuned when the underlying channel deviates from the one the transceiver was trained on. Simulations show the superior performance of the proposed method.

The MPA layer can be flexibly incorporated into AE-based transceivers in many use cases, including single-user and multiuser scenarios. Specifically, in single-user scenarios, the MPA layer can be used to design source coding, high-order modulation, massive MIMO, and pre-distortion schemes that adapt well to time-varying channel conditions. In principle, the DNN layers and the MPA layer of a transmitter distort the input distribution to match the current channel distortion distribution: the non-linear DNN layers provide a quick and powerful non-linear distortion, while the linear MPA layer provides a quick and adaptive linear matching. Moreover, the MPA layer can also be applied in multiuser scenarios for both uplink and downlink MIMO design and multiple-access design. More than one MPA-AE could share the same channel; the MPA layer of each transceiver then becomes an autonomous coding design. These research aspects will be considered in future work.

Another research direction is to use more complex channel models in the training phase, in particular channel models generated by a DNN whose input is the surrounding topological information. In this case, the MPA layer is still effective, since the inference DNN of the channel is a non-linear function. However, a DNN-based channel model is often very large; how to reduce the DNN model size is therefore an interesting topic for future investigation.

6. References

[1] Ye, H., Li, G. Y., Juang, B. H. F., & Sivanesan, K. (2018, December). Channel agnostic end-to-end learning based communication systems with conditional GAN. In 2018 IEEE Globecom Workshops (GC Wkshps) (pp. 1-5). IEEE.
[2] Aoudia, F. A., & Hoydis, J. (2018, October). End-to-end learning of communications systems without a channel model. In 2018 52nd Asilomar Conference on Signals, Systems, and Computers (pp. 298-303). IEEE.
[3] Goutay, M., Aoudia, F. A., & Hoydis, J. (2019, June). Deep reinforcement learning autoencoder with noisy feedback. In 2019 International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT) (pp. 1-6). IEEE.
[4] Hu, B., Wang, J., Xu, C., Zhang, G., & Li, R. (2021, September). A Kalman-based autoencoder framework for end-to-end communication systems. In 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) (pp. 1-6). IEEE.
[5] Cammerer, S., Aoudia, F. A., Dörner, S., Stark, M., Hoydis, J., & Ten Brink, S. (2020). Trainable communication systems: Concepts and prototype. IEEE Transactions on Communications, 68(9), 5489-5503.
[6] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
[7] Dörner, S., Cammerer, S., Hoydis, J., & Ten Brink, S. (2017). Deep learning based communication over the air. IEEE Journal of Selected Topics in Signal Processing, 12(1), 132-143.
[8] O’Shea, T. J., Roy, T., & West, N. (2019, February). Approximating the void: Learning stochastic channel models from observation with variational generative adversarial networks. In 2019 International Conference on Computing, Networking and Communications (ICNC) (pp. 681-686). IEEE.
[9] Ye, H., Liang, L., Li, G. Y., & Juang, B. H. (2020). Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels. IEEE Transactions on Wireless Communications, 19(5), 3133-3143.
[10] Raj, V., & Kalyani, S. (2018). Backpropagating through the air: Deep learning at physical layer without channel models. IEEE Communications Letters, 22(11), 2278-2281.
[11] Aoudia, F. A., & Hoydis, J. (2019). Model-free training of end-to-end communication systems. IEEE Journal on Selected Areas in Communications, 37(11), 2503-2516.