Incremental Learning-based MIMO Relay Selection

Ankit Gupta¹, Mathini Sellathurai¹, Venkata V. Mani² and Tharmalingam Ratnarajah³

¹ School of Engineering and Physical Science (EPS), Heriot-Watt University, Edinburgh, UK.
² Department of Electronics & Communication Engineering, National Institute of Technology, Warangal, India.
³ Institute for Digital Communications, University of Edinburgh, Edinburgh, UK.

Abstract
The forthcoming 6G wireless networks are expected to be much more machine-intelligent in resource allocation, including relay selections to serve ever-increasing users and the internet of things with extended coverage. Selecting an optimal multiple-input multiple-output (MIMO) relay using conventional methods becomes challenging due to dependency on perfect channel information, which exponentially increases feedback overhead. In this paper, we propose a novel incremental learning-based online MIMO relay selection algorithm, with only imperfect channel gain information available at the relay nodes in the framework of MIMO two-way amplify-and-forward (TWAF) relay networks. In particular, we develop naive Bayes, logistic regression, and support vector-based incremental learning classifiers for the near-optimal online relay selection. Using simulated results, we show that the proposed online relay selection approaches outperform the best conventional Gram-Schmidt algorithm while reducing the feedback overhead up to a factor of eight.

Keywords
Incremental learning, MIMO, two-way, amplify-and-forward, relay networks, online relay selection.

1. Introduction
With the advent of the internet of everything, the multiple-input multiple-output (MIMO) network is considered a pivotal technology to meet the high data rate requirements in the upcoming sixth-generation (6G) networks. Further, relay networks will play a crucial role in the 6G networks by enhancing network reliability, data coverage, and spectral efficiency.
AI6G'22: First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks, July 21, 2022, Padua, Italy
ag104@hw.ac.uk (A. Gupta); m.sellathurai@hw.ac.uk (M. Sellathurai); vvmani@nitw.ac.in (V. V. Mani); t.ratnarajah@ed.ac.uk (T. Ratnarajah)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

MIMO relaying networks have been recognized to achieve significant diversity gain and spectrum efficiency, and to expand ubiquitous coverage on land and in the air in the upcoming 6G networks. Further, relay selection can reduce the total network power dissipation while increasing spectral efficiency [1]. In conventional methods, relay selection primarily relies on the procured channel state information (CSI). However, because the channels are time-varying and noisy, procuring perfect CSI for a cooperative MIMO network greatly increases the feedback overhead [2]. Thus, with the wide deployment of MIMO relay networks in wireless sensor devices and the internet of things in the upcoming 6G networks, the amount of feedback overhead will increase exponentially. Furthermore, with the increased feedback overhead, the latency of the networks in selecting even a near-optimal relay increases [3]. Hence, we need to devise intelligent relay selection algorithms that perform accurately with minimum CSI dependency and low feedback overhead. This becomes even more challenging in the two-way relaying network, which provides double the spectral efficiency gain of a one-way relaying network, because a two-way relay node simultaneously serves the terminal nodes in each transmission phase. Recently, the relay selection problem has been studied using machine learning (ML)-based algorithms in [4, 5, 6, 7].
The authors in [4] operated on an unlabelled dataset using the k-means clustering algorithm, wherein the decode-and-forward (DF) relay nodes were selected under perfect CSI. The authors in [5] extracted key features of a social-tie network using a stacked autoencoder and utilized these features for DF relay selection. Furthermore, the authors in [6] studied a two-way (TW) DF relay selection policy using an artificial neural network for fixed and variable numbers of relay nodes, with distance as the feature vector. Recently, the authors in [7] devised a decision tree-based one-way (OW) DF relay selection with quantized perfect CSI knowledge and showed promising results when the Gini index for the decision trees is set properly. However, all of these works focussed only on single-input single-output (SISO) networks with DF relaying and assumed perfect channel estimation. Furthermore, it is well known that amplify-and-forward (AF) relaying is more widely adopted in industry than DF relaying because of its reduced implementation complexity, albeit at the expense of noise amplification [8].

The works in [4, 5, 6, 7] utilized offline learning-based ML algorithms that were trained once and deployed. Offline learning leads to two fundamental problems: (1) it needs an extensive amount of training data that covers all the possible testing scenarios, which is not possible with ever-changing propagation environments and channel conditions, and (2) it does not continuously integrate new information into the designed ML models; rather, a new ML model has to be trained from scratch. In contrast, incremental learning (also referred to as online learning, evolving learning, constructive learning, etc.) has emerged as a paradigm shift that provides streaming ML training and data processing [9]. Once deployed, instead of training the ML models from scratch, incremental learning updates the previously trained model with the streaming new data.
Thus, retraining takes place only on a small amount of data at a time (reducing training time), we do not need to store all the data (reducing data storage), and we do not need extensive initial training data covering all possible scenarios (making the ML models adaptive to the propagation environment).

Therefore, in this work, we design incremental learning-based MIMO AF relay selection algorithms with imperfect CSI knowledge. We consider a MIMO TWAF relay network, where the terminal nodes intend to communicate by selecting a relay node, under correlated fading channels with imperfect CSI (Section 2). We propose an incremental learning framework for the online MIMO TWAF relay selection, detail the process of retraining the ML models, and prepare the datasets (Section 3). We propose a naive Bayes (NB)-based online relay selection policy by employing the pairwise algorithm (Section 4). Also, we model the stochastic gradient descent (SGD) classifiers to design and develop the support vector classifier (SVC) and logistic regression (LR)-based online relay selection algorithms (Section 5). We perform an extensive performance evaluation by varying the signal-to-noise ratio (SNR), channel estimation quality (CEQ), antenna correlation, and the number of antennas in the MIMO links and relay nodes; to further reduce the feedback overhead, we consider the quantized imperfect channel gains as feedback vectors (Section 6). Lastly, we conclude this work (Section 7).

Figure 1: System model of the MIMO TWAF relay network with $N$ antennas at each node.

2. System Model
We consider a MIMO TWAF relay network wherein the terminal nodes $T_1$ and $T_2$ intend to exchange signals by selecting a relay node $K_s \in \mathcal{K} \triangleq \{K_1, \ldots, K_J\}$, as shown in Fig. 1. Each node is equipped with $N$ antennas and the direct link is absent because of path-loss and shadow fading. For the sake of generality, let $\Gamma$ and $\hat{\Gamma}$ indicate the source and destination terminals, respectively, i.e., if $\Gamma = T_1$ then $\hat{\Gamma} = T_2$, and vice versa.

2.1.
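Channel Model and its Estimation Quality
We employ spatially correlated Rayleigh fading channels based on the Kronecker correlation model and use the linear minimum mean squared error (LMMSE) technique for channel estimation [10]. Let the actual channel matrix be $\hat{\mathbf{H}}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_{\hat{h}}^2)$ and the transmission and reception correlation matrices [11] be $\mathbf{U}_{(\cdot)}$ and $\mathbf{R}_{(\cdot)}$, respectively; then the final channel matrix becomes $\mathbf{R}_{(\cdot)}^{1/2} \hat{\mathbf{H}}_{(\cdot)} \mathbf{U}_{(\cdot)}^{1/2}$. Let the errors in estimation be $\mathbf{E}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_e^2)$ and the LMMSE-based estimated channel be $\mathbf{H}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_h^2)$, given by
$$ \mathbf{H}_{(\cdot)} = \mathbf{R}_{(\cdot)}^{1/2} \left( \hat{\mathbf{H}}_{(\cdot)} + \mathbf{E}_{(\cdot)} \right) \mathbf{U}_{(\cdot)}^{1/2} \qquad (1) $$
Without loss of generality, we assume that the error variance depends on the SNR ($\gamma$) and indicate the CEQ by $\delta$, such that $\sigma_e^2 = \frac{\sigma_{\hat{h}}^2}{1 + \delta\gamma\sigma_{\hat{h}}^2} \approx \frac{1}{1 + \delta\gamma}$ and $\sigma_h^2 = \frac{\delta\gamma\sigma_{\hat{h}}^2}{1 + \delta\gamma\sigma_{\hat{h}}^2} \approx \frac{\delta\gamma}{1 + \delta\gamma}$ [10].

2.2. Signal Transmission Model
In the multiple access (MA) phase, $[T_1, T_2]$ transmit their signals to the $\mathcal{K}$ relays, wherein the signal obtained at the $j$th relay node becomes
$$ \mathbf{y}_{K_j} = \mathbf{H}_{\Gamma K_j} \mathbf{s}_{\Gamma} + \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{s}_{\hat{\Gamma}} + \mathbf{n}_{K_j} \qquad (2) $$
where the $\Gamma$ node transmits $\mathbf{s}_{\Gamma}$ with transmission power $P_{\Gamma}$, $\mathbb{E}\{\mathbf{s}_{\Gamma} \mathbf{s}_{\Gamma}^H\} = \frac{P_{\Gamma}}{N} \mathbf{I}_N$, and $\mathbf{n}_{K_j}$ denotes the noise at the $j$th relay node with $\mathbb{E}\{\mathbf{n}_{K_j} \mathbf{n}_{K_j}^H\} = \sigma_{K_j}^2 \mathbf{I}_N$. In the broadcast (BC) phase, the $j$th relay node amplifies the received signal in (2) with power $P_R$, using the amplification factor
$$ \alpha_j = \sqrt{ \frac{P_R}{ \mathrm{tr}\left\{ \frac{P_{\Gamma}}{N} \mathbf{H}_{\Gamma K_j} \mathbf{H}_{\Gamma K_j}^H + \frac{P_{\hat{\Gamma}}}{N} \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{H}_{\hat{\Gamma} K_j}^H + \sigma_{K_j}^2 \mathbf{I}_N \right\} } } \qquad (3) $$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Considering that the self-interference can be cancelled at the terminal nodes, the received signal at the terminal nodes [12] is given by
$$ \mathbf{y}_{\Gamma} = \sum_{j=1}^{J} \alpha_j \mathbf{H}_{K_j \Gamma} \left( \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{s}_{\hat{\Gamma}} + \mathbf{n}_{K_j} \right) + \mathbf{n}_{\Gamma} \qquad (4) $$
where $\mathbf{n}_{\Gamma}$ denotes the noise at $\Gamma$ with $\mathbb{E}\{\mathbf{n}_{\Gamma} \mathbf{n}_{\Gamma}^H\} = \sigma_{\Gamma}^2 \mathbf{I}_N$.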
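As an illustration of the channel model in (1) and the amplification factor in (3), the following NumPy sketch generates one estimated channel realization and evaluates $\alpha_j$. It is only a sketch under stated assumptions: the exponential correlation model $\rho^{|i-j|}$ for both $\mathbf{R}$ and $\mathbf{U}$, the unit actual-channel variance, the linear-scale `snr` argument, and all function names are hypothetical choices made for the example, not details given in the text.

```python
import numpy as np

def complex_gaussian(shape, var, rng):
    """Draw i.i.d. circularly symmetric CN(0, var) entries."""
    scale = np.sqrt(var / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def exp_correlation(n, rho):
    """Assumed correlation model: entries rho^|i-j| (not specified in the text)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(a):
    """Hermitian square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def estimated_channel(n, rho, snr, delta, rng):
    """Eq. (1): H = R^{1/2} (H_hat + E) U^{1/2}, assuming unit actual-channel
    variance so that the error variance is 1 / (1 + delta * snr)."""
    var_e = 1.0 / (1.0 + delta * snr)
    h_actual = complex_gaussian((n, n), 1.0, rng)
    err = complex_gaussian((n, n), var_e, rng)
    r_half = psd_sqrt(exp_correlation(n, rho))
    u_half = psd_sqrt(exp_correlation(n, rho))
    return r_half @ (h_actual + err) @ u_half

def amplification_factor(h_a, h_b, p_a, p_b, p_r, noise_var):
    """Eq. (3): relay gain that normalizes the received power to p_r."""
    n = h_a.shape[0]
    cov = ((p_a / n) * h_a @ h_a.conj().T
           + (p_b / n) * h_b @ h_b.conj().T
           + noise_var * np.eye(n))
    return np.sqrt(p_r / np.trace(cov).real)
```

For example, `estimated_channel(4, 0.8, snr=10.0, delta=0.35, rng=np.random.default_rng(0))` returns one $4 \times 4$ complex matrix; larger `delta` shrinks the injected estimation error.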
The lower bound on the achievable capacity of the two-way channel links is the sum of the lower bounds of the two one-way links, $C_{\Gamma \Leftrightarrow \hat{\Gamma}, j} = C_{\Gamma \Rightarrow \hat{\Gamma}, j} + C_{\hat{\Gamma} \Rightarrow \Gamma, j}$, where the lower bound on the achievable capacity for the transmission from the $\Gamma$ to the $\hat{\Gamma}$ terminal via the $j$th relay node is given by [12]
$$ C_{\Gamma \Rightarrow \hat{\Gamma}, j} = \frac{1}{2} \mathbb{E}\left\{ \log_2 \det\left( \mathbf{I}_N + \frac{\alpha_j^2 P_{\Gamma}}{N} \mathbf{H}_{K_j \hat{\Gamma}} \mathbf{H}_{\Gamma K_j} \mathbf{H}_{\Gamma K_j}^H \mathbf{H}_{K_j \hat{\Gamma}}^H \left( \alpha_j^2 \sigma_{K_j}^2 \mathbf{H}_{K_j \hat{\Gamma}} \mathbf{H}_{K_j \hat{\Gamma}}^H + \sigma_{\hat{\Gamma}}^2 \mathbf{I}_N \right)^{-1} \right) \right\} $$
where the factor $1/2$ is because of half-duplex transmission; $C_{\hat{\Gamma} \Rightarrow \Gamma, j}$ is obtained similarly by exchanging the roles of $\Gamma$ and $\hat{\Gamma}$. We can pick the optimal relay node $K_s^{\star}$ that maximizes the achievable capacity by exhaustively searching (ES) over the $\mathcal{K}$ relays, i.e., $K_s^{\star} = \arg\max_{K_j \in \mathcal{K}} C_{\Gamma \Leftrightarrow \hat{\Gamma}, j}$. Nonetheless, as the number of relays ($J$) and/or antennas ($N$) in the MIMO links increases, ES becomes computationally very expensive, making it impractical for online relay selection.

3. An Incremental Learning Formulation for Online MIMO TWAF Relay Selection
In this section, we model the TWAF relay selection problem as a multi-class classification problem, where the absolute values of the channel gains are provided as the input feature vector and the $J$ relay nodes serve as the $J$ class labels. For an online setting, let us consider that $t = \{1, \ldots, T\}$ discrete-time experiments are conducted with the $p$th ML model $L_p$ before it is updated with new data online, where $p = \{1, \ldots, P\}$. At any time instant $t$, the absolute values of the channel gains of the MA and BC phases are provided as the feature vector $\mathbf{x}_t$ to the ML model $L_p$, which then predicts (selects) the relay node $K_s^t$.

Training dataset creation policy: We create the dataset as follows. For the $t$th time instant, we formulate an $s = 4JN^2$-dimensional feature vector $\mathbf{x}_t$ containing the absolute values of the imperfect channel gains in the MA and BC phases for all $J$ relays, where the factor 4 is because of the CSI knowledge in the dual-hop and two-phase transmission, and $N^2$ is because of the $N \times N$ channel between any two nodes.
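The construction of the $4JN^2$-dimensional feature vector can be sketched as follows. The array layout (one `(J, 2, N, N)` array per phase, with the second axis indexing the two terminals) and the function name are assumptions made for this example; the text only fixes the total dimension and the use of absolute channel gains.

```python
import numpy as np

def feature_vector(ma_channels, bc_channels):
    """Stack the absolute channel gains of the MA and BC phases for all relays.

    ma_channels, bc_channels: complex arrays of shape (J, 2, N, N), an assumed
    layout holding, for every relay, the N x N channels to/from T1 and T2.
    Returns a real vector of length 4 * J * N**2.
    """
    ma = np.abs(np.asarray(ma_channels))
    bc = np.abs(np.asarray(bc_channels))
    return np.concatenate([ma.ravel(), bc.ravel()])
```

With $J = 5$ relays and $N = 4$ antennas this yields a 320-dimensional real vector, since each of the $2J$ terminal-relay pairs contributes $N^2$ gains in each of the two phases.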
Correspondingly, we create the label $u_t$ with the optimal relay node, using the ES method (detailed in Sec. 2). We repeat this process for $T$ time intervals to create the dataset $\{\mathbf{X}_0, \mathbf{u}_0\}$, and repeat it for $P$ instances to create a large dataset $[\{\mathbf{X}_0, \mathbf{u}_0\}, \ldots, \{\mathbf{X}_P, \mathbf{u}_P\}]$.

Traditionally, in offline learning, we train the model $L_0$ using the dataset $\{\mathbf{X}_0, \mathbf{u}_0\}$ and deploy it for future instances. However, these offline learning-based models assume that the initial training set $\mathbf{X}_0$ covers all the possible scenarios (such as CSI, propagation environment, etc.) that will occur in the future, which is impractical with ever-changing conditions. In practice, we need to retrain the designed ML models to keep them updated.

Figure 2: Retraining of the ML-based models using different learning-based approaches. (a) Using offline learning. (b) Using incremental (online) learning.

In Fig. 2a, we depict the conventional way of retraining the offline learning (OL)-based ML models, which proceeds as follows:
• We train the ML model $L_0$ using $\mathbf{X}_0$ to obtain the trained ML model $L_1$.
• As new data $\mathbf{X}_1$ arrives, we test $\mathbf{X}_1$ on the ML model $L_1$ and obtain the relay selection accuracy $A_1$ with $\mathbf{X}_1$ as our testing set. To update the current ML model $L_1$ (in an online setting) with the newly obtained data $\mathbf{X}_1$, we have to train the ML model $L_0$ from scratch using all the historical data $[\mathbf{X}_0, \mathbf{X}_1]$ stored so far to get the updated ML model $L_2$. We repeat this process of retraining from scratch to update the ML models in the online setting.
• Consider that we are at the $p$th instance with the ML model $L_{p-1}$ and wish to update it. Firstly, we need to store all the datasets $[\mathbf{X}_0, \ldots, \mathbf{X}_{p-1}]$ for updating the ML model. Secondly, we have to train the ML model $L_0$ from scratch using $[\mathbf{X}_0, \ldots, \mathbf{X}_{p-1}]$ to get the updated model $L_p$ at the $p$th instance.
Thus, for retraining of offline learning-based ML models, we have to store the entire historical data and retrain the ML models from scratch every time.

In Fig. 2b, we depict the retraining of incremental learning (IL)-based ML models, which proceeds as follows:
• We train the ML model $L_0$ using $\mathbf{X}_0$ to obtain the trained ML model $L_1$.
• As new data $\mathbf{X}_1$ arrives, we test $\mathbf{X}_1$ on the ML model $L_1$ and obtain the relay selection accuracy $A_1$ with $\mathbf{X}_1$ as our testing set. To update the current ML model $L_1$ (in an online setting) with the newly obtained data $\mathbf{X}_1$, we retrain the model $L_1$ using only the last-instance testing data $\mathbf{X}_1$ to get the updated ML model $L_2$. We repeat this procedure of retraining to update the ML models in the online setting.
• Consider that we are at the $p$th instance with the ML model $L_{p-1}$ and wish to update it. Herein, we can retrain the ML model $L_{p-1}$ using only the dataset $\mathbf{X}_{p-1}$ to get the updated ML model $L_p$ at the $p$th instance.

Thus, in an online setting, designing ML models using incremental learning frameworks offers the following advantages over retraining offline learning-based ML models:
• We do not need an exhaustive initial dataset $\mathbf{X}_0$ covering all the possible future testing scenarios, because the incremental learning-based models are designed for retraining.
• We do not need to store and retrain on all the historical datasets, but only on the previous-instance testing dataset, reducing the data-storage cost and retraining time.

Now, we propose incremental learning-based ML models for online relay selection below.

4. Incremental Learning-based Naive Bayes Classifier
The naive Bayes classifier is a generative model [13] that assumes (1) for given class labels, the attributes are conditionally independent, and (2) no latent attribute impacts the prediction. Herein, each feature vector $\mathbf{x}_t$ belongs to only one class $K_j \in \mathcal{K}$, for any time instant $t$.
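To make the incremental retraining step concrete for a Gaussian naive Bayes model, the sketch below keeps only per-class counts, means, and sums of squared deviations, and folds each new batch $\{\mathbf{X}_{p-1}, \mathbf{u}_{p-1}\}$ into them without revisiting earlier data. The class name, the method names, and the small variance floor `eps` are hypothetical; the batch-merge step uses the standard updating formulae of Chan et al. [13], and the sketch is not claimed to reproduce the authors' implementation.

```python
import numpy as np

class IncrementalGaussianNB:
    """Gaussian naive Bayes whose statistics are updated batch by batch."""

    def __init__(self, n_classes, n_features, eps=1e-9):
        self.count = np.zeros(n_classes)              # samples seen per class
        self.mean = np.zeros((n_classes, n_features))
        self.m2 = np.zeros((n_classes, n_features))   # sum of squared deviations
        self.eps = eps                                # variance floor (assumed)

    def partial_fit(self, x, labels):
        """Merge one new batch into the stored per-class statistics."""
        x = np.asarray(x, dtype=float)
        labels = np.asarray(labels)
        for k in np.unique(labels):
            xb = x[labels == k]
            nb = xb.shape[0]
            mean_b = xb.mean(axis=0)
            m2_b = ((xb - mean_b) ** 2).sum(axis=0)
            na = self.count[k]
            delta = mean_b - self.mean[k]
            n = na + nb
            # Pairwise merge of (count, mean, M2) for old data and new batch.
            self.mean[k] += delta * nb / n
            self.m2[k] += m2_b + delta ** 2 * na * nb / n
            self.count[k] = n
        return self

    def predict(self, x):
        """Maximum a posteriori class index for each row of x."""
        x = np.asarray(x, dtype=float)
        seen = self.count > 0
        var = self.m2 / np.maximum(self.count, 1)[:, None] + self.eps
        log_prior = np.log(np.maximum(self.count, 1) / self.count.sum())
        diff = x[:, None, :] - self.mean[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(axis=2)
        scores = np.where(seen[None, :], log_prior[None, :] + log_lik, -np.inf)
        return scores.argmax(axis=1)
```

In use, one would call `partial_fit` once on the initial dataset and then once per update round on the latest batch only, so the storage cost stays at the size of the per-class statistics rather than growing with the history.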
Firstly, we learn the class priors (each relay's probability), given by $P(K_j) = L_{K_j} / L$, $\forall K_j$, where $L_{K_j}$ and $L$ denote the number of samples with label $K_j$ and the total number of samples, respectively. Secondly, for the given feature vector $\mathbf{x}_t$, we generate a model for each label corresponding to each feature. This is done by calculating the mean $\mu_{\bar{d}, K_j}$ and variance $\sigma_{\bar{d}, K_j}^2$ for all the classes $K_j \in \mathcal{K}$ and for all the features $\bar{d} \in \{1, \ldots, d\}$, which parameterize a normal distribution $P(x_t^{\bar{d}} \mid K_j)$. Thirdly, we calculate the conditional probability over the query sample $\mathbf{x}_t$ as $P(\mathbf{x}_t \mid K_j) = \prod_{\bar{d}=1}^{d} P(x_t^{\bar{d}} \mid K_j)$, $\forall K_j \in \mathcal{K}$. Fourthly, by employing Bayes' theorem, the conditional probability of each label $K_j \in \mathcal{K}$ for the query sample $\mathbf{x}_t$ can be decomposed as
$$ P(K_j \mid \mathbf{x}_t) = \frac{P(\mathbf{x}_t \mid K_j) P(K_j)}{\sum_{l=1}^{J} P(\mathbf{x}_t \mid K_l) P(K_l)} \qquad (5) $$
Lastly, the naive Bayes classifier combines the independent feature model obtained above with a decision rule such as maximum a posteriori, which determines the selected relay $K_s^t = \arg\max_{K_j \in \mathcal{K}} P(K_j \mid \mathbf{x}_t)$. Please note that we employ the pairwise algorithm [13] for updating the mean and standard deviation of the features incrementally.

5. Incremental Learning-based SGD (SVC and LR) Classifier
We now show the SGD classifier [14]-based formulation for linear SVC [15] and LR [16]-based online relay selection. We form a binary classifier where $y_t \in \{-1, 1\}$ and our aim is to learn a linear scoring function $f(\mathbf{x}_t) = \mathbf{w}_t^{\dagger} \mathbf{x}_t + b_t$, wherein $\mathbf{w}_t$ and $b_t$ denote the weight and bias at the $t$th time instant. The prediction for the binary classification is made by checking the sign of the scoring function $f(\mathbf{x}_t)$, and we aim to minimize
$$ E(\mathbf{w}_t, b_t) = \alpha \mathcal{V}(\mathbf{w}_t) + \frac{1}{L} \sum_{i=1}^{L} \mathcal{L}\left( y_t^{(i)}, f(\mathbf{x}_t^{(i)}) \right), \quad \mathcal{V}(\mathbf{w}_t) = \frac{1}{2} \mathbf{w}_t^{\dagger} \mathbf{w}_t \qquad (6) $$
where $\alpha > 0$ controls the regularization strength. The loss function $\mathcal{L}(\cdot)$ can be defined as:
• Hinge loss for the SVC classifier [15]: $\mathcal{L}(y_t^{(i)}, f(\mathbf{x}_t^{(i)})) = \max(0, 1 - y_t^{(i)} f(\mathbf{x}_t^{(i)}))$.
• Log loss for the LR classifier [16]: $\mathcal{L}(y_t^{(i)}, f(\mathbf{x}_t^{(i)})) = \log(1 + \exp(-y_t^{(i)} f(\mathbf{x}_t^{(i)})))$.

The TWAF MIMO relay selection is a multi-class classification (MCC) problem; thus, we use a one-vs-all scheme that implements $J$ binary classifiers, one for each relay (class), to find $K_s^t$. Moreover, the first-order routine for SGD learning is applied for updating the weights as
$$ \mathbf{w}_t \leftarrow \mathbf{w}_t - \eta_t \left[ \alpha \frac{\partial \mathcal{V}(\mathbf{w}_t)}{\partial \mathbf{w}_t} + \frac{\partial \mathcal{L}\left( \mathbf{w}_t^{\dagger} \mathbf{x}_t^{(i)} + b_t, y_t^{(i)} \right)}{\partial \mathbf{w}_t} \right] \qquad (7) $$
Similarly, the intercept term is updated, and the learning rate $\eta_t$ is gradually decayed with time [14].

Figure 3: Comparison of retraining for OL and IL-based ML relay selection for SNR = 10 dB, $\rho = 0.8$, $J = 5$, and $N = 4$. (a) Mean accuracy. (b) Time-cost comparison. (c) Data-storage cost. Note that both OL and IL algorithms are retrained (updated) for each $p$th round, according to the policies detailed in Sec. 3.

6. Performance Evaluation
We keep the number of ML model updates (retraining) as $p = 50$ and utilize Rayleigh fading channels with the corresponding CEQ. We consider the following benchmark algorithms: (1) Exhaustive Search-based optimal Relay Selection (ES RS), detailed in Sec. 2; (2) Gram-Schmidt-based Relay Selection (GS RS) [12], which conventionally performs the best and selects the relay by $K_s = \arg\max_{K_j \in \mathcal{K}} \prod_{\Gamma = T_1}^{T_2} \det(\mathbf{H}_{K_j \Gamma} \mathbf{H}_{K_j \Gamma}^H) \det(\mathbf{H}_{\hat{\Gamma} K_j}^H \mathbf{H}_{\hat{\Gamma} K_j})$; and (3) Random RS, where we choose the relay $K_s \in \mathcal{K}$ at random.

6.1. Retraining of the ML Models Developed via Incremental Learning (IL) versus Offline Learning (OL)
We compare the retraining of the proposed relay selection ML models (SVC, NB, LR) designed via offline and incremental learning in Fig. 3. We vary the CEQ ($\delta$) after every 50 rounds (i.e., $\delta = 0.2$ for $p \leq 50$, $\delta = 0.6$ for $50 < p \leq 100$, and $\delta = \infty$ for $p > 100$), where $\delta = 0$ indicates fully erroneous channel estimation and $\delta = \infty$ denotes perfect channel estimation. In Fig.
3a, the accuracy of optimal relay selection via retraining of OL-based ML models is slightly better than that of the proposed IL approach. This is because retraining of OL-based ML models utilizes all the prior historical data, while retraining in IL uses only the previous instance of data. This also explains the exponential increase in time-cost and the linear increase in data-storage cost for retraining OL algorithms, compared to the time-cost of a few seconds and data-storage cost of a few kilobytes for IL algorithms in Fig. 3b and Fig. 3c, respectively. Thus, IL solves the two fundamental problems (of data storage and retraining time) faced in deploying OL algorithms in practical settings, and hence provides a method to deploy ML algorithms in real-world applications where retraining is inevitable.

6.2. Limited Feedback Scenarios and Time-Cost Analysis
Each real-valued feedback requires 8 bits of information; thus, we reduce the feedback overhead by considering quantized feedback of the channel gains. We divide the channel gains into $Q$ levels that can be represented as $Q = 2^b$, where $b$ denotes the number of bits required for feedback for the corresponding $Q$ levels.

Figure 4: Impact of varying CSI quantization levels (SNR = 10 dB, $\rho = 0.8$, $\delta = 0.35$, $J = 10$, $N = 5$). Axes: quantization levels ($Q$) versus achievable capacity [bps/Hz]. Curves: ES-based RS (No Quant.), GS-based RS (No Quant.), ES-based RS (Quant.), GS-based RS (Quant.), SVC-based RS (Quant.), NB-based RS (Quant.), LR-based RS (Quant.), Random RS.

Table 1: Comparison of relay selection algorithms.

RS Algorithm             | Time-cost [ms] | Complexity                 | Feedback overhead
ES (Optimal upper bound) | 0.779          | $\mathcal{O}(16JN^3 + J)$  | $4JN^2$ complex values = $2 \times 4JN^2 \times 8$ bits
GS (Conventional best)   | 0.132          | $\mathcal{O}(4JN^3 + J)$   | $4JN^2$ complex values = $2 \times 4JN^2 \times 8$ bits
IL-based NB (Proposed)   | 0.023          | $\mathcal{O}(4JN^2)$       | No Quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; Quant. levels ($Q = 4$): $4JN^2 \times 2$ bits
IL-based SVC (Proposed)  | 0.007          | $\mathcal{O}(4JN^2)$       | No Quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; Quant. levels ($Q = 4$): $4JN^2 \times 2$ bits
IL-based LR (Proposed)   | 0.036          | $\mathcal{O}(4JN^2 + 1)$   | No Quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; Quant. levels ($Q = 4$): $4JN^2 \times 2$ bits

In Fig. 4, as a benchmark, we consider the performance of ES and GS relay selection when no quantization is performed, i.e., with complex-valued CSI feedback. ES and GS with the quantized channel gains are not able to reach the achievable capacity of ES and GS with no quantization, even with 22 levels, because there is no phase information in the feedback. Also, the incremental learning-based relay selection algorithms outperform the conventional best GS with complex-valued feedback information (16 bits of feedback) once $Q \geq 4$ (2 bits of feedback) and achieve performance very close to ES for $Q = 6$ quantization levels (3 bits of feedback), showing the merits of the proposed incremental learning.

We analyse the computational complexity, time-cost, and feedback overhead of all the considered relay selection algorithms in Table 1 for $J = 5$ and $N = 4$. Clearly, ES-based relay selection takes the most time, followed by the GS approach. Also, SVC reduces the time-cost by 95% compared to GS relay selection.

7. Conclusion
In this work, we proposed incremental learning-based (NB, SVC, LR) relay selection models for MIMO TWAF networks with imperfect CSI. We showed that incremental learning-based retraining avoids the exponentially growing time-cost and the linearly growing data-storage cost of retraining offline ML models, while achieving similar performance. Further, the proposed incremental learning-based relay selection algorithms (using 2 bits of feedback) achieve performance similar to the conventional best GS algorithm (using 16 bits of feedback) and can achieve performance close to the optimal brute-force ES algorithm with 3 bits of feedback. Also, SVC-based relay selection reduces the time-cost by 95% compared to GS-based relay selection.
Acknowledgments
We gratefully acknowledge the COG-MHEAR: Towards cognitively-inspired 5G IoT enabled, multi-modal Hearing Aids (https://cogmhear.org) under Grant EP/T021063/1, the Ministry of Human Resource Development Government of India for awarding the grant under SPARC, India (2019/249) and Grant SERB IMRC/AISTDF/CRD/2019/000178 for the support of this work.

References
[1] Y. Zhao, R. Adve, T. J. Lim, Improving amplify-and-forward relay networks: Optimal power allocation versus selection, in: 2006 IEEE International Symposium on Information Theory, 2006, pp. 1234–1238.
[2] A. Bletsas, et al., A simple cooperative diversity method based on network path selection, IEEE Journal on Selected Areas in Communications 24 (2006) 659–672.
[3] D. Love, R. Heath, W. Santipach, M. Honig, What is the value of limited feedback for MIMO channels?, IEEE Communications Magazine 42 (2004) 54–59.
[4] W. Song, F. Zeng, J. Hu, Z. Wang, X. Mao, An unsupervised-learning-based method for multi-hop wireless broadcast relay selection in urban vehicular networks, in: 2017 IEEE 85th Vehicular Technology Conference (VTC Spring), 2017, pp. 1–5.
[5] P. Zhang, et al., Overlapping community deep exploring-based relay selection method toward multi-hop D2D communication, IEEE Wireless Commun. Letters 8 (2019) 1357–1360.
[6] Z. Zhang, et al., Neural network-based relay selection in two-way SWIPT-enabled cognitive radio networks, IEEE Transactions on Vehicular Technology 69 (2020) 6264–6274.
[7] X. Wang, F. Liu, Data-driven relay selection for physical-layer security: A decision tree approach, IEEE Access 8 (2020) 12105–12116.
[8] A. Gupta, K. Singh, M. Sellathurai, Time-switching EH-based joint relay selection and resource allocation algorithms for multi-user multi-carrier AF relay networks, IEEE Transactions on Green Communications and Networking 3 (2019) 505–522.
[9] J. Wang, Encyclopedia of Data Warehousing and Mining, IGI Global, USA, 2005.
[10] A. R. Heidarpour, et al., Network coded cooperation based on relay selection with imperfect CSI, in: 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), 2017, pp. 1–5.
[11] G. Liu, H. Ji, Y. Li, X. Zhang, Sum rate maximization antenna selection via discrete stochastic approximation in MIMO two-way AF relay with imperfect CSI, in: 2012 IEEE Global Communications Conference (GLOBECOM), 2012, pp. 2487–2492.
[12] C.-C. Hu, B.-H. Chen, Two-way MIMO relaying systems employing layered relay-and-antenna selection strategies, IEEE Systems Journal 12 (2018) 854–861.
[13] T. F. Chan, G. H. Golub, R. J. LeVeque, Updating Formulae and a Pairwise Algorithm for Computing Sample Variances, Technical Report, Stanford, CA, USA, 1979.
[14] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proceedings of COMPSTAT'2010, Physica-Verlag HD, Heidelberg, 2010, pp. 177–186.
[15] C.-W. Hsu, C.-C. Chang, C.-J. Lin, A Practical Guide to Support Vector Classification, Technical Report, Department of Computer Science, National Taiwan University, 2003.
[16] D. W. Hosmer, S. Lemeshow, Applied Logistic Regression, John Wiley and Sons, 2000.