Incremental Learning-based MIMO Relay Selection
Ankit Gupta¹, Mathini Sellathurai¹, Venkata V. Mani² and Tharmalingam Ratnarajah³
¹ School of Engineering and Physical Science (EPS), Heriot-Watt University, Edinburgh, UK.
² Department of Electronics & Communication Engineering, National Institute of Technology, Warangal, India.
³ Institute for Digital Communications, University of Edinburgh, Edinburgh, UK.


Abstract
The forthcoming 6G wireless networks are expected to be much more machine-intelligent in resource
allocation, including relay selections to serve ever-increasing users and the internet of things with
extended coverage. Selecting an optimal multiple-input multiple-output (MIMO) relay using conventional
methods becomes challenging due to dependency on perfect channel information, which exponentially
increases feedback overhead. In this paper, we propose a novel incremental learning-based online MIMO
relay selection algorithm, with only imperfect channel gain information available at the relay nodes
in the framework of MIMO two-way amplify-and-forward (TWAF) relay networks. In particular, we
develop naive Bayes, logistic regression, and support vector-based incremental learning classifiers for
the near-optimal online relay selection. Using simulated results, we show that the proposed online relay
selection approaches outperform the best conventional Gram-Schmidt algorithm while reducing the
feedback overhead up to a factor of eight.

Keywords
Incremental learning, MIMO, two-way, amplify-and-forward, relay networks, online relay selection.


1. Introduction
With the advent of the internet of everything, the multiple-input multiple-output (MIMO)
network is considered a pivotal technology to meet the high data rate requirements in the
upcoming sixth-generation (6G) networks. Further, relay networks will play a crucial role in the
6G networks by enhancing network reliability, data coverage, and spectral efficiency. MIMO
relaying networks have been recognized to achieve significant diversity gain and significant
spectrum efficiency, to expand ubiquitous coverage on land and air in the upcoming 6G networks.
Further, relay selection can reduce the total network power dissipation while increasing spectral
efficiency [1]. Primarily, in conventional methods, the relay selection uses the procured channel
state information (CSI) knowledge. However, the channels' time-varying nature and noise mean
that procuring perfect CSI for a cooperative MIMO network increases the feedback
overhead substantially [2]. Thus, with the wide deployment of MIMO relay networks in wireless
sensor devices and the internet of things in the upcoming 6G networks, the amount of feedback
overhead will increase exponentially. Furthermore, with the increased feedback overhead, the
latency of the networks in selecting even a near-optimal relay increases [3]. Hence, we need
to devise intelligent relay selection algorithms that perform accurately with minimum CSI
dependency and low feedback overhead. This becomes even more challenging in the two-way
relaying network, which provides double the spectral efficiency of a one-way
relaying network, because a two-way relay node simultaneously serves the terminal nodes in
each transmission phase.

AI6G'22: First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks,
July 21, 2022, Padua, Italy
ag104@hw.ac.uk (A. Gupta); m.sellathurai@hw.ac.uk (M. Sellathurai); vvmani@nitw.ac.in (V. V. Mani);
t.ratnarajah@ed.ac.uk (T. Ratnarajah)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   Recently, the relay selection problem has been studied using machine learning (ML)-based
algorithms in [4, 5, 6, 7]. The authors in [4] operated on the unlabelled dataset using the k-means
clustering algorithm, wherein the decode-and-forward (DF) relay nodes were selected under
perfect CSI. The authors in [5] extracted key features in a social-ties network using a stacked
autoencoder and utilized these features for DF relay selection. Furthermore, the authors in
[6] studied a two-way (TW) DF relay selection policy using an artificial neural network for a
fixed and variable number of relay nodes by using distance as the feature vector. Recently, the
authors in [7] devised a decision tree-based one-way (OW) DF relay selection with quantized
perfect CSI knowledge and showed promising results if the Gini index for the decision trees is set
properly. However, all of these works have focused only on single-input single-output (SISO)
networks and DF relaying, and assumed perfect estimation of the channels. Furthermore, it is well
known that amplify-and-forward (AF) relaying is widely adopted industrially over DF relaying
because of its reduced implementation complexity, albeit at the expense of noise amplification [8].
   The works in [4, 5, 6, 7] utilized offline learning-based ML algorithms that were trained once
and deployed. Offline learning leads to two fundamental problems: (1) it needs an extensive
amount of training data that covers all the possible testing scenarios, which is not possible
with ever changing propagation environments and channel conditions, and (2) it does not
continuously integrate new information to the designed ML models, rather a new ML model
has to be trained from scratch. However, incremental learning (also referred to as online
learning, evolving learning, constructive learning, etc.) has appeared as a paradigm shift to
provide streaming ML training and data processing [9]. Once deployed, instead of training
the ML models from scratch, incremental learning updates the previously trained model with
the newly streamed data. Thus, retraining takes place only on a small amount of data at a
time (reducing training time), we do not need to store all the data (reducing data storage),
and we do not need extensive initial training data covering all possible scenarios (making the ML
models adaptive to the propagation environment). Therefore, in this work, we design incremental
learning-based MIMO AF relay selection algorithms with imperfect CSI knowledge.
   We consider a MIMO TWAF relay network, where the terminal nodes intend to communicate
by selecting a relay node, under correlated fading channels with imperfect CSI (Section 2). We
propose an incremental learning framework for the online MIMO TWAF relay selection, detail
the process of retraining the ML models and prepare the datasets (Section 3). We propose
a naive Bayes (NB)-based online relay selection policy by employing the pairwise algorithm
(Section 4). Also, we model the stochastic gradient descent (SGD) classifiers to design and
develop the support vector classifier (SVC) and logistic regression (LR)-based online relay
selection algorithms (Section 5). We perform an extensive performance evaluation by varying the
signal-to-noise ratio (SNR), channel estimation quality (CEQ), antenna correlation, and number
of antennas in the MIMO links and relay nodes; to further reduce the feedback overhead,
we consider the quantized imperfect channel gains as feedback vectors (Section 6). Lastly, we
conclude this work (Section 7).

Figure 1: System model of the MIMO TWAF relay network with 𝑁 antennas at each node.


2. System Model
We consider a MIMO TWAF relay network wherein the terminal nodes $T_1$ and $T_2$ intend to
exchange signals by selecting a relay node $K_s \in \mathcal{K} \triangleq \{K_1, \ldots, K_J\}$, as shown in Fig. 1.
Each node is equipped with $N$ antennas and the direct link is absent because of path-loss and
shadow fading. For the sake of generality, let $\Gamma$ and $\hat{\Gamma}$ indicate source and destination terminals,
respectively, i.e., if $\Gamma = T_1$ then $\hat{\Gamma} = T_2$, and vice-versa.

2.1. Channel Model and its Estimation Quality
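We employ spatially correlated Rayleigh fading channels based on the Kronecker correlation
model and use the linear minimum mean squared error (LMMSE) technique for channel
estimation [10]. Let the actual channel matrix be $\hat{\mathbf{H}}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_{\hat{h}}^2)$ and the transmission and
reception correlation matrices [11] be $\mathbf{U}_{(\cdot)}$ and $\mathbf{R}_{(\cdot)}$, respectively; then the final channel matrix
becomes $\mathbf{R}_{(\cdot)}^{1/2} \hat{\mathbf{H}}_{(\cdot)} \mathbf{U}_{(\cdot)}^{1/2}$. Let the errors in estimation be $\mathbf{E}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_e^2)$ and the LMMSE-based
estimated channel be $\mathbf{H}_{(\cdot)} \sim \mathcal{CN}(0, \sigma_h^2)$, given by

    $\mathbf{H}_{(\cdot)} = \mathbf{R}_{(\cdot)}^{1/2} \left( \hat{\mathbf{H}}_{(\cdot)} + \mathbf{E}_{(\cdot)} \right) \mathbf{U}_{(\cdot)}^{1/2}$                       (1)

Without loss of generality, we assume that the error variance is dependent on SNR ($\gamma$) and
indicate CEQ by $\delta$, such that $\sigma_e^2 = \frac{\sigma_{\hat{h}}^2}{1 + \delta\gamma\sigma_{\hat{h}}^2} \approx \frac{1}{1 + \delta\gamma}$ and
$\sigma_h^2 = \frac{\delta\gamma\sigma_{\hat{h}}^2}{1 + \delta\gamma\sigma_{\hat{h}}^2} \approx \frac{\delta\gamma}{1 + \delta\gamma}$ [10].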
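As an illustration, one channel realization following a literal reading of (1) could be drawn as in the minimal sketch below. The exponential correlation model with entries $\rho^{|i-j|}$, the use of the same $\rho$ at both ends, the unit variance of the actual channel, and all function names are assumptions made for this example only; the SNR is passed in linear scale.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (assumed model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(a):
    """Square root of a Hermitian positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def cgauss(rng, shape, var):
    """i.i.d. circularly symmetric complex Gaussian entries with variance var."""
    s = np.sqrt(var / 2.0)
    return s * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def estimated_channel(rng, n, rho, snr, delta, var_true=1.0):
    """Draw one channel following Eq. (1): H = R^{1/2} (H_actual + E) U^{1/2}."""
    # Error variance as a function of SNR (linear scale) and CEQ delta.
    var_e = var_true / (1.0 + delta * snr * var_true)
    r_half = psd_sqrt(exp_corr(n, rho))   # receive-side correlation (assumed)
    u_half = psd_sqrt(exp_corr(n, rho))   # transmit-side correlation (assumed)
    h_actual = cgauss(rng, (n, n), var_true)
    err = cgauss(rng, (n, n), var_e)
    return r_half @ (h_actual + err) @ u_half

rng = np.random.default_rng(0)
H = estimated_channel(rng, n=4, rho=0.8, snr=10.0, delta=0.35)
print(H.shape)
```

A larger `delta` shrinks the error variance, so the drawn matrix approaches the correlated actual channel.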


2.2. Signal Transmission Model
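In the multiple access (MA) phase, $T_1$ and $T_2$ transmit their signals to the relays in $\mathcal{K}$, wherein the signal
obtained at the $j$th relay node becomes

    $\mathbf{y}_{K_j} = \mathbf{H}_{\Gamma K_j} \mathbf{s}_{\Gamma} + \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{s}_{\hat{\Gamma}} + \mathbf{n}_{K_j}$                                 (2)

where node $\Gamma$ transmits $\mathbf{s}_{\Gamma}$ with transmission power $P_{\Gamma}$, $\mathbb{E}\{\mathbf{s}_{\Gamma} \mathbf{s}_{\Gamma}^H\} = \frac{P_{\Gamma}}{N} \mathbf{I}_N$, and $\mathbf{n}_{K_j}$ denotes the
noise at the $j$th relay node with $\mathbb{E}\{\mathbf{n}_{K_j} \mathbf{n}_{K_j}^H\} = \sigma_{K_j}^2 \mathbf{I}_N$. In the broadcast (BC) phase, the $j$th relay node
amplifies the received signal in (2) with power $P_R$, using the amplification factor given by

    $\alpha_j = \sqrt{ P_R \Big/ \mathrm{tr}\left\{ \frac{P_{\Gamma}}{N} \mathbf{H}_{\Gamma K_j} \mathbf{H}_{\Gamma K_j}^H + \frac{P_{\hat{\Gamma}}}{N} \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{H}_{\hat{\Gamma} K_j}^H + \sigma_{K_j}^2 \mathbf{I}_N \right\} }$            (3)
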

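where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. Considering that the self-interference can be canceled
at the terminal nodes, the received signal at the terminal nodes [12] can be given by

    $\mathbf{y}_{\Gamma} = \sum_{j=1}^{J} \alpha_j \mathbf{H}_{K_j \Gamma} \left( \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{s}_{\hat{\Gamma}} + \mathbf{n}_{K_j} \right) + \mathbf{n}_{\Gamma}$                          (4)

where $\mathbf{n}_{\Gamma}$ denotes the noise at $\Gamma$ with $\mathbb{E}\{\mathbf{n}_{\Gamma} \mathbf{n}_{\Gamma}^H\} = \sigma_{\Gamma}^2 \mathbf{I}_N$. The lower bound on the achievable
capacity of the two-way channel links is the sum of the lower bounds of the two one-way links,
$C_{\Gamma \Leftrightarrow \hat{\Gamma}, j} = C_{\Gamma \Rightarrow \hat{\Gamma}, j} + C_{\hat{\Gamma} \Rightarrow \Gamma, j}$, where the lower bound on the achievable capacity for the
transmission from the $\Gamma$ to the $\hat{\Gamma}$ terminal via the $j$th relay node is given by [12]

    $C_{\Gamma \Rightarrow \hat{\Gamma}, j} = \frac{1}{2} \mathbb{E}\left\{ \log_2 \det\left( \mathbf{I}_N + \frac{\alpha_j^2 P_{\Gamma}}{N} \mathbf{H}_{K_j \Gamma} \mathbf{H}_{\hat{\Gamma} K_j} \mathbf{H}_{\hat{\Gamma} K_j}^H \mathbf{H}_{K_j \Gamma}^H \left( \alpha_j^2 \sigma_{K_j}^2 \mathbf{H}_{K_j \Gamma} \mathbf{H}_{K_j \Gamma}^H + \sigma_{\Gamma}^2 \mathbf{I}_N \right)^{-1} \right) \right\}$

where the factor 1/2 is because of half-duplex transmission; $C_{\hat{\Gamma} \Rightarrow \Gamma, j}$ can be obtained similarly. We
can pick the optimal relay node $K_s^{\star}$ that maximizes the achievable capacity by exhaustively
searching (ES) over the relays in $\mathcal{K}$, given by $K_s^{\star} = \arg\max_{K_j \in \mathcal{K}} C_{\Gamma \Leftrightarrow \hat{\Gamma}, j}$. Nonetheless,
with the increase in relays ($J$) and/or antennas ($N$) in MIMO links, ES becomes computationally
very expensive, making it impractical for online relay selection purposes.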
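To make the cost of ES concrete, the sketch below evaluates the amplification factor of (3) and the two one-way rates for every relay and then takes the arg max. It is a simplified illustration under stated assumptions: the expectation is replaced by a single channel realization, each link's transmit power is paired with the transmitting terminal as in (4), and the function and argument names (`h_up_i` for terminal $T_i$ to relay, `h_down_i` for relay to terminal $T_i$) are hypothetical.

```python
import numpy as np

def amplification(h_a, h_b, p_a, p_b, p_r, var_relay):
    """Amplification factor of Eq. (3) for one relay."""
    n = h_a.shape[0]
    cov = ((p_a / n) * h_a @ h_a.conj().T
           + (p_b / n) * h_b @ h_b.conj().T
           + var_relay * np.eye(n))
    return np.sqrt(p_r / np.trace(cov).real)

def one_way_rate(h_rx, h_tx, alpha, p_tx, var_relay, var_term):
    """Single-realization rate of one direction (no expectation taken).

    h_tx: source -> relay channel, h_rx: relay -> destination channel.
    """
    n = h_rx.shape[0]
    g = h_rx @ h_tx                                   # effective two-hop channel
    signal = (alpha**2 * p_tx / n) * g @ g.conj().T
    noise = alpha**2 * var_relay * h_rx @ h_rx.conj().T + var_term * np.eye(n)
    _, logdet = np.linalg.slogdet(np.eye(n) + signal @ np.linalg.inv(noise))
    return 0.5 * logdet / np.log(2.0)                 # 1/2 for half-duplex

def exhaustive_search(h_up_1, h_up_2, h_down_1, h_down_2,
                      p1, p2, p_r, var_relay=1.0, var_term=1.0):
    """Return (index of best relay, list of two-way rates), one entry per relay."""
    caps = []
    for j in range(len(h_up_1)):
        a = amplification(h_up_1[j], h_up_2[j], p1, p2, p_r, var_relay)
        c12 = one_way_rate(h_down_2[j], h_up_1[j], a, p1, var_relay, var_term)
        c21 = one_way_rate(h_down_1[j], h_up_2[j], a, p2, var_relay, var_term)
        caps.append(c12 + c21)
    return int(np.argmax(caps)), caps

def random_channels(rng, J, N):
    """J uncorrelated N x N Rayleigh channels (placeholder for the model of Sec. 2.1)."""
    return (rng.standard_normal((J, N, N)) + 1j * rng.standard_normal((J, N, N))) / np.sqrt(2)

rng = np.random.default_rng(1)
J, N = 5, 4
up1, up2, down1, down2 = (random_channels(rng, J, N) for _ in range(4))
best, caps = exhaustive_search(up1, up2, down1, down2, p1=10.0, p2=10.0, p_r=10.0)
print(best, np.round(caps, 3))
```

Each relay requires several $N \times N$ matrix products, an inverse and a determinant, which is where the cubic dependence on $N$ in the ES complexity comes from.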


3. An Incremental Learning Formulation for Online MIMO TWAF Relay Selection
In this section, we model the TWAF relay selection problem as a multi-class classification
problem, where the absolute values of the channel gains are provided as the input feature vector
and the $J$ relay nodes serve as the $J$ class labels. For an online setting, let us consider that
$t \in \{1, \ldots, T\}$ discrete-time experiments are conducted with the $p$th ML model $L_p$ before it is
updated with new data online, where $p \in \{1, \ldots, P\}$. At any time instant $t$, the absolute
values of the channel gains of the multiple access (MA) and broadcast (BC) phases are provided as the feature vector $\mathbf{x}_t$ to the ML model
$L_p$, which then predicts (selects) the relay node $K_s^t$.
   Training dataset creation policy – We create the dataset as follows. For the $t$th time instant,
we formulate an $s = 4JN^2$-dimensional feature vector $\mathbf{x}_t$ containing the absolute values of
the imperfect channel gains in the MA and BC phases for all $J$ relays, where the factor 4 is because of the CSI
knowledge in dual-hop and two phases, and $N^2$ is because of the $N \times N$ channel between any two
nodes. Correspondingly, we create the label $u_t$ with the optimal relay node, using the ES method
(detailed in Sec. 2). We repeat this process for $T$ time intervals to create the dataset $\{\mathbf{X}_0, \mathbf{u}_0\}$,
and then repeat it for $P$ instants to create a large dataset $[\{\mathbf{X}_0, \mathbf{u}_0\}, \ldots, \{\mathbf{X}_P, \mathbf{u}_P\}]$.
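The dataset creation policy can be sketched as follows. This is only an illustration: the channels are drawn as uncorrelated Rayleigh matrices rather than from the model of Sec. 2.1, and the labelling function `strongest_relay` is a hypothetical placeholder standing in for the ES rule, which would be passed as `label_fn` in practice.

```python
import numpy as np

def feature_vector(h_up_1, h_up_2, h_down_1, h_down_2):
    """Stack |.| of the four N x N channels of all J relays into a 4*J*N^2 vector."""
    stacked = np.stack([h_up_1, h_up_2, h_down_1, h_down_2], axis=1)  # (J, 4, N, N)
    return np.abs(stacked).reshape(-1)

def strongest_relay(h_up_1, h_up_2, h_down_1, h_down_2):
    """Placeholder label: relay with the largest total channel energy (NOT the ES rule)."""
    energy = sum(np.sum(np.abs(h) ** 2, axis=(1, 2))
                 for h in (h_up_1, h_up_2, h_down_1, h_down_2))
    return int(np.argmax(energy))

def build_dataset(rng, T, J, N, label_fn):
    """Create one dataset {X_p, u_p} of T samples for J relays with N antennas."""
    X = np.empty((T, 4 * J * N * N))
    u = np.empty(T, dtype=int)
    for t in range(T):
        chans = [(rng.standard_normal((J, N, N))
                  + 1j * rng.standard_normal((J, N, N))) / np.sqrt(2)
                 for _ in range(4)]
        X[t] = feature_vector(*chans)   # absolute channel gains only, no phase
        u[t] = label_fn(*chans)         # index of the relay used as class label
    return X, u

rng = np.random.default_rng(0)
X, u = build_dataset(rng, T=20, J=3, N=2, label_fn=strongest_relay)
print(X.shape, u[:5])
```

Calling `build_dataset` once per round $p$ yields the sequence of datasets used for retraining.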
Figure 2: Retraining of the ML-based models using different learning-based approaches. (a) Using offline learning. (b) Using incremental (online) learning.

   Traditionally, in offline learning, we train the model $L_0$ using the dataset $\{\mathbf{X}_0, \mathbf{u}_0\}$ and deploy
it for future instances. However, these offline learning-based models assume that the
initial training set $\mathbf{X}_0$ covers all the possible scenarios (such as CSI, propagation environment,
etc.) that may occur in the future, which is impractical with ever-changing conditions. In practice, we
need to retrain the designed ML models to keep them updated. In Fig. 2a, we depict the conventional
way of retraining the offline learning (OL)-based ML models, which proceeds as follows:
    • We train the ML model $L_0$ using $\mathbf{X}_0$ to obtain the trained ML model $L_1$.
    • As new data $\mathbf{X}_1$ arrives, we test $\mathbf{X}_1$ on the ML model $L_1$ and obtain the relay selection
      accuracy $A_1$, with $\mathbf{X}_1$ as our testing set. To update the current ML model $L_1$ (in an online
      setting) with the newly obtained data $\mathbf{X}_1$, we have to train the ML model $L_0$ from scratch
      using all the historical data $[\mathbf{X}_0, \mathbf{X}_1]$ stored so far to get the updated ML model $L_2$. We
      repeat this process of retraining from scratch to update the ML models in the online setting.
    • Suppose we are at the $p$th instance with the ML model $L_{p-1}$ and wish to update it.
      Firstly, we need to store all the datasets $[\mathbf{X}_0, \ldots, \mathbf{X}_{p-1}]$ for updating the ML
      model. Secondly, we have to train the ML model $L_0$ from scratch using $[\mathbf{X}_0, \ldots, \mathbf{X}_{p-1}]$ to get
      an updated model $L_p$ at any $p$th instance.
Thus, for retraining of offline learning-based ML models, we have to store the entire historical
data and retrain the ML models from scratch every time. In Fig. 2b, we depict the retraining of incremental
learning (IL)-based ML models, which proceeds as follows:
    • We train the ML model $L_0$ using $\mathbf{X}_0$ to obtain the trained ML model $L_1$.
    • As new data $\mathbf{X}_1$ arrives, we test $\mathbf{X}_1$ on the ML model $L_1$ and obtain the relay selection
      accuracy $A_1$, with $\mathbf{X}_1$ as our testing set. To update the current ML model $L_1$ (in an online
      setting) with the newly obtained data $\mathbf{X}_1$, we retrain the model $L_1$ using only the last-instance
      testing data $\mathbf{X}_1$ to get the updated ML model $L_2$. We repeat this procedure of retraining to
      update the ML models in the online setting.
    • Suppose we are at the $p$th instance with the ML model $L_{p-1}$ and wish to update it.
      Herein, we can retrain the ML model $L_{p-1}$ using only the dataset $\mathbf{X}_{p-1}$ to get an
      updated ML model $L_p$ at any $p$th instance.
   Thus, in an online setting, designing ML models using incremental learning frameworks
offers the following advantages over retraining offline learning-based ML models:
    • We do not need an exhaustive initial dataset $\mathbf{X}_0$ covering all the possible future testing
      scenarios because the incremental learning-based models are designed for retraining.
    • We do not need to store and retrain on all the historical datasets, but only on the previous-
      instance testing dataset, which reduces the data storage cost and retraining time.
  Now, we propose incremental learning-based ML models for online relay selection below.
4. Incremental Learning-based Naive Bayes Classifier
The naive Bayes classifier is a generative model [13] that assumes (1) for given class labels, the
attributes are conditionally independent and (2) no latent attributes affect the prediction. Herein,
each feature vector $\mathbf{x}_t$ belongs to only one class $K_j \in \mathcal{K}$, for any time instant $t$. Firstly, we
learn the class priors (each relay's probability), given by $P(K_j) = L_{K_j}/L$, $\forall K_j$, where
$L_{K_j}$ and $L$ denote the number of samples with label $K_j$ and the total number of samples, respectively. Secondly, for the
given feature vector $\mathbf{x}_t$, we generate a model for each label corresponding to each feature.
This is done by calculating the mean $\mu_{\bar{d}, K_j}$ and variance $\sigma^2_{\bar{d}, K_j}$ associated with
all the classes $K_j \in \mathcal{K}$ and all the features $\bar{d} \in \{1, \ldots, d\}$, which parameterize a normal distribution
$P(x_t^{\bar{d}} | K_j)$. Thirdly, we calculate the conditional probability over the query sample $\mathbf{x}_t$ as
$P(\mathbf{x}_t | K_j) = \prod_{\bar{d}=1}^{d} P(x_t^{\bar{d}} | K_j)$, $\forall K_j \in \mathcal{K}$. Fourthly, by employing Bayes' theorem, the
conditional probability of each label $K_j \in \mathcal{K}$ for the query sample $\mathbf{x}_t$ can be decomposed as

    $P(K_j | \mathbf{x}_t) = \frac{P(\mathbf{x}_t | K_j) \, P(K_j)}{\sum_{l=1, l \neq K_j}^{J} P(\mathbf{x}_t | l) \, P(l)}$                                                  (5)

Lastly, the naive Bayes classifier combines the independent feature model obtained above with a
decision rule such as maximum a posteriori, which determines the selected relay $K_s^t = \arg\max_{K_j \in \mathcal{K}}
P(K_j | \mathbf{x}_t)$. Please note that we employ the pairwise algorithm [13] for updating the mean and
variance of the features incrementally.
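A minimal sketch of such an incrementally updated Gaussian naive Bayes classifier is given below. It merges the statistics of each new batch into the stored per-class count, mean and sum of squared deviations using the pairwise combination formulae of [13], and selects the class by the maximum a posteriori rule in the log domain. The class name, the variance floor and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

class IncrementalGaussianNB:
    """Gaussian naive Bayes whose per-class statistics are updated batch by batch."""

    def __init__(self, n_classes, n_features, var_floor=1e-9):
        self.count = np.zeros(n_classes)               # samples seen per class
        self.mean = np.zeros((n_classes, n_features))  # per-class feature means
        self.m2 = np.zeros((n_classes, n_features))    # sums of squared deviations
        self.var_floor = var_floor                     # assumed numerical safeguard

    def partial_fit(self, X, u):
        for k in np.unique(u):
            Xk = X[u == k]
            n_b = Xk.shape[0]
            mean_b = Xk.mean(axis=0)
            m2_b = ((Xk - mean_b) ** 2).sum(axis=0)
            # Pairwise merge of the stored statistics (a) with the new batch (b).
            n_a = self.count[k]
            n = n_a + n_b
            delta = mean_b - self.mean[k]
            self.mean[k] += delta * n_b / n
            self.m2[k] += m2_b + delta ** 2 * n_a * n_b / n
            self.count[k] = n
        return self

    def predict(self, X):
        var = self.m2 / np.maximum(self.count, 1.0)[:, None] + self.var_floor
        diff = X[:, None, :] - self.mean[None, :, :]
        # Sum over features of the log of the normal density, per class.
        log_lik = -0.5 * (np.log(2 * np.pi * var)[None] + diff ** 2 / var[None]).sum(axis=2)
        with np.errstate(divide="ignore"):             # unseen classes get log-prior -inf
            log_prior = np.log(self.count / self.count.sum())
        return np.argmax(log_prior[None, :] + log_lik, axis=1)

# Toy usage: two well-separated classes, streamed in two batches.
rng = np.random.default_rng(0)
u = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 3)) + 5.0 * u[:, None]
nb = IncrementalGaussianNB(n_classes=2, n_features=3)
nb.partial_fit(X[:200], u[:200]).partial_fit(X[200:], u[200:])
print((nb.predict(X) == u).mean())
```

Because only the three statistics per class are stored, the memory footprint does not grow with the number of retraining rounds.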


5. Incremental Learning-based SGD (SVC and LR) Classifier
We now show the SGD classifier [14]-based formulation for linear SVC [15] and LR [16]-based
online relay selection. We form a binary classifier where $y_t^{(i)} \in \{-1, 1\}$ and our aim is to learn a
linear scoring function $f(\mathbf{x}_t) = \mathbf{w}_t^{\dagger} \mathbf{x}_t + b_t$, wherein $\mathbf{w}_t$ and $b_t$ denote the weight and bias at the
$t$th time instant. Further, the predictions for the binary classification are made by checking the
sign of the scoring function $f(\mathbf{x}_t)$, and we aim to minimize

    $E(\mathbf{w}_t, b_t) = \alpha \mathcal{V}(\mathbf{w}_t) + \frac{1}{L} \sum_{i=1}^{L} \mathcal{L}\left( y_t^{(i)}, f(\mathbf{x}_t^{(i)}) \right), \quad \mathcal{V}(\mathbf{w}_t) = \frac{1}{2} \mathbf{w}_t^{\dagger} \mathbf{w}_t$                   (6)

where $\alpha > 0$ controls the regularization strength. The loss function $\mathcal{L}(\cdot)$ can be defined as:
    • Hinge loss for the SVC classifier [15]: $\mathcal{L}(y_t^{(i)}, f(\mathbf{x}_t^{(i)})) = \max(0, 1 - y_t^{(i)} f(\mathbf{x}_t^{(i)}))$.
    • Log loss for the LR classifier [16]: $\mathcal{L}(y_t^{(i)}, f(\mathbf{x}_t^{(i)})) = \log(1 + \exp(-y_t^{(i)} f(\mathbf{x}_t^{(i)})))$.
  The TWAF MIMO relay selection is a multi-class classification (MCC) problem; thus, we
use a one-vs-all scheme that implements $J$ binary classifiers, one for each relay (class), to find $K_s^t$.
Moreover, the first-order routine for SGD learning is applied for updating the weights as

    $\mathbf{w}_t \leftarrow \mathbf{w}_t - \eta_t \left[ \alpha \frac{\partial \mathcal{V}(\mathbf{w}_t)}{\partial \mathbf{w}_t} + \frac{\partial \mathcal{L}\left( \mathbf{w}_t^{\dagger} \mathbf{x}_t^{(i)} + b_t, \, y_t^{(i)} \right)}{\partial \mathbf{w}_t} \right]$                (7)

Similarly, the intercept term is updated, and the learning rate $\eta_t$ is gradually decayed with time [14].
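The update in (7) with the one-vs-all scheme can be sketched as below for both losses. This is an illustrative implementation under assumptions: the class name, the hyperparameter values, and the decay schedule $\eta_t = \eta_0 / (1 + \eta_0 \alpha t)$ (one common choice discussed in [14]) are not taken from the paper, and the toy data only serve to exercise the code.

```python
import numpy as np

class OneVsAllSGD:
    """J linear binary classifiers trained sample by sample with SGD."""

    def __init__(self, n_classes, n_features, loss="hinge", alpha=1e-4, eta0=0.1):
        self.W = np.zeros((n_classes, n_features))
        self.b = np.zeros(n_classes)
        self.loss, self.alpha, self.eta0 = loss, alpha, eta0
        self.step = 0

    def _dloss(self, y, f):
        """Derivative of the loss with respect to the score f, per binary classifier."""
        m = y * f
        if self.loss == "hinge":                      # SVC: max(0, 1 - y f)
            return np.where(m < 1.0, -y, 0.0)
        # LR: log(1 + exp(-y f)); the margin is clipped to avoid overflow.
        return -y / (1.0 + np.exp(np.clip(m, -50.0, 50.0)))

    def partial_fit(self, X, u):
        for x, k in zip(X, u):
            self.step += 1
            eta = self.eta0 / (1.0 + self.eta0 * self.alpha * self.step)  # assumed decay
            y = -np.ones(len(self.b))
            y[k] = 1.0                                # +1 for the true relay, -1 otherwise
            g = self._dloss(y, self.W @ x + self.b)
            # Eq. (7): regularizer gradient alpha*w plus loss gradient g*x.
            self.W -= eta * (self.alpha * self.W + g[:, None] * x[None, :])
            self.b -= eta * g
        return self

    def predict(self, X):
        return np.argmax(X @ self.W.T + self.b, axis=1)

# Toy usage: three separated classes, streamed in batches of 100 samples.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, -5.0]])
u = rng.integers(0, 3, size=600)
X = centers[u] + rng.normal(size=(600, 2))
acc = {}
for loss in ("hinge", "log"):
    clf = OneVsAllSGD(3, 2, loss=loss)
    for start in range(0, 600, 100):
        clf.partial_fit(X[start:start + 100], u[start:start + 100])
    acc[loss] = (clf.predict(X) == u).mean()
print(acc)
```

Prediction costs one inner product per relay over the $4JN^2$ features, in line with the complexities reported for the SVC and LR models in Table 1.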

Figure 3: Comparison of retraining for OL and IL-based ML relay selection for SNR = 10 dB, $\rho = 0.8$,
$J = 5$, and $N = 4$: (a) mean accuracy, (b) time-cost comparison, (c) data-storage cost. Note that both
OL and IL algorithms are re-trained (updated) for each $p$th round, according to the policies detailed in Sec. 3.


6. Performance Evaluation
We keep the number of ML model updates (retraining rounds) as $p = 50$ and utilize Rayleigh fading
channels with the corresponding CEQ. We consider the following benchmark algorithms: (1) exhaustive
search-based optimal relay selection (ES RS), detailed in Sec. 2; (2) Gram-Schmidt-based
relay selection (GS RS) [12], which is conventionally the best-performing policy and selects the relay by

    $R_s = \arg\max_{K_j \in \mathcal{K}} \prod_{\Gamma = T_1}^{T_2} \det\left( \mathbf{H}_{K_j \Gamma} \mathbf{H}_{K_j \Gamma}^H \right) \det\left( \mathbf{H}_{\hat{\Gamma} K_j}^H \mathbf{H}_{\hat{\Gamma} K_j} \right)$;

and (3) random RS, where we choose the relay $R_s \in \mathcal{K}$ at random.
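For reference, the determinant-product rule of the GS benchmark can be evaluated as in the following sketch. Summing log-determinants instead of multiplying determinants is a numerical convenience chosen here, and the function and argument names (`h_up_i` for terminal $T_i$ to relay, `h_down_i` for relay to terminal $T_i$) are hypothetical; the channels in the usage example are uncorrelated placeholders.

```python
import numpy as np

def gs_metric(h_up_1, h_up_2, h_down_1, h_down_2):
    """Log of the determinant product of the GS rule for a single relay."""
    total = 0.0
    for h in (h_down_1, h_down_2):          # det(H_{K_j,Gamma} H_{K_j,Gamma}^H)
        total += np.linalg.slogdet(h @ h.conj().T)[1]
    for h in (h_up_1, h_up_2):              # det(H_{Gamma,K_j}^H H_{Gamma,K_j})
        total += np.linalg.slogdet(h.conj().T @ h)[1]
    return total

def gs_select(h_up_1, h_up_2, h_down_1, h_down_2):
    """Index of the relay that maximizes the GS metric."""
    metrics = [gs_metric(h_up_1[j], h_up_2[j], h_down_1[j], h_down_2[j])
               for j in range(len(h_up_1))]
    return int(np.argmax(metrics))

def random_channels(rng, J, N):
    """J uncorrelated N x N Rayleigh channels (placeholder channel model)."""
    return (rng.standard_normal((J, N, N)) + 1j * rng.standard_normal((J, N, N))) / np.sqrt(2)

rng = np.random.default_rng(2)
J, N = 3, 4
up1, up2, down1, down2 = (random_channels(rng, J, N) for _ in range(4))
selected = gs_select(up1, up2, down1, down2)
print(selected)
```

Unlike ES, this rule needs no matrix inversion and no amplification factor, but it still requires complex-valued channel matrices as input.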

6.1. Retraining of the ML Models Developed via Incremental Learning (IL) versus Offline Learning (OL)
We compare the retraining of the proposed relay selection ML models (SVC, NB, LR) designed
via offline and incremental learning in Fig. 3. We vary the CEQ (𝛿) after every 50 rounds (i.e. for
𝑝 ≤ 50 the 𝛿 = 0.2, for 50 < 𝑝 ≤ 100 the 𝛿 = 0.6 and for 𝑝 > 100 the 𝛿 = ∞), where 𝛿 = 0
indicates fully erroneous channel estimation and 𝛿 = ∞ denotes perfect channel estimation. In
Fig. 3a, the accuracy of optimal relay selection via retraining of OL-based ML models is slightly
better than that of the proposed IL approach. This is because retraining of OL-based ML models utilizes
all the prior historical data, while retraining in IL takes place using only the previous instant of
data. This also explains the exponential increase in time-cost and the linear increase
in data-storage cost for retraining OL algorithms, compared to the time-cost of a few seconds and
data-storage cost of a few kilobytes for IL algorithms in Fig. 3b and Fig. 3c, respectively. Thus, IL
solves the two fundamental problems (of data storage and retraining time) faced in deploying
OL algorithms in practical settings, and hence provides a method to deploy ML algorithms
in real-world applications, where retraining the ML algorithms is inevitable.

6.2. Limited Feedback Scenarios and Time-Cost Analysis
Each real-valued feedback entry requires 8 bits of information; thus, we reduce the feedback overhead
by considering quantized feedback of the channel gains. We divide the channel gains into
$Q$ levels that can be represented as $Q = 2^b$, where $b$ denotes the number of bits required for
feedback for the corresponding $Q$ levels. In Fig. 4, as benchmarks, we consider the performance of
ES and GS relay selection when no quantization is performed, i.e., with complex-valued CSI feedback.
The ES and GS with the quantized channel gains are not able to reach the achievable capacity
of ES and GS with no quantization, even with 22 levels, because there is no phase information
in the feedback. Also, the incremental learning-based relay selection algorithms outperform the
conventional best GS with complex-valued feedback information (16 bits feedback) once $Q \geq 4$
(2 bits feedback) and achieve performance very close to ES for $Q = 6$ quantization levels (3 bits
feedback), showing the merits of the proposed incremental learning. We analyse the computational
complexity, time-cost and feedback overhead of all the considered relay selection algorithms in
Table 1 for $J = 5$ and $N = 4$. Clearly, ES-based relay selection takes the most time, followed
by the GS approach. Also, SVC reduces the time-cost by 95% compared to GS relay selection.

Figure 4: Impact of varying CSI quantization levels (SNR = 10 dB, $\rho = 0.8$, $\delta = 0.35$, $J = 10$, $N = 5$).
[Plot: achievable capacity [bps/Hz] versus quantization levels (Q) for ES- and GS-based RS without
quantization, ES-, GS-, SVC-, NB- and LR-based RS with quantization, and random RS.]

Table 1
Comparison of relay selection algorithms.
RS algorithm               Time-cost [ms]   Complexity                    Feedback overhead
ES (optimal upper bound)   0.779            $\mathcal{O}(16JN^3 + J)$     $4JN^2$ complex values = $2 \times 4JN^2 \times 8$ bits
GS (conventional best)     0.132            $\mathcal{O}(4JN^3 + J)$      $4JN^2$ complex values = $2 \times 4JN^2 \times 8$ bits
IL-based NB (proposed)     0.023            $\mathcal{O}(4JN^2)$          No quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; quant. levels ($Q = 4$): $4JN^2 \times 2$ bits
IL-based SVC (proposed)    0.007            $\mathcal{O}(4JN^2)$          No quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; quant. levels ($Q = 4$): $4JN^2 \times 2$ bits
IL-based LR (proposed)     0.036            $\mathcal{O}(4JN^2 + 1)$      No quant.: $4JN^2$ real values = $4JN^2 \times 8$ bits; quant. levels ($Q = 4$): $4JN^2 \times 2$ bits
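The feedback-overhead figures can be checked with a few lines of arithmetic, sketched below for $J = 5$ and $N = 4$. The helper names are hypothetical, and the uniform quantizer over $[0, g_{\max}]$ is an assumption, since the quantizer design is not specified in the text.

```python
import math

def feedback_bits(J, N, Q=None, bits_per_real=8, complex_valued=False):
    """Bits fed back per relay selection for the 4*J*N^2 channel gains."""
    n_values = 4 * J * N * N
    if Q is not None:                       # quantized gains: b = ceil(log2 Q) bits each
        return n_values * math.ceil(math.log2(Q))
    per_value = bits_per_real * (2 if complex_valued else 1)
    return n_values * per_value

def quantize(gains, Q, g_max):
    """Map gains in [0, g_max] to integer levels 0..Q-1 (assumed uniform quantizer)."""
    return [min(Q - 1, int(g / g_max * Q)) for g in gains]

J, N = 5, 4
complex_bits = feedback_bits(J, N, complex_valued=True)   # 2 x 4JN^2 x 8 = 5120 bits
real_bits = feedback_bits(J, N)                           # 4JN^2 x 8 = 2560 bits
quant_bits = feedback_bits(J, N, Q=4)                     # 4JN^2 x 2 = 640 bits
print(complex_bits, real_bits, quant_bits, complex_bits // quant_bits)
print(quantize([0.0, 0.3, 0.99, 1.5], Q=4, g_max=1.0))
```

With $Q = 4$, the quantized feedback uses 2 bits per gain instead of the 16 bits of a complex-valued entry, which is the factor-of-eight reduction quoted in the abstract.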


7. Conclusion
In this work, we proposed incremental learning-based (NB, SVC, LR) relay selection models
for MIMO TWAF networks with imperfect CSI. We showed that incremental learning-
based retraining can help us reduce the time-cost and data-storage cost exponentially,
while achieving performance similar to that of retraining offline ML models. Further, the
proposed incremental learning-based relay selection algorithms (using 2 bits feedback) achieve
performance similar to that of the conventional best GS algorithm (using 16 bits feedback) and can achieve
performance close to the optimal brute-force ES algorithm with 3 bits feedback. Also, SVC-based
relay selection reduces the time-cost by 95% compared to GS-based relay selection.

Acknowledgments
We gratefully acknowledge the COG-MHEAR: Towards cognitively-inspired 5G IoT enabled,
multi-modal Hearing Aids (https://cogmhear.org) under Grant EP/T021063/1, the Ministry of
Human Resource Development Government of India for awarding the grant under SPARC, India
(2019/249) and Grant SERB IMRC/AISTDF/CRD/2019/000178 for the support of this work.


References
 [1] Y. Zhao, R. Adve, T. J. Lim, Improving amplify-and-forward relay networks: Optimal
     power allocation versus selection, in: 2006 IEEE International Symposium on Information
     Theory, 2006, pp. 1234–1238.
 [2] A. Bletsas, et al., A simple cooperative diversity method based on network path selection,
     IEEE Journal on Selected Areas in Communications 24 (2006) 659–672.
 [3] D. Love, R. Heath, W. Santipach, M. Honig, What is the value of limited feedback for MIMO
     channels?, IEEE Communications Magazine 42 (2004) 54–59.
 [4] W. Song, F. Zeng, J. Hu, Z. Wang, X. Mao, An unsupervised-learning-based method for
     multi-hop wireless broadcast relay selection in urban vehicular networks, in: 2017 IEEE
     85th Vehicular Technology Conference (VTC Spring), 2017, pp. 1–5.
 [5] P. Zhang, et al., Overlapping community deep exploring-based relay selection method
     toward multi-hop D2D communication, IEEE Wireless Communications Letters 8 (2019) 1357–1360.
 [6] Z. Zhang, et al., Neural network-based relay selection in two-way SWIPT-enabled cognitive
     radio networks, IEEE Transactions on Vehicular Technology 69 (2020) 6264–6274.
 [7] X. Wang, F. Liu, Data-driven relay selection for physical-layer security: A decision tree
     approach, IEEE Access 8 (2020) 12105–12116.
 [8] A. Gupta, K. Singh, M. Sellathurai, Time-switching EH-based joint relay selection and
     resource allocation algorithms for multi-user multi-carrier AF relay networks, IEEE
     Transactions on Green Communications and Networking 3 (2019) 505–522.
 [9] J. Wang, Encyclopedia of Data Warehousing and Mining, IGI Global, USA, 2005.
[10] A. R. Heidarpour, et al., Network coded cooperation based on relay selection with imperfect
     CSI, in: 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), 2017, pp. 1–5.
[11] G. Liu, H. Ji, Y. Li, X. Zhang, Sum rate maximization antenna selection via discrete
     stochastic approximation in MIMO two-way AF relay with imperfect CSI, in: 2012 IEEE
     Global Communications Conference (GLOBECOM), 2012, pp. 2487–2492.
[12] C.-C. Hu, B.-H. Chen, Two-way MIMO relaying systems employing layered relay-and-
     antenna selection strategies, IEEE Systems Journal 12 (2018) 854–861.
[13] T. F. Chan, G. H. Golub, R. J. LeVeque, Updating Formulae and a Pairwise Algorithm for
     Computing Sample Variances, Technical Report, Stanford, CA, USA, 1979.
[14] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proceedings
     of COMPSTAT'2010, Physica-Verlag HD, Heidelberg, 2010, pp. 177–186.
[15] C.-W. Hsu, C.-C. Chang, C.-J. Lin, A Practical Guide to Support Vector Classification,
     Technical Report, Department of Computer Science, National Taiwan University, 2003.
[16] D. W. Hosmer, S. Lemeshow, Applied Logistic Regression, John Wiley and Sons, 2000.