Towards Trustworthy AI for QoE prediction in B5G/6G Networks

José Luis Corcuera Bárcena1, Pietro Ducange1, Francesco Marcelloni1, Giovanni Nardini1,2, Alessandro Noferi1, Alessandro Renda1, Giovanni Stea1 and Antonio Virdis1

1 Department of Information Engineering, University of Pisa, Largo Lucio Lazzarino 1, 56122 Pisa, Italy
2 Center for Logistic Systems, University of Pisa, Via dei Pensieri 60, 57124 Livorno, Italy

AI6G'22: First International Workshop on Artificial Intelligence in beyond 5G and 6G Wireless Networks, July 21, 2022, Padua, Italy
joseluis.corcuera@phd.unipi.it (J. Corcuera Bárcena); alessandro.noferi@phd.unipi.it (A. Noferi); {name.surname}@unipi.it (all other authors)

Abstract
The ability to forecast Quality of Experience (QoE) metrics will be crucial in several applications and services offered by future B5G/6G networks. However, QoE timeseries forecasting has not been adequately investigated so far, mainly due to the lack of available realistic datasets. In this paper, we first present a novel QoE forecasting dataset, obtained from realistic 5G network simulations and characterized by Quality of Service (QoS) and QoE metrics for a video-streaming application; then, we embrace the topical challenge of trustworthiness in the adoption of AI systems for tackling the QoE prediction task. We show how an eXplainable Artificial Intelligence (XAI) model, namely the Decision Tree, can be effectively leveraged to address the forecasting problem. Finally, we identify federated learning as a suitable paradigm for privacy-preserving collaborative model training, and outline the related challenges from both an algorithmic and a 6G network support perspective.

Keywords
Machine learning, B5G/6G networks, QoE forecasting, Explainable AI, Federated learning

1. Introduction

The development of Beyond 5G (B5G) and 6G networks is currently underway, as testified by the activity of international projects such as Hexa-X1. Such new technologies are expected to pave the way to innovative services that will pose stringent quality requirements. For example, automotive applications, such as see-through or tele-operated driving [1, 2], require the transmission of high-definition videos in real time. The Quality of Experience (QoE) perceived by the end users determines whether the service can be provided or not. However, QoE metrics can often be obtained by subjective rating only, making it difficult to monitor them in real time. In computer networks, including B5G/6G ones, QoE depends on the Quality of Service (QoS) metrics provided by the network itself, e.g. radio channel quality and packet loss, although the mapping between them and QoE is not straightforward [3]. For this reason, Artificial Intelligence (AI) and in particular Machine Learning (ML) approaches are foreseen as a valid means to predict QoE from QoS metrics.

1 https://hexa-x.eu/, accessed March 2022
Some recent works on the topic [3, 4] have relied on aggregate metrics and have not addressed the problem of predicting QoE in near real-time over input data streams. The main roadblock toward this goal is the lack of available realistic datasets featuring QoS and QoE timeseries. Indeed, obtaining a reliable dataset related to mobile network applications is difficult, because network operators are hardly willing to share information about their network. Even if a subset of measurements is provided, their level of detail may not be the required one. Simulations are a viable option to overcome the above limitations: they can replicate custom (large-scale) network deployments with custom parametrization (e.g., number of users or amount of traffic). This allows AI systems to accurately model a variety of network conditions that may not be observed during a limited time frame in the real network.

However, the design of AI systems cannot simply target accuracy and technical robustness, but must also comply with additional requirements towards trustworthy AI [5], such as transparency of AI models and privacy of data owners. The former requirement is at the core of the eXplainable Artificial Intelligence (XAI) field [6], concerned with producing details and reasons regarding the functioning of a model, either by ensuring its interpretability by design or by exploiting post-hoc techniques. On the other hand, the privacy requirement poses serious challenges to the training stage of data-hungry ML models and may require revisiting the learning paradigms: Federated Learning (FL) [7] has recently been proposed as a solution to enable collaborative training of AI models while preserving the privacy of data owners.

In this work we approach the problem of QoE prediction in future B5G/6G networks from different perspectives. Specifically, the contributions of this paper are threefold: i) we use realistic simulations to make available a dataset that includes QoS and QoE metrics for a video-streaming application in a 5G network environment; ii) we present a preliminary analysis on the above dataset, exploiting an XAI model to predict QoE; iii) we argue that FL is suitable to predict QoE in a privacy-preserving way, and we discuss how 6G networks should support it.

The rest of this paper is organized as follows: Section 2 formulates our problem and describes the dataset we obtained from realistic 5G network simulations. A preliminary approach for QoE prediction with an XAI model is presented in Section 3, whereas Section 4 highlights the pathway towards FL in future 6G networks. Finally, we draw conclusions in Section 5.

2. QoE prediction in B5G/6G networks

In this section, we present the problem of QoE prediction in B5G/6G networks. We also present a simulated QoE dataset and discuss the assumptions we made to generate it.

2.1. Problem statement

We consider an automotive environment where connected vehicles are User Equipments (UEs) of the mobile network, and are attached to their respective base station, or gNodeB (gNB). UEs play real-time video streams whose perceived quality is a relevant factor to determine the availability of some advanced driving assistance system. For example, in a see-through application, a vehicle receives the live video acquired by the camera of the vehicle in front of it, which helps the driver overtake.
However, such a service is only safe if one can rely on the video being displayed continuously and with high quality for the entire duration of the maneuver, which may require several seconds. Thus, in order to decide whether to begin the maneuver at all, we need to predict the QoE perceived by UEs in the near future, by leveraging real-time QoS and QoE data generated by the UEs themselves.

We assume that each UE collects both QoS (e.g., SINR) and QoE (e.g., percentage of frames that are correctly displayed) metrics while receiving the video stream. Context information, such as the UE's position and speed, can also be collected and used as factors to train an ML model. The above metrics are collected by each UE at discrete time periods, hence for each metric the UE generates a vector whose i-th element is the value of the metric collected at time i. Network-wide metrics, such as the average utilization of the cell, can also be leveraged to provide more meaningful predictions. All the above metrics gathered by the UE and the network will be used to predict the value of a target QoE metric (or a set thereof) at a time in the future.

To support the above prediction, we need to train an ML model with a dataset that includes realistic QoS and QoE metrics obtained from the mobile network. Since mobile network operators are hardly willing to share data about their users and network, we resorted to network simulations to generate the relevant dataset.

2.2. Generation of the dataset

Simulations are carried out with Simu5G [8], an open-source model library for the OMNeT++ simulation framework2 that enables the evaluation of end-to-end performance of 5G-enabled applications. Within Simu5G, we implemented a client-server video-streaming application. The server sends a video stream to the client following a trace-based approach: sending rate, size and type of video frames are read from a trace file generated from real videos via a dedicated command of the FFmpeg library3. Traces were obtained from three dash-camera videos, so as to reproduce a see-through scenario4. Before sending the frames over the network, the server fragments them into packets, which are then transmitted via the Real-time Transport Protocol (RTP). RTP packets received by the client are played out at their corresponding playout time. We configured the client with a 100 ms playout delay: this is a tradeoff between the timeliness of the video stream and the buffering time required to prevent stalls.

Figure 1 shows the simulation scenario. Seven gNBs are deployed in a regular hexagonal grid with an inter-gNB distance of 500 m. Fifteen UEs are deployed randomly over the floorplan and connect to the gNB they receive the highest power from. Each of them runs the client side of the video-streaming application, whereas their server-side counterparts reside on a remote host connected to the 5G core network. Each client receives a different video trace, obtained by starting one of the three above-mentioned dash-cam videos at different times. To create realistic load conditions, each gNB also sends 50 kB/s of downlink traffic to 30 background UEs. We also simulate an additional tier of background cells, each serving 30 background UEs, in order to generate realistic interference to the UEs attached to the seven central gNBs [9]. We run 24 independent replicas of a 120-second simulation, collecting time-tagged metrics from the 15 UEs in the seven central cells.

2 OMNeT++ Website: https://omnetpp.org, accessed May 2022
3 ffprobe -show_frames: online documentation https://ffmpeg.org/ffprobe.html, accessed May 2022
4 https://bit.ly/3iN651q, https://bit.ly/35n9elO, https://bit.ly/3IT5g24, accessed May 2022
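To make the frame-level QoE metrics of Section 2.1 more concrete, the following minimal Python sketch shows one plausible way a fixed playout delay maps per-packet arrival times onto a displayed-frames ratio, in the spirit of the framesDisplayed metric described below. It is an illustration only: the data layout and the all-packets-on-time rule are assumptions, as the actual metric is computed within Simu5G.

```python
# Illustrative sketch (assumptions, not Simu5G code): how a fixed playout
# delay turns per-packet arrival times into a displayed-frames ratio akin
# to the framesDisplayed QoE metric. Frame/packet timings are hypothetical.

PLAYOUT_DELAY = 0.100  # seconds, the client-side playout delay used above

def frames_displayed_ratio(frames):
    """frames: list of (send_time, [rtp_packet_arrival_times]) per frame.
    A frame counts as displayed only if all of its RTP packets arrive
    before its playout time (send time + playout delay)."""
    displayed = 0
    for send_time, arrivals in frames:
        playout_time = send_time + PLAYOUT_DELAY
        if arrivals and max(arrivals) <= playout_time:
            displayed += 1
    return displayed / len(frames) if frames else 0.0

# Three frames sent about 33 ms apart (roughly 30 fps):
frames = [
    (0.000, [0.020, 0.045]),  # all packets in time -> displayed
    (0.033, [0.060, 0.150]),  # last packet past playout time -> not displayed
    (0.066, [0.090]),         # in time -> displayed
]
print(frames_displayed_ratio(frames))  # 0.666...
```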
Figure 1: Simulation scenario

Metrics are summarized in Table 1.

Table 1
Description of the metrics included in the dataset.

Name                  | Level       | Description
Context
UE position           | Application | (x, y, z) coordinates of the UE in the floorplan
UE speed              | Application | speed of the UE in m/s
QoS metrics
avgServedBlocksDl     | Network     | number of Resource Blocks occupied in downlink
averageCqiDl          | Network     | CQI values reported in DL
rcvdSinrDl            | Network     | SINR value measured at packet reception
servingCell           | Network     | ID of the new serving cell after the handover
frameSize             | Application | size of the displayed frame (Byte)
rtpPacketSize         | Application | size of the RTP packet (Byte)
end2EndDelay          | Application | time between transmission and reception of an RTP packet
interArrivalTimeRtp   | Application | interarrival time between two RTP packets
rtpLoss               | Application | RTP packets of the frame that were lost
QoE metrics
framesDisplayed       | Application | percentage of the frame arrived at the time of its display
playoutBufferLength   | Application | frame buffer size
firstFrameElapsedTime | Application | 3 values: 1) timestamp of the UE request, 2) timestamp of the sender ACK, 3) time between the request and the first frame displayed

The resulting dataset consists of 5568 rows, each being a tuple with six fields: run is the ID of the replica; network_parameters includes the variables describing the simulation configuration (e.g., the scheduling algorithm); module is the network entity (e.g., ue[0]) that recorded the metric; statistic is the name of the recorded metric; values is a vector including the values recorded for the above metric; timestamp is a vector whose elements are the timestamps of the corresponding elements of the values vector. The dataset is available at http://www.iet.unipi.it/g.nardini/ai6g_qoe_dataset.html.
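As a starting point for working with the dataset, the sketch below shows one plausible way to load and inspect it. The file name and the on-disk serialization of the values and timestamp vectors (JSON-encoded strings in a CSV) are assumptions, to be adapted to the format actually distributed at the URL above.

```python
# Sketch for inspecting the dataset (file name and CSV/JSON layout assumed).
import json
import pandas as pd

df = pd.read_csv("ai6g_qoe_dataset.csv")  # hypothetical file name

# Each row holds one metric recorded by one module in one simulation run:
# run, network_parameters, module, statistic, values, timestamp
row = df.iloc[0]
values = json.loads(row["values"])         # recorded samples
timestamps = json.loads(row["timestamp"])  # one timestamp per sample
print(row["run"], row["module"], row["statistic"], len(values))

# Example: QoE target timeseries of a given UE across all runs
target = df[(df["module"] == "ue[0]") & (df["statistic"] == "framesDisplayed")]
```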
3. QoE prediction as a regression problem

Although the forecasting literature has traditionally been dominated by statistical methods based on linear processes, ML methods are currently gaining increasing attention due to their high modelling capability, especially on large datasets [10]. In this preliminary analysis we focus on the latter set of approaches and formulate our QoE prediction task as a regression problem. First, we describe the preprocessing steps designed to transform the original raw dataset into a regression dataset, suitable for the downstream adoption of traditional ML approaches. Then, we show the results of a first experimental analysis, using a Decision Tree as the regression model. The choice of this model is dictated by the explainability requirement for trustworthy AI. Tree-based models are generally considered among the most inherently explainable classification and regression models; their interpretability, however, depends on several factors [4], and a thorough analysis of this aspect will be key for future investigations.

3.1. Data preprocessing

We observe that the time-tagged metrics in the dataset are not aligned on the same timestamps, for two main reasons: i) only a few of them have a fixed sampling interval, and ii) some values are missing at specific times due to, e.g., connection drops. Thus, a preprocessing stage is required.

As a first step, the timeseries available to the UEs have been identified. It is worth underlining that the avgServedBlocksDl statistic can be retrieved by the UEs based on the available information on the serving cell. In other words, for each UE we build a new timeseries, namely avgServedBlocksDl_UE, by exploiting the handover information and concatenating the avgServedBlocksDl timeseries fragments taken from the cells serving the UE itself. For example, a UE may obtain this information through the services available in a MEC-enabled architecture.

Figure 2: Preprocessing steps: the QoE prediction task as a regression problem.

Fig. 2 shows the procedure used for a preliminary analysis with ML techniques for the QoE-prediction task. As an example, the first ten seconds of the timeseries of three metrics are shown, namely positionX, rcvdSinrDl and framesDisplayed (the QoE target metric). To obtain each record of the preprocessed dataset, we compute statistics within a window of size W over the historical data of each variable. Specifically, mean, median, max, min, variance, standard deviation, kurtosis, skewness, Q1 and Q3 are computed, and the number of samples used for the estimates is stored. In future developments the actual trends of the variables can also be considered. The associated target value is the mean of the framesDisplayed variable over a time horizon of size H (one-step-ahead forecasting). The subsequent record is obtained by sliding the two windows with a step H. Each instance is thus represented in R^132 (11 statistics evaluated over a window of size W, for each of 12 timeseries) and is associated with the target QoE (average value of framesDisplayed over the window of size H). In this analysis we focus on timeseries metrics, therefore we do not include the values of firstFrameElapsedTime in the model.
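A minimal sketch of this windowing procedure is reported below. It assumes the per-UE timeseries have already been aligned on a common 1 s grid (the alignment step itself is omitted); the column names and DataFrame layout are illustrative.

```python
# Sketch of the windowing of Fig. 2, under the assumption of pre-aligned
# timeseries (one column per metric, one row per second).
import numpy as np
import pandas as pd
from scipy.stats import kurtosis, skew

def window_features(x: np.ndarray) -> dict:
    """The 11 per-window statistics described above."""
    return {
        "mean": np.mean(x), "median": np.median(x),
        "max": np.max(x), "min": np.min(x),
        "var": np.var(x), "std": np.std(x),
        "kurt": kurtosis(x), "skew": skew(x),
        "Q1": np.percentile(x, 25), "Q3": np.percentile(x, 75),
        "counter": len(x),  # number of samples used for the estimates
    }

def build_regression_dataset(ts: pd.DataFrame, target: str, W: int, H: int):
    """Slide a window of W samples over ts; the target is the mean of the
    QoE metric over the next H samples; the stride is H."""
    X, y = [], []
    for start in range(0, len(ts) - W - H + 1, H):
        hist = ts.iloc[start:start + W]
        feats = {}
        for col in ts.columns:
            stats = window_features(hist[col].to_numpy())
            feats.update({f"{col}_{k}": v for k, v in stats.items()})
        X.append(feats)
        y.append(ts[target].iloc[start + W:start + W + H].mean())
    return pd.DataFrame(X), np.array(y)

# With 12 aligned timeseries this yields 132 features per instance, e.g.:
# X, y = build_regression_dataset(aligned_ts, "framesDisplayed", W=10, H=1)
```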
3.2. Preliminary experimental analysis: setup and results

The dataset has been divided into training and test sets for a first experimental evaluation campaign: 20 runs are grouped to form the training set, whereas the remaining 4 runs represent the test set. A distinctive trait of the dataset is that some values are missing (e.g., because a simulated UE lost connectivity at some point). For the purpose of the present work, we simply discard such records from both the training and the test sets. As for the choice of the XAI model, we resort to the Python implementation of the Decision Tree (DT) for regression available in scikit-learn5. We set the window sizes W = 10 s and H = 1 s and experimented with different split criteria. For the sake of brevity, we only report the results of the best configuration, obtained by using MSE as the split criterion, 0.01 as the fraction of samples required to split an internal node, and 0.001 as the fraction of samples required at a leaf node. Table 2 reports the performance of the trained model, measured in terms of MSE, MAE and coefficient of determination (R^2), for different values of the maximum depth of the DT (5, 10, 15); we also report the complexity of the models, measured in terms of number of nodes, number of leaf nodes and number of features selected in the induced tree.

5 https://scikit-learn.org/, accessed May 2022

Table 2
Global results and model complexity for different values of maximum DT depth. Regression metrics are evaluated on the whole training and test sets. Best values per column are marked with an asterisk.

Max Depth | MSE (train) | MSE (test) | MAE (train) | MAE (test) | R^2 (train) | R^2 (test) | Nodes | Leaves | Features selected
5         | 0.1038      | 0.1105     | 0.2581      | 0.2661     | 0.4080      | 0.3565     | 57    | 29     | 18
10        | 0.0817      | 0.1019*    | 0.2134      | 0.2418*    | 0.5340      | 0.4065*    | 303   | 152    | 65
15        | 0.0791*     | 0.1040     | 0.2082*     | 0.2424     | 0.5489*     | 0.3944     | 401   | 201    | 69

The reported results should be regarded as a first baseline on our QoE forecasting dataset. It is interesting to note that the DT of depth 10 achieves the best generalization capability, with a MAE on the test set lower than 25%. A further increase in depth brings no benefit: the complexity of the model increases, leading only to more severe overfitting. Conversely, the most compact DT exhibits a slight reduction in the regression metrics but is significantly less complex, and therefore more interpretable, compared to deeper trees. In the following, a rule extracted from the most compact DT is shown:

IF framesDisplayed_skew ≤ 0.49 AND interArrivalTimeRtp_skew ≤ 0.40 AND interArrivalTimeRtp_counter ≤ 0.37 AND interArrivalTimeRtp_max ≤ 0.09 AND frameSize_Q3 ≤ 0.00 THEN framesDisplayed = 0.08.

To better assess the quality of the model predictions, it is worth looking at the timeseries: Fig. 3 reports example QoE timeseries for two UEs, featuring both the ground truth and the values predicted by the most accurate model (depth 10). The visual analysis of Fig. 3 suggests that the model provides reasonable predictions in different scenarios, namely when the timeseries are affected by a single event of QoE degradation or by several of them. In the latter case, regression performance is comparatively lower, likely due to the latency of the model in capturing the transition between QoE levels. The detection of such transitions represents one of the most significant challenges of the problem at hand.

Figure 3: Real and predicted values of QoE for two example UEs of the test set.

Although the preliminary results can be considered promising, we emphasize a few aspects that deserve further investigation. First, a comparative analysis covering multiple models and parameter configurations should be carried out; in particular, the impact of the choice of the window sizes W and H must be assessed. Second, an appropriate strategy for handling missing values should be devised. Third, an analysis of which metrics are relevant for the QoE forecasting task is needed. Finally, it should be noted that the adopted approach stems from a strong assumption: we build a global training set by collecting data produced by different sources. In the following section we present a solution to tackle the QoE prediction task when this last assumption is too strict or unachievable.
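For reference, the baseline configuration described in this section corresponds to a scikit-learn loop along the lines of the sketch below, where X_train, y_train, X_test and y_test are assumed to come from the windowing procedure of Section 3.1 with the 20/4 run split described above. Note that in recent scikit-learn versions the MSE split criterion is named "squared_error".

```python
# Sketch of the baseline experiment (data loading and splitting assumed done).
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

for max_depth in (5, 10, 15):
    dt = DecisionTreeRegressor(
        criterion="squared_error",  # MSE split criterion
        max_depth=max_depth,
        min_samples_split=0.01,     # fraction of samples to split an internal node
        min_samples_leaf=0.001,     # fraction of samples required at a leaf node
        random_state=0,
    )
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    print(max_depth,
          mean_squared_error(y_test, y_pred),
          mean_absolute_error(y_test, y_pred),
          r2_score(y_test, y_pred),
          dt.tree_.node_count, dt.get_n_leaves())
```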
4. Federated Learning for QoE prediction

Collecting peripheral data for processing and training on a centralized server is often impractical, due to the resulting communication overhead and the disclosure of UEs' private data. The preservation of data owners' privacy is a crucial requirement towards the realization of trusted AI-empowered B5G/6G networks: it thus becomes essential to leverage novel paradigms, such as FL, that enable collaborative model training between UEs without any sharing of raw data with each other or with other parties. FL perfectly fits the scenario described in this paper. In the following we outline the challenges associated with the fulfillment of the trustworthiness requirement through the federated learning of inherently explainable AI models (Fed-XAI), and we define an architecture to support Fed-XAI operations in a 6G framework.

4.1. Enhancing users' trust: Federated Learning of XAI models

As anticipated in Section 1, the realization of trustworthy AI entails compliance with several requirements, including privacy and transparency [5]. While the privacy requirement is natively satisfied by the FL paradigm, the transparency one strongly depends on the specific model adopted and on its ability to provide explanations for any decision made. Most of the existing FL approaches leverage the federated setting for the collaborative training of neural networks (NNs) and deep learning (DL) models, which are often referred to as opaque or black-box models. Conversely, FL of explainable-by-design AI models, such as DTs and Rule-Based Systems (RBSs), has not been adequately investigated so far. The concept of Fed-XAI (i.e., FL of XAI models) aims to fill this gap, enhancing users' trust in AI-empowered future 6G networks. Whenever the FL process is orchestrated by a central entity, a possible implementation of Fed-XAI consists of (i) local learning of XAI models by the data owners, (ii) transmission of the local models to the central server, (iii) model aggregation by the central server, and (iv) transmission of the global model to the data owners for local inference. The aggregation step is the major challenge towards Fed-XAI: appropriate procedures for merging DTs and RBSs must be devised, as their learning stage is not based on the optimization of a differentiable global objective function (as is the case with NNs and DL models), and the well-established federated averaging protocol (FedAvg) [7], designed for collaborative gradient-based optimization, cannot be immediately applied.
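To fix ideas, the sketch below mirrors steps (i)-(iv) for a rule-based regression model. The rule representation and the merge-by-union aggregation are deliberately naive placeholders, since devising sound aggregation strategies for DTs and RBSs is exactly the open problem discussed above; the client interface (fit_local_rulebase, receive) is likewise hypothetical.

```python
# Conceptual sketch of one centrally orchestrated Fed-XAI round; the
# aggregation shown here is a naive placeholder, not a proposed solution.
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # e.g. (("rcvdSinrDl_mean", "<=", 0.3), ...)
    consequent: float  # predicted QoE value (e.g., framesDisplayed)

def aggregate(local_rulebases):
    """Union of all local rules, averaging the consequents of rules that
    share the same antecedent. Unlike FedAvg, there is no gradient or
    weight vector to average: structural merging is the open challenge."""
    by_antecedent = {}
    for rulebase in local_rulebases:
        for rule in rulebase:
            by_antecedent.setdefault(rule.antecedent, []).append(rule.consequent)
    return [Rule(a, sum(c) / len(c)) for a, c in by_antecedent.items()]

def fed_xai_round(clients):
    local_models = [c.fit_local_rulebase() for c in clients]  # (i) local learning
    global_model = aggregate(local_models)   # (ii) upload + (iii) aggregation
    for c in clients:
        c.receive(global_model)              # (iv) global model broadcast
    return global_model
```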
4.2. 6G network support to Federated Learning

We envision that UEs of future 6G networks will participate in FL processes following the as-a-service paradigm. To this aim, the 6G network must provide new protocols that handle the interactions among the entities involved in the FL framework, such as querying the list of available FL processes and joining one of them, as well as participating in the training and obtaining a model. In the following, we refer to an FL process as a collaborative learning task dedicated to a specific application (e.g., QoE prediction for automotive applications). Our proposed logical architecture for Fed-XAI in 6G networks is depicted in Fig. 4.

Figure 4: Fed-XAI architecture

Each UE is supported by a Fed-XAI local manager, which interacts with the FL framework on behalf of the UE application and manages both the learning and inference modules of the UE. When the UE wants to join or leave an FL process, its Fed-XAI local manager queries the Fed-XAI service provider, which is the module that maintains the overall view of the FL processes available in the system. The Fed-XAI service provider orchestrates the entities that actually execute the FL processes. In particular, each active FL process is composed of two modules, i.e. the Fed-XAI controller and the Fed-XAI computation engine. The former manages control-plane interactions with the Fed-XAI service provider (e.g., authorization grants) and the Fed-XAI local manager, whereas the latter acts as the FL aggregator. Indeed, the Fed-XAI computation engine exchanges local and global model updates with the learning submodules of the UEs' Fed-XAI local managers, which in turn act as FL collaborators. Notably, the deployment of the above entities is flexible: the Fed-XAI service provider may reside either in the cloud or at the edge of the 6G network, while the Fed-XAI local manager may reside either at the UE device or at the edge. This last option may be necessary for resource-constrained UEs, e.g. IoT devices.

5. Conclusions

In this work, we have presented a novel dataset obtained through realistic 5G network simulations for QoE forecasting in B5G/6G networks. We have discussed some preliminary QoE forecasting results achieved by a Decision Tree as an inherently explainable model, and experimentally highlighted the adequacy of the adopted approach as a baseline for the QoE forecasting task. Finally, we have discussed the implications of extending the XAI model towards an FL approach, from both an algorithmic and a network perspective. Future work will include designing a Fed-XAI-based approach to tackle the prediction of QoE in B5G/6G networks, as well as evaluating the impact of network transport on the performance of XAI models, and vice versa.

Acknowledgments

We acknowledge the support of: the Italian Ministry of University and Research (MIUR), in the framework of the Cross-Lab project (Departments of Excellence) and PON 2014-2021 "Research and Innovation", DM MUR 1062/2021, Project title: "Progettazione e sperimentazione di algoritmi di federated learning per data stream mining"; the Center for Logistic Systems of Livorno; the EU Commission through the H2020 project Hexa-X (Grant no. 101015956).

References

[1] C-V2X Use Cases and Service Level Requirements Vol. I, Technical Report, 5GAA, 2020.
[2] C-V2X Use Cases and Service Level Requirements Vol. II, Technical Report, 5GAA, 2021.
[3] V. Vasilev, J. Leguay, S. Paris, L. Maggi, M. Debbah, Predicting QoE Factors with Machine Learning, in: 2018 IEEE Int'l Conf. on Communications (ICC), 2018, pp. 1–6.
[4] A. Renda, P. Ducange, G. Gallo, F. Marcelloni, XAI Models for Quality of Experience Prediction in Wireless Networks, in: 2021 IEEE Int'l Conf. on Fuzzy Systems (FUZZ-IEEE), IEEE, 2021, pp. 1–6.
[5] European Commission, Directorate-General for Communications Networks, Content and Technology, Ethics guidelines for trustworthy AI, Publications Office, 2019.
[6] A. B. Arrieta, et al., Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion 58 (2020) 82–115.
[7] Q. Yang, Y. Liu, T. Chen, Y. Tong, Federated machine learning: Concept and applications, ACM Trans. on Intelligent Systems and Technology (TIST) 10 (2019) 1–19.
[8] G. Nardini, D. Sabella, G. Stea, P. Thakkar, A. Virdis, Simu5G: An OMNeT++ Library for End-to-End Performance Evaluation of 5G Networks, IEEE Access 8 (2020) 181176–181191.
[9] G. Nardini, G. Stea, A. Virdis, Scalable Real-Time Emulation of 5G Networks With Simu5G, IEEE Access 9 (2021) 148504–148520.
[10] V. Cerqueira, L. Torgo, C. Soares, Machine learning vs statistical methods for time series forecasting: Size matters, arXiv preprint arXiv:1909.13316 (2019).