<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>First Workshop on Online Learning from Uncertain Data Streams, July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hoeffding Regression Trees for Forecasting Quality of Experience in B5G/6G Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>José Luis Corcuera Bárcena</string-name>
          <email>joseluis.corcuera@phd.unipi.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pietro Ducange</string-name>
          <email>pietro.ducange@unipi.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Marcelloni</string-name>
          <email>francesco.marcelloni@unipi.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Renda</string-name>
          <email>alessandro.renda@unipi.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Ruffini</string-name>
          <email>fabrizio.ruffini@ing.unipi.it</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Pisa</institution>
          ,
          <addr-line>Largo Lucio Lazzarino 1, 56122 Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>18</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Online data stream analysis is becoming increasingly relevant, as the focus of everyday data analysis shifts from offline processing to the real-time acquisition and modeling of massive data from remote devices. In this paper, we focus on the domain of telecommunications, in particular on video streaming services for moving devices (e.g., a passenger watching a movie during a car trip). Since the streaming service must provide the user with a satisfactory level of quality of experience, it is important to predict upcoming problems in video quality. We used the well-known Hoeffding Decision Tree (HDT) for streaming data, tailored to regression problems, and compared its performance with that of standard Regression Trees (RTs) to evaluate the potential of HDTs for forecasting the quality of experience in terms of accuracy, learning time, and memory usage. Results show that, during the online learning process, the standard RT outperforms HDT in terms of accuracy, but is prone to under-performance in terms of time and memory when applied to potentially massive data streaming scenarios.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Stream Mining</kwd>
        <kwd>Regression Tree</kwd>
        <kwd>QoE forecasting</kwd>
        <kwd>Explainable AI</kwd>
        <kwd>Hoeffding Decision Tree</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Quality of Experience (QoE) is a measure of end-user satisfaction in enjoying a service and is
typically used in the context of telecommunications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The fulfillment of QoE metrics is a
primary goal in current, i.e., fourth and fifth generations, and future mobile networks. Beyond
5G (B5G) and 6G networks are indeed currently under development as pointed out, for instance,
by the commitment of institutions, industry and academia in the framework of international
projects such as Hexa-X. Such next generation wireless networks are expected to be much
more complex than current ones and will support innovative functionalities such as holographic
communication, high precision manufacturing, and smart automotive applications [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Notably,
the capability to play high-definition videos in real-time may represent a key enabler toward
such new functionalities. Thus, being able to forecast the perceived quality of video experience
may be fundamental to avoid the degradation of end-users’ satisfaction or to determine whether
a specific functionality should be provided or not.
      </p>
      <p>
        In the context of video streaming services, QoE metrics include startup delay, rebuffering
events and video quality [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] which clearly depend on contextual factors and typically vary
over time; several works [
        <xref ref-type="bibr" rid="ref1 ref3 ref4">1, 3, 4</xref>
        ] have recently addressed the QoE prediction task by
exploiting Machine Learning (ML) techniques and leveraging Quality of Service (QoS) metrics, i.e.,
quantitative measures that characterize the service offered by the network, such as packet
loss and channel quality. Interestingly, only one of these works [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] has framed the issue of
QoE prediction as a timeseries forecasting problem, yet disregarding important challenges of
data stream mining: the whole dataset is typically not available for offline processing and the
distribution of data may change over time due to a phenomenon known as concept drift [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
making it essential to adapt the model to avoid performance degradation.
      </p>
      <p>
        In the last decades, various approaches for incremental learning of ML models have been
proposed; here, we focus on the field of eXplainable Artificial Intelligence (XAI) and specifically
on a class of inherently interpretable models, capable of explaining, by design, how decisions
have been taken. Indeed, transparency (i.e., the capability of understanding the structure of
the model itself) represents a key requirement towards trustworthy AI [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] which in turn is
deemed as a major pillar in the design of next generation wireless networks. In this framework,
the Hoeffding Decision Tree (HDT) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] represents a reference approach: it has been widely
exploited for both classification and regression tasks. In the context of classification tasks, HDT
has also recently been extended with fuzziness to handle vague and noisy data and enhance
interpretability [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>In this paper, we present a preliminary experimental evaluation of the Hoeffding Regression
Tree (HRT) for a QoE forecasting task in the frame of next generation wireless networks:
specifically, we rely on a recently published QoS-QoE forecasting dataset and compare the
performance of HRT and of the classical Regression Tree (RT) from different perspectives: modelling
capability, training time, and required memory.</p>
      <p>The rest of the paper is organized as follows: in Section 2 we summarize the key aspects of
the HRT model; in Section 3 we describe the experimental setup, highlighting the different learning
schemes being compared and the evaluation strategies. Section 4 reports the experimental
results, whereas in Section 5 we analyze the sensitivity of the HRT model to the hyperparameter
configuration and to the order of the data chunks. Finally, Section 6 draws some conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Hoeffding Regression Tree: background</title>
      <p>
        HDT, also known as “Very Fast Decision Tree” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], is a reference model to solve classification
problems over an input data stream. In a nutshell, it allows a binary decision tree to be grown
incrementally: a leaf is considered for a split only if it contains a minimum number of samples
and a condition based on Hoeffding's theorem is met. The theorem guarantees, with a
given level of confidence, that the selected attribute would have been the same in the case
of an infinite number of available samples. In the case of classification, the condition is met
when the difference between the two highest information gain values computed for
the attributes available at the leaf node is higher than a bound, dubbed the Hoeffding bound.
Although the adoption of Hoeffding's theorem in relation to the splitting criterion has
received some criticism [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], HDT generally provides satisfactory results and can be regarded as
a valid heuristic method.
      </p>
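      <p>HRT represents an adaptation of HDT to solve regression problems given an input data
stream. Unlike its classification counterpart, HRT relies on the reduction of variance
of the target variable to decide among the splitting candidates. Let <inline-formula><tex-math>\Delta\mathrm{Var}(A_1)</tex-math></inline-formula> and <inline-formula><tex-math>\Delta\mathrm{Var}(A_2)</tex-math></inline-formula> be the
reduction of variance associated with the best and the second-best splitting attribute, respectively.
The Hoeffding condition for a leaf node <italic>L</italic> is defined as follows:</p>
      <disp-formula id="eq1"><label>(1)</label><tex-math>\frac{\Delta\mathrm{Var}(A_2)}{\Delta\mathrm{Var}(A_1)} &lt; 1 - \varepsilon_L</tex-math></disp-formula>
      <p>where the term <inline-formula><tex-math>\varepsilon_L</tex-math></inline-formula>, i.e., the Hoeffding bound for the leaf node <italic>L</italic>, is evaluated according to the
following equation:</p>
      <disp-formula id="eq2"><label>(2)</label><tex-math>\varepsilon_L = \sqrt{\frac{\ln(1/\delta)}{2\,n_L}}</tex-math></disp-formula>
      <p>where <inline-formula><tex-math>\delta</tex-math></inline-formula> (split confidence) is equal to 1 minus the desired probability of choosing the correct
attribute, and <inline-formula><tex-math>n_L</tex-math></inline-formula> is the number of samples in node <italic>L</italic>.</p>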
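      <p>As an illustration only, the split test defined above can be sketched in Python. This is a minimal sketch under our own assumptions, not the scikit-multiflow code: the function names are hypothetical, the variance-reduction ratio is assumed to have range 1 (so no range factor appears in the bound), the split confidence of 10<sup>−7</sup> is an assumed value, and the tie-threshold fallback (default 0.05, see Section 5.1) is added to show how near-ties are eventually broken.</p>

```python
import math


def hoeffding_bound(n_samples: int, delta: float) -> float:
    """epsilon_L = sqrt(ln(1/delta) / (2 * n_L)) for a leaf holding n_L samples."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))


def should_split(best_reduction: float, second_reduction: float,
                 n_samples: int, delta: float = 1e-7,
                 tie_threshold: float = 0.05) -> bool:
    """Hoeffding condition on the variance-reduction ratio (sketch).

    Split if second/best falls below 1 - epsilon_L, or if epsilon_L itself
    has shrunk below tie_threshold (the two candidates are considered tied).
    """
    if not best_reduction > 0.0:
        return False  # no candidate reduces the target variance
    eps = hoeffding_bound(n_samples, delta)
    ratio = second_reduction / best_reduction
    return 1.0 - eps > ratio or tie_threshold > eps


# With 200 samples (the grace period), epsilon_L is about 0.20, so the
# second-best candidate must reduce variance by less than ~80% of the best.
print(round(hoeffding_bound(200, 1e-7), 3))  # 0.201
print(should_split(1.0, 0.5, 200))           # True
print(should_split(1.0, 0.9, 200))           # False
```

      <p>Under these assumptions the bound drops below the tie threshold from about 3,224 samples on, after which the leaf splits even when the two candidates are almost equally good.</p>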
      <p>
        The value assigned to a leaf node is the average of the target values of the training samples
contained in it and, given an incoming input sample, is used to predict the output
at inference time. As with any tree-based model, HRT features a high level of interpretability,
which is a crucial requirement in many applications, including those within next generation
wireless networks. Thus, we adopt HRT to tackle our QoE forecasting problem, leveraging
an implementation available in the scikit-multiflow library [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
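      <p>The leaf value and the variance reduction used in the split test can both be computed from running sums kept at each node, so no training sample needs to be stored. The following Python sketch shows this idea under our own assumptions: the class and function names are hypothetical, a binary split is assumed, and the variance reduction is taken as the parent variance minus the size-weighted variances of the two children. The scikit-multiflow implementation may compute its split merit differently.</p>

```python
from dataclasses import dataclass


@dataclass
class TargetStats:
    """Running statistics of the target values that reached a node."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, y: float) -> None:
        self.n += 1
        self.total += y
        self.total_sq += y * y

    def mean(self) -> float:
        """Leaf prediction: average target value of the samples in the leaf."""
        return self.total / self.n if self.n else 0.0

    def variance(self) -> float:
        """Population variance E[y^2] - E[y]^2 from the running sums."""
        if self.n == 0:
            return 0.0
        m = self.total / self.n
        return max(self.total_sq / self.n - m * m, 0.0)  # clip rounding noise


def variance_reduction(parent: TargetStats, left: TargetStats,
                       right: TargetStats) -> float:
    """Var(parent) minus the size-weighted variances of a binary split."""
    if parent.n == 0:
        return 0.0
    weighted = (left.n * left.variance() + right.n * right.variance()) / parent.n
    return parent.variance() - weighted


# Targets 1, 1, 3, 3 split perfectly into {1, 1} and {3, 3}.
parent, left, right = TargetStats(), TargetStats(), TargetStats()
for y in (1.0, 1.0, 3.0, 3.0):
    parent.update(y)
    (left if y == 1.0 else right).update(y)
print(parent.mean(), variance_reduction(parent, left, right))  # 2.0 1.0
```

      <p>For long streams, Welford's online algorithm is a numerically safer way to maintain the variance than raw sums of squares.</p>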
    </sec>
    <sec id="sec-3">
      <title>3. Experimental analysis</title>
      <p>In this section, we first introduce the problem and the dataset; then, we describe the models and
learning schemes involved in the experimental comparison. Finally, we provide details about
the experimental setup.</p>
      <sec id="sec-3-1">
        <title>3.1. Problem description: the QoE forecasting dataset</title>
        <p>
          As the scenario of our investigation, we consider the publicly available QoS-QoE forecasting
dataset (http://www.iet.unipi.it/g.nardini/ai6g_qoe_dataset.html, accessed June 2022), introduced in one of our previous works [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. A client-server video-streaming
application is simulated within Simu5G [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], a dedicated open-source model library for realistic
5G network simulations: while experiencing the video, each of the 15 simulated clients, also
referred to as user equipment (UE), measures or collects a set of time-tagged QoS and QoE
metrics. We formulate the QoE prediction task as in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], replicating the preprocessing and
feature extraction steps. Specifically, a simulation lasts approximately 120 seconds: for each
user, during this time frame, we collect the timeseries of 12 metrics (QoS, QoE and
contextual). Then, we obtain each tuple of the preprocessed dataset as follows: for a timestamp
<italic>t</italic>, the input variables consist of 11 statistics (i.e., mean, median, max, min, variance, standard
deviation, kurtosis, skewness, Q1 and Q3, number of samples) measured for each metric in the
time window [<italic>t</italic> − <italic>W</italic>, <italic>t</italic>] (with <italic>W</italic> = 10), whereas the output variable is the mean of
the target QoE metric over a time horizon of one second (i.e., in [<italic>t</italic>, <italic>t</italic> + <italic>H</italic>], with <italic>H</italic> = 1). As
the target QoE metric, we consider the average percentage of frames that have arrived at the time of their
display. The subsequent tuple is obtained by sliding the two windows <italic>W</italic> and <italic>H</italic> with a step of
1 second. To summarize, each instance in the dataset is represented in ℝ<sup>132</sup>, resulting from 11
statistics evaluated over a window of size <italic>W</italic> on 12 timeseries. The 120-second video-streaming
simulation is repeated 24 times.
        </p>
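        <p>The preprocessing described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the preprocessing code of [4]: the function names and the input format (a dictionary mapping each metric name to a list of (timestamp, value) pairs) are hypothetical, moments are computed as population (biased) moments, kurtosis is the excess kurtosis, quartiles use the "inclusive" method of Python's statistics module, and metrics are concatenated in alphabetical order. A window with no samples raises an error; such tuples would have to be discarded, as done for missing values in the dataset.</p>

```python
from __future__ import annotations

import math
import statistics


def window_stats(values: list[float]) -> list[float]:
    """The 11 statistics of one metric inside a window, in the order:
    mean, median, max, min, variance, std, kurtosis, skewness, Q1, Q3, count."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance
    std = math.sqrt(var)
    if std > 0:
        skew = sum((v - mean) ** 3 for v in values) / n / std ** 3
        kurt = sum((v - mean) ** 4 for v in values) / n / var ** 2 - 3.0  # excess
    else:
        skew = kurt = 0.0  # constant window: shape statistics undefined
    if n >= 2:
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        q1 = q3 = values[0]
    return [mean, statistics.median(values), max(values), min(values),
            var, std, kurt, skew, q1, q3, float(n)]


def make_tuple(series: dict[str, list[tuple[float, float]]], target: str,
               t: float, W: float = 10.0, H: float = 1.0):
    """One (x, y) pair at timestamp t: x stacks window_stats over [t - W, t]
    for every metric; y is the mean target value over [t, t + H]."""
    x: list[float] = []
    for name in sorted(series):
        window = [v for ts, v in series[name] if t >= ts >= t - W]
        x.extend(window_stats(window))
    future = [v for ts, v in series[target] if t + H >= ts >= t]
    return x, statistics.fmean(future)
```

        <p>With the 12 metrics of the dataset, <monospace>x</monospace> has 11 × 12 = 132 entries, matching the ℝ<sup>132</sup> representation; advancing <monospace>t</monospace> by one second yields the subsequent tuple.</p>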
        <p>We consider the following setting: we aim to learn the mapping between QoS and QoE in
order to tackle the QoE forecasting problem. We assume that the data generated by different
UEs within a simulation can be gathered for training the model; however, the data from the
various simulations are not immediately available but arrive in chunks, each corresponding
to one of the 24 simulations. Basically, one can think of the 24 simulations as representing
temporally consecutive scenarios in which each UE experiences, from time to
time, different situations. The overall dataset consists of 28758 samples, with chunk sizes
ranging from 972 to 1466 samples (the variability is induced by the removal of missing values).
Such a setting demands ad-hoc strategies for incremental model training: in the following,
we describe two learning schemes based on classical RT and HRT models, respectively, along
with the evaluation strategies adopted for assessing the performance of the models.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Learning schemes and evaluation strategies</title>
        <p>Let chunk <italic>k</italic> denote the chunk of data of the <italic>k</italic>-th simulation, with <italic>k</italic> = 1, 2, ..., 24. Each chunk <italic>k</italic>
contains the samples of all the UEs from the <italic>k</italic>-th simulation. We compare two learning schemes
using two evaluation strategies.</p>
        <p>Learning schemes. HRT supports an incremental learning scheme, which consists of updating
the model at each incoming chunk. In other words, at each step <italic>k</italic> the model is updated
considering only the current chunk <italic>k</italic>. Conversely, the classical RT does not support an incremental
learning scheme: the model is retrained from scratch at each newly collected chunk of data.
At each step <italic>k</italic> the previous model is replaced with a new one trained on <inline-formula><tex-math>\bigcup_{i=1}^{k} \mathrm{chunk}_{i}</tex-math></inline-formula>, i.e., the
union of the chunks collected so far.</p>
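        <p>The two learning schemes can be contrasted with a short Python sketch. The method names <monospace>fit</monospace>, <monospace>partial_fit</monospace> and <monospace>predict</monospace> follow the usual scikit-learn style; the toy <monospace>MeanRegressor</monospace> is a hypothetical stand-in for HRT and RT that predicts the mean target seen during training, and it records the size of its last training set so that the growth of the retraining set is visible.</p>

```python
class MeanRegressor:
    """Hypothetical toy model: predicts the mean of the targets it was trained on."""

    def __init__(self):
        self.n, self.total, self.last_fit_size = 0, 0.0, 0

    def fit(self, X, y):  # batch training from scratch
        self.n, self.total = len(y), float(sum(y))
        self.last_fit_size = len(y)
        return self

    def partial_fit(self, X, y):  # incremental update
        self.n += len(y)
        self.total += float(sum(y))
        self.last_fit_size = len(y)
        return self

    def predict(self, X):
        return [self.total / self.n] * len(X)


def incremental_scheme(model, chunks):
    """HRT-like: at step k, update the same model with chunk k only."""
    for X_k, y_k in chunks:
        model.partial_fit(X_k, y_k)
        yield model


def retraining_scheme(make_model, chunks):
    """RT-like: at step k, train a new model on the union of chunks 1..k."""
    X_all, y_all = [], []
    for X_k, y_k in chunks:
        X_all.extend(X_k)
        y_all.extend(y_k)
        yield make_model().fit(X_all, y_all)


chunks = [([[0.0]] * 3, [1.0, 1.0, 1.0]), ([[0.0]] * 2, [4.0, 4.0])]
print([m.last_fit_size for m in incremental_scheme(MeanRegressor(), chunks)])  # [3, 2]
print([m.last_fit_size for m in retraining_scheme(MeanRegressor, chunks)])     # [3, 5]
```

        <p>For this toy model both schemes end with the same prediction, because a mean can be updated exactly; for trees the two schemes generally produce different models, and the training set of the retraining scheme keeps growing with every chunk.</p>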
        <p>Evaluation strategies. Both learning schemes are evaluated using two approaches widely
adopted in data stream applications. Prequential evaluation, or interleaved-test-then-train,
can be formalized as follows: once a new chunk <italic>k</italic> is collected (with <italic>k</italic> = 2, ..., 24), we first assess
the performance of the current model on chunk <italic>k</italic> and then exploit it to train/update the model.
For example, the first evaluation step consists of using the first chunk (chunk 1) for training and
chunk 2 for testing. Hold-out evaluation consists of assessing the performance of the model on a
fixed test set after each update with chunk <italic>k</italic>. To carry out this experiment, we assume
that 4 chunks are immediately available as test set (specifically: chunk 21, chunk 22, chunk 23,
and chunk 24). At each step of the analysis, the updated model is therefore always tested on the same
data.</p>
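        <p>Both evaluation strategies reduce to short loops. The Python sketch below assumes, hypothetically, a model exposing <monospace>partial_fit</monospace> and <monospace>predict</monospace>, and scores each step with the MAE; for the retraining scheme, the update call would instead refit a new model on all chunks seen so far. The toy <monospace>RunningMean</monospace> model is included only to make the example runnable.</p>

```python
class RunningMean:
    """Hypothetical toy model used only to exercise the loops."""

    def __init__(self):
        self.n, self.total = 0, 0.0

    def partial_fit(self, X, y):
        self.n += len(y)
        self.total += sum(y)

    def predict(self, X):
        return [self.total / self.n] * len(X)


def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def prequential(model, chunks):
    """Interleaved test-then-train: chunk 1 only trains, every later chunk
    is first used to test the current model and then to update it."""
    X_1, y_1 = chunks[0]
    model.partial_fit(X_1, y_1)
    errors = []
    for X_k, y_k in chunks[1:]:
        errors.append(mae(y_k, model.predict(X_k)))  # test first ...
        model.partial_fit(X_k, y_k)                    # ... then train
    return errors


def hold_out(model, train_chunks, X_test, y_test):
    """Update on each training chunk, then score on the same fixed test set."""
    errors = []
    for X_k, y_k in train_chunks:
        model.partial_fit(X_k, y_k)
        errors.append(mae(y_test, model.predict(X_test)))
    return errors


chunks = [([[0.0]] * 2, [2.0, 2.0]), ([[0.0]] * 2, [4.0, 4.0]), ([[0.0]] * 2, [0.0, 0.0])]
print(prequential(RunningMean(), chunks))                            # [2.0, 3.0]
print(hold_out(RunningMean(), chunks[:2], [[0.0]] * 2, [1.0, 1.0]))  # [1.0, 2.0]
```

        <p>In the setting described above, prequential evaluation consumes chunks 1 to 24, whereas hold-out evaluation trains on chunks 1 to 20 and tests on the union of chunks 21 to 24.</p>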
        <p>To summarize, in our experimental campaign, we refer to the various approaches using the
following notation:
• HRT-preq indicates the HRT model, i.e., incremental learning scheme, evaluated using
the prequential strategy.
• HRT-hold-out indicates the HRT model, i.e., incremental learning scheme, evaluated
using the hold-out strategy.
• RT-preq indicates the RT model, i.e., retraining learning scheme, evaluated using the
prequential strategy.
• RT-hold-out indicates the RT model, i.e., retraining learning scheme, evaluated using
the hold-out strategy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experimental setup</title>
        <p>
          Both HRT and classical RT have publicly available Python implementations: HRT is available
in scikit-multiflow<sup>3</sup>, whereas the classical RT is implemented in scikit-learn<sup>4</sup>. Tables
1 and 2 report the values of the main configuration parameters for the HRT and RT models, respectively. For
the former, we adopt the default parameter configuration, whereas for the latter we set the
parameters consistently with our previous study [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], pursuing a fair comparison of the results.
        </p>
        <p>We executed our experiments on a computer featuring an x86_64 architecture, 16 cores, Intel
Xeon Processor (Cascadelake) - 2.194GHz and 32GB RAM.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>In this section, we report the results of our experimental analysis from a threefold perspective:
regression metrics, memory used, and time for learning/updating the model.</p>
      <p><sup>3</sup> https://scikit-multiflow.readthedocs.io/en/latest/api/generated/skmultiflow.trees.HoeffdingTreeRegressor.html, accessed June 2022</p>
      <p><sup>4</sup> https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html, accessed June 2022</p>
      <sec id="sec-4-1">
        <title>4.1. Regression metrics and model complexity</title>
        <p>In the following, we compare the HRT models with their RT counterparts, considering the
prequential and the hold-out evaluation strategies independently.</p>
        <p>
          Figure 1 shows the trends of the Mean Absolute Error (MAE) along the sequence of processed
chunks for the two evaluation strategies, namely hold-out (Fig. 1a) and prequential
(Fig. 1b). In the hold-out setting, two “offline” versions of the decision tree described in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] are
considered as reference baselines. These models are generated considering a global training
dataset composed of all the training chunks (i.e., chunk 1 to chunk 20). The results of these “offline”
decision trees (RT-offline-5 and RT-offline-10, induced by setting the maximum depth to 5 and
10, respectively) are obtained by evaluating the models on the same hold-out test set made up of
the last 4 chunks of the dataset.
        </p>
        <p>In general, the RT models outperform their HRT counterparts throughout the
whole streaming update process. As regards the HRT-hold-out model (Fig. 1a), after
an initial “start-up” phase corresponding roughly to the first five chunks, it reaches a
plateau in performance on the hold-out test set, approaching, but not reaching,
the performance of the baseline models.</p>
        <p>As for the prequential evaluation strategy (Fig. 1b), HRT-prequential closely trails the
performance of RT-prequential: again, however, re-training the traditional model leads to
consistently superior performance compared to incremental training of HRT.</p>
        <p>Figure 1: (a) Hold-out evaluation strategy; (b) Prequential evaluation strategy.</p>
        <p>
          In the following, we discuss in detail the trend of the complexity of both the HRT and RT
models. As the training stage is analogous between the hold-out and prequential settings (at least
for the first 20 chunks), we only consider the former, but the same considerations apply to the
latter. Figure 2 shows the trends of the complexity of HRT-hold-out and RT-hold-out along
the sequence of processed chunks. As regards the number of nodes (Fig. 2a) and the number
of leaves (Fig. 2b), it is worth noticing that RT-hold-out is always more complex than
HRT-hold-out, by up to one order of magnitude. We can observe that the RT models entail a large
number of nodes even at the first chunks, while the number of nodes in the HRT models keeps
increasing steadily, almost linearly with the number of chunks. The depth of the tree (Fig. 2c) is
relevant as well, since it is associated with the maximum number of conditions in the antecedent of
the rules that can be extracted from the trees: we can observe that HRT-hold-out reaches
the same depth as RT-hold-out only at the end of the stream of chunks. We recall that, to obtain
an easier comparison, we constrained the RT models to a maximum depth equal to 10,
as per the best model we found in the previous study reported in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>Figure 2: (a) Number of nodes; (b) Number of leaves; (c) Tree depth.</p>
        <p>Tables 3 and 4 report the performance of the models for the hold-out and prequential
evaluation strategies, respectively, after training the models up to the final available chunk
(from chunk 1 to chunk 20 in the case of hold-out, and from chunk 1 to chunk 23 in the case of
prequential). The performance of the models is measured in terms of Mean Squared Error
(MSE), MAE, and coefficient of determination (R<sup>2</sup>). Furthermore, we report the complexity
of the models, measured in terms of number of nodes, number of leaves, maximum depth, and number of
features selected by the induced tree.</p>
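        <p>For reference, the three regression metrics can be written out explicitly. The Python sketch below is our own minimal implementation (the function name is hypothetical); for non-constant targets it follows the same definitions as scikit-learn's <monospace>mean_squared_error</monospace>, <monospace>mean_absolute_error</monospace> and <monospace>r2_score</monospace>, namely R<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>.</p>

```python
def regression_metrics(y_true, y_pred):
    """MSE, MAE and coefficient of determination R^2."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = n * mse                                     # residual sum of squares
    # R^2 is undefined for a constant target; NaN is returned in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "MAE": mae, "R2": r2}


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
# {'MSE': 0.25, 'MAE': 0.25, 'R2': 0.8}
```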
        <p>Table 3: Regression metrics (MSE, MAE, R<sup>2</sup>) for the hold-out evaluation strategy; rows: HRT-hold-out, RT-hold-out, RT-offline-10, RT-offline-5.</p>
        <p>As expected, the results obtained with the RT-hold-out learning scheme after processing the
final chunk (i.e., chunk 20) are equivalent to those obtained with the more complex of
the two baselines, namely RT-offline-10: in fact, the last step of the RT-hold-out strategy is
essentially the same scenario as the baseline strategy, where the whole dataset (from chunk 1 to
chunk 20) is used for training.</p>
        <p>In general, results confirm that the HRT-hold-out and HRT-preq strategies are characterized
by a worse performance in terms of MAE, MSE, and R<sup>2</sup> than their RT counterparts. However,
HRT models are characterized by the lowest levels of complexity in terms of number of nodes,
number of leaves, number of selected features, and maximum depth of the trees, thus ensuring a
higher level of interpretability than RT models.</p>
        <p>Figures 3 and 4 report examples of QoE test timeseries for different UEs, overlaying the
ground truth with the values predicted by the different models on the test datasets
after processing the last available chunk of training data. The visual analysis suggests that
the different models provide reasonable predictions in different conditions; in particular, the
HRT-based models show a worse predictive performance than their RT counterparts, possibly
due to their lower complexity.</p>
        <p>Figure 3: (a) HRT-hold-out; (b) RT-hold-out.</p>
        <p>To summarize, the performance of the HRT models is 10-26% worse
(depending on whether the MAE, MSE or R<sup>2</sup> metric is considered) than that of the corresponding RT
models. However, this decrease in performance is counterbalanced by the learning time
and memory usage, which are aspects of utmost importance in a streaming scenario; for
this reason, they are detailed in the following.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Memory occupancy</title>
        <p>Figure 5 shows the training set sizes (i.e., the number of samples) used for updating the
RT and HRT models when processing a new chunk of data. We only discuss the prequential
setting: in the hold-out strategy, the learning phase is analogous and the same considerations
apply. As expected, the memory occupancy of the RT model rapidly exceeds that of the HRT model: in a
real case with a massive input data stream, this would lead to large training set sizes,
thus making the retraining learning scheme an impractical and very computationally intensive
approach.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Time for model updating</title>
        <p>Figure 6 reports the trends of the updating times for the RT and HRT models. Also in this
case, we only discuss the prequential setting. The plot shows that, after about 22 chunks, the
time for RT learning exceeds the time for HRT learning. This is important because, for the
HRT models, we need to reduce as much as possible the dependency of the training time on
the number of chunks, with the aim of ensuring minimum latency in the operative real-time
application. In addition, for the RT model we can observe an almost linear relationship between
the number of chunks and the learning time: in fact, a simple linear fit of the trend related
to the RT-preq model yields R<sup>2</sup> = 0.99 and p-value = 3.88 × 10<sup>−23</sup>. On the other hand, the HRT models do
not show a strong increasing behaviour as a function of the chunk number; their updating time may rather depend on
the size, namely the number of samples, of each single chunk.</p>
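        <p>The linearity check on the RT-preq updating times amounts to an ordinary least-squares fit of time against chunk index. The Python sketch below (the function name is hypothetical) returns the slope, the intercept and R<sup>2</sup>; the p-value additionally requires the Student t distribution and is not computed here. Where SciPy is available, <monospace>scipy.stats.linregress</monospace> reports both the correlation coefficient and the p-value of the slope.</p>

```python
def linear_fit(x, y):
    """Least-squares line y = intercept + slope * x and its R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Illustrative (made-up) timings that grow exactly linearly with the chunk index.
print(linear_fit([1, 2, 3, 4], [3.0, 5.0, 7.0, 9.0]))  # (2.0, 1.0, 1.0)
```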
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. HRT sensitivity analysis</title>
      <p>In this section we analyze the sensitivity of HRT models with respect to two aspects: parameter
setting and order of chunks in the streaming process.</p>
      <sec id="sec-5-1">
        <title>5.1. Sensitivity with respect to parameter configuration</title>
        <p>We analyzed the suitability of the default configuration of the HRT model
parameters, reported in Table 1. In particular, we compared the MAE values for different values
of the grace period and of the tie threshold, after the whole dataset has been incrementally
processed. We aim to analyze whether the default values of the model parameters are a “robust” choice
that ensures good performance. We recall that the grace period defines a threshold on the number of
instances contained in a leaf node before considering it for a split, whereas the tie threshold is
the threshold below which a split is forced to break ties. Intuitively, lower values of the grace
period and higher values of the tie threshold foster node splitting, thus leading to more
complex trees. Notably, hyperparameter tuning for finding the optimal parameter configuration is
not viable in an operative scenario, where the model cannot rely on a static dataset but rather
learns from an incoming data stream.</p>
        <p>Figures 7 and 8 show the MAE values on the test set for the HRT models. It is worth
highlighting that the value of the metric measured at the end of the training conveys only a
partial insight into the behaviour of the model, but can still be considered a proxy for the quality
of the parameter configuration. From the heatmaps, we can observe a slight indication of the
presence of a better-performing area, in the bottom right of the plot, where the grace period
and the tie threshold have values greater than 300 and 0.08, respectively. The boxplots show
that, for the default configuration (grace period = 200 and tie threshold = 0.05), the
MAE score, even if not optimal, lies below the median value for both evaluation strategies.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Sensitivity with respect to chunk order</title>
        <p>In HRT, the initial structure of the model (e.g., the root) is determined based on the initial
chunks and cannot be reassessed subsequently. As a consequence, the order of the chunks may
impact the performance of the model throughout the whole data stream. To quantify the
performance variation, we performed ten tests in which we randomly shuffled the order of the input
chunks. Figure 9 shows the MAE values for the last test dataset (i.e., after processing chunk 23 for
the prequential strategy and chunk 20 for the hold-out strategy), suggesting that the order of
input chunks does not significantly affect the resulting performance: the maximum (minimum)
MAE values differ by about 12% (5%) from the median value of the distribution.
Such variability is not negligible in absolute terms, but it is still comparable to the variations of
MAE values we observe, for instance, during model training in the prequential case (see Fig.
1b), after the first five “start-up” chunks.</p>
        <p>Figure 9: (a) HRT-prequential; (b) HRT-hold-out.</p>
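        <p>The chunk-order experiment can be outlined as follows. In this Python sketch, <monospace>run_experiment</monospace> is a hypothetical callable that trains an HRT on the given chunk order and returns the final MAE; the shuffles are drawn with a seeded generator, and the spread is summarized, as in the text, by the relative distance of the maximum and minimum MAE from the median.</p>

```python
import random
import statistics


def order_sensitivity(run_experiment, chunks, n_trials=10, seed=0):
    """Re-run an experiment on randomly shuffled chunk orders and report the
    relative deviation of the max and min score from the median."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        order = list(chunks)
        rng.shuffle(order)  # a new random order of the same chunks
        scores.append(run_experiment(order))
    med = statistics.median(scores)
    return {
        "median": med,
        "max_rel_dev": (max(scores) - med) / med,
        "min_rel_dev": (med - min(scores)) / med,
    }


# Toy stand-in: the "MAE" depends only on which chunk comes first.
print(order_sensitivity(lambda order: 1.0 + order[0] / 10, list(range(20))))
```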
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we have discussed an application of streaming methods to a realistic 5G network
simulation for QoE forecasting. We applied a Hoeffding Decision Tree for data stream regression
to predict upcoming QoE, and we compared the results with those of standard regression trees. From the
results, HRT models proved to be better strategies in terms of memory
usage and learning time, at the cost of lower accuracy than RT models. However,
we experimentally highlighted how HRT models, after an initial start-up phase in which the model
complexity increases, approach the performance of the standard RT models with comparable
complexity. This can be explained by the kind of strategy used by the Hoeffding Decision
Tree: by construction, the structure of the tree is strongly affected by the initial data input,
and the resulting tree is typically shallower than “traditional” decision trees.
These considerations represent the initial steps for future works, where ad-hoc methods could
be designed to take the discussed shortcomings into account. In particular, a further study
will aim to shed some light on the relationship between complexity and performance in
streaming approaches, by refining the tree updating strategy and investigating techniques to
select appropriate parameter configurations. Furthermore, we plan to assess whether concepts from
fuzzy set theory can help improve the performance of HRT models in this kind of application.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partly funded by the Italian Ministry of University and Research (MIUR),
in the framework of the Cross-Lab project (Departments of Excellence) and PON 2014-2021
“Research and Innovation”, DM MUR 1062/2021, Project title: “Progettazione e sperimentazione
di algoritmi di federated learning per data stream mining” and by the EU Commission through
the H2020 project Hexa-X (Grant no. 101015956).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vasilev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leguay</surname>
          </string-name>
          , S. Paris, L. Maggi,
          <string-name>
            <given-names>M.</given-names>
            <surname>Debbah</surname>
          </string-name>
          ,
          <article-title>Predicting QoE Factors with Machine Learning</article-title>
          ,
          <source>in: 2018 IEEE Int'l Conf. on Communications (ICC)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.1109/ICC.2018.8422609.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Sheth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tanwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>A taxonomy of AI techniques for 6G communication networks</article-title>
          ,
          <source>Comput. Commun.</source>
          <volume>161</volume>
          (
          <year>2020</year>
          )
          <fpage>279</fpage>
          -
          <lpage>303</lpage>
          . doi:10.1016/j.comcom.2020.07.035.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Renda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ducange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Gallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          ,
          <article-title>XAI Models for Quality of Experience Prediction in Wireless Networks</article-title>
          ,
          <source>in: 2021 IEEE Int'l Conf. on Fuzzy Systems (FUZZ-IEEE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi:10.1109/FUZZ45933.2021.9494509.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Corcuera Bárcena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ducange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Noferi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Renda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Virdis</surname>
          </string-name>
          ,
          <article-title>Towards Trustworthy AI for QoE prediction in B5G/6G Networks</article-title>
          ,
          <source>in: First Int'l Workshop on Artificial Intelligence in Beyond 5G and 6G Wireless Networks (AI6G 2022)</source>
          ,
          <year>2022</year>
          , (accepted).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Žliobaitė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bouchachia</surname>
          </string-name>
          ,
          <article-title>A Survey on Concept Drift Adaptation</article-title>
          ,
          <source>ACM Comput. Surv.</source>
          <volume>46</volume>
          (
          <year>2014</year>
          ). doi:10.1145/2523813.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <article-title>Ethics Guidelines for Trustworthy AI</article-title>
          ,
          <source>Technical Report</source>
          ,
          <year>2019</year>
          .
          <collab>European Commission, High-Level Expert Group on AI</collab>
          . https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hulten</surname>
          </string-name>
          ,
          <article-title>Mining high-speed data streams</article-title>
          ,
          <source>in: Proc. of the sixth ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining</source>
          ,
          <year>2000</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ducange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Marcelloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pecori</surname>
          </string-name>
          ,
          <article-title>Fuzzy Hoeffding Decision Tree for Data Stream Classification</article-title>
          ,
          <source>Int. J. Comput. Intell. Syst.</source>
          <volume>14</volume>
          (
          <year>2021</year>
          )
          <fpage>946</fpage>
          -
          <lpage>964</lpage>
          . doi:10.2991/ijcis.d.210212.001.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rutkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pietruczuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Duda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jaworski</surname>
          </string-name>
          ,
          <article-title>Decision Trees for Mining Data Streams Based on the McDiarmid's Bound</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng.</source>
          <volume>25</volume>
          (
          <year>2013</year>
          )
          <fpage>1272</fpage>
          -
          <lpage>1279</lpage>
          . doi:10.1109/TKDE.2012.66.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Montiel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bifet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Abdessalem</surname>
          </string-name>
          ,
          <article-title>Scikit-multiflow: A multi-output streaming framework</article-title>
          ,
          <source>J. Mach. Learn. Res.</source>
          <volume>19</volume>
          (
          <year>2018</year>
          )
          <fpage>2915</fpage>
          -
          <lpage>2919</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Nardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sabella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Virdis</surname>
          </string-name>
          ,
          <article-title>Simu5G–An OMNeT++ Library for End-to-End Performance Evaluation of 5G Networks</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>181176</fpage>
          -
          <lpage>181191</lpage>
          . doi:10.1109/ACCESS.2020.3028550.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>