<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NetTimeFormer: An Easily Deployable Network Traffic Prediction Model*</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Licheng Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Wuhan Fiberhome Technical Services Co.,Ltd.</institution>
          ,
          <addr-line>Wuhan 430068</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Network traffic prediction plays a critical role in network management and optimization. While traditional deep learning models, such as recurrent neural networks and convolutional neural networks, perform well in time series prediction, they still face challenges in network traffic prediction. First, these models are prone to information loss when dealing with long-term dependencies. Second, these models tend to have high complexity and are difficult to operate effectively in real-world deployments. To address these issues, we propose an improved lightweight transformer model. The model effectively captures long-term dependencies by introducing a self-attention mechanism, and achieves its lightweight design by modifying the shape of the embedding module and the computation of the self-attention score, making it more suitable for practical deployment. Preliminary experimental results show that our improved transformer model outperforms existing methods in terms of both prediction accuracy and efficiency.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep learning</kwd>
        <kwd>Artificial intelligence</kwd>
        <kwd>Network traffic prediction</kwd>
        <kwd>lightweight</kwd>
        <kwd>Attention mechanism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the popularity of the Internet and advances in network technology, network size continues
to expand and network services and applications become more diverse. Network traffic can reflect
user activities and assess network load and operational status [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. By predicting network traffic,
network operation can be managed based on complex characteristics and changing rules,
identifying bottlenecks, potential threats and failures, optimizing configuration, intrusion detection
and fault management. As a result, network traffic prediction has become a hot research topic [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        With the rapid proliferation of Internet of Things (IoT) devices and the complexity of network
environments, predicting network traffic is becoming increasingly important to ensure network
performance, optimize resource allocation and enforce security. However, with these technological
advances comes the proliferation of edge devices, which typically have limited computing power
and storage resources [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]. Meanwhile, to meet the demand for highly accurate network traffic
prediction, existing research relies on complex deep learning models and large-scale data processing
algorithms that perform well when running on cloud servers, but face serious challenges when
applied to edge devices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Many existing predictive models require significant computing resources, including not only
powerful central processing units (CPUs) and graphics processing units (GPUs), but also large
amounts of memory and storage. These requirements exceed the processing power of most edge
devices, making it expensive and difficult to run such models on these devices. In addition, the
power constraints of edge devices also mean that highly loaded computational tasks cannot be
sustained for long periods of time, further limiting the practical application of these models
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Therefore, the key issue in current research is how to design and optimize network traffic
prediction models to reduce the consumption of computational resources and adapt to the
processing power of edge devices, while maintaining high prediction accuracy. Meanwhile, effective
extraction and representation of information is crucial in network traffic prediction and deep
learning models. However, traditional models often suffer from information loss or ignore
important features when dealing with long sequence data, especially when dealing with complex
multi-dimensional data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        In comparison with traditional deep learning-based models and other machine learning
algorithms, the Transformer model demonstrates robust global feature extraction and long-range
feature modelling capabilities. Consequently, it represents a research priority for forecasting future
time series. The attention mechanism allows the model to capture pertinent information in a more
flexible manner by dynamically adjusting the extent of the model's attention to different
components of the input data, thereby preventing the loss of crucial features during the transfer of
information [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ]. In particular, the attention mechanism is capable of adaptively assigning
disparate weights to each time step or feature in accordance with the contextual information
inherent to the input sequence. This process ensures that the model not only focuses on local
information but also effectively focuses on the global context when dealing with long sequences or
multi-dimensional data, thereby substantially improving the completeness of information retention
and the accuracy of prediction [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>The application of traditional self-attention mechanisms to long sequence data presents a
significant computational resource consumption challenge, despite the excellent performance
observed in the capture of global contextual information and the improvement of model
performance. The computational complexity of the self-attention mechanism is typically
proportional to the square of the sequence length. Consequently, the demand for computational
resources increases rapidly when dealing with large-scale or high-dimensional data [12,13]. In
particular, the self-attention mechanism necessitates the computation of a similarity matrix for each
element in the sequence with all other elements. This process not only requires a significant
amount of memory but also results in a considerable increase in the computational burden. This is
particularly problematic when high real-time performance is required or when running on
resource-constrained edge devices. The high computational and memory consumption inherent to
self-attention mechanisms presents a significant obstacle to their wide deployment in practical
applications, particularly in the context of ultra-long sequences or large-scale datasets. In such cases,
limitations in computational resources may lead to suboptimal performance or even the inability to
run the model at all [14].</p>
      <p>In order to address the aforementioned challenges, this study employs convolutional neural
networks (CNNs) in conjunction with self-attention mechanisms to introduce an inductive bias,
with the objective of reducing the reliance on the traditional embedding module in response to the
amount of input data. The NetTimeFormer model employs multi-scale convolutional coding in the
embedding module, thereby replacing the input coding module and position coding module of the
standard Transformer. This configuration enables the model to consider the global feature
extraction capacity while acquiring an inductive bias, which mitigates the impact of long time series
information loss. Furthermore, the conventional self-attention mechanism is modified by adopting a
linear attention operating paradigm, which serves to further reduce the model's computational
resource consumption [15].</p>
      <p>The main contributions of this paper are as follows:
1. To address the issue of data loss during transmission, this study employs multi-scale
convolutional coding, replacing the input coding module and position coding module of the
standard Transformer. This improvement guarantees the resilience of the information in the
presence of varying lengths of long-time series samples within a flow.
2. By enhancing the attention mechanism of the conventional Transformer, the computational
complexity is reduced to a linear scale. The enhanced attention mechanism markedly diminishes
the number of parameters and the computational burden, thereby considerably reducing the
deployment cost in authentic engineering contexts.
3. The enhanced Transformer model, designated NetTimeFormer, was developed in the
present study. Evaluation of NetTimeFormer on two publicly accessible datasets indicates that it
demonstrates remarkable performance and minimal computational resource consumption.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <sec id="sec-2-1">
        <title>2.1. CNN-based embedding module</title>
        <p>In this study, in order to more effectively capture and retain the key information in the flow of
long time series and to address the potential loss of information during transmission, we propose a
multi-scale convolutional coding strategy as an alternative to the input coding and position coding
modules in the standard Transformer. This enhanced design ensures the robustness and consistency
of information when dealing with traffic long time series samples of varying lengths, and
significantly enhances the feature extraction capability of the model. The improved embedded
module is shown in Figure 1.</p>
        <p>Specifically, assume that the input data are vectors of length L with shape (B, C, L), where B is the batch size, C is the number of channels, and L is the sequence length. In order to convert this data into a format suitable for 2D convolution operations, we first reorganize (reshape) the input to obtain a four-dimensional tensor of shape (B, C, √L, √L). This operation can be expressed as:</p>
        <p>X_reshape = Reshape(X)</p>
        <p>where X is the original input and X_reshape is the reorganized input.</p>
        <p>Subsequently, three independent convolutional neural networks (CNNs) are devised to generate the query, key and value representations (denoted as Q, K and V respectively). The forward propagation process is as follows:</p>
        <p>Q = σ(Conv_Q,2(BN(Conv_Q,1(X_reshape)))) (1)</p>
        <p>K = σ(Conv_K,2(BN(Conv_K,1(X_reshape)))) (2)</p>
        <p>V = σ(Conv_V,2(BN(Conv_V,1(X_reshape)))) (3)</p>
        <p>where Conv_i,1(⋅) represents the first convolution kernel for the i-th vector, BN(⋅) represents the batch normalisation operation, Conv_i,2(⋅) represents the second convolution kernel for the i-th vector, which is a pointwise (dot) convolution, and σ(⋅) represents the activation function.</p>
        <p>Finally, these 2D feature maps are transformed back into a 1D sequence representation suitable for processing by the self-attention mechanism:</p>
        <p>Q, K, V = Flatten(Q_2D, K_2D, V_2D)</p>
        <p>where Q_2D, K_2D and V_2D denote the 2D outputs of the three convolutional branches.</p>
        <p>The introduction of a multi-scale convolutional embedding module enables the effective
extraction of global features from input sequences, while also enhancing the model's capacity to
perceive features at varying time scales through the fusion of multi-scale information. This design
ensures the robustness and accuracy of the model in the task of long-term flow prediction while
capturing essential features.</p>
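        <p>For concreteness, the following is a minimal PyTorch sketch of such a convolutional embedding, assuming an input of shape (B, C, L) with L a perfect square; the class name ConvEmbedding, the kernel sizes (3, 5, 7) used to approximate the multi-scale behaviour, the GELU activation and the channel-preserving layout are illustrative assumptions rather than the exact NetTimeFormer configuration.</p>
        <preformat>
import torch
import torch.nn as nn

class ConvEmbedding(nn.Module):
    """Sketch: reshape a (B, C, L) series to (B, C, sqrt(L), sqrt(L)) and produce
    Q, K, V via conv, BN, pointwise conv and activation (Eqs. (1)-(3))."""
    def __init__(self, channels: int):
        super().__init__()
        def branch(k):
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2),  # Conv_i,1
                nn.BatchNorm2d(channels),                                       # BN
                nn.Conv2d(channels, channels, kernel_size=1),                   # Conv_i,2 (pointwise)
                nn.GELU(),                                                      # activation sigma (assumed GELU)
            )
        # different kernel sizes stand in for the multi-scale convolutional coding
        self.q_branch, self.k_branch, self.v_branch = branch(3), branch(5), branch(7)

    def forward(self, x: torch.Tensor):
        b, c, l = x.shape
        s = int(l ** 0.5)                      # assumes L is a perfect square
        x2d = x.reshape(b, c, s, s)            # X_reshape = Reshape(X)
        q = self.q_branch(x2d).flatten(2)      # flatten back to (B, C, L)
        k = self.k_branch(x2d).flatten(2)
        v = self.v_branch(x2d).flatten(2)
        return q, k, v

# usage: a batch of 8 traffic windows with 4 channels and 64 time steps
q, k, v = ConvEmbedding(channels=4)(torch.randn(8, 4, 64))
</preformat>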
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Attention Mechanisms for Linear Complexity</title>
        <p>In order to overcome the computational resource consumption problem of the traditional self-attention mechanism in long time series processing, and at the same time improve the feature extraction capability of the embedding module at different time scales, we introduce a new model architecture based on the linear attention mechanism and the multi-scale convolutional embedding module [16,17]. The linear attention mechanism reduces the traditional O(L²) complexity of the attention weight computation to O(L), which significantly reduces the consumption of computational resources and makes it suitable for processing longer time series. The calculation process is shown in Figure 2, where the Softmax operation is applied row by row to ensure that the weights of each query over all keys sum to 1.</p>
        <p>In the traditional self-attention mechanism, the computational complexity is usually O(L²), where L is the length of the input sequence. This is because computing the attention weights requires the dot product of Q and K, which generates an L × L correlation matrix. For long time series signals, the consumption of computational resources therefore increases rapidly as L grows, so reducing the computational complexity is crucial.</p>
        <p>In the improved method proposed in this paper, we adopt a linear attention mechanism to significantly reduce the computational complexity: the key matrix K is multiplied with the transpose of the value matrix V to generate the correlation matrix A.</p>
        <p>In the design of our proposed method, the input signal is subjected to multiple convolution operations to generate three different feature vectors corresponding to the query (Q ∈ ℝ^(B×C×L)), key (K ∈ ℝ^(B×C×L)) and value (V ∈ ℝ^(B×C×L)).</p>
        <p>In the case of the traditional self-attention mechanism, the dot product of the query and key is calculated in order to obtain the attention score matrix. To avoid the problem of vanishing or exploding gradients caused by excessively large dot-product values, it is common to scale the result by dividing it by √d_k, where d_k is the dimension of the key vectors:</p>
        <p>AttentionScores = (Q ⋅ K^⊤) / √d_k (4)</p>
        <p>Then, the Softmax function is applied to the scaled attention score matrix to obtain the relevance weights of each query with respect to all keys:</p>
        <p>AttentionWeights = Softmax((Q ⋅ K^⊤) / √d_k) (5)</p>
        <p>In the linear attention mechanism adopted here, the correlation matrix A and the attention output X′ are instead computed as:</p>
        <p>A = K ⋅ V^⊤ (6)</p>
        <p>X′ = A^⊤ ⋅ Q (7)</p>
        <p>where A ∈ ℝ^(B×C×C) and, consequently, X′ ∈ ℝ^(B×C×L). Since in long time series signals usually L ≫ C, this linear attention mechanism reduces the computational complexity from the traditional O(L²) to O(L), which greatly reduces the consumption of computational resources and allows the method to process long time-span signals more efficiently.</p>
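        <p>A minimal sketch of the two attention variants is given below, assuming single-head tensors of shape (B, C, L) as produced by the embedding module; the placement of the row-wise Softmax inside the linear variant is an assumption based on the description above.</p>
        <preformat>
import torch

def standard_attention(q, k, v):
    """O(L^2): builds an L x L score matrix per sample (Eqs. (4)-(5))."""
    d_k = q.shape[1]
    scores = torch.einsum("bcl,bcm->blm", q, k) / d_k ** 0.5   # (B, L, L)
    weights = torch.softmax(scores, dim=-1)                    # row-wise Softmax
    return torch.einsum("blm,bcm->bcl", weights, v)            # (B, C, L)

def linear_attention(q, k, v):
    """O(L): the correlation matrix A = K V^T is only C x C, and C is much smaller than L (Eqs. (6)-(7))."""
    a = torch.einsum("bcl,bdl->bcd", k, v)                     # A, shape (B, C, C)
    a = torch.softmax(a, dim=-1)                               # row-wise Softmax (assumed position)
    return torch.einsum("bcd,bdl->bcl", a.transpose(1, 2), q)  # X' = A^T Q, shape (B, C, L)

# both variants accept and return tensors of the same shape
q = k = v = torch.randn(8, 4, 64)
assert linear_attention(q, k, v).shape == standard_attention(q, k, v).shape
</preformat>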
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Lightweight improvements to output modules</title>
        <p>The function of the output layer is to integrate the channel feature vectors of the feature map and provide the final prediction vector. However, previous research has often neglected the in-depth study and improvement of this predictor. A traditional predictor consists of a multilayer perceptron (MLP) composed of two fully connected (FC) layers, where the number of neurons in the last FC layer equals the length of the predicted sequence. It is calculated as follows:</p>
        <p>Y = W_2 ⋅ GELU(BN(W_1 ⋅ X)) (8)</p>
        <p>where W_1 and W_2 denote the weights of the two FC layers, respectively, ignoring the bias terms, BN(⋅) denotes batch normalization, and GELU(⋅) is the activation function.</p>
        <p>It has been demonstrated that increasing the
width of the hidden layer improves the
representation of the model, thereby enhancing its effectiveness in capturing complex patterns and
structures in the input data. However, an increase in width also entails a higher computational cost
and may result in model overfitting with respect to the training data. To address these issues, in this
study we employ a grouped MLP to redesign the classifier. This approach can effectively balance
the expressive power and computational efficiency of the model while maintaining its performance,
reducing the risk of overfitting and enabling more flexible adaptation to the requirements of
different tasks.</p>
        <p>Suppose the input from the backbone module is X ∈ ℝ^(B×C×L). This input is divided into g non-overlapping subgroups X_i ∈ ℝ^(B×C×(L/g)), where i = 1, 2, …, g. An independent linear transformation is then applied to each subgroup. The width of the hidden layer of the predictor is fixed to twice the number of its input neurons, and the final output dimension is the length of the predicted sequence. For each subgroup X_i, its linear transformation can be expressed as:</p>
        <p>f(X_i) = W_i ⋅ X_i^⊤ + b_i (9)</p>
        <p>where X_i^⊤ denotes the transpose of the input, W_i denotes the weight matrix of the grouped linear transformation, and b_i denotes the bias term. The output is then:</p>
        <p>Y = GELU(BN(Concat(f(X_1), f(X_2), …, f(X_g)))) (10)</p>
        <p>When the bias terms are ignored, a predictor constructed with a traditional multilayer perceptron (MLP) contains L × 2L + 2L × N = 2L(L + N) parameters, whereas when the input is divided into g groups, the predictor constructed with the grouped MLP strategy contains g × (L/g) × (2L/g) + 2L × N = 2L(L/g + N) parameters. Our design is therefore able to reduce the parameters of the hidden layer by approximately a factor of g.</p>
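        <p>A minimal sketch of such a grouped-MLP predictor is shown below, operating for simplicity on flattened features of length L; the class name GroupedMLPPredictor and the final fully connected projection to the prediction length N follow the parameter count above, while the exact tensor layout is an assumption. For illustration, with L = 128, N = 96 and g = 4, the traditional predictor has 2·128·(128 + 96) = 57,344 parameters, whereas the grouped version has 2·128·(128/4 + 96) = 32,768.</p>
        <preformat>
import torch
import torch.nn as nn

class GroupedMLPPredictor(nn.Module):
    """Sketch of the grouped MLP: split the features into g non-overlapping subgroups,
    apply an independent linear layer with hidden width twice the subgroup size (Eq. (9)),
    then Concat, BN, GELU and project to the prediction length N (Eq. (10))."""
    def __init__(self, in_features: int, pred_len: int, groups: int):
        super().__init__()
        assert in_features % groups == 0
        sub = in_features // groups
        self.groups = groups
        self.sub_fc = nn.ModuleList(nn.Linear(sub, 2 * sub) for _ in range(groups))  # f(X_i)
        self.bn = nn.BatchNorm1d(2 * in_features)
        self.act = nn.GELU()
        self.out = nn.Linear(2 * in_features, pred_len)  # last FC: length of the predicted sequence

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, in_features) flattened backbone features
        chunks = x.chunk(self.groups, dim=-1)                                      # X_1, ..., X_g
        hidden = torch.cat([fc(c) for fc, c in zip(self.sub_fc, chunks)], dim=-1)  # Concat(f(X_1), ..., f(X_g))
        return self.out(self.act(self.bn(hidden)))

# usage: predict 96 future steps from 128 features split into 4 groups
y_hat = GroupedMLPPredictor(in_features=128, pred_len=96, groups=4)(torch.randn(32, 128))
</preformat>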
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Overall structure</title>
        <p>NetTimeFormer consists of three phases, including a backbone module for attention computation and a grouped MLP that outputs the prediction results. The overall structure is shown in Figure 3, and detailed structural information is given in Table 1, where N is the length of the prediction sequence with a default value of 96.</p>
        <p>(Table 1 lists stages 1-3, the predictor and the output of NetTimeFormer together with their parameters and MFLOPs.)</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Datasets</title>
        <p>In order to validate the sequence prediction accuracy of the model in real scenarios, two
distinct datasets have been employed. The initial dataset was gathered from the core network of a
European city by a private Internet Service Provider (ISP), encompassing the core network
regions of 11 major European cities. The dataset provides a detailed account of internet traffic on
the transatlantic link between 06:57 on 7 June 2005 and 11:17 on 31 July 2005, with data collected
at five-minute intervals. This dataset provides insight into internet transmissions between
multiple European cities, offering a valuable perspective on cross-border network traffic. The
second dataset is derived from the UK academic backbone and provides a comprehensive
overview of the overall traffic patterns within the UK academic network. The dataset records
traffic from 09:30 on 19 November 2004 to 11:11 on 27 January 2005, with data collected at
five-minute intervals. This dataset offers a comprehensive insight into the overall traffic patterns and
trends within the UK academic network. The combination of these two datasets provides a
multilevel perspective for analyzing internet traffic behavior, encompassing both inter-city traffic
between major European cities and the overall traffic profile of the academic network. This
makes them a valuable resource for research, as they offer a representative overview of internet
traffic patterns.</p>
        <p>The dataset can be downloaded from the link
https://github.com/xiaohuiduan/network-trafficdataset.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Evaluation metrics</title>
        <p>In network traffic prediction tasks, we usually use a variety of evaluation metrics in order to
assess the prediction performance of models. In this paper, three commonly used metrics, Mean
Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE),
are chosen to quantify the prediction accuracy of the model.</p>
        <p>Mean Squared Error (MSE) measures the average of the squared errors between the predicted values and the true values, and is an indicator that is sensitive to large errors; the smaller the MSE, the higher the predictive accuracy of the model. The formula is as follows:</p>
        <p>MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²</p>
        <p>where n is the number of samples, y_i is the i-th actual value and ŷ_i is the i-th predicted value. The MSE reflects the extent to which the predicted values deviate from the true values and is sensitive to outliers due to the squaring.</p>
        <p>The Mean Absolute Error (MAE) is the average of the absolute values of all prediction errors and provides a direct measure of prediction error. Unlike MSE, MAE is less sensitive to large errors. The formula is as follows:</p>
        <p>MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|</p>
        <p>where |⋅| denotes the absolute value. The MAE reflects the average degree to which the model deviates from the true value across all predicted values and is more robust because it is not overly sensitive to outliers.</p>
        <p>The Mean Absolute Percentage Error (MAPE) is the average of the prediction error as a percentage of the true value and is used as a measure of relative error. The MAPE is expressed in per cent, making it more interpretable. The formula is as follows:</p>
        <p>MAPE = (100% / n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|</p>
        <p>MAPE provides a relative measure of prediction error and is suitable for comparing data of different magnitudes. However, when the true value y_i is close to zero, MAPE produces unstable results and therefore needs to be used with caution in some cases.</p>
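        <p>A minimal sketch of the three metrics, assuming NumPy arrays of true and predicted values, is given below; the small eps constant guarding MAPE against near-zero true values is an illustrative addition.</p>
        <preformat>
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-8):
    # unstable when y_true is close to zero, hence the eps floor
    return 100.0 * np.mean(np.abs((y_true - y_pred) / np.maximum(np.abs(y_true), eps)))
</preformat>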
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Experimental results</title>
        <p>In order to ensure the fairness and credibility of the experiments, all experiments used the same setup: the Adam optimizer, the MSE loss function for gradient computation, and a cosine annealing learning-rate scheduling algorithm, which is:</p>
        <p>Lr(t) = Lr_min + (1/2)(Lr_max − Lr_min)(1 + cos(tπ / t_max))</p>
        <p>where t is the current epoch, t_max is the maximum number of epochs, and Lr_max and Lr_min denote the maximum and minimum values of the learning rate, respectively. In the experiments, the chosen batch size = 64, Lr_max = 10e−3, Lr_min = 10e−4, and epoch = 200.</p>
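        <p>As a sketch of this training setup, assuming a PyTorch implementation (the nn.Linear model below is a purely illustrative stand-in for NetTimeFormer), the cosine annealing schedule above corresponds to torch's built-in CosineAnnealingLR:</p>
        <preformat>
import torch
from torch import nn, optim

model = nn.Linear(96, 96)                               # illustrative stand-in for NetTimeFormer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=10e-3)    # Lr_max
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=10e-4)  # t_max, Lr_min

for epoch in range(200):
    x, y = torch.randn(64, 96), torch.randn(64, 96)     # batch size 64; real data loaders would go here
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # Lr(t) = Lr_min + 0.5 (Lr_max - Lr_min)(1 + cos(t pi / t_max))
</preformat>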
        <p>As shown in Table 2, we perform network traffic prediction experiments using NetTimeFormer and compare it with the traditional Transformer and the advanced sequence prediction model FEDformer. The results show that our model obtains the best accuracy for every prediction length and still maintains a low error in long sequence prediction. On the EC dataset, NetTimeFormer improves by 10-20% compared with FEDformer. The ISP dataset has a smoother waveform than the EC dataset, so time series models can achieve higher accuracy on it; NetTimeFormer shows excellent prediction accuracy on the ISP dataset, with an MSE of only 0.049 at a prediction length of 128.</p>
        <p>Table 2 reports the prediction accuracy of the models (in terms of the metrics defined above) on the EC and ISP datasets at different prediction lengths when the input sequence length is 96.</p>
        <p>A visual presentation of the sequence prediction results is shown in Figure 4. It can be clearly seen that NetTimeFormer's prediction results fit the ground truth very well, and the model captures the detailed fluctuations in the sequence.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we propose an improved lightweight Transformer model, NetTimeFormer, to
address the problems of long-term dependency information loss and high computational complexity
faced by traditional deep learning models in network traffic prediction. We effectively maintain the
integrity of long time series information by introducing multi-scale convolutional coding to replace
the input coding and position coding modules of the standard Transformer. In addition, the
optimized self-attention mechanism reduces the computational complexity to a linear level, which
significantly reduces the number of parameters and computational burden of the model, and lowers
the actual deployment cost. Experimental results on two publicly available datasets show that
NetTimeFormer excels in prediction accuracy and computational efficiency, significantly
outperforming existing methods, especially on resource-constrained edge devices. In summary, this
study is not only innovative in model design, but also experimentally verifies its practicality and
excellent performance in network traffic prediction, which provides valuable references for further
research and practical applications in related fields.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Tian</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>Chaotic characteristic analysis and prediction of bottleneck-delay time series under the Internet macro-topology</article-title>
          .
          <source>The European Physical Journal Plus. 2024 Jun</source>
          <volume>1</volume>
          ;
          <issue>139</issue>
          (
          <issue>6</issue>
          ):
          <fpage>494</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Joshi</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadi TH</surname>
          </string-name>
          .
          <article-title>A review of network traffic analysis and prediction techniques</article-title>
          .
          <source>arXiv preprint arXiv:1507.05722. 2015 Jul</source>
          <volume>21</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Vinayakumar</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soman</surname>
            <given-names>KP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poornachandran</surname>
            <given-names>P</given-names>
          </string-name>
          .
          <article-title>Applying deep learning approaches for network traffic prediction</article-title>
          .
          <source>In2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI) 2017 Sep</source>
          <volume>13</volume>
          (pp.
          <fpage>2353</fpage>
          -
          <lpage>2358</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Feng</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shu</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Study on network traffic prediction techniques</article-title>
          .
          <source>InProceedings</source>
          .
          <source>2005 International Conference on Wireless Communications, Networking and Mobile Computing</source>
          ,
          <year>2005</year>
          .
          <source>2005 Sep</source>
          <volume>26</volume>
          (Vol.
          <volume>2</volume>
          , pp.
          <fpage>1041</fpage>
          -
          <lpage>1044</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Ferreira</surname>
            <given-names>GO</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ravazzi</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dabbene</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calafiore</surname>
            <given-names>GC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fiore</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Forecasting network traffic: A survey and tutorial with open-source comparative evaluation</article-title>
          .
          <source>IEEE Access. 2023 Jan</source>
          <volume>11</volume>
          ;
          <fpage>11</fpage>
          :
          <fpage>6018</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Sanon</surname>
            <given-names>SP</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lipps</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schotten</surname>
            <given-names>HD</given-names>
          </string-name>
          .
          <article-title>Secure federated learning: An evaluation of homomorphic encrypted network traffic prediction</article-title>
          .
          <source>In2023 IEEE 20th Consumer Communications &amp; Networking Conference (CCNC) 2023 Jan</source>
          <volume>8</volume>
          (pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Alkanhel</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El-kenawy</surname>
            <given-names>ES</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elsheweikh</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdelhamid</surname>
            <given-names>AA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ibrahim</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khafaga</surname>
            <given-names>DS</given-names>
          </string-name>
          .
          <article-title>Metaheuristic Optimization of Time Series Models for Predicting Networks Traffic</article-title>
          .
          <source>CMCCOMPUTERS MATERIALS &amp; CONTINUA. 2023 Jan</source>
          <volume>1</volume>
          ;
          <issue>75</issue>
          (
          <issue>1</issue>
          ):
          <fpage>427</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Li</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>Y</given-names>
          </string-name>
          .
          <article-title>Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data</source>
          .
          <source>2023 Feb</source>
          <volume>20</volume>
          ;
          <issue>17</issue>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Lai</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gan</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Deep learning based traffic prediction method for digital twin network</article-title>
          .
          <source>Cognitive Computation</source>
          .
          <year>2023</year>
          Sep;
          <volume>15</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1748</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Khan</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fouda</surname>
            <given-names>MM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Do</surname>
            <given-names>DT</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almaleh</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahman</surname>
            <given-names>AU</given-names>
          </string-name>
          .
          <article-title>Short-term traffic prediction using deep learning long short-term memory: Taxonomy, applications, challenges, and future trends</article-title>
          .
          <source>IEEE Access. 2023 Aug</source>
          <volume>29</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Bao</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
            <given-names>Q</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guan</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheriet</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Resource allocation with edge-cloud collaborative traffic prediction in integrated radio and optical networks</article-title>
          .
          <source>IEEE Access. 2023 Jan</source>
          <volume>16</volume>
          ;
          <fpage>11</fpage>
          :
          <fpage>7067</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Chen J, Xu M, Xu W, Li D, Peng W, Xu H. A flow feedback traffic prediction based on visual quantified features. IEEE Transactions on Intelligent Transportation Systems. 2023 May 9;24(9):10067-75.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Xu Y, Cai X, Wang E, Liu W, Yang Y, Yang F. Dynamic traffic correlations based spatio-temporal graph convolutional network for urban traffic prediction. Information Sciences. 2023 Apr 1;621:580-95.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Xu Y, Cai X, Wang E, Liu W, Yang Y, Yang F. Dynamic traffic correlations based spatio-temporal graph convolutional network for urban traffic prediction. Information Sciences. 2023 Apr 1;621:580-95.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Liu S, Feng X, Ren Y, Jiang H, Yu H. DCENet: A dynamic correlation evolve network for short-term traffic prediction. Physica A: Statistical Mechanics and its Applications. 2023 Mar 15;614:128525.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Fang H, Deng J, Bai Y, Feng B, Li S, Shao S, Chen D. CLFormer: A lightweight transformer based on convolutional embedding and linear self-attention with strong robustness for bearing fault diagnosis under limited sample conditions. IEEE Transactions on Instrumentation and Measurement. 2021 Dec 3;71:1-8.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Han D, Pan X, Han Y, Song S, Huang G. Flatten transformer: Vision transformer using focused linear attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2023 (pp. 5961-5971).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>