    Modelling the Multi-Layer Artificial Neural Network for
  Internet Traffic Forecasting: The Model Selection Design
                            Issues
              Mba O. Odim                              Jacob A. Gbadeyan                                 Joseph S. Sadiku
    Computer Science Department                       Mathematics Department                       Computer Science Department
       Redeemer’s University                             University of Ilorin                          University of Ilorin
           Ede, Nigeria                                    Ilorin, Nigeria                                Ilorin, Nigeria
        odimm@run.edu.ng                        jagbadeyan@unilorin.edu.ng                          jssadiku@unilorin.edu.ng

ABSTRACT
Internet traffic forecasting models with learning ability, such as the artificial neural network (ANN), have been growing in popularity in recent times due to their impressive performance in modelling the high degree of variability and nonlinearity of internet traffic. This study examined the impact of some design issues on the performance of the multi-layer artificial neural network for internet traffic forecasting. The traffic forecasting was modelled as a standard time series problem, and the multilayer artificial neural network was designed to perform the time series function mapping. The input lags were varied from 1 to 24. Training epoch values of 200, 500, and 1000 on one- and two-hidden-layer networks were used. The learning algorithm was backpropagation with a 0.1 learning rate and 0.9 momentum on the logistic sigmoid activation function. The model was implemented in Visual Basic and validated with four categories of classified time series internet traffic of a branch residential network of a firm in Nigeria. Various predictive performances without a consistent pattern were observed on the issues considered; however, input lag one gave the worst performance in all cases for the HOURLY traffic, and three of the four traffic categories demonstrated the superiority of two hidden layers over one hidden layer. Although the epoch values of 200, 500 and 1000 showed no consistent performance variations, the epoch value 200 outperformed the others on the model selections. The study revealed that input lags, the number of hidden layers and epoch values can impact the traffic forecasting performance of the multilayer perceptron, and that performance can be considerably improved by careful selection of those parameters through experimentation.

CCS Concepts
• Computing Methodologies → Artificial Intelligence → Machine Learning → Machine Learning Approaches → Neural Networks

Keywords
Internet Traffic, Time Series Forecasting, Machine Learning, Multi-layer Artificial Neural Network, Design Issues

CoRI’16, Sept 7–9, 2016, Ibadan, Nigeria.

1. INTRODUCTION
Accurate information about offered traffic is required for efficient resource provisioning and general capacity planning of an Internet service. The inability of most statistical methods to model the high variability of internet traffic accurately, and their lack of reasoning capabilities, have triggered an increasing number of studies that employ non-traditional statistical methods, including machine learning. Furthermore, traditional summary statistics, particularly the sample mean and variance, are unstable metrics when working with the high variability of internet traffic; as such, sample means and sample variances are not reliable statistics for summarising traffic properties [1]. Machine learning techniques, such as the Artificial Neural Network (ANN), employ mechanisms that allow computers to evolve behaviour based on knowledge gained from dynamic observations. A machine learning technique based on nonlinear elements is often referred to as a neural network. Neural networks are networks of nonlinear elements interconnected through adjustable weights, and they play a prominent role in machine learning. Artificial Neural Networks (ANNs) emerged with the aim of imitating the information processing of the human brain. Through learning, ANNs can determine nonlinear relationships in a data set by associating the corresponding outputs with input patterns. The multilayer artificial neural network, among other machine learning models, has shown impressive results in forecasting studies [2, 3, 4, 5, 6, 7, 8, 9]. However, applying an ANN to a given forecasting endeavour is a hard task, as basic modelling issues must be carefully considered for enhanced precision. The issues include the network architecture, learning parameters and data pre-processing methods [6, 8]. The inconsistencies in performance reports on the design issues in the literature were also noted in [8]. In [9] it was argued that the ANN technique should not be applied arbitrarily, as has sometimes been suggested and even done in the internet forecasting domain [10, 11].

This paper examined the impacts of the number of input lags, hidden neurons and training epochs on the precision of the multilayer artificial neural network in forecasting internet traffic.

2. RELATED WORK
Quite a number of research efforts have been reported in the literature on seeking appropriate models for forecasting Internet traffic.

2.1 Internet Traffic Forecasting: Statistical Methods
In [12] a comparative study on suitable statistical methods for network traffic estimation was conducted. In the paper, several estimation methods for IP network traffic were studied. The study

showed that non-linear time series models could model and forecast better than the classical linear time series models. Anand in [13] investigated a non-linear time series model, the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model, for internet traffic modelling. The study showed that the forecasting algorithm was accurate compared with actual traffic. Although nonlinear statistical models can capture the burstiness of a network's traffic, the models are parametric in nature and therefore require knowledge of the distribution of the traffic. In addition, they are analytical and therefore require explicit programming to clearly specify the algorithmic steps. To take advantage of machine learning paradigms, the application of machine learning techniques to internet traffic forecasting has been on the increase.

2.2 Machine Learning and Artificial Neural Networks for Internet Traffic Forecasting
A vast number of research efforts have been ongoing in applying machine learning techniques to internet traffic prediction, the results of which have demonstrated their superiority to statistical forecasting methods. A concurrent neuro-fuzzy model to discover and analyse useful knowledge from available Web log data was proposed in [14]. The study used a self-organizing map for pattern analysis and a fuzzy inference system to capture the chaotic trend, providing short-term (hourly) and long-term (daily) web traffic trend predictions. Empirical results demonstrated that the proposed approach was efficient for mining and predicting web traffic. A study in [15] presented a neural network ensemble (NNE) for the prediction of TCP/IP traffic from a time series forecasting (TSF) point of view. The NNE approach was compared with TSF methods (Holt-Winters and ARIMA) and was found to compete favourably with them. In [16] least squares support vector machines were applied to the problem of accurately predicting non-peak traffic; the method had good generalization ability and guaranteed global minima. The study in [17] presented a neural network ensemble approach and two adapted time series methods (ARIMA and Holt-Winters) for forecasting the amount of traffic in TCP/IP based networks. The experiments with the neural ensemble achieved the best results for 5-minute and hourly data, while Holt-Winters was the best option for the daily forecasts. The study in [10] investigated ensembles of artificial neural networks in predicting long-term internet traffic. The proposed prediction models were compared with the classic Holt-Winters method. Prangchumpol in [18] presented an approach to predicting incoming and outgoing data rates in a network system using a data mining (machine learning) technique, association rule discovery. The result of the study showed that the technique could predict future network traffic.

2.3 Design Issues in Forecasting with Artificial Neural Networks
A detailed state-of-the-art presentation on forecasting with artificial neural networks was made in [8]. The study showed that, overall, ANNs gave satisfactory performance in forecasting, but went on to indicate the inconsistencies in performance reports on design issues in the literature. The inconsistencies were attributed to the trial-and-error methodology adopted in most studies. Faraway and Chatfield [9] argued that it is unwise to apply ANN models blindly in black-box mode, as has sometimes been suggested. Shamsuddin et al. in [7] investigated the effect of applying different numbers of input nodes, activation functions and pre-processing techniques on the performance of a backpropagation network in time series revenue forecasting. The findings showed that the performance of the ANN model could be considerably improved by careful selection of those parameters. In [19], the performance of two learning algorithms, linear regression and standard backpropagation neural networks, was compared on the prediction of four major stock market indexes. The comparison showed that the neural network approach resulted in better prediction accuracy than the linear regression model. Chabaa et al. in [20] presented an ANN based on the multi-layer perceptron for analysing a time series of measured internet traffic data over IP networks. The comparison between several training algorithms demonstrated the efficiency and accuracy of the Levenberg-Marquardt and resilient backpropagation algorithms. Chukwuchekwa in [21] compared the performance of the backpropagation gradient descent technique and a genetic algorithm on some pattern recognition problems. The backpropagation (BP) algorithm was found to outperform the genetic algorithm in that instance. The study suggested that caution should be applied before using other algorithms as substitutes for the BP algorithm, especially in classification problems. In [2], an evaluation of several learning rules for adjusting ANN weights was carried out on the popular airline passenger data set. The Levenberg-Marquardt backpropagation algorithm showed the best performance among the learning rules. Various degrees of performance were observed in [22] on examining the impact of input lags of the multilayer perceptron in forecasting internet traffic on a two-layered network. In [23] a survey of research and application issues in Web usage mining based on various mining techniques was conducted to provide some understanding for designing algorithms suitable for mining data.

This review demonstrated the impressive results of applying machine learning techniques, such as artificial neural networks, to forecasting Internet traffic, as well as raising concerns over the little or no consideration given by researchers to the design issues. The paper therefore presents results from the study on the impacts of some multi-layer perceptron design issues on internet traffic forecasting.

3. METHODOLOGY
The traffic forecasting was modelled as a standard time series problem, and the multilayer artificial neural network was designed to perform the time series function mapping.

3.1 Time Series for Traffic Forecasting
Traffic forecasting is a standard time series prediction task. The goal is to approximate the function that relates the future values of a variable to the previous observations of that variable [24]. In some situations, such as internet traffic, data are non-stationary and chaotic. In such situations, one general assumption is that the historical data incorporate all the behaviour required to capture the dependency between the future traffic and that of the past. Therefore, the historical data are the major player in the forecasting process. The second assumption in modelling and forecasting the dynamics of the traffic is that its values are expressed as a discrete time series [2, 3]. A discrete time series is a vector {y_t} of observations made at regular intervals, t = 1, 2, 3, …, N. For the time series forecasting problem, the inputs are typically past observations of the data series and the output is the future value.

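The mapping from past observations to supervised training patterns described above can be sketched as follows. This is an illustrative sketch only (the study's own implementation was in Visual Basic), and the helper name is hypothetical:

```python
# Illustrative sketch (hypothetical helper name): turn a univariate
# traffic series into supervised training patterns, where each input
# is a fixed number of lagged observations and the target is the
# next observation in the series.

def make_lag_patterns(series, n_lags):
    """Return (inputs, targets); inputs[i] holds the n_lags values
    immediately preceding targets[i]."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))  # y_{t-n}, ..., y_{t-1}
        targets.append(series[t])                  # y_t
    return inputs, targets
```

For example, with two lags the series 1, 2, 3, 4, 5 yields the input patterns [1, 2], [2, 3], [3, 4] paired with the targets 3, 4, 5.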
Suppose y_1, y_2, …, y_N denote an observed time series of the traffic loads; then the basic problem is to estimate a future traffic value such as y_{N+k}, where the integer k is called the lead time or the forecasting horizon [25]. For the univariate method, forecasts of a given traffic load are based on a model fitted only to the past observations of the given time series, so that the forecast ŷ(N, k) depends only on y_1, y_2, …, y_N. The estimate of y_{N+1} is computed as a weighted sum of the past observations:

    ŷ_{N+1} = w_0 y_N + w_1 y_{N-1} + w_2 y_{N-2} + …        (1)

where the {w_i} are weights.

The Multi-Layer Perceptron performs the following function mapping [3, 8]:

    ŷ_t = f(y_{t-1}, y_{t-2}, …, y_{t-n})        (2)

where ŷ_t is the estimated traffic at time t and (y_{t-1}, y_{t-2}, …, y_{t-n}) denotes the training pattern composed of a fixed number (n) of lagged observations of the series.

The weights used in the ANN model are estimated from the data by minimizing the sum of squares of the within-sample one-step-ahead forecast errors, namely

    S = Σ_t (y_t − ŷ_t)²        (3)

over the first part of the time series, called the training set. The last part of the time series, called the test set, is kept in reserve so that genuine out-of-sample (ex ante) forecasts can be made and compared with the actual observations. Equations (1) and (2) give a one-step-ahead forecast, as they use the actual observed values of all lagged variables as inputs. If multi-step-ahead forecasts are required, it is possible to proceed in one of two ways. The first is to construct a new architecture with several outputs, giving ŷ_t, ŷ_{t+1}, ŷ_{t+2}, …, where each output has separate weights for each connection to the neurons. The second is to "feed back" the one-step-ahead forecast to replace the lag-1 value as one of the input variables; the same architecture can then be used to construct the two-step-ahead forecast, and so on [16]. This study adopted the latter iterative approach because of its numerical simplicity and because it requires fewer weights to be estimated.

3.2 The Multilayer Neural Network
The neural network is a powerful model for solving complex problems: it has a natural potential for solving nonlinear problems and can easily achieve input-output mapping, which makes it well suited to prediction problems [26]. The basic features of the multilayer perceptron include:
    i.   The model of each neuron in the network includes a nonlinear activation function that is differentiable.
    ii.  The network contains one or more layers that are hidden from both the input and output nodes.
    iii. The network exhibits a high degree of connectivity, the extent of which is determined by the synaptic weights of the network.

Figure 1. Architecture of the Multilayer Perceptron with two hidden layers

3.2.1 A Neural Model
The node is the basic unit of the Artificial Neural Network. Each node is able to sum many inputs x_1, x_2, …, x_n from the environment or from other nodes, with each input modified by an adjustable node weight (Figure 2). The sum of these weighted inputs is added to an adjustable threshold for the node and then passed through a modifying (activation) function that determines the final output.

Figure 2. Nonlinear model of a neuron [26]

The neural model in Figure 2 includes an externally applied bias, denoted by b_k. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively. Mathematically, we may describe the neuron k depicted in Figure 2 by the following equations:

    u_k = Σ_{j=1}^{m} w_{kj} x_j        (4)
and
    y_k = φ(u_k + b_k)        (5)

where x_1, x_2, …, x_m are the input signals; w_{k1}, w_{k2}, …, w_{km} are the respective synaptic weights of neuron k; u_k is the linear combiner output due to the input signals; b_k is the bias; φ(·) is the activation function; and y_k is the output signal of the neuron. The use of the bias b_k has the effect of applying an affine transformation to the output u_k of the linear combiner, as shown by

    v_k = u_k + b_k        (6)

The bias b_k is an external parameter of neuron k.

The activation function, denoted by φ(v), defines the output of a neuron in terms of the induced local field v. It is this function (also

called the transfer function) that determines the relationship between the inputs and outputs of a node and of a network. In general, the activation function introduces a degree of nonlinearity that is valuable for most ANN applications. Among these functions, the sigmoid function is very popular. It is a strictly increasing function that exhibits a graceful balance between linear and nonlinear behaviour. The logistic sigmoid is defined as

    φ(v) = 1 / (1 + e^{−v})        (7)

A logistic sigmoid function assumes a continuous range of values from 0 to 1. Additional types of activation functions can be found in [8]. Among these functions, the logistic transfer function is the most popular choice [8].

3.2.2 Training of Artificial Neural Networks
An ANN has to be trained before it can be put to use. The goal of training is to find the logical relationship in the given input/output data. There are two learning strategies: supervised and unsupervised. This study employs the supervised learning strategy. Supervised learning typically operates with two data sets, a training set and a test set. The training set is used for estimating the arc weights, while the test set is used for measuring the generalization ability of the network. Training is used to gain generalised knowledge about the system under consideration, and testing is used to predict (forecast) the system behaviour using the knowledge gained. Unsupervised techniques, on the other hand, such as reinforcement learning, are independent of training data and operate by directly interacting with the environment.

The training algorithm employed is backpropagation. It is a supervised training strategy and a popular method for training the multilayer perceptron. The training proceeds in two phases [26]:
    1. In the forward phase, the synaptic weights of the network are fixed and the input signal is propagated through the network, layer by layer, until it reaches the output. Thus, in this phase, changes are confined to the activation potentials and outputs of the neurons in the network.
    2. In the backward phase, an error signal is produced by comparing the output of the network with the desired response. The resulting error signal is propagated through the network, again layer by layer, but this time the propagation is performed in the backward direction. In this second phase, successive adjustments are made to the synaptic weights of the network.

In [5] it is also reported that backpropagation is the most computationally straightforward algorithm for training the multi-layer perceptron. The algorithm steps are summarized as:
    1. Obtain a set of training patterns.
    2. Set up an ANN model consisting of a number of input neurons, hidden neurons, and output neurons.
    3. Set the learning rate (η) and the momentum rate (α).
    4. Initialize all connection weights (W_ij and W_jk) and bias weights (θ_k and θ_j) to random values.
    5. Set the minimum error E_min and the number of epochs.
    6. Start training by applying the input patterns one at a time and propagating them through the layers, then calculate the total error.
    7. Back-propagate the error through the output and hidden layers and adapt W_jk and θ_k.
    8. Back-propagate the error through the hidden and input layers and adapt W_ij and θ_j.
    9. Check if Error < E_min or the maximum epoch is reached. If not, repeat steps 6–9; otherwise, stop training.

3.3 Data Collection and Description
Internet traffic data were collected as hourly averages, in kilobits/s, of the TCP/IP traffic of a company's residential network from January 1, 2010 to September 30, 2010 (making up 6552 data points each for the IN and OUT traffic), and as daily traffic data from January 1 to December 31, 2010 (making up 365 data points each for the IN and OUT traffic), using PRTG (Paessler Router Traffic Grapher), a network monitoring and bandwidth usage tool from a company called Paessler. A bandwidth of 20 Mbps was statically allocated for upload (traffic IN) and 20 Mbps for download (traffic OUT) for the period under consideration.

3.3.1 Data Pre-processing/Normalisation
Nonlinear activation functions such as the logistic function restrict the possible output from a node to, typically, (0, 1) or (−1, 1). Inputs are therefore normalized to avoid computational problems, to meet algorithm requirements and to facilitate network learning. Four methods for input normalization are summarized in [8]. This study employs the linear transformation to [0, 1], defined as

    y_n = (y_o − y_min) / (y_max − y_min)        (8)

where y_n and y_o represent the normalized and original data, and y_min and y_max are the minimum and maximum of the column or row, respectively.

3.3.2 Training and Testing Sets
Eighty percent (80%) of the data, that is, 5241.6 approximated to 5242 points, was used for training the network, while twenty percent (20%), that is, 1310.4 approximated to 1310 points, was used for testing the generalisation (predictive) capability of the network, for each of the HOURLY_IN and HOURLY_OUT traffic flows. Likewise, a training set of 80% and a test set of 20% were used for each of the DAILY traffic series, that is, 292 data points for training and 73 for testing.

3.4 Finding the Appropriate Complexity of the Network
For a time series forecasting problem, a training pattern consists of a fixed number of lagged observations of the series [7]. The inputs (number of lag observations) were varied from 1 to 24, excluding the bias. One and two hidden layers were considered. The number of hidden nodes was set equal to the number of input nodes; in several studies, networks with the number of hidden nodes equal to the number of input nodes are reported to forecast better [8]. One output node was used (one look-ahead), so the model of our network is k, k, k, 1, where k represents the number of lag observations (input variables). The epoch values considered were 200, 500, and 1000. The best model, according to [18], is the one that gives the best result on the test set. The logistic sigmoid activation function was used [8]. The error-correction backpropagation algorithm with learning rate 0.1 and momentum 0.9 was used to train the network.

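The [0, 1] normalisation of equation (8) and the chronological 80/20 train/test split can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical and not from the study's Visual Basic implementation:

```python
# Illustrative sketch (hypothetical helper names) of the linear [0, 1]
# transformation of equation (8) and the 80/20 chronological split.

def normalise(values):
    """Map values linearly onto [0, 1]: yn = (yo - ymin) / (ymax - ymin)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(values, train_fraction=0.8):
    """Split a series chronologically, earlier part for training.
    For 6552 hourly points this gives 5242 training / 1310 test points."""
    cut = round(len(values) * train_fraction)
    return values[:cut], values[cut:]
```

Note that the split is chronological rather than random, so the test set contains genuinely out-of-sample (ex ante) observations.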

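The training scheme above (logistic sigmoid units, error-correction backpropagation with learning rate 0.1 and momentum 0.9) can be sketched as follows. This is an illustrative numpy sketch, NOT the study's Visual Basic implementation: the class name, weight-initialisation range and per-pattern update order are assumptions, and a single hidden layer is shown for brevity.

```python
import numpy as np

# Illustrative sketch of a one-hidden-layer perceptron trained by
# error-correction backpropagation with logistic sigmoid units,
# learning rate 0.1 and momentum 0.9. Names and weight ranges are
# assumptions, not taken from the paper's implementation.

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class TinyMLP:
    def __init__(self, n_in, n_hidden, lr=0.1, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        # Last row of each weight matrix holds the bias weight (theta).
        self.W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
        self.W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, 1))
        self.lr, self.momentum = lr, momentum
        self.dW1 = np.zeros_like(self.W1)  # previous updates, for momentum
        self.dW2 = np.zeros_like(self.W2)

    def _forward(self, x):
        x1 = np.append(x, 1.0)                       # constant bias input
        h1 = np.append(sigmoid(x1 @ self.W1), 1.0)   # hidden outputs + bias
        y = sigmoid(h1 @ self.W2)[0]                 # single output node
        return x1, h1, y

    def predict(self, x):
        return self._forward(x)[2]

    def train_epoch(self, X, T):
        """One pass over all patterns; returns the sum of squared errors."""
        sse = 0.0
        for x, t in zip(X, T):
            x1, h1, y = self._forward(x)
            err = t - y
            sse += err ** 2
            # Local gradients; for the logistic unit f'(v) = f(v)(1 - f(v)).
            d_out = err * y * (1.0 - y)
            d_hid = self.W2[:-1, 0] * d_out * h1[:-1] * (1.0 - h1[:-1])
            # Weight updates with a momentum term (steps 7-8 above).
            self.dW2 = self.lr * np.outer(h1, [d_out]) + self.momentum * self.dW2
            self.dW1 = self.lr * np.outer(x1, d_hid) + self.momentum * self.dW1
            self.W2 += self.dW2
            self.W1 += self.dW1
        return sse
```

Because the output unit is a logistic sigmoid, targets must lie in (0, 1), which is why inputs and targets are normalized to [0, 1] beforehand.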
3.5 Stopping and Evaluation Criteria
Training stops after the specified number of epochs. Typically, an SSE-based objective (cost) function to be minimized during the training process is defined in (10). The measure of accuracy employed is the Root Mean Square Error (RMSE), defined as

        RMSE = √( (1/n) Σt (yt − ŷt)² )                                  (9)

where n is the total number of sample group observations, ŷt is the predicted (computed) value, and yt is the target value at time t. RMSE is one of the most commonly used measures of forecast error for examining how close the forecast is to the actual value [5]. The best model is the one that gives the best result on the test set, that is, the model with the least RMSE on the testing set [27].

4. RESULTS AND DISCUSSION
The system was implemented in Visual Basic. The RMSEs of the various models were recorded and compared based on the design issues considered. The results are presented and discussed in this section.

4.1 HOURLY_IN traffic
The RMSEs of the testing (prediction) results of the various models, based on the number of input lags, the number of hidden layers (one or two), and the number of training epochs, were compared for the HOURLY_IN traffic. Figure 3 depicts these results.

[Figure 3. ANN model selection for the HOURLY_IN traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

There were varying degrees of performance, with no regular pattern among the input lags, between the one- and two-hidden-layer networks, or among the various epoch counts used. Nevertheless, the worst performance in all cases was at input lag 1. The least RMSE of this experiment, 0.0766984, occurred at input lag 24 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_IN traffic uses input lag 24 and 200 training epochs with two hidden layers.

4.2 HOURLY_OUT traffic
The RMSEs of the testing (prediction) results of the various models were compared for the HOURLY_OUT traffic. The results are shown in Figure 4.

[Figure 4. ANN model selection for the HOURLY_OUT traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

There were likewise various values of the performance measure, with no particular pattern across the design issues, for the HOURLY_OUT traffic. As with the HOURLY_IN traffic, the worst performance was recorded at input lag 1 in all cases. The least RMSE, 0.0621992, occurred at input lag 13 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_OUT traffic uses input lag 13 and 200 training epochs with two hidden layers.

4.3 DAILY_IN traffic
Figure 5 presents the prediction RMSE of the various models for the DAILY_IN traffic.

[Figure 5. ANN model selection for the DAILY_IN traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

Different performance values were also observed, with no particular pattern among the various prediction models. The least RMSE of this experiment, 0.116691, occurred at input lag 3 with 1000 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the DAILY_IN traffic uses input lag 3 and 1000 training epochs with two hidden layers.
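The comparisons in this section all follow one procedure: train a candidate network for each combination of input lag, hidden-layer count, and epoch budget, then keep the configuration with the least test-set RMSE. A sketch of that selection loop follows; `train_and_test` is a hypothetical stand-in for actually training and evaluating one network, and the toy evaluator exists only so the sketch runs — its numbers are not traffic results.

```python
import itertools

# Candidate values mirroring the design issues examined in the study.
INPUT_LAGS = range(1, 25)      # lag1 .. lag24
HIDDEN_LAYERS = (1, 2)
EPOCHS = (200, 500, 1000)

def select_best_model(train_and_test):
    """Return the (lag, hidden_layers, epochs) configuration with the least test RMSE.

    train_and_test(lag, hidden_layers, epochs) is a hypothetical callback that
    trains one candidate network and returns its RMSE on the testing set.
    """
    best_config, best_rmse = None, float("inf")
    for lag, hidden, epochs in itertools.product(INPUT_LAGS, HIDDEN_LAYERS, EPOCHS):
        rmse = train_and_test(lag, hidden, epochs)
        if rmse < best_rmse:               # least RMSE on the test set wins [27]
            best_config, best_rmse = (lag, hidden, epochs), rmse
    return best_config, best_rmse

# Toy evaluator for demonstration only: pretends error is smallest near lag 13,
# on two hidden layers, with the fewest epochs.
def toy_evaluator(lag, hidden, epochs):
    return 0.005 * abs(lag - 13) + (0.01 if hidden == 1 else 0.0) + 1e-6 * epochs

config, rmse = select_best_model(toy_evaluator)
```

Because no regular pattern was found across the design issues, this exhaustive comparison, rather than a closed-form rule, is what determines each traffic category's model.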
4.4 DAILY_OUT traffic
Figure 6 presents the RMSE of the various prediction models for the DAILY_OUT traffic.

[Figure 6. ANN model selection for the DAILY_OUT traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

No particular pattern of performance was observed among the various models, although there were different values of the performance measure. The least RMSE of this experiment, 0.099416, occurred at input lag 3 with 200 training epochs on the one-hidden-layer network. Therefore, the best model for forecasting the DAILY_OUT traffic uses input lag 3 and 200 training epochs with one hidden layer.

Table 1 presents a summary of the model selection for forecasting the various traffic categories.

Table 1: Summary of the Forecasting Model Selection for the Traffic Categories

              Hidden   Epochs   Learning   Momentum   Input   Least
              layers            rate                  lags    RMSE
HOURLY_IN        2      200       0.1        0.9       24     0.07669
HOURLY_OUT       2      200       0.1        0.9       13     0.06219
DAILY_IN         2      200       0.1        0.9        3     0.11521
DAILY_OUT        1      200       0.1        0.9        3     0.09941

For the HOURLY_IN traffic, the computed (predictive) values based on 24 input lags on a two-hidden-layer network using 200 training epochs were deployed; for the HOURLY_OUT traffic, the study used 13 input lags with 200 training epochs on a two-hidden-layer network. The study deployed 3 input lags and two hidden layers of 3 neurons each, using 200 training epochs, for predicting the DAILY_IN traffic. For the DAILY_OUT traffic, 3 input lags and one hidden layer of three neurons with 200 training epochs were selected.

Figure 7 compares the predicted models selected for the traffic categories.

[Figure 7. Summary of the least RMSE for model selection across the traffic categories.]

The HOURLY traffic categories had better prediction performance than their DAILY counterparts. This could be attributed to the very large sample size used for the HOURLY traffic; it has been reported that ANNs perform better in forecasting with large sample sizes than with small ones [8], [26]. In addition, Figure 7 reveals that different forecasting models may exist for different traffic categories, even when the traffic categories are all from the same network operator.

This study has observed different forecasting models for the various traffic categories based on the design issues. The findings suggest that careful consideration of the design issues is indispensable for improving the predictive performance of a multi-layer artificial neural network, rather than applying it blindly to internet traffic forecasting. Although there is no generally accepted technique for determining the optimal design parameters, an improved predictive model is feasible through experimentation.

5. CONCLUSION
This study examined the impacts of some important design issues in modelling a multilayer perceptron artificial neural network for Internet traffic forecasting. Traffic forecasting was modelled as a standard time series problem, and a multilayer artificial neural network was designed to perform the time series function mapping. The mechanism was implemented in a Visual Basic programming environment and tested with real Internet traffic data through experimentation with the various design issues considered. Although no particular pattern of performance was observed, the study showed that forecasting performance can be affected by the number of input lags, hidden layers, and training epochs. While the study did not attempt to determine optimal values for the various factors considered, it has shown that careful experimentation is required to choose appropriate values for each of the design issues. Therefore, the multilayer perceptron should not be applied blindly to Internet traffic forecasting.

6. REFERENCES
[1]   Crovella, M. and Krishnamurthy, B. 2006. Internet Measurement. John Wiley & Sons, Ltd., England.
[2]   Benkacha, S., Benhra, J. and El Hassani, H. 2015. Seasonal Time Series Forecasting Models on Artificial Neural Network. International Journal of Computer Applications. 116, 20, 0975-8887. DOI= 10.5120/20451-2805.
[3]   Benkacha, S., Benhra, J. and El Hassani, H. 2013. Causal Method and Time Series Forecasting Model based on Artificial Neural Network. International Journal of Computer Applications. 75, 7, 0975-8887.
[4]   Islam, S., Keung, J., Lee, K. and Liu, A. 2012. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems. 28, 155-162. DOI= 10.1016/j.future.2011.05.027.
[5]   Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and prediction of internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. 2, 147-155. DOI= 10.4236/jilsa.2010.23018.
[6]   Shamsuddin, S. M., Sallehuddin, R. and Yusof, N. M. 2012. Artificial neural network time series modelling for revenue forecasting. Chiang Mai J. Sci. 35, 3, 411-426.
[7]   Cortez, P., Rio, M., Sousa, P. and Rocha, M. 2007. Topology aware internet traffic forecasting using neural networks. In Proceedings of the 17th International Conference on Artificial Neural Networks (Porto, Portugal). Lecture Notes in Computer Science 4669, 445-452, Springer.
[8]   Zhang, G., Patuwo, B. E. and Hu, M. Y. 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting. 14, 35-62.
[9]   Faraway, J. and Chatfield, C. 1998. Time series forecasting with neural networks: a comparative study using the airline data. Journal of Appl. Statist. 47, 231-250.
[10]  Miguel, M. L. F., Penna, M. C., Nievola, J. C. and Pellenz, M. E. 2012. New models for long-term internet traffic forecasting using artificial neural networks and flow based information. In Proceedings of the 2012 IEEE Network Operations and Management Symposium, 1082-1088. DOI= 10.1109/NOMS.2012.6212033.
[11]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale Internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[12]  Mariam, I., Dadarlat, V. and Iancu, B. 2009. A Comparative Study of the Statistical Methods Suitable for Network Traffic Estimation. In Proceedings of the 13th WSEAS International Conference on Communications. 99-104.
[13]  Anand, C. N. 2009. Internet traffic modeling and forecasting using non-linear time series model GARCH. M.Sc. Thesis, Department of Electrical and Computing Engineering, College of Engineering, Kansas State University.
[14]  Wang, X., Abraham, A. and Smith, K. A. 2005. Intelligent web traffic mining and analysis. Journal of Network and Computer Applications. 28, 147-165.
[15]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2006. Internet traffic forecasting using neural networks. In Proceedings of the International Joint Conference on Neural Networks (Vancouver), 2635-2642.
[16]  Zhang, Y. and Liu, Y. 2009. Comparison of parametric and nonparametric techniques for non-peak traffic forecasting. World Academy of Science, Engineering and Technology. 51.
[17]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[18]  Prangchumpol, D. 2013. A network traffic prediction algorithm based on data mining technique. World Academy of Science, Engineering and Technology, International Science Index. http://www.waset.org.
[19]  Fok, W. W. T., Tam, V. W. L. and Ng, H. 2008. Computational neural network for global stock indexes prediction. In Proceedings of the World Congress on Engineering (London, UK, July 2-4, 2008).
[20]  Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and prediction of Internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. DOI= 10.4236/jilsa.2010.23018.
[21]  Chukwuchekwa, U. J. 2011. Comparing the performance of backpropagation algorithm and genetic algorithms in pattern recognition problems. International Journal of Computer Information Systems. 2, 5, 7-12.
[22]  Odim, M. O., Gbadeyan, J. A. and Sadiku, J. S. 2014. A neural network model for improved internet service resource provisioning. British Journal of Mathematics & Computer Science. 4, 17, 2418-2434.
[23]  Dogne, V., Jain, A. and Jain, S. 2015. Evolving trends and its application in web usage mining: a survey. International Journal of Soft Computing and Engineering. 4, 6, 98-101.
[24]  Rutka, G. and Lauks, G. 2007. Study on internet traffic prediction models. Electronics and Electrical Engineering. Kaunas: Technologija, 6, 78, 47-50.
[25]  Chatfield, C. 1992. The Analysis of Time Series: An Introduction (4th ed.). Chapman & Hall, London.
[26]  Haykin, S. 2009. Neural Networks and Learning Machines (3rd ed.). Pearson Education, Inc., New Jersey.
[27]  Zhang, G. P., Patuwo, B. E. and Hu, M. Y. 2001. A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers & Operations Research. 28, 381-396.



