    Modelling the Multi-Layer Artificial Neural Network for
  Internet Traffic Forecasting: The Model Selection Design
                            Issues
              Mba O. Odim                              Jacob A. Gbadeyan                                 Joseph S. Sadiku
    Computer Science Department                       Mathematics Department                       Computer Science Department
       Redeemer’s University                             University of Ilorin                          University of Ilorin
           Ede, Nigeria                                    Ilorin, Nigeria                                Ilorin, Nigeria
        odimm@run.edu.ng                        jagbadeyan@unilorin.edu.ng                          jssadiku@unilorin.edu.ng

ABSTRACT
Internet traffic forecasting models with learning ability, such as the artificial neural network (ANN), have been growing in popularity in recent times due to their impressive performance in modelling the high degree of variability and nonlinearity of internet traffic. This study examined the impact of some design issues on the performance of the multi-layer artificial neural network for internet traffic forecasting. The traffic forecasting was modelled as a standard time series problem, and the multilayer artificial neural network was designed to perform the time series function mapping. The input lags were varied from 1 to 24. Training epoch values of 200, 500, and 1000 on one- and two-hidden-layer networks were used. The learning algorithm was backpropagation with a 0.1 learning rate and 0.9 momentum on the logistic sigmoid activation function. The model was implemented in Visual Basic and validated with four categories of classified time series internet traffic of a branch residential network of a firm in Nigeria. Various predictive performances without a consistent pattern were observed on the issues considered; however, input lag one gave the worst performance in all cases for the HOURLY traffic, and three of the four traffic categories demonstrated the superiority of two hidden layers over one hidden layer. Although the epoch values of 200, 500 and 1000 showed no consistent performance variations, the epoch value 200 outperformed the others on the model selections. The study revealed that input lags, the number of hidden layers and epoch values can impact the traffic forecasting performance of the multilayer perceptron, and that performance can be considerably improved by careful selection of those parameters through experimentation.

CCS Concepts
• Computing Methodologies → Artificial Intelligence → Machine Learning → Machine Learning Approaches → Neural Networks

Keywords
Internet Traffic, Time Series Forecasting, Machine Learning, Multi-layer Artificial Neural Network, Design Issues

CoRI’16, Sept 7–9, 2016, Ibadan, Nigeria.

1. INTRODUCTION
Accurate information about offered traffic is required for efficient resource provisioning and general capacity planning of an Internet service. The inability of most statistical methods to model the high variability of internet traffic accurately, and their lack of reasoning capabilities, have triggered an increasing number of studies that employ non-traditional statistical methods, including machine learning. Furthermore, traditional summary statistics, particularly the sample mean and variance, are unstable metrics when working with the high variability of internet traffic; as such, sample means and sample variances are not reliable statistics for summarising traffic properties [1]. Machine learning techniques, such as the Artificial Neural Network (ANN), employ mechanisms that allow computers to evolve behaviour based on knowledge gained from dynamic observations. A machine learning technique based on nonlinear elements is often referred to as a neural network. Neural networks are networks of nonlinear elements interconnected through adjustable weights, and they play a prominent role in machine learning. Artificial Neural Networks (ANNs) emerged with the aim of imitating the information processing of the human brain. Through learning, ANNs can determine nonlinear relationships in a data set by associating the corresponding outputs with input patterns. The multilayer artificial neural network, among other machine learning models, has shown impressive results in forecasting studies [2, 3, 4, 5, 6, 7, 8, 9]. However, applying an ANN to a given forecasting endeavour is a hard task, as basic modelling issues must be carefully considered for enhanced precision. The issues include the network architecture, learning parameters and data pre-processing methods [6, 8]. The inconsistencies in performance reports on the design issues in the literature were also noted in [8]. In [9] it was argued that the ANN technique should not be applied arbitrarily, as has sometimes been suggested and even done in the internet forecasting domain [10, 11].

This paper examined the impacts of the number of input lags, hidden neurons and training epochs on the precision of the multilayer artificial neural network in forecasting internet traffic.

2. RELATED WORK
Quite a number of research efforts have been reported in the literature on seeking appropriate models for forecasting Internet traffic.

2.1 Internet Traffic Forecasting: Statistical Methods
In [12] a comparative study on suitable statistical methods for network traffic estimation was conducted. In the paper, several estimation methods for IP network traffic were studied. The study

showed that non-linear time series models could model and forecast better than the classical linear time series models. Anand in [13] investigated a non-linear time series model, the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model, for internet traffic modelling. The study showed that the forecasting algorithm was accurate compared with actual traffic. Although nonlinear statistical models can capture the burstiness of a network's traffic, the models are parametric in nature and therefore require knowledge of the distribution of the traffic. In addition, they are analytical and therefore require explicit programming to clearly specify the algorithmic steps. To take advantage of machine learning paradigms, the application of machine learning techniques to internet traffic forecasting has been on the increase.

2.2 Machine Learning and Artificial Neural Networks for Internet Traffic Forecasting
A vast number of research efforts have been ongoing in applying machine learning techniques to internet traffic prediction, the results of which have demonstrated their superiority to statistical forecasting methods. A concurrent neuro-fuzzy model to discover and analyse useful knowledge from available Web log data was proposed in [14]. The study used a self-organizing map for pattern analysis and a fuzzy inference system to capture the chaotic trend, providing short-term (hourly) and long-term (daily) web traffic trend predictions. Empirical results demonstrated that the proposed approach was efficient for mining and predicting web traffic. A study in [15] presented a neural network ensemble (NNE) for the prediction of TCP/IP traffic from a time series forecasting (TSF) point of view. The NNE approach was compared with TSF methods (Holt-Winters and ARIMA) and was found to compete favourably with them. In [16] least squares support vector machines were applied to the problem of accurately predicting non-peak traffic; the method had good generalization ability and guaranteed global minima. The study in [17] presented a neural network ensemble approach and two adapted time series methods (ARIMA and Holt-Winters) for forecasting the amount of traffic in TCP/IP based networks. The experiments with the neural ensemble achieved the best results for 5-minute and hourly data, while Holt-Winters was the best option for the daily forecasts. The study in [10] investigated ensembles of artificial neural networks in predicting long-term internet traffic. The proposed prediction models were compared with the classic Holt-Winters method. Prangchumpol in [18] presented an approach to predicting incoming and outgoing data rates in a network system using a data mining (machine learning) technique, association rule discovery. The result of the study showed that the technique could predict future network traffic.

2.3 Design Issues in Forecasting with Artificial Neural Networks
A detailed state-of-the-art presentation on forecasting with artificial neural networks was made in [8]. The study showed that, overall, ANNs gave satisfactory performance in forecasting, but went on to indicate the inconsistencies in performance reports on design issues in the literature. The inconsistencies were attributed to the trial-and-error methodology adopted in most studies. Faraway and Chatfield [9] argued that it is unwise to apply ANN models blindly in black-box mode, as has sometimes been suggested. Shamsuddin et al. in [7] investigated the effect of applying different numbers of input nodes, activation functions and pre-processing techniques on the performance of a backpropagation network in time series revenue forecasting. The findings showed that the performance of the ANN model could be considerably improved by careful selection of those parameters. In [19], the performance of two learning algorithms, linear regression and standard backpropagation neural networks, was compared on the prediction of four major stock market indexes. The comparison showed that the neural network approach resulted in better prediction accuracy than the linear regression model. Chabaa et al. in [20] presented an ANN based on the multi-layer perceptron for analysing a time series of measured internet traffic data over IP networks. The comparison between several training algorithms demonstrated the efficiency and accuracy of the Levenberg-Marquardt and resilient backpropagation algorithms. Chukwuchekwa in [21] compared the performance of the backpropagation gradient descent technique and a genetic algorithm on some pattern recognition problems. The backpropagation (BP) algorithm was found to outperform the genetic algorithm in that instance. The study suggested that caution should be applied before using other algorithms as substitutes for the BP algorithm, especially in classification problems. In [2], an evaluation of several learning rules for adjusting ANN weights was carried out on the popular airline passenger data set. The Levenberg-Marquardt backpropagation algorithm showed the best performance among the learning rules. Various degrees of performance were observed in [22] on examining the impact of input lags of the multilayer perceptron in forecasting internet traffic on a two-layered network. In [23] a survey of research and application issues in Web usage mining based on various mining techniques was conducted to provide some understanding for designing algorithms suitable for mining data.

This review demonstrated the impressive results of applying machine learning techniques, such as artificial neural networks, to forecasting Internet traffic, as well as raising concerns over the little or no consideration given by researchers to the design issues. The paper therefore presents results from the study on the impacts of some multi-layer perceptron design issues on internet traffic forecasting.

3. METHODOLOGY
The traffic forecasting was modelled as a standard time series problem, and the multilayer artificial neural network was designed to perform the time series function mapping.

3.1 Time Series for Traffic Forecasting
Traffic forecasting is a standard time series prediction task. The goal is to approximate the function that relates the future values of a variable to the previous observations of that variable [24]. In some situations, such as internet traffic, data are non-stationary and chaotic. In such situations, one general assumption is that the historical data incorporate all the behaviour required to capture the dependency between the future traffic and that of the past. Therefore, the historical data are the major player in the forecasting process. The second assumption in modelling and forecasting the dynamics of the traffic is that its values are expressed as a discrete time series [2, 3]. A discrete time series is a vector {y_t} of observations made at regular intervals, t = 1, 2, 3, …, N. For the time series forecasting problem, the inputs are typically past observations of the data series and the output is the future value.

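The mapping from past observations to supervised training patterns described above can be sketched as follows. This is an illustrative sketch only (the study's own implementation was in Visual Basic), and the helper name is hypothetical:

```python
# Illustrative sketch (hypothetical helper name): turn a univariate
# traffic series into supervised training patterns, where each input
# is a fixed number of lagged observations and the target is the
# next observation in the series.

def make_lag_patterns(series, n_lags):
    """Return (inputs, targets); inputs[i] holds the n_lags values
    immediately preceding targets[i]."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(list(series[t - n_lags:t]))  # y_{t-n}, ..., y_{t-1}
        targets.append(series[t])                  # y_t
    return inputs, targets
```

For example, with two lags the series 1, 2, 3, 4, 5 yields the input patterns [1, 2], [2, 3], [3, 4] paired with the targets 3, 4, 5.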
Suppose y_1, y_2, …, y_N denote an observed time series of the traffic loads; then the basic problem is to estimate a future traffic value such as y_{N+k}, where the integer k is called the lead time or the forecasting horizon [25]. For the univariate method, forecasts of a given traffic load are based on a model fitted only to the past observations of the given time series, so that the forecast ŷ(N, k) depends only on y_1, y_2, …, y_N. The estimate of y_{N+1} is computed as a weighted sum of the past observations:

    ŷ_{N+1} = w_0 y_N + w_1 y_{N-1} + w_2 y_{N-2} + …        (1)

where the {w_i} are weights.

The Multi-Layer Perceptron performs the following function mapping [3, 8]:

    ŷ_t = f(y_{t-1}, y_{t-2}, …, y_{t-n})        (2)

where ŷ_t is the estimated traffic at time t and (y_{t-1}, y_{t-2}, …, y_{t-n}) denotes the training pattern composed of a fixed number (n) of lagged observations of the series.

The weights used in the ANN model are estimated from the data by minimizing the sum of squares of the within-sample one-step-ahead forecast errors, namely

    S = Σ_t (y_t − ŷ_t)²        (3)

over the first part of the time series, called the training set. The last part of the time series, called the test set, is kept in reserve so that genuine out-of-sample (ex ante) forecasts can be made and compared with the actual observations. Equations (1) and (2) give a one-step-ahead forecast, as they use the actual observed values of all lagged variables as inputs. If multi-step-ahead forecasts are required, it is possible to proceed in one of two ways. The first is to construct a new architecture with several outputs, giving ŷ_t, ŷ_{t+1}, ŷ_{t+2}, …, where each output has separate weights for each connection to the neurons. The second is to "feed back" the one-step-ahead forecast to replace the lag-1 value as one of the input variables; the same architecture can then be used to construct the two-step-ahead forecast, and so on [16]. This study adopted the latter iterative approach because of its numerical simplicity and because it requires fewer weights to be estimated.

3.2 The Multilayer Neural Network
The neural network is a powerful model for solving complex problems: it has a natural potential for solving nonlinear problems and can easily achieve input-output mapping, which makes it well suited to prediction problems [26]. The basic features of the multilayer perceptron include:
    i.   The model of each neuron in the network includes a nonlinear activation function that is differentiable.
    ii.  The network contains one or more layers that are hidden from both the input and output nodes.
    iii. The network exhibits a high degree of connectivity, the extent of which is determined by the synaptic weights of the network.

Figure 1. Architecture of the Multilayer Perceptron with two hidden layers

3.2.1 A Neural Model
The node is the basic unit of the Artificial Neural Network. Each node is able to sum many inputs x_1, x_2, …, x_n from the environment or from other nodes, with each input modified by an adjustable node weight (Figure 2). The sum of these weighted inputs is added to an adjustable threshold for the node and then passed through a modifying (activation) function that determines the final output.

Figure 2. Nonlinear model of a neuron [26]

The neural model in Figure 2 includes an externally applied bias, denoted by b_k. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively. Mathematically, we may describe the neuron k depicted in Figure 2 by the following equations:

    u_k = Σ_{j=1}^{m} w_{kj} x_j        (4)
and
    y_k = φ(u_k + b_k)        (5)

where x_1, x_2, …, x_m are the input signals; w_{k1}, w_{k2}, …, w_{km} are the respective synaptic weights of neuron k; u_k is the linear combiner output due to the input signals; b_k is the bias; φ(·) is the activation function; and y_k is the output signal of the neuron. The use of the bias b_k has the effect of applying an affine transformation to the output u_k of the linear combiner, as shown by

    v_k = u_k + b_k        (6)

The bias b_k is an external parameter of neuron k.

The activation function, denoted by φ(v), defines the output of a neuron in terms of the induced local field v. It is this function (also

called the transfer function) that determines the relationship between the inputs and outputs of a node and of a network. In general, the activation function introduces a degree of nonlinearity that is valuable for most ANN applications. Among these functions, the sigmoid function is very popular. It is a strictly increasing function that exhibits a graceful balance between linear and nonlinear behaviour. The logistic sigmoid is defined as

    φ(v) = 1 / (1 + e^{−v})        (7)

A logistic sigmoid function assumes a continuous range of values from 0 to 1. Additional types of activation functions can be found in [8]. Among these functions, the logistic transfer function is the most popular choice [8].

3.2.2 Training of Artificial Neural Networks
An ANN has to be trained before it can be put to use. The goal of training is to find the logical relationship in the given input/output data. There are two learning strategies: supervised and unsupervised. This study employs the supervised learning strategy. Supervised learning typically operates with two data sets, a training set and a test set. The training set is used for estimating the arc weights, while the test set is used for measuring the generalization ability of the network. Training is used to gain generalised knowledge about the system under consideration, and testing is used to predict (forecast) the system behaviour using the knowledge gained. Unsupervised techniques, on the other hand, such as reinforcement learning, are independent of training data and operate by directly interacting with the environment.

The training algorithm employed is backpropagation. It is a supervised training strategy and a popular method for training the multilayer perceptron. The training proceeds in two phases [26]:
    1. In the forward phase, the synaptic weights of the network are fixed and the input signal is propagated through the network, layer by layer, until it reaches the output. Thus, in this phase, changes are confined to the activation potentials and outputs of the neurons in the network.
    2. In the backward phase, an error signal is produced by comparing the output of the network with the desired response. The resulting error signal is propagated through the network, again layer by layer, but this time the propagation is performed in the backward direction. In this second phase, successive adjustments are made to the synaptic weights of the network.

In [5] it is also reported that backpropagation is the most computationally straightforward algorithm for training the multi-layer perceptron. The algorithm steps are summarized as:
    1. Obtain a set of training patterns.
    2. Set up an ANN model consisting of a number of input neurons, hidden neurons, and output neurons.
    3. Set the learning rate (η) and the momentum rate (α).
    4. Initialize all connection weights (W_ij and W_jk) and bias weights (θ_k and θ_j) to random values.
    5. Set the minimum error E_min and the number of epochs.
    6. Start training by applying the input patterns one at a time and propagating them through the layers, then calculate the total error.
    7. Back-propagate the error through the output and hidden layers and adapt W_jk and θ_k.
    8. Back-propagate the error through the hidden and input layers and adapt W_ij and θ_j.
    9. Check if Error < E_min or the maximum epoch is reached. If not, repeat steps 6–9; otherwise, stop training.

3.3 Data Collection and Description
Internet traffic data were collected as hourly averages, in kilobits/s, of the TCP/IP traffic of a company's residential network from January 1, 2010 to September 30, 2010 (making up 6552 data points each for the IN and OUT traffic), and as daily traffic data from January 1 to December 31, 2010 (making up 365 data points each for the IN and OUT traffic), using PRTG (Paessler Router Traffic Grapher), a network monitoring and bandwidth usage tool from a company called Paessler. A bandwidth of 20 Mbps was statically allocated for upload (traffic IN) and 20 Mbps for download (traffic OUT) for the period under consideration.

3.3.1 Data Pre-processing/Normalisation
Nonlinear activation functions such as the logistic function restrict the possible output from a node to, typically, (0, 1) or (−1, 1). Inputs are therefore normalized to avoid computational problems, to meet algorithm requirements and to facilitate network learning. Four methods for input normalization are summarized in [8]. This study employs the linear transformation to [0, 1], defined as

    y_n = (y_o − y_min) / (y_max − y_min)        (8)

where y_n and y_o represent the normalized and original data, and y_min and y_max are the minimum and maximum of the column or row, respectively.

3.3.2 Training and Testing Sets
Eighty percent (80%) of the data, that is, 5241.6 approximated to 5242 points, was used for training the network, while twenty percent (20%), that is, 1310.4 approximated to 1310 points, was used for testing the generalisation (predictive) capability of the network, for each of the HOURLY_IN and HOURLY_OUT traffic flows. Likewise, a training set of 80% and a test set of 20% were used for each of the DAILY traffic series, that is, 292 data points for training and 73 for testing.

3.4 Finding the Appropriate Complexity of the Network
For a time series forecasting problem, a training pattern consists of a fixed number of lagged observations of the series [7]. The inputs (number of lag observations) were varied from 1 to 24, excluding the bias. One and two hidden layers were considered. The number of hidden nodes was set equal to the number of input nodes; in several studies, networks with the number of hidden nodes equal to the number of input nodes are reported to forecast better [8]. One output node was used (one look-ahead), so the model of our network is k, k, k, 1, where k represents the number of lag observations (input variables). The epoch values considered were 200, 500, and 1000. The best model, according to [18], is the one that gives the best result on the test set. The logistic sigmoid activation function was used [8]. The error-correction backpropagation algorithm with learning rate 0.1 and momentum 0.9 was used to train the network.

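The [0, 1] normalisation of equation (8) and the chronological 80/20 train/test split can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical and not from the study's Visual Basic implementation:

```python
# Illustrative sketch (hypothetical helper names) of the linear [0, 1]
# transformation of equation (8) and the 80/20 chronological split.

def normalise(values):
    """Map values linearly onto [0, 1]: yn = (yo - ymin) / (ymax - ymin)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(values, train_fraction=0.8):
    """Split a series chronologically, earlier part for training.
    For 6552 hourly points this gives 5242 training / 1310 test points."""
    cut = round(len(values) * train_fraction)
    return values[:cut], values[cut:]
```

Note that the split is chronological rather than random, so the test set contains genuinely out-of-sample (ex ante) observations.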

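The training scheme above (logistic sigmoid units, error-correction backpropagation with learning rate 0.1 and momentum 0.9) can be sketched as follows. This is an illustrative numpy sketch, NOT the study's Visual Basic implementation: the class name, weight-initialisation range and per-pattern update order are assumptions, and a single hidden layer is shown for brevity.

```python
import numpy as np

# Illustrative sketch of a one-hidden-layer perceptron trained by
# error-correction backpropagation with logistic sigmoid units,
# learning rate 0.1 and momentum 0.9. Names and weight ranges are
# assumptions, not taken from the paper's implementation.

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class TinyMLP:
    def __init__(self, n_in, n_hidden, lr=0.1, momentum=0.9, seed=0):
        rng = np.random.default_rng(seed)
        # Last row of each weight matrix holds the bias weight (theta).
        self.W1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
        self.W2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, 1))
        self.lr, self.momentum = lr, momentum
        self.dW1 = np.zeros_like(self.W1)  # previous updates, for momentum
        self.dW2 = np.zeros_like(self.W2)

    def _forward(self, x):
        x1 = np.append(x, 1.0)                       # constant bias input
        h1 = np.append(sigmoid(x1 @ self.W1), 1.0)   # hidden outputs + bias
        y = sigmoid(h1 @ self.W2)[0]                 # single output node
        return x1, h1, y

    def predict(self, x):
        return self._forward(x)[2]

    def train_epoch(self, X, T):
        """One pass over all patterns; returns the sum of squared errors."""
        sse = 0.0
        for x, t in zip(X, T):
            x1, h1, y = self._forward(x)
            err = t - y
            sse += err ** 2
            # Local gradients; for the logistic unit f'(v) = f(v)(1 - f(v)).
            d_out = err * y * (1.0 - y)
            d_hid = self.W2[:-1, 0] * d_out * h1[:-1] * (1.0 - h1[:-1])
            # Weight updates with a momentum term (steps 7-8 above).
            self.dW2 = self.lr * np.outer(h1, [d_out]) + self.momentum * self.dW2
            self.dW1 = self.lr * np.outer(x1, d_hid) + self.momentum * self.dW1
            self.W2 += self.dW2
            self.W1 += self.dW1
        return sse
```

Because the output unit is a logistic sigmoid, targets must lie in (0, 1), which is why inputs and targets are normalized to [0, 1] beforehand.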
3.5 Stopping and Evaluation Criteria
Training stops after the specified number of epochs. Typically, an SSE-based objective (cost) function to be minimized during the training process is defined in (10). The measure of accuracy employed is the Root Mean Square Error (RMSE), defined as

        RMSE = √( (1/n) Σt (yt − ŷt)² )                                  (9)

where n is the total number of sample group observations, ŷt is the predicted (computed) value, and yt is the target value at time t. RMSE is one of the most commonly used measures of forecast error for examining how close the forecast is to the actual value [5]. The best model is the one that gives the best result on the test set, that is, the model with the least RMSE on the testing set [27].

4. RESULTS AND DISCUSSION
The system was implemented in Visual Basic. The RMSEs of the various models were recorded and compared based on the design issues considered. The results are presented and discussed in this section.

4.1 HOURLY_IN traffic
The RMSEs of the testing (prediction) results of the various models, based on the number of input lags, the number of hidden layers (one or two), and the number of training epochs, were compared for the HOURLY_IN traffic. Figure 3 depicts these results.

[Figure 3. ANN model selection for the HOURLY_IN traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

There were varying degrees of performance, with no regular pattern among the input lags, between the one- and two-hidden-layer networks, or among the various epoch counts used. Nevertheless, the worst performance in all cases was at input lag 1. The least RMSE of this experiment, 0.0766984, occurred at input lag 24 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_IN traffic uses input lag 24 and 200 training epochs with two hidden layers.

4.2 HOURLY_OUT traffic
The RMSEs of the testing (prediction) results of the various models were compared for the HOURLY_OUT traffic. The results are shown in Figure 4.

[Figure 4. ANN model selection for the HOURLY_OUT traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

There were likewise various values of the performance measure, with no particular pattern across the design issues, for the HOURLY_OUT traffic. As with the HOURLY_IN traffic, the worst performance was recorded at input lag 1 in all cases. The least RMSE, 0.0621992, occurred at input lag 13 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_OUT traffic uses input lag 13 and 200 training epochs with two hidden layers.

4.3 DAILY_IN traffic
Figure 5 presents the prediction RMSE of the various models for the DAILY_IN traffic.

[Figure 5. ANN model selection for the DAILY_IN traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

Different performance values were also observed, with no particular pattern among the various prediction models. The least RMSE of this experiment, 0.116691, occurred at input lag 3 with 1000 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the DAILY_IN traffic uses input lag 3 and 1000 training epochs with two hidden layers.
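The comparisons in this section all follow one procedure: train a candidate network for each combination of input lag, hidden-layer count, and epoch budget, then keep the configuration with the least test-set RMSE. A sketch of that selection loop follows; `train_and_test` is a hypothetical stand-in for actually training and evaluating one network, and the toy evaluator exists only so the sketch runs — its numbers are not traffic results.

```python
import itertools

# Candidate values mirroring the design issues examined in the study.
INPUT_LAGS = range(1, 25)      # lag1 .. lag24
HIDDEN_LAYERS = (1, 2)
EPOCHS = (200, 500, 1000)

def select_best_model(train_and_test):
    """Return the (lag, hidden_layers, epochs) configuration with the least test RMSE.

    train_and_test(lag, hidden_layers, epochs) is a hypothetical callback that
    trains one candidate network and returns its RMSE on the testing set.
    """
    best_config, best_rmse = None, float("inf")
    for lag, hidden, epochs in itertools.product(INPUT_LAGS, HIDDEN_LAYERS, EPOCHS):
        rmse = train_and_test(lag, hidden, epochs)
        if rmse < best_rmse:               # least RMSE on the test set wins [27]
            best_config, best_rmse = (lag, hidden, epochs), rmse
    return best_config, best_rmse

# Toy evaluator for demonstration only: pretends error is smallest near lag 13,
# on two hidden layers, with the fewest epochs.
def toy_evaluator(lag, hidden, epochs):
    return 0.005 * abs(lag - 13) + (0.01 if hidden == 1 else 0.0) + 1e-6 * epochs

config, rmse = select_best_model(toy_evaluator)
```

Because no regular pattern was found across the design issues, this exhaustive comparison, rather than a closed-form rule, is what determines each traffic category's model.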
4.4 DAILY_OUT traffic
Figure 6 presents the RMSE of the various prediction models for the DAILY_OUT traffic.

[Figure 6. ANN model selection for the DAILY_OUT traffic: test RMSE against input lag for the one- and two-hidden-layer networks trained for 200, 500, and 1000 epochs.]

No particular pattern of performance was observed among the various models, although there were different values of the performance measure. The least RMSE of this experiment, 0.099416, occurred at input lag 3 with 200 training epochs on the one-hidden-layer network. Therefore, the best model for forecasting the DAILY_OUT traffic uses input lag 3 and 200 training epochs with one hidden layer.

Table 1 presents a summary of the model selection for forecasting the various traffic categories.

Table 1: Summary of the Forecasting Model Selection for the Traffic Categories

              Hidden   Epochs   Learning   Momentum   Input   Least
              layers            rate                  lags    RMSE
HOURLY_IN        2      200       0.1        0.9       24     0.07669
HOURLY_OUT       2      200       0.1        0.9       13     0.06219
DAILY_IN         2      200       0.1        0.9        3     0.11521
DAILY_OUT        1      200       0.1        0.9        3     0.09941

For the HOURLY_IN traffic, the computed (predictive) values based on 24 input lags on a two-hidden-layer network using 200 training epochs were deployed; for the HOURLY_OUT traffic, the study used 13 input lags with 200 training epochs on a two-hidden-layer network. The study deployed 3 input lags and two hidden layers of 3 neurons each, using 200 training epochs, for predicting the DAILY_IN traffic. For the DAILY_OUT traffic, 3 input lags and one hidden layer of three neurons with 200 training epochs were selected.

Figure 7 compares the predicted models selected for the traffic categories.

[Figure 7. Summary of the least RMSE for model selection across the traffic categories.]

The HOURLY traffic categories had better prediction performance than their DAILY counterparts. This could be attributed to the very large sample size used for the HOURLY traffic; it has been reported that ANNs perform better in forecasting with large sample sizes than with small ones [8], [26]. In addition, Figure 7 reveals that different forecasting models may exist for different traffic categories, even when the traffic categories are all from the same network operator.

This study has observed different forecasting models for the various traffic categories based on the design issues. The findings suggest that careful consideration of the design issues is indispensable for improving the predictive performance of a multi-layer artificial neural network, rather than applying it blindly to internet traffic forecasting. Although there is no generally accepted technique for determining the optimal design parameters, an improved predictive model is feasible through experimentation.

5. CONCLUSION
This study examined the impacts of some important design issues in modelling a multilayer perceptron artificial neural network for Internet traffic forecasting. Traffic forecasting was modelled as a standard time series problem, and a multilayer artificial neural network was designed to perform the time series function mapping. The mechanism was implemented in a Visual Basic programming environment and tested with real Internet traffic data through experimentation with the various design issues considered. Although no particular pattern of performance was observed, the study showed that forecasting performance can be affected by the number of input lags, hidden layers, and training epochs. While the study did not attempt to determine optimal values for the various factors considered, it has shown that careful experimentation is required to choose appropriate values for each of the design issues. Therefore, the multilayer perceptron should not be applied blindly to Internet traffic forecasting.

6. REFERENCES
[1]   Crovella, M. and Krishnamurthy, B. 2006. Internet Measurement. John Wiley & Sons, Ltd., England.
[2]   Benkacha, S., Benhra, J. and El Hassani, H. 2015. Seasonal Time Series Forecasting Models on Artificial Neural Network. International Journal of Computer Applications. 116, 20, 0975-8887. DOI= 10.5120/20451-2805.
[3]   Benkacha, S., Benhra, J. and El Hassani, H. 2013. Causal Method and Time Series Forecasting Model based on Artificial Neural Network. International Journal of Computer Applications. 75, 7, 0975-8887.
[4]   Islam, S., Keung, J., Lee, K. and Liu, A. 2012. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems. 28, 155-162. DOI= 10.1016/j.future.2011.05.027.
[5]   Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and prediction of internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. 2, 147-155. DOI= 10.4236/jilsa.2010.23018.
[6]   Shamsuddin, S. M., Sallehuddin, R. and Yusof, N. M. 2012. Artificial neural network time series modelling for revenue forecasting. Chiang Mai J. Sci. 35, 3, 411-426.
[7]   Cortez, P., Rio, M., Sousa, P. and Rocha, M. 2007. Topology aware internet traffic forecasting using neural networks. In Proceedings of the 17th International Conference on Artificial Neural Networks (Porto, Portugal). Lecture Notes in Computer Science 4669, 445-452, Springer.
[8]   Zhang, G., Patuwo, B. E. and Hu, M. Y. 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting. 14, 35-62.
[9]   Faraway, J. and Chatfield, C. 1998. Time series forecasting with neural networks: a comparative study using the airline data. Journal of Appl. Statist. 47, 231-250.
[10]  Miguel, M. L. F., Penna, M. C., Nievola, J. C. and Pellenz, M. E. 2012. New models for long-term internet traffic forecasting using artificial neural networks and flow based information. In Proceedings of the 2012 IEEE Network Operations and Management Symposium, 1082-1088. DOI= 10.1109/NOMS.2012.6212033.
[11]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale Internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[12]  Mariam, I., Dadarlat, V. and Iancu, B. 2009. A Comparative Study of the Statistical Methods Suitable for Network Traffic Estimation. In Proceedings of the 13th WSEAS International Conference on Communications. 99-104.
[13]  Anand, C. N. 2009. Internet traffic modeling and forecasting using non-linear time series model GARCH. M.Sc. Thesis, Department of Electrical and Computing Engineering, College of Engineering, Kansas State University.
[14]  Wang, X., Abraham, A. and Smith, K. A. 2005. Intelligent web traffic mining and analysis. Journal of Network and Computer Applications. 28, 147-165.
[15]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2006. Internet traffic forecasting using neural networks. In Proceedings of the International Joint Conference on Neural Networks (Vancouver), 2635-2642.
[16]  Zhang, Y. and Liu, Y. 2009. Comparison of parametric and nonparametric techniques for non-peak traffic forecasting. World Academy of Science, Engineering and Technology. 51.
[17]  Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[18]  Prangchumpol, D. 2013. A network traffic prediction algorithm based on data mining technique. World Academy of Science, Engineering and Technology, International Science Index. http://www.waset.org.
[19]  Fok, W. W. T., Tam, V. W. L. and Ng, H. 2008. Computational neural network for global stock indexes prediction. In Proceedings of the World Congress on Engineering (London, UK, July 2-4, 2008).
[20]  Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and prediction of Internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. DOI= 10.4236/jilsa.2010.23018.
[21]  Chukwuchekwa, U. J. 2011. Comparing the performance of backpropagation algorithm and genetic algorithms in pattern recognition problems. International Journal of Computer Information Systems. 2, 5, 7-12.
[22]  Odim, M. O., Gbadeyan, J. A. and Sadiku, J. S. 2014. A neural network model for improved internet service resource provisioning. British Journal of Mathematics & Computer Science. 4, 17, 2418-2434.
[23]  Dogne, V., Jain, A. and Jain, S. 2015. Evolving trends and its application in web usage mining: a survey. International Journal of Soft Computing and Engineering. 4, 6, 98-101.
[24]  Rutka, G. and Lauks, G. 2007. Study on internet traffic prediction models. Electronics and Electrical Engineering. Kaunas: Technologija, 6, 78, 47-50.
[25]  Chatfield, C. 1992. The Analysis of Time Series: An Introduction (4th ed.). Chapman & Hall, London.
[26]  Haykin, S. 2009. Neural Networks and Learning Machines (3rd ed.). Pearson Education, Inc., New Jersey.
[27]  Zhang, G. P., Patuwo, B. E. and Hu, M. Y. 2001. A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers & Operations Research. 28, 381-396.



