Modelling the Multi-Layer Artificial Neural Network for Internet Traffic Forecasting: The Model Selection Design Issues

Mba O. Odim
Computer Science Department, Redeemer's University, Ede, Nigeria
odimm@run.edu.ng

Jacob A. Gbadeyan
Mathematics Department, University of Ilorin, Ilorin, Nigeria
jagbadeyan@unilorin.edu.ng

Joseph S. Sadiku
Computer Science Department, University of Ilorin, Ilorin, Nigeria
jssadiku@unilorin.edu.ng

CoRI'16, Sept 7–9, 2016, Ibadan, Nigeria.

ABSTRACT
Internet traffic forecasting models with learning ability, such as the artificial neural network (ANN), have been growing in popularity in recent times because of their impressive performance in modelling the high degree of variability and nonlinearity of internet traffic. This study examined the impact of some design issues on the performance of the multi-layer artificial neural network for internet traffic forecasting. The traffic forecasting was modelled as a standard time series problem and the multilayer artificial neural network was designed to perform the time series function mapping. The input lags were varied from 1 to 24. Training epoch values of 200, 500 and 1000 were used on networks with one and two hidden layers. The learning algorithm was backpropagation with a learning rate of 0.1 and a momentum of 0.9, using the logistic sigmoid activation function. The model was implemented in Visual Basic and validated with four categories of classified time series internet traffic from a branch residential network of a firm in Nigeria. Various predictive performances without a consistent pattern were observed on the issues considered. However, input lag one gave the worst performance in all cases for the HOURLY traffic, and three of the four traffic categories demonstrated the superiority of two hidden layers over one hidden layer. Although the epoch values of 200, 500 and 1000 showed no consistent performance variations, epoch value 200 outperformed the others in the model selections. The study revealed that input lags, number of hidden layers and epoch values could affect the traffic forecasting performance of the multilayer perceptron, and that performance could be considerably improved by careful selection of those parameters through experimentation.

CCS Concepts
• Computing Methodologies → Artificial Intelligence → Machine learning → Machine learning approaches → Neural Networks

Keywords
Internet Traffic, Time Series Forecasting, Machine learning, Multi-layer Artificial Neural Network, Design issues

1. INTRODUCTION
Accurate information about offered traffic is required for efficient resource provisioning and general capacity planning of an Internet service. The inability of most statistical methods to model the high variability of internet traffic accurately, and their lack of reasoning capabilities, have triggered an increasing number of studies that employ non-traditional methods, including machine learning. Furthermore, traditional summary statistics, particularly the sample mean and variance, are unstable metrics when working with the high variability of internet traffic; as such, they are not reliable statistics for summarising traffic properties [1]. Machine learning techniques, such as the Artificial Neural Network (ANN), employ mechanisms that allow computers to evolve behaviour based on knowledge gained from dynamic observations. A machine learning technique based on nonlinear elements is often referred to as a neural network. Neural networks are networks of nonlinear elements interconnected through adjustable weights, and they play a prominent role in machine learning. The ANN emerged with the aim of imitating the information processing of the human brain. Through learning, ANNs can determine nonlinear relationships in a data set by associating the corresponding outputs with input patterns. The multilayer artificial neural network, among other machine learning models, has shown impressive results in forecasting studies [2, 3, 4, 5, 6, 7, 8, 9]. However, applying an ANN to a given forecasting endeavour is a hard task, as basic modelling issues must be carefully considered for enhanced precision. The issues include the network architecture, learning parameters and data pre-processing methods [6, 8]. The inconsistencies in performance reports on the design issues in the literature were also noted in [8]. In [9] it was argued that the ANN technique should not be applied arbitrarily, as has sometimes been suggested and even done in the internet forecasting domain [10, 11].

The paper examined the impact of the number of input lags, hidden neurons and training epochs on the precision of the multilayer artificial neural network in forecasting internet traffic.

2. RELATED WORK
Quite a number of research efforts have been reported in the literature on seeking appropriate models for forecasting Internet traffic.

2.1 Internet Traffic Forecasting: Statistical Methods
In [12] a comparative study of statistical methods suitable for network traffic estimation was conducted. In the paper, several estimation methods for IP network traffic were studied. The study showed that non-linear time series models could model and forecast better than the classical linear time series models. Anand in [13] investigated a non-linear time series model, the Generalised Autoregressive Conditional Heteroskedasticity (GARCH) model, in internet traffic modelling. The study showed that the forecasting algorithm was accurate compared with actual traffic. Although nonlinear statistical models can capture the burstiness of network traffic, the models are parametric in nature and therefore require knowledge of the distribution of the traffic. In addition, they are analytical and therefore require explicit programming to specify the algorithmic steps clearly. To take advantage of machine learning paradigms, the application of machine learning techniques to internet traffic forecasting has been on the increase.

2.2 Machine Learning and Artificial Neural Network for Internet Traffic Forecasting
A vast number of research efforts have been ongoing in exploring machine learning techniques for internet traffic prediction, the results of which have demonstrated their superiority to statistical forecasting methods. A concurrent neuro-fuzzy model to discover and analyse useful knowledge from available Web log data was proposed in [14]. The study used a self-organizing map for pattern analysis and a fuzzy inference system to capture the chaotic trend, in order to provide short-term (hourly) and long-term (daily) web traffic trend predictions. Empirical results demonstrated that the proposed approach was efficient for mining and predicting web traffic. A study in [15] presented a neural network ensemble (NNE) for the prediction of TCP/IP traffic from a time series forecasting (TSF) point of view. The NNE approach was compared with TSF methods (Holt-Winters and ARIMA) and was found to compete favourably with them. In [16] the least squares support vector machine was applied to the problem of accurately predicting non-peak traffic; the method had good generalization ability and guaranteed global minima. The study in [17] presented a neural network ensemble approach and two adapted time series methods (ARIMA and Holt-Winters) for forecasting the amount of traffic in TCP/IP based networks. The experiments with the neural ensemble achieved the best results for 5-minute and hourly data, while Holt-Winters was the best option for the daily forecasts. The study in [10] investigated ensembles of artificial neural networks in predicting long-term internet traffic. The proposed prediction models were compared with the classic Holt-Winters method. Prangchumpol in [18] presented a descriptive approach to predicting incoming and outgoing data rates in a network system using a data mining (machine learning) technique, association rule discovery. The result of the study showed that the technique could predict future network traffic.

2.3 Design Issues with Forecasting with Artificial Neural Network
A detailed state-of-the-art presentation on forecasting with artificial neural networks was made in [8]. The study showed that, overall, ANNs gave satisfactory performance in forecasting, but went on to indicate the inconsistencies in performance reports on design issues in the literature. The inconsistencies were attributed to the trial-and-error methodology adopted in most studies. Faraway and Chatfield [9] argued that it was unwise to apply ANN models blindly in black-box mode, as had sometimes been suggested. Shamsuddin et al. in [6] investigated the effect of applying different numbers of input nodes, activation functions and pre-processing techniques on the performance of the backpropagation network in time series revenue forecasting. The findings showed that the performance of the ANN model could be considerably improved by careful selection of those parameters. In [19], the performance of two learning algorithms, linear regression and the standard backpropagation neural network, was compared on the prediction of four major stock market indexes. The comparison showed that the neural network approach resulted in better prediction accuracy than the linear regression model. Chabaa et al. in [20] presented an ANN based on the multi-layer perceptron for analysing time series of measured internet traffic data over IP networks. The comparison between some training algorithms demonstrated the efficiency and accuracy of the Levenberg-Marquardt and the Resilient backpropagation algorithms. Chukwuchekwa in [21] compared the performance of the backpropagation gradient descent technique and the genetic algorithm on some pattern recognition problems. The backpropagation (BP) algorithm was found to outperform the genetic algorithm in that instance. The study suggested that caution should be applied before using other algorithms as substitutes for the BP algorithm, especially in classification problems. In [2], an evaluation of several learning rules for adjusting ANN weights was carried out on the popular airline passenger data set. The Levenberg-Marquardt backpropagation algorithm showed the best performance among the learning rules. Various degrees of performance were observed in [22] on examining the impact of input lags of the multilayer perceptron in forecasting internet traffic on a two-layered network. In [23] a survey of research and application issues in Web usage mining based on various mining techniques was conducted to provide some understanding for designing algorithms suitable for mining the data.

This review demonstrated the impressive results of applying machine learning techniques, such as artificial neural networks, in forecasting Internet traffic, and also raised concerns over the little or no consideration given by researchers to the design issues. The paper therefore presents results from the study of the impact of some multi-layer perceptron design issues on internet traffic forecasting.

3. METHODOLOGY
The traffic forecasting was modelled as a standard time series problem and the multilayer artificial neural network was designed to perform the time series function mapping.

3.1 Time series for Traffic Forecasting
Traffic forecasting is a standard time series prediction task. The goal is to approximate the function that relates the future values of a variable to the previous observations of that variable [24]. In some situations, such as internet traffic, data are non-stationary and chaotic. In such situations, one general assumption is that historical data incorporate all the behaviour required to capture the dependency between future traffic and that of the past. Therefore, the historical data is the major player in the forecasting process. The second assumption for modelling and forecasting the dynamics of the traffic is that its values are expressed by a discrete time series [2, 3]. A discrete time series is a vector $\{y_t\}$ of observations made at regular intervals, $t = 1, 2, 3, \ldots, N$. For the time series forecasting problem, the inputs are typically the past observations of the data series and the output is the future value.

Suppose $y_1, y_2, \ldots, y_N$ denote an observed time series of the traffic loads; then the basic problem is to estimate a future traffic value such as $y_{N+k}$, where the integer $k$ is called the lead time or the forecasting horizon [25]. For the univariate method, forecasts of a given traffic load are based on a model fitted only to the past observations of the given time series, so that $\hat{y}(N, k)$ depends only on $y_1, y_2, \ldots, y_N$. The estimate of $y_{N+1}$ is computed as a weighted sum of the past observations:

$$\hat{y}_{N+1} = w_0 y_N + w_1 y_{N-1} + w_2 y_{N-2} + \cdots \qquad (1)$$

where the $\{w_i\}$ are weights.

The Multi-Layer Perceptron performs the following function mapping [3, 8]:

$$\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-n}) \qquad (2)$$

where $\hat{y}_t$ is the estimated traffic at time $t$ and $(y_{t-1}, y_{t-2}, \ldots, y_{t-n})$ denotes the training pattern composed of a fixed number ($n$) of lagged observations of the series. The weights to be used in the ANN model are estimated from the data by minimizing the sum of squares of the within-sample one-step-ahead forecast errors, namely

$$S = \sum_t (\hat{y}_t - y_t)^2 \qquad (3)$$

over the first part of the time series, called the training set. The last part of the time series, called the test set, is kept in reserve so that genuine out-of-sample (ex ante) forecasts can be made and compared with the actual observations. Equations (1) and (2) give a one-step-ahead forecast, since they use the actual observed values of all lagged variables as inputs. If multistep-ahead forecasts are required, it is possible to proceed in one of two ways. The first is to construct a new architecture with several outputs, giving $\hat{y}_t, \hat{y}_{t+1}, \hat{y}_{t+2}, \ldots$, where each output would have separate weights for each connection to the neurons. The second is to feed back the one-step-ahead forecast to replace the lag 1 value as one of the input variables; the same architecture can then be used to construct the two-step-ahead forecast, and so on [16]. This study adopted the latter, iterative approach because of its numerical simplicity and because it requires fewer weights to be estimated.

3.2 The Multilayer Neural Network
The neural network is a powerful model for solving complex problems: it has a natural potential for solving nonlinear problems and can easily achieve the input-output mapping, which makes it good for solving prediction problems [26]. The basic features of multilayer perceptrons include:
i. The model of each neuron in the network includes a nonlinear activation function that is differentiable.
ii. The network contains one or more layers that are hidden from both input and output nodes.
iii. The network exhibits a high degree of connectivity, the extent of which is determined by the synaptic weights of the network.

Figure 1. Architecture of Multilayer Perceptron with two hidden layers

3.2.1 A Neural Model
The node is the basic unit of the Artificial Neural Network. Each node is able to sum many inputs $x_1, x_2, \ldots, x_m$ from the environment or from other nodes, with each input modified by an adjustable node weight (Figure 2). The sum of these weighted inputs is added to an adjustable threshold for the node and then passed through a modifying (activation) function that determines the final output.

Figure 2. Nonlinear model of a neuron [26]

The neural model in Figure 2 includes an externally applied bias, denoted by $b_k$. The bias has the effect of increasing or lowering the net input of the activation function, depending on whether it is positive or negative, respectively. Mathematically, we may describe the neuron $k$ depicted in Figure 2 by the following equations:

$$u_k = \sum_{j=1}^{m} w_{kj} x_j \qquad (4)$$

and

$$y_k = \varphi(u_k + b_k) \qquad (5)$$

where $x_1, x_2, \ldots, x_m$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{km}$ are the respective synaptic weights of neuron $k$; $u_k$ is the linear combiner output due to the input signals; $b_k$ is the bias; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron. The use of the bias $b_k$ has the effect of applying an affine transformation to the output $u_k$ of the linear combiner in the model, as shown by

$$v_k = u_k + b_k \qquad (6)$$

The bias $b_k$ is an external parameter of neuron $k$.

The activation function, denoted by $\varphi(v)$, defines the output of a neuron in terms of the induced local field $v$. It is this function (also called the transfer function) that determines the relationship between the inputs and outputs of a node and a network. In general, the activation function introduces a degree of nonlinearity that is valuable for most ANN applications. Among these functions, the sigmoid function is very popular. It is a strictly increasing function that exhibits a graceful balance between linear and nonlinear behaviour. The logistic sigmoid is defined as

$$\varphi(v) = \frac{1}{1 + e^{-v}} \qquad (7)$$

A logistic sigmoid function assumes a continuous range of values from 0 to 1. Additional types of activation functions can be found in [8]. Among these functions, the logistic transfer function is the most popular choice [8].

3.2.2 Training of artificial neural networks
An ANN has to be trained before it can be put to use. The goal of the training is to find the logical relationship from the given input/output. There are two strategies of learning: supervised and unsupervised. This study employs the supervised learning strategy. Supervised learning typically operates in two phases, training and testing. The training set is used for estimating the arc weights, while the test set is used for measuring the generalization ability of the network. Training is used to gain generalised knowledge about the system under consideration and testing is used to predict (forecast) the system behaviour using the knowledge gained. By contrast, techniques that learn without labelled targets, such as unsupervised learning and reinforcement learning, do not rely on input/output training pairs; reinforcement learning operates by interacting directly with the environment.

The training algorithm employed is backpropagation. It is a supervised training strategy and a popular method for training the multilayer perceptron. The training proceeds in two phases [26]:
1. In the forward phase, the synaptic weights of the network are fixed and the input signal is propagated through the network, layer by layer, until it reaches the output. Thus, in this phase, changes are confined to the activation potentials and outputs of the neurons in the network.
2. In the backward phase, an error signal is produced by comparing the output of the network with the desired response. The resulting error signal is propagated through the network, again layer by layer, but this time the propagation is performed in the backward direction. In this second phase, successive adjustments are made to the synaptic weights of the network.

In [5] it is also reported that backpropagation is the most computationally straightforward algorithm for training the multi-layer perceptron. They summarized the steps of the algorithm as follows:
1. Obtain a set of training patterns.
2. Set up an ANN model that consists of a number of input neurons, hidden neurons and output neurons.
3. Set the learning rate ($\eta$) and momentum rate ($\alpha$).
4. Initialize all connections ($W_{ij}$ and $W_{jk}$) and bias weights ($\theta_k$ and $\theta_j$) to random values.
5. Set the minimum error $E_{min}$ or the number of epochs.
6. Start training by applying input patterns one at a time and propagating them through the layers, then calculate the total error.
7. Back-propagate the error through the output and hidden layers and adapt $W_{jk}$ and $\theta_k$.
8. Back-propagate the error through the hidden and input layers and adapt the weights $W_{ij}$ and $\theta_j$.
9. Check whether Error < $E_{min}$ or the maximum epoch has been reached. If not, repeat steps 6 to 9; otherwise, stop training.

3.3 Data collection and Description
Internet traffic data was collected as hourly averages, in kbit/s, of the TCP/IP traffic of a company's resident network from January 1, 2010 to September 30, 2010 (making up 6552 data points each for the IN and OUT traffic data), and as daily traffic data from January 1 to December 31, 2010 (making up 365 data points each for the IN and OUT traffic data). The data was collected using PRTG (Paessler Router Traffic Grapher), a network monitoring and bandwidth usage tool from a company called PAESSLER. A bandwidth of 20 Mbps was allocated for upload (traffic IN) and 20 Mbps for download (traffic OUT), statically, for the period under consideration.

3.3.1 Data Pre-processing/Normalisation
Nonlinear activation functions such as the logistic function typically restrict the possible output from a node to (0, 1) or (-1, 1). The data are therefore normalised before training, in order to avoid computational problems, to meet the algorithm requirement and to facilitate network learning. Four methods for input normalization are summarized in [8]. This study employs the linear transformation to [0, 1], defined as

$$y_n = \frac{y_o - y_{min}}{y_{max} - y_{min}} \qquad (8)$$

where $y_n$ and $y_o$ represent the normalized and original data, and $y_{min}$ and $y_{max}$ are the minimum and maximum of the column or row, respectively.

3.3.2 Training and Testing set
Eighty per cent (80%) of the data, that is, 5241.6, approximated to 5242, was used for training the network, while twenty per cent (20%), that is, 1310.4, approximated to 1310, was used for testing the generalisation (predictive) capability of the network, for each of the HOURLY_IN and HOURLY_OUT traffic flows. A training set of 80% and a testing set of 20% were likewise used for each of the DAILY traffic series, that is, 292 data points for training and 73 for testing.

3.4 Finding the appropriate complexity of the Network
For the time series forecasting problem, a training pattern consists of a fixed number of lagged observations of the series [7]. The inputs (number of lag observations) were varied from 1 to 24, excluding the bias. One and two hidden layers were considered. The number of hidden nodes was set equal to the number of input nodes. In several studies, networks with the number of hidden nodes equal to the number of input nodes are reported to forecast better [8]. One output node was used, for one look-ahead. So the two-hidden-layer network has the structure k, k, k, 1 (k, k, 1 for the one-hidden-layer network), where k represents the number of lag observations (input variables). The epoch values used were 200, 500 and 1000. The best model according to [18] is the one that gives the best result in the test set. The logistic sigmoid activation function was used [8]. The error-correction backpropagation algorithm with a learning rate of 0.1 and a momentum of 0.9 was used to train the network.

3.5 Stopping and Evaluation Criteria
Training stopped once the specified number of epochs was reached. Typically, an SSE-based objective (cost) function is minimized during the training process; it is defined in (3). The measure of accuracy employed is the Root Mean Square Error (RMSE), defined as

$$RMSE = \sqrt{\frac{1}{n} \sum_t (\hat{y}_t - y_t)^2} \qquad (9)$$

where $n$ is the total number of observations in the sample group, $\hat{y}_t$ is the predicted (computed) value and $y_t$ is the target value at time $t$. RMSE is one of the most commonly used measures of forecast error for examining how close the forecast is to the actual value [5]. The best model is the one that gives the best result in the test set, that is, the model that has the least RMSE in the testing set [27].

4. RESULTS AND DISCUSSION
The system was implemented in Visual Basic. The RMSE values of the various models were recorded and compared based on the design issues considered. The results are presented and discussed in this section.

4.1 HOURLY_IN traffic
The RMSE of the testing (prediction) results of the various models, based on the number of input lags, number of hidden layers and training epochs on networks with one and two hidden layers, were compared for the HOURLY_IN traffic. Figure 3 depicts these results.

Figure 3. ANN model selection for the HOURLY_IN traffic (RMSE against input lag, lag 1 to lag 24; series: 200, 500 and 1000 epochs, each with one and two hidden layers)

There were varying degrees of performance, with no regular pattern among the input lags, between the one- and two-hidden-layer networks, or among the various epochs used. Nevertheless, the worst performance in all cases was at input lag 1. The least RMSE of this experiment, 0.0766984, occurred at input lag 24 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_IN traffic is input lag 24 with 200 training epochs using two hidden layers.

4.2 HOURLY_OUT traffic
The RMSE of the testing (prediction) results of the various models were compared for the HOURLY_OUT traffic. The results are shown in Figure 4.

Figure 4. ANN model selection for the HOURLY_OUT traffic (RMSE against input lag, lag 1 to lag 24; series: 200, 500 and 1000 epochs, each with one and two hidden layers)

There were also various values of the performance measure, with no particular pattern on the issues, for the HOURLY_OUT traffic. As with HOURLY_IN, the worst performance was recorded at input lag 1 in all cases. The least RMSE, 0.0621992, occurred at input lag 13 with 200 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the HOURLY_OUT traffic is input lag 13 with 200 training epochs using two hidden layers.

4.3 DAILY_IN traffic
Figure 5 presents the prediction RMSE of the various models for the DAILY_IN traffic.

Figure 5. ANN model selection for the DAILY_IN traffic (RMSE against input lag; series: 200, 500 and 1000 epochs, each with one and two hidden layers)

Different performance values were also observed, with no particular pattern across the various prediction models. The least RMSE of this experiment, 0.116691, occurred at input lag 3 with 1000 training epochs on the two-hidden-layer network. Therefore, the best model for forecasting the DAILY_IN traffic is input lag 3 with 1000 training epochs using two hidden layers.

4.4 DAILY_OUT traffic
Figure 6 presents the RMSE of the various prediction models for the DAILY_OUT traffic.

Figure 6. ANN model selection for the DAILY_OUT traffic (RMSE against input lag)

No particular pattern of performance was observed among the various models, although there were different values of the performance measure. The least RMSE of this experiment, 0.099416, occurred at input lag 3 with 200 training epochs on the one-hidden-layer network. Therefore, the best model for forecasting the DAILY_OUT traffic is input lag 3 with 200 training epochs using one hidden layer.

Table 1 presents a summary of the model selection for forecasting the various traffic categories.

Table 1: Summary of the Forecasting Model Selection for the traffic Categories

Traffic category   Hidden layers   Epochs   Learning rate   Momentum   Input lags   Least RMSE
HOURLY_IN          2               200      0.1             0.9        24           0.07669
HOURLY_OUT         2               200      0.1             0.9        13           0.06219
DAILY_IN           2               200      0.1             0.9        3            0.11521
DAILY_OUT          1               200      0.1             0.9        3            0.09941

For the HOURLY_IN traffic, the computed (predicted) traffic values based on 24 input lags on the two-hidden-layer network using 200 training epochs were deployed, and for the HOURLY_OUT traffic, the study used the computed traffic values of the testing set based on 13 input lags and 200 training epochs on the two-hidden-layer network. The study deployed 3 input lags and two hidden layers of 3 neurons each, using 200 training epochs, for predicting the DAILY_IN traffic. For the DAILY_OUT traffic, 3 input lags and one hidden layer of three neurons with 200 training epochs were selected.

Figure 7 compares the prediction models selected for the traffic categories.

Figure 7. Summary least RMSE for model selection of the traffic categories (least RMSE against traffic category)

The HOURLY traffic categories had better prediction performance than their DAILY counterparts. This could be attributed to the very large sample size used for the HOURLY traffic. It has been reported that ANNs for forecasting perform better with a large sample size than with a small one (Zhang et al. 1998 [8] and Zhang et al. [27]). In addition, Figure 7 revealed that different forecasting models may exist for different traffic categories, even if the traffic categories are all from the same network operator.

This study has observed different forecasting models for the various traffic categories based on the issues. The findings suggest that careful consideration of the design issues is indispensable for improving the predictive performance of a multi-layer artificial neural network, rather than applying it to internet traffic forecasting blindly. However, there are no generally accepted techniques for determining the optimal design parameters; an improved predictive performance model is nevertheless feasible through experimentation.

5. CONCLUSION
This study examined the impact of some important design issues in modelling a multilayer perceptron artificial neural network for Internet traffic forecasting. The traffic forecasting was modelled as a standard time series problem and a multilayer artificial neural network was designed to perform the time series function mapping. The mechanism was implemented in a Visual Basic programming environment and tested with real Internet traffic data through experimentation with the various design issues considered. Although no particular pattern of performance was observed, the study showed that the forecasting performance can be affected by the number of input lags, hidden layers and training epochs. Although the study did not attempt to determine optimal values for the various factors considered, it has shown that careful experimentation is required to choose appropriate values for each of the design issues. Therefore, the multilayer perceptron should not be applied blindly to Internet traffic forecasting.

6. REFERENCES
[1] Crovella, M. and Krishnamurthy, B. 2006. Internet Measurement. John Wiley & Sons, Ltd., England.
[2] Benkacha, S., Benhra, J. and El Hassani, H. 2015. Seasonal Time Series Forecasting Models on Artificial Neural Network. International Journal of Computer Applications. 116, 20, 0975-8887. DOI=10.5120/20451-2805
[3] Benkacha, S., Benhra, J. and El Hassani, H. 2013. Causal Method and Time Series Forecasting Model based on Artificial Neural Network. International Journal of Computer Applications. 75, 7, 0975-8887.
[4] Islam, S., Keung, J., Lee, K. and Liu, A. 2012. Empirical prediction models for adaptive resource provisioning in the cloud. Future Generation Computer Systems. 28, 155-162. DOI=10.1016/1.future2011.05.027
[5] Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and prediction of internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. 2, 147-155. DOI=1.4236/jilsa.2010.23018
[6] Shamsuddin, S. M., Sallehuddin, R. and Yusof, N. M. 2012. Artificial neural network time series modelling for revenue forecasting. Chiang Mai J. Sci. 35, 3, 411-426.
[7] Cortez, P., Rio, M., Sousa, P. and Rocha, M. 2007. Topology aware internet forecasting using neural networks. In Proceedings of the 17th International Conference on Artificial Neural Networks (Porto, Portugal), Lecture Notes in Computer Science 4669, 445-452, Springer.
[8] Zhang, G., Patuwo, B. E. and Hu, M. Y. 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting. 14, 35-62.
[9] Faraway, J. and Chatfield, C. 1998. Time series forecasting with neural networks: a comparative study using the airline data. Journal of Appl. Statist. 47, 231-250.
[10] Miguel, M. L. F., Penna, M. C., Nievola, J. C. and Pellenz, M. E. 2012. New models for long-term internet traffic forecasting using artificial neural networks and flow based information. In Proceedings of 2012 IEEE Network Operations and Management Symposium, 1082-1088. DOI=10.1109/NOMS.2012.6212033
[11] Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale Internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[12] Mariam, I., Dadarlat, V. and Iancu, B. 2009. A Comparative Study of the statistical Methods suitable for Network Traffic Estimation. In Proceedings of the 13th WSEAS International Conference on Communications. 99-104.
[13] Anand, C. N. 2009. Internet traffic modeling and forecasting using non-linear time series model Garch. M.Sc. Thesis, Department of Electrical and Computing Engineering, College of Engineering, Kansas State University.
[14] Wang, X., Abraham, A. and Smith, K. A. 2005. Intelligent web traffic mining and analysis. Journal of Network and Computer Applications. 28, 147-165.
[15] Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2006. Internet Forecasting using Neural Networks. In Proceeding of the International Joint Conference on Neural Network (Vancouver), 2635-2642.
[16] Zhang, Y. and Liu, Y. 2009. Comparison of parametric and nonparametric techniques for non-peak traffic forecasting. World Academy of Science, Engineering and Technology. 51.
[17] Cortez, P., Rio, M., Rocha, M. and Sousa, P. 2012. Multi-scale internet traffic forecasting using neural networks and time series methods. Expert Systems. 29, 2, 143-155.
[18] Prangchumpol, D. A. 2013. Network traffic prediction algorithm based on data mining technique. World Academy of Science, Engineering and Technology. International Science Index, http//www.waset.org.
[19] Fok, W. W. T., Tam, V. W. L. and Ng, H. 2008. Computational neural network for global stock indexes prediction. In Proceedings of World Congress on Engineering (London, UK, July 2-4, 2008).
[20] Chabaa, S., Zeroual, A. and Antari, J. 2010. Identification and Prediction of Internet traffic using artificial neural networks. Journal of Intelligent Learning Systems & Applications. DOI=1.4236/jilsa.2010.23018
[21] Chukwuchekwa, U. J. 2011. Comparing the performance of backpropagation algorithm and genetic algorithms in pattern recognition problems. International Journal of Computer Information Systems. 2, 5, 7-12.
[22] Odim, M. O., Gbadeyan, J. A. and Sadiku, J. S. 2014. A neural network model for improved internet service resource provisioning. British Journal of Mathematics & Computer Science. 4, 17, 2418-2434.
[23] Dogne, V., Jain, A. and Jain, S. 2015. Evolving trends and its application in web usage mining: a survey. International Journal of Soft Computing and Engineering. 4, 6, 98-101.
[24] Rutka, G. and Lauks, G. 2007. Study on internet traffic prediction models. Electronics and Electrical Engineering. Kaunas: Technologija, 6, 78, 47-50.
[25] Chatfield, C. 1992. The analysis of time series: An introduction (4th ed.). Chapman & Hall, London.
[26] Haykin, S. 2009. Neural networks and learning machines (3rd ed.). Pearson Education, Inc, New Jersey.
[27] Zhang, G. P., Patuwo, B. E. and Hu, M. Y. 2001. A Simulation Study of Artificial Neural Networks for Nonlinear Time-series Forecasting. Computers & Operations Research. 28, 381-396.
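
The model selection procedure of Sections 3.3 to 3.5 (normalisation by Eq. (8), lag patterns as in Eq. (2), an 80/20 split, backpropagation with learning rate 0.1 and momentum 0.9, and selection by least test RMSE as in Eq. (9)) can be illustrated with a minimal Python sketch. It is not the study's Visual Basic implementation. The names `normalise`, `lag_patterns`, `MLP`, `rmse` and `select_model` are hypothetical, and several details are assumptions: weights start uniformly in [-0.5, 0.5], weights are updated after every pattern, the split is applied to the lag patterns rather than the raw series, and a synthetic sine series with small lag and epoch grids stands in for the traffic data.

```python
import math
import random


def normalise(series):
    """Linear transformation to [0, 1], following Eq. (8)."""
    lo, hi = min(series), max(series)
    return [(y - lo) / (hi - lo) for y in series]


def lag_patterns(series, k):
    """Pair each value with its k preceding observations, following Eq. (2)."""
    return [(series[t - k:t], series[t]) for t in range(k, len(series))]


def sigmoid(v):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-v))


class MLP:
    """k inputs, `hidden_layers` hidden layers of k nodes each, one output."""

    def __init__(self, k, hidden_layers, rng):
        sizes = [k] + [k] * hidden_layers + [1]
        # w[l][j] holds the weights into node j of layer l+1; last entry is the bias.
        self.w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_out)]
                  for n_in, n_out in zip(sizes, sizes[1:])]
        self.dw = [[[0.0] * len(row) for row in layer] for layer in self.w]

    def forward(self, x):
        """Forward phase: return the activations of every layer."""
        acts = [list(x)]
        for layer in self.w:
            prev = acts[-1] + [1.0]  # constant input 1.0 carries the bias
            acts.append([sigmoid(sum(w * a for w, a in zip(row, prev)))
                         for row in layer])
        return acts

    def predict(self, x):
        return self.forward(x)[-1][0]

    def train_pattern(self, x, target, lr=0.1, momentum=0.9):
        """Backward phase for one pattern, with a momentum term."""
        acts = self.forward(x)
        out = acts[-1][0]
        deltas = [(target - out) * out * (1.0 - out)]
        for l in range(len(self.w) - 1, -1, -1):
            prev = acts[l] + [1.0]
            # Error terms for the layer below, computed before the weights change.
            below = [a * (1.0 - a) * sum(d * self.w[l][j][i]
                                         for j, d in enumerate(deltas))
                     for i, a in enumerate(acts[l])] if l > 0 else []
            for j, d in enumerate(deltas):
                for i, a in enumerate(prev):
                    self.dw[l][j][i] = lr * d * a + momentum * self.dw[l][j][i]
                    self.w[l][j][i] += self.dw[l][j][i]
            deltas = below


def rmse(net, patterns):
    """Root mean square error, following Eq. (9)."""
    return math.sqrt(sum((net.predict(x) - y) ** 2 for x, y in patterns)
                     / len(patterns))


def select_model(series, lags, layer_options, epoch_options, seed=0):
    """Return (test RMSE, lag, hidden layers, epochs) of the best configuration."""
    data = normalise(series)
    best = None
    for k in lags:
        patterns = lag_patterns(data, k)
        split = round(0.8 * len(patterns))  # 80% training, 20% testing
        train, test = patterns[:split], patterns[split:]
        for hidden in layer_options:
            for epochs in epoch_options:
                net = MLP(k, hidden, random.Random(seed))
                for _ in range(epochs):
                    for x, y in train:
                        net.train_pattern(x, y)
                score = rmse(net, test)
                if best is None or score < best[0]:
                    best = (score, k, hidden, epochs)
    return best


# Synthetic stand-in for a traffic series (not the study's data).
toy = [50 + 30 * math.sin(2 * math.pi * t / 12) for t in range(96)]
best = select_model(toy, lags=[1, 2, 3], layer_options=[1, 2],
                    epoch_options=[20, 50])
print("best (rmse, lag, hidden layers, epochs):", best)
```

Under these assumptions, passing `lags=range(1, 25)`, `layer_options=[1, 2]` and `epoch_options=[200, 500, 1000]` would cover the same grid of input lags, hidden layers and epochs that the study explored, although the resulting RMSE values would not be expected to match those reported in Table 1.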