The Comparison of Holt–Winters and Box–Jenkins Methods for Software Failures Prediction

Vitaliy Yakovyna and Oleksandr Bachkai

Lviv Polytechnic National University, Lviv 79013, Ukraine
vitaliy.s.yakovyna@lpnu.ua, abachkay@gmail.com

Abstract. Software reliability has proven to be a very important issue over the past several decades. Software failures may result in a low-quality product, which is likely unacceptable for the customer and other stakeholders, so additional work and budget are required to raise the reliability to an appropriate level. This can be avoided by managing software quality throughout the whole development process. To manage resources in an optimal way, a project coordinator needs quantitative indicators, such as the predicted number of failures, which can be obtained by forecasting. For software failures, the most common forecasting approaches are software reliability models. Their main disadvantage is that they do not fit all types of software, because they rely on assumptions about software properties and behavior. This paper deals with statistical time-series forecasting techniques, which depend only on the previous values of the series and hence can serve as a general solution for failure prediction across different types of software. The Holt–Winters smoothing model and the ARIMA regression model were used to predict Angular software failures on a weekly basis. It is shown that while the execution time of the ARIMA model is almost twice that of the Holt–Winters model, its accuracy is almost an order of magnitude better.

Keywords: Software Reliability, Time-Series, Failures Prediction, Holt–Winters Model, ARIMA Model.

1 Introduction

Today the process of software development is subject to tight limitations on cost and time, as well as to requirements for quality and reliability. Many organizations involved in software development spend a large share of their funds on testing and refactoring in order to prevent failures.

The greatest problem facing the industry today is how to assess software reliability characteristics quantitatively (see e.g. [1, 2] and others). Research on software reliability engineering has been conducted during the past three decades and numerous statistical models have been proposed for estimating software reliability. Most existing models for predicting software reliability are based purely on the observation of software product failures, and they require a considerable amount of failure data to obtain an accurate reliability prediction. Some other research efforts have recently developed reliability models addressing fault coverage, testing coverage, and imperfect debugging processes [1].

The later a bug is discovered, the more expensive and difficult it is to resolve. This is especially relevant for Software as a Service and cloud software development, whose market share has grown constantly during the last decade. Quality control methods such as inspection and testing aim to detect faults prior to release. Unfortunately, code inspection and testing are costly in terms of time and manpower, so managers seek to optimize their effectiveness. Bug prediction has been suggested as a means to this end (see e.g. [3–5]). The growth of empirical software engineering techniques has led to increased interest in bug prediction algorithms [3].

2 Related Works Analysis

Forecasting methods are commonly divided into two main groups: intuitive and formalized (Fig. 1) [6].
Fig. 1. Classification of forecasting methods: intuitive methods and formalized methods, the latter comprising domain models and time-series models.

Intuitive forecasting methods include expert judgments and estimates. Today they are often used in marketing, economics, politics and other domains whose behavior is very complex or difficult to predict with mathematical models [7].

Formalized methods are methods that use mathematical models to predict future values. They are divided into domain models and time-series models.

Domain models are models based on the processes, rules and mechanisms of the domain. For example, a weather forecast model contains the equations of fluid dynamics and thermodynamics. In the context of software failures prediction, the most common approaches of this kind are software reliability models. Their main disadvantage is that they do not fit all classes of software, because they depend on particular aspects of it [1]. In order to create an adequate model of software reliability and to be able to make decisions based on such a model, a deep understanding of the processes, methodologies and technologies of software creation and testing is required.

Time-series models are mathematical forecasting models that seek to find the dependence of the future value on the past values within the process itself and to calculate the prediction based on this dependence. These models are universal across domains, that is, their general form does not change depending on the nature of the time series [8]. Time series models [9] can be further divided into (see Fig. 2):
• regression models;
• smoothing models;
• models based on neural networks.

Fig. 2. Classification of time series models: regression models, smoothing models, and models based on neural networks.

Time-series prediction is based on different models and approaches and is widely used for modelling various aspects of human activity [10–12]. Holt–Winters forecasting is one of the most widely used smoothing models. This forecasting procedure is a variant of exponential smoothing which is simple, yet generally works well in practice, and is particularly suitable for producing short-term forecasts of time-series data (see e.g. [13, 14]). In [15] it was shown that the Holt–Winters short-term models are equivalent to particular ARIMA models, and generally do not lie within the subset of the ARIMA class which forms the basis of the Box–Jenkins modelling approach. It is argued that the models considered in [15] have a reasoned structure and are to be preferred to the Box–Jenkins models for most socio-economic applications.

On the other hand, Auto-Regressive (AR) models were first introduced by Yule in 1926 and subsequently supplemented in 1937 by Moving Average (MA) schemes [16]. Wold [17] combined both AR and MA schemes and showed that ARMA processes can be used to model a large class of stationary time series as long as the appropriate orders of p, the number of AR terms, and q, the number of MA terms, are specified. The paper [16] concludes that the major problem of ARIMA models is the way of making the series stationary in its mean that was proposed by Box and Jenkins. In addition, it was shown that applying ARMA models to seasonally adjusted data slightly improves post-sample accuracy while simplifying the use of ARMA models [16].

These two classes of forecasting techniques have been compared in a number of empirical studies (see e.g. [18–20]).
Thus, in [18] a forecasting approach for short- and long-term heat load forecasting on three levels (monthly, weekly and daily forecasting) was presented. Based on the chosen accuracy measures, multiple regression was recognized as the best forecasting method for daily and weekly short-term heat load forecasting, whereas Holt–Winters methods gave the best forecasts for long-term heat load forecasting and monthly short-term heat load forecasting [18]. Paper [19] determines the forecasting accuracy of Holt–Winters and ARIMA models for samples of telemarketing data, and concludes that ARIMA models with intervention analysis perform better for the time series studied. Paper [20] uses intraday electricity demand data from ten European countries as the basis of an empirical comparison of univariate methods for prediction up to a day ahead. The ARIMA and principal component analysis methods performed well, but the method that consistently performed best was the double seasonal Holt–Winters exponential smoothing method [20].

Promising methods of software failures prediction are those based on nonparametric models [21, 22]. Such models do not have the main drawbacks and difficulties of analytical models because they do not make any assumption about the mechanism of software failures. Besides smoothing and regression models, artificial neural networks are widely used for software failures prediction because of their proven quality of generalization and approximation of almost any smooth function [23]. In [24] a study of the efficiency of software failures time-series prediction by RBF neural networks was presented; the achieved root-mean-square error (RMSE) was as low as 1.3% [24].

Thus, the performed related works analysis shows that for different domains, with their peculiar time-series features, different forecasting methods can perform better. Hence, the goal of this paper is to compare Holt–Winters and ARIMA forecasting for software failures time-series.

3 The Models Description

A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time [6, 10]. There are many different models of time series forecasting, but all of them aim to capture the following three components [25], shown in Fig. 3:
• Seasonal: patterns that repeat with a fixed period of time.
• Trend: the underlying trend of the metrics.
• Random: also called "noise", "irregular" or "remainder"; this is the residual of the original time series after the seasonal and trend components are removed.

Fig. 3. Trend, seasonal and random components.

3.1 Holt–Winters Model

Smoothing methods are used to reduce the effect of random oscillations in time series. They give the opportunity to obtain "pure" values that consist only of deterministic components [9]. The most advanced method of this group is the Holt–Winters method, which is also called triple exponential smoothing [13–15].

Let the observed time series be denoted by $y_1, y_2, \ldots, y_n$. A forecast of $y_{t+h}$ based on all of the data up to time $t$ is denoted by $\hat{y}_{t+h|t}$. The model is described by the forecast equation (1), which includes the level (2), trend (3), and seasonal (4) components:

$$\hat{y}_{t+h|t} = l_t + h b_t + s_{t-m+h_m^+}, \qquad (1)$$
$$l_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}), \qquad (2)$$
$$b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1}, \qquad (3)$$
$$s_t = \gamma (y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}. \qquad (4)$$

Here $m$ is the length of seasonality, $l_t$ represents the level of the series, $b_t$ denotes the growth, $s_t$ is the seasonal component, $\hat{y}_{t+h|t}$ is the forecast for $h$ periods ahead, and $h_m^+ = [(h-1) \bmod m] + 1$.
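Equations (1)–(4) translate directly into code once the smoothing parameters are fixed. The following C# sketch (a hypothetical helper, not the module from the application described in Section 4; initialisation and the choice of α, β and γ are deliberately simplified) illustrates the additive Holt–Winters recursions:

```csharp
// Additive Holt–Winters (triple exponential smoothing), equations (1)–(4).
// Simplified sketch: the smoothing parameters alpha, beta, gamma are assumed
// to be given, whereas in practice they are chosen by minimising a forecast
// error measure; the series is assumed to cover at least two full seasons.
static class HoltWinters
{
    public static double[] Forecast(double[] y, int m, int h,
                                    double alpha, double beta, double gamma)
    {
        // Simple initialisation of level, trend and the first seasonal indices.
        double level = y[0];
        double trend = (y[m] - y[0]) / m;
        var season = new double[y.Length];
        for (int i = 0; i < m; i++)
            season[i] = y[i] - level;

        // Recursions (2)–(4): update level, growth and seasonal component.
        for (int t = m; t < y.Length; t++)
        {
            double prevLevel = level, prevTrend = trend;
            level = alpha * (y[t] - season[t - m]) + (1 - alpha) * (prevLevel + prevTrend);
            trend = beta * (level - prevLevel) + (1 - beta) * prevTrend;
            season[t] = gamma * (y[t] - prevLevel - prevTrend) + (1 - gamma) * season[t - m];
        }

        // Forecast equation (1): level + i*trend + the matching seasonal index
        // taken from the last observed season (the h_m^+ rule).
        var forecast = new double[h];
        for (int i = 1; i <= h; i++)
        {
            int hm = (i - 1) % m + 1;   // h_m^+ = [(i - 1) mod m] + 1
            forecast[i - 1] = level + i * trend + season[y.Length - m + hm - 1];
        }
        return forecast;
    }
}
```

The non-seasonal variant used in the experiments of this paper corresponds to dropping the seasonal terms, which reduces the method to Holt's double exponential smoothing.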
3.2 ARIMA Model

The acronym ARIMA stands for Auto-Regressive Integrated Moving Average. Lags of the stationarized series in the forecasting equation are called "autoregressive" terms, lags of the forecast errors are called "moving average" terms, and a time series which needs to be differenced to be made stationary is said to be an "integrated" version of a stationary series. Random-walk and random-trend models, autoregressive models, and exponential smoothing models are all special cases of ARIMA models [8, 16].

A non-seasonal ARIMA model is classified as an $ARIMA(p, d, q)$ model, where:
• $p$ is the number of autoregressive terms,
• $d$ is the number of non-seasonal differences needed for stationarity, and
• $q$ is the number of lagged forecast errors in the prediction equation.

To fit the parameters, Box and Jenkins proposed a methodology [8] that consists of three steps.
1. Model identification. Use plots and summary statistics to identify trends, seasonality and autoregression elements in order to get an idea of the amount of differencing and the size of the lag that will be required.
2. Parameter estimation. Use a fitting procedure to find the coefficients of the regression model.
3. Model checking. Use plots and statistical tests of the residual errors to determine the amount and type of temporal structure not captured by the model.

The main approaches to fitting Box–Jenkins models are nonlinear least squares and maximum likelihood estimation; maximum likelihood estimation is generally the preferred technique.

There is also a seasonal version of ARIMA. It incorporates both non-seasonal and seasonal factors in a multiplicative model. One shorthand notation for the model is $ARIMA(p, d, q) \times (P, D, Q)_S$, with $p$ – non-seasonal AR order, $d$ – non-seasonal differencing, $q$ – non-seasonal MA order, $P$ – seasonal AR order, $D$ – seasonal differencing, $Q$ – seasonal MA order, and $S$ – time span of the repeating seasonal pattern [8, 16].
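A full Box–Jenkins fit (order selection, MA terms, maximum likelihood estimation and residual diagnostics) is too involved for a short listing. The C# sketch below is therefore only a hedged illustration of the notation above: an $ARIMA(1, d, 0)$ special case in which the series is differenced $d$ times, a single autoregressive coefficient is estimated by conditional least squares, and the forecasts are integrated back. It is not the estimation procedure used in the experiments.

```csharp
// Deliberately simplified ARIMA(1, d, 0) sketch: difference the series d times
// (the "integrated" part), fit one autoregressive coefficient by conditional
// least squares, forecast on the differenced scale, then undo the differencing.
// Assumes the series has noticeably more than d + 1 observations.
using System.Linq;

static class SimpleArima
{
    static double[] Difference(double[] x) =>
        Enumerable.Range(1, x.Length - 1).Select(i => x[i] - x[i - 1]).ToArray();

    public static double[] Forecast(double[] y, int d, int h)
    {
        // Keep the last value at each differencing level to integrate back later.
        var lastValues = new double[d];
        double[] w = (double[])y.Clone();
        for (int k = 0; k < d; k++)
        {
            lastValues[k] = w[w.Length - 1];
            w = Difference(w);
        }

        // AR(1) on the mean-centred differenced series:
        // w_t - mu = phi * (w_{t-1} - mu) + e_t, phi by least squares.
        double mu = w.Average();
        double num = 0, den = 0;
        for (int t = 1; t < w.Length; t++)
        {
            num += (w[t] - mu) * (w[t - 1] - mu);
            den += (w[t - 1] - mu) * (w[t - 1] - mu);
        }
        double phi = den > 0 ? num / den : 0;

        // Iterate the AR(1) recursion h steps ahead on the differenced scale.
        var wForecast = new double[h];
        double prev = w[w.Length - 1];
        for (int i = 0; i < h; i++)
        {
            prev = mu + phi * (prev - mu);
            wForecast[i] = prev;
        }

        // Integrate (cumulatively sum) back through the d differencing levels.
        var forecast = wForecast;
        for (int k = d - 1; k >= 0; k--)
        {
            double last = lastValues[k];
            for (int i = 0; i < forecast.Length; i++)
            {
                last += forecast[i];
                forecast[i] = last;
            }
        }
        return forecast;
    }
}
```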
4 Experimental

To carry out the research, a desktop software application was developed using Visual Studio and C#. Non-seasonal Holt–Winters and ARIMA models were implemented as separate modules (for reusability purposes). In addition, a module to calculate the accuracy and performance of a prediction method was implemented.

To obtain a realistic estimation of the forecasting methods, failure data from a real software project were used. The input data were obtained from the GitHub bug-tracking system. This study deals with the Angular project failure dates: 8,494 failures from October 2014 to January 2016 were fetched from GitHub and then compiled into a time series of 120 points representing the number of failures per week. The initial 90% of the time series was used as the previous values, while the last 10% was used to compare the predicted and actual software failure values. To estimate the prediction efficiency, the RMSE was used as an accuracy measure, while the execution time was used as a performance measure.

5 Results and Discussion

The studied Angular failures time series is plotted in Fig. 4. As can be seen from Fig. 4, there is a clear seasonal component in this time series. Using smoothing and regression techniques, as well as neural networks, for such irregular data with a seasonal component is not very effective and results in large approximation and prediction errors.

However, there are several methods of increasing forecast accuracy. As mentioned in [9], if a time series is compiled as cumulative sums (i.e. each time interval holds the sum of all previous values), the trend is easier to estimate, and the accuracy of the forecast may be better. This is also confirmed by the conclusions of the paper [24]. Hence, the time series was formed in a cumulative manner and the models were used to predict software failures. An example of the cumulative time series, along with the values predicted by the ARIMA model and the actual data, is shown in Fig. 5. As can be easily seen, the time series became smoother and the predicted software failure values are very close to the actual ones.

It should be noted that non-seasonal versions of both models were implemented. Seasonal versions of the models with non-cumulative time series should also give better accuracy, since it is obvious from the chart that there is seasonality in the series. However, the efficiency of the seasonal versions of the models will be studied in future work.

Fig. 4. Initial software failures time series.

Fig. 5. Results of forecasting the cumulative time series using the ARIMA model.

Typical results of the prediction efficiency for Angular software failures using the Holt–Winters and ARIMA models are listed in Table 1. While the execution time of the ARIMA model is almost twice that of the Holt–Winters model, its accuracy is almost an order of magnitude better. This could be explained by the range of the input data: the observed failures period is more than two years, while the Holt–Winters model is best suited for short-range forecasting [18, 19].

Table 1. Results of software failures forecasting with different approaches.

Forecasting measure                    | Holt–Winters model | ARIMA model
Accuracy (RMSE)                        | 220.64             | 35.05
Performance (execution time, seconds)  | 1.41               | 3.26
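The evaluation pipeline behind Table 1 can be summarised by the following C# sketch (the class and method names and the forecastModel delegate are illustrative, not the actual interface of the application): the weekly counts are accumulated, the last 10% of points are held out, and the forecast for that horizon is scored with the RMSE.

```csharp
// Sketch of the evaluation pipeline: cumulative series, 90/10 split, RMSE.
// Names are illustrative placeholders, not the application's actual API.
using System;
using System.Linq;

static class Evaluation
{
    // Root-mean-square error between the held-out and predicted values.
    public static double Rmse(double[] actual, double[] predicted) =>
        Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());

    // forecastModel takes the training series and a horizon h and returns h
    // forecasts; any of the model sketches above can be wrapped this way.
    public static double Evaluate(double[] weeklyFailures,
                                  Func<double[], int, double[]> forecastModel)
    {
        // Cumulative sums: each point is the total number of failures so far.
        var cumulative = new double[weeklyFailures.Length];
        double sum = 0;
        for (int i = 0; i < weeklyFailures.Length; i++)
            cumulative[i] = sum += weeklyFailures[i];

        // The initial 90% of the series serves as the "previous values",
        // the last 10% is kept for comparing predicted and actual failures.
        int trainLength = (int)(cumulative.Length * 0.9);
        double[] train = cumulative.Take(trainLength).ToArray();
        double[] test = cumulative.Skip(trainLength).ToArray();

        double[] forecast = forecastModel(train, test.Length);
        return Rmse(test, forecast);
    }
}
```

For example, Evaluate(series, (train, h) => SimpleArima.Forecast(train, 1, h)) reproduces the structure of the ARIMA leg of the comparison; the execution-time measurement reported in Table 1 would additionally wrap the forecast call in a timer such as System.Diagnostics.Stopwatch.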
6 Conclusion and Future Work

Research on software reliability engineering has been conducted during the past three decades and numerous statistical models have been proposed for estimating software reliability. The growth of empirical software engineering techniques has led to increased interest in software failures prediction. The prediction of software failures is of large practical importance, because it provides project coordinators with estimates that they can use to manage and maintain software quality. Common ways of predicting software failures include the use of reliability models, which have a flaw: they cannot be fitted to all types of software. Time-series prediction methods, which are popular in the economics domain, can solve this issue because they do not rely on domain-specific processes and properties.

After analyzing different time series models, two of them were selected as the most advanced ones: the Holt–Winters smoothing model and the ARIMA regression model. A software application was developed to study the efficiency of the non-seasonal versions of these models for software failures forecasting. The case study was based on the Angular failures data obtained from the GitHub bug-tracking system and included failure data for a period of more than two years. The time series was presented as the cumulative number of failures detected in all time intervals up to the current one. To evaluate the efficiency of prediction, two parameters were used: the RMSE as an accuracy measure, and the execution time as a performance measure.

The obtained results show that the Holt–Winters model has better performance, while the ARIMA model is substantially better in terms of prediction accuracy.

The future work will be devoted to studying the efficiency of the seasonal versions of the models, because the studied time series has clearly visible seasonality. Another study will be devoted to the influence of the forecasting interval on the efficiency of software failures prediction, as well as to studying different types of software on both long- and short-range intervals.

References

1. Pham, H.: System Software Reliability. Springer Series in Reliability Engineering, Springer-Verlag London Limited (2006).
2. Trivedi, K. S., Bobbio, A., Muppala, J. K.: Reliability and Availability Engineering: Modeling, Analysis and Applications. Cambridge University Press (2017).
3. Rahman, F., Posnett, D., Hindle, A., Barr, E., Devanbu, P.: BugCache for Inspections: Hit or Miss? In: Proc. of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, pp. 322–331, ACM, Szeged, Hungary (2011). DOI: 10.1145/2025113.2025157
4. Lewis, C., Lin, Z., Sadowski, C., Zhu, X., Ou, R., Whitehead Jr., E. J.: Does Bug Prediction Support Human Developers? Findings from a Google Case Study. In: Proc. of the Int. Conf. on Software Engineering ICSE'13, pp. 372–381, IEEE, San Francisco, CA, USA (2013). DOI: 10.1109/ICSE.2013.6606583
5. Kim, S.: Adaptive Bug Prediction by Analyzing Project History. PhD thesis, University of California, Santa Cruz (2006).
6. Abraham, B., Ledolter, J.: Statistical Methods for Forecasting. John Wiley & Sons, New York, NY, USA (2005).
7. Tikhonov, E. E.: Methods of Forecasting in Market Environment: Textbook. Nevinnomyssk (2006). (in Russian)
8. Box, G. E. P., Jenkins, G. M., Reinsel, G. C., Ljung, G. M.: Time Series Analysis: Forecasting and Control. 5th edn. John Wiley & Sons, New York, NY, USA (2015).
9. Ajstrakhanov, D. D., Pugachova, M. V., Stepashko, V. S. et al.: Conceptual Basics of Statistical Monitoring. Derzhkomstat Publishing, Kyiv (2003). (in Ukrainian)
10. Weigend, A.: Time Series Prediction. Routledge, New York, NY, USA (2018).
11. Masters, T.: Neural, Novel and Hybrid Algorithms for Time Series Prediction. John Wiley & Sons, New York, NY, USA (1995).
12. Brown, R. G.: Smoothing, Forecasting and Prediction of Discrete Time Series. Dover Publications, Mineola, NY, USA (2004).
13. Chatfield, C.: The Holt–Winters forecasting procedure. Applied Statistics 27(3), 264–279 (1978). DOI: 10.2307/2347162
14. Chatfield, C., Yar, M.: Holt–Winters forecasting: some practical issues. The Statistician 37, 129–140 (1988). DOI: 10.2307/2348687
15. Roberts, S. A.: A General Class of Holt–Winters Type Forecasting Models. Management Science 28(7), 808–820 (1982). DOI: 10.1287/mnsc.28.7.808
16. Makridakis, S., Hibon, M.: ARMA Models and the Box–Jenkins Methodology. Journal of Forecasting 16(3), 147–163 (1997). DOI: 10.1002/(SICI)1099-131X(199705)16:3
17. Wold, H.: A Study in the Analysis of Stationary Time Series. Almqvist & Wiksell, Stockholm (1938).
18. Tratar, L. F., Strmčnik, E.: The comparison of Holt–Winters method and Multiple regression method: A case study. Energy 109, 266–276 (2016). DOI: 10.1016/j.energy.2016.04.115
19. Bianchi, L., Jarrett, J., Hanumara, R. C.: Improving forecasting for telemarketing centers by ARIMA modeling with intervention. International Journal of Forecasting 14(4), 497–504 (1998). DOI: 10.1016/S0169-2070(98)00037-5
20. Taylor, J. W., McSharry, P. E.: Short-Term Load Forecasting Methods: An Evaluation Based on European Data. IEEE Transactions on Power Systems 22(4), 2213–2219 (2007). DOI: 10.1109/TPWRS.2007.907583
21. Khoshgoftaar, T. M., Szabo, R. M.: Predicting software quality, during testing, using neural network models: A comparative study. International Journal of Reliability, Quality and Safety Engineering 1, 303–319 (1994). DOI: 10.1142/S0218539394000222
22. Zheng, J.: Predicting software reliability with neural network ensembles. Expert Systems with Applications 36, 2116–2122 (2009). DOI: 10.1016/j.eswa.2007.12.029
23. Paliwal, M., Kumar, U. A.: Neural networks and statistical techniques: A review of applications. Expert Systems with Applications 36, 2–17 (2009). DOI: 10.1016/j.eswa.2007.10.005
24. Yakovyna, V. S.: Software failures prediction using RBF neural network. Odes'kyi Politechnichnyi Universytet. Pratsi 2(46), 111–118 (2015).
25. Extracting Seasonality and Trend from Data, https://anomaly.io/seasonal-trend-decomposition-in-r, last accessed 2017/05/01.