A Probabilistic Model to Predict the Time Out of Service for Electronic Devices

Eralda Gjika (Dhamo)
Department of Applied Mathematics, Faculty of Natural Science, UT
eralda.dhamo@fshn.edu.al

Lule Basha (Hallaci)
Department of Applied Mathematics, Faculty of Natural Science, UT
lule.hallaci@fshn.edu.al

Ana Ktona
Department of Informatics, Faculty of Natural Science, UT
ana.ktona@fshn.edu.al

Abstract

Time modeling is one of the challenges attracting the attention of researchers in different fields. Predicting the working time and the out of service time of an electronic device is important because it helps to optimize the management of the human, material and monetary resources of the company. Determining a predictive probability model of time therefore helps to minimize maintenance costs and equipment storage and to increase the efficiency of the service. In this study we analyze the waiting time for repair and the out of service time of electronic equipment. The data were collected over a period of one year from an electronic service center in Tirana which offers service for electronic devices such as computers, laptops, cellphones and tablets. Probability distributions such as the normal, exponential, Weibull, log-normal, gamma and Pareto are fitted to the real data using maximum likelihood parameter estimation. Various graphical and numerical statistical tests are performed to choose the “best” fit to the real data. The chosen probabilistic model helps the company to predict service time and design a maintenance strategy that optimizes cost and customer satisfaction at the same time.

1 Introduction

Queueing theory is generally considered an area of operational research because its results are often used when making business decisions about the resources needed to provide a service. Service disciplines are of different types, such as: First In First Out (FIFO); Last In First Out (LIFO); customers served in parallel (Processor Sharing, PS); the highest priority client served first (Priority, P), etc. [Szt16].

In real life there is often a need to estimate the time spent in a queue to receive a service. The time spent in a queue, measured from the moment of arrival to the system until the moment of departure from the system, may be constant for every arriving request. In real situations, however, the time spent in the system is not always constant. Because external factors (service provider effectiveness, type of service required, etc.) affect the service, the waiting time changes. Figure 1 shows a situation of arrival times and waiting time in a queue. This pattern is known as a "non-regular traffic queue".

Figure 1: Non-regular traffic queue

In this work we study the flow of requests arriving at the laboratory of a company which offers service for electronic devices. The company has only one laboratory providing this service, so it requires the most efficient management of time. Service requests arrive at the laboratory following a FIFO discipline. In rare situations the company has the right to offer priority service. Figure 2 illustrates the steps followed by an arriving service request.

Figure 2: Arrival service request scheme

Figure 3 shows the scheme of the system. It shows the waiting time to enter the lab and the time that the device stays in the service center (known as the out of service time of the device).

Figure 3: The system scheme of waiting time to repair and out of service of the devices

The waiting time to enter the lab and the service time distribution are one part of the company system [Wan17]. The ability of the servers to serve the requests of their customers determines the system performance: the faster the servers, the better the system performance. The company has a single-line queue system with one server.

We have tried to fit a probability model for the waiting time to enter the lab and for the out of service time of the electronic equipment arriving at the service center. Given that time is a continuous variable, we have taken into consideration continuous distributions: the normal, log-normal, gamma, Weibull, exponential and Pareto distributions [Mul15]. Parameter estimation, evaluation and simulation were done in R.

2 Data

We have considered two time variables: the waiting time from the moment the device enters the service center until it goes to the lab (known as the waiting time to be repaired), and the out of service time, which is the time spent at the service center (from the moment the device arrives until the moment it leaves the service center). There are in total 292 observations in the first database (data observed for 1 year) and 330 observations in the second database. The unit of time is the hour.

For each customer the system records: an ID, the time the device enters the service center, the time it enters the lab and the time it goes back to the owner. Unfortunately, the company did not note the time the device leaves the lab; this time is entered into the program only later. We also have to emphasize that the database obtained is incomplete, and we have not considered the devices with missing information in the two databases.

Table 1 and Table 2 below give descriptive statistics for the two databases.

Table 1: Descriptive statistics for waiting time to be repaired data (hours)

n     Min    Q1     Me     Mean    sd      Q3      Max
292   1.00   3.75   6.00   13.21   21.75   12.00   139

Table 2: Descriptive statistics for out of service time data (hours)

n     Min    Q1      Me      Mean    sd     Q3      Max
330   2.00   26.00   39.50   53.45   45.5   64.00   269

To better understand the behavior of the data we obtain a graphical view of the two datasets. The histograms of the waiting time to be repaired and of the out of service time of the electronic devices are shown in Figure 4 (a) and (b) respectively.

Figure 4: Waiting time to repair and out of service time data. Panel (a): waiting time to be repaired; panel (b): out of service time. Horizontal axis: hour; vertical axis: number of devices.

3 Parameter estimation

3.1 Estimation

The probability distributions we have considered are continuous distributions: the normal, log-normal, gamma, Weibull, exponential and Pareto distributions. To estimate the parameters of these distributions we have used the maximum likelihood estimation (MLE) method. We have used R for the estimation procedure and also for the statistical tests. The R packages used are: AdequacyModel [Dut08]; MASS [Ven10]; fitdistrplus; actuar [Boos12], [Hur13].

Table 3 shows the pdf of the continuous distributions used to fit the data.

Table 3: Probability density functions of the continuous distributions

Distribution   Probability density function
Exponential    $f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
Normal         $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}$
Log-Normal     $f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}, \quad x > 0$
Gamma          $f(x) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x \ge 0; \ \alpha, \beta > 0$
Weibull        $f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \ge 0; \ \lambda, k > 0$
Pareto         $f(x) = \frac{a b^{a}}{x^{a+1}}, \quad x \ge b$

3.2 Model Evaluation

The proposed probability distributions were evaluated through graphical and numerical statistical tests. Among the graphical tests we have considered the QQ-plot, the PP-plot and the density plot to compare each proposed distribution with the real data.

When fitting continuous distributions, three goodness-of-fit statistics are classically considered to choose the “best” fitted distribution: the Cramer-von Mises, Kolmogorov-Smirnov and Anderson-Darling statistics [D’Ag86]. Table 4 below gives the definition and the empirical estimate of the three goodness-of-fit statistics considered. Other accuracy measures used were the information criteria: the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

Table 4: Goodness-of-fit statistics as defined by D’Agostino and Stephens (1986)

Kolmogorov-Smirnov (KS)
  General formula:        $\sup_x |F_n(x) - F(x)|$
  Computational formula:  $\max(D^+, D^-)$ with $D^+ = \max_{i=1,\dots,n}\left(\frac{i}{n} - F_i\right)$ and $D^- = \max_{i=1,\dots,n}\left(F_i - \frac{i-1}{n}\right)$

Cramer-von Mises (CvM)
  General formula:        $n \int_{-\infty}^{\infty} (F_n(x) - F(x))^2 \, dx$
  Computational formula:  $\frac{1}{12n} + \sum_{i=1}^{n}\left(F_i - \frac{2i-1}{2n}\right)^2$

Anderson-Darling (AD)
  General formula:        $n \int_{-\infty}^{\infty} \frac{(F_n(x) - F(x))^2}{F(x)(1 - F(x))} \, dx$
  Computational formula:  $-n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)\log\left(F_i (1 - F_{n+1-i})\right)$

Here F(x) is the fitted cumulative distribution function, F_n(x) is the empirical distribution function, n is the number of observations of a continuous variable X, and F_i = F(x_i) is the fitted cumulative distribution function evaluated at the i-th observation sorted in increasing order.

4 Results

4.1 Probability distribution for waiting time to be repaired

In this work we have proposed five probability distributions to fit the waiting time and the out of service time of the devices: the normal, exponential, Weibull, gamma and lognormal distributions. These probability distributions are widely used to describe events recurring at random points in time, such as the time between failures of electronic equipment or the time between arrivals at a service center. An important characteristic of the exponential distribution is the “memoryless” property, which means that the time already elapsed has no effect on the distribution of the remaining time.

To estimate and evaluate the fits we have used R as a tool and then analyzed the results.

To compare the fitting performance of the probability distributions we have used graphical tests. The density plot and the CDF plot may be considered the basic classical goodness-of-fit plots; the QQ plot and the PP plot are complementary and may be very informative in some cases.

Figure 5: Four goodness-of-fit plots for various distributions fitted to waiting time to repair data (panels: histogram and theoretical densities; empirical and theoretical CDFs; Q-Q plot; P-P plot)

For the waiting time data (Figure 5), it is clearly seen that the normal distribution does not describe the empirical data at all, while the other fitted distributions (exponential, Weibull, gamma and lognormal) describe the left tail of the empirical distribution. The lognormal distribution in particular could be preferred for its better description of the center of the distribution.

We have used the Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling statistics to select the “best” fitting distributions for the waiting time and the out of service time. Results for the distributions fitted to the waiting time to repair are shown in Table 5.

Table 5: Goodness of fit statistics for waiting time to repair data

              Kolmogorov-Smirnov   Cramer-von Mises   Anderson-Darling
Normal        0.28                 9.27               47.95
Exponential   0.19                 2.97               15.37
Weibull       0.13                 1.75               10.79
Lognormal     0.09                 0.41               2.68
Gamma         0.17                 2.41               13.27

       Normal   Exponential   Weibull   Lognormal   Gamma
AIC    2630     2093          2077      1986        2092
BIC    2637     2097          2085      1994        2099

The Anderson-Darling statistic is of special interest when it matters to emphasize the tails as much as the main body of a distribution, but it should be used with caution when comparing fits of various distributions to the data. On the other hand, the Cramer-von Mises and Kolmogorov-Smirnov statistics do not take into account the complexity of the model (i.e., the number of parameters). For a better decision on the fitted model we may consult the information criteria (AIC and BIC).

As is clearly seen from Table 5, the goodness-of-fit statistics for the waiting time are in favor of the lognormal distribution.

4.2 Probability distribution for out of service time

The empirical histogram of the out of service time is clearly different from that of the waiting time to be repaired. We have also considered the above-mentioned probability distributions for the out of service data; the density plot, CDF plot, QQ plot and PP plot are shown in Figure 6.

Figure 6: Four goodness-of-fit plots for various distributions fitted to out of service time data (panels: histogram and theoretical densities; empirical and theoretical CDFs; Q-Q plot; P-P plot)

For the out of service time data (Figure 6), the exponential distribution and the normal distribution do not give an adequate fit, but the Weibull, gamma and lognormal distributions describe most of the empirical data satisfactorily. At this point it is difficult to decide on the preferred distribution, so it is a good moment to consult the values of the goodness-of-fit statistics mentioned above. It seems, from Table 6, that the lognormal and the gamma distributions are comparable with each other as the “best” fit to the data.

Table 6: Goodness of fit statistics for out of service time

              Kolmogorov-Smirnov   Cramer-von Mises   Anderson-Darling
Normal        0.1672               3.45               19.97
Exponential   0.1677               2.1                12.06
Weibull       0.079                0.7                4.24
Lognormal     0.069                0.28               2
Gamma         0.066                0.46               2.76

       Normal   Exponential   Weibull   Lognormal   Gamma
AIC    3458     3287          3252      3235        3238
BIC    3466     3291          3260      3243        3245

5 Conclusions

From the graphical and numerical tests it was observed that for the waiting time to repair the “best” probability distribution was the lognormal with parameters meanlog 1.91 and sdlog 0.062. For the out of service time, the lognormal distribution with parameters meanlog 3.66 and sdlog 0.83 and the gamma distribution with parameters shape 1.74 and rate 0.032 seem to fit the real data better.

From the achieved results we can assert that approximately 95% of the equipment spends less than 3 hours waiting to receive the service from the lab and 99% spends less than 8 hours. Moreover, according to the two fitted distributions for the out of service time, 24-28% of the equipment spends less than 24 hours out of service, 53-60% spends less than 48 hours, and 96-98% obtains the service and returns to work within 7 days.

This study was carried out without considering the type of electronic devices arriving at the service center. It would be of interest to categorize the equipment and study each category separately, to predict the service time and design a maintenance strategy that optimizes cost and customer satisfaction at the same time.

Acknowledgments

The authors want to thank the service center that provided real data for a better evaluation strategy.

References

[Dut08] C. Dutang, V. Goulet and M. Pigeon. actuar: An R Package for Actuarial Science. Journal of Statistical Software, 25(7), 1-37, 2008.

[Szt16] J. Sztrik. Basic Queueing Theory. GlobeEdit, OmniScriptum GmbH & Co. KG, Saarbrucken, 2016, ISBN 978-3-639-73471-3.

[Mul15] M. L. Delignette-Muller and C. Dutang. fitdistrplus: An R Package for Fitting Distributions. Journal of Statistical Software, Volume 64, Issue 4, February 2015.

[D’Ag86] R. B. D'Agostino and M. A. Stephens. Goodness-of-Fit Techniques. 1st edition. Dekker, 1986.

[Ven10] W. Venables and B. D. Ripley. Modern Applied Statistics with S. 4th edition. Springer Verlag, 2010.

[Wan17] Zh. Wang, M. Zhang, D. Wang, C. Song, M. Liu, J. Li, L. Lou and Zh. Liu. Failure prediction using machine learning and time series in optical network. Optics Express 18553, Volume 25, No. 16, August 2017.

[Boos12] D. Boos. https://www.rdocumentation.org/packages/Rlab/versions/2.15.1/topics/Gamma

[Hur13] Ch. Hurlin. https://www.univ-orleans.fr/deg/masters/ESA/CH/Chapter2_MLE.pdf
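The estimation and evaluation steps of Section 3 can be illustrated with a short sketch. The study itself used R (fitdistrplus and related packages); the Python code below is only an illustration under stated assumptions: the function names and the sample values are hypothetical and are not the service-center data. It fits a lognormal distribution by maximum likelihood (for which the estimates are the mean and standard deviation of the logged observations), evaluates the Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling statistics with the computational formulas of D'Agostino and Stephens, and computes AIC and BIC from the log-likelihood.

```python
import math

def lognormal_mle(xs):
    """Closed-form MLE of the lognormal: mean and (1/n) sd of log(x)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    """Lognormal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def lognormal_loglik(xs, mu, sigma):
    """Log-likelihood of the sample under the lognormal density."""
    return sum(
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in xs
    )

def gof_statistics(xs, cdf):
    """KS, CvM and AD statistics from the computational formulas."""
    f = [cdf(x) for x in sorted(xs)]  # F_i at the ordered observations
    n = len(f)
    d_plus = max((i + 1) / n - f[i] for i in range(n))
    d_minus = max(f[i] - i / n for i in range(n))
    ks = max(d_plus, d_minus)
    cvm = 1.0 / (12 * n) + sum(
        (f[i] - (2 * (i + 1) - 1) / (2 * n)) ** 2 for i in range(n)
    )
    ad = -n - sum(
        (2 * (i + 1) - 1) * math.log(f[i] * (1.0 - f[n - 1 - i]))
        for i in range(n)
    ) / n
    return ks, cvm, ad

def information_criteria(loglik, k, n):
    """AIC and BIC for a model with k parameters fitted to n observations."""
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik

# Hypothetical waiting times in hours (illustrative only, not the study data)
sample = [1.0, 2.0, 3.5, 4.0, 6.0, 6.5, 9.0, 12.0, 20.0, 45.0]
mu, sigma = lognormal_mle(sample)
ks, cvm, ad = gof_statistics(sample, lambda x: lognormal_cdf(x, mu, sigma))
aic, bic = information_criteria(lognormal_loglik(sample, mu, sigma), k=2, n=len(sample))
print(f"meanlog={mu:.3f} sdlog={sigma:.3f}")
print(f"KS={ks:.3f} CvM={cvm:.3f} AD={ad:.3f} AIC={aic:.1f} BIC={bic:.1f}")
```

Repeating the same calls with the CDF and log-likelihood of each candidate distribution gives the kind of comparison reported in Tables 5 and 6, where smaller statistics and smaller information criteria indicate a better fit.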
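The percentages quoted in the Conclusions for the out of service time follow from evaluating the fitted cumulative distribution functions at 24, 48 and 168 hours. The sketch below does this in Python with the reported parameters (lognormal: meanlog 3.66, sdlog 0.83; gamma: shape 1.74, rate 0.032). The helper names are illustrative assumptions, and the gamma CDF is computed with the standard power series of the regularized incomplete gamma function rather than with the R functions used in the study.

```python
import math

def lognormal_cdf(x, meanlog, sdlog):
    """P(T < x) for a lognormal with the given log-scale parameters."""
    z = (math.log(x) - meanlog) / (sdlog * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def gamma_cdf(x, shape, rate, terms=200):
    """P(T < x) for a gamma distribution via the series of P(shape, rate*x)."""
    t = rate * x
    if t <= 0.0:
        return 0.0
    term = 1.0 / shape          # k = 0 term of the series
    total = term
    for k in range(1, terms):
        term *= t / (shape + k)  # next term: t^k / (a (a+1) ... (a+k))
        total += term
    # Prefactor t^a e^{-t} / Gamma(a), computed on the log scale
    return total * math.exp(-t + shape * math.log(t) - math.lgamma(shape))

# Thresholds in hours: 1 day, 2 days, 7 days
for hours in (24, 48, 168):
    p_lognormal = lognormal_cdf(hours, 3.66, 0.83)
    p_gamma = gamma_cdf(hours, 1.74, 0.032)
    print(f"P(T < {hours:3d} h): lognormal {p_lognormal:.2f}, gamma {p_gamma:.2f}")
```

Up to rounding, the two fitted distributions bracket the ranges reported in the Conclusions (24-28%, 53-60% and 96-98%).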