Development of software system for analysis and optimization of taxi services effciency by statistical modeling methods Pavel Azanov1, Andrey Danilov2, Nikita Andriyanov3 1 Tango Telecom, 122 Krasnaya street, 426057, Izhevsk, Republic Udmurtia, Russia 2 Taxi Ulyanovsk, 1/3 Narimanova prospect, 432071, Ulyanovsk, Russia 3 Ulyanovsk State Technical University, 32 Severniy Venets stret, 432027,Ulyanovsk, Russia Abstract The text considers using of statistical models for taxi service data analysis and forecasting. Special attention is paid to the model parameters identification and short-term forecasting. We suggest to use the mathematical models of images to account the alternating character, associated with the dependence of the taxi orders number on various parameters. In addition the possibility of improving the efffectiveness of evaluation by use of mixed random fields models is shown. Keywords: random processes; mixed models; time series forecasting; taxi service; data analysis; image processing 1. Introduction The following algorithm of the taxi service was quite common recently. Firstly, a dispatcher received the call, then the dispatcher communicated with a driver. During the communication the driver could accept the order or reject it. Usually all connections were provided by the radio devices. However, promising opportunities for use Internet in the taxi order service have appeared [1] due to the rapid Internet development. Now it is not difficult to order a taxi directly on the portal on the Internet or by using special applications for smartphones. In such cases, a very important source of receipt of orders from customers, that we will call customers from the phone, is not taken into account. At the same time, it should be noted that such processing also provides a sufficient collection of statistics, the analysis of which may allow in the future to improve the quality of the taxi service. Increasing the volume of the telephone calls database warrants the possibility of analyzing the talk time, determining the most popular places in the city, etc. You will get a fairly complete statistical description of the taxi service operation adding to this statistics for orders, including the time of their execution, the waiting time of the car, the distribution by hours and other parameters. Thus, there is an urgent task of analyzing an information collected in order to increase the efficiency of the service. So, for example, you can anticipate the number of dispatchers and drivers in advance by making precise forecasts of the calls number and tracking the orders percentage. In this case, both time series [2] and various models of random processes (RP) can be used to work with accumulated information [3,4]. 2. Service architecture and statistics collection Consider the taxi service project based on the contact center. At the same time, telephony is sent to operators through the Internet, and it requires only having a computer with a headset. To organize a dispatch taxi, you need a powerful software and hardware system. Its application allows several thousand taxi cars to work in real time. Obviously, the use of this technology allows you to effectively manage resources, increase the speed of processing orders, always have exact customer numbers, reduce the time for applications. For the contact center organization we need the presence of a multi-channel phone number, which will allow receiving many calls simultaneously. That’s why we should use IP-telephony technologies. One of the most common telephony servers (PBX) is the Asterisk server [5], which allows to use SIP-telephony [6]. Such a telephone PBX should be set up to make calls distribution to taxi service operators. To process incoming calls we use a special program that represents the operator the form of a taxi order based on the Internet browser. To store information about calls, a database server is used, for example, MySQL or MsSQL server. Tariffs are set up using a separate module called Tarifficator. This module is programmed for its use in the web. Thus, it is advisable to use virtualization methods to separate different servers, including a telephony server, a data base of telephony server, and a web server. In addition, an application server is needed. It provides information transfer from the contact center to the drivers. The special program for taxi service implements such transfer. And we suggest to use one more database server to store order information. Fig. 1 presents full architecture of the considered taxi service. The application for the Taxi program can have a version running just under java or common modern devices running by Android and iOS. When a particular driver receives an order, the database is updated. The updates include information about the car, time of order picking, etc. These data can be used to inform the client about the assigned car. The statistics is collected using database servers, but to present information in a convenient form it is necessary to use the Tari_cator. Tari_cator program allows you to display statistics either in a text document or in an excel format document. Fig. 2 presents the revised information on the distribution of orders, preserving the properties of the real sequence. We will make models fit according to this data. It should be noted that the process in Fig. 2 has a heterogeneous structure, as well as some recurrent features. It is therefore necessary to select the most adequate model to more accurately describe all the peculiar distribution characteristics. 3rd International conference “Information Technology and Nanotechnology 2017” 232 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov Fig. 1. Block diagram of the taxi service. Fig. 2. Distribution of orders daily with the conversion (along the X axis is the number of orders, along the Y axis is the certain day). 3. Mathematical models for the presentation of taxi service statistics Let’s consider some variants of the description of the collected statistics on service. Let the data be collected from the beginning of the year (from January) and until the end of the year (to December) with some simplification, which will be used in the presentation approach in the form of an image. 3.1. One-dimensional Autoregressive process Let’s imagine a sequence of data available on orders fOg using an expression for the Autoregressive (AR) of the first order 𝑂𝑖 = 𝜌𝑂𝑖−1 + 𝜉𝑖 , 𝑖 = 1, … , 𝑁 (1) where 𝜌 is a coefficient of correlation throughout the sequence and can easily be evaluated on the basis of existing data; 𝜉𝑖 is accidental admixture with zero mathematical expectation and variance . Besides the variance for orders is also estimated on the basis of the sample. AR processes of higher orders can be used for a more accurate description. In this case, it is need to use the Yule-Walker equations [7] to determine the correlation parameters. 3.2. One-dimensional doubly stochastic model of Random Process Descriptions of the heterogeneity and periodicity of real data can be achieved using mixed models of Random Fields (RF). One of the variants to realize mixed models is the doubly stochastic model [8,9], whose correlation parameters also represent the implementation of the RF: 𝑂𝑖 = 𝜌𝑖 𝑂𝑖−1 + 𝜉𝑖 , 𝑖 = 1, … , 𝑁 (2) 3rd International conference “Information Technology and Nanotechnology 2017” 233 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov where 𝜉𝑖 is the random additive value with zero mathematical expectation and variance ; 𝜌𝑖 is a sequence of correlation parameters (3) where r is the constant correlation coefficient; 𝑚𝑝 is the average value of the basic correlation coeffcient; 𝜎𝜌2 is the dispersion of the process describing change in the correlation parameters; {𝜍𝑖 } is a field of Gaussian random variables with zero mathematical expectation and variance of unit. An increase in the order of the process can also be used for the model (2) and its parameters (3), respectively. However, Fig. 1 shows the process which looks fairly ”prickly”. This fact allows the use of first-order models. It is important that the estimation of all parameters of the model can be performed by mathematical statistics using the available sample, but also satisfactory results can be obtained with a slight increase in complexity, for example, in estimating all the parameters of the model in a sliding window [10] or using a nonlinear Kalman filter [11]. In addition, such algorithms can be adapted to different dimensionalities of the models. 3.3. Presentation in the form of a Random Field The observed quasi-periodicity of the process shown in Fig. 2, allows us to conclude that it is possible to use models of random fields to represent information of this kind. Consider, for example, the doubly stochastic models of images that allow describing heterogeneous signals [12]. As an example, we will use the following model: (4) 2 where 𝑂𝑖,𝑗 is modeled RF with a normal distribution having 𝑀{𝑂𝑖,𝑗 } = 0, 𝑀{𝑂𝑖,𝑗 } = 𝜎𝑂2 ; {𝜉𝑖,𝑗 } is RF of independent standard Gaussian variables with 𝑀{𝜉𝑖,𝑗 } = 0, 𝑀{𝑥𝑖𝑖,𝑗 2 } = 𝜎𝜉2 = 1; 𝜌𝑥𝑖,𝑗 and 𝜌𝑦𝑖,𝑗 are correlation coefficients of the model with multiple roots of characteristic equations of frequency rate (2,2) [13]; 𝑏𝑖,𝑗 is a scale coefficient of simulated RF. Random variables𝜌𝑥𝑖,𝑗 ; j and 𝜌𝑦𝑖,𝑗 have the Gaussian probability distribution function and can be described by AR equations of the first order or higher orders. It is easy to see that the model (4) is a transformation of the usual two-dimensional autoregressive model of the first order. This model of RF can also be used to describe a two-dimensional array of data and has the form: (5) Note that the model (4), unlike the model with constant parameters (5), imitates heterogeneous in the structure of the RF, so it can fairly well reflect sharp surges on the number of orders on weekends and holidays. In order to estimate the parameters of such an image, we can use a vector (row-by-row) nonlinear Kalman filter. It requires to combine the elements of the image string into a vector 𝑥⃗ 𝑖 = (𝑥𝑖1 , 𝑥𝑖2 , … , 𝑥𝑖𝑁 ). Then the model for a single frame of the image can be written as following equation: where diag(𝜌⃗𝑥𝑖 ) is the diagonal matrix with elements 𝜌⃗𝑥𝑖 on the main diagonal; 𝜗 is down triangle matrix determined by the decomposition of covariance matrix: 𝑉𝑥 = 𝜗𝜗 𝑇 . The evaluation process is described by the Kalman nonlinear filter: The use of this algorithm is possible if characteristics of information RF is exactly known, i.e. when we know the correlation coefficients 𝑟1𝑥 , 𝑟2𝑥 , 𝑟1𝑦 , 𝑟2𝑦 , as well as average values by row and column correlation, variance of correlation parameters and variance of information signal. Otherwise, a preliminary assessment of these parameters is required. Pseudogradient assessment procedures, as well as expressions for covariation function for doubly stochastic models can be used for this purpose. Produced at the output sequence of parameters can then be further parsed and replaced with any model. Also you can use and evaluation in the sliding window. Fig. 3 shows the transformation of the original process to the image. Thus, we see that the resulting image, on the one hand, is not strongly correlated, and on the other hand, there are several regions with higher brightness values on the image, which indicates the properties of the heterogeneity. We propose 6 variants of the models to describe the available data. Let’s compare them in detail. 3rd International conference “Information Technology and Nanotechnology 2017” 234 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov Fig. 3. Representation of orders statistics as an image. 4. Comparative analysis of efficiency of prediction based on different models We will perform the necessary parameter estimation for models (1), (2), (4) and (5). So we produce forecasting the past 21 values of a sequence on the basis of models which was considered. It should be noted that the image data will be structured by seasons and weeks, as presented in Table 1. Table 1. Data structure when converting it to image. The latter values will form a rectangular area in the lower right corner of the image, which is also useful for predicting and comparing the results of prediction based on various models. Denote the forecasting methods as follows: 1) A1 is the prediction based on one-dimensional AR model; 2) A2 is the prediction based on one-dimensional doubly stochastic model; 3) A2* is the prediction based on one-dimensional mixed model with the evaluation parameters through the Kalman filter; 4) A3 is the prediction based on two-dimensional AR model; 5) A4 is the prediction based on two-dimensional doubly stochastic model; 6) A4* is the prediction based on mixed model with evaluation parameters through the Kalman filter in two-dimension mode. Fig. 4 presents the results of statistical modeling. Relative variance of the prediction error of the last twenty one value, respectively, are as following: 1) It equals 10.88 for one-dimensional (1D) AR model; 2) It equals 0.254 for one-dimensional (1D) doubly stochastic model; 3) It equals 0.067 for one-dimensional (1D) doubly stochastic model with Kalman filter evaluation; 4) It equals 0.870 for two-dimensional (2D) AR model; 5) It equals 0.174 for two-dimensional (2D) doubly stochastic model; 6) It equals 0.049 for two-dimensional (2D)doubly stochastic model with Kalman filter evaluation. Thus, analysis of the diffierent models predicting results allows to say that using AR model leads to unsatisfactory results when predicting of complex data. Improving the effectiveness of predicting by the statistical models can be get using models of images. But such assessment will also not effective enough. So doubly stochastic models provide the best indicators because such models take into account the heterogeneity inherent in real data. Moving to the multivariate case leads to better forecast 3rd International conference “Information Technology and Nanotechnology 2017” 235 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov because of the characteristics of the analyzed data set. In addition, the highest accuracy of prediction algorithms which were considered is provided by doubly stochastic models of the images. For such models estimation of parameters is performed using the Kalman filter. Fig. 4. Predicting the past values of the taxi service orders and real data (on X axis we have converted number of orders, on Y axis we have the certain day of the year). 5. Software package for statistical analysis of data on taxi service As mentioned earlier the following algorithm of the taxi service is widely known. A dispatcher receives a call from the client and then communicates with a driver on the radio and transmits the order details to the driver. However, there is an alternative variant to this scheme of work at present time. The rapid growth of Internet traffic and the possibilities of IP- telephony (SIP), the availability of smartphones running under iOS or Android in each family allow you to abandon the use of radios when organizing a taxi order service. At the same time, the task of optimizing the work of the taxi service is quite relevant, because this type of service is still in demand even during a finance crisis. Moreover, the opportunity to improve the efficiency of the service and save money by switching to automated mode is a very promising task. Thus, the solution of this problem implies research at the junction of information and telecommunications systems. Indeed, it is necessary to realize not only communication networks that allow to exchange the information between operators, taxi drivers and customers, but also have software implementation of algorithms for handling calls and orders. At the same time, an important study is the statistical analysis of orders data. We have solved the number of tasks during implemention of the software. First of all, the structure of the database has been developed. All the connections were thought out in the database, all the necessary information was collected. Secondly, it was suggested to use mixed or doubly stochastic autoregressive models to solute the orders forecasting problem. Third, we suggested a number of procedures based on the forecast data. We described how to calculate call traffic, determine the required number of operators. Fourthly, a Web-based interface has been developed that allows you to quickly change the settings on the telephony server. The organization of the taxi service is performed on the basis of integration with the contact center (Telephony Server). Furthermore, the service includes: - the database server; - the Web server; - the application server running the special Taxi program. Fig. 5 shows the work of the Contact Center in more detail when the operator processes the order form. After such processing the database is updated and the order for taxi drivers is distributed. Using the programming languages PHP and JavaScript, we developed the web-based interface for analyzing order data. As we mentioned earlier the interface can be conditionally called Tarififcator and allows you to obtain various statistical characteristics, as well as implement database modifications that are aimed at changing prices. In addition, you can view statistics on orders in real time using Tarifficator. Another application, implemented by PHP and JavaScript, is the calculator of complex routes. The program allows you to calculate the cost of an order in the case when the driver passes several points in sequence. For example, cabbie drives first from point A to point B, and then from point B to point C. The module for data analysis has been improved for convenience of the operating with different statistics in the languages PHP and JavaScript. This module (Fig. 6) allows to draw various statistical graphs using the library flot.js, and it also allows to make changes in the database related to setting prices. In addition, it is very important that in the Tarifficator module all necessary statistics is collected in real time mode. 3rd International conference “Information Technology and Nanotechnology 2017” 236 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov Fig. 5. Call processing algorithm. You also can use module that allows the fitting of real data for operating the statistics module. So you can use statistical models of random sequences. Parameter identification may be implemented for the distribution of orders daily, calculated by the common AR model. Using these data the module will give forecast for the following days. Doubly stochastic models allow to consider the non-stationary in the distribution of data (bursts on weekends). For the such models you can use parameter identification algorithms based on a combination of algorithms of pseudogradient search and nonlinear Kalman filter [14]. The developed program complex allows you to accurately forecasting based on the doubly stochastic models of the images. Thus, improving the efficiency of taxi services is possible through the right choice of the necessary number of drivers in different time intervals. Similarly, it is possible to calculate, for example, the required number of call-center staff for different time periods. Fig. 6. Example of presenting statistics in the Tarifficator module. Thus, you can get rid of radio communication and go to a software complex that handles data through the Internet. At the same time, telephony in the contact center means not the operators attached to the handset, but the people who process the data directly on the computer. In our project the dispatching taxi is organized with the help of the powerful software and hardware complex. Furthermore, thousands of cars and more can work simultaneously. So it is possible to completely abandon the use of a radio for a taxi service. Obviously, the use of such technology allows you to effectively manage resources, increase the speed of processing orders, always have exact customer numbers, reduce the time for applications running. And in order that the work of the taxi dispatcher was possible and necessary condition is the presence of a standard computer and a headset with a microphone. 6. Conclusion The problem of analysis and optimization of the taxi order service efficiency is considered. It is suggested to use the doubly stochastic models of images to account for the heterogeneity of the data. A comparative analysis of forecasting based on 6 different models is carried out. In this case, the gain in comparison with autoregressive ones can reach several orders, and by applying the Kalman vector nonlinear filter it is possible to increase the forecast efficiency by another 4-5 times. A powerful software and hardware complex was developed. It will be used in the work of taxi order services and provide a solution to the task of real-time forecasting. Acknowledgements This work was supported by the OOO ”EIS-PFO” (Ulyanosk, Russia). We express our special gratitude for provided information used in the research. 3rd International conference “Information Technology and Nanotechnology 2017” 237 Mathematical Modeling / P. Azanov, A. Danilov, N. Andriyanov References [1] Andriyanov NA, Danilov AN. Taxi service with forecasting statistics based on complex mathematical models Advances of modern science 2016; 2(10): 114–116. (in Russian) [2] Yarushkina NG, Afanasyeva TV, Perfilieva IG. Time series mining. Students book. Ulyanovsk: UlGTU, 2010; 320 p. (in Russian) [3] Prokis J. Digital communications. Translated from eng. Edited by Klovskiy DD. Moscow: Radio and communications, 2000; 800 p. [4] Borovkov AA. Probability Theory. Springer Science and Business Media; 536 p. [5] Meggelen J, Madsen L, Smith J. Asterisk: future of the telephony. 2-nd edition, translated from eng. SPb: Symbol-Plus, 2009; 656 p. [6] Goldstein BS, Zarubin AA, Samorezov VV. Session Initiation Protocol (SIP): Reference book. Series: Telecommunication protocols of Russia, 2005; 456 p. (in Russian) [7] Andriyanov NA, Dementyev VE. The application of the system of equations of the Yule-Walker to simulate isotropic random fields. Modern trends of technical sciences. IV International Scientific Conference materials. Kazan, Russia, 2015: 2–6. (in Russian) [8] Vasil’ev KK, Dement’ev VE, Andriyanov NA. Doubly stochastic models of images. Pattern Recognition and Image Analysis (Advances in Mathematical Theory and Applications) 2015; 25(1): 105–110. DOI: 10.1134/S1054661815010204. [9] Andriyanov NA. Doubly stochastic models based on the correlation interval changes. Mathematical methods and models: theory, application and role in education 2014; 3: 6–8. (in Russian) [10] Andriyanov NA. Method of fitting images based on random field model with changing parameters. Advances of modern science 2016; 5(9): 98–100. (in Russian) [11] Vasil’ev KK, Dement’ev VE, Andriyanov NA. Application of mixed models for solving the problem on restoring and estimating image parameters. Pattern Recognition and Image Analysis (Advances in Mathematical Theory and Applications) 2016; 26(1): 240–247. DOI: 10.1134/S1054661816010284. [12] Dementyev VE, Andriyanov NA. The using of doubly stochastic models of random processes and fields to describe complex heterogeneous signals. Actual problems of physical and functional electronics. Materials of 19-th all-Russian youth scientific schoolseminar. Ulyanovsk: UlGTU, 2016; 98– 99. (in Russian) [13] Vasiliev KK, Krasheninnikov VR. Statistical image analysis. Ulyanovsk: UlGTU, 2014; 214 p. (in Russian) [14] Vasiliev KK, Dementyev VE, Andriyanov NA. Parameter estimation of doubly stochastic random fields. Radio 2014; 7: 103–106. (in Russian) 3rd International conference “Information Technology and Nanotechnology 2017” 238