Operational Forecasting of Road Traffic Accidents via Neural Network Analysis of Big Data

Oleg Golovnin
Department of Information Systems and Technologies
Samara National Research University
Samara, Russia
golovnin@ssau.ru

Ekaterina Sidorova
Department of Information Systems and Technologies
Samara National Research University
Samara, Russia
sidoroekaterina@gmail.com

Abstract: The paper proposes an approach to the operational forecasting of traffic accidents, with separation by accident type, based on the multilayer Rumelhart perceptron. The approach is applied to analyze Big Data collected from external heterogeneous data sources and from traffic control systems or Smart City solutions. The approach increases the accuracy of determining the possibility of an accident through simultaneous analysis of multiple parameters covering weather conditions, the condition of the road and control devices, seasonal traffic fluctuations, traffic flow, individual vehicle speed, organizational factors, and events. The software implementation of the approach uses the TensorFlow framework and the Keras library. The experiments showed that the approach provides 90% accuracy in recognizing situations. The forecast results are useful within an hour of the calculation moment, which is enough to react to an emergency situation or notify drivers. The software is intended to function as part of accident prevention systems and, in that case, could reduce the accident rate and severity and increase the awareness of traffic participants.

Keywords: traffic accident, forecasting, TensorFlow, Keras

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020), Data Science

I. INTRODUCTION

Road traffic accidents are a tangible problem for the transport systems of modern urban agglomerations: they harm drivers and pedestrians and damage vehicles, cargo, and transport infrastructure, which in turn leads to economic and social costs. According to the State Traffic Inspectorate, in October 2019 alone, 17.0 thousand traffic accidents occurred on federal roads of the Russian Federation, in which 3.9 thousand people were killed and 25.6 thousand injured [1]. The digitalization of management processes and the development of intelligent technologies and Big Data processing methods have led to new solutions that can be applied to the operational forecasting of accidents, so that preventive countermeasures can be taken [2].

Current traffic problems that can be detected or predicted before an accident occurs include [3]:
• inappropriate speed limits for vehicles;
• extreme weather conditions;
• violations of traffic rules;
• damage to the roadbed and to traffic control devices;
• dangerous behavior: aggressive driving, obstructing overtaking, failure to maintain a safe distance between vehicles, sudden braking, pedestrians entering the roadway.

The development of active and passive road safety measures in recent years has significantly reduced the number of accidents and the severity of their consequences; however, in some situations these measures are insufficient, for example, when a vehicle skids on a slippery road or the driver is inattentive [4]. Road safety can be ensured, and the damage from a predicted accident reduced, through direct and indirect influence on road behavior: active control of traffic lights and variable message signs, prompt notification of emergency services, and informing road users.

This paper proposes an approach to predicting the possibility of a road accident through neural network analysis of Big Data from different traffic control systems.

II. STATE-OF-THE-ART

In this study, an accident is a road traffic incident that involves at least one vehicle moving on the road network and in which people are injured or killed, or damage is caused to vehicles, cargo, transport infrastructure, or facilities [5].

Methods and tools are being actively developed to detect [6], predict [7], report [8], and prevent [9] accidents. A number of solutions are based on measuring sensors [10, 11]. In [12], an approach based on infrared sensors is described that operates in two phases: accident detection and accident prevention. Its implementation works with traffic congestion indicators but does not take into account other factors that may affect the modeling of a dangerous situation [13].

In [14], a short-term traffic flow forecasting model that takes into account spatial and temporal channeling is presented. The model was implemented with the Apache Spark framework, based on the MapReduce distributed computing model, and is fast enough for online prediction; however, no functional block for analyzing the possibility of road accidents was implemented.

An intelligent neural network approach that automatically detects an accident that has already occurred from indirect traffic data is presented in [15]. It relies on the assumption that the average speed of the traffic flow changes when an accident happens. The approach does not predict the possibility of an accident. In [16], geoinformation models for managing traffic flows in the event of an accident are presented, but they do not yield reliable data on whether an accident has occurred.

In [17], several supervised learning methods were analyzed for classifying the degree of damage resulting from an accident: fatal injuries, severe injuries, minor injuries, and a car accident. The solution proposed in that work cannot be used to monitor the situation on the road network and, accordingly, cannot serve as part of an accident prevention system.

In [18], a method was proposed for determining the temporal characteristics of accidents based on a high-speed thermogram, but it provides low-quality indicators. In [19], wavelet spectrograms were proposed for assessing the characteristics of traffic flows, but the factors leading to an accident can be determined only from indirect signs and with insufficient accuracy in binding the event to time.

This study proposes an approach to forecasting accidents by accident type using the Rumelhart multilayer perceptron [20], applied to Big Data arriving online from external heterogeneous data sources that provide, for example, weather conditions, road network and road traffic characteristics, and events on the road network.

III. THE NEURAL NETWORK MODEL

A. Input Data Description

The neural network is trained and tested with supervised learning; therefore, n-dimensional vectors describing the input data are required. The input data consist of weather, road, and organizational factors.

The data on the road network include:
• type of traffic control device, TRAFDEV ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, x1;
• state of the traffic control device, TRAFFUNCT ∈ {0, 1, 2, 9}, x2;
• speed limit, SPEEDLIMIT ∈ {0, 24, 25, ..., 119, 120, 121, 999}, x3;
• type of road, RELTOJUNCT ∈ {0, 1, 2, 3, 4, 5, 9}, x4;
• type of pavement, SURFTYPE ∈ {1, 2, 3, 4, 5, 8, 9}, x5;
• number of lanes, RDLANES ∈ {1, 2, 3, 4, 5, 6, 7, 9}, x6;
• type of dividing strip on the right, LINERIGHT ∈ {0, 1, 2, 3, 4, 5, 9}, x7;
• type of dividing strip on the left, LINELEFT ∈ {0, 1, 2, 3, 4, 5, 9}, x8.

The data on the weather include:
• weather condition, WEATHER ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 98, 99}, x9;
• road surface condition, SURFCOND ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 98, 99}, x10;
• lighting condition, LIGHTCOND ∈ {1, 2, 3, 4, 5, 9}, x11.

Date and time data are described as follows:
• time, CRASHTIME ∈ Time, x12;
• day of the week, DAYOFWEEK ∈ {1, 2, 3, 4, 5, 6, 7}, x13;
• month, CRASHMONTH ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, x14.

Vehicle and detected event data include:
• type of vehicle, BODYCAT ∈ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 99}, x15;
• event on the road, CRITCAT ∈ {1, 2, 3, 4, 5, 6, 8, 9}, x16;
• vehicle speed, DVEST ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, x17.

B. Output Data Description

The output data are a vector with the following possible values:
• no accident, y1;
• head-on collision, y2;
• side collision, y3;
• rear collision, y4;
• rollover, y5;
• collision with an object off the road, y6;
• collision with an object on the road, y7;
• another type of accident, y8.

Since the type of accident is encoded as an integer, one-hot encoding is used to solve the multiclass classification problem for the resulting categories: 0 → [1,0,0,0,0,0,0,0], 1 → [0,1,0,0,0,0,0,0], ..., 7 → [0,0,0,0,0,0,0,1].

C. Neural Network Topology

From the point of view of classical machine learning, accident forecasting is a multiclass classification problem. The numbers of input and output neurons follow from the input and output data defined above: a vector of 17 values is fed into the neural network, and a vector of 8 values is output. The neural network is based on the Rumelhart perceptron with one hidden layer.

The topology of the artificial neural network is shown in Fig. 1.

Fig. 1. Neural network topology.

The number of hidden layer neurons is determined by the geometric pyramid rule:

$k = \sqrt{nm}$,

where k is the number of neurons in the hidden layer; n is the number of neurons in the input layer; m is the number of neurons in the output layer.

Thus, with n = 17 and m = 8, $k = \sqrt{17 \cdot 8} = \sqrt{136} \approx 11.7$, and the number of neurons in the hidden layer is 12.

IV. SOFTWARE IMPLEMENTATION

The proposed neural network approach to accident prediction is implemented as a software profiling subsystem designed to function as part of an accident prevention system (Fig. 2). Data streams entering the profiling subsystem are logically combined into data sources. The profiling subsystem is implemented in Python in the PyCharm environment; the TensorFlow framework and the Keras library are used to implement the neural networks.

Fig. 2. Data streams.

The resulting traffic accident analytics are stored in a database. The PostgreSQL database, which provides spatio-temporal binding of accidents, and the psycopg2 library are used for data access and management. The accompanying incoming data on the road, weather, and vehicles are also logged in relation to an accident (Fig. 3).

The profiling subsystem has no graphical interface for the end user; instead, it provides an API for connecting external subsystems. A separate graphical interface has been developed on top of the API to train and configure the neural network, to force forecasting for specified data sets, and to view notifications of possible accidents in situation center mode.

Fig. 5. Graph of the number of erroneous classifications by training epoch. The final epoch is 300.

V. RESULTS

The neural network was trained on reliable data in a special format that are freely available on the data.gov.uk server under the Open Government Licence (OGL) [21].
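A minimal sketch of how one record from such a data set might be turned into the network's input and target, following the conventions of Section III. The field order matches x1 to x17 and the eight classes match y1 to y8, with class index 0 taken as y1. The sample values, the conversion of CRASHTIME to minutes since midnight, rounding to the nearest integer in the geometric pyramid rule, and all helper names are assumptions made for illustration; the paper does not specify them.

```python
import math

# Input fields x1..x17 in the order listed in Section III.A.
FEATURES = [
    "TRAFDEV", "TRAFFUNCT", "SPEEDLIMIT", "RELTOJUNCT", "SURFTYPE",
    "RDLANES", "LINERIGHT", "LINELEFT",        # road network
    "WEATHER", "SURFCOND", "LIGHTCOND",        # weather
    "CRASHTIME", "DAYOFWEEK", "CRASHMONTH",    # date and time
    "BODYCAT", "CRITCAT", "DVEST",             # vehicle and event
]
NUM_CLASSES = 8  # y1..y8: no accident plus seven accident types


def hidden_size(n_inputs, n_outputs):
    """Geometric pyramid rule k = sqrt(n * m); nearest-integer rounding is assumed."""
    return round(math.sqrt(n_inputs * n_outputs))


def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer class label in 0..num_classes-1 as a one-hot list."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
    return [1 if i == label else 0 for i in range(num_classes)]


def to_input_vector(record):
    """Order a record's fields as x1..x17.

    CRASHTIME is assumed to arrive as 'HH:MM' and is converted to
    minutes since midnight; this encoding is hypothetical.
    """
    values = dict(record)
    hours, minutes = values["CRASHTIME"].split(":")
    values["CRASHTIME"] = int(hours) * 60 + int(minutes)
    return [float(values[name]) for name in FEATURES]


# Hypothetical record: the values are made up for illustration only.
sample = {
    "TRAFDEV": 3, "TRAFFUNCT": 1, "SPEEDLIMIT": 60, "RELTOJUNCT": 1,
    "SURFTYPE": 2, "RDLANES": 2, "LINERIGHT": 1, "LINELEFT": 1,
    "WEATHER": 2, "SURFCOND": 2, "LIGHTCOND": 1,
    "CRASHTIME": "17:45", "DAYOFWEEK": 5, "CRASHMONTH": 10,
    "BODYCAT": 1, "CRITCAT": 3, "DVEST": 4,
}

x = to_input_vector(sample)   # 17-element input vector
y = one_hot(4)                # index 4 is taken to mean y5, rollover
k = hidden_size(len(FEATURES), NUM_CLASSES)
print(len(x), k, y)
```

For the sample record this gives a 17-element input vector and an 8-element target, and the rule yields a hidden layer of 12 neurons, which agrees with Section III.C.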
To evaluate the results, we used graphical interpretation and the roc_auc_score metric of the sklearn package. To improve training, dropout with a rate of 20% was used. The best results were obtained with a batch size of 7 training examples. Figures 4 and 5 show the accuracy and error indicators for each epoch over 300 epochs with a batch size of 7. Starting from the 200th epoch, the indicators remain approximately in the same range, while the roc_auc_score metric shows a result of 0.90.

Fig. 3. Data model.

Fig. 4. Graph of the number of correct classifications of the neural network by training epoch. The final epoch is 300.

To prevent overfitting of the neural network, the number of epochs was reduced. Figures 6 and 7 show the training performance over 160 epochs; the roc_auc_score metric shows a result of 0.91. Thus, with the same learning parameters and network topology, 160 epochs are sufficient for training.

Fig. 6. Graph of the number of correct classifications of the neural network by training epoch. The final epoch is 160.

Fig. 7. Graph of the number of erroneous classifications by training epoch. The final epoch is 160.

Since forecasting relies on the event recognition indicators, the results above make it possible to estimate the time horizon of a forecast. In steady-state operation, the situation on the road is monitored continuously and indicators that change every second are taken into account, which limits the forecast horizon to at most one minute. However, because relatively constant indicators, such as weather and road network conditions, are also taken into account, the forecast results can remain useful for up to an hour after they are calculated, for example, as advisory information for drivers.

Therefore, the results obtained make it possible to notify drivers or emergency services in advance about dangerous conditions on the road. The simultaneous analysis of multiple parameters in the proposed approach makes it possible to consider almost all the factors affecting road safety.

VI. CONCLUSION

Improving road safety with the latest achievements of science and technology is an obvious way for a developed society to reduce the number of incidents and accidents. Intelligent transport systems, Smart City systems, and advanced passive and active safety equipment are constantly being improved. Big Data processing and machine learning technologies appear effective in many areas of the transport industry, including predicting the possibility of an accident.

The approach presented in this work increases the accuracy of determining the possibility of an accident by analyzing classes of parameters that cover such important factors as weather conditions, the condition of the road and control devices, seasonal traffic fluctuations, traffic flow, and individual vehicle speed. Classification by accident type makes it possible to select the most effective measures for preventing a specific type of accident. The experiments identified the most effective parameters of the neural network and achieved a situation recognition accuracy of 90%. The software implementation of the proposed approach, integrated with accident prevention systems, can reduce the accident rate, reduce the severity of accident consequences, and increase the awareness of traffic participants.

REFERENCES

[1] State traffic inspectorate [Online]. URL: http://stat.gibdd.ru.
[2] V.I. Mayorov, “Risk management in the system of road safety,” Bulletin of the Ural Institute of Economics, Management and Law, vol. 3, no. 44, pp. 8-12, 2018.
[3] D.V. Sokolov and E.V. Koluzakova, “Actual problems of ensuring road safety in the Russian Federation,” Science. Thought, vol. 5-2, 2016.
[4] A.P. Tarko, “Surrogate measures of safety,” Safe Mobility: Challenges, Methodology and Solutions, pp. 383-405, 2018.
[5] Russian Federation Law No. 196-FZ “On Road Traffic Safety,” 2019.
[6] U. Khalil, A. Nasir, S.M. Khan, T. Javid, S.A. Raza and A. Siddiqui, “Automatic road accident detection using ultrasonic sensor,” Int. Multi-Topic Conf. (INMIC), pp. 206-212, 2018.
[7] M. Zheng, “Traffic accident’s severity prediction: A deep-learning approach-based CNN network,” IEEE Access, vol. 7, pp. 39897-39910, 2019.
[8] W. Tai, H. Wang, C. Chiang, C. Chien, K. Lai and T. Huang, “RTAIS: road traffic accident information system,” Int. Conf. on High Performance Computing and Communications, Exeter, pp. 1393-1397, 2018.
[9] M. Lamr, “Big Data and its usage in systems of early warning of traffic accident risks,” Int. Conf. on Enterprise Systems (ES), pp. 154-157, 2018.
[10] Y. Pan, L. Zhang, X. Wu, K. Zhang and M.J. Skibniewski, “Structural health monitoring and assessment using wavelet packet energy spectrum,” Saf. Sci., vol. 120, pp. 652-665, 2019.
[11] O. Dehzangi, V. Sahu, V. Rajendra and M. Taherisadr, “GSR-based distracted driving identification using discrete & continuous decomposition and wavelet packet transform,” Smart Health, vol. 14, 100085, 2019.
[12] N.T.S.A. Wadhahi, S.M. Hussain, K.M. Yosof, S.A. Hussain and A.V. Singh, “Accidents detection and prevention system to reduce traffic hazards using IR sensors,” Int. Conf. on Reliability, Infocom Technologies and Optimization (ICRITO), pp. 737-741, 2018.
[13] O.K. Golovnin, “Data-driven profiling of traffic flow with varying road conditions,” CEUR Workshop Proceedings, vol. 2416, pp. 149-157, 2019.
[14] A.A. Agafonov, A.S. Yumaganov and V.V. Myasnikov, “Big Data analysis in a geoinformatic problem of short-term traffic flow forecasting based on a K nearest neighbors method,” Computer Optics, vol. 42, no. 6, pp. 1101-1111, 2018. DOI: 10.18287/2412-6179-2018-42-6-1101-1111.
[15] Y. Ki, J. Kim, T. Kim, N. Heo, J. Choi and J. Jeong, “Method for automatic detection of traffic incidents using neural networks and traffic data,” Annual Information Technology, Electronics and Mobile Communication Conf. (IEMCON), pp. 184-188, 2018.
[16] T.I. Mikheeva, A.A. Osmushin, S.V. Mikheev and O.K. Golovnin, “GIS-based models for transport network emergency management,” Journal of Physics: Conference Series, vol. 1353, no. 1, 012009, 2019.
[17] M.F. Labib, A.S. Rifat, M.M. Hossain, A.K. Das and F. Nawrine, “Road accident analysis and prediction of accident severity by using machine learning in Bangladesh,” Int. Conf. on Smart Computing & Communications (ICSCC), pp. 1-5, 2019.
[18] J. Zhang, J. Wang and S. Fang, “Prediction of urban expressway total traffic accident duration based on multiple linear regression and artificial neural network,” Int. Conf. on Transportation Information and Safety (ICTIS), pp. 503-510, 2019.
[19] O.K. Golovnin and A.A. Stolbova, “Wavelet analysis as a tool for studying the road traffic characteristics in the context of intelligent transport systems with incomplete data,” SPIIRAS Proceedings, vol. 18, no. 2, pp. 326-353, 2019.
[20] O.A. Samonina, “Methods and problems of training a multilayer neural network used to assess the characteristics of the designed information systems,” Bulletin of the St. Petersburg University of Railway Engineering, vol. 4, pp. 148-159, 2008.
[21] Road Safety Data [Online]. URL: https://data.gov.uk/dataset/road-accidents-safety-data.