MONITORING ROAD SURFACE CONDITIONS WITH CYCLIST’S SMARTPHONE SENSORS

Budi Darma Setiawan1,2, Victor V. Kryssanov1, Uwe Serdült1,3
1 Graduate School of Information Science and Engineering, Ritsumeikan University, Japan
2 Faculty of Computer Science, Universitas Brawijaya, Indonesia, s.budidarma@ub.ac.id
3 Center for Democracy Studies Aarau (ZDA), University of Zurich, Switzerland

ABSTRACT
Road networks form one of the most important infrastructures in modern cities, and road conditions determine the very possibility and quality of land transportation. It is therefore important to monitor and manage road networks properly. The vast area to be covered makes this task both expensive and time-consuming. Recently, an approach has emerged that involves road users, such as car drivers, pedestrians, and cyclists, in monitoring road conditions. Monitoring roads with bicycles has an advantage over using cars, since bicycles can reach narrow roads. This paper presents results of a preliminary study on using a bicycle to detect road surface defects, including potholes and bumps. Data collected with a cyclist’s smartphone sensors was used to train artificial neural networks in different configurations. The trained networks were then used to detect road surface defects. The experimental results indicate that, for the accelerometer data, a convolutional neural network provides the best average accuracy in classifying road surface conditions. In addition, both the convolutional network and a long short-term memory network produce better results than a standard deep neural network.

Keywords: Road condition monitoring, smartphone applications, artificial neural networks.

1. INTRODUCTION
The very possibility of land transportation depends on road conditions, as poor road quality can severely disrupt the traffic flow. It is therefore important to maintain roads thoroughly and on a regular basis.
Due to the enormous size of road networks in modern cities, monitoring road conditions is a time-consuming and expensive task. One way to cope with this problem is to involve road users, such as pedestrians, car drivers, and cyclists, in the monitoring process. Recently, several smartphone applications have been developed to monitor road conditions (Allouch et al., 2017; Li and Goldberg, 2018; Mednis et al., 2011; Varona et al., 2019). These studies focused on smartphones placed inside a car to automatically detect road surface conditions while the car is driven. A car, however, can only be used to monitor sufficiently wide roads. The present work focuses on using a bicycle for the same purpose.

Smartphones are convenient for road monitoring, as they are equipped with GPS receivers for tracking the locations of road defects, as well as several motion sensors, such as an accelerometer, a gyroscope, and a magnetometer. The idea underlying this study is that when a cyclist carrying a smartphone rides over a pothole or another road defect, the sensors register vibrations, and the recorded data can be used to detect the defect. Not all vibrations are caused by road defects, however: human-made structures, such as speed bumps, could wrongly be detected as defects. It is therefore important to develop a method that reliably classifies road surface conditions and ignores road structures that do not require maintenance.

Potholes and bumps can be recognized by analyzing patterns in the signals generated by accelerometer or gyroscope sensors. Studies have been reported that used machine learning techniques, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), to achieve this goal (Hur et al., 2018; Hussain et al., 2019; Lee et al., 2017).
The present study attempts to identify the most effective of these methods for signals recorded with cyclists’ smartphones.

2. RELATED WORK
Results of a study on road condition monitoring using bicycles have recently been reported (Werner, 2018). The author evaluated the capability of smartphone sensors to register vibrations generated when riding a bicycle, and used the collected vibration data to evaluate the quality of bicycle tracks. The assessment was done by calculating the Dynamic Comfort Index (DCI), which reflects the comfort of riding a bicycle. Three types of smartphone sensors were used in the study: accelerometer, linear accelerometer, and GPS receiver. While a detailed analysis of various smartphone vibrations affecting DCI values was made, the study focused mainly on the rider’s comfort rather than on road surface conditions.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IICST2020: 5th International Workshop on Innovations in Information and Communication Science and Technology, Malang, Indonesia

Li and Goldberg (2018) conducted research on using smartphone accelerometers to evaluate the overall condition of motor-vehicle roads. The study, however, did not attempt to differentiate road defects by type.

Varona et al. (2019) sought to combine previous studies on road surface monitoring, road surface material, and pothole and bump detection. The authors also used smartphones installed inside a car. Several machine learning techniques, namely convolutional neural networks, long short-term memory, and Reservoir Computing (RC), were used in the study. The authors claimed that the use of machine learning can improve the detection of potholes and man-made structures (e.g. speed bumps) in real-world scenarios.
Input features for their system include accelerometer coordinate values (x, y, and z) and the differences of these values, denoted diffX, diffY, and diffZ. The same features are used in the present study, together with CNN and LSTM.

Accelerometer data is frequently used for road condition monitoring, whether it comes from a smartphone, from an accelerometer sensor attached to a microcomputer such as a Raspberry Pi, or from an accelerometer installed in a vehicle (Allouch et al., 2017; Devekar et al., 2018; Park et al., 2018). Other sensors, such as gyroscopes, steering angle sensors, and wheel speed sensors, were also used in the related studies. Installing and configuring steering angle and wheel speed sensors on a bicycle is a complex task, and not all bicycles would support such installations. Hence, the present study deals only with sensors generally available in a smartphone, i.e. accelerometers and gyroscopes.

3. DATA
3.1 Data Collection
An Android application has been developed for data collection. The application records sequential values read from the accelerometer and gyroscope sensors, as well as locations from the GPS. The recording frequency is 50 Hz. Figure 1(a) presents the application interface. The accelerometer is used because it measures the acceleration applied to the phone, while the gyroscope measures the changes in the phone's orientation that may occur when a cyclist rides over holes and bumps. The application was installed on a smartphone, and the phone was placed in a cyclist's shirt pocket so that it could record vibrations while the cyclist was riding. The captured data was saved in CSV-formatted files. Data was collected while riding along two paths, as shown in Figure 1(b). The paths included several speed bumps and road defects that were used for training and testing the machine learning algorithms.
In Figure 1(b), the white line indicates the path used to collect training data, and the red line indicates the path used for testing.

3.2 Data Preprocessing
The collected data was preprocessed by slicing it into chunks with a sliding window. Regarding the window size, Mednis et al. (2011) obtained their best true-positive results with a window of 20 samples at a 100 Hz sampling rate (0.2 seconds), while Varona et al. (2019) used 85 samples at a 50 Hz sampling rate (a window length of 1.8 seconds). In the present study, the window length was set to 25 samples at a 50 Hz sampling rate, i.e. 0.5 seconds. This choice is based on the assumption that, at an average bicycle speed, riding over an obstacle (pothole or bump) takes approximately 0.5 seconds. The window is shifted by 10 samples to capture the next chunk, and the process is repeated until all the data is arranged into chunks.

In the preprocessing step, four features were derived for every chunk: the original accelerometer data, acc ∈ {accX, accY, accZ}; the original gyroscope data, gyr ∈ {gyrX, gyrY, gyrZ}; the numerical differential of the accelerometer data, diffAcc ∈ {diffAccX, diffAccY, diffAccZ}; and the numerical differential of the gyroscope data, diffGyr ∈ {diffGyrX, diffGyrY, diffGyrZ}. The x, y, and z notations refer to the three-dimensional space coordinates. The two differential features, diffAcc and diffGyr, were obtained from the original accelerometer and gyroscope data, respectively, using equations (1) and (2), where i is the sequential number of a value in a chunk, “acc” denotes accelerometer data, and “gyr” denotes gyroscope data.
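The sliding-window step of Section 3.2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name sliding_chunks, the Sample type, and the synthetic recording are hypothetical, and dropping an incomplete window at the end of a recording is an assumption, since the paper does not say how the tail is handled.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) sensor reading

WINDOW_SIZE = 25  # samples per chunk: 0.5 s at the 50 Hz recording rate
STEP = 10         # number of samples by which the window is shifted


def sliding_chunks(samples: Sequence[Sample],
                   window: int = WINDOW_SIZE,
                   step: int = STEP) -> List[List[Sample]]:
    """Cut a recording into overlapping chunks of `window` samples."""
    chunks = []
    # Assumption: stop as soon as a full window no longer fits.
    for start in range(0, len(samples) - window + 1, step):
        chunks.append(list(samples[start:start + window]))
    return chunks


# Synthetic 2-second recording (100 samples), for illustration only.
recording = [(float(i), 0.0, 9.8) for i in range(100)]
chunks = sliding_chunks(recording)
print(len(chunks), len(chunks[0]))
```

With a step of 10 and a window of 25, consecutive chunks share 15 samples, so an obstacle that straddles one window boundary is still likely to fall inside a neighbouring chunk.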
Fig 1. (a) User interface of the developed Android sensor reader application; (b) Paths used to collect the data (Map data: Google, CNES/Airbus, Maxar Technologies)

diffAccX(i) = accX(i+1) - accX(i)
diffAccY(i) = accY(i+1) - accY(i)        (1)
diffAccZ(i) = accZ(i+1) - accZ(i)

diffGyrX(i) = gyrX(i+1) - gyrX(i)
diffGyrY(i) = gyrY(i+1) - gyrY(i)        (2)
diffGyrZ(i) = gyrZ(i+1) - gyrZ(i)

Each preprocessed chunk was labeled manually with one of three labels: “normal”, “pothole”, or “bump”. For training and evaluation purposes, 359 chunks were selected randomly: 309 chunks were used for training (125 labeled as “normal”, 92 as “pothole”, and 92 as “bump”), and the remaining 50 chunks were used for validation.

4. METHOD
Three types of artificial neural networks were tested: DNN, CNN, and LSTM. In each case, the input is a chunk of one of the following features: acc, gyr, diffAcc, or diffGyr. As each chunk holds values for three coordinates (x, y, and z), the data is arranged differently for each neural network architecture.

4.1 Deep Neural Network
The DNN implemented in this study has a simple architecture with 4 hidden layers, each of which is a fully connected layer of 150 hidden units (artificial neurons). A drop-out layer was added to each hidden layer to prevent the network from overfitting (Srivastava et al., 2014). The drop-out probability was set to 0.2. For the input, each chunk is rearranged sequentially in the order of the 3 coordinates (x, y, and z), so the input length for acc and gyr is 75, formed by 3 x 25 coordinate values from the chunk. Since diffAcc and diffGyr are calculated by subtracting the ith value from the (i+1)th value, as in equations (1) and (2), a chunk of 25 samples yields 24 differences per coordinate, and the length of the corresponding input is 72 (3 x 24 differential coordinate values).
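The differencing of equations (1) and (2) and the flattening of a chunk into a DNN input vector can be sketched as follows. This is a hedged illustration under stated assumptions: the names forward_diff and flatten_xyz and the synthetic chunk are hypothetical, and "in the order of the 3 coordinates" is interpreted here as all x values, followed by all y values, followed by all z values.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) sensor reading


def forward_diff(values: Sequence[float]) -> List[float]:
    """diff(i) = v(i+1) - v(i), the form used in equations (1) and (2)."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def flatten_xyz(chunk: Sequence[Sample],
                differentiate: bool = False) -> List[float]:
    """Arrange a chunk as all x values, then all y values, then all z values.

    With differentiate=True each coordinate sequence is differenced first,
    so every coordinate contributes one value fewer.
    """
    axes = [[sample[k] for sample in chunk] for k in range(3)]
    if differentiate:
        axes = [forward_diff(axis) for axis in axes]
    return [value for axis in axes for value in axis]


# Synthetic 25-sample chunk, for illustration only.
chunk = [(0.1 * i, 0.0, 9.8 + (i % 2)) for i in range(25)]
acc_input = flatten_xyz(chunk)                       # acc or gyr style input
diff_input = flatten_xyz(chunk, differentiate=True)  # diffAcc or diffGyr style
print(len(acc_input), len(diff_input))  # 75 72
```

The two lengths match the input sizes stated in Section 4.1: 3 x 25 = 75 values for the original data and 3 x 24 = 72 values for the differentials.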
For the output layer, the DNN has 3 output units representing the road surface condition classes (“normal”, “pothole”, and “bump”). The network uses the categorical cross-entropy loss function and the Adam optimizer (learning rate = 0.001, beta1 = 0.9, and beta2 = 0.999). The activation function of each hidden unit is the rectified linear function, while softmax is used for the output units.

4.2 Convolutional Neural Network
In the implemented CNN, each input chunk is rearranged into a two-dimensional array. First, the x, y, and z sequences are concatenated into a one-dimensional array in the order x, y, z. Next, this array is reshaped into a two-dimensional array. For example, the acc and gyr features are each represented as a one-dimensional array of length 75, which is then reshaped into a 5 x 15 array. Likewise, the diffAcc and diffGyr features are each arranged in a one-dimensional array of length 72 and then reshaped into an 8 x 9 array. Two convolutional layers are used, equipped with max pooling of size 2 x 2. The first convolutional layer uses 56 kernels of size 3 x 3, and the second layer uses 128 kernels, also of size 3 x 3. The convolutional output is flattened and passed to a fully connected layer of 150 hidden units. Finally, there is one output layer with 3 output units representing the output classes. Figure 2 depicts the architecture of the CNN used. The implemented CNN also uses categorical cross-entropy as the loss function and Adam as the optimizer, with learning rate = 0.001, beta1 = 0.9, and beta2 = 0.999. Each convolutional layer uses the rectified linear activation function, and the output layer uses softmax.

Fig 2. CNN architecture

4.3 Long Short-Term Memory
LSTM is often used to analyze sequential data and to make time-series predictions (Hussain et al., 2019). Since the recorded sensor data is a time series, an attempt was made to classify the sequence of accelerometer data using an LSTM, as originally proposed in Varona et al. (2019). The present study uses two LSTM layers, each consisting of 128 hidden units. The input is one chunk of 75 or 72 values, which is a sequential arrangement of x, y, and z values. One value of the chunk is processed at each time step. The output layer is a dense layer with 3 units corresponding to the class labels. The LSTM architecture is shown in Figure 3. The LSTM is configured with the categorical cross-entropy loss function and the Adam optimizer (learning rate = 0.001, beta1 = 0.9, and beta2 = 0.999).

Fig 3. LSTM architecture

5. RESULTS
Each of the three artificial neural networks was trained and tested five times, and the average and the best accuracy achieved were recorded. Figure 4 presents the results obtained for the four different features. The maximum accuracy of 93.88% was obtained when using the CNN with the diffAcc and diffGyr features, and the LSTM with the acc and diffAcc features. The best average accuracy of 91.84% was registered when the CNN was used with the diffAcc data.

As can be seen from Figure 4, classification using the gyroscope data always yields a lower average accuracy and lower stability (assessed by the variance) than classification using the accelerometer data. This indicates that, for this classification task, accelerometer data is a better feature than gyroscope data.
This could be because the gyroscope is more sensitive to orientation change than to acceleration, whereas the vibration patterns produced by riding over potholes and bumps reflect changes in acceleration rather than in orientation. The experimental results also showed that the training process is more stable when using the CNN and the diffAcc data: as Figure 4 shows, the maximum and minimum accuracies for this combination differ only slightly. It can also be seen that the CNN and the LSTM produced better results than the DNN, except when using the original gyroscope data (gyr).

Fig 4. Accuracy comparison for the DNN, CNN, and LSTM (vertical axis: accuracy, from 70% to 100%; horizontal axis: the acc, gyr, diffAcc, and diffGyr features for each network)

The corresponding confusion matrices are given in Figure 5, which shows only the matrices for the four combinations with the best maximum accuracy (CNN for the diffAcc data, CNN for diffGyr, LSTM for acc, and LSTM for diffAcc). As one can see, most of the false negatives are associated with classifying potholes (for example, in matrix (a), 25% of the pothole cases were recognized as bumps). The “normal” road surface is classified correctly when using the CNN with diffAcc, whereas bumps are best recognized when using the LSTM with either acc or diffAcc.

(a) CNN, diffAcc
       c0    c1    c2
c0     1     0     0
c1     0     .75   .25
c2     0     0     1

(b) CNN, diffGyr
       c0    c1    c2
c0     1     0     0
c1     .08   .75   .17
c2     0     0     1

(c) LSTM, acc
       c0    c1    c2
c0     1     0     0
c1     .25   .75   0
c2     0     0     1

(d) LSTM, diffAcc
       c0    c1    c2
c0     .95   .05   0
c1     .17   .83   0
c2     0     0     1

c0: class “normal”; c1: class “pothole”; c2: class “bump” (rows: actual class; columns: predicted class)

Fig 5. Confusion matrices of (a) CNN for the diffAcc, (b) CNN for the diffGyr, (c) LSTM for the acc, and (d) LSTM for the diffAcc

The best scenario (the CNN with the diffAcc data) was used in a real-world test. The red line in Figure 1(b) shows the test path, and the test results are presented in Figure 6, where red spots indicate detected potholes and blue spots indicate detected bumps. Not all potholes and bumps shown in Figure 6 were recognized as expected. Some detections were caused by rough road surfaces, since a rough road can be regarded as a road with multiple potholes and bumps. Therefore, in addition to potholes and speed bumps, rough road surfaces should also be detected. In Figure 6, the dots do not always appear at the exact positions of the defects. Because smartphone GPS accuracy is limited, detected road defects sometimes appear as far as 5 meters away from their real locations.

Fig 6. Real-world test results (Map data: Google, CNES/Airbus, Maxar Technologies)

6. CONCLUSIONS
This study has confirmed that potholes and speed bumps can be detected using machine learning methods applied to data from a cyclist's smartphone sensors. Using the CNN with the diffAcc data provided the best detection accuracy, and accelerometer data fared better than gyroscope data. With the smartphone placed inside a shirt pocket, relatively good results were achieved in spite of the rider's movements, which could affect the classification results. Even inside a shirt pocket, the phone could still sense the vibrations caused by riding over potholes and bumps.

Additional work is required to investigate the influence of the smartphone's orientation and position, and of the rider’s speed, on the recognition of road surface defects with smartphone sensors. Different orientations and positions of the smartphone would generate different signal patterns. The signal could also be affected by the rider’s movements.
REFERENCES
Allouch, A., Koubaa, A., Abbes, T., and Ammar, A. (2017). RoadSense: Smartphone Application to Estimate Road Conditions Using Accelerometer and Gyroscope. IEEE Sensors Journal, 17(13), 4231–4238.
Devekar, N., Damodar, S., Shendkar, P., Mulani, W., and Narde, V. (2018). Pothole Detection System for Monitoring Road & Traffic Conditions using IoT. IJIRST - International Journal for Innovative Research in Science & Technology, 5(7), 34–37.
Google map. (n.d.). Nojihigashi and Kasayama, Kusatsu, Shiga, Japan. https://www.google.com/maps/place/525-0072,+Japan/@34.9846575,135.9517519,1646m/data=!3m1!1e3!4m5!3m4!1s0x60016d99c1b8cf85:0xe57167bf5db00a68!8m2!3d34.9796111!4d135.9503377 (last accessed on December 18, 2019)
Hur, T., Bang, J., Huynh-The, T., Lee, J., Kim, J. I., and Lee, S. (2018). Iss2Image: A novel signal-encoding technique for CNN-based human activity recognition. Sensors, 18(11), 1–19.
Hussain, G., Jabbar, M.S., Cho, J.D., and Bae, S. (2019). Indoor positioning system: A new approach based on LSTM and two stage activity classification. Electronics, 8(4), 1–27.
Lee, S.M., Yoon, S.M., and Cho, H. (2017). Human activity recognition from accelerometer data using Convolutional Neural Network. In: Proceedings of the International Conference on Big Data and Smart Computing, BigComp, 131–134, IEEE: New York NY.
Li, X., and Goldberg, D.W. (2018). Toward a mobile crowdsensing system for road surface assessment. Computers, Environment and Urban Systems, 69, 51–62.
Mednis, A., Strazdins, G., Zviedris, R., Kanonirs, G., and Selavo, L. (2011). Real time pothole detection using Android smartphones with accelerometers. In: Proceedings of the International Conference on Distributed Computing in Sensor Systems and Workshops, 1–6, IEEE: New York NY.
Park, J., Min, K., Kim, H., Lee, W., Cho, G., and Huh, K. (2018). Road surface classification using a deep ensemble network with sensor feature selection. Sensors, 18(12), 1–16.
Srivastava, N., Hinton, G., Krizhevsky, A., and Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15, 1929–1958.
Varona, B., Monteserin, A., and Teyseyre, A. (2019). A deep learning approach to automatic road surface monitoring and pothole detection. Personal and Ubiquitous Computing. doi: 10.1007/s00779-019-01234-z
Werner, S.P. (2018). A monitoring system for bicycle pavement conditions using cyclists’ smartphones: a first look at the capabilities of measuring bicycle vibrations using smartphone sensors. Master's Thesis, Eindhoven University of Technology, Netherlands.