A dataset for determining user preferences of users on personal vehicles Aleksandr Borodinov Vladislav Myasnikov Alexander Yumaganov Geoinformatics and Information Geoinformatics and Information Geoinformatics and Information Security Department, Security Department, Security Department, Samara National Research University, Samara National Research University, Samara National Research University, Samara, Russia Samara, Russia Samara, Russia aaborodinov@yandex.ru vmyas@geosamara.ru yumagan@gmail.com Abstract—The paper considers the problem of matching GPS departure point to the destination point as a trip. In total, users tracks to a road network. We presented a map-matching recorded 489 tracks. 338 tracks were recorded on weekdays algorithm based on dynamic programming. We collected the and 151 tracks were recorded on weekends. The generalized tracks of movement around the city of several users on personal characteristics of the obtained data for all recorded tracks are vehicles with various trip types to test the proposed algorithm. presented in table 1. The data collected after matching to the road network can be used to further identify user preferences and to build a transport TABLE I. GENERALIZED CHARACTERISTICS OF THE DATA recommender system. Data Characteristic Trip distance Trip time Keywords—GPS Trajectory, map matching, road network. Total value 4523 km 183 h 54 min 20 s Mean 9249 m 22 min 33 s I. INTRODUCTION Median value 5783 m 16 min 35 s Maximum value 74405 m 2 ч 18 min 50 s The area of recommendation systems has increased Minimum value 1264 m 2 min 13 s significantly over the past few years. Advertising on online resources offers users various products [1,2], based on the Users recorded trips using personal smartphones with purchase history and viewing products in online stores. Android and iOS operating systems. Such a recording method Streaming services select films and compose playlists of is close to the real scenario of using navigation systems, in musical compositions for each individual user [3]. Research which the user receives a route or information about the load teams and large IT companies compiled data sets for each field on the transport network through his smartphone. In this case, of application to compare the various machine learning the navigation application on the smartphone can record data methods used in recommender systems. Transport navigation on the user's movement during interaction with the application. systems are one of the new areas of application of recommendation systems [4,5]. However, generally accepted All recorded tracks on google maps are shown in figures 1 machine learning methods and datasets for such systems do not and 2. The recorded tracks cover a significant part of the city's yet exist. At the moment, researchers are trying to use publicly road network. Figure 2 left shows a user who, for six months, available data about user trips, such as OpenStreetMap, Strava used the same route to and from work, and in Figure 2 right, or data on taxi driver trips [6,7]. The main disadvantage of such the user used various routes to travel, depending on the data is the inability to divide the available tracks by users to congestion of the road network and weather conditions. identify their preferences in choosing a route. Another drawback is the lack of information about the type and purpose III. ALGORITHM FOR BUILDING A TRACK USING GPS POINTS of the trip. In the case when the trip is working or a navigation A. Input data system has been used with the definition of the shortest path, the user preferences received will be unreliable. Let { x i , t i } i  0 , I  1 - the data recorded during the trip, where The second section presents data on the collected tracks of xi  ( xi , yi , zi ) are the GPS coordinates of the trip, t i is the user trips by personal vehicles. The second section presents recording time of the i-th route coordinate. We take t 0  0 for data on the collected tracks of user trips by personal transport. the recording start time. The algorithm for linking tracks to the road network and the experimental results are described in the third section. At the We describe the road network as a directed graph end of the work, we presented a conclusion and possible G  (V , W ) , where V - is the set of vertices of the graph, and directions for further work and research. W - is the set of edges of the graph connecting the vertices of II. DATA COLLECTION V . Vertices have coordinates x v  ( x v , y v , z v ) and the traffic We collected data in Samara in a large city with a 0, la c k , population of about a million for 6 months from June to light presence S ( v )   . We describe the edge as  1, presence December 2019. Nine people of different sex, age, marital status and income, who are employees of Samara University,   , if th e r e is n o w a y fr o m v 1 to v 2 , w w v1, v 2   w , where l - edge   w w w w recorded the tracks of their trips. Users recorded work trips (a  ( l m ax ; h ; X ; c ) , o th e r w is e trip from home to work and from work to home) in an amount length w ,  mw a x - maximum permissible speed on w , X w - set of at least 25 tracks and personal trips (all other trips) in an amount of at least 25 tracks. We define the route from the of points defining an edge w , road network ring code Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0) Image Processing and Earth Remote Sensing  0 , n o t in th e r in g r o a d , The maximum permissible speed in the graph c w   . Edge type h w can take the  m a x  m a x  m a x and current average speeds  a v r 0 w w  r in g r o a d c o d e , o th e r w is e for each edge. w W following values: B. Algorithm parameters  0 - 1 la n e  1 - 2 la n e s Let  m in - minimum matching distance (10 m);  m a x -   2 - 3 la n e s maximum matching distance (50 m);  - time increase factor;  h   3 -  3 la n e s w ith o u t a c e n tr a l d iv id in g s tr ip . w  - increasing the field of view step (0,2 m); K - the number  4 -  2 la n e s w ith a c e n tr a l d iv id in g s tr ip of points in the group (3-5). Custom parameter :  1  5 -  4 la n e s w ith a c e n t r a l d iv id in g s tr ip ,    2    20 . 2 ,   ( C o n tr o lle d  a c c e s s h i g h w a y ) Fig. 1. Collected tracks on a large-scale map (left) and on a small-scale map (right). C. The result of the algorithm w i  a rg m in  ( x i , w ), w W The result of the algorithm is the set { x i , t i , w i , ri ,  i } i  0 , I  1 , x i  a rg m in  ( x , x i ). consisting of an adjusted sequence of points { x i , t i } i  0 , I  1 , for x wi any of which an edge in the road network is indicated w i that Step 2. Consistently look at all the points by K pieces. For K  3 : x i  1 , x i , x i  1 . If all x i  k  w i &  ( x i  k , x i  k )   m in , then wi xi  X and the outliers indicator ri that  1, i - th p o in t is n o t a n o u tlie r , write the points to the result x i  k : x i  k , w i  k : w i  k , ri  k : 1 . ri   , and the estimated  0 , o th e r w is e Then we select and consider all such sequences of points, speed at the point is  i . find the minimum and maximum. In the case when the sequence is violated in time and position, we use the algorithm D. Algorithm for linking points to a specific path described below. Step 1. For each point x i , t i we find the nearest edge w and match the point onto the edge as follows (  Euclidean): Fig. 2. An example of two user preferences in choosing a route to work. VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020) 119 Image Processing and Earth Remote Sensing After performing step 2, we get the matched sections of the Suppose we have points { x i } iI 01 ( i1  i 0  1  I ) and they path with gaps as shown in Figure 3. The blue color represents the attached points to the corresponding edges of the graph of must be matched on the path p  { v n , v n 1 , ..., v K  1 } , where v n 0 the road network. has coordinates, v n and v n connected by an edge. We j j 1 Next, we consider some arbitrary fragment from i 0 to i1 , discretize the possible positions of points p (wv )  { v n , v n 1 , ..., v K  1 } . For car discretization   2 м. i.e. points { x i , x i  1 , ..., x i } . 0 0 1 n j v n j 1 0 Let the total number of positions N , moreover p ( 0 ) ~ v n , p ( N  1) ~ v n . We calculate I arrays of 0 K 1 characteristics of the proximity of a point xi to p how 2  i (n )  ex p ( xi  p (n ) ) . The task is to find the sequence I 1 n ( i ) i  0 , I  1 :   i ( n ( i ))  m a x , where n ( i )  n ( i  1) . i0 The main recurrence ratio (for the dynamic programming algorithm):      i ( n ( il ) )    Fig. 3. The matched sections of the path with gaps. I 1  il  1  m a x   i ( n (i))  m ax  m a x   i ( n (i))   , n (i) n ( il )  n ( il  1 ), N n ( i )  n ( il ) i0  i0  Step 3. We determine the time interval for each point from  I 1  i 0  i1 to the extreme and determine the appearance physical  m a x   i ( n (i))  n ( i )  n ( il )  i  il  1   ( xi , xi )  ( xi , xi ) possibility of this point. If   m a x or   m ax 0 1 j ti  ti ti  ti Denote  j ( n )  m a x   i ( n ( i )) ,  i ( n ) - similarity,  i ( n ) 0 1 n ( i ): i  j i0 then the point is not taken into account and is further - max integral similarity,  i ( n ) - point position list. considered an outlier ri : 0 . Step 4. We define a subgraph from point i 0 to i1 . We define The result is contained in  0 ( 0 ) and  0 ( 0 ) . the shortest path for this i 0  i1 [8,9]. After that, we find a Algorithm (start from the end): point x in the center of the shortest path and build a circle for i  I  1, 0 with a radius R  (1   )  m a x (  ( x i , x i ),  ( x i , x i )) . In the 0 1 for n  N  1, 0 subgraph we include all the vertices that fall into this circle and the corresponding edges. if ( i   I  1 ) Step 5. We find all the paths without loops in the resulting if ( n   N  1) subgraph between i 0 and i1 . Denote this set Pi , i , where 0 1  p  Pi , i : p  ( w i , i * ; w i * ,... ; ...; w ..., i ) .  i ( N  1)   i ( N  1) 0 1 0 1 lis t  n e w L is t For each path p  Pi , i we apply the developed algorithm 0 1 lis t . a d d ( N  1) for matching points to a specific path based on dynamic  i ( N  1)  lis t programming. Dynamic programming is often used to solve the map matching problem, which is confirmed by a large else number of works [10-12]. if  i ( n )   i ( n  1) E. Algorithm for matching points to a specific path Further, to simplify the presentation (but without loss of lis t  n e w L is t generality), we consider i 0  0 and i1  I  1 . As a criterion for lis t . a d d ( n ) the quality of matching, we use the following:  i ( n )  lis t I 1 2  i (n)   i (n) J p   ex p (  xi  xi p ). i0 VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020) 120 Image Processing and Earth Remote Sensing else algorithm for matching GPS travel tracks to a road network. The results of the algorithm are demonstrated on the city's road  i ( n )   i ( n  1) network. The presented data set of matched tracks to the road  i ( n )   i ( n  1) network can be used in the transport recommendation system development to obtain a profile of individual user preferences else / / ( i   I  1 ) as a further area of research. if ( n   N  1) ACKNOWLEDGMENT The authors would like to thank Geoinformatics and  i ( N  1)   i ( N  1)   i  1 ( N  1) Information Security Department, Samara National Research  i ( N  1)   i  1 ( N  1) University for taking part in a survey on data collection.  i ( N  1). a d d ( N  1) REFERENCES else [1] T. Joachims, “Optimizing search engines using clickthrough data,” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133-142, 2002. if  i ( n )   i  1 ( n )   i ( n  1) [2] X. He, “Practical lessons from predicting clicks on ads at Facebook,” lis t  n e w L is t Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014. DOI:  i ( n )  lis t 10.1145/2648584.2648589. lis t  c o p y (  i  1 ( n ) ) [3] Y. Koren, R. Bell and C. Volinsky, “Matrix Factorization Techniques for Recommender Systems,” Computer, vol. 42, no. 8, pp. 30-37, 2009. lis t . a d d ( n ) DOI: 10.1109/MC.2009.263.  i ( n )   i ( n )   i 1 ( n ) [4] P. Campigotto, C. Rudloff, M. Leodolter and D. Bauer, “Personalized and Situation-Aware Multimodal Route Recommendations: The else FAVOUR Algorithm,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 1, pp. 92-102, 2017. DOI: 10.1109/ TITS.2016.2565643.  i ( n )   i ( n  1) [5] V. V. Myasnikov, “Reconstruction of functions and digital images using  i ( n )   i ( n  1) sign representations,” Computer Optics, vol. 43, no. 6, pp. 1041-1052, 2019. DOI: 10.18287/2412-6179-2019-43-6-1041-1052. The result of the algorithm for matching the track to the [6] X. Huang, “Grab-posisi: An extensive real-life GPS trajectory dataset in road network is shown in Figure 4. The purple line shows the southeast Asia,” Proceedings of the 3rd ACM SIGSPATIAL GPS coordinates of the track, the green line shows the International Workshop on Prediction of Human Mobility, PredictGIS, pp. 1-10, 2019. DOI: 10.1145/3356995.3364536. matching to the road network. [7] J. Lian and L. Zhang, “One-month Beijing taxi GPS trajectory dataset with taxi IDS and vehicle status,” DATA - Proceedings of the 1st Workshop on Data Acquisition To Analysis, Part of SenSys, pp. 3-4, 2018. DOI: 10.1145/3277868.3277870. [8] A.A. Agafonov, A.S. Yumaganov and V.V. Myasnikov, “Big data analysis in a geoinformatic problem of short-term traffic flow forecasting based on a K nearest neighbors method,” Computer Optics, vol. 42, no. 6, pp. 1101-1111, 2018. DOI: 10.18287/2412-6179-2018- 42-6-1101-1111. [9] A.A. Agafonov and V.V. Myasnikov, “Numerical route reservation method in the geoinformatic task of autonomous vehicle routing,” Computer Optics, vol. 42, no. 5, pp. 912-920, 2018. DOI: 10.18287/ 2412-6179-2018-42-5-912-920. [10] T. Yokota, M. Okude, T. Sakamoto and R. Kitahara, “Fast and robust map-matching algorithm based on a global measure and dynamic programming for sparse probe data,” IET Intelligent Transport Systems, vol. 13, no. 11, pp. 1613-1623, 2019. DOI: 10.1049/iet-its.2019.0178. [11] B.Y. Chen, H. Yuan, Q. Li, W.H.K. Lam, S.-L. Shaw and K. Yan, “Map-matching algorithm for large-scale low-frequency floating car data,” International Journal of Geographical Information Science, vol. 28, no. 1, pp. 22-38, 2014. DOI: 10.1080/13658816.2013.816427. [12] Y. Li, Q. Huang, M. Kerber, L. Zhang and L. Guibas, “Large-scale joint Fig. 4. Track matched to the road network (green) and track with raw GPS map matching of GPS traces,” GIS: Proceedings of the ACM coordinates (purple). International Symposium on Advances in Geographic Information Systems, pp. 214-223,, 2013. DOI: 10.1145/2525314.2525333. IV. CONCLUSION In our work, we presented a dataset containing tracks of user trips by personal vehicles. The paper also presents an VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020) 121