BR2: A Travel Behavioral Approach to Personalized Route Recommendation Based on GPS Trajectories Rodrigo Augusto de Oliveira e Silva Ge Cui University of Calgary University of Calgary Calgary, Alberta, Canada Calgary, Alberta, Canada radeoliv@ucalgary.ca cuig@ucalgary.ca Seyyed Mohammadreza Rahimi Xin Wang University of Calgary University of Calgary Calgary, Alberta, Canada Calgary, Alberta, Canada smrahimi@ucalgary.ca xcwang@ucalgary.ca ABSTRACT travel behaviors reflected in historical GPS trajectories are the Various unknown circumstances may affect users’ travel behav- implicit feedback of their preferences, as they do not explicitly iors between two locations on the road network, hence it is rate preferred routes. Since it is laborious to use surveys and complicated to provide satisfactory personalized route recom- questionnaires to acquire the explicit feedback about traveling mendations. In this paper, we believe that users’ travel behaviors preferences from individual users, users’ travel behaviors are are reflected and can be learned from their historical GPS trajec- captured based on implicit feedback obtained from their past tories. The Behavior-based Route Recommendation (BR2) method travels. is proposed to compute personalized routes based exclusively on The consideration of users’ preferences in route recommen- users’ travel preferences. The concepts of appearance and transi- dation can be problematic when suggesting routes to untrav- tion behaviors are used to describe users’ travel behaviors. The eled locations. To alleviate this issue, some studies have been behaviors are extracted from users’ past travels and the missing conducted in personalized route recommendation based on col- behaviors, of unvisited locations, are estimated with the Opti- laborative filtering (CF) methods, e.g. item-based CF and matrix mized Random Walk with Restart technique. Then, the temporal factorization (MF), to acquire more information through users dependency of travel behaviors is considered by constructing a that share similar preferences [3, 4, 8]. The use of CF has been time difference interval histogram. Last, a behavior graph is gen- proven to provide more accurate recommendations compared erated to allow the maximum probability route computation with to the shortest distance route method, but they usually require the shortest path algorithm, resulting in the most likely route quadratic space and considerable time to process [3, 22]. In this to be taken by a user. Experiments conducted on two real GPS context, it is a complex and computational expensive task due to trajectory data sets demonstrate the efficiency and effectiveness the huge amount of data needed to describe user travel behaviors of the proposed method. and the overall sparseness of known data. The rapid and accurate estimation of missing users’ travel behaviors is crucial to support personalized route recommendation systems. 1 INTRODUCTION Time is another essential factor for the effective comprehen- With advances in Global Positioning System (GPS) technology sion of users’ travel behaviors, as users’ behaviors tend to be and the popularity of mobile devices, massive amounts of hu- similar during specific intervals of the day. For instance, a user man movement data in GPS trajectories have been collected and that has a behavior of taking a route between an origin and des- are available for research, which provides an alternative way to tination at noon might have a different behavior at midnight, but explore travel route recommendation. Route recommendation the same behavior is expected to be shown at times close to noon. systems usually suggest routes based on the optimization of cost To accurately represent temporal dependency and to effectively functions of either distance or travelling time. However, it has incorporate it in the recommendation process are challenges that been observed that in the real world the shortest or quickest need to be addressed. routes are often not taken [5]. A personalized route recommenda- In this study, we propose a personalized Behavior-based Route tion system, on the other hand, provides route recommendations Recommendation method, named BR2, to extract users’ travel based on users’ travel preferences [8]. For instance, some drivers behaviors and compute the route with maximum probability, want to reach the destination as fast as possible, while others are which refers to the route that users are most likely to take based willing to take a slightly longer route that does not go through on their previous travels. In BR2, the implicit information of users’ highways or busy roads. The identification of all factors which travel behaviors is extracted from their historical movements. may affect users’ travel behaviors is still very challenging and an Compared with popular CF methods, the random walk with attempt to acquire as much as these aspects as possible may prove restart (RWR) has been proven to provide better estimations to be infeasible, as distinct factors affect each person particularly when considering users’ implicit feedback [18]. Therefore, we [21]. utilize a RWR approach to estimate users’ missing behaviors and The proper representation of GPS readings allows the analysis investigate optimizations to better train the model. In addition, we of the relationship between users and their behaviors. Users’ make use of temporal dependency within users’ travel behaviors to enhance the accuracy of the recommendation. In summary, Copyright © 2020 for this paper by its author(s). Published in the Workshop Proceed- the contributions of this study are: ings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License At- tribution 4.0 International (CC BY 4.0). • The missing behavior frequencies for each user on spe- edges connecting the user-item pair. The missing ratings are esti- cific road segments are estimated by using the RWR tech- mated by traversing the graph according to the weights and the nique, as it yields more accurate estimations due to implicit restart probability until convergence. data nature. We utilize the Optimized Random Walk with Restart (ORWR) to improve the efficiency of the behavior 2.2 General Route Recommendation estimation. General route recommendation aims to provide a route between • The data sparsity problem is mitigated with the considera- two points in a road network, an origin and a destination, based tion of temporal dependency. A time difference histogram on a given cost function. Researches have been conducted to is computed based on time interval differences from GPS study human movement and provide route recommendations historical data, which weights users’ preferences at similar based on historical GPS trajectories. time accordingly and provide more data for the recom- Chen et al. [2] study the most popular route based on users’ mendation. traveling behaviors. A popularity indicator is used to discover the • Experiments are conducted on two real GPS trajectory data frequent routes in a network in order to assist users when they sets in China, Geolife from Beijing and taxi GPS trajecto- travel to an unfamiliar area. Wei et al. [24] propose an algorithm ries from Shenzhen. The results indicate that the proposed to construct popular routes from historical trajectories in regions approach BR2 outperforms the baseline methods. of the road network. These studies contribute to the area of route planning and In the remainder of this paper, Section 2 outlines the related travel recommendation, but neither of them is truly personalized, works regarding CF techniques and route recommendation. Sec- as they do not study users’ personal route preferences on the tion 3 presents the personalized route recommendation method- road network. ology and explains the proposed method step by step. Section 4 discusses the experiment results. Finally, Section 5 exposes the conclusion and further work of this research. 2.3 Personalized Route Recommendation Differently from general route recommendation, personalized route recommendation considers users’ preferences in the defini- 2 RELATED WORK tion or calculation of a cost function. While some studies explore users’ preferences through explicit feedback, collecting informa- 2.1 Collaborative Filtering Techniques tion by directly querying users, others assume that preference Recommendation systems are popular for providing predictions factors cannot be modelled entirely and that users might not be that interest people in various scenarios. One of the approaches fully aware of their own preferences. Our study is focused on the of recommendation systems is given by CF, which considers latter assumption. information of preference from multiple users to predict missing Several researches explore personalized route recommenda- ratings of users on items, and some of its popular methods are tion through the optimization of the cost function based on mul- the item-based CF, MF, and RWR. tiple criteria either weighted directly by users or defined by their The item-based CF is one of the most simplistic methods which driving preferences information [1, 5, 7, 11, 16, 26]. Users’ pref- considers the similarities between items to calculate values for erences are explicitly collected, modeled, and/or used in the cal- others without a user rating [25]. Since every item needs to be culation of candidate routes, which might not reflect their true compared to compute the similarity score, in the worst-case sce- behaviors and are invariant to time. nario, the total number of evaluations is equal to the combination Another approach is focused on providing personalized tourism of two for all the elements in the data set. The algorithm does route recommendation based on social media data. Studies have not scale well for large data sets, involves intensive processing, used users’ data to model route attributes [9], build data set of and requires O(n2 ) space to store the similarities between n items popular locations [6], and even mine their preferences and tem- [20]. poral information [31]. Nevertheless, these approaches focus on Among the latent factor models, MF is one of the most used movement between points of interest or visiting locations, disre- methods for recommendation problems [10]. The objective of garding the evaluation of users’ preferences directly on the road this technique is to represent rating data into two vectors of network. latent factors, in which the dimensionality may vary based on Personalized route recommendation based on historical GPS the data itself, so that the dot product of the two vectors result trajectories is another branch of research. McGinty and Smyth in approximate values of known ratings. The values of the la- [15] proposed a personalized route planning method that consid- tent factor vectors can be learned from the known data by the ers historical trajectories to derive implicit driving preferences minimization of regularized squared error with the use of the without defining a preference model. The method combines and stochastic gradient descent algorithm. reuses routes sections from previous travels to generate new Random walk is a process that describes the probabilities of routes, but it fails to recommend routes to unfamiliar areas of the series of random movements in a dimension space. In recom- road network. As an extension to their previous work, McGinty mender systems, one of the techniques that considers personal and Smyth [8] used a type of distributed case-based reasoning preferences is the RWR. While the usual approach of random strategy, in which historical trajectories of similar drivers are walk traverses a graph only based on its structure, leading to borrowed to recommend routes in unfamiliar map territories. an exclusive convergence, the RWR makes uses of a probability However, considering the behaviors of similar drivers by directly of returning to the original node on each movement. Therefore, using their trajectories in the recommendation might not pre- the technique allows personalization by having lower ranking cisely reflect the preferences of a driver. values given to nodes farther from the origin node [14]. RWR Letchner et al. [12] introduced a method of personalized route is graph-based, having users and items as nodes and ratings as planning by considering users’ historical trajectories, extracted from GPS readings, in the calculation of an inefficiency ratio that a user’s movements. For a given user u and an associated ap- represents the proportion of time extended in a trip compared to pearance behavior b, the frequency that user u has behavior b is the shortest possible time. Liu et al. [13] explored personalized represented by f rq(u, b). route recommendation for self-drive tourists not only consid- Definition 6 - Transition behavior. A transition behavior ering the drivers’ visiting preferences, but also real-time traffic tb represents the relationship of two sequential appearance be- information. In these approaches, the authors define a metric to haviors bi = (ei , t) and b j = (e j , t) at the same time interval represent users’ implicit preferences but include other factors t, such that ei .end = e j .start. It is denoted as an ordered tuple like distance and travel time as part of the objective function. tb = (bi → b j ) or tb(i→j) in short. Similarly, the frequency that Cui et al. [3] extracted users’ travel behaviors from their histor- user u has transition behavior tb is given by f rq(u, tb). ical GPS trajectories to represent their preferences. By predicting Both appearance and transition behaviors contribute to cap- missing travel behaviors with CF technique, a route with maxi- ture the implicit travel preferences of each user. This implicit mum probability is computed for specific users. In addition, Cui travel preference information is essential in solving the problem and Wang [4] proposed a different representation of travel be- of the personalized route recommendation, as the recommenda- havior and improved the performance of route recommendation. tions need to truly reflect users’ priorities. However, the methodologies disregard the implicit data nature and temporal dependency is not fully explored. A different approach is proposed by Wang et al. [23], in which 3.2 Framework Overview neural networks are used to learn the optimal cost functions The framework of the proposed BR2 is illustrated in Figure 1. The of the A* algorithm. The presented results show considerable first step of BR2 is data preprocessing. The GPS trajectories are improvement in precision, recall, and F1-score, but the training split into trips with defined origins and destinations, the trips time might impede its usage in real-world applications with new are matched to the road network by applying the map matching data constantly being fed into the system. technique, and the outliers are removed from the trajectories. As a final step of the preprocessing component, the appearance 3 BEHAVIOR-BASED ROUTE and transition behaviors are generated from the historical users’ routes. Since users, in general, travel on few routes daily, cov- RECOMMENDATION ering a limited number of road segments in a study area, the This section discusses the proposed method of the Behavior- missing travel behaviors for each user need to be estimated. In based Route Recommendation (BR2). The following preliminaries the second step, the RWR technique is used to estimate users’ are first defined for this research. appearance and transition behaviors on each untraveled road segment. With the missing behaviors frequencies estimated, the 3.1 Preliminaries temporal dependency is evaluated in the data set by building a The preliminaries of this study are defined as follows. time interval difference histogram. The histogram indicates how Definition 1 - Road network. The road network is a graph the data is distributed and suggests the number of intervals that G = (V , E) composed by a set of vertices V and edges E. A vertex should be considered in the route recommendation process. Then, v ∈ V represents the boundary of road segments and an edge the probabilities are calculated from the travel behaviors for a e ∈ E represents a road segment, containing starting and end- defined origin and destination considering the Markov property. ing vertices, denoted as e.start and e.end, respectively, where In addition, the Laplace smoothing method is applied to estimate e.start ∈ V and e.end ∈ V . the probability of users’ missing travel behaviors for the road Definition 2 - GPS reading. A GPS-reading is a 3-tuple p = segments that have never been visited previously by any user. (t, lat, lnд) in which t represents a timestamp, and lat and lnд Finally, the last stage is the recommendation of the route with are the latitude and longitude of the location of the GPS-reading. maximum travel behavior probability. To facilitate the route com- Definition 3 - GPS trajectory. A GPS trajectory tr j = (p1, putation, a behavior graph is constructed to represent the travel p2, p3, · · · , pk ) consists of a sequence of GPS-readings, such that behaviors and the relationship among them. Dijkstra’s algorithm pi .t − p(i−1) .t > 0 and 1 < i ≤ k. is used in the behavior graph to find the route maximum travel Definition 4 - Route. Given a road network G = (V , E), a behavior probability. route R starting from vertex vi and ending at vertex v j is a se- quence of connected road segments R = (vi , e 1, e 2, e 3, · · · , el , v j ), where vi , v j ∈ V and ei ∈ E; ei is the i-th road segment in R, 3.3 Data Preprocessing ei , e j if i , j, e 1 .start = vi , and el .end = v j . The preprocessing can be divided into three parts. First, since Two types of behaviors are proposed to extract the user’s a GPS trajectory usually tracks users for a long period of time movement behaviors on the road network. First, to provide a and contains multiple trips of the users, trajectory segmenta- global overview of how frequently users are in specific locations tion is first applied to divide the raw trajectory into several at a certain time on the road network, the concept of appearance sub-trajectories [3]. After the trajectory segmentation, each GPS behavior is defined to represent the relation between location trajectory corresponds to a single travel route. Second, since and time of users’ movement. The second, transition behavior, the trajectory is usually noisy due to the urban canyon or mea- captures the sequential relation of appearance behaviors, not surement errors, outliers in GPS trajectories are detected and only giving a sense of location regularity but most importantly removed. In this study, if the distance between a GPS point and its of direction. Both behaviors are formally described as follows. nearest road segment exceeds a threshold of 180 seconds, the GPS Definition 5 - Appearance behavior. An appearance behav- point is considered as an outlier. Besides, given the maximum ior is defined as tuple of the road segment and the time, denoted moving speed, if the distance that a GPS point moves exceeds a as b = (e, t), where e is an edge of the road network and t is a threshold within the time interval from its previous GPS point, it time interval of a day, and it describes the location and time of is also taken as outlier. Lastly, GPS trajectories are mapped onto Preprocessing Travel behavior calculation GPS Road trajectory network Route recommendation Travel behavior frequency estimation (RWR) Map matching Behavior graph & segmentation generation Temporal dependency processing Historical Route search routes Behavior probability estimation Travel behaviors Figure 1: Behavior-based Route Recommendation framework overview. Users Behaviors Users Behaviors the road network to obtain users’ historical routes with the map matching method proposed in [17]. u1 w11 b1 Users 1 2 w1r UBn×m 3.4 Behavior Frequency Estimation u2 w2(r+1) br Behaviors Each user in general covers a very limited number of road seg- u3 w3r tb1 3 4 ments in a city, so the missing travel behaviors for each user wn(r+1) UBn×mT need to be estimated. In this section, we discuss how the RWR is applied in the estimation and provide the motivation for the use un wnm tbq of ORWR. (a) Bipartite graph (b) Adjacency matrix 3.4.1 Random Walk with Restart for Behavior Estimation. In this study, the appearance and transition behaviors are first ex- Figure 2: User-behavior graph representations. tracted from the historical users’ trajectories and a user-behavior matrix U Bn×m is built. In user-behavior matrix, rows represent users U = (u 1, u 2, · · · , un ), columns represent both appearance is a m × m matrix (behaviors by behaviors) composed similarities and transition behaviors B = (b1, b2, · · · , br , tb1, tb2, · · · , tbq ), between behaviors. This structure is commonly unoptimized due where m = r + q, and the element U Bi,j of user i and behavior to the exceedingly higher number of behaviors compared to the j represents the frequency of the pair (ui , b j ) or (ui , tb j ). The number of users in a data set. user-behavior matrix can also be represented as a user-behavior The adjacency matrix and restart probability c are provided graph, in which the nodes (users and behaviors) are connected as inputs to the RWR algorithm. The score matrix r , is calculated through edges with weights as the corresponding frequency of by: the user-behavior pair, as illustrated in Figure 2a. " # The RWR technique tackles the problem based on a graph and 0 U B n×m (t ) r (t +1) = (1 − c) T r + c I (n+m)×(n+m) can approach it with an adjacency matrix representation. Consid- U B n×m 0 ering the structure of the user-behavior graph shown in Figure 2a, the idea of RWR is to traverse the graph by either moving to where U B n×m represents the row-normalized user-behavior ma- a neighbor node or going back to an initial node based on a given T trix, U B nxm represents the transposed row-normalized user- restart probability value. Therefore, a behavior node is highly behavior matrix, r (t ) is the score matrix of t-th iteration – initially related to a user when it is visited multiple times. A behavior set as an identity matrix – and I (n+m)×(n+m) is the identity matrix node associated with a higher score value represents the higher with (n + m) rows and columns. The score matrix is calculated probability of being visited from a user node when the graph is iteratively until convergence, in which the difference between traversed. The score values are represented as the weights of the the new and previous scores is smaller than a defined threshold edges in the graph. Overall, the RWR estimates the probability ε. values of all edges in the user-behavior graph by incrementally |r (t +1) − r (t ) | < ε updating the user-behavior probability values based on past be- haviors of the user and the behaviors of the similar behaving After convergence, the RWR algorithm returns the score matrix, user. Since users and behaviors are interpreted as nodes in an which contains the normalized probability values associated with undirected graph, the adjacency matrix of the graph is generated the previously unknown behavior frequencies. with the combination of both users and behaviors, resulting in a Considering the total number of users n and behaviors m, large and sparse symmetric matrix, illustrated in Figure 2b. The the adjacency matrix representation contains (n + m)2 elements. adjacency matrix has (n + m) rows and columns, composed by Consequently, the time complexity of the algorithm is defined by four distinct parts: part 1 is a n × n matrix (users by users) com- the total number of iterations for convergence k and the matrices posed by similarities between users, part 2 is the user-behavior multiplication in every iteration, resulting in a time complexity matrix, part 3 is the transposed user-behavior matrix, and part 4 of O(k(n + m)3 ). 3.4.2 Optimized Random Walk with Restart for Behavior Es- 1 1.2 Exponential Trendline Normalized Frequency timation. The common RWR algorithm approach makes use of 0.8 1 Normalized Frequency Normalized Frequency an adjacency matrix as input, processed for as many iterations 0.6 0.8 as needed until the process converges. However, to fit the data 0.4 0.6 0.4 into an adjacency matrix representation, both sets of users and 0.2 0.2 behaviors need to be combined in rows and columns to form a 0 0 symmetric matrix, as shown in Figure 2b. -10 -5 0 5 Time Difference Interval 10 0 1 2 Time Difference Interval 3 4 This representation could be useful if similarity measures be- (a) (b) tween users and between behaviors are available, describing the inner relations between themselves, as the data could be used to fill parts 1 and 4 of the adjacency matrix presented in Figure 2b. Figure 3: (a) Normalized time difference histogram. (b) Nevertheless, since we do not explore the similarities between Five first intervals of time difference histogram approxi- users and between behaviors, the matrix representation is un- mated by an exponential function (a = 1.0778, b = −0.58). questionably unoptimized and leads to unnecessary processing overhead, as the total number of elements in the matrix is ex- following equation: ceedingly higher than the actual data in the user-behavior matrix. For instance, for n users and m behaviors the total number of ! !  Ír f rq(bi ) Íq f rq(tbi ) + i=1 |k | = 0  elements in the adjacency matrix is (n +m)2 while the actual data   i=1   2 2 consists of the part 2 in Figure 2b, containing (n ∗ m) elements. f req(k) = Ír Ír   f rq(bi ) f rq(b j ) + i , j and To better handle the estimation of missing behavior frequen-  Íqi=1 Íqj=1    cies we use the ORWR algorithm, which considers the user-   i=1 j=1  f rq(tb i ) f rq(tb j ) |k | , 0 behavior bipartite graph represented by the user-behavior matrix Since we evaluate all behaviors against themselves, only the only [19]. Using the user-behavior matrix instead of the entire absolute time difference intervals are calculated and equally di- adjacency matrix has a strong impact in performance, resulting vided for their correspondent interval. For instance, frequencies in a time complexity reduction from O(k(n + m)3 ) to O(knm 2 ). of time difference intervals of absolute value of one hour are calculated and divided in half for the intervals of -1 and 1. Finally, 3.5 Temporal Dependency the obtained frequencies are normalized in the range from zero Time is one of the important factors that influence users’ actions to one. For example, Figure 3a illustrates a histogram obtained throughout the day, as people are more prone to go to different from a real-world data set Geolife by the computation of time places during specific times [27]. In addition, it is possible to difference of behaviors of all driving travelers. identify that people present similar travel behaviors at closer As expected, most of the frequencies are concentrated in the time intervals. For instance, many people go to work from home first few intervals close to the zero interval, showing that most by roughly the same route at similar times in the morning and behaviors happen at similar time intervals. As an attempt to avoid it is not expected for them to show this movement behavior overfitting and to reduce the noise from the data, a function can be during the evening, many hours apart from their common travel used to represent the values with most importance, closer to the behavior. Therefore, the temporal information should also be zero interval. In this study, an exponential function f (x) = ae bx considered when providing personalized route recommendations. is used to represent the data while conveying the notion of decay An appropriate strategy to predict user’s behaviors by con- as the time interval is farther from the evaluated behavior. Figure sidering temporal information is to study the relation between 3b presents how closely an exponential function represents the behaviors at different time intervals. Behaviors associated to a first five intervals of the distribution. given road segment are most likely to be shown on closer time The values from the exponential function for each time inter- intervals. Temporal dependency is implemented in this study val allow weighting the behaviors frequencies from time intervals by considering existing behaviors of the same road segment at close to the recommendation time interval. In Figure 3b, for in- similar time intervals in the calculation of travel behaviors’ prob- stance, the weighting values w1, w2, w3, and w4, regarding the abilities. time difference intervals of 1 to 4, are approximately 0.6, 0.3, To identify how behaviors at different time intervals impact 0.2, and 0.1, respectively. The frequencies of similar time inter- the route recommendation process, a time difference histogram vals are weighted by multiplying them with the corresponding is generated by comparing behaviors of all users related to the value of the exponential function for a time difference inter- same road segments in pairs. For example, if there are 24 time val. The weighted frequencies are added to the frequency of the intervals in total, each time interval representing each hour of the behavior at the time interval used for recommendation. For ex- day, the time difference histogram is divided into values ranging ample, for two appearance behaviors bi and b j of the same road from -12 to 12, with increments of one, and each interval consists segment and user u at time intervals t and t + 1 respectively, of the frequency that behaviors related to the same road segment if the recommendation is provided at time interval t and the happened at a specific time difference. correspondent value of the exponential function for a time dif- A pair of appearance behaviors bi and b j refer to the same ference interval of 1 is w1, the total frequency of behavior bi is road segment if bi .e = b j .e and the correspondent absolute time f rq(u, bi ) = f rq(u, bi ) + w 1 f rq(u, b j ). difference interval is defined as |k | = |bi .t − b j .t |. Since transi- tion behaviors consist of sequential appearance behaviors, the 3.6 Probability Calculation for Travel rationale is the same. Given a total number of appearance and Behavior transition behaviors r and q respectively, the frequency for each Based on behavior frequencies, probabilities are computed for time difference interval value k is computed according to the a user u given a route R at a specific time t. The route with maximum probability reflects the route that the user is most v1 v2 v3 b1 b2 b3 inclined to take, and it is preferred above all others. The route e1 e2 probability is defined as: b1 b2 ϑ(b1,b2) ϑ(b2,b3) P(e 1, e 2, e 3, · · · , el , t |u) e4 e3 v1 v6 b3 b4 P(R|u, t) = P(e 1, e 2, e 3, · · · , el |u, t) = P(t |u) b5 b6 ϑ(b4,b5) ϑ(b5,b6) where e 1, e 2, e 3, · · · , el represent the road network edges, and v4 e5 v5 e6 v6 b4 b5 b6 e 1 and el are the incident edges with the origin and destination vertices, respectively. If the user u and time t are known, P(t |u) (a) Road network (b) Behavior graph is constant, thus simplifying the problem to the maximization of the numerator. Figure 4: Comparison of road network and behavior graph, With the assumption that the series of behavior probabili- given a trajectory from origin v 1 to destination v 6 . ties are described as Markov property, the probability of a user behavior depends on the immediate previous behavior, if not re- lated to the origin, simplifying the representation of the problem. Table 1: Summary of training data used in experiments. Extending the previous equation, the route probability is given by: Data set Features Geolife Shenzhen P(b1, b2, b3, · · · , bl |u) = P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · · Users 22 274 P(tbl −1→l |u) Behaviors 26,123 677,068 where P(b1 |u) is the appearance behavior probability of origin Time interval 1 hour 1 hour and P(tbi−1→i |u) is the transition behavior probability, from Elements in user-behavior matrix 574,706 185,516,632 appearance behavior bi−1 to appearance behavior bi . Sparseness in user-behavior matrix 95.45% 99.63% Since traditional pathfinding algorithms search for the path Segments in road network 15,493 29,411 with the minimal weight, the probability function is transformed Vertices in road network 38,485 62,113 in order to shift the problem objective from maximization to minimization. It is achieved by using the logarithm of inversed probabilities as follows: edges. Therefore, a different structure is constructed to enable graph traverse considering probabilities. It is denominated behav- 1 L= ior graph and its structure is illustrated in Figure 4. The behavior P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · · P(tbl −1→l |u) graph represents each vertex as an appearance behavior and each 1 edge as a transition behavior. The weight ϑ defines the correspon- ln L = ln P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · · P(tbl −1→l |u) dent behavior probability. In addition, the origin and destination l −1 vertices of a given trajectory are included in the structure and 1 Õ 1 ln L = ln + ln their edges represent appearance behaviors, illustrated with a P(b1 |u) i=1 P(tbi→i+1 |u) different notation from other edges and vertices in the graph. If there is no historical data of users’ travels through some Despite adding complexity in the process by generating a road segments, the related behaviors will not exist. To prevent the new structure to represent probabilities, the personalized recom- designation of zero to the probabilities of nonexistent behaviors, mended route can be computed as the least log-inverse proba- the Laplace smoothing method is employed for both appearance bility path, which can be solved by any traditional shortest path and transition behaviors by considering the following: algorithm. fd r q(u,b)+α f rq(u, b) > 0 4 EXPERIMENTAL RESULTS   d  bi ∈S o f r q(u,bi )+α d Í P(b |u) =  d  α otherwise In this section, the performance of the proposed BR2 is evaluated  bi ∈So fd r q(u,bi )+α d  Í through the experiments on two real data sets. The first data set, fd r q(u,tbi →j )+α Geolife [28–30], contains trajectories extracted from drivers in f rq(u, tbi→j ) > 0    d P(tbi→j |u) = k =1 f r q(u,tbi →k )+α d  Íd d  the central district of Beijing and the second data set consists of  α otherwise taxi drivers’ trajectories from Shenzhen. The training data has  Íd fd  k =1 r q(u,tbi →k )+α d  80% of the trajectories while 20% was used for testing purposes. Some details of the data sets are presented in Table 1. Four accuracy measures are considered to evaluate the perfor- where d f rq represents the estimated behavior frequency, P(b|u) mance of the model: precision and recall based on the number of is the probability of appearance behavior b of user u at time t, So road segments and based on the distance of road segments. The is the set of appearance behaviors starting at origin o, P(tbi→j |u) measures definitions are presented as follows. is the probability of transition behavior tbi→j , α is the smooth- ing parameter, and d is number of appearance behaviors – the # of correct road segments Precision segments = behaviors related to the destination road segment, in the case of # of road segments on recommended route transition behavior. # of correct road segments Recall segments = 3.7 Route Search Based on Probability # of road segments on true route The structure of the road map network graph does not allow distance of correct road segments Precision distance = the representation of travel behavior probabilities as weights in distance of recommended route Table 2: Recommendation performance of baseline meth- Table 3: Recommendation performance with different CF ods and BR2 with and without temporal dependency (BR2 methods. - TD). Precision Recall Precision Recall Segments Distance Segments Distance Segments Distance Segments Distance Geolife Geolife IBF 0.507 0.517 0.522 0.549 SD 0.246 0.266 0.215 0.239 MF 0.533 0.537 0.543 0.569 MaP2R 0.533 0.537 0.543 0.569 ORWR 0.576 0.573 0.562 0.587 BR2 - TD 0.576 0.573 0.562 0.587 Shenzhen BR2 0.631 0.637 0.613 0.642 IBF - - - - Shenzhen MF 0.555 0.562 0.520 0.562 SD 0.289 0.311 0.247 0.267 ORWR 0.628 0.632 0.580 0.613 MaP2R 0.555 0.562 0.520 0.562 BR2 - TD 0.628 0.632 0.580 0.613 BR2 0.688 0.692 0.626 0.654 for Geolife data set and 1.0−7 for Shenzhen, the convergence value in ORWR as 5.0−10 , and the restart probability as 0.5 for distance of correct road segments Recall distance = Geolife and 0.1 for Shenzhen. The results are shown in Table 3. distance of true route Due to the large size of the Shenzhen data set, the experiment The different aspects of the route recommendation performance with the item-based CF could not be conducted, as a result of are assessed by the conduction of experiments. First, we compare scalability issue [20]. the performance of BR2 with other baseline methods. Second, As seen in the results, for both data sets, the route recommen- we evaluate the route recommendation performance based on dation with the ORWR obtained the highest precision and recall behavior estimation, considering the ORWR in the proposed BR2 values for the number of road segments and distance compared method and the previously discussed CF techniques. Last, the to IBF and MF due to its capability to better handle implicit data. influence of temporal dependency on the performance of route In addition, the higher amount of data in Shenzhen justifies a recommendation is assessed. larger gap in performance between the ORWR and the other methods. 4.1 Overall Recommendation Performance In this experiment, the recommendation performance of BR2 is 4.3 Temporal Dependency Impact compared against MaP2R [4] and the shortest distance (SD) route method. With respect to the parameters associated with MaP2R, This experiment evaluates the influence of the temporal de- the number of latent factors in MF was set as 30 and the Laplace pendency (TD) in the accuracy of recommendations. The ex- smoothing parameter as 1.0−5 for Geolife data set and 1.0−7 for ponential functions, y = 1.067e −0.57 |x | for Geolife data set and Shenzhen. The BR2 used the same values as MaP2R for the Laplace y = 1.0057e −0.038|x | for Shenzhen data set, were built to repre- smoothing parameters, the ORWR considered a convergence sent the generated data distribution histogram, as exemplified value of 5.0−10 and restart probability of 0.5 for Geolife and 0.1 in Figure 3b. The weight for the behaviors with different time for Shenzhen data sets, and three time difference intervals were intervals was determined according to the function. These func- used for temporal dependency. The obtained precision and recall tions were obtained considering behaviors of the closest three values of the three recommendation methods are presented in time intervals, as it yielded the best accuracy gain in the recom- Table 2. mendation. The results presented in Table 2 show the difference The obtained results show that BR2 outperforms the other in precision and recall of BR2 when temporal dependency was methods in all accuracy measurements. For Geolife data set, BR2 not considered. performs on average 38.9% better than the shortest distance route As seen in the results, the overall accuracy of the model consid- method and 8.5% better than MaP2R. Similarly, for Shenzhen data erably decreased when temporal dependency was not considered, set, BR2 shows an average enhancement of 38.6% and 11.5%. The negatively impacting the accuracy, on average, by 5.6% and 5.1% better performance of BR2 is due to a more effective estimation for Geolife and Shenzhen data sets, respectively. The impact of of users’ travel behaviors using RWR and to the consideration of temporal dependency depends on the behavior distribution and temporal dependency between travel behaviors. the amount of similar behaviors considered in the recommenda- tion process. 4.2 Performance Based on Behavior Estimation 5 CONCLUSION AND FUTURE WORK The effective estimation of users’ travel behaviors is critical for In this study, we proposed the Behavior-based Route Recom- personalized route recommendation. In this experiment, we com- mendation (BR2) to provide personalized routes based on users’ pare the route recommendation performance of ORWR against preferences. The methodology uses the appearance and transition two popular CF methods, i.e. the item-based CF (represented behaviors to capture users’ travel preferences, adopts the ORWR as IBF) and MF, in the estimation of the frequencies of missing to estimate missing behavior frequencies, evaluates temporal behaviors. We do not consider the temporal dependency in the dependency through a time difference interval histogram, and experiment. searches for the route with maximum probability on the behav- In this analysis, the number of latent factors in matrix factor- ior graph. Experiments show the effectiveness of the proposed ization was set as 30, the Laplace smoothing parameter as 1.0−5 method. The lack of data of users with few behaviors was alleviated Conference on Advances in Geographic Information Systems (GIS ’09). ACM, with temporal dependency but the problem with routes without New York, NY, USA, 336–343. https://doi . org/10 . 1145/1653771 . 1653818 [18] H. Park, J. Jung, and U. Kang. 2017. A comparative study of matrix fac- any previous data was not focused in this research. It is believed torization and random walk with restart in recommender systems. In 2017 that with the inclusion of spatial correlation and other data such IEEE International Conference on Big Data (Big Data). 756–765. https: //doi . org/10 . 1109/BigData . 2017 . 8257991 as road type, number of lanes, speed limit, and traffic density can [19] Seyyed Mohammadreza Rahimi, Rodrigo Augusto de Oliveira e Silva, Behrouz lead to a more concise and robust recommendation. Future work Far, and Xin Wang. 2019. Optimized Random Walk with Restart for Recom- also includes the comparison of the optimized model with other mendation Systems. In Advances in Artificial Intelligence, Marie-Jean Meurs and Frank Rudzicz (Eds.). Springer International Publishing, Cham, 320–332. personalized route recommendation methods that may provide a [20] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item- treasure trove of improvements to be incorporated to the model. based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web (WWW ’01). ACM, New York, NY, USA, 285–295. https://doi . org/10 . 1145/371920 . 372071 Acknowledgments. The research is supported by the Natural [21] A. M. Tawfik, H. A. Rakha, and S. D. Miller. 2010. Driver route choice behav- Sciences and Engineering Research Council of Canada Discovery ior: Experiences, perceptions, and choices. In 2010 IEEE Intelligent Vehicles Grant to Xin Wang. Symposium. 1195–1200. https://doi . org/10 . 1109/IVS . 2010 . 5547968 [22] Hanghang Tong, Christos Faloutsos, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast Random Walk with Restart and Its Applications. In Proceedings of the Sixth REFERENCES International Conference on Data Mining (ICDM ’06). IEEE Computer Society, Washington, DC, USA, 613–622. https://doi . org/10 . 1109/ICDM . 2006 . 70 [1] P. Campigotto, C. Rudloff, M. Leodolter, and D. Bauer. 2017. Personalized and [23] Jingyuan Wang, Ning Wu, Wayne Xin Zhao, Fanzhang Peng, and Xin Lin. 2019. Situation-Aware Multimodal Route Recommendations: The FAVOUR Algo- Empowering A* Search Algorithms with Neural Networks for Personalized rithm. IEEE Transactions on Intelligent Transportation Systems 18, 1 (Jan 2017), Route Recommendation. In Proceedings of the 25th ACM SIGKDD International 92–102. https://doi . org/10 . 1109/TITS . 2016 . 2565643 Conference on Knowledge Discovery & Data Mining (KDD ’19). ACM, New York, [2] Z. Chen, H. T. Shen, and X. Zhou. 2011. Discovering popular routes from NY, USA, 539–547. https://doi . org/10 . 1145/3292500 . 3330824 trajectories. In 2011 IEEE 27th International Conference on Data Engineering. [24] Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. 2012. Construct- 900–911. https://doi . org/10 . 1109/ICDE . 2011 . 5767890 ing Popular Routes from Uncertain Trajectories. In Proceedings of the [3] Ge Cui, Jun Luo, and Xin Wang. 2018. Personalized travel route 18th ACM SIGKDD International Conference on Knowledge Discovery and recommendation using collaborative filtering based on GPS trajectories. Data Mining (KDD ’12). ACM, New York, NY, USA, 195–203. https: International Journal of Digital Earth 11, 3 (2018), 284–307. http: //doi . org/10 . 1145/2339530 . 2339562 //www . tandfonline . com/doi/abs/10 . 1080/17538947 . 2017 . 1326535 [25] S. Wei, N. Ye, S. Zhang, X. Huang, and J. Zhu. 2012. Item-Based Collaborative [4] Ge Cui and Xin Wang. 2017. MaP2R: A Personalized Maximum Probability Filtering Recommendation Algorithm Combining Item Category with Inter- Route Recommendation Method Using GPS Trajectories. In Advances in Knowl- estingness Measure. In 2012 International Conference on Computer Science and edge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao, Service System. 2038–2041. https://doi . org/10 . 1109/CSSS . 2012 . 507 Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.). Springer International [26] P. Yawalkar and S. Ranu. 2019. Route Recommendations on Road Net- Publishing, Cham, 168–180. works for Arbitrary User Preference Functions. In 2019 IEEE 35th In- [5] J. Dai, B. Yang, C. Guo, and Z. Ding. 2015. Personalized route recommendation ternational Conference on Data Engineering (ICDE). 602–613. https: using big trajectory data. In 2015 IEEE 31st International Conference on Data //doi . org/10 . 1109/ICDE . 2019 . 00060 Engineering. 543–554. https://doi . org/10 . 1109/ICDE . 2015 . 7113313 [27] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thal- [6] Siyuan Du, Hua Zhang, Hualin Xu, Jirui Yang, and Oscar Tu. 2019. To make the mann. 2013. Time-aware Point-of-interest Recommendation. In Proceedings travel healthier: a new tourism personalized route recommendation algorithm. of the 36th International ACM SIGIR Conference on Research and Develop- Journal of Ambient Intelligence and Humanized Computing 10, 9 (01 Sep 2019), ment in Information Retrieval (SIGIR ’13). ACM, New York, NY, USA, 363–372. 3551–3562. https://doi . org/10 . 1007/s12652-018-1081-z https://doi . org/10 . 1145/2484028 . 2484030 [7] Stefan Funke and Sabine Storandt. 2015. Personalized route planning in road [28] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. Under- networks. In Proceedings of the 23rd SIGSPATIAL International Conference on standing Mobility Based on GPS Data. In Proceedings of the 10th International advances in geographic information systems (SIGSPATIAL ’15), Vol. 03-06-. ACM, Conference on Ubiquitous Computing (UbiComp ’08). ACM, New York, NY, USA, 1–10. 312–321. https://doi . org/10 . 1145/1409635 . 1409677 [8] Lorraine Mc Ginty and Barry Smyth. 2001. Collaborative Case-Based Reason- [29] Yu Zheng, Xing Xie, and Wei-Ying Ma. 2010. GeoLife: A Collaborative Social ing: Applications in Personalised Route Planning. In Case-Based Reasoning Networking Service among User, location and trajectory. IEEE Data(base) Research and Development, David W. Aha and Ian Watson (Eds.). Springer Engineering Bulletin (June 2010). https://www . microsoft . com/en- Berlin Heidelberg, Berlin, Heidelberg, 362–376. us/research/publication/geolife-a-collaborative-social-networking- [9] Gang Hu, Yi Qin, and Jie Shao. 2018. Personalized travel route recommendation service-among-user-location-and-trajectory/ from multi-source social media data. Multimedia Tools and Applications (18 [30] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. Mining Interesting Oct 2018). https://doi . org/10 . 1007/s11042-018-6776-9 Locations and Travel Sequences from GPS Trajectories. In Proceedings of the [10] Y. Koren, R. Bell, and C. Volinsky. 2009. Matrix Factorization Techniques 18th International Conference on World Wide Web (WWW ’09). ACM, New for Recommender Systems. Computer 42, 8 (Aug 2009), 30–37. https: York, NY, USA, 791–800. https://doi . org/10 . 1145/1526709 . 1526816 //doi . org/10 . 1109/MC . 2009 . 263 [31] X. Zhu, R. Hao, H. Chi, and X. Du. 2017. FineRoute: Personalized and [11] H. Kriegel, M. Renz, and M. Schubert. 2010. Route skyline queries: Time-Aware Route Recommendation Based on Check-Ins. IEEE Trans- A multi-preference path planning approach. In 2010 IEEE 26th Interna- actions on Vehicular Technology 66, 11 (Nov 2017), 10461–10469. https: tional Conference on Data Engineering (ICDE 2010). 261–272. https: //doi . org/10 . 1109/TVT . 2017 . 2764999 //doi . org/10 . 1109/ICDE . 2010 . 5447845 [12] Julia Letchner, John Krumm, and Eric Horvitz. 2006. Trip Router with Individualized Preferences (TRIP): Incorporating Personalization into Route Planning. https://www . microsoft . com/en-us/research/publication/trip- router-individualized-preferences-trip-incorporating-personalization- route-planning/ [13] Long Liu, Jin Xu, Stephen Shaoyi Liao, and Huaping Chen. 2014. A real-time personalized route recommendation system for self-drive tourists based on vehicle to vehicle communication. Expert Systems With Applications 41, 7 (2014), 3409–3417. [14] B. Lv, W. Yu, L. Wang, and J.A. McCann. 2014. Efficient processing node proximity via random walk with restart. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8709, 542–549. [15] Lorraine McGinty and Barry Smyth. 2000. Personalised Route Planning: A Case-Based Approach. In Advances in Case-Based Reasoning, Enrico Blanzieri and Luigi Portinale (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 431–443. [16] S. Nadi and M.R. Delavar. 2011. Multi-criteria, personalized route planning using quantifier-guided ordered weighted averaging operators. International Journal of Applied Earth Observation and Geoinformation 13, 3 (2011), 322 – 335. https://doi . org/10 . 1016/j . jag . 2011 . 01 . 003 [17] Paul Newson and John Krumm. 2009. Hidden Markov Map Matching Through Noise and Sparseness. In Proceedings of the 17th ACM SIGSPATIAL International