BR2: A Travel Behavioral Approach to Personalized Route
            Recommendation Based on GPS Trajectories
             Rodrigo Augusto de Oliveira e Silva                                                                 Ge Cui
                           University of Calgary                                                        University of Calgary
                          Calgary, Alberta, Canada                                                     Calgary, Alberta, Canada
                           radeoliv@ucalgary.ca                                                           cuig@ucalgary.ca

                 Seyyed Mohammadreza Rahimi                                                                    Xin Wang
                            University of Calgary                                                       University of Calgary
                          Calgary, Alberta, Canada                                                     Calgary, Alberta, Canada
                           smrahimi@ucalgary.ca                                                         xcwang@ucalgary.ca

ABSTRACT                                                                               travel behaviors reflected in historical GPS trajectories are the
Various unknown circumstances may affect users’ travel behav-                          implicit feedback of their preferences, as they do not explicitly
iors between two locations on the road network, hence it is                            rate preferred routes. Since it is laborious to use surveys and
complicated to provide satisfactory personalized route recom-                          questionnaires to acquire the explicit feedback about traveling
mendations. In this paper, we believe that users’ travel behaviors                     preferences from individual users, users’ travel behaviors are
are reflected and can be learned from their historical GPS trajec-                     captured based on implicit feedback obtained from their past
tories. The Behavior-based Route Recommendation (BR2) method                           travels.
is proposed to compute personalized routes based exclusively on                           The consideration of users’ preferences in route recommen-
users’ travel preferences. The concepts of appearance and transi-                      dation can be problematic when suggesting routes to untrav-
tion behaviors are used to describe users’ travel behaviors. The                       eled locations. To alleviate this issue, some studies have been
behaviors are extracted from users’ past travels and the missing                       conducted in personalized route recommendation based on col-
behaviors, of unvisited locations, are estimated with the Opti-                        laborative filtering (CF) methods, e.g. item-based CF and matrix
mized Random Walk with Restart technique. Then, the temporal                           factorization (MF), to acquire more information through users
dependency of travel behaviors is considered by constructing a                         that share similar preferences [3, 4, 8]. The use of CF has been
time difference interval histogram. Last, a behavior graph is gen-                     proven to provide more accurate recommendations compared
erated to allow the maximum probability route computation with                         to the shortest distance route method, but they usually require
the shortest path algorithm, resulting in the most likely route                        quadratic space and considerable time to process [3, 22]. In this
to be taken by a user. Experiments conducted on two real GPS                           context, it is a complex and computational expensive task due to
trajectory data sets demonstrate the efficiency and effectiveness                      the huge amount of data needed to describe user travel behaviors
of the proposed method.                                                                and the overall sparseness of known data. The rapid and accurate
                                                                                       estimation of missing users’ travel behaviors is crucial to support
                                                                                       personalized route recommendation systems.
1    INTRODUCTION                                                                         Time is another essential factor for the effective comprehen-
With advances in Global Positioning System (GPS) technology                            sion of users’ travel behaviors, as users’ behaviors tend to be
and the popularity of mobile devices, massive amounts of hu-                           similar during specific intervals of the day. For instance, a user
man movement data in GPS trajectories have been collected and                          that has a behavior of taking a route between an origin and des-
are available for research, which provides an alternative way to                       tination at noon might have a different behavior at midnight, but
explore travel route recommendation. Route recommendation                              the same behavior is expected to be shown at times close to noon.
systems usually suggest routes based on the optimization of cost                       To accurately represent temporal dependency and to effectively
functions of either distance or travelling time. However, it has                       incorporate it in the recommendation process are challenges that
been observed that in the real world the shortest or quickest                          need to be addressed.
routes are often not taken [5]. A personalized route recommenda-                          In this study, we propose a personalized Behavior-based Route
tion system, on the other hand, provides route recommendations                         Recommendation method, named BR2, to extract users’ travel
based on users’ travel preferences [8]. For instance, some drivers                     behaviors and compute the route with maximum probability,
want to reach the destination as fast as possible, while others are                    which refers to the route that users are most likely to take based
willing to take a slightly longer route that does not go through                       on their previous travels. In BR2, the implicit information of users’
highways or busy roads. The identification of all factors which                        travel behaviors is extracted from their historical movements.
may affect users’ travel behaviors is still very challenging and an                    Compared with popular CF methods, the random walk with
attempt to acquire as much as these aspects as possible may prove                      restart (RWR) has been proven to provide better estimations
to be infeasible, as distinct factors affect each person particularly                  when considering users’ implicit feedback [18]. Therefore, we
[21].                                                                                  utilize a RWR approach to estimate users’ missing behaviors and
   The proper representation of GPS readings allows the analysis                       investigate optimizations to better train the model. In addition, we
of the relationship between users and their behaviors. Users’                          make use of temporal dependency within users’ travel behaviors
                                                                                       to enhance the accuracy of the recommendation. In summary,
Copyright © 2020 for this paper by its author(s). Published in the Workshop Proceed-   the contributions of this study are:
ings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen,
Denmark) on CEUR-WS.org. Use permitted under Creative Commons License At-
tribution 4.0 International (CC BY 4.0).
    • The missing behavior frequencies for each user on spe-          edges connecting the user-item pair. The missing ratings are esti-
      cific road segments are estimated by using the RWR tech-        mated by traversing the graph according to the weights and the
      nique, as it yields more accurate estimations due to implicit   restart probability until convergence.
      data nature. We utilize the Optimized Random Walk with
      Restart (ORWR) to improve the efficiency of the behavior        2.2    General Route Recommendation
      estimation.
                                                                      General route recommendation aims to provide a route between
    • The data sparsity problem is mitigated with the considera-
                                                                      two points in a road network, an origin and a destination, based
      tion of temporal dependency. A time difference histogram
                                                                      on a given cost function. Researches have been conducted to
      is computed based on time interval differences from GPS
                                                                      study human movement and provide route recommendations
      historical data, which weights users’ preferences at similar
                                                                      based on historical GPS trajectories.
      time accordingly and provide more data for the recom-
                                                                         Chen et al. [2] study the most popular route based on users’
      mendation.
                                                                      traveling behaviors. A popularity indicator is used to discover the
    • Experiments are conducted on two real GPS trajectory data
                                                                      frequent routes in a network in order to assist users when they
      sets in China, Geolife from Beijing and taxi GPS trajecto-
                                                                      travel to an unfamiliar area. Wei et al. [24] propose an algorithm
      ries from Shenzhen. The results indicate that the proposed
                                                                      to construct popular routes from historical trajectories in regions
      approach BR2 outperforms the baseline methods.
                                                                      of the road network.
                                                                         These studies contribute to the area of route planning and
   In the remainder of this paper, Section 2 outlines the related     travel recommendation, but neither of them is truly personalized,
works regarding CF techniques and route recommendation. Sec-          as they do not study users’ personal route preferences on the
tion 3 presents the personalized route recommendation method-         road network.
ology and explains the proposed method step by step. Section 4
discusses the experiment results. Finally, Section 5 exposes the
conclusion and further work of this research.
                                                                      2.3    Personalized Route Recommendation
                                                                      Differently from general route recommendation, personalized
                                                                      route recommendation considers users’ preferences in the defini-
2 RELATED WORK                                                        tion or calculation of a cost function. While some studies explore
                                                                      users’ preferences through explicit feedback, collecting informa-
2.1 Collaborative Filtering Techniques
                                                                      tion by directly querying users, others assume that preference
Recommendation systems are popular for providing predictions          factors cannot be modelled entirely and that users might not be
that interest people in various scenarios. One of the approaches      fully aware of their own preferences. Our study is focused on the
of recommendation systems is given by CF, which considers             latter assumption.
information of preference from multiple users to predict missing         Several researches explore personalized route recommenda-
ratings of users on items, and some of its popular methods are        tion through the optimization of the cost function based on mul-
the item-based CF, MF, and RWR.                                       tiple criteria either weighted directly by users or defined by their
   The item-based CF is one of the most simplistic methods which      driving preferences information [1, 5, 7, 11, 16, 26]. Users’ pref-
considers the similarities between items to calculate values for      erences are explicitly collected, modeled, and/or used in the cal-
others without a user rating [25]. Since every item needs to be       culation of candidate routes, which might not reflect their true
compared to compute the similarity score, in the worst-case sce-      behaviors and are invariant to time.
nario, the total number of evaluations is equal to the combination       Another approach is focused on providing personalized tourism
of two for all the elements in the data set. The algorithm does       route recommendation based on social media data. Studies have
not scale well for large data sets, involves intensive processing,    used users’ data to model route attributes [9], build data set of
and requires O(n2 ) space to store the similarities between n items   popular locations [6], and even mine their preferences and tem-
[20].                                                                 poral information [31]. Nevertheless, these approaches focus on
   Among the latent factor models, MF is one of the most used         movement between points of interest or visiting locations, disre-
methods for recommendation problems [10]. The objective of            garding the evaluation of users’ preferences directly on the road
this technique is to represent rating data into two vectors of        network.
latent factors, in which the dimensionality may vary based on            Personalized route recommendation based on historical GPS
the data itself, so that the dot product of the two vectors result    trajectories is another branch of research. McGinty and Smyth
in approximate values of known ratings. The values of the la-         [15] proposed a personalized route planning method that consid-
tent factor vectors can be learned from the known data by the         ers historical trajectories to derive implicit driving preferences
minimization of regularized squared error with the use of the         without defining a preference model. The method combines and
stochastic gradient descent algorithm.                                reuses routes sections from previous travels to generate new
   Random walk is a process that describes the probabilities of       routes, but it fails to recommend routes to unfamiliar areas of the
series of random movements in a dimension space. In recom-            road network. As an extension to their previous work, McGinty
mender systems, one of the techniques that considers personal         and Smyth [8] used a type of distributed case-based reasoning
preferences is the RWR. While the usual approach of random            strategy, in which historical trajectories of similar drivers are
walk traverses a graph only based on its structure, leading to        borrowed to recommend routes in unfamiliar map territories.
an exclusive convergence, the RWR makes uses of a probability         However, considering the behaviors of similar drivers by directly
of returning to the original node on each movement. Therefore,        using their trajectories in the recommendation might not pre-
the technique allows personalization by having lower ranking          cisely reflect the preferences of a driver.
values given to nodes farther from the origin node [14]. RWR             Letchner et al. [12] introduced a method of personalized route
is graph-based, having users and items as nodes and ratings as        planning by considering users’ historical trajectories, extracted
from GPS readings, in the calculation of an inefficiency ratio that              a user’s movements. For a given user u and an associated ap-
represents the proportion of time extended in a trip compared to                 pearance behavior b, the frequency that user u has behavior b is
the shortest possible time. Liu et al. [13] explored personalized                represented by f rq(u, b).
route recommendation for self-drive tourists not only consid-                        Definition 6 - Transition behavior. A transition behavior
ering the drivers’ visiting preferences, but also real-time traffic              tb represents the relationship of two sequential appearance be-
information. In these approaches, the authors define a metric to                 haviors bi = (ei , t) and b j = (e j , t) at the same time interval
represent users’ implicit preferences but include other factors                  t, such that ei .end = e j .start. It is denoted as an ordered tuple
like distance and travel time as part of the objective function.                 tb = (bi → b j ) or tb(i→j) in short. Similarly, the frequency that
   Cui et al. [3] extracted users’ travel behaviors from their histor-           user u has transition behavior tb is given by f rq(u, tb).
ical GPS trajectories to represent their preferences. By predicting                  Both appearance and transition behaviors contribute to cap-
missing travel behaviors with CF technique, a route with maxi-                   ture the implicit travel preferences of each user. This implicit
mum probability is computed for specific users. In addition, Cui                 travel preference information is essential in solving the problem
and Wang [4] proposed a different representation of travel be-                   of the personalized route recommendation, as the recommenda-
havior and improved the performance of route recommendation.                     tions need to truly reflect users’ priorities.
However, the methodologies disregard the implicit data nature
and temporal dependency is not fully explored.
   A different approach is proposed by Wang et al. [23], in which
                                                                                 3.2    Framework Overview
neural networks are used to learn the optimal cost functions                     The framework of the proposed BR2 is illustrated in Figure 1. The
of the A* algorithm. The presented results show considerable                     first step of BR2 is data preprocessing. The GPS trajectories are
improvement in precision, recall, and F1-score, but the training                 split into trips with defined origins and destinations, the trips
time might impede its usage in real-world applications with new                  are matched to the road network by applying the map matching
data constantly being fed into the system.                                       technique, and the outliers are removed from the trajectories.
                                                                                 As a final step of the preprocessing component, the appearance
3     BEHAVIOR-BASED ROUTE                                                       and transition behaviors are generated from the historical users’
                                                                                 routes. Since users, in general, travel on few routes daily, cov-
      RECOMMENDATION
                                                                                 ering a limited number of road segments in a study area, the
This section discusses the proposed method of the Behavior-                      missing travel behaviors for each user need to be estimated. In
based Route Recommendation (BR2). The following preliminaries                    the second step, the RWR technique is used to estimate users’
are first defined for this research.                                             appearance and transition behaviors on each untraveled road
                                                                                 segment. With the missing behaviors frequencies estimated, the
3.1     Preliminaries                                                            temporal dependency is evaluated in the data set by building a
The preliminaries of this study are defined as follows.                          time interval difference histogram. The histogram indicates how
    Definition 1 - Road network. The road network is a graph                     the data is distributed and suggests the number of intervals that
G = (V , E) composed by a set of vertices V and edges E. A vertex                should be considered in the route recommendation process. Then,
v ∈ V represents the boundary of road segments and an edge                       the probabilities are calculated from the travel behaviors for a
e ∈ E represents a road segment, containing starting and end-                    defined origin and destination considering the Markov property.
ing vertices, denoted as e.start and e.end, respectively, where                  In addition, the Laplace smoothing method is applied to estimate
e.start ∈ V and e.end ∈ V .                                                      the probability of users’ missing travel behaviors for the road
    Definition 2 - GPS reading. A GPS-reading is a 3-tuple p =                   segments that have never been visited previously by any user.
(t, lat, lnд) in which t represents a timestamp, and lat and lnд                    Finally, the last stage is the recommendation of the route with
are the latitude and longitude of the location of the GPS-reading.               maximum travel behavior probability. To facilitate the route com-
    Definition 3 - GPS trajectory. A GPS trajectory tr j = (p1,                  putation, a behavior graph is constructed to represent the travel
p2, p3, · · · , pk ) consists of a sequence of GPS-readings, such that           behaviors and the relationship among them. Dijkstra’s algorithm
pi .t − p(i−1) .t > 0 and 1 < i ≤ k.                                             is used in the behavior graph to find the route maximum travel
    Definition 4 - Route. Given a road network G = (V , E), a                    behavior probability.
route R starting from vertex vi and ending at vertex v j is a se-
quence of connected road segments R = (vi , e 1, e 2, e 3, · · · , el , v j ),
where vi , v j ∈ V and ei ∈ E; ei is the i-th road segment in R,
                                                                                 3.3    Data Preprocessing
ei , e j if i , j, e 1 .start = vi , and el .end = v j .                         The preprocessing can be divided into three parts. First, since
    Two types of behaviors are proposed to extract the user’s                    a GPS trajectory usually tracks users for a long period of time
movement behaviors on the road network. First, to provide a                      and contains multiple trips of the users, trajectory segmenta-
global overview of how frequently users are in specific locations                tion is first applied to divide the raw trajectory into several
at a certain time on the road network, the concept of appearance                 sub-trajectories [3]. After the trajectory segmentation, each GPS
behavior is defined to represent the relation between location                   trajectory corresponds to a single travel route. Second, since
and time of users’ movement. The second, transition behavior,                    the trajectory is usually noisy due to the urban canyon or mea-
captures the sequential relation of appearance behaviors, not                    surement errors, outliers in GPS trajectories are detected and
only giving a sense of location regularity but most importantly                  removed. In this study, if the distance between a GPS point and its
of direction. Both behaviors are formally described as follows.                  nearest road segment exceeds a threshold of 180 seconds, the GPS
    Definition 5 - Appearance behavior. An appearance behav-                     point is considered as an outlier. Besides, given the maximum
ior is defined as tuple of the road segment and the time, denoted                moving speed, if the distance that a GPS point moves exceeds a
as b = (e, t), where e is an edge of the road network and t is a                 threshold within the time interval from its previous GPS point, it
time interval of a day, and it describes the location and time of                is also taken as outlier. Lastly, GPS trajectories are mapped onto
                                   Preprocessing
                                                                      Travel behavior calculation

                                 GPS                 Road
                              trajectory            network                                                    Route recommendation
                                                                                 Travel behavior
                                                                              frequency estimation
                                                                                     (RWR)
                                     Map matching                                                                       Behavior graph
                                    & segmentation                                                                        generation
                                                                                    Temporal
                                                                                  dependency
                                                                                   processing
                                       Historical                                                                        Route search
                                        routes
                                                                              Behavior probability
                                                                                  estimation
                                        Travel
                                       behaviors


                                Figure 1: Behavior-based Route Recommendation framework overview.

                                                                                   Users                   Behaviors                                   Users     Behaviors
the road network to obtain users’ historical routes with the map
matching method proposed in [17].                                                    u1          w11          b1


                                                                                                                                         Users
                                                                                                                                                         1          2
                                                                                                 w1r
                                                                                                                                                                   UBn×m
3.4     Behavior Frequency Estimation                                                u2
                                                                                                w2(r+1)       br


                                                                                                                                         Behaviors
Each user in general covers a very limited number of road seg-                       u3          w3r
                                                                                                              tb1                                        3          4
ments in a city, so the missing travel behaviors for each user
                                                                                                 wn(r+1)                                                UBn×mT
need to be estimated. In this section, we discuss how the RWR is
applied in the estimation and provide the motivation for the use                     un          wnm          tbq
of ORWR.
                                                                                          (a) Bipartite graph                                        (b) Adjacency matrix
   3.4.1 Random Walk with Restart for Behavior Estimation. In
this study, the appearance and transition behaviors are first ex-                           Figure 2: User-behavior graph representations.
tracted from the historical users’ trajectories and a user-behavior
matrix U Bn×m is built. In user-behavior matrix, rows represent
users U = (u 1, u 2, · · · , un ), columns represent both appearance               is a m × m matrix (behaviors by behaviors) composed similarities
and transition behaviors B = (b1, b2, · · · , br , tb1, tb2, · · · , tbq ),        between behaviors. This structure is commonly unoptimized due
where m = r + q, and the element U Bi,j of user i and behavior                     to the exceedingly higher number of behaviors compared to the
j represents the frequency of the pair (ui , b j ) or (ui , tb j ). The            number of users in a data set.
user-behavior matrix can also be represented as a user-behavior                        The adjacency matrix and restart probability c are provided
graph, in which the nodes (users and behaviors) are connected                      as inputs to the RWR algorithm. The score matrix r , is calculated
through edges with weights as the corresponding frequency of                       by:
the user-behavior pair, as illustrated in Figure 2a.                                                      "               #
   The RWR technique tackles the problem based on a graph and                                                  0   U B n×m (t )
                                                                                       r (t +1) = (1 − c)     T             r + c I (n+m)×(n+m)
can approach it with an adjacency matrix representation. Consid-                                           U B n×m     0
ering the structure of the user-behavior graph shown in Figure
2a, the idea of RWR is to traverse the graph by either moving to                   where U B n×m represents the row-normalized user-behavior ma-
a neighbor node or going back to an initial node based on a given                           T
                                                                                   trix, U B nxm represents the transposed row-normalized user-
restart probability value. Therefore, a behavior node is highly                    behavior matrix, r (t ) is the score matrix of t-th iteration – initially
related to a user when it is visited multiple times. A behavior                    set as an identity matrix – and I (n+m)×(n+m) is the identity matrix
node associated with a higher score value represents the higher                    with (n + m) rows and columns. The score matrix is calculated
probability of being visited from a user node when the graph is                    iteratively until convergence, in which the difference between
traversed. The score values are represented as the weights of the                  the new and previous scores is smaller than a defined threshold
edges in the graph. Overall, the RWR estimates the probability                     ε.
values of all edges in the user-behavior graph by incrementally
                                                                                                                    |r (t +1) − r (t ) | < ε
updating the user-behavior probability values based on past be-
haviors of the user and the behaviors of the similar behaving                      After convergence, the RWR algorithm returns the score matrix,
user. Since users and behaviors are interpreted as nodes in an                     which contains the normalized probability values associated with
undirected graph, the adjacency matrix of the graph is generated                   the previously unknown behavior frequencies.
with the combination of both users and behaviors, resulting in a                      Considering the total number of users n and behaviors m,
large and sparse symmetric matrix, illustrated in Figure 2b. The                   the adjacency matrix representation contains (n + m)2 elements.
adjacency matrix has (n + m) rows and columns, composed by                         Consequently, the time complexity of the algorithm is defined by
four distinct parts: part 1 is a n × n matrix (users by users) com-                the total number of iterations for convergence k and the matrices
posed by similarities between users, part 2 is the user-behavior                   multiplication in every iteration, resulting in a time complexity
matrix, part 3 is the transposed user-behavior matrix, and part 4                  of O(k(n + m)3 ).
    3.4.2 Optimized Random Walk with Restart for Behavior Es-                                        1                                                                 1.2
                                                                                                                                                                                                      Exponential Trendline
                                                                                                                                                                                                      Normalized Frequency

timation. The common RWR algorithm approach makes use of                                            0.8
                                                                                                                                                                        1


                                                                                                                                                Normalized Frequency
                                                                             Normalized Frequency
an adjacency matrix as input, processed for as many iterations                                      0.6
                                                                                                                                                                       0.8


as needed until the process converges. However, to fit the data                                     0.4
                                                                                                                                                                       0.6


                                                                                                                                                                       0.4
into an adjacency matrix representation, both sets of users and
                                                                                                    0.2
                                                                                                                                                                       0.2
behaviors need to be combined in rows and columns to form a
                                                                                                     0                                                                  0
symmetric matrix, as shown in Figure 2b.                                                                  -10    -5        0         5
                                                                                                                Time Difference Interval
                                                                                                                                           10                                0   1              2
                                                                                                                                                                                     Time Difference Interval
                                                                                                                                                                                                                3             4


    This representation could be useful if similarity measures be-
                                                                                                                       (a)                                                                      (b)
tween users and between behaviors are available, describing the
inner relations between themselves, as the data could be used to
fill parts 1 and 4 of the adjacency matrix presented in Figure 2b.          Figure 3: (a) Normalized time difference histogram. (b)
Nevertheless, since we do not explore the similarities between              Five first intervals of time difference histogram approxi-
users and between behaviors, the matrix representation is un-               mated by an exponential function (a = 1.0778, b = −0.58).
questionably unoptimized and leads to unnecessary processing
overhead, as the total number of elements in the matrix is ex-
                                                                            following equation:
ceedingly higher than the actual data in the user-behavior matrix.
For instance, for n users and m behaviors the total number of
                                                                                                              !                        !
                                                                                           Ír      f rq(bi )     Íq        f rq(tbi )
                                                                                                                + i=1                    |k | = 0
                                                                                          
elements in the adjacency matrix is (n +m)2 while the actual data
                                                                                          
                                                                                           i=1
                                                                                          
                                                                                          
                                                                                                        2                        2
consists of the part 2 in Figure 2b, containing (n ∗ m) elements.             f req(k) = Ír Ír
                                                                                          
                                                                                          
                                                                                                           f rq(bi ) f rq(b j ) +          i , j and
    To better handle the estimation of missing behavior frequen-                           Íqi=1 Íqj=1
                                                                                          
                                                                                          
                                                                                          
cies we use the ORWR algorithm, which considers the user-
                                                                                          
                                                                                           i=1 j=1
                                                                                                          f rq(tb i ) f rq(tb  j )            |k | , 0
behavior bipartite graph represented by the user-behavior matrix
                                                                                Since we evaluate all behaviors against themselves, only the
only [19]. Using the user-behavior matrix instead of the entire
                                                                            absolute time difference intervals are calculated and equally di-
adjacency matrix has a strong impact in performance, resulting
                                                                            vided for their correspondent interval. For instance, frequencies
in a time complexity reduction from O(k(n + m)3 ) to O(knm 2 ).
                                                                            of time difference intervals of absolute value of one hour are
                                                                            calculated and divided in half for the intervals of -1 and 1. Finally,
3.5    Temporal Dependency                                                  the obtained frequencies are normalized in the range from zero
Time is one of the important factors that influence users’ actions          to one. For example, Figure 3a illustrates a histogram obtained
throughout the day, as people are more prone to go to different             from a real-world data set Geolife by the computation of time
places during specific times [27]. In addition, it is possible to           difference of behaviors of all driving travelers.
identify that people present similar travel behaviors at closer                 As expected, most of the frequencies are concentrated in the
time intervals. For instance, many people go to work from home              first few intervals close to the zero interval, showing that most
by roughly the same route at similar times in the morning and               behaviors happen at similar time intervals. As an attempt to avoid
it is not expected for them to show this movement behavior                  overfitting and to reduce the noise from the data, a function can be
during the evening, many hours apart from their common travel               used to represent the values with most importance, closer to the
behavior. Therefore, the temporal information should also be                zero interval. In this study, an exponential function f (x) = ae bx
considered when providing personalized route recommendations.               is used to represent the data while conveying the notion of decay
    An appropriate strategy to predict user’s behaviors by con-             as the time interval is farther from the evaluated behavior. Figure
sidering temporal information is to study the relation between              3b presents how closely an exponential function represents the
behaviors at different time intervals. Behaviors associated to a            first five intervals of the distribution.
given road segment are most likely to be shown on closer time                   The values from the exponential function for each time inter-
intervals. Temporal dependency is implemented in this study                 val allow weighting the behaviors frequencies from time intervals
by considering existing behaviors of the same road segment at               close to the recommendation time interval. In Figure 3b, for in-
similar time intervals in the calculation of travel behaviors’ prob-        stance, the weighting values w1, w2, w3, and w4, regarding the
abilities.                                                                  time difference intervals of 1 to 4, are approximately 0.6, 0.3,
    To identify how behaviors at different time intervals impact            0.2, and 0.1, respectively. The frequencies of similar time inter-
the route recommendation process, a time difference histogram               vals are weighted by multiplying them with the corresponding
is generated by comparing behaviors of all users related to the             value of the exponential function for a time difference inter-
same road segments in pairs. For example, if there are 24 time              val. The weighted frequencies are added to the frequency of the
intervals in total, each time interval representing each hour of the        behavior at the time interval used for recommendation. For ex-
day, the time difference histogram is divided into values ranging           ample, for two appearance behaviors bi and b j of the same road
from -12 to 12, with increments of one, and each interval consists          segment and user u at time intervals t and t + 1 respectively,
of the frequency that behaviors related to the same road segment            if the recommendation is provided at time interval t and the
happened at a specific time difference.                                     correspondent value of the exponential function for a time dif-
    A pair of appearance behaviors bi and b j refer to the same             ference interval of 1 is w1, the total frequency of behavior bi is
road segment if bi .e = b j .e and the correspondent absolute time          f rq(u, bi ) = f rq(u, bi ) + w 1 f rq(u, b j ).
difference interval is defined as |k | = |bi .t − b j .t |. Since transi-
tion behaviors consist of sequential appearance behaviors, the              3.6                             Probability Calculation for Travel
rationale is the same. Given a total number of appearance and                                               Behavior
transition behaviors r and q respectively, the frequency for each           Based on behavior frequencies, probabilities are computed for
time difference interval value k is computed according to the               a user u given a route R at a specific time t. The route with
maximum probability reflects the route that the user is most
                                                                                             v1             v2              v3               b1              b2              b3
inclined to take, and it is preferred above all others. The route                                      e1         e2

probability is defined as:                                                                             b1         b2                              ϑ(b1,b2)        ϑ(b2,b3)

                                                   P(e 1, e 2, e 3, · · · , el , t |u)       e4                             e3       v1                                           v6


                                                                                                                       b3
                                                                                                  b4
  P(R|u, t) = P(e 1, e 2, e 3, · · · , el |u, t) =
                                                               P(t |u)                                 b5         b6
                                                                                                                                                  ϑ(b4,b5)        ϑ(b5,b6)

where e 1, e 2, e 3, · · · , el represent the road network edges, and                        v4        e5   v5    e6        v6               b4              b5              b6
e 1 and el are the incident edges with the origin and destination
vertices, respectively. If the user u and time t are known, P(t |u)
                                                                                                  (a) Road network                          (b) Behavior graph
is constant, thus simplifying the problem to the maximization of
the numerator.
                                                                                         Figure 4: Comparison of road network and behavior graph,
    With the assumption that the series of behavior probabili-
                                                                                         given a trajectory from origin v 1 to destination v 6 .
ties are described as Markov property, the probability of a user
behavior depends on the immediate previous behavior, if not re-
lated to the origin, simplifying the representation of the problem.                      Table 1: Summary of training data used in experiments.
Extending the previous equation, the route probability is given
by:                                                                                                                                                       Data set
                                                                                                                 Features
                                                                                                                                                   Geolife Shenzhen
   P(b1, b2, b3, · · · , bl |u) = P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · ·                       Users                                                    22          274
                                                                  P(tbl −1→l |u)             Behaviors                                              26,123     677,068
where P(b1 |u) is the appearance behavior probability of origin                              Time interval                                         1 hour       1 hour
and P(tbi−1→i |u) is the transition behavior probability, from                               Elements in user-behavior matrix                      574,706 185,516,632
appearance behavior bi−1 to appearance behavior bi .                                         Sparseness in user-behavior matrix                    95.45%       99.63%
   Since traditional pathfinding algorithms search for the path                              Segments in road network                               15,493      29,411
with the minimal weight, the probability function is transformed                             Vertices in road network                               38,485      62,113
in order to shift the problem objective from maximization to
minimization. It is achieved by using the logarithm of inversed
probabilities as follows:                                                                edges. Therefore, a different structure is constructed to enable
                                                                                         graph traverse considering probabilities. It is denominated behav-
                                    1
        L=                                                                               ior graph and its structure is illustrated in Figure 4. The behavior
             P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · · P(tbl −1→l |u)                         graph represents each vertex as an appearance behavior and each
                                       1                                                 edge as a transition behavior. The weight ϑ defines the correspon-
      ln L = ln
                P(b1 |u)P(tb1→2 |u)P(tb2→3 |u) · · · P(tbl −1→l |u)                      dent behavior probability. In addition, the origin and destination
                            l −1                                                         vertices of a given trajectory are included in the structure and
                     1      Õ       1
      ln L = ln           +      ln                                                      their edges represent appearance behaviors, illustrated with a
                  P(b1 |u) i=1 P(tbi→i+1 |u)
                                                                                         different notation from other edges and vertices in the graph.
   If there is no historical data of users’ travels through some                            Despite adding complexity in the process by generating a
road segments, the related behaviors will not exist. To prevent the                      new structure to represent probabilities, the personalized recom-
designation of zero to the probabilities of nonexistent behaviors,                       mended route can be computed as the least log-inverse proba-
the Laplace smoothing method is employed for both appearance                             bility path, which can be solved by any traditional shortest path
and transition behaviors by considering the following:                                   algorithm.
                          fd
                           r q(u,b)+α
                                                         f rq(u, b) > 0                  4        EXPERIMENTAL RESULTS
                       
                                                        d
                       
                            bi ∈S o f r q(u,bi )+α d
                       Í
          P(b |u) =
                             d
                               α                       otherwise                        In this section, the performance of the proposed BR2 is evaluated
                     bi ∈So fd
                              r q(u,bi )+α d
                    
                    Í
                                                                                         through the experiments on two real data sets. The first data set,
                           fd
                            r q(u,tbi →j )+α                                             Geolife [28–30], contains trajectories extracted from drivers in
                                                           f rq(u, tbi→j ) > 0
                       
                       
                                                          d
   P(tbi→j |u) =            k =1 f r q(u,tbi →k )+α d
                        Íd d
                                                                                        the central district of Beijing and the second data set consists of
                                  α                       otherwise                     taxi drivers’ trajectories from Shenzhen. The training data has
                        Íd fd
                        k =1 r q(u,tbi →k )+α d
                       
                                                                                         80% of the trajectories while 20% was used for testing purposes.
                                                                                         Some details of the data sets are presented in Table 1.
                                                                                            Four accuracy measures are considered to evaluate the perfor-
where d f rq represents the estimated behavior frequency, P(b|u)
                                                                                         mance of the model: precision and recall based on the number of
is the probability of appearance behavior b of user u at time t, So
                                                                                         road segments and based on the distance of road segments. The
is the set of appearance behaviors starting at origin o, P(tbi→j |u)
                                                                                         measures definitions are presented as follows.
is the probability of transition behavior tbi→j , α is the smooth-
ing parameter, and d is number of appearance behaviors – the                                                                        # of correct road segments
                                                                                          Precision segments =
behaviors related to the destination road segment, in the case of                                                           # of road segments on recommended route
transition behavior.
                                                                                                                                    # of correct road segments
                                                                                                       Recall segments =
3.7      Route Search Based on Probability                                                                                       # of road segments on true route
The structure of the road map network graph does not allow                                                                       distance of correct road segments
                                                                                                   Precision distance =
the representation of travel behavior probabilities as weights in                                                                 distance of recommended route
Table 2: Recommendation performance of baseline meth-                 Table 3: Recommendation performance with different CF
 ods and BR2 with and without temporal dependency (BR2                methods.
- TD).
                                                                                         Precision             Recall
                    Precision            Recall                                    Segments Distance Segments Distance
              Segments Distance Segments Distance                                                Geolife
                           Geolife                                        IBF        0.507       0.517   0.522        0.549
 SD             0.246       0.266  0.215        0.239                     MF         0.533       0.537   0.543        0.569
 MaP2R          0.533       0.537  0.543        0.569                     ORWR       0.576       0.573   0.562        0.587
 BR2 - TD       0.576       0.573  0.562        0.587                                           Shenzhen
 BR2            0.631       0.637  0.613        0.642                     IBF          -           -       -            -
                          Shenzhen                                        MF         0.555       0.562   0.520        0.562
 SD             0.289       0.311  0.247        0.267                     ORWR       0.628       0.632   0.580        0.613
 MaP2R          0.555       0.562  0.520        0.562
 BR2 - TD       0.628       0.632  0.580        0.613
 BR2            0.688       0.692  0.626        0.654
                                                                      for Geolife data set and 1.0−7 for Shenzhen, the convergence
                                                                      value in ORWR as 5.0−10 , and the restart probability as 0.5 for
                         distance of correct road segments
        Recall distance =                                             Geolife and 0.1 for Shenzhen. The results are shown in Table 3.
                               distance of true route                 Due to the large size of the Shenzhen data set, the experiment
The different aspects of the route recommendation performance         with the item-based CF could not be conducted, as a result of
are assessed by the conduction of experiments. First, we compare      scalability issue [20].
the performance of BR2 with other baseline methods. Second,              As seen in the results, for both data sets, the route recommen-
we evaluate the route recommendation performance based on             dation with the ORWR obtained the highest precision and recall
behavior estimation, considering the ORWR in the proposed BR2         values for the number of road segments and distance compared
method and the previously discussed CF techniques. Last, the          to IBF and MF due to its capability to better handle implicit data.
influence of temporal dependency on the performance of route          In addition, the higher amount of data in Shenzhen justifies a
recommendation is assessed.                                           larger gap in performance between the ORWR and the other
                                                                      methods.
4.1    Overall Recommendation Performance
In this experiment, the recommendation performance of BR2 is          4.3    Temporal Dependency Impact
compared against MaP2R [4] and the shortest distance (SD) route
method. With respect to the parameters associated with MaP2R,         This experiment evaluates the influence of the temporal de-
the number of latent factors in MF was set as 30 and the Laplace      pendency (TD) in the accuracy of recommendations. The ex-
smoothing parameter as 1.0−5 for Geolife data set and 1.0−7 for       ponential functions, y = 1.067e −0.57 |x | for Geolife data set and
Shenzhen. The BR2 used the same values as MaP2R for the Laplace       y = 1.0057e −0.038|x | for Shenzhen data set, were built to repre-
smoothing parameters, the ORWR considered a convergence               sent the generated data distribution histogram, as exemplified
value of 5.0−10 and restart probability of 0.5 for Geolife and 0.1    in Figure 3b. The weight for the behaviors with different time
for Shenzhen data sets, and three time difference intervals were      intervals was determined according to the function. These func-
used for temporal dependency. The obtained precision and recall       tions were obtained considering behaviors of the closest three
values of the three recommendation methods are presented in           time intervals, as it yielded the best accuracy gain in the recom-
Table 2.                                                              mendation. The results presented in Table 2 show the difference
   The obtained results show that BR2 outperforms the other           in precision and recall of BR2 when temporal dependency was
methods in all accuracy measurements. For Geolife data set, BR2       not considered.
performs on average 38.9% better than the shortest distance route        As seen in the results, the overall accuracy of the model consid-
method and 8.5% better than MaP2R. Similarly, for Shenzhen data       erably decreased when temporal dependency was not considered,
set, BR2 shows an average enhancement of 38.6% and 11.5%. The         negatively impacting the accuracy, on average, by 5.6% and 5.1%
better performance of BR2 is due to a more effective estimation       for Geolife and Shenzhen data sets, respectively. The impact of
of users’ travel behaviors using RWR and to the consideration of      temporal dependency depends on the behavior distribution and
temporal dependency between travel behaviors.                         the amount of similar behaviors considered in the recommenda-
                                                                      tion process.
4.2    Performance Based on Behavior
       Estimation                                                     5     CONCLUSION AND FUTURE WORK
The effective estimation of users’ travel behaviors is critical for   In this study, we proposed the Behavior-based Route Recom-
personalized route recommendation. In this experiment, we com-        mendation (BR2) to provide personalized routes based on users’
pare the route recommendation performance of ORWR against             preferences. The methodology uses the appearance and transition
two popular CF methods, i.e. the item-based CF (represented           behaviors to capture users’ travel preferences, adopts the ORWR
as IBF) and MF, in the estimation of the frequencies of missing       to estimate missing behavior frequencies, evaluates temporal
behaviors. We do not consider the temporal dependency in the          dependency through a time difference interval histogram, and
experiment.                                                           searches for the route with maximum probability on the behav-
   In this analysis, the number of latent factors in matrix factor-   ior graph. Experiments show the effectiveness of the proposed
ization was set as 30, the Laplace smoothing parameter as 1.0−5       method.
   The lack of data of users with few behaviors was alleviated                                   Conference on Advances in Geographic Information Systems (GIS ’09). ACM,
with temporal dependency but the problem with routes without                                     New York, NY, USA, 336–343. https://doi . org/10 . 1145/1653771 . 1653818
                                                                                            [18] H. Park, J. Jung, and U. Kang. 2017. A comparative study of matrix fac-
any previous data was not focused in this research. It is believed                               torization and random walk with restart in recommender systems. In 2017
that with the inclusion of spatial correlation and other data such                               IEEE International Conference on Big Data (Big Data). 756–765. https:
                                                                                                 //doi . org/10 . 1109/BigData . 2017 . 8257991
as road type, number of lanes, speed limit, and traffic density can                         [19] Seyyed Mohammadreza Rahimi, Rodrigo Augusto de Oliveira e Silva, Behrouz
lead to a more concise and robust recommendation. Future work                                    Far, and Xin Wang. 2019. Optimized Random Walk with Restart for Recom-
also includes the comparison of the optimized model with other                                   mendation Systems. In Advances in Artificial Intelligence, Marie-Jean Meurs
                                                                                                 and Frank Rudzicz (Eds.). Springer International Publishing, Cham, 320–332.
personalized route recommendation methods that may provide a                                [20] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. 2001. Item-
treasure trove of improvements to be incorporated to the model.                                  based Collaborative Filtering Recommendation Algorithms. In Proceedings of
                                                                                                 the 10th International Conference on World Wide Web (WWW ’01). ACM, New
                                                                                                 York, NY, USA, 285–295. https://doi . org/10 . 1145/371920 . 372071
Acknowledgments. The research is supported by the Natural                                   [21] A. M. Tawfik, H. A. Rakha, and S. D. Miller. 2010. Driver route choice behav-
Sciences and Engineering Research Council of Canada Discovery                                    ior: Experiences, perceptions, and choices. In 2010 IEEE Intelligent Vehicles
Grant to Xin Wang.                                                                               Symposium. 1195–1200. https://doi . org/10 . 1109/IVS . 2010 . 5547968
                                                                                            [22] Hanghang Tong, Christos Faloutsos, Christos Faloutsos, and Jia-Yu Pan. 2006.
                                                                                                 Fast Random Walk with Restart and Its Applications. In Proceedings of the Sixth
REFERENCES                                                                                       International Conference on Data Mining (ICDM ’06). IEEE Computer Society,
                                                                                                 Washington, DC, USA, 613–622. https://doi . org/10 . 1109/ICDM . 2006 . 70
 [1] P. Campigotto, C. Rudloff, M. Leodolter, and D. Bauer. 2017. Personalized and          [23] Jingyuan Wang, Ning Wu, Wayne Xin Zhao, Fanzhang Peng, and Xin Lin. 2019.
     Situation-Aware Multimodal Route Recommendations: The FAVOUR Algo-                          Empowering A* Search Algorithms with Neural Networks for Personalized
     rithm. IEEE Transactions on Intelligent Transportation Systems 18, 1 (Jan 2017),            Route Recommendation. In Proceedings of the 25th ACM SIGKDD International
     92–102. https://doi . org/10 . 1109/TITS . 2016 . 2565643                                   Conference on Knowledge Discovery & Data Mining (KDD ’19). ACM, New York,
 [2] Z. Chen, H. T. Shen, and X. Zhou. 2011. Discovering popular routes from                     NY, USA, 539–547. https://doi . org/10 . 1145/3292500 . 3330824
     trajectories. In 2011 IEEE 27th International Conference on Data Engineering.          [24] Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. 2012.                     Construct-
     900–911. https://doi . org/10 . 1109/ICDE . 2011 . 5767890                                  ing Popular Routes from Uncertain Trajectories. In Proceedings of the
 [3] Ge Cui, Jun Luo, and Xin Wang. 2018.                      Personalized travel route         18th ACM SIGKDD International Conference on Knowledge Discovery and
     recommendation using collaborative filtering based on GPS trajectories.                     Data Mining (KDD ’12). ACM, New York, NY, USA, 195–203.                   https:
     International Journal of Digital Earth 11, 3 (2018), 284–307.                  http:        //doi . org/10 . 1145/2339530 . 2339562
     //www . tandfonline . com/doi/abs/10 . 1080/17538947 . 2017 . 1326535                  [25] S. Wei, N. Ye, S. Zhang, X. Huang, and J. Zhu. 2012. Item-Based Collaborative
 [4] Ge Cui and Xin Wang. 2017. MaP2R: A Personalized Maximum Probability                        Filtering Recommendation Algorithm Combining Item Category with Inter-
     Route Recommendation Method Using GPS Trajectories. In Advances in Knowl-                   estingness Measure. In 2012 International Conference on Computer Science and
     edge Discovery and Data Mining, Jinho Kim, Kyuseok Shim, Longbing Cao,                      Service System. 2038–2041. https://doi . org/10 . 1109/CSSS . 2012 . 507
     Jae-Gil Lee, Xuemin Lin, and Yang-Sae Moon (Eds.). Springer International              [26] P. Yawalkar and S. Ranu. 2019. Route Recommendations on Road Net-
     Publishing, Cham, 168–180.                                                                  works for Arbitrary User Preference Functions. In 2019 IEEE 35th In-
 [5] J. Dai, B. Yang, C. Guo, and Z. Ding. 2015. Personalized route recommendation               ternational Conference on Data Engineering (ICDE). 602–613.               https:
     using big trajectory data. In 2015 IEEE 31st International Conference on Data               //doi . org/10 . 1109/ICDE . 2019 . 00060
     Engineering. 543–554. https://doi . org/10 . 1109/ICDE . 2015 . 7113313                [27] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thal-
 [6] Siyuan Du, Hua Zhang, Hualin Xu, Jirui Yang, and Oscar Tu. 2019. To make the                mann. 2013. Time-aware Point-of-interest Recommendation. In Proceedings
     travel healthier: a new tourism personalized route recommendation algorithm.                of the 36th International ACM SIGIR Conference on Research and Develop-
     Journal of Ambient Intelligence and Humanized Computing 10, 9 (01 Sep 2019),                ment in Information Retrieval (SIGIR ’13). ACM, New York, NY, USA, 363–372.
     3551–3562. https://doi . org/10 . 1007/s12652-018-1081-z                                    https://doi . org/10 . 1145/2484028 . 2484030
 [7] Stefan Funke and Sabine Storandt. 2015. Personalized route planning in road            [28] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. Under-
     networks. In Proceedings of the 23rd SIGSPATIAL International Conference on                 standing Mobility Based on GPS Data. In Proceedings of the 10th International
     advances in geographic information systems (SIGSPATIAL ’15), Vol. 03-06-. ACM,              Conference on Ubiquitous Computing (UbiComp ’08). ACM, New York, NY, USA,
     1–10.                                                                                       312–321. https://doi . org/10 . 1145/1409635 . 1409677
 [8] Lorraine Mc Ginty and Barry Smyth. 2001. Collaborative Case-Based Reason-              [29] Yu Zheng, Xing Xie, and Wei-Ying Ma. 2010. GeoLife: A Collaborative Social
     ing: Applications in Personalised Route Planning. In Case-Based Reasoning                   Networking Service among User, location and trajectory. IEEE Data(base)
     Research and Development, David W. Aha and Ian Watson (Eds.). Springer                      Engineering Bulletin (June 2010).             https://www . microsoft . com/en-
     Berlin Heidelberg, Berlin, Heidelberg, 362–376.                                             us/research/publication/geolife-a-collaborative-social-networking-
 [9] Gang Hu, Yi Qin, and Jie Shao. 2018. Personalized travel route recommendation               service-among-user-location-and-trajectory/
     from multi-source social media data. Multimedia Tools and Applications (18             [30] Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. Mining Interesting
     Oct 2018). https://doi . org/10 . 1007/s11042-018-6776-9                                    Locations and Travel Sequences from GPS Trajectories. In Proceedings of the
[10] Y. Koren, R. Bell, and C. Volinsky. 2009. Matrix Factorization Techniques                   18th International Conference on World Wide Web (WWW ’09). ACM, New
     for Recommender Systems. Computer 42, 8 (Aug 2009), 30–37. https:                           York, NY, USA, 791–800. https://doi . org/10 . 1145/1526709 . 1526816
     //doi . org/10 . 1109/MC . 2009 . 263                                                  [31] X. Zhu, R. Hao, H. Chi, and X. Du. 2017. FineRoute: Personalized and
[11] H. Kriegel, M. Renz, and M. Schubert. 2010. Route skyline queries:                          Time-Aware Route Recommendation Based on Check-Ins. IEEE Trans-
     A multi-preference path planning approach. In 2010 IEEE 26th Interna-                       actions on Vehicular Technology 66, 11 (Nov 2017), 10461–10469. https:
     tional Conference on Data Engineering (ICDE 2010). 261–272.                   https:        //doi . org/10 . 1109/TVT . 2017 . 2764999
     //doi . org/10 . 1109/ICDE . 2010 . 5447845
[12] Julia Letchner, John Krumm, and Eric Horvitz. 2006. Trip Router with
     Individualized Preferences (TRIP): Incorporating Personalization into Route
     Planning. https://www . microsoft . com/en-us/research/publication/trip-
     router-individualized-preferences-trip-incorporating-personalization-
     route-planning/
[13] Long Liu, Jin Xu, Stephen Shaoyi Liao, and Huaping Chen. 2014. A real-time
     personalized route recommendation system for self-drive tourists based on
     vehicle to vehicle communication. Expert Systems With Applications 41, 7
     (2014), 3409–3417.
[14] B. Lv, W. Yu, L. Wang, and J.A. McCann. 2014. Efficient processing node
     proximity via random walk with restart. Lecture Notes in Computer Science
     (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
     Bioinformatics) 8709, 542–549.
[15] Lorraine McGinty and Barry Smyth. 2000. Personalised Route Planning: A
     Case-Based Approach. In Advances in Case-Based Reasoning, Enrico Blanzieri
     and Luigi Portinale (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
     431–443.
[16] S. Nadi and M.R. Delavar. 2011. Multi-criteria, personalized route planning
     using quantifier-guided ordered weighted averaging operators. International
     Journal of Applied Earth Observation and Geoinformation 13, 3 (2011), 322 –
     335. https://doi . org/10 . 1016/j . jag . 2011 . 01 . 003
[17] Paul Newson and John Krumm. 2009. Hidden Markov Map Matching Through
     Noise and Sparseness. In Proceedings of the 17th ACM SIGSPATIAL International