=Paper=
{{Paper
|id=Vol-2086/AICS2017_paper_26
|storemode=property
|title=Inferring Waypoints in the Absence of Knowledge of Driving Style
|pdfUrl=https://ceur-ws.org/Vol-2086/AICS2017_paper_26.pdf
|volume=Vol-2086
|authors=Daniel A. Desmond,Ken Brown
|dblpUrl=https://dblp.org/rec/conf/aics/DesmondB17
}}
==Inferring Waypoints in the Absence of Knowledge of Driving Style==
Inferring Waypoints in the Absence of Knowledge of Driving Style Daniel A. Desmond, Kenneth N. Brown Insight Centre for Data Analytics, Department of Computer Science, University College Cork, Cork, Ireland {daniel.desmond, ken.brown}@insight-centre.org Abstract. We present an algorithm for predicting intervals which con- tain waypoints from a GPS trace of a multi-part trip without having access to historical data about the driver or any other aggregated data sets. We assume the driver’s driving style is not known, but that it can be approximated by one of a set of cost preferences. The method uses a set of repeated forward and backward searches along the trace, where each of the searches represents one of the driving costs. We evaluate the algo- rithm empirically on multi-part trips on real route maps. The algorithm selects the results of the search with the fewest number of intervals and we achieve over 95% recall on estimating waypoints while the intervals cover less than 9% of the trace. 1 Introduction Recreating intent from activity traces is an important aspect of security, missing persons search, assisted living, and retail analysis. In the retail analysis case the inference would require the analysis of large sets of activity traces from many people, while in assisted living there will be many traces from one person. In the security and missing persons case the inferences may have to be determined from a single trace for an individual. In such cases the activity traces may be trajectories through space with intermediate waypoints intepreted as intent. In previous work [1] we showed it was possible to predict intervals which contain up to 97% of waypoints from a driver’s GPS trace when a driver’s route choice is based on shortest paths. We now we relax this assumption and consider the case where the driver’s route choice may be based on different criteria. We assume that we have a model which is an abstraction of road network, which contains details of travel times and distances, tolls and road types but no other information. We present an algorithm which compares the results of a series of repeated forward and backward searches for shortest paths based on different cost functions which approximate different route choices. We evaluate the al- gorithm empirically using randomly generated locations from which multi-trip routes are generated by an online route planner. We demonstrate that the algo- rithm generates intervals in the traces which cover over 95% of the waypoints, and where the intervals cover less than 9% of the trace. The remainder of the paper is organised as follows: Section 2 discusses related work. Our proposed approach to the problem is introduced in section 3. Section 4 describes the form of the experiments. The results of the experiments are reported in section 5 and section 6 concludes the paper. 2 Related Work Wardrop [2] proposed that the minimization of travel time was the most im- portant criterion in route selection and his first principle of route choice states “the journey times on all routes actually used are equal, and less than those which would be experienced by a single vehicle on any unused route”. Studies by Duffel and Kalombatis [3] and Huchingson et al. [4] also found that travel time was considered to be the most important factor when deciding on which route to take. However in a more recent review of multiple studies Chen et al. [5] showed that route choice was based on more than travel time, with the driver’s preferences, familiarity with the areas being travelled and other judgements also accounting for how routes were chosen. Zhu and Levinson [6] also carried out an empirical test of Wardrops’s first principle and found that the majority of people did not choose the shortest path by time. Currently the majority of research based on prediction or inference using GPS traces revolves around destination prediction and establishing if patterns exist such as popular or heavily travelled routes in the traces being examined. These require the use of a number of different types of machine learning methods such as pattern recognition [7] and Hidden Markov Models [8] among others. All of these methods require the use of large amounts of historical data to build up their models and do not use a graph of the geographical area being studied. Where the inference of waypoints via creating sub-traces from the trace is used, the ultimate aim is still to predict the destination [9]. The most similar problem to the one we are tackling here is one studied by Kafsi et al [10], where the aim is to infer a set of waypoints from a GPS trace. They ignore the time component when segmenting the trace but use historical data to estimate the waypoints computing the entropy of conditional Markov trajectories. To the best of our knowledge, we are the first to present a method for inferring waypoints where historical data is not required and we only assume that the driver is consistent in their driving style for the whole trip. There exist a multitude of trip planners both on-line and off-line such as Google maps [11], and those built using openstreetmap data(OSM)[12] data [13,14]. Openrouteservice [15] is an open source route planner in which it is possible to create multi-point trips with different travel profiles or cost functions and download the trace data for the trip. In previous work [1], we demonstrated a method for inferring waypoints using only a GPS trace and no other historical or aggregated data, where the assump- tion was made that the creator of the trace used shortest paths by time. The method is based on repeated forward and backwards searches using Dijkstra’s algorithm [16]. The method is discussed in section 3. 3 Approach Our hypotheses are – Given a multipart trajectory constructed from point-to-point trips, in an un- known driving style, the individual destinations (waypoints) can be inferred from the results of multiple simulations of different driving styles. – The simulated driving style which produces the fewest waypoint estimates generates a reliable estimate of the true waypoints. Let G = (V, E, t, d, c, H, h) be a strongly connected, multi-weighted, directed graph embedded in a two-dimensional (2D) space. V is the set of vertices where each vertex is a location in the space. E is the set of directed edges (vi , vj ) where vi , vj ∈ V and so each edge represents a road segment. t is a function t : E −→ N+ representing the cost of traversing an edge in seconds. d is a function d : E −→ N+ representing the cost of traversing an edge in miles. c is a function c : E −→ N+ representing the cost of using a toll in US dollars. h is a function h : E −→ H representing the type of road of the edge. H is set of different road types. A trip s is a sequence of vertices and s̄ is the last vertex in s. A multitrip M is a sequence of trips hs1 , s2 , ..., sj i such that s¯i is the first vertex in si+1 . A flattened multitrip is a sequence of vertices created by flattening the multitrip. A trace T = hv1 , v2 , ..., vk i is a sequence of vertices sampled in order from the flattened multitrip. Given a trace our aim is to reconstruct the individual trips i.e. the endpoints hs¯1 , s¯2 , ..., sj−1 ¯ i from the multitrip. We allow a relaxation in which our aim is to output a list of intervals h[a1 , b1 ], [a2 , b2 ], ..., [aj , bj ]i where s¯i is contained within [ai , bi ]. For this work we define a driving style to be a route choice preference function using an appropriate cost function χ. A shortest χ-path is the path with the smallest χ cost. If χ is based on travel time then the χ least cost path will be the path that takes the least time. To accommodate inaccuracies in travel costs in our abstraction of the road network and the driver’s bounded knowledge of the route taken we introduce an ε-shortest χ-path. Definition 1. ε-shortest χ-path: Path P from A to B is an ε-shortest χ-path from A to B if there is no other A, B path Q with χ(Q) ≤ χ(P) - ε. Definition 2. ε-shortest χ-path(percentage): Path P from A to B is an ε-shortest χ-path from A to B if there is no other A, B path Q with χ(Q) ≤ ( 100−ε 100 ) ∗ χ(P), where ε is a percentage. In our previous work [1] we presented an algorithm and demonstrated its success at inferring intervals containing waypoints when the driver was following the shortest path by time. In the current work we do not know the driving style. Instead we simulate different driving styles (approximated by different cost functions χ) and infer the intervals which contain waypoints from the results of the computations. The new algorithm modified for a specific cost function χ is Algorithm 1: Waypoint Estimation With Cost Function input : Allowable Tolerance ε input : Heading Tolerance α input : cost function χ input : Trace T output: List K of intervals 1 List K 2 List ST // will hold the calculated sub-traces 3 subT raceStart ←− T [0] 4 for i ← 2 to last point in T do 5 previous ←− T [i − 2] 6 current ←− T [i − 1] 7 next ←− T [i] 8 heading1 ←− heading traveled from previous to current 9 heading2 ←− heading traveled from current to next 10 if difference between heading1 and heading2 is an α-heading change then 11 add interval [previous, next] to K 12 add sub-trace from subT raceStart to previous to ST 13 subT raceStart ←− next 14 end 15 end 16 for st ∈ ST do 17 source ←− st[0] 18 while not at end of st do 19 Searching from source find first point B on st which is not a χ-cost ε-shortest path 20 Searching back from B find first point A on reverse search of st which is not a χ-cost ε-shortest path 21 add interval [A,B] to K 22 source ←− B 23 end 24 end 25 return K shown in Algorithm 1. The algorithm infers waypoints using two methods which are carried out sequentially. In the first method (lines 4-15) we identify abrupt reversals of direction which would be caused by driving in to an area and leaving by the reverse route and add these to the intervals. As each point is a location in 2D space, each successive pair of points in trace T has a direction between them. Therefore we define a α-heading change as follows Definition 3. α-heading change: Difference between heading of travel from ti−1 to ti and heading of travel from ti to ti+1 is 180◦ ±α◦ . We then split the trace into subtraces, separated by the intervals for abrupt reversals, where each subtrace is made up of contiguous points of the trace. The second method (lines 16-24) for inferring waypoints involves searching forward through a subtrace until we find a point B which is the first point not on a ε-shortest χ-path, then search backwards from B until we find point A which is the first point not on a ε-shortest χ-path from B. Interval [A,B] is added to the list of intervals, and the forward search resumes from point B. When we have reached the end of the subtraces we return the list of intervals found. Based on the results of [1] a search that uses the correct cost function should produce approximately the correct number of intervals (or fewer if one waypoint is on the shortest path between its predecessor and successor waypoints). A simulation using an incorrect cost function χ should introduce extra intervals to account for deviations from its shortest χ-path, although it is possible that it could miss a waypoint between the intervals for the same reason as above. Therefore we suspect that the cost function that returns the fewest intervals is a good estimate for the true driving style. If more than one function produces the fewest intervals we need a tiebreaker. Given a list of intervals, we assume each interval will need to be examined for places of interest, therefore we could choose wide intervals where we are more likely to find waypoints, which we call maximum spread, or we could choose narrow intervals where the search time would be smaller at the cost of missing a waypoint, which we call minimum spread, where spread is defined as total width of the intervals in seconds. We define the spread in terms of seconds since the GPS points are timestamped and we cannot compute distance travelled between GPS points. The results from both of these options will be compared in section 5. If there is still more than one function remaining then random choice will be used to select one. Other options such as selecting by the maximum spread or minimum spread initially and then using the number of estimates as a tiebreaker were implemented but gave poor results and so are not discussed further. The pseudocode for the Waypoint Estimation Assuming No Driving Style Information is shown in Algorithm 2. The inputs are the trace T , ε, α, and a set of cost functions representing different driving styles. Initialize two lists, AK which will hold the list of each the intervals returned from each of the waypoint estimation algorithms we are interested in and K to hold the intervals which will be returned(lines 1-2). A counter C to keep track of the minimum number of intervals returned by any cost function is initialized to ∞ (line 3). For each of the cost functions we return a list L of intervals (line 5). If the number of intervals is equal to C then L is added to AK (lines 6-8). If the number of intervals is less than C then AK is cleared, L added to the now empty list and C set to the number of intervals in L (lines 9-13). If AK contains only one list of intervals then set K equal to this list of intervals (line 16). If AK contains more than one list, we select the one with minimum or maximum spread. If there are still ties, we randomly select a list with minimum or maximum spread (lines 17-24). Return the list K of the intervals found (line 25). Algorithm 2: Waypoint Estimation Assuming No Driving Style Infor- mation input : Allowable Tolerance ε input : Heading Tolerance α input : Set of Cost functions X input : Trace T output: List K of intervals 1 List AK 2 List K 3 Integer C ←− ∞ 4 for χ ∈ X do 5 List L = Waypoint Estimation with Cost Function (T , ε, α, χ) 6 if L.size = C then 7 add L to AK 8 end 9 if L.size