Reconstructing Uncertain Pedestrian Trajectories From Low-Sampling-Rate Observations Ricardo Miguel Puma-Alvarez and Alneu de Andrade Lopes Instituto de Ciências Matemáticas e de Computação Universidade de São Paulo - Campus de São Carlos 13560-970 São Carlos, SP, Brazil rpuma@usp.br, alneu@icmc.usp.br Abstract sonic and infrared systems, etc (Feng and Zhu, 2016). All these technologies allow a large-scale The ever-greater number of technolo- generation of trajectory data of moving objects, gies providing location-based services which can be used to perform several data mining has given rise to a deluge of trajectory tasks. Thus, this scenario paved the way for the data. However, most of these trajectories rise of the trajectory data mining field. There are are low-sampling-rate and, consequently, several trajectory data mining applications such as many movement details are lost. Due to path discovery, location prediction, behavior anal- that, trajectory reconstruction techniques ysis, urban services improvement, etc. However, have been created to infer the missing there are still some important challenges to be ad- movement details and reduce uncertainty. dressed regarding storage, computation and trajec- Nevertheless, most effort has been put into tory data mining (Baraniuk, 2011). Due to stor- reconstructing vehicle trajectories. There- age and transmission issues, these trajectories are fore, we study the reconstruction of pedes- generally collected at low sampling rates, conse- trian trajectories by using road network in- quently, they have long time intervals between lo- formation. We compare a simple tech- cation updates. In general, these trajectories pro- nique that only uses road network infor- vide a very limited representation of the real paths. mation with a more complex technique This type of trajectories are called uncertain trajec- that, besides the road network, uses his- tories (Zheng and Zhou, 2011). torical trajectory data. Additionally, we Most often, trajectories can be tracked very ac- use three different trajectory segmentation curately with GPS-embedded devices like smart- settings to analyze their influence over re- phones or automotive navigation systems. Never- construction. Our experiment results show theless, a recent study has demonstrated that, aim- that, with the limited pedestrian trajec- ing at reducing energy consumption, the majority tory data available, a simple technique that of taxis of big cities use sampling intervals of two does not use historical data performs con- minutes (Wei et al., 2012). siderably better than a more complex tech- The high energy consumption of GPS impairs nique that does use it. Furthermore, our re- its use in smartphones for long periods of time. sults also show that trajectories segmented Furthermore, most social networks provide check- in such a way as to allow a greater dis- in services, which allow user location sharing. tance and time span between consecutive Thus, it is possible to create trajectories by sort- points obtain better reconstruction results ing these check-ins chronologically. In a simi- in the majority of the cases, regardless of lar way, trajectories can be generated from geo- the technique used. tagged photos in photo sharing sites like Flickr 1 . In spite of that, the location updates generated 1 Introduction through these sites are low-sampling-rate. Currently, there are many technologies providing Addressing this issue is very important for sev- location-based services. Some of them are the eral different trajectory data mining applications. GPS (Global Position System), RFID (Radio Fre- quency Identification), smartphone sensors, ultra- 1 https://www.flickr.com/ 161 For instance, trajectories generated from geo- thereby generating a set of network-constrained tagged photos could be reconstructed and used trajectories. Thirdly, we apply two different tra- in itinerary recommendations applications. Ad- jectory reconstruction techniques on this new tra- ditionally, other tasks like indexing and querying jectory set. We compare these two techniques, processing efficiency can be affected (Zheng et al., one of them a simple technique that only takes 2012). into account the road network information, and a Motivated by this problem, many works on tra- more complex one that besides the road network jectory reconstruction have been published. Most structure uses historical trajectory data. We show of them use road network information through a that, under limited data conditions, the simpler graph whose nodes represent intersections and ter- technique greatly outperforms the more complex minal points, and the edges depict road segments. technique in pedestrian trajectory reconstruction. On the other hand, there are also some works Furthermore, our results also demonstrate that tra- that do not take into account this kind of infor- jectories segmented in such a way as to allow a mation (Wei et al., 2012). These works aim to greater distance and time span between consecu- reconstruct trajectories in rural areas where there tive points obtain better reconstruction results in is no road network, and, also, trajectories of an- the majority of the cases, regardless of the tech- imals and certain natural phenomena like hurri- nique used. canes. However, here we are focused on pedes- trian trajectory reconstruction in urban areas. An 2 Reconstructing uncertain pedestrian example of a method of reconstruction that uses trajectories road network information is Infertra (Banerjee In this section, we describe the framework used to et al., 2014). This technique, instead of predict- reconstruct pedestrians trajectories. ing the most likely route, returns an edge-weighted graph that summarizes all probable routes. The 2.1 Trajectory Segmentation trajectory reconstruction process employs Gibss GPS logs generally record people’s movement for sampling by learning a Network Mobility Model long periods, in which the person could make mul- (NMM) from a database of historical trajectories. tiple trips. When the person stops for a relatively Other works that also use road information are long time, this could indicate the end of a trajec- Hunter (2013), Zheng (2012), Li (2015) and Chi- tory and the start of the next one. Therefore, in or- ang (2013), to cite a few. der to reflect the real pedestrian intention as well Nevertheless, most works are focused on re- as possible, we segment the GPS logs into effec- constructing vehicle trajectories. This is mainly tive trajectories, which have specific source and due to the fact that some pedestrian routes com- destination stay points. prise small alleys and trails that are so narrowed In the Table 1, we observe three different seg- to be traversed by other transportation mode dif- mentation settings based on the degree of toler- ferent from walk. Despite of that, free collabo- ance, which is based on two criteria, the time be- rative maps like OpenStreetMap2 allow the addi- tween one GPS point to the next one and the dis- tion of these type of routes exclusively traversed tance between them. A GPS point is the represen- by pedestrians to the road network. This way, it tation of a location update in terms of space and would be possible to reconstruct pedestrian tra- time by means of geographic coordinates (latitude jectories using road network information. Con- and longitude) and a timestamp. sidering that, we aim to study the reconstruction of pedestrian trajectories using road network in- Medium High Low formation. Consequently, we depict a framework Distance (m) 300 600 150 to reconstruct pedestrian trajectories composed by Time (mins) 10 20 5 three phases. Firstly, we segment trajectories by using three different settings in order to study Table 1: Trajectory segmentation settings their influence over the quality of the reconstruc- tion. Secondly, we perform a map matching task The rationale behind this arrangement of val- on these segmented trajectories using a free tool, ues lies mainly in the medium tolerance setting. This setting states that if the distance between a 2 https://www.openstreetmap.org location update and the next one is greater than 162 the combined lengths of three average city blocks constrained trajectories and a road network to cre- (100m (HARRIS et al., 2008; Yeang et al., 2000; ate a generative model called Network Mobil- of the German Aerospace Center et al., 2012)), the ity Model (NMM), which is a weighted directed previous location update is considered as the end graph whose edge weights denote the probabil- of a trajectory and the last one as the start of the ity of the corresponding road segment being tra- next trajectory. Likewise, if the time between two versed. Hence, NMM learns the mobility pat- location updates is more than 10 minutes, the first terns in a road network from a database of his- location update and its successor are considered as torical trajectories. Secondly, given an uncertain the end and the start, respectively, of two different trajectory (a trajectory with low-sample-rate loca- and successive trajectories. The idea behind the tion updates), NMM is used to generate a weighted period of 10 minutes is to assume that a pedestrian subgraph that depicts the probabilities associated can make some small stops due to external fac- to each possible trajectory arising from the uncer- tors such as a quick conversation with some unex- tain trajectory location updates. On the other hand, pected acquaintance on the way or waiting for the we also used the Shortest Path technique to estab- traffic light to change to cross a street, which far lish contrast with InferTra. The Shortest Path is from meaning a source or destination, are just trip a much simpler technique compared to InferTra, interruptions. Thus, finally, the half and the double so this comparison can reveal whether a simple or of the values of these time and distance thresholds more complex approach performs better when it are allocated to the low and high tolerance settings comes to reconstructing pedestrian trajectories us- respectively. ing road network information. 2.2 Map matching 3 Experiments The second phase of this framework is the map We use two data sets to perform the reconstruc- matching process, which aims to transform our set tion of pedestrian trajectories. In each data set, the of GPS trajectories into network-constrained tra- three trajectory segmentation settings previously jectories by matching each GPS point to an edge of depicted are used, low, medium and high toler- the road network of a certain city. As already men- ance, in order to study their influence over the per- tioned, with the use of free collaborative maps, formance of the reconstruction. Finally, we com- now, these edges can also represent small alleys pare the performance of Infertra (Banerjee et al., and trails walked exclusively by pedestrians. Map 2014) and Shortest Path for different sampling in- matching is an important research topic and there tervals. are many works focused on it (Lou et al., 2009; Greenfeld, 2002; Yuan et al., 2010). Addition- 3.1 Data sets ally, there are free tools available that perform map matching tasks as Graphhopper 3 . Using a cor- The data sets considered in our experiments are rect map matching method to align GPS points (i) RadrPlus and (ii) Geolife (Zheng et al., 2009, onto the road segments is relevant because the 2008, 2010). RardPlus is a location-based social GPS points do not reflect their true position due network developed in the University of São Paulo, to the GPS measurement error. Finally, this way, campus of São Carlos, Brazil. This social network we used the Graphhopper tool to create a set of has the unique characteristic of being focused on network-constrained trajectories that are used by communities. Therefore, RadrPlus provides func- the reconstructing methods in the next phase. tionalities not just for individual users like tradi- tional social networks, but for groups of users as 2.3 Reconstruction well, within a geolocated environment. The Radr- Plus data set comprises trajectories of a group of One of the best techniques of trajectory recon- 15 users in a period of 9 months. These trajec- struction using road network information is Infer- tories were recorded in different parts of the city Tra (Banerjee et al., 2014). This method outper- of São Carlos, but mainly around the campus of forms other state-of-the-art techniques by a large the university and its surroundings. Additionally, margin. Infertra is composed by two phases. RadrPlus data set trajectories were labeled with Firstly, this technique uses the historical network- two transportation modes, car and walk. The sec- 3 https://www.graphhopper.com ond data set is provided by the Geolife project, a 163 Sampling LT MT HT location-based social network developed by Mi- Interval crosoft Research Asia. The Geolife data set con- RadrPlus (SP) 1 0.944 0.930 0.963 2 0.846 0.852 0.903 tains trajectories of 182 users in a period of over 3 0.770 0.790 0.857 4 0.796 0.771 0.800 five years. These trajectories were recorded in 30 5 0.793 0.748 0.762 cities of China and some cities in USA and Eu- 6 7 0.746 0.718 0.689 0.680 0.732 0.717 rope; however, most trajectories were recorded in 8 0.639 0.636 0.700 9 0.558 0.567 0.676 the city of Beijing, China. In addition to that, a 10 0.578 0.558 0.673 group of 73 users labeled their trajectories with Geolife (SP) 1 2 0.959 0.902 0.960 0.909 0.962 0.907 transportation modes like walk, bike, bus, car, 3 0.862 0.868 0.870 4 0.818 0.835 0.837 train, airplane and others. In spite of the variety 5 0.775 0.789 0.797 6 0.754 0.782 0.783 of trajectory data of both data sets, we only se- 7 0.746 0.766 0.771 lect for our experiments the trajectories made by 8 0.721 0.751 0.761 9 0.694 0.717 0.732 pedestrians, i.e, trajectories labeled with walk as 10 0.671 0.705 0.727 RadrPlus (IT) 1 0.502 0.608 0.654 transportation mode. 2 0.510 0.481 0.593 3 0.446 0.430 0.539 4 0.467 0.404 0.507 3.2 Experimental Results 5 0.404 0.350 0.434 6 0.298 0.323 0.411 We present and discuss the results of using the 7 0.205 0.303 0.353 8 0.197 0.237 0.315 InferTra and Shortest Path techniques in the re- 9 0.018 0.230 0.273 10 0.005 0.154 0.231 construction of pedestrian trajectories. Each tech- Geolife (IT) 1 0.558 0.569 0.558 nique was implemented in Java. We segment tra- 2 0.484 0.494 0.490 3 0.437 0.455 0.450 jectories from RadrPlus and Geolife data sets by 4 0.395 0.421 0.424 5 0.363 0.382 0.401 using the low, medium and high tolerance set- 6 0.342 0.371 0.386 tings and for each resulting trajectory set, we use 7 0.314 0.344 0.369 8 0.281 0.326 0.346 the Graphhopper tool to transform these trajecto- 9 0.264 0.300 0.326 10 0.250 0.285 0.309 ries into network-constrained trajectories. Finally, we apply the aforementioned reconstruction tech- Table 3: Results of trajectory reconstruction on niques on the six different data sets generated as RadrPlus and Geolife data sets under trajectory showed in Table 2. The main characteristics de- segmentation configurations High Tolerance (HT), picted in that table are the Number of Trajectories Medium Tolerance (MT) and Low Tolerance (LT) (NT), Average Length (AL), which is the average for different sampling intervals. number of points per trajectory (pp), and the Av- erage Duration (AD) of the trajectories. and the shorter the trajectory length and duration. NT AL (pp) AD (secs) These results are interesting due to the fact that RadrPlus (LT) 76 14.12 487.48 Geolife’s data was collected with a fixed, short RadrPlus (MT) 116 17.31 604.15 sampling interval (Static Duty Cycle (Wu et al., RadrPlus (HT) 155 21.06 1045.61 2011)) while RadrPlus’ data collection process Geolife (LT) 3895 13.06 721.74 used a dynamic system that allocates short and Geolife (MT) 3601 13.71 819.57 long sampling intervals depending on the context Geolife (HT) 3134 13.79 877.2 (Dynamic Duty Cycle (Wu et al., 2011)). There- fore, we observe how the election of the data col- Table 2: Main characteristics of each result- lection method affects the main characteristics of ing trajectory data set after applying segmentation the resulting segmented trajectories. settings Low Tolerance (LT), Medium Tolerance On the other hand, since Infertra reconstructs a (MT) and High Tolerance (HT) on RadrPlus and trajectory as a weighted graph, we use the adapted Geolife data sets. F-score measure described in Banerjee (2014) to We notice that, in the case of Geolife, the lower evaluate Infertra performance, whereas the stan- the tolerance, the greater the number of trajecto- dard F-score was used in the case of Shortest Path. ries, and the shorter the trajectory length and du- These two techniques are evaluated for different ration. However, in the case of RadrPlus, the lower sampling rates expressed in minutes. the tolerance, the lesser the number of trajectories, From Figures 1 and 2, we can easily observe 164 (a) (b) (c) Figure 1: Results of the trajectory reconstruction on the RadrPlus data set using InferTra and Shortest Path algorithms. Each point in the figure indicates the value of the F-score for a certain sampling interval, expressed in minutes, under the segmentation settings (a) High Tolerance, (b) Medium Tolerance and (c) Low Tolerance. (a) (b) (c) Figure 2: Results of the trajectory reconstruction on the Geolife data set using InferTra and Shortest Path algorithms. Each point in the figure indicates the value of the F-score for a certain sampling interval, expressed in minutes, under the segmentation settings (a) High Tolerance, (b) Medium Tolerance and (c) Low Tolerance. that Shortest Path greatly outperforms InferTra, ily compare the obtained results for each setting. regardless of the data set, trajectory segmenta- This way, in Table 3, we observe a clear evidence tion setting and sampling interval used. Addition- that the High Tolerance (HT) setting obtains better ally, as expected, we also observe that the best re- results for the majority of the cases. This setting sults correspond to the shortest sampling intervals. only presents lesser values of F-score in the 17.5% Nevertheless, it is not clear if there is a difference of the cases. among trajectory segmentation settings. Thus, to analyze the impact of these settings over the recon- struction, we organize the data so that we can eas- 165 4 Conclusion connectivity redefined. Technical report, The New Jersey Department of Transportation, New Jersey. We studied the reconstruction of pedestrian tra- jectories using road network information. Three Timothy Hunter, Pieter Abbeel, and Alexandre M Bayen. 2013. The path inference filter: model-based different segmentation settings were proposed low-latency map matching of probe vehicle data. In to study their influence over the reconstruction. Algorithmic Foundations of Robotics X, Springer, These settings were established based on the con- pages 591–607. cept of tolerance which was defined based on two Mu Li, Amr Ahmed, and Alexander J Smola. 2015. In- criteria that comprise the distance and time span ferring movement trajectories from gps snippets. In between points in a trajectory. Moreover, two Proceedings of the Eighth ACM International Con- state-of-the-art techniques were tested. One of ference on Web Search and Data Mining. ACM, these techniques uses both road network informa- pages 325–334. tion and historical trajectories, whereas the sim- Yin Lou, Chengyang Zhang, Yu Zheng, Xing Xie, Wei pler one only uses the road network structure. Em- Wang, and Yan Huang. 2009. Map-matching for pirical analysis of these two techniques on two low-sampling-rate gps trajectories. In Proceedings data sets shows that the simpler technique per- of the 17th ACM SIGSPATIAL international confer- ence on advances in geographic information sys- forms better under limited data conditions than the tems. ACM, pages 352–361. more complex one when it comes to pedestrian trajectories. Additionally, the high tolerance seg- Transportation Studies Group of the German mentation setting proposed obtains better recon- Aerospace Center et al. 2012. Urban Block Design guideline / manual to best practice - Project struction results in a majority of the cases for both METRASYS. techniques. Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. 2012. Acknowledgments Constructing popular routes from uncertain trajecto- ries. In Proceedings of the 18th ACM SIGKDD in- This work was supported by FAPESP grant ternational conference on Knowledge discovery and 2015/14228-9, CNPq grants 302645/2015-2, data mining. ACM, pages 195–203. 162262/2014-0 and CAPES grant PROEX- Chao-Lin Wu, Yu-Te Huang, Cheng-Lung Wu, Hao- 9152559/M. hua Chu, Polly Huang, and Ling-Jyh Chen. 2011. An adaptive duty-cycle scheme for gps scheduling in mobile location sensing applications. Proc. of References PhoneSense . Prithu Banerjee, Sayan Ranu, and Sriram Raghavan. Llewelyn Davies Yeang et al. 2000. Urban design com- 2014. Inferring uncertain trajectories from partial pendium. English Partnerships/Housing Corpora- observations. In 2014 IEEE International Confer- tion, London . ence on Data Mining. IEEE, pages 30–39. Jing Yuan, Yu Zheng, Chengyang Zhang, Xing Xie, Richard G Baraniuk. 2011. More is less: sig- and Guang-Zhong Sun. 2010. An interactive-voting nal processing and the data deluge. Science based map matching algorithm. In Mobile Data 331(6018):717–719. Management (MDM), 2010 Eleventh International Meng-Fen Chiang, Yung-Hsiang Lin, Wen-Chih Peng, Conference on. IEEE, pages 43–52. and Philip S Yu. 2013. Inferring distant-time loca- Kai Zheng, Yu Zheng, Xing Xie, and Xiaofang Zhou. tion in low-sampling-rate trajectories. In Proceed- 2012. Reducing uncertainty of low-sampling- ings of the 19th ACM SIGKDD international con- rate trajectories. In Data Engineering (ICDE), ference on Knowledge discovery and data mining. 2012 IEEE 28th International Conference on. IEEE, ACM, pages 1454–1457. pages 1144–1155. Zhenni Feng and Yanmin Zhu. 2016. A survey on tra- jectory data mining: Techniques and applications. Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and IEEE Access 4:2056–2067. Wei-Ying Ma. 2008. Understanding mobility based on gps data. In Proceedings of the 10th international Joshua S Greenfeld. 2002. Matching gps observations conference on Ubiquitous computing. ACM, pages to locations on a digital map. In Transportation Re- 312–321. search Board 81st Annual Meeting. Yu Zheng, Xing Xie, and Wei-Ying Ma. 2010. Geolife: DMJM HARRIS, AECOM, and Clarke Caton Hintz. A collaborative social networking service among 2008. New jersey long range transportation plan user, location and trajectory. IEEE Data Eng. Bull. 2030. technical memorandum. task 11: Local street 33(2):32–39. 166 Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. Mining interesting locations and travel se- quences from gps trajectories. In Proceedings of the 18th international conference on World wide web. ACM, pages 791–800. Yu Zheng and Xiaofang Zhou. 2011. Computing with Spatial Trajectories. Springer New York. 167