Reconstructing Uncertain Pedestrian Trajectories From
                       Low-Sampling-Rate Observations

                 Ricardo Miguel Puma-Alvarez and Alneu de Andrade Lopes
                      Instituto de Ciências Matemáticas e de Computação
                      Universidade de São Paulo - Campus de São Carlos
                                13560-970 São Carlos, SP, Brazil
                          rpuma@usp.br, alneu@icmc.usp.br


                    Abstract                             sonic and infrared systems, etc (Feng and Zhu,
                                                         2016). All these technologies allow a large-scale
    The ever-greater number of technolo-                 generation of trajectory data of moving objects,
    gies providing location-based services               which can be used to perform several data mining
    has given rise to a deluge of trajectory             tasks. Thus, this scenario paved the way for the
    data. However, most of these trajectories            rise of the trajectory data mining field. There are
    are low-sampling-rate and, consequently,             several trajectory data mining applications such as
    many movement details are lost. Due to               path discovery, location prediction, behavior anal-
    that, trajectory reconstruction techniques           ysis, urban services improvement, etc. However,
    have been created to infer the missing               there are still some important challenges to be ad-
    movement details and reduce uncertainty.             dressed regarding storage, computation and trajec-
    Nevertheless, most effort has been put into          tory data mining (Baraniuk, 2011). Due to stor-
    reconstructing vehicle trajectories. There-          age and transmission issues, these trajectories are
    fore, we study the reconstruction of pedes-          generally collected at low sampling rates, conse-
    trian trajectories by using road network in-         quently, they have long time intervals between lo-
    formation. We compare a simple tech-                 cation updates. In general, these trajectories pro-
    nique that only uses road network infor-             vide a very limited representation of the real paths.
    mation with a more complex technique                 This type of trajectories are called uncertain trajec-
    that, besides the road network, uses his-            tories (Zheng and Zhou, 2011).
    torical trajectory data. Additionally, we                Most often, trajectories can be tracked very ac-
    use three different trajectory segmentation          curately with GPS-embedded devices like smart-
    settings to analyze their influence over re-         phones or automotive navigation systems. Never-
    construction. Our experiment results show            theless, a recent study has demonstrated that, aim-
    that, with the limited pedestrian trajec-            ing at reducing energy consumption, the majority
    tory data available, a simple technique that         of taxis of big cities use sampling intervals of two
    does not use historical data performs con-           minutes (Wei et al., 2012).
    siderably better than a more complex tech-
                                                             The high energy consumption of GPS impairs
    nique that does use it. Furthermore, our re-
                                                         its use in smartphones for long periods of time.
    sults also show that trajectories segmented
                                                         Furthermore, most social networks provide check-
    in such a way as to allow a greater dis-
                                                         in services, which allow user location sharing.
    tance and time span between consecutive
                                                         Thus, it is possible to create trajectories by sort-
    points obtain better reconstruction results
                                                         ing these check-ins chronologically. In a simi-
    in the majority of the cases, regardless of
                                                         lar way, trajectories can be generated from geo-
    the technique used.
                                                         tagged photos in photo sharing sites like Flickr
                                                         1 . In spite of that, the location updates generated
1   Introduction
                                                         through these sites are low-sampling-rate.
Currently, there are many technologies providing             Addressing this issue is very important for sev-
location-based services. Some of them are the            eral different trajectory data mining applications.
GPS (Global Position System), RFID (Radio Fre-
quency Identification), smartphone sensors, ultra-          1
                                                                https://www.flickr.com/


                                                   161
For instance, trajectories generated from geo-           thereby generating a set of network-constrained
tagged photos could be reconstructed and used            trajectories. Thirdly, we apply two different tra-
in itinerary recommendations applications. Ad-           jectory reconstruction techniques on this new tra-
ditionally, other tasks like indexing and querying       jectory set. We compare these two techniques,
processing efficiency can be affected (Zheng et al.,     one of them a simple technique that only takes
2012).                                                   into account the road network information, and a
   Motivated by this problem, many works on tra-         more complex one that besides the road network
jectory reconstruction have been published. Most         structure uses historical trajectory data. We show
of them use road network information through a           that, under limited data conditions, the simpler
graph whose nodes represent intersections and ter-       technique greatly outperforms the more complex
minal points, and the edges depict road segments.        technique in pedestrian trajectory reconstruction.
On the other hand, there are also some works             Furthermore, our results also demonstrate that tra-
that do not take into account this kind of infor-        jectories segmented in such a way as to allow a
mation (Wei et al., 2012). These works aim to            greater distance and time span between consecu-
reconstruct trajectories in rural areas where there      tive points obtain better reconstruction results in
is no road network, and, also, trajectories of an-       the majority of the cases, regardless of the tech-
imals and certain natural phenomena like hurri-          nique used.
canes. However, here we are focused on pedes-
trian trajectory reconstruction in urban areas. An       2     Reconstructing uncertain pedestrian
example of a method of reconstruction that uses                trajectories
road network information is Infertra (Banerjee           In this section, we describe the framework used to
et al., 2014). This technique, instead of predict-       reconstruct pedestrians trajectories.
ing the most likely route, returns an edge-weighted
graph that summarizes all probable routes. The           2.1    Trajectory Segmentation
trajectory reconstruction process employs Gibss          GPS logs generally record people’s movement for
sampling by learning a Network Mobility Model            long periods, in which the person could make mul-
(NMM) from a database of historical trajectories.        tiple trips. When the person stops for a relatively
Other works that also use road information are           long time, this could indicate the end of a trajec-
Hunter (2013), Zheng (2012), Li (2015) and Chi-          tory and the start of the next one. Therefore, in or-
ang (2013), to cite a few.                               der to reflect the real pedestrian intention as well
   Nevertheless, most works are focused on re-           as possible, we segment the GPS logs into effec-
constructing vehicle trajectories. This is mainly        tive trajectories, which have specific source and
due to the fact that some pedestrian routes com-         destination stay points.
prise small alleys and trails that are so narrowed          In the Table 1, we observe three different seg-
to be traversed by other transportation mode dif-        mentation settings based on the degree of toler-
ferent from walk. Despite of that, free collabo-         ance, which is based on two criteria, the time be-
rative maps like OpenStreetMap2 allow the addi-          tween one GPS point to the next one and the dis-
tion of these type of routes exclusively traversed       tance between them. A GPS point is the represen-
by pedestrians to the road network. This way, it         tation of a location update in terms of space and
would be possible to reconstruct pedestrian tra-         time by means of geographic coordinates (latitude
jectories using road network information. Con-           and longitude) and a timestamp.
sidering that, we aim to study the reconstruction
of pedestrian trajectories using road network in-                              Medium      High    Low
formation. Consequently, we depict a framework                 Distance (m)     300        600     150
to reconstruct pedestrian trajectories composed by             Time (mins)       10         20      5
three phases. Firstly, we segment trajectories by
using three different settings in order to study               Table 1: Trajectory segmentation settings
their influence over the quality of the reconstruc-
tion. Secondly, we perform a map matching task             The rationale behind this arrangement of val-
on these segmented trajectories using a free tool,       ues lies mainly in the medium tolerance setting.
                                                         This setting states that if the distance between a
   2
       https://www.openstreetmap.org                     location update and the next one is greater than


                                                   162
the combined lengths of three average city blocks         constrained trajectories and a road network to cre-
(100m (HARRIS et al., 2008; Yeang et al., 2000;           ate a generative model called Network Mobil-
of the German Aerospace Center et al., 2012)), the        ity Model (NMM), which is a weighted directed
previous location update is considered as the end         graph whose edge weights denote the probabil-
of a trajectory and the last one as the start of the      ity of the corresponding road segment being tra-
next trajectory. Likewise, if the time between two        versed. Hence, NMM learns the mobility pat-
location updates is more than 10 minutes, the first       terns in a road network from a database of his-
location update and its successor are considered as       torical trajectories. Secondly, given an uncertain
the end and the start, respectively, of two different     trajectory (a trajectory with low-sample-rate loca-
and successive trajectories. The idea behind the          tion updates), NMM is used to generate a weighted
period of 10 minutes is to assume that a pedestrian       subgraph that depicts the probabilities associated
can make some small stops due to external fac-            to each possible trajectory arising from the uncer-
tors such as a quick conversation with some unex-         tain trajectory location updates. On the other hand,
pected acquaintance on the way or waiting for the         we also used the Shortest Path technique to estab-
traffic light to change to cross a street, which far      lish contrast with InferTra. The Shortest Path is
from meaning a source or destination, are just trip       a much simpler technique compared to InferTra,
interruptions. Thus, finally, the half and the double     so this comparison can reveal whether a simple or
of the values of these time and distance thresholds       more complex approach performs better when it
are allocated to the low and high tolerance settings      comes to reconstructing pedestrian trajectories us-
respectively.                                             ing road network information.

2.2 Map matching                                          3     Experiments
The second phase of this framework is the map
                                                          We use two data sets to perform the reconstruc-
matching process, which aims to transform our set
                                                          tion of pedestrian trajectories. In each data set, the
of GPS trajectories into network-constrained tra-
                                                          three trajectory segmentation settings previously
jectories by matching each GPS point to an edge of
                                                          depicted are used, low, medium and high toler-
the road network of a certain city. As already men-
                                                          ance, in order to study their influence over the per-
tioned, with the use of free collaborative maps,
                                                          formance of the reconstruction. Finally, we com-
now, these edges can also represent small alleys
                                                          pare the performance of Infertra (Banerjee et al.,
and trails walked exclusively by pedestrians. Map
                                                          2014) and Shortest Path for different sampling in-
matching is an important research topic and there
                                                          tervals.
are many works focused on it (Lou et al., 2009;
Greenfeld, 2002; Yuan et al., 2010). Addition-            3.1    Data sets
ally, there are free tools available that perform map
matching tasks as Graphhopper 3 . Using a cor-            The data sets considered in our experiments are
rect map matching method to align GPS points              (i) RadrPlus and (ii) Geolife (Zheng et al., 2009,
onto the road segments is relevant because the            2008, 2010). RardPlus is a location-based social
GPS points do not reflect their true position due         network developed in the University of São Paulo,
to the GPS measurement error. Finally, this way,          campus of São Carlos, Brazil. This social network
we used the Graphhopper tool to create a set of           has the unique characteristic of being focused on
network-constrained trajectories that are used by         communities. Therefore, RadrPlus provides func-
the reconstructing methods in the next phase.             tionalities not just for individual users like tradi-
                                                          tional social networks, but for groups of users as
2.3 Reconstruction                                        well, within a geolocated environment. The Radr-
                                                          Plus data set comprises trajectories of a group of
One of the best techniques of trajectory recon-
                                                          15 users in a period of 9 months. These trajec-
struction using road network information is Infer-
                                                          tories were recorded in different parts of the city
Tra (Banerjee et al., 2014). This method outper-
                                                          of São Carlos, but mainly around the campus of
forms other state-of-the-art techniques by a large
                                                          the university and its surroundings. Additionally,
margin. Infertra is composed by two phases.
                                                          RadrPlus data set trajectories were labeled with
Firstly, this technique uses the historical network-
                                                          two transportation modes, car and walk. The sec-
   3
       https://www.graphhopper.com                        ond data set is provided by the Geolife project, a


                                                    163
                                                                              Sampling   LT      MT      HT
location-based social network developed by Mi-                                Interval
crosoft Research Asia. The Geolife data set con-              RadrPlus (SP)   1          0.944   0.930   0.963
                                                                              2          0.846   0.852   0.903
tains trajectories of 182 users in a period of over                           3          0.770   0.790   0.857
                                                                              4          0.796   0.771   0.800
five years. These trajectories were recorded in 30                            5          0.793   0.748   0.762
cities of China and some cities in USA and Eu-                                6
                                                                              7
                                                                                         0.746
                                                                                         0.718
                                                                                                 0.689
                                                                                                 0.680
                                                                                                         0.732
                                                                                                         0.717
rope; however, most trajectories were recorded in                             8          0.639   0.636   0.700
                                                                              9          0.558   0.567   0.676
the city of Beijing, China. In addition to that, a                            10         0.578   0.558   0.673
group of 73 users labeled their trajectories with             Geolife (SP)    1
                                                                              2
                                                                                         0.959
                                                                                         0.902
                                                                                                 0.960
                                                                                                 0.909
                                                                                                         0.962
                                                                                                         0.907
transportation modes like walk, bike, bus, car,                               3          0.862   0.868   0.870
                                                                              4          0.818   0.835   0.837
train, airplane and others. In spite of the variety                           5          0.775   0.789   0.797
                                                                              6          0.754   0.782   0.783
of trajectory data of both data sets, we only se-                             7          0.746   0.766   0.771
lect for our experiments the trajectories made by                             8          0.721   0.751   0.761
                                                                              9          0.694   0.717   0.732
pedestrians, i.e, trajectories labeled with walk as                           10         0.671   0.705   0.727
                                                              RadrPlus (IT)   1          0.502   0.608   0.654
transportation mode.                                                          2          0.510   0.481   0.593
                                                                              3          0.446   0.430   0.539
                                                                              4          0.467   0.404   0.507
3.2 Experimental Results                                                      5          0.404   0.350   0.434
                                                                              6          0.298   0.323   0.411
We present and discuss the results of using the                               7          0.205   0.303   0.353
                                                                              8          0.197   0.237   0.315
InferTra and Shortest Path techniques in the re-                              9          0.018   0.230   0.273
                                                                              10         0.005   0.154   0.231
construction of pedestrian trajectories. Each tech-           Geolife (IT)    1          0.558   0.569   0.558
nique was implemented in Java. We segment tra-                                2          0.484   0.494   0.490
                                                                              3          0.437   0.455   0.450
jectories from RadrPlus and Geolife data sets by                              4          0.395   0.421   0.424
                                                                              5          0.363   0.382   0.401
using the low, medium and high tolerance set-                                 6          0.342   0.371   0.386
tings and for each resulting trajectory set, we use                           7          0.314   0.344   0.369
                                                                              8          0.281   0.326   0.346
the Graphhopper tool to transform these trajecto-                             9          0.264   0.300   0.326
                                                                              10         0.250   0.285   0.309
ries into network-constrained trajectories. Finally,
we apply the aforementioned reconstruction tech-          Table 3: Results of trajectory reconstruction on
niques on the six different data sets generated as        RadrPlus and Geolife data sets under trajectory
showed in Table 2. The main characteristics de-           segmentation configurations High Tolerance (HT),
picted in that table are the Number of Trajectories       Medium Tolerance (MT) and Low Tolerance (LT)
(NT), Average Length (AL), which is the average           for different sampling intervals.
number of points per trajectory (pp), and the Av-
erage Duration (AD) of the trajectories.
                                                          and the shorter the trajectory length and duration.
                     NT     AL (pp)     AD (secs)         These results are interesting due to the fact that
 RadrPlus (LT)       76      14.12       487.48           Geolife’s data was collected with a fixed, short
 RadrPlus (MT)      116      17.31       604.15           sampling interval (Static Duty Cycle (Wu et al.,
 RadrPlus (HT)      155      21.06       1045.61          2011)) while RadrPlus’ data collection process
 Geolife (LT)       3895     13.06       721.74           used a dynamic system that allocates short and
 Geolife (MT)       3601     13.71       819.57           long sampling intervals depending on the context
 Geolife (HT)       3134     13.79        877.2           (Dynamic Duty Cycle (Wu et al., 2011)). There-
                                                          fore, we observe how the election of the data col-
Table 2:     Main characteristics of each result-         lection method affects the main characteristics of
ing trajectory data set after applying segmentation       the resulting segmented trajectories.
settings Low Tolerance (LT), Medium Tolerance
                                                             On the other hand, since Infertra reconstructs a
(MT) and High Tolerance (HT) on RadrPlus and
                                                          trajectory as a weighted graph, we use the adapted
Geolife data sets.
                                                          F-score measure described in Banerjee (2014) to
   We notice that, in the case of Geolife, the lower      evaluate Infertra performance, whereas the stan-
the tolerance, the greater the number of trajecto-        dard F-score was used in the case of Shortest Path.
ries, and the shorter the trajectory length and du-          These two techniques are evaluated for different
ration. However, in the case of RadrPlus, the lower       sampling rates expressed in minutes.
the tolerance, the lesser the number of trajectories,        From Figures 1 and 2, we can easily observe


                                                    164
                               (a)                                              (b)


                                                         (c)

Figure 1: Results of the trajectory reconstruction on the RadrPlus data set using InferTra and Shortest
Path algorithms. Each point in the figure indicates the value of the F-score for a certain sampling interval,
expressed in minutes, under the segmentation settings (a) High Tolerance, (b) Medium Tolerance and (c)
Low Tolerance.


                               (a)                                              (b)


                                                         (c)

Figure 2: Results of the trajectory reconstruction on the Geolife data set using InferTra and Shortest Path
algorithms. Each point in the figure indicates the value of the F-score for a certain sampling interval,
expressed in minutes, under the segmentation settings (a) High Tolerance, (b) Medium Tolerance and (c)
Low Tolerance.

that Shortest Path greatly outperforms InferTra,               ily compare the obtained results for each setting.
regardless of the data set, trajectory segmenta-               This way, in Table 3, we observe a clear evidence
tion setting and sampling interval used. Addition-             that the High Tolerance (HT) setting obtains better
ally, as expected, we also observe that the best re-           results for the majority of the cases. This setting
sults correspond to the shortest sampling intervals.           only presents lesser values of F-score in the 17.5%
Nevertheless, it is not clear if there is a difference         of the cases.
among trajectory segmentation settings. Thus, to
analyze the impact of these settings over the recon-
struction, we organize the data so that we can eas-


                                                     165
4   Conclusion                                                connectivity redefined. Technical report, The New
                                                              Jersey Department of Transportation, New Jersey.
We studied the reconstruction of pedestrian tra-
jectories using road network information. Three             Timothy Hunter, Pieter Abbeel, and Alexandre M
                                                              Bayen. 2013. The path inference filter: model-based
different segmentation settings were proposed
                                                              low-latency map matching of probe vehicle data. In
to study their influence over the reconstruction.             Algorithmic Foundations of Robotics X, Springer,
These settings were established based on the con-             pages 591–607.
cept of tolerance which was defined based on two
                                                            Mu Li, Amr Ahmed, and Alexander J Smola. 2015. In-
criteria that comprise the distance and time span            ferring movement trajectories from gps snippets. In
between points in a trajectory. Moreover, two                Proceedings of the Eighth ACM International Con-
state-of-the-art techniques were tested. One of              ference on Web Search and Data Mining. ACM,
these techniques uses both road network informa-             pages 325–334.
tion and historical trajectories, whereas the sim-          Yin Lou, Chengyang Zhang, Yu Zheng, Xing Xie, Wei
pler one only uses the road network structure. Em-            Wang, and Yan Huang. 2009. Map-matching for
pirical analysis of these two techniques on two               low-sampling-rate gps trajectories. In Proceedings
data sets shows that the simpler technique per-               of the 17th ACM SIGSPATIAL international confer-
                                                              ence on advances in geographic information sys-
forms better under limited data conditions than the
                                                              tems. ACM, pages 352–361.
more complex one when it comes to pedestrian
trajectories. Additionally, the high tolerance seg-         Transportation Studies Group of the German
mentation setting proposed obtains better recon-              Aerospace Center et al. 2012.        Urban Block
                                                              Design guideline / manual to best practice - Project
struction results in a majority of the cases for both         METRASYS.
techniques.
                                                            Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. 2012.
Acknowledgments                                               Constructing popular routes from uncertain trajecto-
                                                              ries. In Proceedings of the 18th ACM SIGKDD in-
This work was supported by FAPESP grant                       ternational conference on Knowledge discovery and
2015/14228-9, CNPq grants 302645/2015-2,                      data mining. ACM, pages 195–203.
162262/2014-0 and CAPES grant PROEX-
                                                            Chao-Lin Wu, Yu-Te Huang, Cheng-Lung Wu, Hao-
9152559/M.                                                    hua Chu, Polly Huang, and Ling-Jyh Chen. 2011.
                                                              An adaptive duty-cycle scheme for gps scheduling
                                                              in mobile location sensing applications. Proc. of
References                                                    PhoneSense .
Prithu Banerjee, Sayan Ranu, and Sriram Raghavan.           Llewelyn Davies Yeang et al. 2000. Urban design com-
   2014. Inferring uncertain trajectories from partial        pendium. English Partnerships/Housing Corpora-
   observations. In 2014 IEEE International Confer-           tion, London .
   ence on Data Mining. IEEE, pages 30–39.
                                                            Jing Yuan, Yu Zheng, Chengyang Zhang, Xing Xie,
Richard G Baraniuk. 2011.    More is less: sig-                and Guang-Zhong Sun. 2010. An interactive-voting
  nal processing and the data deluge.   Science                based map matching algorithm. In Mobile Data
  331(6018):717–719.                                           Management (MDM), 2010 Eleventh International
Meng-Fen Chiang, Yung-Hsiang Lin, Wen-Chih Peng,               Conference on. IEEE, pages 43–52.
 and Philip S Yu. 2013. Inferring distant-time loca-
                                                            Kai Zheng, Yu Zheng, Xing Xie, and Xiaofang Zhou.
 tion in low-sampling-rate trajectories. In Proceed-
                                                              2012.     Reducing uncertainty of low-sampling-
 ings of the 19th ACM SIGKDD international con-
                                                              rate trajectories. In Data Engineering (ICDE),
 ference on Knowledge discovery and data mining.
                                                              2012 IEEE 28th International Conference on. IEEE,
 ACM, pages 1454–1457.
                                                              pages 1144–1155.
Zhenni Feng and Yanmin Zhu. 2016. A survey on tra-
  jectory data mining: Techniques and applications.         Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and
  IEEE Access 4:2056–2067.                                    Wei-Ying Ma. 2008. Understanding mobility based
                                                              on gps data. In Proceedings of the 10th international
Joshua S Greenfeld. 2002. Matching gps observations           conference on Ubiquitous computing. ACM, pages
   to locations on a digital map. In Transportation Re-       312–321.
   search Board 81st Annual Meeting.
                                                            Yu Zheng, Xing Xie, and Wei-Ying Ma. 2010. Geolife:
DMJM HARRIS, AECOM, and Clarke Caton Hintz.                   A collaborative social networking service among
 2008. New jersey long range transportation plan              user, location and trajectory. IEEE Data Eng. Bull.
 2030. technical memorandum. task 11: Local street            33(2):32–39.


                                                      166
Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma.
  2009. Mining interesting locations and travel se-
  quences from gps trajectories. In Proceedings of the
  18th international conference on World wide web.
  ACM, pages 791–800.
Yu Zheng and Xiaofang Zhou. 2011. Computing with
  Spatial Trajectories. Springer New York.


                                                     167