Self-Adapting Trajectory Segmentation Agnese Bonavita Riccardo Guidotti Mirco Nanni Scuola Normale Superiore University of Pisa ISTI-CNR, Pisa Pisa, Italy Pisa, Italy Pisa, Italy agnese.bonavita@sns.it riccardo.guidotti@di.unipi.it mirco.nanni@isti.cnr.it ABSTRACT section), each of them apparently capturing well some specific Identifying the portions of trajectory data where movement ends concept or some application-specific idea of stop. For instance, and a significant stop starts is a basic, yet fundamental task that some solutions simply identify the moments where the object can affect the quality of any mobility analytics process. Most of did not move, based on some thresholds, while others select the the many existing solutions adopted by researchers and practi- stops that have a duration compatible with some specific task, tioners are simply based on fixed spatial and temporal thresholds for instance discarding stops at a supermarket if their duration stating when the moving object remained still for a significant is physically too short to be able to enter, buy and exit. However, amount of time, yet such thresholds remain as static parame- most existing solutions suffer from two important limitations: ters for the user to guess. In this work we study the trajectory (i) they are based on critical thresholds that the user needs to segmentation from a multi-granularity perspective, looking for choose accurately, and in most cases it is difficult to understand a better understanding of the problem and for an automatic, what value is the best; (ii) such thresholds are global, i.e. the same parameter-free and user-adaptive solution that flexibly adjusts threshold value applies to all the moving individuals, irrespective the segmentation criteria to the specific user under study. Experi- of any distinctive characteristics they might have. The reason of ments over real data and comparison against simple competitors the latter is that, while an overall evaluation might be performed show that the flexibility of the proposed method has a positive to guide the choice of a global threshold, doing that separately impact on results. for each individual might be impossible if their number is huge. In this work we try to overcome the limitations highlighted KEYWORDS above, providing a general methodology that inspects the mobil- ity of the individual, and identifies segmentation thresholds that Mobility Data Mining, Segmentation, User Modeling apparently match her mobility features. The process allows to get rid of any input parameter, adapts thresholds to each single 1 INTRODUCTION individual and, most importantly, is completely automatic, thus Thanks to the wide diffusion of localization technologies and applicable to large pools of users. mobile services based on the positioning of users and devices, the availability of mobility traces is increasing fast in several applica- The paper is organized as follows: Section 2 discusses the re- tion domains. Location-based services provided through smart- lated works and how our proposal differs from existing solutions; phones are nowadays extremely popular, from nearby restaurant Section 3 provides some preliminary definitions; Section 4 defines suggestions to travel assistants, and in the near future all cir- the problem we want to tackle; Section 5 introduces our proposed culating vehicles will be equipped with localization capabilities method to solve the problem; Section 6 defines some evaluation for their continuous monitoring. The vast amounts of data that measures to quantify the quality of a segmentation; Section 7 this trend leads to produce and collect open the door to several provides empirical quantitative and qualitative evaluations of opportunities of converting them into better services, economical results, also comparing against a few baselines; finally, Section returns, more sustainable cities, improved living conditions, etc. 8 closes the paper with some conclusions and pointers towards All this starts from appropriate mobility analysis operations able future developments. to extract from raw data usable and useful information, such as deeper domain knowledge, patterns, models and forecasts. In mobility analytics one of the fundamental concepts is move- 2 RELATED WORK ment, meaning with that the part of mobility data that describes Segmentation is a technique for decomposing a given sequence a transfer from one place where the individual (or the object) into homogeneous and possibly meaningful pieces, or segments was staying, to another one were the user will stop. Identifying such that the data in each segment describe a simple event or movements in the raw stream of positions, for instance the con- structure. Segmentation methods are widely used for extracting tinuous flow of GPS traces of a vehicle, is essential yet non-trivial. structures from sequences, and are applied in a large variety of While it is simple to define a stop in geometrical terms, it is much contexts [22]: time series [4, 9], genomic sequences [15, 17, 18], less clear how to define significant stops, i.e. stops that might and text [12], to cite a few. have some meaning for the user (for instance, stopping to do The segmentation of human trajectories is a very valuable task some activity before leaving), as opposed to stops that are simply as it enables the development of mobility data models [7, 19] and incidental (for instance, due to a small traffic jam). applications like carpooling [6], or trajectory prediction [24]. Var- Practitioners in the mobility analytics domain defined several ious simple approaches are currently adopted in practice. In [23] simple strategies to select stops in the mobility data stream (a human trajectories are extracted adopting a predefined rule based brief account of literature on this topic is provided in the next on a pair of spatio-temporal parameters regulating the end of a © 2020 Copyright held by the owner/author(s). Published in the Workshop Pro- trajectory and the start of the subsequent one. Similarly, in [8] ceedings of the EDBT/ICDT 2020 Joint Conference, March 30-April 2, 2020: on the trajectory is divided into subsequent trips if the time inter- CEUR-WS.org val of “nonmovement” exceeds a certain threshold. In [26] it is Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). described a change-point-based segmentation approach for GPS trajectories according to the transportation means adopting a uni- Definition 3.1 (Individual trajectory). Given a user u, her Indi- versal threshold for determining whether a segment is “walk” or vidual Trajectory Tu is the sequence of n points Tu = ⟨p1, . . . , pn ⟩ “nonwalk”. The work in [3] presents a theoretical framework that that describes her position in time, where each point p ∈ Tu is computes an optimal segmentation by using several criteria (e.g., defined as a triple p = (p.x, p.y, p.t), representing its spatial co- speed, direction, location disk) that are satisfied in each partition, ordinates x and y and the corresponding timestamp t. Moreover, thus making the approach local, and applied computational ge- points are in chronological order, i.e. ∀1 < i ≤ n.pi−1 .t < pi .t. ometry methods. However, their methods are general and not Definition 3.2 (Pseudo-stop duration). Given an individual tra- clearly applicable to the human trajectory context, where a trip jectory T = ⟨p1, . . . , pn ⟩ and a spatial threshold σ , the Pseudo- can be complex and not show the geometrical/movement unifor- stop duration associated to point pi is defined as SD(T , i) = mity the methods look for. Finally, each criterion corresponds min{p j .t − pi .t |i < j ≤ n ∧ d(pi , p j ) > σ }, where d represents to thresholds that the user must set, without clear guidelines on the geometrical Euclidean or geographical distance. how to choose them. The authors of [25] segment the trajectories in two steps. Notice that the last point pn will have SD(T , n) = min ∅ = ∞. The first segmentation is performed by means of simple policies Definition 3.3 (Segmented trajectory). Given a trajectory T = with respect to temporal and/or spatial predefined constraints. ⟨p1, . . . , pn ⟩, a spatial threshold σ and a temporal threshold τ , Then, the trajectories are divided into stops and moves observing we define the (σ, τ )-segmentation of T as T σ ,τ = ⟨S 1, . . . , Sm ⟩, variations of the speed of the object. If the variations of the speed such that: is below a speed threshold and there is a sufficient number of observations, then the portion of trajectory is annotated as a (i) ∀1 ≤ i ≤ m.∃1 ≤ s < e ≤ n : Si = ⟨ps , ps+1, . . . , pe ⟩ Ðm stop. The speed threshold is not general but changes according (ii) i=1 set(Si ) = set(T ) to the user behavior and also to the surrounding of the stop. (iii) ∀1 ≤ i ≤ m.∀1 ≤ j ≤ |Si | : SD(Si , j) > τ ⇔ j = |Si | In [20] is defined a measure of the density of the points in the (iv) ∀1 ≤ i ≤ m.Si is maximal neighbourhood of each trajectory point, the Spatio-Temporal where set(I ) = {p ∈ I }. Kernel Window (STKW) statistics. To determine the start and end Conditions (i) and (ii) imply that the segments of the seg- points of segments, the algorithm looks for maximal changes in mented trajectory of T form a partitioning of the elements of T STKW values. The focus of the approach is on capturing changes in the strictly mathematical sense. Moreover, conditions (iii) and of transportation mode, including stops, which are simply points (iv) state that all the points in a segment are movement points, i.e. with low speed. their pseudo-stop duration is smaller than the given threshold, In addition to those mentioned above, several other solutions excepted the last point. Therefore, each point in T that has a high to the trajectory segmentation problem have been proposed in pseudo-stop duration will act as a split point, and corresponds to literature, yet with objectives different from ours. For example, a distinct partition in T σ ,τ . cost-function based strategies were presented in [11][10], while clustering-based ones are introduced in [13] [14]. All these ap- proaches are focused on splitting a movement into homogeneous 4 PROBLEM FORMULATION parts, rather than discovering significant stops, which is the pur- Existing trajectory segmentation methods assume that the same pose of this paper. rules and the same parameters should apply to all moving objects. Since different objects can show very different movement char- In this work we provide a segmentation method that, opposed acteristics, the above assumption leads to make choices that on to most of the approaches mentioned above, is not based on fixed average fit best the dataset, yet potentially making sub-optimal space and/or time thresholds to be fixed by the user – this is choices on single individuals. the case, for instance, of [8, 23, 25, 26]. Instead, we aim to make Our objective is to overcome this limitation, making the seg- the segmentation parameter-free and also adaptive to the single mentation process adaptive to the individual and taking into con- user’s data, giving the opportunity to have different kinds of sideration her overall mobility. Our problem statement extends segmentation over different users. Also, our approach is com- the traditional formulation of segmentation as a threshold-based plementary to the STKW-based one [20], as the latter aims to operation, thus the core issue is to find good parameter values differentiate movements with different speed profiles, including for each user. stops as a particular example, while we focus on stop timing and try to understand which stops are actually significant (e.g. Definition 4.1 (Individual cut threshold problem). Given an Indi- not too short) for the user. A similar work was proposed in [5]. vidual TrajectoryTu , and a global spatial threshold σ , the problem Here the authors proposed a new approach called Octal Window is to identify the temporal threshold τ that yields the optimal Segmentation(OWS) for unsupervised trajectory segmentation. segmentation T σ ,τ . The intuition behind their approach is that when a moving object Since the number of moving objects can be very large, the changes behavior, this shift may be detected using only its geolo- process must be completely automatized and require no human cation over time. So the work focuses on finding these changes intervention. In Section 5 we will introduce a simple and effective only from the object’s coordinates using interpolation methods approach to solve the problem and thus find a suitable value of to generate an error signal. This error signal is then used as a τ for each user. In addition, some basic guidelines to choose a criterion to split the trajectories into sub-trajectories. value for the global spatial parameter will be provided. 3 SETTING THE STAGE 5 PROPOSED METHOD We start by defining trajectory segmentation based on a spatial The proposed solution to the individual cut threshold problem and a temporal threshold, in a way similar to standard approaches consists in fixing the spatial threshold to a global value (i.e. to be in literature. used for all users) and then in studying the segmentations that we would obtain by applying different temporal thresholds. We will Algorithm 1: ATS(T , σ ) start describing the process for choosing the temporal threshold, 1 Input: Individual trajectory T , spatial threshold σ which is the central part of the solution, and later discuss how 2 Output: Cut threshold τ the spatial one can be chosen. 3 S = ⟨ SD(T , i) | 1 ≤ i ≤ |T | ⟩; 4 F = frequency distribution of S values (F (a) = |{a ∈ S }|); 5.1 Self-Adaptive Trajectory Segmentation 5 C = {t |t ∈ ranдe(F ) ∧ TT (F (t), ⟨F (t )|t > t⟩) = true}; ′ ′ When very small values of τ are used, the segmentation obtained // TT (a, B) = Modified Thompson Tau Test of a vs. set B 6 return min C will contain a huge number of very short segments, till the ex- treme case where each point forms its own segment. As the threshold is increased, more and more segments will merge to- gether, since some of the former splitting points will fall below τ . The process is expected to gradually enlarge the trajectory segments by first including simple slowdowns (i.e. not really stop points), then temporary stops (e.g. at traffic lights), and so on. Our approach consists in (virtually) monitoring such progres- sion, and detect the moment where an anomalous increase in the number of segments is observed, which represents a sort of change of state. This follows the same kind of idea adopted in vari- ous unsupervised classification contexts, such as the knee method Figure 1: Frequency distribution of pseudo-stop durations for deciding the number k of clusters for the k-means algorithm, for a user trajetory (left), and the durations of the seg- or analogous solutions to choose the radius for density-based ments obtained using a specific threshold to cut the tra- clustering (e.g. DBScan). jectory (right). The threshold used corresponds to the ver- In our solution, rather than relying on visual or similar heuris- tical line on the left image. tic criteria, we will base the threshold selection on a statistical test. In particular, we will adopt the Modified Thompson Tau Test [2] which, basically, checks whether a given value fits the distribu- 5.2 Fixing the spatial threshold tion of the rest of the data or not. Since we look for anomalous In our approach, the threshold σ represents the minimum dis- values in a sequence, we apply the test iteratively, comparing tance between two (consecutive) points that can be considered each value n(t) (the number of segments obtained with τ = t) as a movement, and the temporal parameter is indeed measured against the values n(t ′ ) obtained for larger thresholds t ′ . as the time needed to make a movement. A simple way to fix This process yields a set of thresholds that have an anomalous its value is to adopt the minimum value that, according to the number of partitions as compared to the successive thresholds. accuracy of our dataset, cannot be mistaken for a positioning Among them, we simply choose the smallest one, thus deciding error, for instance due to GPS uncertainty. In our experiments we to select the segments that emerge at the first change of state, adopt road vehicle GPS traces that are expected to have errors not also representing shorter and finer granularity movements. larger than 10 meters, therefore we could fix σ = 20 (the worst The procedure, named ats (self-Adaptive Trajectory Segmen- case distance between two points that have the maximal error in tation) is summarized in Algorithm 1. Step 3 collects the pseudo- opposite directions). We decided to slightly increase it to 50 in stop durations SD of all the points i that make up the segment, order to stay on the safe side, also to take into account that errors and step 4 computes the frequency F of each value, basically rep- are slightly higher than average in urban centers, which is the resenting the number of new segments obtained using that value application context where our experiments are performed. Since as τ w.r.t. the previous smaller thresholds. In our implementa- we do not have data source from other kind of transport (ships, tion such frequency distribution is computed through smoothed planes, etc.) the selected threshold seems to meet our purposes. histograms, grouping values into bins of 1-minute width. Fig- However, empirical results confirm that the value of the global ure 1(left) shows the frequency distribution of a sample trajectory, parameter σ is not critical, as our approach shows a low sensitiv- the vertical line corresponding to a possible cut point. The re- ity to it. For this reason, the value we chose in our experiments sulting set of segments obtained is described in Figure 1(right) (50 meters) can be considered a good guess for generic vehicle in terms of segments duration. Finally, step 5 selects the fre- GPS data. Other data sources with a higher spatial uncertainty quency values that appear to be anomalous (based on the Modi- might require larger values. fied Thompson Tau Test) w.r.t. the frequency of larger thresholds, and step 6 returns the earliest time threshold that has an anoma- 6 EVALUATION MEASURES lous frequency. The reconstruction error generally used for evaluating segmen- Computational complexity. The cost of Algorithm 1 is dom- tation problems [1] just measures how well each segment can inated by step 3, since the computation of each pseudo-stop du- be approximated with one value, and thus seems not to fit with ration (SD) could in principle require to scan all the remaining trajectory segmentation. Therefore, similarly to clustering evalu- points of the individual trajectory, thus yielding a O(n2 ) cost, ation, we propose three internal evaluation measures [21]. Let where n is the size of the individual trajectory. However, in prac- T be the sequence of n points and TS = ⟨S 1, . . . , Sm ⟩ its seg- tical applications the trajectory portion needed for each SD is mentation. We denote with At = duration(T ) = pn .t − p1 .t the relatively small, leading to a quasi-linear cost. The remaining total elapsed time from the first point of p1 ∈ T to the last point pn ∈ T , and Ad = lenдth(T ) = n−1i=1 d(pi , pi+1 ) the total distance parts of the algorithm can be realized a linear time, including the Í Modified Thompson Tau Test which can be computed for each covered by the trajectory, computed by considering every couple of subsequent points in T . Let Mt = Si ∈TS duration(Si ) be the points through incremental updates. Í method MF .25 TP DC ratio sr #segms (avg ± std) ats .951 .951 .981 0.049 837.34±854.52 fts 120 .925 .996 .456 0.015 592.26 ±652.78 fts 1200 .948 .947 .997 0.053 746.28 ± 733.96 rts 1 .279 .268 .722 0.700 2094.85± 2472.36 rts 2 .124 .118 .877 0.883 899.59 ± 926.03 Table 1: Evaluation on Rome data. The first three columns show the measures adopted to test our new approach. The Figure 2: Time threshold distributions for trajectories in fourth one reports the ratio between the average sampling Rome and London. The peaks show the ideal thresholds period of non-stop points over that of all points, and the to be set to build the trajectories. last column is the number of segments. sum of the segments’ duration, i.e., the time spent driving, and method MF 25 TP DC ratio sr #segms (avg ± std) Md = Si ∈TS lenдth(Si ) be the sum of the segments’ length, i.e., Í ats .955 .953 .999 0.047 433.915±513.715 the distance traveled. Then, we define the following measures: fts 120 .958 .961 .944 0.040 1131.829± 1431.810 • time precision: TP = 1 − Mt /At fts 1200 .952 .950 .999 0.050 359.545 ± 410.606 • distance coverage: DC = Md /Ad rts 1 .267 .256 .695 1.007 2833.718 ± 4203.049 • mobility f-measure: MF β = (1+β 2 )·TP ·DC/((β 2 ·TP)+DC) rts 2 .035 .033 .958 1.008 445.645 ±527.969 All measures range from zero to one. The higher the value the Table 2: Evaluation on London data. The first three better the result. The objective of these measures is to promote columns show the measures adopted to test our new ap- segmentations capturing long stops (time precision) yet also cov- proach. The fourth one reports the ratio between the av- ering most of the overall distance (distance coverage). These two erage sampling period of non-stop points over that of all objectives are conflictual, since making stops longer reduces the points, and the last column is the number of trajectories. number of points that contribute to the distance covered. The mobility f-measure accounts for both aspects simultaneously. In the experiments we adopt β = 0.25, which weighs time precision much higher than distance coverage by augmenting the relevance of missing precision in stop detection. The reason is that i) it is relatively easy to guarantee an high distance coverage, and ii) the main focus of the paper is on the temporal aspects of trajectory partitioning. 7 EXPERIMENTS Figure 3: Boxplots for the MF .25 results. On the Rome data We experimented the proposed self-adaptive trajectory segmen- ats yields better results than the fts solutions, while in tation approach (ats) described above over a real dataset of GPS London all three produce almost the same results. The vehicle traces. The results commented in the following refer to variability of ats results is consistently smaller than the 2000 users of the area of Rome (Italy), and London (UK). The other methods, which is a sign of robustness. means and standard deviations of the sampling rate for the users analyzed are 12194.67 ± 22575.66 and 4385.76 ± 9359.14, for Rome Although every user has her own mobility behavior with its and London respectively. The high values and their high variabil- own mix of regular and more erratic behaviours [16], we observe ity is due to the presence of several long time gaps, typically due two clear peaks in the distributions for both Rome and London. to parking periods. This means that with respect to ats we mainly recognize two In the following we first analyze the personal temporal thresh- different types of users regarding to the minimum duration of the olds returned by ats, then we propose a quantitative and qual- stops. This supports the intuition behind our approach, namely itative evaluation of the results for understanding the benefits to have a self-adaptive procedure selecting a personalized best of the novel method with respect to existing ones. We compare temporal threshold for each user. Selecting one single threshold ats against the trajectory segmentation method with fixed pa- value for all the data might negatively affect the segmentation of rameters proposed in [23] (fts temp-thr ). Moreover, we adopt as some users, partitioning their trajectories either too much or too baseline a random trajectory segmentation method that segments little. The first peak is at about 600 seconds (∼ 10 minutes), while the sequence of points T = ⟨p1, . . . , pn ⟩ into m equal-length seg- the second peak at 1200 seconds (∼ 20 minutes). These values ments (i) with m randomly extracted between 2 and n/2 (rts 1 ), correspond to the temporal thresholds that the ats procedure or (ii) with m set to the number of segments returned by the uses to cut each trajectory. There is also a minority of users proposed ats method (rts 2 ). having values outside the two peeks. 7.1 Self-Adaptive Temporal Threshold 7.2 Comparison of Evaluation Measures We observe in Figure 2 the distribution of the time thresholds In this section we compare the proposed self-adaptive trajectory selected by ats for each user (vertical axis represents value fre- segmentation approach with the other methods taken into ac- quencies in log-scale). count. In Tables 1 and 2 we report the results obtained with all the methods. The first three columns show the evaluation measures described above. The fourth column shows the ratio between the average sampling period of movement points (thus discarding the stop portions of the user’s trajectory) and the average sampling period of the full trajectory, while in the last one the average number of segments with its standard deviation is given. In gen- eral, we can observe that the best results were obtained with the ats and fts methods, both for Rome and London. Analyzing the ratio (fourth column) we can see that values are low for both ats and the fts ones, meaning that the long stops are ignored (i.e. they are recognized as real stops) and just the short ones are considered. On the contrary, with the random approaches the ratio is bigger because the algorithm function evaluates all stops in the same way. Looking at the number of segments it is possible to note that fts and ats methods produce different quantities, especially the fts120 result produces less segments in the Rome case and much more in London. About the last two approaches, the rts1 method works with a random number of segments, so it is normal that the final result differs from the others, while the rts2 takes as number of segments the same of the ats approach Figure 4: Distributions of average number of points per so we aspect to achieve similar results. segment in Rome (left) and London (right). For the evaluation measures we can see that our new approach reached the goal we expected, i.e., yielding a quality of results which is always comparable or higher than fixed-threshold ap- proaches and more robust. Indeed, for both Rome and London the values obtained by ats are compatible with the fts results, even better in the MF .25 for Rome and in the distance coverage for London. In particular, in the Rome example, having a high MF .25 values means that also the time precision and the distance cover- age are well correlated in a way that produce a satisfying result. If we see the fts120 result we can note that the time precision is high but the distance coverage is very low because the algorithm builds short trajectories with few points. An analogous reason- ing can be done analyzing the fts1200 method which produces an excellent distance coverage score but a lower time precision. Our solution reaches a good balance, thanks to its self-adaptive characteristic that allows to control and correct the trajectory fragmentation, and all its evaluation measures are always either the best or the second best of the group. To have a better understanding of the quality of our new approach, the distribution of MF .25 values for the different ap- proaches on the two datasets is shown in Figure 3 through a boxplot visualization. For the Rome case we can observe that Figure 5: Distribution of the number of trajectory seg- with the ats approach the median value is the highest (closest ments over Rome (left column) and London (right column) to 1) and the inter-quartile range is smaller than the other two, with each segmentation method (on the rows, grouped by meaning that we have a smaller variabiliy and thus more robust family). results. The London case appears to be different, and the best MF .25 values are obtained with the fts1200 , with a median similar to ats and a slightly narrower box. This leaves room for future segment for Rome and London. For all methods, the majority of improvements of our methodology. segments have less than 20 points, probably meaning that most of the trips take place within the city. However, in the distribution tails some long trajectories with more points emerge. We observe 7.3 Comparison of Segmentation Statistics that the distribution peaks of ats place somehow in between the In the following we analyze other statistical indicators on the peaks of the two fts variants (though closer to fts1200 , especially trajectory segments extracted by the various methods. The next in London) thus finding a trade-off between them. Moreover we plots want to show other significant features for the segmentation can see that London and Rome distributions are different: London problem in order to compare their distribution and try to infer has a wider distribution than Rome, meaning that the variety of something more about the segmentation. In addiction discovering trips is greater in London. some hidden correlations between trajectory features and the In Figure 5 are displayed the distributions of the average num- segmentation approach could lead to a better understanding of ber of segments per user. In London most of the users have less the problem and highlight other relevant aspects. In Figure 4 than 20 trajectory segments. The peak of the distribution is be- we report the distributions of the average number of points per tween 5 and 10 segments. Between 30 and 100 segments the Figure 6: Distributions of the average length (top) and du- ration (bottom) for the trajectory segments returned by ats (left) and fts (right) for the area of Rome. distribution remains stable at a small value larger than zero. In Rome we observe a similar result with a peak between 15 and 20 trajectories. Also in this case, the peak of ats distribution tends to stay in the middle of the fts ones. In Figure 6 we compare the distribution of average length and average duration of the segments returned by ats (left) and fts (right) for the area of Rome. With the ats method the peak value is around 10km, thus confirming that most of the trips are short, and likely to take place around the city. With the fts methods the peak position depends on the temporal threshold imposed: with a threshold of 1200 seconds the average distance is similar to ats, while with 120 seconds it becomes lower and close to 5 km. The results for the rts methods are omitted, since their plots are very similar to the fts ones. Also, the plots in London show exactly the same kind of behaviour observed on Rome. In terms of segment duration, ats yields a distribution with a peak around 1200 − 1500 seconds (∼ 20 − 25 minutes). With the Figure 7: Trajectory segmentation returned by fts 1200 fts methods the peaks change: for fts120 the peak is around 500 (left) and ats (right). The user is travelling from South seconds while for fts1200 the peak is centered in 1800 seconds. to North. Top: spatial representation showing the trajec- Also in this case, the results on London are very similar and tory segments. Center: temporal segmentation showing omitted here. the inter-leaving time between GPS points. Bottom: zoom on the service area highlighted in the top maps where the user probably stops for ∼ 15 minutes. Best view in color. 7.4 Case Study In this section we show qualitatively on a case study the effec- tiveness of ats with respect to fts. In Figure 7 we report the segmentation returned by fts 1200 [23] (left) and by ats (right), the user is travelling from south to north. fts 1200 [23] returns 8 CONCLUSION two trajectories (green and blue), while ats returns three trajecto- The paper presented a user adaptive method for solving the ries (green, orange and blue). The second line of plots report the trajectory segmentation problem, a very common and useful inter-leaving time between consecutive GPS points. The colors task in mobility data mining, especially in preprocessing phases. match the trajectory segments, while stops are highlighted in Though preliminary, the experiments show that it is possible to red. We observe how ats identifies the short stop of less than 15 derive user-adaptive cut thresholds, improving the performances minutes at the service area similarly to the subsequent longer of the segmentation over less flexible solutions. This is an ongoing stop. On the other hand, fts 1200 considers the first stop as part work, and several improvements are being explored. Among of the green trajectory. The map in the bottom line of Figure 7 them, the future lines of research will aim to derive thresholds shows the service area which is very close to the GPS points for trajectory segmentation which are not only user-adaptive, reported on the bottom right corner of the map. This case study but also location-adaptive, thus considering the fact that a stop at highlights how various existing stops under a certain predefined different places might require time intervals of different duration threshold can be missed with a segmentation approach like fts, to be considered a significant stay – and thus a trajectory cut while a more data-driven and self-adaptive method like ats is point. Also, we will study the possibility of exploiting the context able to take into account specific user behavior and return a around the (moving) user, such as the mobility of other users and better result. the geographical area surrounding the candidate stops. ACKNOWLEDGMENTS This work is partially supported by the European Community H2020 programme under the funding scheme Track &Know (Big Data for Mobility Tracking Knowledge Extraction in Urban Ar- eas), G.A. 780754, https://trackandknowproject.eu/. REFERENCES [1] Ella Bingham. 2010. Finding segmentations of sequences. In Inductive Databases and Constraint-Based Data Mining. Springer, 177–197. [2] Ronald Bremer. 1995. Outliers in statistical data. Taylor & Francis. [3] Maike Buchin et al. 2010. An algorithmic framework for segmenting trajecto- ries based on spatio-temporal criteria. In SIGSPATIAL. ACM, 202–211. [4] Harmen J Bussemaker et al. 2000. Regulatory element detection using a probabilistic segmentation model. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology. 67–74. [5] Mohammad Etemad et al. 2019. A Trajectory Segmentation Algorithm Based on Interpolation-based Change Detection Strategies. In EDBT/ICDT Work- shops. [6] Riccardo Guidotti et al. 2017. Never drive alone: Boosting carpooling with network analysis. IS 64 (2017), 237–257. [7] Riccardo Guidotti et al. 2017. There’s a path for everyone: A data-driven personal model reproducing mobility agendas. In DSAA. IEEE, 303–312. [8] Sini Guo et al. 2018. GPS trajectory data segmentation based on probabilistic logic. International Journal of Approximate Reasoning 103 (2018), 227–247. [9] Johan Himberg et al. 2001. Time series segmentation for context recognition in mobile devices. In ICDM. IEEE, 203–210. [10] Amílcar Soares Júnior et al. 2015. GRASP-UTS: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29, 1 (2015), 46–68. [11] Amílcar Soares Júnior et al. 2018. A semi-supervised approach for the semantic segmentation of trajectories. In 19th IEEE International Conference on Mobile Data Management (MDM). 145–154. [12] Victor Lavrenko et al. 2000. Mining of concurrent text and time series. In KDD Workshop on Text Mining, Vol. 2000. 37–44. [13] Jae-Gil Lee et al. 2007. Trajectory Clustering: A Partition-and-Group Frame- work. In ACM SIGMOD. ACM, 593–604. [14] Luis Leiva and Enrique Vidal. 2013. Warped K-Means: An algorithm to cluster sequentially-distributed data. Information Sciences 237 (07 2013), 196–210. [15] Wentian Li. 2001. DNA Segmentation as a Model Selection Process. In Pro- ceedings of the Fifth Annual International Conference on Computational Biology (RECOMB ’01). ACM, 204–210. [16] Luca Pappalardo et al. 2015. Returners and explorers dichotomy in human mobility. Nature communications 6 (2015), 8166. [17] Adam Pavlıček et al. 2002. A compact view of isochores in the draft human genome sequence. FEBS letters 511, 1-3 (2002), 165–169. [18] Vasily E Ramensky et al. 2000. DNA segmentation through the Bayesian approach. Journal of Computational Biology 7, 1-2 (2000), 215–231. [19] Salvatore Rinzivillo et al. 2014. The purpose of motion: Learning activities from individual mobility networks. In DSAA. IEEE, 312–318. [20] Katarzyna Siła-Nowicka et al. 2016. Analysis of human mobility patterns from GPS trajectories and contextual information. IJGIS 30, 5 (2016), 881–906. [21] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2018. Introduction to data mining. Pearson Education India. [22] Evimaria Terzi and Panayiotis Tsaparas. 2006. Efficient algorithms for se- quence segmentation. In SDM. SIAM, 316–327. [23] Roberto Trasarti et al. 2011. Mining mobility user profiles for car pooling. In KDD. ACM, 1190–1198. [24] Roberto Trasarti et al. 2017. Myway: Location prediction via mobility profiling. IS 64 (2017), 350–367. [25] Zhixian Yan et al. 2013. Semantic trajectories: Mobility data computation and annotation. ACM TIST 4, 3 (2013), 49. [26] Yu Zheng et al. 2011. Recommending friends and locations based on individual location history. ACM Transactions on the Web 5, 1 (2011), 5.