=Paper=
{{Paper
|id=Vol-2578/BMDA3
|storemode=property
|title=Self-Adapting Trajectory Segmentation
|pdfUrl=https://ceur-ws.org/Vol-2578/BMDA3.pdf
|volume=Vol-2578
|authors=Agnese Bonavita,Riccardo Guidotti,Mirco Nanni
|dblpUrl=https://dblp.org/rec/conf/edbt/BonavitaGN20
}}
==Self-Adapting Trajectory Segmentation==
<pdf width="1500px">https://ceur-ws.org/Vol-2578/BMDA3.pdf</pdf>
<pre>
                              Self-Adapting Trajectory Segmentation
              Agnese Bonavita                                      Riccardo Guidotti                               Mirco Nanni
          Scuola Normale Superiore                                  University of Pisa                             ISTI-CNR, Pisa
                  Pisa, Italy                                           Pisa, Italy                                  Pisa, Italy
           agnese.bonavita@sns.it                             riccardo.guidotti@di.unipi.it                    mirco.nanni@isti.cnr.it

ABSTRACT                                                                         section), each of them apparently capturing well some specific
Identifying the portions of trajectory data where movement ends                  concept or some application-specific idea of stop. For instance,
and a significant stop starts is a basic, yet fundamental task that              some solutions simply identify the moments where the object
can affect the quality of any mobility analytics process. Most of                did not move, based on some thresholds, while others select the
the many existing solutions adopted by researchers and practi-                   stops that have a duration compatible with some specific task,
tioners are simply based on fixed spatial and temporal thresholds                for instance discarding stops at a supermarket if their duration
stating when the moving object remained still for a significant                  is physically too short to be able to enter, buy and exit. However,
amount of time, yet such thresholds remain as static parame-                     most existing solutions suffer from two important limitations:
ters for the user to guess. In this work we study the trajectory                 (i) they are based on critical thresholds that the user needs to
segmentation from a multi-granularity perspective, looking for                   choose accurately, and in most cases it is difficult to understand
a better understanding of the problem and for an automatic,                      what value is the best; (ii) such thresholds are global, i.e. the same
parameter-free and user-adaptive solution that flexibly adjusts                  threshold value applies to all the moving individuals, irrespective
the segmentation criteria to the specific user under study. Experi-              of any distinctive characteristics they might have. The reason of
ments over real data and comparison against simple competitors                   the latter is that, while an overall evaluation might be performed
show that the flexibility of the proposed method has a positive                  to guide the choice of a global threshold, doing that separately
impact on results.                                                               for each individual might be impossible if their number is huge.
                                                                                     In this work we try to overcome the limitations highlighted
KEYWORDS                                                                         above, providing a general methodology that inspects the mobil-
                                                                                 ity of the individual, and identifies segmentation thresholds that
Mobility Data Mining, Segmentation, User Modeling
                                                                                 apparently match her mobility features. The process allows to
                                                                                 get rid of any input parameter, adapts thresholds to each single
1    INTRODUCTION                                                                individual and, most importantly, is completely automatic, thus
Thanks to the wide diffusion of localization technologies and                    applicable to large pools of users.
mobile services based on the positioning of users and devices, the
availability of mobility traces is increasing fast in several applica-              The paper is organized as follows: Section 2 discusses the re-
tion domains. Location-based services provided through smart-                    lated works and how our proposal differs from existing solutions;
phones are nowadays extremely popular, from nearby restaurant                    Section 3 provides some preliminary definitions; Section 4 defines
suggestions to travel assistants, and in the near future all cir-                the problem we want to tackle; Section 5 introduces our proposed
culating vehicles will be equipped with localization capabilities                method to solve the problem; Section 6 defines some evaluation
for their continuous monitoring. The vast amounts of data that                   measures to quantify the quality of a segmentation; Section 7
this trend leads to produce and collect open the door to several                 provides empirical quantitative and qualitative evaluations of
opportunities of converting them into better services, economical                results, also comparing against a few baselines; finally, Section
returns, more sustainable cities, improved living conditions, etc.               8 closes the paper with some conclusions and pointers towards
All this starts from appropriate mobility analysis operations able               future developments.
to extract from raw data usable and useful information, such as
deeper domain knowledge, patterns, models and forecasts.
   In mobility analytics one of the fundamental concepts is move-                2    RELATED WORK
ment, meaning with that the part of mobility data that describes                 Segmentation is a technique for decomposing a given sequence
a transfer from one place where the individual (or the object)                   into homogeneous and possibly meaningful pieces, or segments
was staying, to another one were the user will stop. Identifying                 such that the data in each segment describe a simple event or
movements in the raw stream of positions, for instance the con-                  structure. Segmentation methods are widely used for extracting
tinuous flow of GPS traces of a vehicle, is essential yet non-trivial.           structures from sequences, and are applied in a large variety of
While it is simple to define a stop in geometrical terms, it is much             contexts [22]: time series [4, 9], genomic sequences [15, 17, 18],
less clear how to define significant stops, i.e. stops that might                and text [12], to cite a few.
have some meaning for the user (for instance, stopping to do                        The segmentation of human trajectories is a very valuable task
some activity before leaving), as opposed to stops that are simply               as it enables the development of mobility data models [7, 19] and
incidental (for instance, due to a small traffic jam).                           applications like carpooling [6], or trajectory prediction [24]. Var-
   Practitioners in the mobility analytics domain defined several                ious simple approaches are currently adopted in practice. In [23]
simple strategies to select stops in the mobility data stream (a                 human trajectories are extracted adopting a predefined rule based
brief account of literature on this topic is provided in the next                on a pair of spatio-temporal parameters regulating the end of a
© 2020 Copyright held by the owner/author(s). Published in the Workshop Pro-
                                                                                 trajectory and the start of the subsequent one. Similarly, in [8]
ceedings of the EDBT/ICDT 2020 Joint Conference, March 30-April 2, 2020: on      the trajectory is divided into subsequent trips if the time inter-
CEUR-WS.org                                                                      val of “nonmovement” exceeds a certain threshold. In [26] it is
Use permitted under Creative Commons License Attribution 4.0 International (CC
BY 4.0).                                                                         described a change-point-based segmentation approach for GPS
trajectories according to the transportation means adopting a uni-          Definition 3.1 (Individual trajectory). Given a user u, her Indi-
versal threshold for determining whether a segment is “walk” or          vidual Trajectory Tu is the sequence of n points Tu = ⟨p1, . . . , pn ⟩
“nonwalk”. The work in [3] presents a theoretical framework that         that describes her position in time, where each point p ∈ Tu is
computes an optimal segmentation by using several criteria (e.g.,        defined as a triple p = (p.x, p.y, p.t), representing its spatial co-
speed, direction, location disk) that are satisfied in each partition,   ordinates x and y and the corresponding timestamp t. Moreover,
thus making the approach local, and applied computational ge-            points are in chronological order, i.e. ∀1 < i ≤ n.pi−1 .t < pi .t.
ometry methods. However, their methods are general and not
                                                                            Definition 3.2 (Pseudo-stop duration). Given an individual tra-
clearly applicable to the human trajectory context, where a trip
                                                                         jectory T = ⟨p1, . . . , pn ⟩ and a spatial threshold σ , the Pseudo-
can be complex and not show the geometrical/movement unifor-
                                                                         stop duration associated to point pi is defined as SD(T , i) =
mity the methods look for. Finally, each criterion corresponds
                                                                         min{p j .t − pi .t |i < j ≤ n ∧ d(pi , p j ) > σ }, where d represents
to thresholds that the user must set, without clear guidelines on
                                                                         the geometrical Euclidean or geographical distance.
how to choose them.
    The authors of [25] segment the trajectories in two steps.               Notice that the last point pn will have SD(T , n) = min ∅ = ∞.
The first segmentation is performed by means of simple policies
                                                                            Definition 3.3 (Segmented trajectory). Given a trajectory T =
with respect to temporal and/or spatial predefined constraints.
                                                                         ⟨p1, . . . , pn ⟩, a spatial threshold σ and a temporal threshold τ ,
Then, the trajectories are divided into stops and moves observing
                                                                         we define the (σ, τ )-segmentation of T as T σ ,τ = ⟨S 1, . . . , Sm ⟩,
variations of the speed of the object. If the variations of the speed
                                                                         such that:
is below a speed threshold and there is a sufficient number of
observations, then the portion of trajectory is annotated as a                (i) ∀1 ≤ i ≤ m.∃1 ≤ s < e ≤ n : Si = ⟨ps , ps+1, . . . , pe ⟩
                                                                                       Ðm
stop. The speed threshold is not general but changes according               (ii)         i=1 set(Si ) = set(T )
to the user behavior and also to the surrounding of the stop.               (iii) ∀1 ≤ i ≤ m.∀1 ≤ j ≤ |Si | : SD(Si , j) > τ ⇔ j = |Si |
In [20] is defined a measure of the density of the points in the            (iv) ∀1 ≤ i ≤ m.Si is maximal
neighbourhood of each trajectory point, the Spatio-Temporal              where set(I ) = {p ∈ I }.
Kernel Window (STKW) statistics. To determine the start and end
                                                                            Conditions (i) and (ii) imply that the segments of the seg-
points of segments, the algorithm looks for maximal changes in
                                                                         mented trajectory of T form a partitioning of the elements of T
STKW values. The focus of the approach is on capturing changes
                                                                         in the strictly mathematical sense. Moreover, conditions (iii) and
of transportation mode, including stops, which are simply points
                                                                         (iv) state that all the points in a segment are movement points, i.e.
with low speed.
                                                                         their pseudo-stop duration is smaller than the given threshold,
    In addition to those mentioned above, several other solutions
                                                                         excepted the last point. Therefore, each point in T that has a high
to the trajectory segmentation problem have been proposed in
                                                                         pseudo-stop duration will act as a split point, and corresponds to
literature, yet with objectives different from ours. For example,
                                                                         a distinct partition in T σ ,τ .
cost-function based strategies were presented in [11][10], while
clustering-based ones are introduced in [13] [14]. All these ap-
proaches are focused on splitting a movement into homogeneous
                                                                         4    PROBLEM FORMULATION
parts, rather than discovering significant stops, which is the pur-      Existing trajectory segmentation methods assume that the same
pose of this paper.                                                      rules and the same parameters should apply to all moving objects.
                                                                         Since different objects can show very different movement char-
   In this work we provide a segmentation method that, opposed           acteristics, the above assumption leads to make choices that on
to most of the approaches mentioned above, is not based on fixed         average fit best the dataset, yet potentially making sub-optimal
space and/or time thresholds to be fixed by the user – this is           choices on single individuals.
the case, for instance, of [8, 23, 25, 26]. Instead, we aim to make         Our objective is to overcome this limitation, making the seg-
the segmentation parameter-free and also adaptive to the single          mentation process adaptive to the individual and taking into con-
user’s data, giving the opportunity to have different kinds of           sideration her overall mobility. Our problem statement extends
segmentation over different users. Also, our approach is com-            the traditional formulation of segmentation as a threshold-based
plementary to the STKW-based one [20], as the latter aims to             operation, thus the core issue is to find good parameter values
differentiate movements with different speed profiles, including         for each user.
stops as a particular example, while we focus on stop timing
and try to understand which stops are actually significant (e.g.            Definition 4.1 (Individual cut threshold problem). Given an Indi-
not too short) for the user. A similar work was proposed in [5].         vidual TrajectoryTu , and a global spatial threshold σ , the problem
Here the authors proposed a new approach called Octal Window             is to identify the temporal threshold τ that yields the optimal
Segmentation(OWS) for unsupervised trajectory segmentation.              segmentation T σ ,τ .
The intuition behind their approach is that when a moving object            Since the number of moving objects can be very large, the
changes behavior, this shift may be detected using only its geolo-       process must be completely automatized and require no human
cation over time. So the work focuses on finding these changes           intervention. In Section 5 we will introduce a simple and effective
only from the object’s coordinates using interpolation methods           approach to solve the problem and thus find a suitable value of
to generate an error signal. This error signal is then used as a         τ for each user. In addition, some basic guidelines to choose a
criterion to split the trajectories into sub-trajectories.               value for the global spatial parameter will be provided.

3    SETTING THE STAGE                                                   5    PROPOSED METHOD
We start by defining trajectory segmentation based on a spatial          The proposed solution to the individual cut threshold problem
and a temporal threshold, in a way similar to standard approaches        consists in fixing the spatial threshold to a global value (i.e. to be
in literature.                                                           used for all users) and then in studying the segmentations that we
would obtain by applying different temporal thresholds. We will              Algorithm 1: ATS(T , σ )
start describing the process for choosing the temporal threshold,
                                                                            1   Input: Individual trajectory T , spatial threshold σ
which is the central part of the solution, and later discuss how
                                                                            2   Output: Cut threshold τ
the spatial one can be chosen.
                                                                            3 S = ⟨ SD(T , i) | 1 ≤ i ≤ |T | ⟩;
                                                                            4 F = frequency distribution of S values (F (a) = |{a ∈ S }|);
5.1    Self-Adaptive Trajectory Segmentation
                                                                            5 C = {t |t ∈ ranдe(F ) ∧ TT (F (t), ⟨F (t )|t > t⟩) = true};
                                                                                                                      ′ ′
When very small values of τ are used, the segmentation obtained                    // TT (a, B) = Modified Thompson Tau Test of a vs. set B
                                                                            6 return min C
will contain a huge number of very short segments, till the ex-
treme case where each point forms its own segment. As the
threshold is increased, more and more segments will merge to-
gether, since some of the former splitting points will fall below
τ . The process is expected to gradually enlarge the trajectory
segments by first including simple slowdowns (i.e. not really stop
points), then temporary stops (e.g. at traffic lights), and so on.
    Our approach consists in (virtually) monitoring such progres-
sion, and detect the moment where an anomalous increase in
the number of segments is observed, which represents a sort of
change of state. This follows the same kind of idea adopted in vari-
ous unsupervised classification contexts, such as the knee method           Figure 1: Frequency distribution of pseudo-stop durations
for deciding the number k of clusters for the k-means algorithm,            for a user trajetory (left), and the durations of the seg-
or analogous solutions to choose the radius for density-based               ments obtained using a specific threshold to cut the tra-
clustering (e.g. DBScan).                                                   jectory (right). The threshold used corresponds to the ver-
    In our solution, rather than relying on visual or similar heuris-       tical line on the left image.
tic criteria, we will base the threshold selection on a statistical test.
In particular, we will adopt the Modified Thompson Tau Test [2]
which, basically, checks whether a given value fits the distribu-           5.2     Fixing the spatial threshold
tion of the rest of the data or not. Since we look for anomalous            In our approach, the threshold σ represents the minimum dis-
values in a sequence, we apply the test iteratively, comparing              tance between two (consecutive) points that can be considered
each value n(t) (the number of segments obtained with τ = t)                as a movement, and the temporal parameter is indeed measured
against the values n(t ′ ) obtained for larger thresholds t ′ .             as the time needed to make a movement. A simple way to fix
    This process yields a set of thresholds that have an anomalous          its value is to adopt the minimum value that, according to the
number of partitions as compared to the successive thresholds.              accuracy of our dataset, cannot be mistaken for a positioning
Among them, we simply choose the smallest one, thus deciding                error, for instance due to GPS uncertainty. In our experiments we
to select the segments that emerge at the first change of state,            adopt road vehicle GPS traces that are expected to have errors not
also representing shorter and finer granularity movements.                  larger than 10 meters, therefore we could fix σ = 20 (the worst
    The procedure, named ats (self-Adaptive Trajectory Segmen-              case distance between two points that have the maximal error in
tation) is summarized in Algorithm 1. Step 3 collects the pseudo-           opposite directions). We decided to slightly increase it to 50 in
stop durations SD of all the points i that make up the segment,             order to stay on the safe side, also to take into account that errors
and step 4 computes the frequency F of each value, basically rep-           are slightly higher than average in urban centers, which is the
resenting the number of new segments obtained using that value              application context where our experiments are performed. Since
as τ w.r.t. the previous smaller thresholds. In our implementa-             we do not have data source from other kind of transport (ships,
tion such frequency distribution is computed through smoothed               planes, etc.) the selected threshold seems to meet our purposes.
histograms, grouping values into bins of 1-minute width. Fig-               However, empirical results confirm that the value of the global
ure 1(left) shows the frequency distribution of a sample trajectory,        parameter σ is not critical, as our approach shows a low sensitiv-
the vertical line corresponding to a possible cut point. The re-            ity to it. For this reason, the value we chose in our experiments
sulting set of segments obtained is described in Figure 1(right)            (50 meters) can be considered a good guess for generic vehicle
in terms of segments duration. Finally, step 5 selects the fre-             GPS data. Other data sources with a higher spatial uncertainty
quency values that appear to be anomalous (based on the Modi-               might require larger values.
fied Thompson Tau Test) w.r.t. the frequency of larger thresholds,
and step 6 returns the earliest time threshold that has an anoma-           6     EVALUATION MEASURES
lous frequency.                                                             The reconstruction error generally used for evaluating segmen-
    Computational complexity. The cost of Algorithm 1 is dom-               tation problems [1] just measures how well each segment can
inated by step 3, since the computation of each pseudo-stop du-             be approximated with one value, and thus seems not to fit with
ration (SD) could in principle require to scan all the remaining            trajectory segmentation. Therefore, similarly to clustering evalu-
points of the individual trajectory, thus yielding a O(n2 ) cost,           ation, we propose three internal evaluation measures [21]. Let
where n is the size of the individual trajectory. However, in prac-         T be the sequence of n points and TS = ⟨S 1, . . . , Sm ⟩ its seg-
tical applications the trajectory portion needed for each SD is             mentation. We denote with At = duration(T ) = pn .t − p1 .t the
relatively small, leading to a quasi-linear cost. The remaining             total elapsed time from the first point of p1 ∈ T to the last point
                                                                            pn ∈ T , and Ad = lenдth(T ) = n−1i=1 d(pi , pi+1 ) the total distance
parts of the algorithm can be realized a linear time, including the                                         Í
Modified Thompson Tau Test which can be computed for each                   covered by the trajectory, computed by considering every couple
                                                                            of subsequent points in T . Let Mt = Si ∈TS duration(Si ) be the
points through incremental updates.                                                                               Í
                                                                         method     MF .25    TP      DC    ratio sr   #segms (avg ± std)
                                                                           ats       .951    .951    .981    0.049        837.34±854.52
                                                                         fts 120     .925    .996    .456    0.015       592.26 ±652.78
                                                                         fts 1200    .948    .947    .997    0.053       746.28 ± 733.96
                                                                          rts 1      .279    .268    .722    0.700      2094.85± 2472.36
                                                                          rts 2      .124    .118    .877    0.883       899.59 ± 926.03
                                                                        Table 1: Evaluation on Rome data. The first three columns
                                                                        show the measures adopted to test our new approach. The
Figure 2: Time threshold distributions for trajectories in              fourth one reports the ratio between the average sampling
Rome and London. The peaks show the ideal thresholds                    period of non-stop points over that of all points, and the
to be set to build the trajectories.                                    last column is the number of segments.


sum of the segments’ duration, i.e., the time spent driving, and         method     MF 25    TP      DC     ratio sr   #segms (avg ± std)
Md = Si ∈TS lenдth(Si ) be the sum of the segments’ length, i.e.,
       Í
                                                                           ats       .955    .953   .999    0.047        433.915±513.715
the distance traveled. Then, we define the following measures:           fts 120     .958    .961   .944    0.040      1131.829± 1431.810
      • time precision: TP = 1 − Mt /At                                  fts 1200    .952    .950   .999    0.050       359.545 ± 410.606
      • distance coverage: DC = Md /Ad                                    rts 1      .267    .256   .695    1.007      2833.718 ± 4203.049
      • mobility f-measure: MF β = (1+β 2 )·TP ·DC/((β 2 ·TP)+DC)         rts 2      .035    .033   .958    1.008       445.645 ±527.969
All measures range from zero to one. The higher the value the           Table 2: Evaluation on London data. The first three
better the result. The objective of these measures is to promote        columns show the measures adopted to test our new ap-
segmentations capturing long stops (time precision) yet also cov-       proach. The fourth one reports the ratio between the av-
ering most of the overall distance (distance coverage). These two       erage sampling period of non-stop points over that of all
objectives are conflictual, since making stops longer reduces the       points, and the last column is the number of trajectories.
number of points that contribute to the distance covered. The
mobility f-measure accounts for both aspects simultaneously. In
the experiments we adopt β = 0.25, which weighs time precision
much higher than distance coverage by augmenting the relevance
of missing precision in stop detection. The reason is that i) it is
relatively easy to guarantee an high distance coverage, and ii) the
main focus of the paper is on the temporal aspects of trajectory
partitioning.

7     EXPERIMENTS                                                       Figure 3: Boxplots for the MF .25 results. On the Rome data
We experimented the proposed self-adaptive trajectory segmen-           ats yields better results than the fts solutions, while in
tation approach (ats) described above over a real dataset of GPS        London all three produce almost the same results. The
vehicle traces. The results commented in the following refer to         variability of ats results is consistently smaller than the
2000 users of the area of Rome (Italy), and London (UK). The            other methods, which is a sign of robustness.
means and standard deviations of the sampling rate for the users
analyzed are 12194.67 ± 22575.66 and 4385.76 ± 9359.14, for Rome            Although every user has her own mobility behavior with its
and London respectively. The high values and their high variabil-       own mix of regular and more erratic behaviours [16], we observe
ity is due to the presence of several long time gaps, typically due     two clear peaks in the distributions for both Rome and London.
to parking periods.                                                     This means that with respect to ats we mainly recognize two
   In the following we first analyze the personal temporal thresh-      different types of users regarding to the minimum duration of the
olds returned by ats, then we propose a quantitative and qual-          stops. This supports the intuition behind our approach, namely
itative evaluation of the results for understanding the benefits        to have a self-adaptive procedure selecting a personalized best
of the novel method with respect to existing ones. We compare           temporal threshold for each user. Selecting one single threshold
ats against the trajectory segmentation method with fixed pa-           value for all the data might negatively affect the segmentation of
rameters proposed in [23] (fts temp-thr ). Moreover, we adopt as        some users, partitioning their trajectories either too much or too
baseline a random trajectory segmentation method that segments          little. The first peak is at about 600 seconds (∼ 10 minutes), while
the sequence of points T = ⟨p1, . . . , pn ⟩ into m equal-length seg-   the second peak at 1200 seconds (∼ 20 minutes). These values
ments (i) with m randomly extracted between 2 and n/2 (rts 1 ),         correspond to the temporal thresholds that the ats procedure
or (ii) with m set to the number of segments returned by the            uses to cut each trajectory. There is also a minority of users
proposed ats method (rts 2 ).                                           having values outside the two peeks.

7.1     Self-Adaptive Temporal Threshold                                7.2    Comparison of Evaluation Measures
We observe in Figure 2 the distribution of the time thresholds          In this section we compare the proposed self-adaptive trajectory
selected by ats for each user (vertical axis represents value fre-      segmentation approach with the other methods taken into ac-
quencies in log-scale).                                                 count. In Tables 1 and 2 we report the results obtained with all the
methods. The first three columns show the evaluation measures
described above. The fourth column shows the ratio between the
average sampling period of movement points (thus discarding the
stop portions of the user’s trajectory) and the average sampling
period of the full trajectory, while in the last one the average
number of segments with its standard deviation is given. In gen-
eral, we can observe that the best results were obtained with the
ats and fts methods, both for Rome and London. Analyzing the
ratio (fourth column) we can see that values are low for both
ats and the fts ones, meaning that the long stops are ignored
(i.e. they are recognized as real stops) and just the short ones are
considered. On the contrary, with the random approaches the
ratio is bigger because the algorithm function evaluates all stops
in the same way. Looking at the number of segments it is possible
to note that fts and ats methods produce different quantities,
especially the fts120 result produces less segments in the Rome
case and much more in London. About the last two approaches,
the rts1 method works with a random number of segments, so
it is normal that the final result differs from the others, while the
rts2 takes as number of segments the same of the ats approach           Figure 4: Distributions of average number of points per
so we aspect to achieve similar results.                                segment in Rome (left) and London (right).
    For the evaluation measures we can see that our new approach
reached the goal we expected, i.e., yielding a quality of results
which is always comparable or higher than fixed-threshold ap-
proaches and more robust. Indeed, for both Rome and London the
values obtained by ats are compatible with the fts results, even
better in the MF .25 for Rome and in the distance coverage for
London. In particular, in the Rome example, having a high MF .25
values means that also the time precision and the distance cover-
age are well correlated in a way that produce a satisfying result.
If we see the fts120 result we can note that the time precision is
high but the distance coverage is very low because the algorithm
builds short trajectories with few points. An analogous reason-
ing can be done analyzing the fts1200 method which produces
an excellent distance coverage score but a lower time precision.
Our solution reaches a good balance, thanks to its self-adaptive
characteristic that allows to control and correct the trajectory
fragmentation, and all its evaluation measures are always either
the best or the second best of the group.
    To have a better understanding of the quality of our new
approach, the distribution of MF .25 values for the different ap-
proaches on the two datasets is shown in Figure 3 through a
boxplot visualization. For the Rome case we can observe that            Figure 5: Distribution of the number of trajectory seg-
with the ats approach the median value is the highest (closest          ments over Rome (left column) and London (right column)
to 1) and the inter-quartile range is smaller than the other two,       with each segmentation method (on the rows, grouped by
meaning that we have a smaller variabiliy and thus more robust          family).
results. The London case appears to be different, and the best
MF .25 values are obtained with the fts1200 , with a median similar
to ats and a slightly narrower box. This leaves room for future         segment for Rome and London. For all methods, the majority of
improvements of our methodology.                                        segments have less than 20 points, probably meaning that most of
                                                                        the trips take place within the city. However, in the distribution
                                                                        tails some long trajectories with more points emerge. We observe
7.3    Comparison of Segmentation Statistics                            that the distribution peaks of ats place somehow in between the
In the following we analyze other statistical indicators on the         peaks of the two fts variants (though closer to fts1200 , especially
trajectory segments extracted by the various methods. The next          in London) thus finding a trade-off between them. Moreover we
plots want to show other significant features for the segmentation      can see that London and Rome distributions are different: London
problem in order to compare their distribution and try to infer         has a wider distribution than Rome, meaning that the variety of
something more about the segmentation. In addiction discovering         trips is greater in London.
some hidden correlations between trajectory features and the               In Figure 5 are displayed the distributions of the average num-
segmentation approach could lead to a better understanding of           ber of segments per user. In London most of the users have less
the problem and highlight other relevant aspects. In Figure 4           than 20 trajectory segments. The peak of the distribution is be-
we report the distributions of the average number of points per         tween 5 and 10 segments. Between 30 and 100 segments the
Figure 6: Distributions of the average length (top) and du-
ration (bottom) for the trajectory segments returned by
ats (left) and fts (right) for the area of Rome.


distribution remains stable at a small value larger than zero. In
Rome we observe a similar result with a peak between 15 and 20
trajectories. Also in this case, the peak of ats distribution tends
to stay in the middle of the fts ones.
   In Figure 6 we compare the distribution of average length and
average duration of the segments returned by ats (left) and fts
(right) for the area of Rome. With the ats method the peak value
is around 10km, thus confirming that most of the trips are short,
and likely to take place around the city. With the fts methods
the peak position depends on the temporal threshold imposed:
with a threshold of 1200 seconds the average distance is similar
to ats, while with 120 seconds it becomes lower and close to 5
km. The results for the rts methods are omitted, since their plots
are very similar to the fts ones. Also, the plots in London show
exactly the same kind of behaviour observed on Rome.
   In terms of segment duration, ats yields a distribution with a
peak around 1200 − 1500 seconds (∼ 20 − 25 minutes). With the          Figure 7: Trajectory segmentation returned by fts 1200
fts methods the peaks change: for fts120 the peak is around 500        (left) and ats (right). The user is travelling from South
seconds while for fts1200 the peak is centered in 1800 seconds.        to North. Top: spatial representation showing the trajec-
Also in this case, the results on London are very similar and          tory segments. Center: temporal segmentation showing
omitted here.                                                          the inter-leaving time between GPS points. Bottom: zoom
                                                                       on the service area highlighted in the top maps where the
                                                                       user probably stops for ∼ 15 minutes. Best view in color.
7.4    Case Study
In this section we show qualitatively on a case study the effec-
tiveness of ats with respect to fts. In Figure 7 we report the
segmentation returned by fts 1200 [23] (left) and by ats (right),
the user is travelling from south to north. fts 1200 [23] returns      8   CONCLUSION
two trajectories (green and blue), while ats returns three trajecto-   The paper presented a user adaptive method for solving the
ries (green, orange and blue). The second line of plots report the     trajectory segmentation problem, a very common and useful
inter-leaving time between consecutive GPS points. The colors          task in mobility data mining, especially in preprocessing phases.
match the trajectory segments, while stops are highlighted in          Though preliminary, the experiments show that it is possible to
red. We observe how ats identifies the short stop of less than 15      derive user-adaptive cut thresholds, improving the performances
minutes at the service area similarly to the subsequent longer         of the segmentation over less flexible solutions. This is an ongoing
stop. On the other hand, fts 1200 considers the first stop as part     work, and several improvements are being explored. Among
of the green trajectory. The map in the bottom line of Figure 7        them, the future lines of research will aim to derive thresholds
shows the service area which is very close to the GPS points           for trajectory segmentation which are not only user-adaptive,
reported on the bottom right corner of the map. This case study        but also location-adaptive, thus considering the fact that a stop at
highlights how various existing stops under a certain predefined       different places might require time intervals of different duration
threshold can be missed with a segmentation approach like fts,         to be considered a significant stay – and thus a trajectory cut
while a more data-driven and self-adaptive method like ats is          point. Also, we will study the possibility of exploiting the context
able to take into account specific user behavior and return a          around the (moving) user, such as the mobility of other users and
better result.                                                         the geographical area surrounding the candidate stops.
ACKNOWLEDGMENTS
This work is partially supported by the European Community
H2020 programme under the funding scheme Track &Know (Big
Data for Mobility Tracking Knowledge Extraction in Urban Ar-
eas), G.A. 780754, https://trackandknowproject.eu/.

REFERENCES
 [1] Ella Bingham. 2010. Finding segmentations of sequences. In Inductive
     Databases and Constraint-Based Data Mining. Springer, 177–197.
 [2] Ronald Bremer. 1995. Outliers in statistical data. Taylor & Francis.
 [3] Maike Buchin et al. 2010. An algorithmic framework for segmenting trajecto-
     ries based on spatio-temporal criteria. In SIGSPATIAL. ACM, 202–211.
 [4] Harmen J Bussemaker et al. 2000. Regulatory element detection using a
     probabilistic segmentation model. In Proceedings of the International Conference
     on Intelligent Systems for Molecular Biology. 67–74.
 [5] Mohammad Etemad et al. 2019. A Trajectory Segmentation Algorithm Based
     on Interpolation-based Change Detection Strategies. In EDBT/ICDT Work-
     shops.
 [6] Riccardo Guidotti et al. 2017. Never drive alone: Boosting carpooling with
     network analysis. IS 64 (2017), 237–257.
 [7] Riccardo Guidotti et al. 2017. There’s a path for everyone: A data-driven
     personal model reproducing mobility agendas. In DSAA. IEEE, 303–312.
 [8] Sini Guo et al. 2018. GPS trajectory data segmentation based on probabilistic
     logic. International Journal of Approximate Reasoning 103 (2018), 227–247.
 [9] Johan Himberg et al. 2001. Time series segmentation for context recognition
     in mobile devices. In ICDM. IEEE, 203–210.
[10] Amílcar Soares Júnior et al. 2015. GRASP-UTS: an algorithm for unsupervised
     trajectory segmentation. International Journal of Geographical Information
     Science 29, 1 (2015), 46–68.
[11] Amílcar Soares Júnior et al. 2018. A semi-supervised approach for the semantic
     segmentation of trajectories. In 19th IEEE International Conference on Mobile
     Data Management (MDM). 145–154.
[12] Victor Lavrenko et al. 2000. Mining of concurrent text and time series. In KDD
     Workshop on Text Mining, Vol. 2000. 37–44.
[13] Jae-Gil Lee et al. 2007. Trajectory Clustering: A Partition-and-Group Frame-
     work. In ACM SIGMOD. ACM, 593–604.
[14] Luis Leiva and Enrique Vidal. 2013. Warped K-Means: An algorithm to cluster
     sequentially-distributed data. Information Sciences 237 (07 2013), 196–210.
[15] Wentian Li. 2001. DNA Segmentation as a Model Selection Process. In Pro-
     ceedings of the Fifth Annual International Conference on Computational Biology
     (RECOMB ’01). ACM, 204–210.
[16] Luca Pappalardo et al. 2015. Returners and explorers dichotomy in human
     mobility. Nature communications 6 (2015), 8166.
[17] Adam Pavlıček et al. 2002. A compact view of isochores in the draft human
     genome sequence. FEBS letters 511, 1-3 (2002), 165–169.
[18] Vasily E Ramensky et al. 2000. DNA segmentation through the Bayesian
     approach. Journal of Computational Biology 7, 1-2 (2000), 215–231.
[19] Salvatore Rinzivillo et al. 2014. The purpose of motion: Learning activities
     from individual mobility networks. In DSAA. IEEE, 312–318.
[20] Katarzyna Siła-Nowicka et al. 2016. Analysis of human mobility patterns from
     GPS trajectories and contextual information. IJGIS 30, 5 (2016), 881–906.
[21] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. 2018. Introduction to
     data mining. Pearson Education India.
[22] Evimaria Terzi and Panayiotis Tsaparas. 2006. Efficient algorithms for se-
     quence segmentation. In SDM. SIAM, 316–327.
[23] Roberto Trasarti et al. 2011. Mining mobility user profiles for car pooling. In
     KDD. ACM, 1190–1198.
[24] Roberto Trasarti et al. 2017. Myway: Location prediction via mobility profiling.
     IS 64 (2017), 350–367.
[25] Zhixian Yan et al. 2013. Semantic trajectories: Mobility data computation and
     annotation. ACM TIST 4, 3 (2013), 49.
[26] Yu Zheng et al. 2011. Recommending friends and locations based on individual
     location history. ACM Transactions on the Web 5, 1 (2011), 5.

</pre>