<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Quality Metrics for Optimizing Parameters Tuning in Clustering Algorithms for Extraction of Points of Interest in Human Mobility</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Miguel Nue˜z del Prado Cortez Peru I</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I Technopark IDI miguel.nunez@peruidi.com</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugo Alatrista-Salas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GRPIAA Labs.</institution>
          ,
          <addr-line>PUCP</addr-line>
          <country>Peru I</country>
        </aff>
      </contrib-group>
      <fpage>14</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>Clustering is an unsupervised learning technique used to group a set of elements into nonoverlapping clusters based on some predefined dissimilarity function. In our context, we rely on clustering algorithms to extract points of interest in human mobility as an inference attack for quantifying the impact of the privacy breach. Thus, we focus on the input parameters selection for the clustering algorithm, which is not a trivial task due to the direct impact of these parameters in the result of the attack. Namely, if we use too relax parameters we will have too many point of interest but if we use a too restrictive set of parameters, we will find too few groups. Accordingly, to solve this problem, we propose a method to select the best parameters to extract the optimal number of POIs based on quality metrics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>The first step in inference attacks over mobility
traces is the extraction of the point of interest (POI)
from a trail of mobility traces. Indeed, this phase
impacts directly the global accuracy of an inference
attack that relies on POI extraction. For instance,
if an adversary wants to discover Alice’s home and
place of work the result of the extraction must be as
accurate as possible, otherwise they can confuse or
just not find important places. In addition, for a more
sophisticated attack such as next place prediction, a
mistake when extracting POIs can decrease
significantly the global precision of the inference. Most
of the extraction techniques use heuristics and
clustering algorithms to extract POIs from location data.
On one hand, heuristics rely on the dwell time, which
is the lost of signal when user gets into a building.
Another used heuristic is the residence time, which
represents the time that a user spends at a
particular place. On the other hand, clustering algorithms
group nearby mobility traces into clusters.</p>
      <p>In particular, in the context of POI extraction, it is
important to find a suitable set of parameters, for a
specific cluster algorithm, in order to obtain a good
accuracy as result of the clustering. The main
contribution of this paper is a methodology to find a
“optimal” configuration of input parameters for a
clustering algorithm based on quality indices. This
optimal set of parameters allows us to have the
appropriate number of POIs in order to perform another
inference attack. This paper is organized as follows.
First, we present some related works on parameters
estimation techniques in Section 2. Afterwards, we
describe the clustering algorithms used to perform
the extraction of points of interests (POIs) as well as
the metrics to measure the quality of formed clusters
in sections 3 and 4, respectively. Then, we introduce
the method to optimize the choice of the parameters
in Section 5. Finally, Section 6 summarizes the
results and presents the future directions of this paper.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>
        Most of the previous works estimate the parameters
of the clustering algorithms for the point of interest
extraction by using empirical approaches or highly
computationally expensive methods. For instance,
we use for illustration purpose two classical
clustering approaches, K-means
        <xref ref-type="bibr" rid="ref14">(MacQueen et al., 1967)</xref>
        and DBSCAN
        <xref ref-type="bibr" rid="ref7">(Ester et al., 1996)</xref>
        . In the former
clustering algorithm, the main issue is how to
determine k, the number of clusters. Therefore, several
approaches have been proposed to address this issue
        <xref ref-type="bibr" rid="ref11 ref15">(Hamerly and Elkan, 2003; Pham et al., 2005)</xref>
        . The
latter algorithm relies on OPTICS
        <xref ref-type="bibr" rid="ref1">(Ankerst et al.,
1999)</xref>
        algorithm, which searches the space of
parameters of DBSCAN in order to find the optimal
number of clusters. The more parameters the clustering
algorithm has, the bigger the combinatorial space of
parameters is. Nevertheless, the methods to calibrate
cluster algorithm inputs do not guarantee a good
accuracy for extracting meaningful POIs. In the next
section, we described the cluster algorithms used in
our study.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Clustering algorithms for extraction of points of interest</title>
      <p>To perform the POI extraction, we rely on the
following clustering algorithms:
3.1</p>
      <sec id="sec-3-1">
        <title>Density Joinable Cluster (DJ-Cluster)</title>
        <p>
          DJ-Cluster
          <xref ref-type="bibr" rid="ref17">(Zhou et al., 2004)</xref>
          is a clustering
algorithm taking as input a minimal number of points
minpts, a radius r and a trail of mobility traces
M . This algorithm works in two phases. First, the
pre-processing phase discards all the moving points
(i.e. whose speed is above ✏, for ✏ a small value)
and then, squashes series of repeated static points
into a single occurrence for each series. Next, the
second phase clusters the remaining points based
on neighborhood density. More precisely, the
number of points in the neighborhood must be equal or
greater than minpts and these points must be within
radius r from the medoid of a set of points. Where
medioid is the real point m that minimizes the sum of
distances from the point m to the other points in the
cluster. Then, the algorithm merges the new cluster
with the clusters already computed, which share at
least one common point. Finally, during the
merging, the algorithm erases old computed clusters and
only keeps the new cluster, which contains all the
other merged clusters.
3.2
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Density Time Cluster (DT-Cluster)</title>
        <p>
          DT-Cluster
          <xref ref-type="bibr" rid="ref12">(Hariharan and Toyama, 2004)</xref>
          is an
iterative clustering algorithm taking as input a distance
threshold d, a time threshold t and a trail of
mobility traces M . First, the algorithm starts by building
a cluster C composed of all the consecutive points
within distance d from each other. Afterwards, the
algorithm checks if the accumulated time of
mobility traces between the youngest and the oldest ones
is greater than the threshold t. If it is the case, the
cluster is created and added to the list of POIs.
Finally as a post-processing step, DT-Cluster merges
the clusters whose medioids are less than d/3 far
from each other.
3.3
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>Time Density (TD-Cluster)</title>
        <p>
          Introduced in
          <xref ref-type="bibr" rid="ref9">(Gambs et al., 2011)</xref>
          , TD-Cluster is
a clustering algorithm inspired from DT Cluster,
which takes as input parameters a radius r, a time
window t, a tolerance rate ⌧ , a distance threshold d
and a trail of mobility traces M . The algorithm starts
by building iteratively clusters from a trail M of
mobility traces that are located within the time window
t. Afterwards, for each cluster, if a fraction of the
points (above the tolerance rate ⌧ ) are within radius
r from the medoid, the cluster is integrated to the list
of clusters outputted, whereas otherwise it is simply
discarded. Finally, as for DT Cluster, the algorithm
merges the clusters whose medoids are less than d
far from each other.
3.4
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Begin-end heuristic</title>
        <p>
          The objective of the begin and end location finder
inference attack
          <xref ref-type="bibr" rid="ref8">(Gambs et al., 2010)</xref>
          is to take as
meaningful points the first and last of a journey.
More precisely, this heuristic considers that the
beginning and ending locations of a user, for each
working day, might convey some meaningful
information.
        </p>
        <p>Since we have introduced the different clustering
algorithms to extract points of interest, we present in
the next section the indices to measure the quality of
the clusters.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Cluster quality indices</title>
      <p>One aspect of the extraction of POIs inference
attacks is the quality of the obtained clusters, which
impacts on the precision and recall of the attack.
In the following subsection we describe some
metrics to quantify how accurate or “how good“ is the
outcome of the clustering task. Intuitively, a good
clustering is one that identifies a group of clusters
that are well separated one from each other, compact
and representative. Table 1 summarizes the notation
used in this section.</p>
      <sec id="sec-4-1">
        <title>Symbol</title>
        <p>C
ci
nc
mi
d(x, y)
|ci|
m0
m00
|C|</p>
      </sec>
      <sec id="sec-4-2">
        <title>Definition</title>
        <p>An ensemble of clusters.</p>
        <p>The ith cluster of C.</p>
        <p>The number of clusters in C.</p>
        <p>The medoid point of the ith cluster.</p>
        <p>The Euclidean distance between x and y.</p>
        <p>The number of points in a cluster ci.</p>
        <p>The closest point to the medoid mi.</p>
        <p>The second closest point to the medoid mi.</p>
        <p>
          The total number of points in a set of C.
The intra-inter cluster ratio
          <xref ref-type="bibr" rid="ref13">(Hillenmeyer, 2012)</xref>
          measures the relation between compact (Equation 1)
and well separated groups (Equation 3). More
precisely, we first take the inter-cluster distance, which
is the average distance from each point in a cluster
ci to its medoid mi.
        </p>
        <p>DIC(ci) =
d(xj , mi)
(1)
1
|ci|
1
|ci|</p>
        <p>X
xj2 ci,xj6=mi
Then, the average intra-cluster distance (DIC) is
computed using Equation 2.</p>
        <p>AV G DIC(C) =
Afterwards, the mean distance among all medoids
(DOC) in the cluster C is computed, using Equation
3.</p>
        <p>1
|nC | ⇥ (|nC |
1)
|C|
X
|C|</p>
        <p>
          X
ci2 C cj2 C,i6=j
Inspired by the Ben-David and Ackerman
          <xref ref-type="bibr" rid="ref2">(BenDavid and Ackerman, 2008)</xref>
          k-additive Point
Margin (K-AM) metric , which evaluates how well
centered clusters are. We measure the difference
between the medoid mi and its two closest points m0
and m00 of a given group ci belonging to a cluster C
(Equation 5).
        </p>
        <p>K</p>
        <p>AM (ci) = d(mi, m0i0)
d(mi, m0i)
(5)
Since the average of the k-additive point margins for
all groups ci in a cluster C is computed, we take the
ratio between the average k-additive Point Margin
and the minimal inter-cluster distance (Equation 1)
as shown in Equation 6.</p>
        <p>1 Pnc
AM (C) = minci2 C nc ci2 DCIKC(ci)
AM (ci)
(6)
The additive margin method has a linear complexity
in the number of clusters. This metric gives a high
value for a well centered clusters.
4.3</p>
        <sec id="sec-4-2-1">
          <title>Information loss</title>
          <p>
            The information loss ratio is a metric inspired by the
work of Sole and coauthors
            <xref ref-type="bibr" rid="ref16">(Sole´ et al., 2012)</xref>
            . The
basic idea is to measure the percent of information
that is lost while representing original data only by
a certain number of groups (e.g., when we represent
the POIs by the cluster medoids instead of the whole
set of points). To evaluate the percent of information
loss, we compute the sum of distance of each point
represented by xi to its medoid mi for all clusters
ci 2 C as we shown in Equation 7.
          </p>
          <p>nc |c|
SSE(C) = X X d(xj , mi)</p>
          <p>ci2 C xj2 ci
Then, we estimate the accumulated distance of all
points of a trail of mobility traces in the cluster C to
a global centroid (GC) using the following equation
Equation 8.</p>
          <p>|C|
SST (C) = X d(xi, GC)</p>
          <p>xi2 C
Finally, the ratio between aforementioned distances
is computed using Equation 9, which results in the
(7)
(8)
diam(ci) = maxx,y2 ci,x6=yd(x, y)
(10)</p>
          <p>DBI(C) =
information loss ratio.</p>
          <p>IL(C) =</p>
          <p>SSE(C)
SST (C)
The computation of this ratio has a linear
complexity. The lowest is the value of this ratio, the more
representative the clusters are.
4.4</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Dunn index</title>
          <p>
            This quality index
            <xref ref-type="bibr" rid="ref10 ref6">(Dunn, 1973; Halkidi et al., 2001)</xref>
            attempts to recognize compact and well-separated
clusters. The computation of this index relies on
a dissimilarity function (e.g. Euclidean distance d)
between medoids and the diameter of a cluster (c.f,
Equation 10) as a measure of dispersion.
          </p>
          <p>Then, if the clustering C is compact (i.e, the
diameters tend to be small) and well separated (distance
between cluster medoids are large), the result of the
index, given by the Equation 11, is expected to be
large.</p>
          <p>DIL(C) = minci2 C [mincj2 C,j=i+1
[
The greater is this index, the better the performance
of the clustering algorithm is assumed to be. The
main drawbacks of this index is the computational
complexity and the sensitivity to noise in data.
4.5</p>
        </sec>
        <sec id="sec-4-2-3">
          <title>Davis-Bouldin index</title>
          <p>
            The objective of the Davis-Bouldin index (DBI)
            <xref ref-type="bibr" rid="ref10 ref5">(Davies and Bouldin, 1979; Halkidi et al., 2001)</xref>
            is to
evaluate how well the clustering was performed by
using properties inherent to the dataset considered.
First, we use a scatter function within the cluster ci
of the clustering C (Equation 12).
          </p>
          <p>v</p>
          <p>n</p>
          <p>S(ci) = utu n1c xXj2ci d(xj , mi)2
Then, we compute the distance between two
different clusters ci and cj , given by Equation 13.
q
M (ci, cj ) =</p>
          <p>Afterwards, a similarity measure between two
clusters ci and cj , called R-similarity, is estimated, based
on Equation 14.
(14)
(16)
R(ci, cj ) =</p>
          <p>S(ci) + S(cj )</p>
          <p>M (ci, cj )
After that, the most similar cluster cj to ci is the
one maximizing the result of the function Rall(ci),
which is given by Equation 15 for i 6= j.</p>
          <p>Rall(ci) = maxcj2 C,i6=j R(ci, cj )
(15)
Finally, the DBI is equal to the average of the
similarity between clusters in the clustering set C
(Equation 16).
Ideally, the clusters ci 2 C should have the
minimum possible similarity to each other. Accordingly,
the lower is the DB index, the better is the
clustering formed. These indices would be used to
maximize the number of significant places a cluster
algorithm could find. More precisely, in the next section
we evaluate the cluster algorithm aforementioned as
well as the method to extract the meaningful places
using the quality indices.
5</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Selecting the optimal parameters for clustering</title>
      <p>
        In order to establish how to select the best set
of parameters for a given clustering algorithm, we
have computed the precision, recall and F-measure
of all users of LifeMap dataset
        <xref ref-type="bibr" rid="ref3">(Chon and Cha,
2011)</xref>
        . One of the unique characteristic of this
dataset is that the POIs have been annotated by
the users. Consequently, given a set of clusters
ci 2 C such that C = {c1, c2, c3, . . . , cn} and a
set of points of interest (POIs) defined by the users
Ppoi = {ppoi 1, ppoi 2, ppoi 3, . . . , ppoi n} we were
able to compute the precision, recall and f-measure
as we detail in the next subsection.
5.1
      </p>
      <sec id="sec-5-1">
        <title>Precision, recall and F-measure</title>
        <p>To compute the recall (c.f. Equation 17), we take as
input a clustering set C, the ground truth represented
by the vector Ppoi (which was defined manually by
each user) as well as a radius to count all the
clusters c 2 C that are within the radius of ppoi 2 Ppoi,
which represents the ”good clusters”. Then, the ratio
of the number of good clusters compared to the total
number of found clusters is computed. This measure
illustrates the ratio of extracted cluster that are POIs
divided by the total number of extracted clusters.</p>
        <sec id="sec-5-1-1">
          <title>Characteristics</title>
          <p>Total nb of users
Collection period (nb of days)</p>
          <p>Average nb of traces/user</p>
          <p>Total nb of traces
Min #Traces for a user
Max #Traces for a user</p>
          <p>To compute the recall (c.f. Equation 18), we take
as input a clustering set C, a vector of POIs Ppoi as
well as a radius to count the discovered POIs ppoi 2
Ppoi within a radius of the clusters c 2 C, which
represents the ”good POIs”. Then, the ratio between
the number of good POIs and the total number of
POIs is evaluated. This metric represents the percent
of the extracted unique POIs.
(18)
(19)
Recall =</p>
          <p>good P OIs
total number of P OIs</p>
          <p>Finally, the F-measure is defined as the weighted
average of the precision and recall as we can see in
Equation 19.</p>
          <p>F
measure = 2 ⇥
precision ⇥ recall
precision + recall
We present the dataset used for our experiments in
the next subsection.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Dataset description</title>
        <p>
          In order to evaluate our approach, we use the
LifeMap dataset
          <xref ref-type="bibr" rid="ref4">(Chon et al., 2012)</xref>
          , which is
composed of mobility traces of 12 user collected for a
year in Seoul, Korea. This dataset comprises
location (latitude and longitude) collected with a
frequency between 2 to 5 minutes with the user defined
point of interest as true if the mobility trace is
considered as important or meaningfull for each user.
Table 2 summarizes the main characteristics of this
dataset, such as the collect period, the average
number of traces per user, the total number of mobility
traces in the dataset, the minimal and maximal
number of users’ mobility traces.
        </p>
        <p>Since we have described our dataset, we present
the results of our experiments in the next subsection.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Experimental results</title>
        <p>This section is composed of two parts, in the first
part we compare the performance of the previously
described clustering algorithms, with two
baseline clustering algorithms namely k-means and
DBSCAN. In the second part, a method to select the
most suitable parameters for a clustering algorithm
is presented.</p>
        <p>Input parameters
Tolerance rate (%)
Tolerance rate (%)
Minpts (points)
Eps (Km.)
Merge distance (Km.)
Time shift (hour)
K (num. clusters)</p>
        <p>Possible values
{0.75, 0.8, 0.85, 0.9}
{0.75, 0.8, 0.85, 0.9}
{3, 4, 5, 6, 7, 8, 9, 10, 20, 50}
{0.01, 0.02, 0.05, 0.1, 0.2}
{0.02, 0.04, 0.1, 0.2, 0.4}
{1, 2, 3, 4, 5, 6}
{5, 6, 7,8, 9}</p>
        <p>In order to compare the aforementioned clustering
algorithms, we have take into account the precision,
recall, F-measure obtained, average execution time,
number of input parameters and time complexity.
To evaluate these algorithms, we used the LifeMap
dataset with POIs annotation and a set of different
parameters configurations for each algorithm, which
are summarized in Table 3. After running these
configurations, we obtained the results shown in Table
4 for the different input values.</p>
        <p>It is possible to observe that the precision of
DJCluster out performs better than the other
clustering algorithms. In terms of recall DBSCAN and
TD-Cluster perform the best but DJ-Cluster is just
behind them. Moreover, DJ-Cluster has the best
F-measure. Regarding the execution time,
DTClustering the fastest one while DJ-Cluster is the
slowest algorithm due to the preprocessing phase.
Despite the high computational time of DJ-Cluster,
this algorithm performs well in terms of F-measure.</p>
        <p>
          In the following, we describe our method to
choose “optimal” parameters for obtaining a good
F-measure. We have used the aforementioned
algorithms with a different set of input parameters
configurations for users with POIs annotations in the
LifeMap dataset
          <xref ref-type="bibr" rid="ref3">(Chon and Cha, 2011)</xref>
          . Once
clusters are built, we evaluate the clusters issued from
different configurations of distinct algorithms
using the previously described quality indices.
Afterwards, we were able to estimate the precision, recall
and F-measure using the manual annotation of POIs
by the users in the LifeMap dataset.
        </p>
        <p>Regarding the relation between the quality indices
and the F-measure, we studied the relationship
between these factors, in order to identify the indices
that are highly correlated with the F-measure, as
can be observed in Figure 1. We observe that the
two best performing indices, except for k-means,
are IL and DBI. The former shows a negative
correlation with respect to the F-measure. While the
latter, has a positive dependency to the F-measure.
Our main objective is to be able to identify the
relationship between quality and F-measure among the
previous evaluated clustering algorithms.
Accordingly, we discard the inter-intra cluster ratio (RII)
and the adaptive margin (AM), which only perform
well when using k-means and the DJ clustering
algorithms. Finally, we observe that the Dunn index
has a poor performance. Based on these
observations, we were able to propose an algorithm to
automatically choose the best configuration of input
parameters.
5.4</p>
      </sec>
      <sec id="sec-5-4">
        <title>Parameter selection method</title>
        <p>Let us define a vector of parameters pi 2 P and P a
set of vectors, such that P = {p1, p2, p3, . . . , pn},
a trail of mobility traces M of a user. From
previous sections we have the clustering function
C(pi) and the quality metrics Information Loss
IL(C) and Davis-Bouldin index DBI(C). Thus,
for each vector of parameters we have a tuple
composed of the trail of mobility traces, the result
of the clustering algorithm and the quality metrics
(pi, M, Cpi , ILCpi , DBICpi ). When we compute
the clustering algorithm and the quality metrics for
each vector of parameter for a given user u. We
define also a 0u matrix, which the matrix u sorted by
IL ascending. Finally, the result matrix u is of the
form:
u =
p1
p2
p3
. . .
pn</p>
        <p>M
M
M
M</p>
        <p>Cp1
Cp2
Cp3
Cpn</p>
        <p>ILCp1
ILCp2
ILCp3
ILCpn</p>
        <p>DBICp1
DBICp2
DBICp3
DBICpn
0.6
0.4
0.2
e
ru0.0
s
a
e
m
_
F
0.6
0.4
0.2
0.0
k means</p>
        <p>Resume</p>
        <p>TD cluster</p>
        <p>Heuristic</p>
        <p>DBI
IL
IL_DBI
MAX
u1 u2 u3 u4 u5 u6 u7 u1 u2 u3 u4 u5 u6 u7 u1 u2 u3 u4 u5 u6 u7
User</p>
        <p>Therefore, the parameter selection function
S( u) could be defined as:</p>
        <p>(pi, if maxpi (DBI) &amp; minpi (IL)
S( u) =
p0i, if maxp0i (DBI) in 1st quartile
(20)</p>
        <p>In detail, the function S takes as input a matrix
containing the parameters vector pi, a trail of
mobility traces M , the computed clustering C(pi, M ) as
well as the quality metrics, such as Information loss
(IL(C)) and the Davis-Bouldin index (DBI(C)).
Once all these values have been computed for each
evaluated set of parameters, two cases are possible.
In the first case, both IL and DBI agree on the same
set of input parameters. In the second situation, both
IL and DBI refer each one to a different set of
parameters. In this case, the algorithm sorts the values
by IL in the ascending order (i.e., from the
smallest to the largest information loss value). Then, it
chooses the set of parameters with the greatest DBI
in the first quartile.</p>
        <p>For the sake of evaluation, our methodology was
tested using the LifeMap dataset to check if the
chosen parameters are optimal. We have tested the
method with the seven users of LifeMap that have
annotated manually their POIs. Consequently, for
every set of settings of each clustering algorithm, we
have computed the F-measure because we have the
ground truth as depicted in Figure 2. The ”MAX”
bar represents the best F-measure for the given user
and it is compare to the F-measures obtained when
using the ”DBI”, ”IL” or ”IL DBI” as indicators to
choose the best input parameters configuration.
Finally, this method has a satisfactory performance
extracting a good number of POIs for maximizing the
F-measure achieving a difference of only 9% with
respect to the F-measure computed from the data
with the ground truth.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In the current paper, we have presented a method
to extract the a optimal number of POIs.
Consequently, based on the method described in tis paper,
we are able to find an appropriate number of POIs
relying only on the quality metrics of the extracted
clusters and without the knowledge of the ground
truth. Nonetheless, we are aware of the small size
of dataset but the results encourage us to continue in
this direction.</p>
      <p>Therefore, in the future we plan to test our method
in a larger dataset and in presence of noise like
downsamplig or random distortion. Another idea is
to evaluate the impact of this method in more
complex attacks like prediction of future locations or
deanonymization to verify if this step can affect the
global result of a chain of inference attacks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ankerst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breunig</surname>
            ,
            <given-names>M. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kriegel</surname>
            ,
            <given-names>H.-P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Optics: ordering points to identify the clustering structure</article-title>
          .
          <source>ACM SIGMOD Record</source>
          ,
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ben-David</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ackerman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Measures of clustering quality: A working set of axioms for clustering</article-title>
          .
          <source>In Proceedings of the Annual Conference on Neural Information Processing Systems</source>
          , pages
          <fpage>121</fpage>
          -
          <lpage>128</lpage>
          , Vancouver, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Chon</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>LifeMap: A smartphonebased context provider for location-based services</article-title>
          .
          <source>Pervasive Computing</source>
          , IEEE,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>58</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Chon</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talipov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>CRAWDAD data set yonsei/lifemap (v.</article-title>
          <year>2012</year>
          -
          <volume>01</volume>
          -03). Downloaded from http://crawdad.cs.dartmouth.edu/yonsei/lifemap.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Davies</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Bouldin</surname>
            ,
            <given-names>D. W.</given-names>
          </string-name>
          (
          <year>1979</year>
          ).
          <article-title>A cluster separation measure</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , PAMI-
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>224</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Dunn</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          (
          <year>1973</year>
          ).
          <article-title>A fuzzy relative of the ISODATA process and its use in detecting compact wellseparated clusters</article-title>
          .
          <source>Journal of Cybernetics</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>32</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Ester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>peter Kriegel</surname>
          </string-name>
          , H.,
          <string-name>
            <surname>Sander</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>Knowledge Discovery and Data Mining</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Gambs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Killijian</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>O.</surname>
          </string-name>
          ,
          <source>and Nu´n˜ez del Prado Cortez</source>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>GEPETO: A GEoPrivacy-Enhancing TOolkit</article-title>
          .
          <source>In Advanced Information Networking and Applications Workshops</source>
          , pages
          <fpage>1071</fpage>
          -
          <lpage>1076</lpage>
          , Perth, Australia.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Gambs</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Killijian</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>O.</surname>
          </string-name>
          ,
          <source>and Nu´n˜ez del Prado Cortez</source>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Show me how you move and I will tell you who you are</article-title>
          .
          <source>Transactions on Data Privacy</source>
          ,
          <volume>2</volume>
          (
          <issue>4</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Halkidi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batistakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Vazirgiannis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>On clustering validation techniques</article-title>
          .
          <source>Journal of Intelligent Information Systems</source>
          ,
          <volume>17</volume>
          (
          <issue>3</issue>
          ):
          <fpage>107</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Hamerly</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Elkan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Learning the k in Kmeans</article-title>
          .
          <source>In In Neural Information Processing Systems</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          , Vancouver, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Hariharan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Toyama</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Project lachesis: Parsing and modeling location histories</article-title>
          .
          <source>Lecture notes in computer science - Geographic information science</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>106</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Hillenmeyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Intra and inter cluster distance</article-title>
          . http://www.stanford.edu/ ˜maureenh/quals/html/ml/node82.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>MacQueen</surname>
          </string-name>
          , J. et al. (
          <year>1967</year>
          ).
          <article-title>Some methods for classification and analysis of multivariate mbservations</article-title>
          .
          <source>In Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability</source>
          , pages
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          , Berkeley, CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>D. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimov</surname>
            ,
            <given-names>S. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Selection of K in K-means clustering</article-title>
          .
          <source>Journal of Mechanical Engineering Science</source>
          ,
          <volume>205</volume>
          (
          <issue>1</issue>
          ):
          <fpage>103</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Sole´</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Munte´s-</article-title>
          <string-name>
            <surname>Mulero</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Efficient microaggregation techniques for large numerical data volumes</article-title>
          .
          <source>International Journal of Information</source>
          Security - Special Issue:
          <article-title>Supervisory control and data acquisition</article-title>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <fpage>253</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frankowski</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ludford</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shekhar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Terveen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Discovering personal gazetteers: an interactive clustering approach</article-title>
          .
          <source>In Proceedings of the annual ACM international workshop on Geographic information systems</source>
          , pages
          <fpage>266</fpage>
          -
          <lpage>273</lpage>
          , New York, NY, USA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>