<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A Shape-Based Map Matching Approach for Geographic Transferability of Discriminative Subtrajectories</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>CristianoLandi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Guidott</string-name>
          <email>riccardo.guidotti@unipi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Map Matching, Geographic Transferability, Machine Learning, Discriminative Subtrajectories</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISTI-CNR</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>25</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>This paper addresses the challenge of map matching and geographic transferability in trajectory analysis. Existing methods often face limitations tied to specific coordinates or road networks. In response, we propoGseASM, a shape-based map matching method that relies solely on trajectory shapes, irrespective of geographic oGriAgSinM. introduces a symbolic road network representation, facilitating eficient searches based solely on trajectory shapes. Our experimentation, spanning over 5,000 km of roads, demonstrateGsASM's ability to accurately position trajectories with an impressive accuracy exceeding 90%. Notably,GASM stands as the first in the literature to perform shape-based symbolic map matching without prior knowledge of the geographic region.</p>
      </abstract>
      <kwd-group>
        <kwd>of Movelet by providing normalized subtrajectories</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR</p>
      <p>ceur-ws.org
fectively applying them in another regio8n, 9[].</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <sec id="sec-2-1">
        <title>Points Of Interest (POI), feature engineerin1g,[2, 3, 4], or</title>
        <p>In recent years, the widespread adoption of cutting-edge Particularly noteworthy are recent advancements in
technologies equipped with Global Positioning Systemmachine learning leveraging shapelet-based
subtrajec(GPS) devices has enabled the recording of positions fortories [5, 6, 7]. Originating from the domain of time
various moving objects, ranging from cars and transporta-series analytics, shapelets represent discriminatory
subtion vehicles to phones and wearables. Unfortunatelsye,quences that encapsulate a collection of distinctive
the coordinates captured by these sensors often fail to ac-shapes, crucial for discerning specific classes [10].
Varicurately reflect real positions due to physical constraintosus approaches exist for defining discriminative
subtraand/or legal regulations. Nevertheless, in various applicaje-ctories. In [6], theMovelet method is introduced—an
tions, it is imperative to accurately align GPS trajectoriaepsproach for extracting discriminative subtrajectories
with a road network. For instance, in navigation ser-selected through a rigorous statistical test. During the
vices, map matching empowers drivers to monitor theirdiscovery phase,Movelet generates candidate
subtrajecexact locations and receive optimal routes to specifiedtories by extracting all possible subsequences with more
destinations. Conversely, in machine learning tasks, mapthan two contiguous observations, utilizing a sliding
winmatching enhances users’ mobility information by in-dow. Building upon the foundation laid bMy ovelet,
corporating knowledge related to the territory, such Gaseolet is introduced in [7]. This extension incorporates
a normalization step after the discovery phase. The
northe identification of discriminatory subsequences, such malization step is designed to ease the comparison of
as mobility shapelets5,[6, 7]. Without an appropriate discriminative subsequences with trajectories recorded
map-matching procedure, reliance on an expert becomes in diverse geographical regions. The underlying
rationecessary to determine which features can be extractednale is that a subtrajectory pinpointing a sudden break
from trajectories concerning the territory. However, thien a road segment in one city should exhibit similarities
reliance on ad-hoc features restricts the applicability otfo a subtrajectory associated with the same event in
anmachine learning methods and amplifies sensitivity to other city. This normalization enhances the method’s
input changes [1], rendering it unsuitable for geographic adaptability across various geographic contexts.
transferability. This implies the challenge of extracting While Geolet successfully addresses the limitation
Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 thereby enhancing geographic transferability
indepennEvelop-O
(R. Guidotti)
∗Corresponding author.
dent of specific GPS coordinates, it introduces a potential
vulnerability tied to the road network.</p>
        <p>Our underlying hypothesis is that the less frequently
a trajectory occurs, the greater the likelihood that
shapebased methods utilizing it as a discriminative
subsequence may not capture the intrinsic features of thkeey strategies employed to tackle this challenge.
trajectory but rather only its geographic position. In In the literature of trajectory analysis, a multitude of
essence, if a discriminative subtrajectory is intrinsicallystrategies exists for mapping trajectories onto a road
netlinked to a particular road network due to its distinwc-ork. For high-frequency sampled trajectories, the
simtive shape, it becomes unsuitable for geographic transp-lest approach involves associating each spatio-temporal
ferability. This limitation arises from the fact that tphoeint with the nearest street segme1n1t, [12]. However,
discriminatory aspect is not rooted in the movementtshese techniques, while fast and straightforward, have
exthemselves but rather in the structural characteristhicibsited inaccuracies, particularly at intersections and
parof the road network. Consequently, evaluating the geaol-lel roads. To address these limitations, enhancements
graphic transferability of discriminative subtrajectoriehsave been introduced, incorporating heading direction
necessitates a shape-based map-matching approach thator employing a Kalman filter to eliminate outlier points
exclusively relies on shapes without prior knowledge oifn trajectories1[3]. Alternatively, some approaches
leverthe position. Regrettably, to the best of our knowledaggee, probabilistic-based map-matching algorithms,
insuch an approach is currently unavailable. This under-tegrating hidden Markov models to identify the most
scores the need for innovative solutions in the realm olifkely sequence of road segments aligning with the
trashape-based map matching to comprehensively assess jectory [14, 15]. On the other hand, in the context of
the adaptability of discriminative subtrajectories acroslsow-sampled trajectories, much of the existing literature
diverse geographical contexts. presupposes that the most probable route connecting two</p>
        <p>To overcome this limitation, our paper introducesuccessive points is also the shortest or
faste1s6t].[HowGASM, an Geographic AutomatonShape-based map ever, in [17], is introduced a map-matching algorithm
Matching approach.GASM relies solely on the shape of athat exploits temporal intervals between GPS points. This
trajectory to accurately determine its position within tmhethod identifies the optimal match between two GPS
road network. Specifically, GASM employs a symbolic points by selecting the route with the most similar travel
representation to transform the road network, constructti m-e. Also, in 1[8] is proposed a method for map
matching a spatial index independent of coordinates. Thising low-sampled trajectories based on supplementary
unique approach allows for eficient trajectory searches information such as speed and moving direction,
typibased solely on their shapes. To the best of our knowlc-ally collected alongside spatial locations.
edge, GASM is the first proposal in the literature that Geographic transferability encapsulates the challenge
exclusively utilizes a discretized representation of a trajeocf-extending knowledge gleaned from one geographic
retory’s shape, devoid of any knowledge of the geographicgion to another. This entails constructing machine
learnregion, for shape-based map matching. Our experimenta- ing models capable of adeptly executing tasks in regions
tion withGASM on a novel comprehensive geographical distinct from their original training grounds. The core
dataset spanning over 5,000 km of roads in Tuscany, cen- of this challenge emerges from pronounced disparities
tral Italy, demonstrates its capability to identify corriencdtata distributions, patterns, and characteristics across
alignments with an impressive accuracy exceeding 90%. diverse geographical locations. As articulated 8in,9[],
Furthermore,GASM exhibits eficiency, as it can con- models trained on data from one region may encounter
struct the necessary representation for the entire datasdeitficulties in generalization when confronted with the
in less than 1.5 hours, maintaining a linear complexity atunique variations inherent in another region. In pursuit
query time. of a global model, a fundamental strategy is employed</p>
        <p>The paper is organized as follows. Section2 sum- as demonstrated in8[], where diverse data sources are
marises the related works concerning map-matchingaggregated on a global scale to build a transferable model.
methods and the challenges posed by geographic transC-onversely, in [9], city indicators are identified
pertainferability. In Section3, we encapsulate the technicaling to road networks, trafic flows, and individual
mobilconcepts essential for comprehending the algorithm deit-y, to facilitate the assessment of similarities between
gelineated in Section4. The outcomes of experiments con- ographical regions. Subsequently, an ensemble classifier
ducted withGASM are detailed in Section5. Finally, is devised, computing the output as a weighted average of
Section6 encapsulates our findings and delves into po- outputs generated by individual local classifiers. Notably,
tential avenues for future developments. the importance of local models is determined by their
higher similarity to the target regions compared to others
in the ensemble with respect to the city indicators.
Al2. Related Works ternate methodologies involve adapting a model initially
trained in a data-rich region and transplanting it to a
tarIn the following, we provide a concise overview of theget region characterized by limited data availability. This
literature concerning map matching methods and
eluci</p>
        <p>adaptation process entails integrating additional data to
date the geographic transferability problem, introducincgompute sub-region similarities, subsequently enabling
the remapping of the model19[, 20].</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Setting</title>
      <p>matrix ∈ ℝ ×ℎ , obtained by taking thebest fitting of
each trajector y ∈</p>
      <p>, and each subtrajector y∈  .</p>
      <sec id="sec-3-1">
        <title>Movelet computes thebest fitting</title>
        <p>as the minimal
Euclidean Distance (ED) between in each subsequence of
elements. Subsequently, we delve into the transformOan- the other handG,eolet computes thebest fitting in
tion facilitated bGyeolet, a catalyst for the motivationthe same way, but geographically shifting to overlap
behind this work. Ultimately, we present a formal intreoa-ch subsequence of of length . In particular,Geolet
In this section, we articulate the fundamental concepltesngth = || of  , formally,
essential for comprehending our proposal. Initially, we
establish a shared language and framework by
introducing notation that serves as a basis for discussing key
bestfit Movelets( , ) =
∶+ , ) }</p>
        <p>(1)
−
min{(
=0
duction to the problem at hand.
of spatio-temporal point=s {⃗ x 0, … , x⃗ } ∈ ℝ×3</p>
        <p>Definition 1 (Trajectory.) A trajectory  is a sequence</p>
        <p>where
the spatial vectorx⃗s
  = (la t, long ) are sorted by
in</p>
        <p>creasing time  , i.e., ∀1 ≤  &lt;</p>
        <p>we have   &lt;  +1 .</p>
        <p>In a sense, trajectories can be viewed as multivariate
time series containing two signals, i.e., the latitude and
longitude, recorded at non-constant sampling rat5e,s [
extends the best fitting function oMf ovelet by adding a
pre-processing function,shift , that subtracts the value of
the first vector of the subsequence from all the others,
−
min{
=0
bestfit Geolet( , ) =</p>
        <p>(shift ( ∶+ ), shift () )} (2)
where shift ( ) = { ⃗ 0 −  ⃗0, … ,  ⃗ −  ⃗0}. The shift function
makes Geolet suitable for geographic transferability not
being tide to the territory.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Finally, in order to present map matching we define a</title>
        <p>21, 6]. In order to simplify notation, we will usienstead
of trajectories with a vector of labels attached. FormalDlye:finition 5
of   every time. A trajectory classification dataset is a setroad network as follows:
Definition 2 (Trajectory Dataset.)A trajectory dataset ⟨ , ⟩
 ∈ ℝ ××3 is a set of trajectories, = {
0 … ,   }.</p>
        <p>(Road Network.) A road network =
is a directed graph where = { 1, … ,   } is the set
of  road junction (or nodes), and = { 1, … ,   } is the
set of road segments (or edges), wher e = (  1,   2).</p>
        <p>For simplicity, we use a single symbo l to denote the
lengths of the trajectories, even if a dataset can contaWine underline that we rely on an enhanced road network
trajectories with a diferent number of observations. Sim-representation where, for each edge, we also have
acilarly, we emphasize that there is no constraint on thceess to the road segment geometries expressed as a
sesampling rate, i.e., we can have a non-constant samplequence of  latitude and longitude, formalslhyape() =
in the same trajectory. Furthermore, we define asubtra- { ⃗0, … ,  ⃗ } for some  ∈ ℰ . As for trajectories, for
simjectory as:
plicity of notation, we use a single symb otlo denote
the lengths of the points describing the geometry of the
Definition 3 (Subtrajectory.) Given a trajector y of road segment, even if, in real-case scenarios, the shape
length , a subtrajectory = {⃗  , … , ⃗+ } ⊂  , of length can be described using an arbitrary number of points.
 ≤  , is an ordered sequence of consecutive values such
that0 ≤  ≤  −  .</p>
      </sec>
      <sec id="sec-3-3">
        <title>We are now able to formalize thsehape-based map</title>
        <p>matching problem as follows:
As previously mentionedM,ovelet [6] and Geolet [7]</p>
        <p>Definition 6 (Shape-based Map Matching). Given a
are shapelet-inspired 1[0] trajectory approaches thatroad network = ⟨ , ⟩
and a trajectory ,
theshapeidentify discriminative subtrajectories for classificationbased map matching problem consists in finding the best
purposes. They both select the most discriminative subs-equence of edges  = { 1, … ,   } ⊆  such that does
trajectories w.r.t. the target label usingmtuhteual
information [22]. Like shapelet-based approachesM,ovelet
can be used to train any machine learning model23[].
and Geolet extract discriminative subtrajectories thabtestfit Geolet( , shape( )) .
not exist another sequence of edg es′ ⊂  diferent
from  , i.e.,  ′ ≠  , where bestfit Geolet( , shape( ′)) &lt;</p>
      </sec>
      <sec id="sec-3-4">
        <title>Indeed, once the most discriminative subtrajectories are</title>
        <p>identified, a trajectory dataset can be transformed into
a tabular representation capturing the distance between
trajectories and discriminative subtrajectories through
the subtrajectory transform function:</p>
      </sec>
      <sec id="sec-3-5">
        <title>In other words, the shape-based map matching problem</title>
        <p>involves determining the optimal alignment for a
(sub)trajectory within a designated road networ k, relying on
the configuration of the edges comprising the road
segments. It is essential to emphasize that this map matching
endeavor necessitates resolution without any reliance on
Definition 4 (Subtrajectory Transform.) Given a dataset GPS coordinates as the usage of thsehift operator
nor and a set containingℎ subtrajectories,
thseubtrajecmalizes the trajector y rendering state-of-the-art map
tory transform converts ∈ ℝ ××3
into a real-valued matching methods unsuitable for this particular task.</p>
        <p>Algorithm 1: build(, , Σ, ℎ)
Input :  - road segments, - resampling
distance, - max nbr. of symbols,
ℎ - h-hop aggregation</p>
        <p>Output : ℎ - Aho-Corasick automaton
1  ℎ ← aggregate(, ℎ) ; // aggregate trajectories
2  ← ∅ ;
3 for  ∈  ℎ do // for each road segment
4  ← resample(ℎ(), ) ; // resample traj.
5  ⃗← direction( ) ; // get traj. direction
6  ← SAX( ,⃗) ; // discretize traj.
7  ←  ∪ { } ; // add to dict.
8 return Aho-Corasick( ) ;</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Shape-based Map Matching</title>
      <p>To tackle the shape-based map-matching problem, our
aim is to design a map-matching method with the
capability to accurately deduce the original GPS coordinates ofAlgorithm 2: ℎ( , , , Σ)
a trajectory within a designated road network. Crucially,
this precision is sought exclusively through an examina- Input :  - query traj., - Aho-Corasick
tion of the trajectory’s shape and the configurations of automaton, - road segments,
the edges within the road network, entirely independent resampling dist., - max nbr. of symbols
of any reliance on GPS coordinates. Output :  ∗,  - best methc and best candidates</p>
      <p>A brute-force approach to address the problem in-1  ′ ← resample( , ) ; // resample traj.
volves map matching all conceivable alignments of 2  ⃗′ ← direction() ; // get traj. direction
within every segmen t of the road networ k , employ- 3  ← SAX( ⃗′, ) ; // discretize using SAX
ing thebestfit Geolet function. However, this naive strategy4  ← search(, , ) ; // get best candidates
is only viable for small road networks due to the alg5o- ∗ ← arg min ∈ bestfit Geolet( ,  ′);
rithmic complexity being(||( − )) , where  and
 represent the number of points characterizing the tra6- return  ; // return the best match
jectory and the number of points describing each road
segments in , respectivel1y. This limitation also extends
to other map-matching algorithms that rely on latitufinditee sequence of symbols over an alphabetΣ.
Subseand longitude coordinates to confine the matching scopequently, it builds a finite-state automaton based on the
to the nearest roads. sequences in  within a given finite symbol sequence
de</p>
      <p>We overcome this limitation by proposinGgASM a fined over the alphabetΣ. Consequently, the automaton,
Geographic AutomatonShape-based map Matching ap- constructed using the dictiona ry , identifies a subset
proach that is able to significantly reduce the number ′ ⊂  wherein each sequence in ′ is contained in
of road segment alignments to test with the brute forc.eFigure 1 provides an illustration of the automaton
cremethod. In essence,GASM comprises two key steps. Ini- ated by the Aho-Corasick algorithm, using the sequences
tially, leveraging the Aho-Corasick algorith2m4][,GASM  = {, , , } over the alphabeΣt = {, , , } .
constructs a shape-based index for all road segments inThe Aho-Corasick algorithm initiates by constructing
 , portraying it as a geographic finite state automatona. sufix trie [ 25], depicted in black in Figure1.
SubseSubsequently,GASM facilitates querying the automatonquently, it designates all leaves of the trie as final states
to pinpoint a set of candidate partial matches betweeonf the automaton and introduces edges to complete the
 and {shape( )| ∀  ⊆ } . Further elucidation of theseautomaton. Two types of edges are incorporated,
contwo steps is provided in the subsequent sections. necting their respective sufixes: sufix edges , depicted</p>
      <p>Geographic Automaton Construction. GASM in blue, are utilized in the case of a mismatch, without
leverages the Aho-Corasick algorithm to constructgauaranteeing that the sufix is also a sequence in the
dicgeographic automaton, serving as a spatial index fortionary. In contrastd,ictionary-sufix edges , portrayed in
expedited query processing 2[4]. The Aho-Corasick green, guarantee that the sufix is a sequence present in
algorithm, renowned for string searching, takes a setthe dictionary. These operations unfold linearly
concern = { 1, … ,   } as input, where each  represents a ing the total number of symbols in the input dictio nar.y
The automaton enables the search for all sequences
contained in a query by traversing the automaton, achievable
1For each  ⊆  , we compute the Euclidean Distance (linear com-in linear time relative to the query’s length.
plexity), for all the possib l−e alignments ofshape( ) in  .</p>
      <p>Algorithm1 delineates the procedural steps requisite
of GASM for constructing the Aho-Corasick automaton.</p>
      <p>The algorithm accepts, as input, the road segme ntosf
the road networ k , the resampling distance, the
maximum allowed number of symbol s, and the number of
hops ℎ, producing a geographical automaton as
output. TheGASM-build algorithm begins by aggregating
the road network, concatenatinℎgtimes a road segment
to linked road segments inℎ to extend the length of
existing segments and enhance their representativeness
(line 1). Subsequently, it initializes an empty dictionary
 (line 2). The following steps are applied for each road
segment in ℎ (denoted as ). Given that the shape of a
road segment may be described by varying numbers of
points based on its length and sinuositGyA,SM initially
resamples the geometries into a series of evenly spaced
points . This ensures that the symbolic representation’Fsigure 2: Summary of GASM, depicting geographic
automalength, crucial for Aho-Corasick automaton constructiotno,n construction (left) and shape-based matching (right).
is proportional solely to the road length. To fulfill the
prerequisite of representing each road segmen tin a
discretized space, a sequence of symbols is generated (line
5). Subsequently,GASM determines the heading
direction ⃗ between consecutive points along the resampled
road segment, transforming the shape of each road se-Table 1

|Σ|
ℎ
quence  into a univariate time series of direction⃗swith
a consistent length-based sampling ra te(line 6). This
Hyperparameter GASM
Resampling distance (m)
Alphabet size
h-hop aggregation</p>
      <p>Values
mation (SAX) [26] to obtain a symbolic representationthe geographic automaton construction phase is depicted,
representations are added to the diction ary. Finally, is indexed according to its heading direction. On the
the dictionary of discretized representations of the roardight side, the shape-based matching phase is illustrated.
segments is employed to construct the Aho-Corasick auH-ere, given a trajectory with known shape but unknown
tomaton, which is then returned as the output (line 8)o.rigin point,GASM computes the set of potential partial
Shape-based Matching. Once the construction of themap matches. Subsequently, it selects the match that
wherein each road within an arbitrary large road network
geographic automaton is completGe,ASM can execute
shape-based map matching over the automaton following
the steps outlined in Algorith2m.GASM-search takes as
input the query trajecto ry, the geographical automaton
sequence of edges ⊆</p>
      <p>that minimizes thebestfit
function, as per the ensuing procedure. The initial three
steps of Algorithm2, aligning with Algorithm1, involve</p>
      <p>Geolet
 , the road segments , the resampling distance, and
the maximum allowed number of symbols. It yields the In this section, we evaluate the efectiveness GofASM2.
resampling the query trajecto ry, extracting its direc- result of a sensitivity analysis w.r.t. some data properties.
tion, and transforming it into a symbolic representation Experimental Setting. Regrettably, only a handful
 . Indeed, the same preprocessing applied to the road seg-of mobility datasets, such as GeoLife and Porto T3a,xaire
ments is applied to the query trajectories. Subsequentlayv,ailable as open access1[, 7]. However, these datasets
the geographic automaton is utilized to perform a lin-possess limited geographic coverage, rendering them
First, we present the experimental setting, then we report
and discuss the best performance achieved. Finally, we
illustrate details of the hyperparameter tuning and the
minimizes thebestfit Geolet function.</p>
      <p>5. Experiments
best match candidates= {</p>
      <p>1, … ,   }. From this set, the 2Python code: https://t.ly/wVl X.SWe ran our experiments on a
ear search for the best matche s among all possibilities
ofered by  (line 4). This implementation enablGesASM
to identify an “initial best match”, presenting a set of
ifnal selection of the optimal alignment∗ is determined
through a naive approach (line 5).</p>
      <p>unsuitable for our study. Thus, we introduce a novel
high-sample rate dataset derived from the publicly
acces2xIntel Xeon Gold 6342 24-core CPU, limiting each test to use at
most 12 cores.</p>
      <sec id="sec-4-1">
        <title>3GeoLife: https://t.ly/6VJ-.EPorto:https://t.ly/0GMR. 9</title>
        <p>Province</p>
        <p>Arezzo</p>
        <p>Firenze
Grosseto
Livorno</p>
        <p>Lucca
M. Carrara</p>
        <p>Pisa
Pistoia
Prato
Siena
sible 2013 GPS traces on OpenStreetM4a.pAlthough the the metric ofaccuracy.On the other hand, the evaluation
initial OpenStreetMap dataset encompasses GPS trajectoof- the second question relies on the metricseolfectivity,
ries spanning the entire globe, our analysis concentratecsommonly employed in database literatu2re7][.
Selecon the ten provinces in Tuscany, a region encompass-tivity measures the reduction of potential alignments
ing 22, 985 2 in central Italy. The Mappymatch pythonbetween a query result and the entire dataset. In our
conpackage5 was employed to map-match each trajectoryt,ext, selectivity is defined as the ratio of matched road
retaining only those trajectories with an average error osfegments (| | ) to the total number of road segmen|t|s)(.
less than10 . The final dataset encompasses 358 distinct For accuracy, higher values indicate better results, while
trajectories, covering a total travel distanc5e, 0o2f4 for selectivity, lower values indicate better outcomes.
and described by 300, 049 GPS points. Additional infor- GASM Performance. Table 5 presents the
performation on the types of roads traversed in each provincemance metrics ofGASM across individual provinces. To
in Tuscany, as per the OpenStreetMap taxono m6,yis determine the optimal hyperparameters, a grid search
presented in Table6. was conducted over the values outlined in
Tab6,lespecif</p>
        <p>Within the framework of our shape-based map- ically for the province of Grosseto. This process yielded
matching formulation, we aim to address the followintghe following hyperparameters: a resampling distance
questions. First, to what extent cGanASM infer the of  = 10 meters, an alphabet size o f= 8 symbols, and
original GPS coordinates without utilizing them for maap street aggregation oℎf = 2 hops. GASM showcases
matching? Second, how efectively canGASM reduce the an impressive ability to deduce the original GPS
coornumber of potential alignments compared to the entidreinates, achieving an average light accuracy o9f0.1%.
road network? The first question is evaluated throughFurthermore, it significantly narrows down the potential
alignments, as indicated by the selectivity factor,
reduc54OMpaepnpSytmreatetchM: haptt2p0s1:3//ptu.lbyl/iRcHGaPfSS tracesh:ttps://t.ly/q7u2N ing it to just10.8% of the original road network. These
6Highway taxonomy:https://t.ly/NpxZv commendable results are attained while maintaining a
reasonable automaton construction time of 7.8 minutecsriminative nature of trajectories based on the type of
per province, resulting in a total indexing time of a merreoad. Our hypothesis posits that straight streets, such
1.29 hours for the entire region. as motorways, exhibit lower discriminative
characteris</p>
        <p>Hyperparameters Tuning. In this section, we tics. Consequently, trajectories observed on such roads
present the results of experiments conducted on thaere more likely to avoid re-identification, suggesting
enprovince of Grosseto while varying the hyperparame-hanced geographic transferability. To investigate this, we
ters detailed in Tabl6e. Initially, we compute the Pearsonidentify the type of road traveled within each segment.
correlation between the method’s hyperparameters anIdn cases where multiple types of roads are encountered,
two key performance metrics: the selectivity factor anwde perform a majority vote weighted by road length.
Adaccuracy. Figure 3 visually depicts the changes in perfor-ditionally, to examine the influence of changes in the
mance metrics, emphasizing variations in the top threeinput data, we create random subtrajectories of varying
most influential hyperparameters—those exhibiting thelengths, including 100m, 150m, 500m, 1km, 1.5km, 5km,
highest absolute values of Pearson correlation. Notablayn,d 10km, derived from our OpenStreetMap dataset. In
the most influential hyperparameter is the number oforder to assess our hypothesis, we evaluate
tshtreaightroad segment aggregationsℎ(), demonstrating a corre- ness [4] of each subtrajectory by calculating the ratio
lation of−0.52 with the selectivity. Thus, increasinℎg between the shortest path from the origin to the
destinaproves beneficial for GASM as it helps select fewer can-tion and the actual trajectory.
didate road segments without significantly impacting Figure 4 encapsulates these results. The initial plot on
accuracy. The alphabet size () displays correlations of the left highlights a notable trend: an increase in
subtra−0.44 and 0.40 with respect to the selectivity and accuje-ctory length correlates with a rapid elevation in both
racy, respectively. This hyperparameter introduces aaccuracy and selectivity. In simpler terms, as the
subtratrade-of, as increasing the number of symbols reduces jectory length extends, the model’s precision improves.
selectivity but may lead to a slight decrease in accuracTyh.e central plot reveals that trajectory straightness has
Finally, the resampling distance)(exhibits a correlation a negligible impact on the number of candidate matches.
of 0.22 with accuracy. Interestingly, decreasin gslightly However, as trajectories become more linear, the
accuenhances accuracy according to our observations. racy experiences a decline. Finally, the rightmost plot
Sensitivity Analysis. We delve here into the varia-illustrates the method’s performance across various road
types. This plot validates the findings of the straightness
tions in performance with respect to the length of thpleot: roads with greater straightness, like motorways,
query trajectory . Additionally, we explore the
dis</p>
        <p>pose the greatest challenge for re-identification.
Conversely, more sinuous roads present a slightly higher [7] C. Landi, et al., Geolet: An interpretable model for
dificulty in re-identification, reflected in a higher selec- trajectory classification, in: IDA, volume 13876 of
tivity but with a concomitant boost in accuracy. LNCS, Springer, 2023, pp. 236–248.</p>
        <p>[8] F. Veronesi, et al., Assessing accuracy and
geograph6. Conclusion ical transferability of machine learning algorithms
for wind speed modelling, in: AGILE, LNGC, 2017.</p>
        <p>In this paper we have introducedGASM, a map matching [9] M. Nanni, et al., City indicators for geographical
method capable of determining a trajectory’s position transfer learning: an application to crash prediction,
solely based on its shape. Our experiments showcase GeoInformatica 26 (2022) 581–612.
thatGASM significantly reduces the number of poten- [10] J. Lines, et al., A shapelet transform for time series
tial alignments and deduces the original GPS coordinates classification, in: ACM SIGKDD, 2012, pp. 289–297.
with remarkable accuracy. Further analysis reveals that[11] C. White, et al., Some map matching algorithms for
longer and less linear trajectories are more straightfor- personal navigation assistants, TRC 8 (2000).
ward to map match. However, this observation raises[12] D. Bernstein, et al., An introduction to map
matchconcerns about the potential for shape-based methods ing for personal navigation assistants (1996).
to inadvertently learn geographic positions instead o[1f3] M. A. Quddus, et al., A general map matching
alfocusing on other intrinsic features. As a part of future gorithm for transport telematics applications, GPS
work, as outlined at the beginning, we aim to assess the solutions 7 (2003) 157–167.
geographic transferability of shape-based methods, such[14] Y. Liu, Z. Li, A novel algorithm of low sampling
as Geolet, by incorporatingGASM. Specifically, we pro- rate GPS trajectories on map-matching, EURASIP
pose giving more weight to the selection of discriminative 2017 (2017) 30.
subsequences with higher selectivity rather than basing[15] X. Liu, et al., A st-crf map-matching method for
the decision solely on a statistical test. low-frequency floating car data, TITS 18 (2016)
1241–1254.</p>
        <p>Acknowledgments [16] L. Jiang, et al., From driving trajectories to driving
paths: a survey on map-matching algorithms, CCF
This work is partially supported by the EU NextGener- TPCI 4 (2022) 252–267.
ationEU programme under the funding schemes PNRR- [17] P. Cintia, M. Nanni, An efective time-aware map
“SoBigData.it - Strengthening the Italian RI for Social matching process for low sampling gps data, arXiv
Mining and Big Data Analytics” - Prot. IR0000013, H2020- preprint arXiv:1603.07376 (2016).
INFRAIA-2019-1: Res. Infr. G.A. 871042 SoBigData++, [18] G. Hu, et al., If-matching: Towards accurate
mapand GreenDatAI G.A. 101070416. matching with information fusion, TKDE 29 (2016)
114–127.</p>
        <p>References [19] L. Wang, et al., Smart city development with urban
transfer learning, Computer 51 (2018) 32–41.
[1] C. L. da Silva, et al., A survey and comparison of [20] L. Wang, et al., Cross-city transfer learning for
trajectory classification methods, in: BRACIS, IEEE, deep spatio-temporal prediction, in: IJCAI, ijcai.org,
2019, pp. 788–793. 2019, pp. 1893–1899.
[2] A. Bolbol, et al., Inferring hybrid transportatio[n21] R. Trasarti, et al., Myway: Location prediction via
modes from sparse GPS data using a moving win- mobility profiling, Inf. Syst. 64 (2017) 350–367.
dow SVM classification, CEUS 36 (2012) 526–537. [22] P.-N. Tan, M. Steinbach, V. Kumar, Introduction to
[3] S. Dodge, et al., Revealing the physics of movement: data mining, Pearson Education India, 2016.</p>
        <p>Comparing the similarity of movement characteris[-23] A. J. Bagnall, et al., The great time series
classificatics, CEUS 33 (2009) 419–434. tion bake of, CoRR abs/1602.01711 (2016).
[4] P. J. Almeida, et al., Indices of movement behaviour:[24] A. V. Aho, et al., Eficient string matching: An aid
conceptual background, efects of scale and location to bibliographic search, ACM 18 (1975) 333–340.
errors, Zoologia (Curitiba) 27 (2010) 674–680. [25] E. M. McCreight, A space-economical sufix tree
[5] I. Kontopoulos, et al., Traclets: Harnessing the construction algorithm, JACM 23 (1976) 262–272.
power of computer vision for trajectory classifica- [26] J. Lin, et al., A symbolic representation of time
tion, arXiv:2205.13880 (2022). series, with implications for streaming algorithms,
[6] C. A. Ferrero, et al., MOVELETS: exploring relevant in: DMKD, ACM, 2003, pp. 2–11.</p>
        <p>subtrajectories for robust trajectory classification[,27] S. Acharya, et al., Selectivity estimation in spatial
in: SAC, ACM, 2018, pp. 849–856. databases, in: SIGMOD, ACM, 1999, pp. 13–24.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>