-

1613-0073

A Shape-Based Map Matching Approach for Geographic Transferability of Discriminative Subtrajectories

CristianoLandi

0 1

Riccardo Guidott

riccardo.guidotti@unipi.it 1

Map Matching, Geographic Transferability, Machine Learning, Discriminative Subtrajectories

0 ISTI-CNR , Pisa , Italy 1 University of Pisa , Pisa , Italy

2024

25 28

This paper addresses the challenge of map matching and geographic transferability in trajectory analysis. Existing methods often face limitations tied to specific coordinates or road networks. In response, we propoGseASM, a shape-based map matching method that relies solely on trajectory shapes, irrespective of geographic oGriAgSinM. introduces a symbolic road network representation, facilitating eficient searches based solely on trajectory shapes. Our experimentation, spanning over 5,000 km of roads, demonstrateGsASM's ability to accurately position trajectories with an impressive accuracy exceeding 90%. Notably,GASM stands as the first in the literature to perform shape-based symbolic map matching without prior knowledge of the geographic region.

of Movelet by providing normalized subtrajectories

CEUR

ceur-ws.org fectively applying them in another regio8n, 9[].

1. Introduction Points Of Interest (POI), feature engineerin1g,[2, 3, 4], or

In recent years, the widespread adoption of cutting-edge Particularly noteworthy are recent advancements in technologies equipped with Global Positioning Systemmachine learning leveraging shapelet-based subtrajec(GPS) devices has enabled the recording of positions fortories [5, 6, 7]. Originating from the domain of time various moving objects, ranging from cars and transporta-series analytics, shapelets represent discriminatory subtion vehicles to phones and wearables. Unfortunatelsye,quences that encapsulate a collection of distinctive the coordinates captured by these sensors often fail to ac-shapes, crucial for discerning specific classes [10]. Varicurately reflect real positions due to physical constraintosus approaches exist for defining discriminative subtraand/or legal regulations. Nevertheless, in various applicaje-ctories. In [6], theMovelet method is introduced—an tions, it is imperative to accurately align GPS trajectoriaepsproach for extracting discriminative subtrajectories with a road network. For instance, in navigation ser-selected through a rigorous statistical test. During the vices, map matching empowers drivers to monitor theirdiscovery phase,Movelet generates candidate subtrajecexact locations and receive optimal routes to specifiedtories by extracting all possible subsequences with more destinations. Conversely, in machine learning tasks, mapthan two contiguous observations, utilizing a sliding winmatching enhances users’ mobility information by in-dow. Building upon the foundation laid bMy ovelet, corporating knowledge related to the territory, such Gaseolet is introduced in [7]. This extension incorporates a normalization step after the discovery phase. The northe identification of discriminatory subsequences, such malization step is designed to ease the comparison of as mobility shapelets5,[6, 7]. Without an appropriate discriminative subsequences with trajectories recorded map-matching procedure, reliance on an expert becomes in diverse geographical regions. The underlying rationecessary to determine which features can be extractednale is that a subtrajectory pinpointing a sudden break from trajectories concerning the territory. However, thien a road segment in one city should exhibit similarities reliance on ad-hoc features restricts the applicability otfo a subtrajectory associated with the same event in anmachine learning methods and amplifies sensitivity to other city. This normalization enhances the method’s input changes [1], rendering it unsuitable for geographic adaptability across various geographic contexts. transferability. This implies the challenge of extracting While Geolet successfully addresses the limitation Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 thereby enhancing geographic transferability indepennEvelop-O (R. Guidotti) ∗Corresponding author. dent of specific GPS coordinates, it introduces a potential vulnerability tied to the road network.

Our underlying hypothesis is that the less frequently a trajectory occurs, the greater the likelihood that shapebased methods utilizing it as a discriminative subsequence may not capture the intrinsic features of thkeey strategies employed to tackle this challenge. trajectory but rather only its geographic position. In In the literature of trajectory analysis, a multitude of essence, if a discriminative subtrajectory is intrinsicallystrategies exists for mapping trajectories onto a road netlinked to a particular road network due to its distinwc-ork. For high-frequency sampled trajectories, the simtive shape, it becomes unsuitable for geographic transp-lest approach involves associating each spatio-temporal ferability. This limitation arises from the fact that tphoeint with the nearest street segme1n1t, [12]. However, discriminatory aspect is not rooted in the movementtshese techniques, while fast and straightforward, have exthemselves but rather in the structural characteristhicibsited inaccuracies, particularly at intersections and parof the road network. Consequently, evaluating the geaol-lel roads. To address these limitations, enhancements graphic transferability of discriminative subtrajectoriehsave been introduced, incorporating heading direction necessitates a shape-based map-matching approach thator employing a Kalman filter to eliminate outlier points exclusively relies on shapes without prior knowledge oifn trajectories1[3]. Alternatively, some approaches leverthe position. Regrettably, to the best of our knowledaggee, probabilistic-based map-matching algorithms, insuch an approach is currently unavailable. This under-tegrating hidden Markov models to identify the most scores the need for innovative solutions in the realm olifkely sequence of road segments aligning with the trashape-based map matching to comprehensively assess jectory [14, 15]. On the other hand, in the context of the adaptability of discriminative subtrajectories acroslsow-sampled trajectories, much of the existing literature diverse geographical contexts. presupposes that the most probable route connecting two

To overcome this limitation, our paper introducesuccessive points is also the shortest or faste1s6t].[HowGASM, an Geographic AutomatonShape-based map ever, in [17], is introduced a map-matching algorithm Matching approach.GASM relies solely on the shape of athat exploits temporal intervals between GPS points. This trajectory to accurately determine its position within tmhethod identifies the optimal match between two GPS road network. Specifically, GASM employs a symbolic points by selecting the route with the most similar travel representation to transform the road network, constructti m-e. Also, in 1[8] is proposed a method for map matching a spatial index independent of coordinates. Thising low-sampled trajectories based on supplementary unique approach allows for eficient trajectory searches information such as speed and moving direction, typibased solely on their shapes. To the best of our knowlc-ally collected alongside spatial locations. edge, GASM is the first proposal in the literature that Geographic transferability encapsulates the challenge exclusively utilizes a discretized representation of a trajeocf-extending knowledge gleaned from one geographic retory’s shape, devoid of any knowledge of the geographicgion to another. This entails constructing machine learnregion, for shape-based map matching. Our experimenta- ing models capable of adeptly executing tasks in regions tion withGASM on a novel comprehensive geographical distinct from their original training grounds. The core dataset spanning over 5,000 km of roads in Tuscany, cen- of this challenge emerges from pronounced disparities tral Italy, demonstrates its capability to identify corriencdtata distributions, patterns, and characteristics across alignments with an impressive accuracy exceeding 90%. diverse geographical locations. As articulated 8in,9[], Furthermore,GASM exhibits eficiency, as it can con- models trained on data from one region may encounter struct the necessary representation for the entire datasdeitficulties in generalization when confronted with the in less than 1.5 hours, maintaining a linear complexity atunique variations inherent in another region. In pursuit query time. of a global model, a fundamental strategy is employed

The paper is organized as follows. Section2 sum- as demonstrated in8[], where diverse data sources are marises the related works concerning map-matchingaggregated on a global scale to build a transferable model. methods and the challenges posed by geographic transC-onversely, in [9], city indicators are identified pertainferability. In Section3, we encapsulate the technicaling to road networks, trafic flows, and individual mobilconcepts essential for comprehending the algorithm deit-y, to facilitate the assessment of similarities between gelineated in Section4. The outcomes of experiments con- ographical regions. Subsequently, an ensemble classifier ducted withGASM are detailed in Section5. Finally, is devised, computing the output as a weighted average of Section6 encapsulates our findings and delves into po- outputs generated by individual local classifiers. Notably, tential avenues for future developments. the importance of local models is determined by their higher similarity to the target regions compared to others in the ensemble with respect to the city indicators. Al2. Related Works ternate methodologies involve adapting a model initially trained in a data-rich region and transplanting it to a tarIn the following, we provide a concise overview of theget region characterized by limited data availability. This literature concerning map matching methods and eluci

adaptation process entails integrating additional data to date the geographic transferability problem, introducincgompute sub-region similarities, subsequently enabling the remapping of the model19[, 20].

3. Problem Setting

matrix ∈ ℝ ×ℎ , obtained by taking thebest fitting of each trajector y ∈

, and each subtrajector y∈ .

Movelet computes thebest fitting

as the minimal Euclidean Distance (ED) between in each subsequence of elements. Subsequently, we delve into the transformOan- the other handG,eolet computes thebest fitting in tion facilitated bGyeolet, a catalyst for the motivationthe same way, but geographically shifting to overlap behind this work. Ultimately, we present a formal intreoa-ch subsequence of of length . In particular,Geolet In this section, we articulate the fundamental concepltesngth = || of , formally, essential for comprehending our proposal. Initially, we establish a shared language and framework by introducing notation that serves as a basis for discussing key bestfit Movelets( , ) = ∶+ , ) }

(1) − min{( =0 duction to the problem at hand. of spatio-temporal point=s {⃗ x 0, … , x⃗ } ∈ ℝ×3

Definition 1 (Trajectory.) A trajectory is a sequence

where the spatial vectorx⃗s = (la t, long ) are sorted by in

creasing time , i.e., ∀1 ≤ <

we have < +1 .

In a sense, trajectories can be viewed as multivariate time series containing two signals, i.e., the latitude and longitude, recorded at non-constant sampling rat5e,s [ extends the best fitting function oMf ovelet by adding a pre-processing function,shift , that subtracts the value of the first vector of the subsequence from all the others, − min{ =0 bestfit Geolet( , ) =

(shift ( ∶+ ), shift () )} (2) where shift ( ) = { ⃗ 0 − ⃗0, … , ⃗ − ⃗0}. The shift function makes Geolet suitable for geographic transferability not being tide to the territory.

Finally, in order to present map matching we define a

21, 6]. In order to simplify notation, we will usienstead of trajectories with a vector of labels attached. FormalDlye:finition 5 of every time. A trajectory classification dataset is a setroad network as follows: Definition 2 (Trajectory Dataset.)A trajectory dataset ⟨ , ⟩ ∈ ℝ ××3 is a set of trajectories, = { 0 … , }.

(Road Network.) A road network = is a directed graph where = { 1, … , } is the set of road junction (or nodes), and = { 1, … , } is the set of road segments (or edges), wher e = ( 1, 2).

For simplicity, we use a single symbo l to denote the lengths of the trajectories, even if a dataset can contaWine underline that we rely on an enhanced road network trajectories with a diferent number of observations. Sim-representation where, for each edge, we also have acilarly, we emphasize that there is no constraint on thceess to the road segment geometries expressed as a sesampling rate, i.e., we can have a non-constant samplequence of latitude and longitude, formalslhyape() = in the same trajectory. Furthermore, we define asubtra- { ⃗0, … , ⃗ } for some ∈ ℰ . As for trajectories, for simjectory as: plicity of notation, we use a single symb otlo denote the lengths of the points describing the geometry of the Definition 3 (Subtrajectory.) Given a trajector y of road segment, even if, in real-case scenarios, the shape length , a subtrajectory = {⃗ , … , ⃗+ } ⊂ , of length can be described using an arbitrary number of points. ≤ , is an ordered sequence of consecutive values such that0 ≤ ≤ − .

We are now able to formalize thsehape-based map

matching problem as follows: As previously mentionedM,ovelet [6] and Geolet [7]

Definition 6 (Shape-based Map Matching). Given a are shapelet-inspired 1[0] trajectory approaches thatroad network = ⟨ , ⟩ and a trajectory , theshapeidentify discriminative subtrajectories for classificationbased map matching problem consists in finding the best purposes. They both select the most discriminative subs-equence of edges = { 1, … , } ⊆ such that does trajectories w.r.t. the target label usingmtuhteual information [22]. Like shapelet-based approachesM,ovelet can be used to train any machine learning model23[]. and Geolet extract discriminative subtrajectories thabtestfit Geolet( , shape( )) . not exist another sequence of edg es′ ⊂ diferent from , i.e., ′ ≠ , where bestfit Geolet( , shape( ′)) <

Indeed, once the most discriminative subtrajectories are

identified, a trajectory dataset can be transformed into a tabular representation capturing the distance between trajectories and discriminative subtrajectories through the subtrajectory transform function:

In other words, the shape-based map matching problem

involves determining the optimal alignment for a (sub)trajectory within a designated road networ k, relying on the configuration of the edges comprising the road segments. It is essential to emphasize that this map matching endeavor necessitates resolution without any reliance on Definition 4 (Subtrajectory Transform.) Given a dataset GPS coordinates as the usage of thsehift operator nor and a set containingℎ subtrajectories, thseubtrajecmalizes the trajector y rendering state-of-the-art map tory transform converts ∈ ℝ ××3 into a real-valued matching methods unsuitable for this particular task.

Algorithm 1: build(, , Σ, ℎ) Input : - road segments, - resampling distance, - max nbr. of symbols, ℎ - h-hop aggregation

Output : ℎ - Aho-Corasick automaton 1 ℎ ← aggregate(, ℎ) ; // aggregate trajectories 2 ← ∅ ; 3 for ∈ ℎ do // for each road segment 4 ← resample(ℎ(), ) ; // resample traj. 5 ⃗← direction( ) ; // get traj. direction 6 ← SAX( ,⃗) ; // discretize traj. 7 ← ∪ { } ; // add to dict. 8 return Aho-Corasick( ) ;

4. Shape-based Map Matching

To tackle the shape-based map-matching problem, our aim is to design a map-matching method with the capability to accurately deduce the original GPS coordinates ofAlgorithm 2: ℎ( , , , Σ) a trajectory within a designated road network. Crucially, this precision is sought exclusively through an examina- Input : - query traj., - Aho-Corasick tion of the trajectory’s shape and the configurations of automaton, - road segments, the edges within the road network, entirely independent resampling dist., - max nbr. of symbols of any reliance on GPS coordinates. Output : ∗, - best methc and best candidates

A brute-force approach to address the problem in-1 ′ ← resample( , ) ; // resample traj. volves map matching all conceivable alignments of 2 ⃗′ ← direction() ; // get traj. direction within every segmen t of the road networ k , employ- 3 ← SAX( ⃗′, ) ; // discretize using SAX ing thebestfit Geolet function. However, this naive strategy4 ← search(, , ) ; // get best candidates is only viable for small road networks due to the alg5o- ∗ ← arg min ∈ bestfit Geolet( , ′); rithmic complexity being(||( − )) , where and represent the number of points characterizing the tra6- return ; // return the best match jectory and the number of points describing each road segments in , respectivel1y. This limitation also extends to other map-matching algorithms that rely on latitufinditee sequence of symbols over an alphabetΣ. Subseand longitude coordinates to confine the matching scopequently, it builds a finite-state automaton based on the to the nearest roads. sequences in within a given finite symbol sequence de

We overcome this limitation by proposinGgASM a fined over the alphabetΣ. Consequently, the automaton, Geographic AutomatonShape-based map Matching ap- constructed using the dictiona ry , identifies a subset proach that is able to significantly reduce the number ′ ⊂ wherein each sequence in ′ is contained in of road segment alignments to test with the brute forc.eFigure 1 provides an illustration of the automaton cremethod. In essence,GASM comprises two key steps. Ini- ated by the Aho-Corasick algorithm, using the sequences tially, leveraging the Aho-Corasick algorith2m4][,GASM = {, , , } over the alphabeΣt = {, , , } . constructs a shape-based index for all road segments inThe Aho-Corasick algorithm initiates by constructing , portraying it as a geographic finite state automatona. sufix trie [ 25], depicted in black in Figure1. SubseSubsequently,GASM facilitates querying the automatonquently, it designates all leaves of the trie as final states to pinpoint a set of candidate partial matches betweeonf the automaton and introduces edges to complete the and {shape( )| ∀ ⊆ } . Further elucidation of theseautomaton. Two types of edges are incorporated, contwo steps is provided in the subsequent sections. necting their respective sufixes: sufix edges , depicted

Geographic Automaton Construction. GASM in blue, are utilized in the case of a mismatch, without leverages the Aho-Corasick algorithm to constructgauaranteeing that the sufix is also a sequence in the dicgeographic automaton, serving as a spatial index fortionary. In contrastd,ictionary-sufix edges , portrayed in expedited query processing 2[4]. The Aho-Corasick green, guarantee that the sufix is a sequence present in algorithm, renowned for string searching, takes a setthe dictionary. These operations unfold linearly concern = { 1, … , } as input, where each represents a ing the total number of symbols in the input dictio nar.y The automaton enables the search for all sequences contained in a query by traversing the automaton, achievable 1For each ⊆ , we compute the Euclidean Distance (linear com-in linear time relative to the query’s length. plexity), for all the possib l−e alignments ofshape( ) in .

Algorithm1 delineates the procedural steps requisite of GASM for constructing the Aho-Corasick automaton.

The algorithm accepts, as input, the road segme ntosf the road networ k , the resampling distance, the maximum allowed number of symbol s, and the number of hops ℎ, producing a geographical automaton as output. TheGASM-build algorithm begins by aggregating the road network, concatenatinℎgtimes a road segment to linked road segments inℎ to extend the length of existing segments and enhance their representativeness (line 1). Subsequently, it initializes an empty dictionary (line 2). The following steps are applied for each road segment in ℎ (denoted as ). Given that the shape of a road segment may be described by varying numbers of points based on its length and sinuositGyA,SM initially resamples the geometries into a series of evenly spaced points . This ensures that the symbolic representation’Fsigure 2: Summary of GASM, depicting geographic automalength, crucial for Aho-Corasick automaton constructiotno,n construction (left) and shape-based matching (right). is proportional solely to the road length. To fulfill the prerequisite of representing each road segmen tin a discretized space, a sequence of symbols is generated (line 5). Subsequently,GASM determines the heading direction ⃗ between consecutive points along the resampled road segment, transforming the shape of each road se-Table 1 |Σ| ℎ quence into a univariate time series of direction⃗swith a consistent length-based sampling ra te(line 6). This Hyperparameter GASM Resampling distance (m) Alphabet size h-hop aggregation

Values mation (SAX) [26] to obtain a symbolic representationthe geographic automaton construction phase is depicted, representations are added to the diction ary. Finally, is indexed according to its heading direction. On the the dictionary of discretized representations of the roardight side, the shape-based matching phase is illustrated. segments is employed to construct the Aho-Corasick auH-ere, given a trajectory with known shape but unknown tomaton, which is then returned as the output (line 8)o.rigin point,GASM computes the set of potential partial Shape-based Matching. Once the construction of themap matches. Subsequently, it selects the match that wherein each road within an arbitrary large road network geographic automaton is completGe,ASM can execute shape-based map matching over the automaton following the steps outlined in Algorith2m.GASM-search takes as input the query trajecto ry, the geographical automaton sequence of edges ⊆

that minimizes thebestfit function, as per the ensuing procedure. The initial three steps of Algorithm2, aligning with Algorithm1, involve

Geolet , the road segments , the resampling distance, and the maximum allowed number of symbols. It yields the In this section, we evaluate the efectiveness GofASM2. resampling the query trajecto ry, extracting its direc- result of a sensitivity analysis w.r.t. some data properties. tion, and transforming it into a symbolic representation Experimental Setting. Regrettably, only a handful . Indeed, the same preprocessing applied to the road seg-of mobility datasets, such as GeoLife and Porto T3a,xaire ments is applied to the query trajectories. Subsequentlayv,ailable as open access1[, 7]. However, these datasets the geographic automaton is utilized to perform a lin-possess limited geographic coverage, rendering them First, we present the experimental setting, then we report and discuss the best performance achieved. Finally, we illustrate details of the hyperparameter tuning and the minimizes thebestfit Geolet function.

5. Experiments best match candidates= {

1, … , }. From this set, the 2Python code: https://t.ly/wVl X.SWe ran our experiments on a ear search for the best matche s among all possibilities ofered by (line 4). This implementation enablGesASM to identify an “initial best match”, presenting a set of ifnal selection of the optimal alignment∗ is determined through a naive approach (line 5).

unsuitable for our study. Thus, we introduce a novel high-sample rate dataset derived from the publicly acces2xIntel Xeon Gold 6342 24-core CPU, limiting each test to use at most 12 cores.

3GeoLife: https://t.ly/6VJ-.EPorto:https://t.ly/0GMR. 9

Province

Arezzo

Firenze Grosseto Livorno

Lucca M. Carrara

Pisa Pistoia Prato Siena sible 2013 GPS traces on OpenStreetM4a.pAlthough the the metric ofaccuracy.On the other hand, the evaluation initial OpenStreetMap dataset encompasses GPS trajectoof- the second question relies on the metricseolfectivity, ries spanning the entire globe, our analysis concentratecsommonly employed in database literatu2re7][. Selecon the ten provinces in Tuscany, a region encompass-tivity measures the reduction of potential alignments ing 22, 985 2 in central Italy. The Mappymatch pythonbetween a query result and the entire dataset. In our conpackage5 was employed to map-match each trajectoryt,ext, selectivity is defined as the ratio of matched road retaining only those trajectories with an average error osfegments (| | ) to the total number of road segmen|t|s)(. less than10 . The final dataset encompasses 358 distinct For accuracy, higher values indicate better results, while trajectories, covering a total travel distanc5e, 0o2f4 for selectivity, lower values indicate better outcomes. and described by 300, 049 GPS points. Additional infor- GASM Performance. Table 5 presents the performation on the types of roads traversed in each provincemance metrics ofGASM across individual provinces. To in Tuscany, as per the OpenStreetMap taxono m6,yis determine the optimal hyperparameters, a grid search presented in Table6. was conducted over the values outlined in Tab6,lespecif

Within the framework of our shape-based map- ically for the province of Grosseto. This process yielded matching formulation, we aim to address the followintghe following hyperparameters: a resampling distance questions. First, to what extent cGanASM infer the of = 10 meters, an alphabet size o f= 8 symbols, and original GPS coordinates without utilizing them for maap street aggregation oℎf = 2 hops. GASM showcases matching? Second, how efectively canGASM reduce the an impressive ability to deduce the original GPS coornumber of potential alignments compared to the entidreinates, achieving an average light accuracy o9f0.1%. road network? The first question is evaluated throughFurthermore, it significantly narrows down the potential alignments, as indicated by the selectivity factor, reduc54OMpaepnpSytmreatetchM: haptt2p0s1:3//ptu.lbyl/iRcHGaPfSS tracesh:ttps://t.ly/q7u2N ing it to just10.8% of the original road network. These 6Highway taxonomy:https://t.ly/NpxZv commendable results are attained while maintaining a reasonable automaton construction time of 7.8 minutecsriminative nature of trajectories based on the type of per province, resulting in a total indexing time of a merreoad. Our hypothesis posits that straight streets, such 1.29 hours for the entire region. as motorways, exhibit lower discriminative characteris

Hyperparameters Tuning. In this section, we tics. Consequently, trajectories observed on such roads present the results of experiments conducted on thaere more likely to avoid re-identification, suggesting enprovince of Grosseto while varying the hyperparame-hanced geographic transferability. To investigate this, we ters detailed in Tabl6e. Initially, we compute the Pearsonidentify the type of road traveled within each segment. correlation between the method’s hyperparameters anIdn cases where multiple types of roads are encountered, two key performance metrics: the selectivity factor anwde perform a majority vote weighted by road length. Adaccuracy. Figure 3 visually depicts the changes in perfor-ditionally, to examine the influence of changes in the mance metrics, emphasizing variations in the top threeinput data, we create random subtrajectories of varying most influential hyperparameters—those exhibiting thelengths, including 100m, 150m, 500m, 1km, 1.5km, 5km, highest absolute values of Pearson correlation. Notablayn,d 10km, derived from our OpenStreetMap dataset. In the most influential hyperparameter is the number oforder to assess our hypothesis, we evaluate tshtreaightroad segment aggregationsℎ(), demonstrating a corre- ness [4] of each subtrajectory by calculating the ratio lation of−0.52 with the selectivity. Thus, increasinℎg between the shortest path from the origin to the destinaproves beneficial for GASM as it helps select fewer can-tion and the actual trajectory. didate road segments without significantly impacting Figure 4 encapsulates these results. The initial plot on accuracy. The alphabet size () displays correlations of the left highlights a notable trend: an increase in subtra−0.44 and 0.40 with respect to the selectivity and accuje-ctory length correlates with a rapid elevation in both racy, respectively. This hyperparameter introduces aaccuracy and selectivity. In simpler terms, as the subtratrade-of, as increasing the number of symbols reduces jectory length extends, the model’s precision improves. selectivity but may lead to a slight decrease in accuracTyh.e central plot reveals that trajectory straightness has Finally, the resampling distance)(exhibits a correlation a negligible impact on the number of candidate matches. of 0.22 with accuracy. Interestingly, decreasin gslightly However, as trajectories become more linear, the accuenhances accuracy according to our observations. racy experiences a decline. Finally, the rightmost plot Sensitivity Analysis. We delve here into the varia-illustrates the method’s performance across various road types. This plot validates the findings of the straightness tions in performance with respect to the length of thpleot: roads with greater straightness, like motorways, query trajectory . Additionally, we explore the dis

pose the greatest challenge for re-identification. Conversely, more sinuous roads present a slightly higher [7] C. Landi, et al., Geolet: An interpretable model for dificulty in re-identification, reflected in a higher selec- trajectory classification, in: IDA, volume 13876 of tivity but with a concomitant boost in accuracy. LNCS, Springer, 2023, pp. 236–248.

[8] F. Veronesi, et al., Assessing accuracy and geograph6. Conclusion ical transferability of machine learning algorithms for wind speed modelling, in: AGILE, LNGC, 2017.

In this paper we have introducedGASM, a map matching [9] M. Nanni, et al., City indicators for geographical method capable of determining a trajectory’s position transfer learning: an application to crash prediction, solely based on its shape. Our experiments showcase GeoInformatica 26 (2022) 581–612. thatGASM significantly reduces the number of poten- [10] J. Lines, et al., A shapelet transform for time series tial alignments and deduces the original GPS coordinates classification, in: ACM SIGKDD, 2012, pp. 289–297. with remarkable accuracy. Further analysis reveals that[11] C. White, et al., Some map matching algorithms for longer and less linear trajectories are more straightfor- personal navigation assistants, TRC 8 (2000). ward to map match. However, this observation raises[12] D. Bernstein, et al., An introduction to map matchconcerns about the potential for shape-based methods ing for personal navigation assistants (1996). to inadvertently learn geographic positions instead o[1f3] M. A. Quddus, et al., A general map matching alfocusing on other intrinsic features. As a part of future gorithm for transport telematics applications, GPS work, as outlined at the beginning, we aim to assess the solutions 7 (2003) 157–167. geographic transferability of shape-based methods, such[14] Y. Liu, Z. Li, A novel algorithm of low sampling as Geolet, by incorporatingGASM. Specifically, we pro- rate GPS trajectories on map-matching, EURASIP pose giving more weight to the selection of discriminative 2017 (2017) 30. subsequences with higher selectivity rather than basing[15] X. Liu, et al., A st-crf map-matching method for the decision solely on a statistical test. low-frequency floating car data, TITS 18 (2016) 1241–1254.

Acknowledgments [16] L. Jiang, et al., From driving trajectories to driving paths: a survey on map-matching algorithms, CCF This work is partially supported by the EU NextGener- TPCI 4 (2022) 252–267. ationEU programme under the funding schemes PNRR- [17] P. Cintia, M. Nanni, An efective time-aware map “SoBigData.it - Strengthening the Italian RI for Social matching process for low sampling gps data, arXiv Mining and Big Data Analytics” - Prot. IR0000013, H2020- preprint arXiv:1603.07376 (2016). INFRAIA-2019-1: Res. Infr. G.A. 871042 SoBigData++, [18] G. Hu, et al., If-matching: Towards accurate mapand GreenDatAI G.A. 101070416. matching with information fusion, TKDE 29 (2016) 114–127.

References [19] L. Wang, et al., Smart city development with urban transfer learning, Computer 51 (2018) 32–41. [1] C. L. da Silva, et al., A survey and comparison of [20] L. Wang, et al., Cross-city transfer learning for trajectory classification methods, in: BRACIS, IEEE, deep spatio-temporal prediction, in: IJCAI, ijcai.org, 2019, pp. 788–793. 2019, pp. 1893–1899. [2] A. Bolbol, et al., Inferring hybrid transportatio[n21] R. Trasarti, et al., Myway: Location prediction via modes from sparse GPS data using a moving win- mobility profiling, Inf. Syst. 64 (2017) 350–367. dow SVM classification, CEUS 36 (2012) 526–537. [22] P.-N. Tan, M. Steinbach, V. Kumar, Introduction to [3] S. Dodge, et al., Revealing the physics of movement: data mining, Pearson Education India, 2016.

Comparing the similarity of movement characteris[-23] A. J. Bagnall, et al., The great time series classificatics, CEUS 33 (2009) 419–434. tion bake of, CoRR abs/1602.01711 (2016). [4] P. J. Almeida, et al., Indices of movement behaviour:[24] A. V. Aho, et al., Eficient string matching: An aid conceptual background, efects of scale and location to bibliographic search, ACM 18 (1975) 333–340. errors, Zoologia (Curitiba) 27 (2010) 674–680. [25] E. M. McCreight, A space-economical sufix tree [5] I. Kontopoulos, et al., Traclets: Harnessing the construction algorithm, JACM 23 (1976) 262–272. power of computer vision for trajectory classifica- [26] J. Lin, et al., A symbolic representation of time tion, arXiv:2205.13880 (2022). series, with implications for streaming algorithms, [6] C. A. Ferrero, et al., MOVELETS: exploring relevant in: DMKD, ACM, 2003, pp. 2–11.

subtrajectories for robust trajectory classification[,27] S. Acharya, et al., Selectivity estimation in spatial in: SAC, ACM, 2018, pp. 849–856. databases, in: SIGMOD, ACM, 1999, pp. 13–24.