1. Introduction

Path-based trafic flow prediction

Efstratios Karkanis

Nikos Pelekis

Eva Chondrodima

Yannis Theodoridis

0 0 Department of Informatics, University of Piraeus , Piraeus , Greece 1 Department of Statistics and Insurance Science, University of Piraeus , Piraeus , Greece

Predicting trafic flow on a road network is crucial in various aspects of the transportation domain, encompassing safety and logistics. This paper introduces an innovative approach to forecast trafic flow, with a specific emphasis on predicting future trafic along paths within a road network. The proposed framework is tailored to accept GPS trajectory data as input, generating time series data illustrating trafic flow along designated pathways. It achieves this by utilizing only a subset of trajectories that strictly follow the paths without detours. This intuitive approach results in training data that better captures the essence of the forecasting problem. In the final step of our methodology, we employ state-of-the-art time series forecasting methods, including ensemble trees and recurrent neural networks. To validate the efectiveness of our approach, we evaluate it using a real-world dataset, demonstrating its capability in predicting trafic flow.

eol>Mobility Trajectories Trafic flow Forecasting methods Ensemble trees Neural networks

1. Introduction

network. On the other hand, specific vehicles, such as buses and taxis, are also expected to contain sensors that In the contemporary era, urbanisation has given rise to a periodically transmit their mobility status information, conspicuous surge in the demand for public transporta- including their GPS location, along with the respective tion within urban areas. This escalating need, fueled by sampling timestamp. The proposed framework aims to population growth, economic advancement, and evolv- provide predictions for urban trafic flow along predeing lifestyle preferences, underscores a significant trans- ifned paths by leveraging data collected from sensors formation in the dynamics of urban mobility [ 1 ]. Con- embedded in specific vehicles, such as buses and taxis. currently, challenges like trafic congestion, navigation In summary, the complete workflow of the proposed optimisation, and air quality management have become framework involves several key steps. Starting with the prevalent1. As a result, path-based trafic flow, which original trajectory data provided by vehicles, we create in terms of this study refers to the number of vehicles a time series dataset that captures historical trafic flow passing through a particular path (a set of consecutive values along predefined paths at uniform-sized time inroad parts between intersections) on a road network, tervals. A key aspect of our methodology involves the has increased substantially over the recent decades, em- measurement of trafic flow along each predefined path. phasising the critical necessity for accurate forecasting Therefore, we perform a search procedure on trajectory methods. data (the so-called Strict Path Query [ 6 ]) to retrieve all

Mobility data analytics and prediction has gained an trajectories that strictly cross a predefined path of a road increasing interest in recent years, due to its time-critical network within a specicfi time interval. A trajectory applications in urban, maritime and aviation domains strictly follows a path if and only if this path is a subset [ 2, 3, 4, 5 ]. In particular, predicting trafic flow is chal- of the trajectory. In other words, the trajectory does not lenging due to its non-linear nature. This complexity deviate from this predefined path. In this way, we are arises because trafic flow is influenced by non-linear sure that we compute trafic flow accurately, eliminatfactors, such as weather conditions, road trafic accidents ing trajectories that perform detours while crossing the and public holidays, posing dificulty in accurate fore- path. This approach allows us to reliably assess trafic casting. lfow throughout the entire length of the path, avoiding

Typically, road networks are equipped with sensors potential inaccuracies that might arise from considering that measure trafic flow at a specific point of the road trajectories with detours, which could lead to misleading indications of trafic flow.

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Figure 1 illustrates a road network where letters a Joint Conference (March 25-28, 2024), Paestum, Italy to l represent intersections. Four trajectories 1, 2, 3, $ stratoskarkanis2@gmail.com (E. Karkanis); npelekis@unipi.gr and 4 are depicted, each marked with a distinct colour, (yNt h.ePoedle@kiusn);ipeiv.gacrh(Yo.n@Thuenoidpoi.rgirdi(sE). Chondrodima); navigating within the network. Additionally, there is a

Copyright © 2024 for this paper by its authors. Use permitted under Creative Commons License predefined path b –> c –> h –> i –> d, marked with orange 1The EurAottrpibeutaionn 4.0 GIntreerneatnional D(CCeaBYl:4.0). https://eur-lex.europa.eu/legal- colour. To assess trafic flow within this specified path content/EN/TXT/?uri=COM:2019:640:FIN during a particular time period, trajectory 3 is retained as it follows this predefined path. Trajectories 1, 2 and 4 are excluded from consideration since they involve detours. In this particular example, we argue that trafic lfow within the orange path is represented better by only considering 3, although all trajectories touch the path.

Following the generation of time series data, we construct suitable Machine Learning (ML) models to address the specific problem at hand. In this research, we evaluate two such models: an ensemble-based decision tree model using XGBoost [ 7 ], and a model based on Long ShortTerm Memory (LSTM) [ 8 ], an evolution of the Recurrent Neural Network (RNN). We also analyze the advantages and disadvantages of using each model for forecasting trafic flow along predefined paths.

This paper aims to address the problem of efectively predicting trafic flow on road networks. While several studies have been conducted regarding this topic, the novelty of our work stands on that it proposes a distinct path-based methodology that focuses on solving this problem. To the best of our knowledge, no similar research has been published in this field yet.

In practice, the role of the proposed framework is to assist urban trafic management systems in guiding and managing trafic flow, thus controlling trafic volume, avoiding trafic congestion, and constructing an eficient road network.

To summarise, the main contributions of this paper are listed as follows: • We propose a novel methodology for predicting trafic flow on paths within a road network. Our methodology is generic as it based on timeseries forecasting, and towards this direction, we build an ensemble-based and RNN-based model. • We demonstrate the eficacy of our proposal by a thorough experimental study using a real-world dataset.

The rest of this paper is organised as follows: Section

2 discusses related work. In Section 3, preliminary terms and definitions are provided. In Section 4, we introduce our methodology. Section 5 presents the experimental study, followed by conclusions in Section 6.

2. Related Work

Several studies approach the problem of predicting trafic lfow on road networks as follows: given a dataset of historical trafic flow values, which consists of information about the number of vehicles moving through specific sections of the road network between intersections or multiple areas within the same road network monitored by trafic sensors, the objective is to predict the future trafic flow at these locations at a given time in the future.

These studies mainly focus on ML approaches.

Many researchers have tried algorithms like Support Vector Machines (SVM) [ 9, 10 ], K-Nearest Neighbors (KNN) [ 9 ], Bayesian networks [ 9 ], Autoregressive Integrated Moving Average (ARIMA) [ 10, 11, 12 ], Shallow Neural Networks (SNNs)[ 11 ], Deep Neural Networks (DNNs) [ 13, 11, 10, 14 ] and hybrid implementations [ 9 ], each with its strengths and weaknesses. For example, SVM and KNN are capable of capturing complex patterns in multidimensional data [ 9 ]. SNNs, while useful in many applications, struggle with non-linear patterns of trafic flow [ 11 ]. ARIMA models perform well in predicting short-term trafic [ 11 ], making them useful for scenarios where precise, immediate forecasts are needed.

Last but not least, hybrid approaches leverage the unique strengths of many models combined to achieve better predictions.

Due to the inherent volatility of trafic flow patterns, DNNs become essential [ 11, 13 ]. Numerous studies in the literature have recognized this necessity. In a specific instance, researchers introduced a trafic flow prediction framework centred on Fully Connected Neural Networks (FCNN) [ 11 ]. This model utilized historical trafic lfow data to predict trafic conditions one timestep ahead.

Notably, the authors enhanced model performance and generalization by incorporating optimization techniques such as Batch Normalization (BN) and dropout layers.

Another aspect of addressing the trafic flow prediction problem using DNNs involves the utilisation of Stacked Autoencoders (SAEs) [ 13 ]. The SAEs were used to distil meaningful information from trafic data by compressing input data into a lower-dimensional representation and reconstructing the original data. Comparative evaluation against alternative ML algorithms, such as SVM, Random Walk (RW), and Radial Basis Function Neural Network (RBFNN), underscored the superior performance of the proposed SAE model, particularly in situations characterised by high trafic flow volumes.

However, the most common deep learning approach proposed when utilising time-related historical data involves the use of an LSTM model. For instance, in [ 10 ], this model was employed by researchers to take historical process. The model is compared with other algorithms, trafic flow data as input and predict short-term trafic pat- such as the SVM and the XGBoost, without the use of WD terns within a 15-minute time horizon in the future. The technique. Results indicate that the proposed approach LSTM exhibited superior performance compared to SVM achieves superior performance. and Feedforward Neural Networks (FNN), efectively ad- In summary, the examined studies showcase notable dressing trafic flow’s non-linear and time-dependent progress in the domain of trafic flow prediction. Howcharacteristics. In another study, scientists introduced ever, these researches primarily focus on predicting futhe DeepCrowd model [ 14 ], which integrates Convolu- ture trafic flow in specific locations, whether it is a single tional LSTM Neural Networks with High-Dimensional road segment between intersections or many road segAttention Mechanisms (HDAM) to capture spatial and ments within intersections, each treated independently. temporal trafic data correlations. This difers from our objective, which involves forecast

In [ 15 ] the authors present the Foresight cloud-based ing trafic flow within paths. To the best of our knowlsystem for real-time spatio-temporal forecasting. The pa- edge, this study represents a novel contribution to the per discusses practical aspects to implement a real-time ifeld. system for real-world, while it presents experimental results of the application of several ML methods to the problem at hand. In addition, the authors adapt spatio- 3. Preliminaries temporal ML methods to incorporate dynamic urban events and vehicle-level flow information into the pre- In this section, we introduce some basic terms required dictive models. in the rest of the paper and formulate the problem at

Despite the widespread use of LSTM, SAE, and FCNN hand. models for handling time-related data, another study in- In our setting, trajectories are moving on a road nettroduces the power of Graph Neural Networks (GNNs) work. A road network is defined as a directed graph, when dealing with trafic data. The proposed methodol- denoted as G=(V,E), with V ={1, 2, ...} representing ogy, [ 12 ], leverages historical data, incorporating factors the set of vertices in the graph G and E={1, 2, ...} influencing trafic flow and speed, including time, space, representing the set of edges in the graph G. In a general weather conditions and activities such as public holi- context, when considering a graph that represents a road days. The chosen model for this approach is the Graph network, set V is understood to denote the intersections Hierarchical Convolutional Recurrent Neural Network within the network, while set E signifies the road seg(GHCRNN), which is specifically designed for urban ar- ments between two intersections. Within a road network eas. The GHCRNN model integrates both spatial and represented as a graph G, a path, also a pathway, is dehierarchical information to improve prediction accuracy. ifned to be a sequence of at least two consecutive edges It includes convolutional layers for spatial analysis, pool- in G. ing units for hierarchical structure representation and an In the context of a road network, diverse moving obencoder-decoder architecture utilising Gated Recurrent jects (MO) navigate through it. An MO is defined as any Unit (GRU) modules. The experimentation shows the entity that travels along the edges of a road network repproficiency of GHCRNN in handling extensive graphs resented as a graph G. Examples of MOs include cars, and delivering precise predictions. buses and taxis.

Besides the powerfulness of neural networks, the fol- The route of a MO is defined as a sequence lowing study proves that ML implementations also ad- <1, 2, ..., >, where each element is a triple (, , dress the trafic flow prediction problem with high accu- ) denoting the timestamp of the MO’s sampling and racy. The XGBoost algorithm with the Wavelet Decom- the respective GPS coordinates. position (WD) technique are used to predict short-term Based on the provided definitions, the problem adtrafic flow [ 9 ]. This method aims to extract additional dressed in this study is formulated as follows: given (i) information from the feature to be predicted, combining a graph G representing a road network, (ii) a dataset D low and high-frequency details during the reconstruction containing a set of routes of MOs traveling along the road network represented by G and (iii) a set P of predefined paths within G, the goal of trafic flow prediction is to Once the split trajectories are processed through the predict the occupancy of each path in P, i.e. the num- map-matching algorithm, the information given as outber of the MOs that will traverse each predefined path put encompasses the edges of the road network that the within a specific time horizon in the future using D as its trajectory traversed in chronological order. Additionknowledge base. ally, it includes information about the time interval the trajectory lied on each edge.

4. Methodology

In this section, we introduce the proposed methodology for eficiently resolving the challenge of forecasting trafic flow along a pathway within a road network. The workflow of the proposed methodology is depicted in Figure 2. The methodology starts by processing a collection of GPS data that contain routes of MOs, and more specifically by performing a route-splitting (i.e., segmentation) operation. Route splitting is the process of partitioning the entire route of a specific MO into distinct trajectories, guided by specific conditions. Subsequently, the resulting trajectories undergo map matching onto the road network. The next step in the methodology is to measure historical trafic flow volumes along each path that belongs in the set of predefined paths and at uniformsized time intervals, resulting in the creation of a time series dataset for each predefined path. After generating the time series data, we extract temporal features from timestamps to enhance the prediction accuracy of the forecasting models. Finally, prediction procedures of trafifc flow magnitude are conducted using two ML models, based on the XGBoost and the LSTM RNN approaches.

4.1. Dataset Preprocessing In the initial phase, it is crucial to acknowledge the poten

tial existence of routes of MOs that contain elements that happen to be recorded at sparse timestamps. To address this problem, a necessary step involves breaking down MO routes into trajectories. The time gap between any two consecutive elements within the same trajectory is constrained to a specific time value, denoted as Dt. A route undergoes a split when consecutive pairs of elements within the route exhibit a time diference greater than the specified value Dt.

The subsequent phase involves the map-matching process. The spatial locations of the trajectories may not precisely align with the road network due to potential noise during sampling or technical errors. Hence, the usage of a map-matching algorithm becomes imperative to ensure accurate alignment with the road network.

For precise map-matching of trajectory coordinates (lat and lon) onto the road network, the Valhalla Meili API2, recognized for its highly eficient map-matching algorithm [ 16 ], is utilized.

2https://valhalla.github.io/valhalla/api/map-matching/api

reference/

4.2. Generating Trafic Flow Time Series

The subsequent stage in the proposed methodology involves the generation of time series data using the trajectory data. Time series data represent the trafic flow inside each pathway under study at equal-sized time intervals. The utilisation of the Strict Path Queries (SPQs) algorithm [ 6 ] is crucial for this task. This type of query identifies all trajectories strictly following a specified path within a defined time interval. The start of this interval signifies the minimum acceptable timestamp for the trajectory to have entered the designated path, while the end of the interval indicates the tolerated timestamp for the trajectory to have exited it.

The first parameter that the SPQ algorithm requires is a path in which past trafic flow values will be measured at uniform-sized time intervals. The second parameter essential for the SPQ algorithm is a time interval. Because the SPQ algorithm is employed to generate time series data, uniform time intervals must be established. To achieve this, the total sampling duration of trajectories within dataset D is divided by a positive number, represented as Q. This division results in the creation of smaller time sub-intervals, each characterized by a duration of Q time units.

For each sub-interval, the SPQ algorithm measures trafic flow along the predefined path, producing a time series with trafic flow volume per sub-interval.

To improve the quality of time series data, we also extract time-related features from the timestamps. These features encompass the day (numerical representation, e.g., 1, 2, 3), the day of the week (e.g., Monday, Tuesday), the hour (numerical representation from 0 to 23), and the minutes (numerical representation from 0 to 59). To account for the cyclical nature of time, cyclic encoding is applied to these time-related features, capturing their periodic characteristics using semitones and cosines.

Incorporating additional time-related information, we introduce the 3-hour-interval attribute (numerical representation from 1 to 8) to signify the specific 3-hour interval of the day to which a particular recording within the time series belongs.

4.3. Forecasting Models Resulting time series data serve as the input for the two

ML models, XGBoost [ 7 ] and LSTM RNN [ 8 ], we evaluate in this study. These models are specifically trained to predict trafic flow on each predefined path. This task is categorized as a supervised ML process.

To structure the data into input and output sets, the sliding window technique is applied. This technique involves dividing the time series into subsets using a moving or sliding window. As the window progresses step by step through the time series, it captures fixed-size segments of historical values. The length of this window determines the amount of past data used to forecast the subsequent value in the time series and corresponds to parameter past-observations that actually indicates the count of historical observations utilized for prediction.

Figure 3 illustrates the operation of the sliding window technique. The table at the left displays samples of trafic lfow on a single path at 4 uniform-sized time intervals 1, 2, 3 and 4. For instance, equal to 4 at time interval 1, 5 at 2, and so on. If the decision is made to input trafic flow values from the two preceding time steps into the forecasting model to predict the next value, the window length is set to 2. Adhering to these criteria, the time series undergoes a transformation into the format showcased in the table at the right. trees, the model takes as input a vector with L features in order to predict future trafic flow. Each subsequent tree is designed to improve upon its predecessor. The ultimate prediction for a regression task is derived by consolidating the predictions made by each individual tree within the ensemble. This aggregation process ensures that the final forecast benefits from the collective contributions of all the trees in the model.

On the other hand, Figure 5 depicts the proposed LSTM model. The neural network is designed to take a vector with L features as input. This vector is then reshaped in a 3-dimensional tensor. The information traverses through 3 LSTM layers with 50, 25, and 12 units, respectively. Dropout layers are strategically employed right after the LSTM layers to deactivate certain neurons and prevent overfitting during the training process. Subsequently, the information passes through 3 fully connected layers with 6, 3, and 1 neuron, respectively, before producing the ifnal output y. ReLu activation functions are utilized in all LSTM layers, while the fully connected layers employ a linear activation function.

5. Experimental Study

Through the implementation of the sliding window technique, we can generate input and output sets for In this section, we evaluate the proposed methodology diferent values of the past-observations parameter. These using a real-world trajectory dataset. The source code sets are utilized for training and evaluating both machine that implements the proposed methodology is available learning models. Each model receives an input vector at GitHub3. with L features, including historical trafic flow values, In particular, we used the Cabspoting dataset4, a poputemporal characteristics with their cyclic encodings, and lar urban mobility dataset consisting of navigation routes the 3-hour-interval. Furthermore, the models incorporate from 537 taxis in San Francisco, CA, USA, recorded from the mean and variance of historical trafic flow values May 17, 2008, to June 10, 2008. The routes are captured as input. The goal of each model is to predict the trafic at an average sampling rate of approximately 30 seconds. lfow in the next sub-interval with duration Q. The entire dataset encompasses 11,220,491 GPS records.

To improve eficiency, a correlation matrix is utilized. The research was conducted using a combination of This matrix helps identify and remove attributes that computational resources. Data preparation and time seredundantly convey similar information, ensuring only ries generation procedures were performed on a Dell those with low correlation are included. Features with Inspiron 15 3000 series laptop equipped with an Intel low correlation are chosen for the training and testing Core i5-4210U processor and 8GB of RAM. For model phases of XGBoost and LSTM models.

In Figure 4, the architecture of the selected XGBoost model is illustrated. Using a set of 100 XGBoost decision

3https://github.com/stratoskar/Path_Based_Trafic_Flow_Prediction

4https://ieee-dataport.org/open-access/crawdad-epflmobility; https://stamen.com/work/cabspotting/ training and forecasting processes an NVIDIA Tesla T4 time interval. By repeating this procedure for each path GPU hosted on Google Colab was utilized. This GPU and every interval, we create a trafic flow time series for provided the necessary computational power to train the each path.

ML models eficiently. The application of the SPQ algorithm filters out the

Following the proposed methodology, the segmenta- trajectories that deviate while moving along a specific tion of taxi routes into trajectories was performed using path. Figure 6 illustrates the diference in trafic flow data a time threshold of Dt=90 seconds. The identification of obtained when using the SPQ algorithm on the set of 100 the route of a taxi before splitting is encoded as taxi-id, predefined paths and data collected without the use of while the subsequent identification of each newly gener- SPQ on the same set of predefined paths. In the absence ated trajectory is denoted as traj-id. Thus, each trajectory of SPQ restrictions, trafic flow on each route is calcuin the dataset is uniquely identified by a pair of taxi-id lated by counting the trajectories that have traversed all and traj-id values. The produced trajectory dataset en- path edges at least once within a defined time interval. compasses a total of 521,135 unique trajectories. However, when SPQ restrictions are applied, it is evident

The Valhalla API receives GPS coordinates for each el- that the collected trafic flow data are reduced. ement in a trajectory as input and subsequently produces a potential map-matched trajectory that aligns with the road network of San Francisco. Following this procedure, the resulting dataset includes details, such as the unique trajectory identifier (denoted by taxi-id and traj-id) and the exact sequence of road network edges traversed by the trajectory. The updated dataset from the Valhalla Meili map-matching algorithm also provides timestamps denoted as t_enter, indicating the moments when the trajectory enters the specified edges. Figure 6: Diference in average trafic flow across all prede

The next step involves constructing a time series fined paths grouped by timestamps: dataset 1 (blue) is formed dataset focusing on the recording of trafic flow by path using SPQ algorithm, whereas dataset 2 (orange) is developed and time interval. The available map-matched trajec- without adhering to SPQ rules. tory data spans from May 17th, starting at 10:00:04, to To build robust and highly precise ML models, it is cruJune 10th, concluding at 09:00:04. This time frame is seg- cial to selectively choose attributes from the time series mented into smaller intervals of half an hour each, i.e. Q data with low correlation. To accomplish this, a correla= 30 min. tion matrix is utilized. The correlation matrix presented

To quantify trafic flow along pathways, we choose to in Figure 7 indicates that the temporal characteristics create a set of 100 unique and distinct paths. The length day, dayofweek, hour and minute share common inforof these paths, determined by the number of edges, spans mation with their corresponding cyclic encoded features from 2 to 20 edges. Importantly, these paths are not day_of_week_sin, day_of_week_cos, hour_sin, hour_cos, randomly generated; rather, they are derived from the minute_sin, minute_cos, day_sin and day_cos. As a result, trajectory data. This ensures that each path created is a temporal attributes day, dayofweek, hour, and minute are subset of one or more trajectories. eliminated from the time series data. The rest of the at

The construction of a time series utilizes the SPQ al- tributes present in the heatmap representation are taken gorithm. For each call of the SPQ algorithm, the input into account in the training and evaluation procedures consists of the specific path for which trafic flow needs of the ML models. to be counted, along with the corresponding half-hour Time series data is divided into training and testing used to evaluate the performance of XGBoost in terms of RMSE and MAE regression evaluation metrics.

Concerning the LSTM model, the time series data within the training set undergo a reshaping process, resulting in a 3-dimensional tensor. The output y of the LSTM model represents the predicted trafic flow value in the time series. The LSTM model is then evaluated on the test set using the same evaluation metrics, RMSE and MAE.

To determine the optimal value for the pastobservations parameter, we employ varying sliding window lengths. Both models are assessed on the same test set for diferent past-observations values, specifically 2, 3, 4, 5, and 6. The outcomes are presented in Table 1 using the data created using the SPQ algorithm. Table 2 indicates the result of the same experiment using the data collected when the usage of SPQ is absent. The comparison of the two tables makes clear that by applying the proposed methodology we succeed lower RMSE and MAE scores. sets, with the training set encompassing the earliest data in the time series, spanning from May 17th to June 2nd inclusive. The remaining data forms the testing set.

The subsequent phase involves fine-tuning XGBoost’s hyperparameters utilizing the Grid Search Cross- Forecasts are produced using the known values from Validation method5. The hyperparameters targeted for the test set. Based on the conclusions drawn from Table optimization encompass regularization parameters, such 1, the XGBoost model is configured with a setting of 5 (6, as gamma, alpha, lambda, the learning rate influencing respectively) for the past-observations parameter. Figures the backpropagation algorithm, as well as the maximum 8 and 9 showcase the predictions made by both models depth of each decision tree. Employing the best combi- on the test set. The blue colour represents the observed nation of hyperparameter values, we train the XGBoost sum of trafic flow values in the test set, aggregated based model utilizing an ensemble of 100 trees. The test set is on timestamps. Conversely, the orange and red colours represent the predicted sum of trafic flow values across 5https://scikit-learn.org/stable/modules/grid_search.html all 100 predefined paths in the test set, organized by timestamp and generated using the trained XGBoost and LSTM models, respectively. It appears that both models efectively capture the trend and seasonality present in the time series data of the test set.

6. Conclusion In this study, we presented a generic methodology for

forecasting trafic flow volumes on pathways, employing ML algorithms, with a particular emphasis on the XGBoost and the LSTM models. When tested on a real-world dataset containing taxi routes in the San Francisco area, both models demonstrated good performance, which is evidently better than the case when all data is used.

A limitation that constrained the scope of our study is the reliance only on taxi trafic data resulted in an incomplete representation of the city’s overall trafic dynamics, overlooking the movements of other transportation modes, such as buses and private cars. Furthermore, a more holistic comprehension of trafic flow prediction dynamics could be achieved by integrating additional data, such as vehicle accidents and weather conditions, which are known to impact trafic flow patterns.

Lastly, for future research endeavours, it is crucial to simulate scenarios using diverse modelling approaches, including Graph Neural Networks (GNNs), to assess their efectiveness in capturing complex relationships within the road network. This comprehensive approach aims to contribute to a more nuanced and adaptable predictive framework for future trafic flow predictions.

Acknowledgments This work was supported by the Horizon Europe research and innovation programmes under GA 101070416 (Green.Dat.AI) & 101093051 (EMERALDS) funded by EU.

[1]

Xu ,

L. E.

Olmos ,

Mateo ,

Hernando ,

Yang ,

M. C.

González , Urban dynamics through the lens of human mobility , Nature Computational Science 3 ( 2023 ) 611 - 620 .

[2]

Pelekis ,

Theodoridis , Mobility Data Management and Exploration , Springer New York, 2014 .

[3]

Vouros , G. Andrienko,

Doulkeridis ,

Pelekis ,

Artikis ,

A.-L.

Jousselme ,

Ray ,

Cordero , Big Data Analytics for Time-Critical Mobility Forecasting , Springer Nature, 2020 .

[4]

Doulkeridis ,

Vlachou ,

Pelekis ,

Theodoridis , A survey on big data processing frameworks for mobility analytics , SIGMOD Rec . 50 ( 2021 ) 18 - 29 .

[5]

Chondrodima ,

Pelekis ,

Pikrakis ,

Theodoridis , An eficient lstm neural network-based framework for vessel location forecasting , IEEE Transactions on Intelligent Transportation Systems 24 ( 2023 ) 4872 - 4888 .

[6]

Krogh ,

Pelekis ,

Theodoridis ,

Torp , Pathbased queries on trajectory data , in: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems , SIGSPATIAL '14, Association for Computing Machinery, New York, NY, US, 2014 , pp. 341 - 350 .

[7]

Chen ,

Guestrin , Xgboost: A scalable tree boosting system , in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16 , ACM , 2016 .

[8]

R. C.

Staudemeyer ,

E. R.

Morris , Understanding lstm - a tutorial into long short-term memory recurrent neural networks , 2019 . arXiv: 1909 .09586.

[9]

Dong ,

Lei ,

Jin ,

Hou , Short-term trafic lfow prediction based on xgboost , in: 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS) , 2018 , pp. 854 - 859 .

[10]

Tian , L. Pan, Predicting short-term trafic flow by long short-term memory recurrent neural network , in: 2015 IEEE International Conference on Smart City/SocialCom/SustainCom, 2015 , pp. 153 - 158 .

[11]

Tampubolon , P.-A. Hsiung, Supervised deep learning based for trafic flow prediction , in: 2018 International Conference on Smart Green Technology in Electrical and Information Systems (ICSGTEIS) , 2018 , pp. 95 - 100 .

[12]

Lu ,

Zhang , H. Liu,

Xiong , Graph hierarchical convolutional recurrent neural network (ghcrnn) for vehicle condition prediction , 2019 . arXiv: 1903 .06261.

[13]

Lv ,

Duan ,

Kang ,

Li ,

F.-Y.

Wang , Trafifc flow prediction with big data: A deep learning approach , IEEE Transactions on Intelligent Transportation Systems 16 ( 2015 ) 865 - 873 .

[14]

Jiang ,

Cai ,

Wang ,

Yang ,

Fan ,

Chen ,

Tsubouchi ,

Song ,

Shibasaki , Deepcrowd: A deep model for large-scale citywide crowd density and flow prediction , IEEE Transactions on Knowledge and Data Engineering 35 ( 2023 ) 276 - 290 .

[15]

Conlan ,

Oakley ,

G. V.

Demirci ,

Sfyridis ,

Ferhatosmanoglu , Real-time spatio-temporal forecasting with dynamic urban event and vehiclelevel flow information , in: CEUR Workshop Proceedings , volume 3379 , 2023 .

[16]

Saki ,

Hagen , A practical guide to an opensource map-matching approach for big gps data , SN Computer Science 3 ( 2022 ).