=Paper=
{{Paper
|id=Vol-3946/BMDA_paper4
|storemode=property
|title=Shared Micro-mobility Demand Forecasting using Gradient Boosting methods
|pdfUrl=https://ceur-ws.org/Vol-3946/BMDA-4.pdf
|volume=Vol-3946
|authors=Antonios Tziorvas,George S. Theodoropoulos,Yannis Theodoridis
}}
==Shared Micro-mobility Demand Forecasting using Gradient Boosting methods==
Antonios Tziorvas1,*, George S. Theodoropoulos1,† and Yannis Theodoridis1,†
1 Department of Informatics, University of Piraeus, Piraeus, Greece
Abstract
Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent
Transportation Systems. By leveraging data fusion and analytics techniques, traffic density estimation serves as a key intermediate
measure for identifying and predicting emerging demand patterns. In this paper, we propose two gradient boosting model variations,
one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes
up to one hour. Our approach effectively integrates spatial and temporal features, enabling accurate predictions that are essential for
improving the efficiency of shared (micro-)mobility services. To evaluate the effectiveness of our approach, we utilize open shared
mobility data derived from e-scooter and e-bike networks in two Dutch metropolitan areas. These real-world datasets enable us
to validate our approach and demonstrate its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our
methodology offers novel insights on urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization
and thus, contributing to more sustainable, efficient, and liveable cities.
Keywords
Gradient Boosting, Demand Forecasting, Spatial/Temporal Features, Open shared mobility data, E-scooters/E-bikes, Urban micro-mobility
management, Intelligent Transportation Systems
1. Introduction
Urban shared mobility, also known as Mobility-as-a-Service (MaaS) [1], integrates transportation modes such as public transit, micromobility services (e.g., bike- and scooter-sharing), and commute-based models (e.g., carpooling). The need for accurate mobility pattern forecasting is growing rapidly as real-time data analytics in Intelligent Transportation Systems (ITS) help alleviate congestion, reduce travel times, and enhance road safety in increasingly complex urban environments. In shared mobility services, including ride-hailing and bike-sharing, predicting demand across spatial and temporal dimensions is essential for efficient resource allocation, reduced waiting times, and optimized fleet deployment. This insight supports smart city initiatives by informing human-centric urban infrastructure design and facilitating sustainable development through data-driven decisions in areas such as energy consumption, public transit planning, and emergency services.
Several Machine Learning (ML) approaches have been proposed for detecting and forecasting spatio-temporal patterns in timeseries, including Long Short-Term Memory (LSTM) networks [2], Graph Neural Networks (GNNs) [3, 4], and Diffusion-based Models [5, 6]. However, these models often suffer from high computational complexity due to the intricate nature of spatio-temporal data, which combines spatial correlations with temporal dependencies [7].
A common representation for such data is the Spatio-Temporal Graph (STG), where nodes denote spatial locations and edges encode relationships over time. Capturing both spatial and temporal dependencies requires advanced architectures, such as Spatio-Temporal Graph Neural Networks (STGNNs) [8] or Spatio-Temporal Graph Convolutional Networks (STGCNs) [9]. However, these methods impose significant computational demands, limiting their real-world applicability [10]. While optimization techniques such as compression and pruning can reduce model size and inference time, they often come at the cost of decreased predictive accuracy.
This trade-off between complexity, accuracy, and efficiency underscores the need for alternative methodologies. In this work, we propose a novel approach based on gradient-boosted trees, a class of models known for their superior performance on structured tabular data [11]. Our method effectively forecasts micro-mobility demand across spatial and temporal dimensions while maintaining computational efficiency, making it suitable for large-scale deployment.
In our approach, we provide a robust feature extraction pipeline that can capture both spatial and temporal dependencies and integrate them into our model. We also present a gradient boosting ML algorithm capable of predicting the demand in a given area, either in the form of levels (e.g., 'Low', 'Medium', 'High') through a classifier, or in absolute demand values through a regressor. The performance of our model is evaluated using two real-world micro-mobility datasets, and the results are promising, since our approach can effectively adapt to the uniqueness of each area and adequately model its intricacies.
The rest of this paper is structured as follows: Section 2 reviews related work; Section 3 introduces our forecasting methodology, including the feature extraction process and training configuration; Section 4 describes the experimental setup and presents the results of both predictive models, offering detailed performance metrics. Finally, Section 5 concludes the paper with a summary of the key contributions and directions for future research.
Published in the Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference (March 25-28, 2025), Barcelona, Spain
* Corresponding author.
† These authors contributed equally.
Email: atzio@unipi.gr (A. Tziorvas); gstheo@unipi.gr (G. S. Theodoropoulos); ytheod@unipi.gr (Y. Theodoridis)
ORCID: 0009-0005-0037-6264 (A. Tziorvas); 0000-0003-4547-6646 (G. S. Theodoropoulos); 0000-0003-2589-7881 (Y. Theodoridis)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
2. Related Work
A wide variety of modeling approaches have been proposed to tackle the challenges related to spatio-temporal forecasting. In this section, we will identify and review previous approaches in the context of shared mobility demand forecasting by categorizing them in three groups: (traditional or advanced) ML approaches, deep learning models, and
ensemble and hybrid methods, respectively.
2.1. ML Approaches
Traditional machine learning (ML) methods provide a stable analytical foundation for structured predictive problems. Feng et al. [12] developed a predictive model for bike-sharing demand in Chicago using Poisson regression, incorporating time, weather, population density, and activity density as features. Additionally, a Random Forest model was employed to enhance predictive accuracy through ensemble learning, aiding in demand-supply optimization and the identification of potential new station locations. Similarly, Xiao et al. [13] applied clustering techniques to forecast short-term car demand, utilizing an improved k-nearest neighbor (kNN) model for online car-hailing services in Hefei City, China. Their results highlight the effectiveness of clustering methodologies in enhancing demand prediction accuracy.
A notable advancement in bike-share traffic prediction is the two-step pattern recognition model proposed by Sohrabi & Ermagun [14]. This approach first generates traffic profiles, where bike station traffic is represented as timeseries profiles with 𝑡-minute intervals and an overlap length 𝑂. The kNN algorithm is then applied to identify spatiotemporal similarities, incorporating historical traffic data, temporal factors (e.g., weather, weekdays), and spatial characteristics (e.g., socio-demographics, land use, infrastructure). By leveraging weighted Euclidean distance metrics, this method enables accurate short- and long-term traffic forecasting at both station and system-wide levels, demonstrating the efficacy of integrating spatiotemporal features with ML techniques.
Smith et al. [15] employ ML to predict bicycle and pedestrian traffic patterns across 20 U.S. metropolitan statistical areas (MSAs). Building on previous efforts by Le et al. [16], where stepwise linear regression models were developed based on Census-derived, neighborhood-level covariates, this study incorporates traffic counts aggregated from 4,145 locations alongside novel street-level data, including Google Street View imagery and Point of Interest (PoI) data. The authors evaluate several ML regressors, including linear, ridge, lasso, bagging, gradient boosting, and random forest, reporting significant improvements in predictive accuracy over traditional linear regression, particularly when using augmented datasets.
2.2. Deep Learning Models
Ge et al. [17] introduced a Self-Attention ConvLSTM (SA-ConvLSTM) model to enhance the accuracy of ConvLSTM for online car-hailing demand forecasting. By converting car-hailing trajectories into grid-based images, the model incorporates a self-attention module to capture long-range spatiotemporal dependencies, leveraging pairwise similarity scores across input and memory positions.
Luo et al. [18] proposed a Spatial-Temporal Diffusion Convolutional Network (ST-DCN) to address limitations in modeling dynamic spatiotemporal dependencies for taxi demand forecasting. Their approach integrates a two-phase graph diffusion convolutional network with an attention mechanism to model spatial dependencies, while a temporal convolution module captures long-term trends, including recent, daily, and weekly patterns. The use of stacked convolution layers further enhances the model's ability to process extended timeseries sequences.
Li et al. [19] proposed a hierarchical framework for proactive bike redistribution in bike-sharing systems. A bipartite clustering algorithm groups stations, followed by city-wide rental prediction using a Gradient Boosting Regression Tree. Rental proportions and inter-cluster transitions are then estimated via a multi-similarity-based model to predict station-level rentals and returns. Experiments on real-world datasets from New York City and Washington D.C. indicate significant performance gains over baseline models, particularly during periods of atypical demand.
2.3. Ensemble and Hybrid Models
The concept of ensemble learning, which involves combining the predictions of multiple simple models to form a new, more robust model, has been extensively explored in the literature and has been shown to produce better results than its individual constituent models. Building on this idea, Yuming et al. [20] developed a stacking ensemble learning framework that integrates the predictions of three distinct base learners: Random Forest, LightGBM, and Long Short-Term Memory (LSTM) networks. These base learners were trained in parallel using a common set of input features, including temporal, spatial, and weather-related variables. The predictions generated by each base learner were then aggregated by a Support Vector Regression (SVR) meta-learner that produced the final demand forecast values.
2.4. Positioning of Our Work with Respect to State-of-the-Art
Our work diverges from these approaches in multiple ways. Firstly, our methodology employs a unified modeling workflow that can tackle the problem in two alternative ways, by either regression or classification, allowing us to adjust to specific needs. Additionally, our feature engineering pipeline integrates both spatial and temporal features, which are fed to a tuned gradient boosting model, enabling us to better identify the intricacies of urban micro-mobility. Lastly, by utilizing two real-world datasets for evaluation, our method can be adequately validated under different scenarios, ensuring it is effective and robust. Nevertheless, as part of our future research, we plan to align related work with our approach for a fair comparison under the same settings over the same real datasets.
3. Our Approach for Efficient Shared Micro-Mobility Demand Forecasting
In this section, we define the shared mobility demand forecasting problem and present our proposed approach.
3.1. Problem Definition
Generally speaking, a spatio-temporal timeseries is a sequence of data points, where each one is associated with a time index and a spatial location. For the purposes of our work, an area of interest (a city, a region, etc.) is partitioned into a set of geographical districts, where each district 𝐴 is represented as a polygon. We denote $D_t^A$ as the percentile-encoded demand of micro-mobility vehicles within district 𝐴 at time 𝑡. Given 𝜏 past observations for district 𝐴, we construct its spatio-temporal timeseries as $X_t^A = \{D_{t-\tau}^A, D_{t-\tau+1}^A, \ldots, D_{t-1}^A, D_t^A\}$. The problem addressed in this paper is to predict the micro-mobility demand value $D_{t+m}^A$ for district 𝐴, 𝑚 time steps ahead, given $X_t^A$, i.e., the current along with the past 𝜏 observations.
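To make the problem definition concrete, the following minimal Python sketch builds the input windows $X_t^A$ and the corresponding targets $D_{t+m}^A$ from the demand series of a single district. The function name make_supervised and the toy series are illustrative assumptions and are not taken from the paper's released code.

```python
import numpy as np

def make_supervised(demand, tau, m):
    """Turn one district's demand series into (X, y) training pairs.

    Row i of X holds the tau + 1 observations D_{t-tau}, ..., D_t and
    y[i] holds the value D_{t+m} observed m steps after the window ends.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    demand = np.asarray(demand, dtype=float)
    n_samples = len(demand) - tau - m
    if n_samples <= 0:
        raise ValueError("series too short for the requested tau and m")
    # Each window covers tau past observations plus the current one.
    X = np.stack([demand[i : i + tau + 1] for i in range(n_samples)])
    # The target sits m steps after the last element of each window.
    y = demand[tau + m : tau + m + n_samples]
    return X, y

# Toy series 0, 1, ..., 9 for a single hypothetical district.
X, y = make_supervised(np.arange(10), tau=3, m=2)
```

With 𝜏 = 3 and 𝑚 = 2, the first window contains the four values observed up to time 𝑡 = 3 and is paired with the value observed at 𝑡 + 2 = 5.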
We will exemplify the above definition through a real case extracted from our real-world dataset (dataset details are presented in Section 4). As illustrated in Figure 1, the problem at hand is to predict (as accurately as possible) the anticipated demand values of each district's demand timeseries (in orange) given the current and some past demand values (in blue), taking into account relevant (temporal, spatial, context) features. To efficiently achieve this goal, seasonal and other patterns hidden in these timeseries should be uncovered. For instance, looking at the timeseries of two major districts in Rotterdam that are displayed in Figure 1, we identify seasonal commuting patterns: the highest and most spread peak occurs on Friday, due to a combination of leisure and work time. This is made more evident in Figure 2, which showcases the combination of the hour-of-day and day-of-week features in the Rotterdam Centrum district.
Figure 1: Two indicative timeseries for two regions of Rotterdam: Delfshaven (top) and Rotterdam Centrum (bottom). The x-axis denotes time, while the y-axis indicates the measured demand values.
Figure 2: Average hourly demand across the week in Rotterdam Centrum (x-axis: time of the week; y-axis: demand, i.e., number of shared vehicles). The coloring indicates the relative demand intensity and, along with the peaks, aids in highlighting daily and weekly usage patterns.
3.2. Methodology
We propose the so-called Gradient Boosting Demand Predictor, in two variations, differentiated by their output type:
• An 𝑁-class classifier (per horizon per district)
• A regressor (per horizon per district)
The purpose of the classifier is to classify the future demand of a district into predefined levels (e.g., 'Low', 'Medium', 'High'), whereas the regressor aims to predict the actual demand values for the respective horizon. In other words, for an area of interest partitioned into 𝑀 districts with the goal of 𝑇 individual prediction horizons, both architectures require training 𝑇 × 𝑀 models, where each model predicts the output for a specific district-horizon pair. The reasoning behind training individual models per district is to allow each model to specialize in its respective spatial region, focusing exclusively on its unique spatial and temporal patterns. Nevertheless, in our future work we plan to assess the efficiency of a single global model for all districts (where the district ID will be given as input to the model).
As illustrated in Figure 3, the initial step of our methodology is the partitioning of the area of interest into districts, modeled as an adjacency graph that allows us to extract the neighbors of each district. Two districts are defined as neighbors if they share a common boundary (footnote 1). The demands of the target as well as of the neighboring districts are represented by the respective spatiotemporal timeseries (step 2), as defined in Section 3.1, which are fed into a feature extraction pipeline (step 3), to be detailed in Section 3.3. Finally, all these features are combined into a single, augmented dataset (step 4) to be used for the purposes of training our two models, the classifier and the regressor (step 5).
Footnote 1: The inclusion of neighboring districts is justified by the limited spatial range and temporal duration of trips usually made by e-bikes and e-scooters. Due to their physical constraints, micro-mobility vehicles are unlikely to reach distant districts within short timeframes, which makes considering only the neighboring districts a sufficient choice for effective spatio-temporal modeling.
3.3. Feature Extraction
Given a timeseries $X_t^A$ of a target district 𝐴, feature extraction is performed to enhance the spatio-temporal modeling capabilities of the proposed method. The extracted features are categorized into four primary groups: time-related, lagged, rolling, and exponentially weighted features. These features are derived not only from the target district but also from its neighboring districts, incorporating both spatial and temporal dependencies.
Time-related features capture temporal patterns influencing the target variable. Key attributes include the hour of the day (0-23) to identify daily patterns like peak periods, and the day of the week (0-6, with 0 as Monday) to capture weekly trends. Monthly variations are represented by the month (0-11, with 0 as January), accounting for seasonal behaviors. Additionally, the minute component (0-59) highlights finer time dependencies and intra-hour variations.
The second category of features, referred to as lagged features, is designed to capture the temporal dependencies between past observations of the timeseries and its current state. For each district, a set of lagged features is generated by systematically shifting the original timeseries observations backward in time. Formally, given a timeseries $X_t^A$ representing the observed values of a target district 𝐴 at time 𝑡, the lagged feature at a temporal offset 𝜏 is defined as $X_{t-\tau}^A$. A range of temporal offsets is selected to ensure that dependencies at several time scales are captured. Specifically, the lagged features are computed at intervals of 1, 5, 10, and 15 minutes, capturing short-term temporal dependencies across multiple time horizons.
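As a rough illustration of the four feature groups named in Section 3.3 (time-related, lagged, rolling, and exponentially weighted), the sketch below derives them with pandas from a minute-indexed frame that holds one demand column per district. The function name, the column names, the synthetic data, and the use of pandas are assumptions made for this example; the authors' actual pipeline may differ.

```python
import numpy as np
import pandas as pd

def extract_features(df, offsets=(1, 5, 10, 15), windows=(5, 10, 15)):
    """Derive time-related, lagged, rolling and EWMA features.

    df is assumed to have a one-minute DatetimeIndex and one demand column
    per district (here 'target' and 'neighbor_a', both hypothetical names).
    """
    out = pd.DataFrame(index=df.index)
    # Time-related features; the month is shifted so that 0 = January.
    out["hour_of_day"] = df.index.hour
    out["minute_of_hour"] = df.index.minute
    out["day_of_week"] = df.index.dayofweek  # 0 = Monday
    out["month_of_year"] = df.index.month - 1
    for col in df.columns:
        out[col] = df[col]
        for k in offsets:
            # Value observed k steps earlier.
            out[f"{col}_lag_{k}"] = df[col].shift(k)
        for w in windows:
            out[f"{col}_rolling_mean_{w}"] = df[col].rolling(w).mean()
            # adjust=False gives the recursive form with alpha = 2 / (w + 1).
            out[f"{col}_ewm_{w}"] = df[col].ewm(span=w, adjust=False).mean()
    # Rows without a full history (leading NaNs) are dropped.
    return out.dropna()

# Synthetic demand for one target district and one neighbor (illustrative only).
idx = pd.date_range("2024-11-11 08:00", periods=60, freq="min")
rng = np.random.default_rng(0)
demand = pd.DataFrame(
    {"target": rng.poisson(50, 60), "neighbor_a": rng.poisson(30, 60)},
    index=idx,
)
features = extract_features(demand)
```

With one-minute sampling, a shift of k rows corresponds to a lag of k minutes, and the first 15 rows are discarded because the longest lag has no value for them.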
Figure 3: Illustration of our proposed Gradient Boosting Density Predictor methodology
This shifting operation effectively creates additional feature representations of the past states of the system, allowing the model to learn historical patterns that influence future predictions. By incorporating lagged features into the predictive framework, the approach enhances its ability to capture recurrent patterns, trends, and autocorrelations inherent in spatio-temporal mobility data.
The rolling features provide a smoothed representation of the data over various intervals, capturing localized fluctuations that are critical for accurate forecasting. These features are computed using sliding windows, where statistical measures are aggregated over a sequence of recent time steps to encapsulate short-term trends and variability in mobility patterns. For each district, a rolling window of varying sizes (ranging from 5 to 15 minutes in 5-minute increments) is applied to the timeseries data. Within each window, the mean is computed to capture central tendency and, over larger windows, the degree of short-term variability.
Finally, the objective of the exponentially weighted features is to prioritize recent observations while retaining information from historical data. This is achieved through the application of exponentially weighted techniques, which assign progressively smaller weights to older data points, ensuring that the model places greater emphasis on more recent trends. Among these methods, the Exponential Weighted Moving Average (EWMA) is particularly effective, as it assigns exponentially decreasing weights to past observations, thereby capturing temporal dependencies in a manner that balances adaptability and historical context. For a given timeseries $X_t$, the EWMA at time 𝑡 is computed recursively as $EWMA_t = \alpha X_t + (1 - \alpha) EWMA_{t-1}$, where 𝛼 represents the smoothing factor, typically defined as $\alpha = \frac{2}{N+1}$ for a given window size 𝑁. This formulation allows the predictive framework to swiftly adapt to evolving patterns, including sudden variations induced by external factors such as weather conditions, road incidents, or public events, while maintaining a stable representation of long-term trends.
The features used in our methodology are summarized in Table 1.
Table 1: Summary of the extracted features used in our approach
Category | Feature Name | Description
Base | target | Target variable for the current district at the present time step
Base | neighbor_* | Corresponding variable for each of the target's neighboring districts
Time-Related | hour-of-day | Hour of the day of the timestamp (0-23)
Time-Related | minute-of-hour | Minute of the hour of the timestamp (0-59)
Time-Related | day-of-week | Day of the week of the timestamp (0=Monday, 6=Sunday)
Time-Related | month-of-year | Month of the year of the timestamp (0=January, 11=December)
Lagged | neighbor_lag_{1, 5, 10, 15} | Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for neighboring districts
Lagged | target_lag_{1, 5, 10, 15} | Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for the current district
Rolling | neighbor_rolling_mean_{5, 10, 15} | Rolling mean of neighbors over the last {5, 10, 15} observations
Rolling | target_rolling_mean_{5, 10, 15} | Rolling mean of the target variable over the last {5, 10, 15} observations
EWMA | neighbor_ewm_{5, 10, 15} | Exponentially weighted moving average of neighbors with a span of {5, 10, 15}
EWMA | target_ewm_{5, 10, 15} | Exponentially weighted moving average of the target variable with a span of {5, 10, 15}
4. Experimental Study
This section describes the data and preprocessing steps used in our experimental study (Section 4.1) as well as our findings (Section 4.2). The experiments were conducted on a server with an AMD Epyc 64-Core CPU and an Nvidia A100 GPU with 40GB of memory. Our source code is publicly available for reproducibility purposes (https://github.com/DataStories-UniPi/Shared-Mobility.git).
4.1. Experimental Setup
In our study, we utilize shared micro-mobility data provided by Deelfiets Nederland, a popular micro-mobility service provider in the Netherlands, through an open API (https://api.deelfietsdashboard.nl/dashboard-api/public/vehicles_in_public_space). The data consists of real-time GPS positions for vehicles in the Netherlands, which we query every 60 seconds, creating a dataset of sufficient granularity for our use case. The dataset includes the latitude, longitude, vehicle type and company name for each vehicle. For the purposes of this study, we utilize mobility data from two major metropolitan areas, Amsterdam and Rotterdam, focusing specifically on the ten most densely populated districts in each city. These districts exhibit diverse urban characteristics and distinct spatio-temporal patterns, making them well-suited for evaluating the model's adaptability to localized mobility trends. The
temporal coverage of our data is about 2 months (November 11, 2024 - January 15, 2025), and 5 months (August 06, 2024 - January 15, 2025), respectively. Since the original data source is a stream instead of a static dataset, we manually append the temporal indicator in each subsequent request.
To derive demand estimates at the district level, a spatial filtering and aggregation procedure is applied. First, a point-polygon intersection is performed to associate each mobility trace with its corresponding district boundaries, discarding traces that fall outside the predefined study areas. Next, the filtered traces are aggregated based on their unique spatio-temporal identifiers (i.e., timestamp and district ID), summing the occurrences within each region to obtain demand estimates. This transformation preserves the spatial and temporal integrity of the data while ensuring compatibility with the forecasting models. In its final form, the dataset consists of 89,022 observations across 97 districts in Amsterdam and 218,325 observations across 22 districts in Rotterdam.
To validate our models, we use a 70-30 temporal train-test split. To ensure effective training of the classifier, additional preprocessing steps were taken apart from the feature extraction process described above. To this end, a quantile-based discretization function was employed to transform continuous density values into 3 discrete demand levels (Low, Medium, High, as they will be defined in the following paragraph). This transformation enhances the classifier's ability to model class distinctions and mitigates the challenges associated with imbalanced class distributions.
We define 'Medium' demand as the range where demand levels are centered around 50% of the value of the day with the highest demand, which serves as the baseline value. Specifically, for a given threshold 𝑑 ∈ (0, 50), the range [50 − 𝑑, 50 + 𝑑) is categorized as 'Medium' (normal) demand. Demand levels below (above) this range are classified as 'Low' ('High', respectively) demand. In our experiments, we set 𝑑 = 17, so that the three demand levels, i.e., [0, 33), [33, 67), and [67, 100], correspond to approximately equally sized ranges. This threshold provides a balanced distribution of data across the defined demand categories, facilitating a robust analysis of varying demand patterns.
The optimization framework employed in this study incorporates Bayesian Optimization due to its ability to balance exploration and exploitation effectively. This approach facilitates the exploration of the search space for better solutions while simultaneously exploiting regions with high potential, thereby reducing the number of iterations required for optimization. Its suitability is particularly evident when dealing with extensive experimental evaluations, such as the computationally intensive training of ML models, as it leverages prior knowledge to guide sampling and optimize resources efficiently. Additionally, the probabilistic framework of Bayesian Optimization enables the automatic adjustment of the exploration-exploitation trade-off through its acquisition functions.
4.2. Experimental Results
In this section, we assess the overall performance of our models across multiple districts and timestamps. Table 2 summarizes the training and inference times of our models across different prediction horizons in the two cities of interest. Training time corresponds to the overall time required to train the respective model, whereas inference time corresponds to the average time taken to generate a specific prediction for a given time horizon. Training of our models is fast, taking about 1-2 seconds or even less, while the inference time is on the order of 1 microsecond. This analysis showcases the computational efficiency of the models, providing insights into their scalability and suitability for real-world spatio-temporal forecasting tasks.
Table 2: Training and inference times of the two models across different horizons.
City | Model | Prediction Horizon (min) | Training Time (sec) | Inference Time (µsec)
Rotterdam | Regressor | 5 | 1.17 ± 0.43 | 0.99 ± 0.15
Rotterdam | Regressor | 15 | 0.35 ± 1.04 | 1.07 ± 0.30
Rotterdam | Regressor | 30 | 0.22 ± 0.97 | 0.97 ± 0.09
Rotterdam | Regressor | 60 | 0.47 ± 1.10 | 1.10 ± 0.24
Rotterdam | Classifier | 5 | 2.21 ± 1.26 | 1.11 ± 0.17
Rotterdam | Classifier | 15 | 1.10 ± 1.15 | 1.15 ± 0.21
Rotterdam | Classifier | 30 | 1.00 ± 1.10 | 1.10 ± 0.18
Rotterdam | Classifier | 60 | 1.07 ± 1.19 | 1.19 ± 0.16
Amsterdam | Regressor | 5 | 0.68 ± 0.22 | 0.95 ± 0.12
Amsterdam | Regressor | 15 | 1.21 ± 0.95 | 0.95 ± 0.18
Amsterdam | Regressor | 30 | 0.28 ± 0.94 | 0.94 ± 0.13
Amsterdam | Regressor | 60 | 0.29 ± 0.92 | 0.92 ± 0.08
Amsterdam | Classifier | 5 | 1.40 ± 0.69 | 1.01 ± 0.13
Amsterdam | Classifier | 15 | 0.81 ± 1.10 | 1.10 ± 0.26
Amsterdam | Classifier | 30 | 1.22 ± 1.07 | 1.07 ± 0.28
Amsterdam | Classifier | 60 | 1.07 ± 1.02 | 1.02 ± 0.21
Tables 3-6 present the predictive performance of the proposed spatio-temporal forecasting models evaluated at four prediction horizons: 5, 15, 30, and 60 minutes. The quality metrics we used include $R^2$, RMSE, MAE and sMAPE for the regressor and F1-score, Accuracy, Recall and Precision for the classifier. These metrics collectively assess the model's predictive accuracy, robustness, and relative performance across districts with varying population sizes and temporal dynamics.
Aggregated metrics are added to summarize the overall performance across all districts, offering insights into the model's general effectiveness and consistency across varying prediction horizons. Both architectures perform well at both short-term (5-15 min.) and long-term (30-60 min.) forecasting. From the summary statistics we can deduce that both models consistently perform well for the vast majority of the districts. Indicatively, the regressor's $R^2$ ranges on average from 0.94 down to 0.84 as we move from lower to higher prediction horizons. As for the classifier, the F1-score ranges on average from 0.89 to 0.80, respectively.
4.3. A Note on Feature Importance
In order to assess the effect of the underlying features on the prediction quality of either the classifier or the regressor, we present the average feature importance across all prediction horizons for both models. The feature importance values were computed using XGBoost's internal feature importance computation algorithm. Given the minimal contributions of some features (importance values below $10^{-4}$), the y-axis of the corresponding plots has been log-scaled to improve visualization. The results of this analysis are grouped by city and illustrated in Figures 4 and 5.
In both figures we can see that the average contribution of the features related to the target is higher than that of the neighbors, as expected. Overall, in decreasing order, the most important features appear to be the target_ewm_*, the target_rolling_mean_*, and the target_lag_*, denoting the EWMA, the rolling mean, and the lagged value, respectively,
Table 3
Predictive performance of the regressor in Rotterdam across different prediction horizons.
5 min 15 min 30 min 60 min
District Avg Pop
R2 RMSE MAE sMAPE R2 RMSE MAE sMAPE R2 RMSE MAE sMAPE R2 RMSE MAE sMAPE
Kralingen-Crooswijk 507 0.97 9.70 6.94 0.01 0.96 11.00 7.86 0.02 0.95 13.17 9.81 0.02 0.90 18.11 13.45 0.03
Rotterdam Centrum 502 0.99 15.25 11.18 0.02 0.98 20.33 14.77 0.02 0.97 22.95 17.37 0.03 0.93 34.66 26.62 0.04
Hillegersberg-Schiebroek 476 0.80 17.36 12.67 0.02 0.76 18.82 14.06 0.03 0.73 20.03 15.20 0.03 0.68 21.79 16.78 0.03
Delfshaven 347 0.98 6.43 4.70 0.01 0.97 8.03 5.75 0.02 0.96 10.13 7.27 0.02 0.92 13.78 9.97 0.03
Feijenoord 347 0.97 6.31 4.61 0.01 0.92 10.09 7.72 0.02 0.94 9.29 7.00 0.02 0.92 10.31 7.63 0.02
Noord 307 0.88 15.69 10.11 0.03 0.85 17.49 11.61 0.03 0.81 19.76 13.61 0.04 0.73 23.96 16.87 0.04
Prins Alexander 303 0.99 6.04 4.37 0.01 0.99 5.91 4.26 0.01 0.98 6.76 4.92 0.01 0.98 7.99 5.87 0.02
Charlois 242 0.97 5.24 3.84 0.02 0.97 5.93 4.42 0.02 0.96 6.89 5.20 0.02 0.94 7.84 5.92 0.03
IJsselmonde 184 0.94 5.25 3.25 0.02 0.90 6.58 4.18 0.03 0.89 7.11 4.50 0.03 0.84 8.37 5.57 0.04
Spaanse Polder 36 0.90 2.18 1.36 0.03 0.69 3.88 2.18 0.05 0.87 2.53 1.76 0.04 0.54 4.75 3.43 0.07
0.94 8.94 6.30 0.02 0.90 10.80 7.68 0.02 0.90 11.86 8.66 0.03 0.84 15.16 11.21 0.03
Average (𝜇 ± 𝜎) ± 0.06 ± 5.29 ± 3.77 ± 0.01 ± 0.10 ± 5.98 ± 4.41 ± 0.01 ± 0.08 ± 6.86 ± 5.17 ± 0.01 ± 0.14 ± 9.40 ± 7.22 ± 0.02
Table 4
Predictive performance of the classifier in Rotterdam across different prediction horizons.
                                  |          5 min          |         15 min          |         30 min          |         60 min
District                 Avg. Pop |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec.
Kralingen-Crooswijk           507 |  0.92  0.92  0.92  0.92 |  0.90  0.90  0.90  0.90 |  0.89  0.88  0.88  0.88 |  0.85  0.85  0.85  0.85
Rotterdam Centrum             502 |  0.91  0.99  0.96  0.87 |  0.87  0.98  0.94  0.80 |  0.83  0.98  0.92  0.78 |  0.75  0.97  0.84  0.70
Hillegersberg-Schiebroek      476 |  0.75  0.95  0.82  0.71 |  0.73  0.94  0.79  0.70 |  0.71  0.94  0.77  0.67 |  0.67  0.93  0.76  0.63
Delfshaven                    347 |  0.91  0.96  0.91  0.91 |  0.89  0.95  0.90  0.89 |  0.86  0.93  0.86  0.87 |  0.80  0.90  0.81  0.80
Feijenoord                    347 |  0.88  0.95  0.90  0.87 |  0.87  0.94  0.89  0.85 |  0.84  0.93  0.88  0.81 |  0.74  0.90  0.82  0.71
Noord                         307 |  0.82  0.99  0.85  0.80 |  0.71  0.98  0.74  0.69 |  0.55  0.91  0.55  0.63 |  0.59  0.97  0.62  0.58
Prins Alexander               303 |  0.97  0.98  0.97  0.97 |  0.97  0.98  0.97  0.97 |  0.96  0.98  0.96  0.96 |  0.96  0.97  0.96  0.95
Charlois                      242 |  0.91  0.91  0.92  0.91 |  0.90  0.90  0.90  0.90 |  0.89  0.89  0.89  0.89 |  0.87  0.87  0.87  0.87
IJsselmonde                   184 |  0.92  0.97  0.92  0.92 |  0.91  0.97  0.91  0.91 |  0.90  0.97  0.90  0.91 |  0.89  0.97  0.89  0.90
Spaanse Polder                 36 |  0.95  0.99  0.98  0.92 |  0.90  0.98  0.94  0.87 |  0.93  0.98  0.96  0.90 |  0.91  0.98  0.95  0.88
Average (μ)                       |  0.89  0.96  0.91  0.88 |  0.87  0.95  0.89  0.85 |  0.84  0.94  0.86  0.83 |  0.80  0.93  0.84  0.79
Std. dev. (σ)                     |  0.06  0.03  0.05  0.07 |  0.08  0.03  0.07  0.09 |  0.12  0.04  0.12  0.11 |  0.11  0.05  0.10  0.12
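The target-derived features named in this section (target_lag_*, target_rolling_mean_*, target_ewm_*) and the hour-of-day and day-of-week features could be built along the following lines. This is a sketch under stated assumptions, not the authors' pipeline: the function name `add_target_features`, the column names `hour_of_day` and `day_of_week`, the reading of the `*` suffix as a lag or window size, the default sizes, and the one-step shift applied before the rolling and exponentially weighted means are all illustrative choices.

```python
import pandas as pd


def add_target_features(df, target="target", lags=(1, 2, 3),
                        windows=(3, 6), spans=(3, 6)):
    """Append lag, rolling-mean, EWMA and calendar features for one district.

    `df` is assumed to have a DatetimeIndex with a fixed sampling interval.
    The input frame is left unchanged; a new frame is returned.
    """
    out = df.copy()

    # Assumption: statistics are computed on values strictly before the
    # current step, so the current observation never leaks into them.
    past = out[target].shift(1)

    for k in lags:
        out[f"target_lag_{k}"] = out[target].shift(k)
    for w in windows:
        out[f"target_rolling_mean_{w}"] = past.rolling(window=w).mean()
    for s in spans:
        out[f"target_ewm_{s}"] = past.ewm(span=s, adjust=False).mean()

    # Calendar features capturing daily and weekly seasonality.
    out["hour_of_day"] = out.index.hour
    out["day_of_week"] = out.index.dayofweek  # Monday = 0
    return out
```

The first rows of each lag and rolling column are NaN until enough history is available. Gradient boosting libraries generally accept missing values, but dropping the warm-up rows before training is an equally reasonable choice.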
of the target variable. Focusing on the time-related features only, hour-of-day and day-of-week show the relatively highest importance in both cases, underlining the presence of seasonal patterns.

Table 5
Predictive performance of the regressor in Amsterdam across different prediction horizons.
                                  |          5 min          |         15 min          |         30 min          |         60 min
District                 Avg. Pop |    R²  RMSE   MAE sMAPE |    R²  RMSE   MAE sMAPE |    R²  RMSE   MAE sMAPE |    R²  RMSE   MAE sMAPE
Oostelijk Havengebied         110 |  0.97  2.34  1.83  0.02 |  0.96  2.75  2.16  0.02 |  0.95  3.24  2.53  0.02 |  0.92  4.16  3.24  0.03
Buitenveldert-West             87 |  0.97  2.12  1.57  0.02 |  0.95  2.66  1.96  0.02 |  0.93  3.28  2.40  0.02 |  0.87  4.37  3.17  0.03
Burgwallen-Nieuwe Zijde        72 |  0.98  1.91  1.41  0.02 |  0.96  2.55  1.91  0.03 |  0.93  3.38  2.58  0.03 |  0.89  4.35  3.35  0.04
Jordaan                        71 |  0.98  2.31  1.78  0.02 |  0.94  4.04  2.75  0.03 |  0.91  4.67  3.42  0.04 |  0.85  6.14  4.66  0.05
Scheldebuurt                   71 |  0.98  1.56  1.19  0.01 |  0.96  2.06  1.57  0.02 |  0.94  2.66  2.02  0.02 |  0.89  3.50  2.64  0.03
Middenmeer                     66 |  0.96  1.94  1.46  0.02 |  0.94  2.28  1.73  0.02 |  0.90  2.89  2.20  0.03 |  0.86  3.55  2.71  0.03
IJburg West                    63 |  0.93  1.62  1.28  0.02 |  0.91  1.83  1.45  0.02 |  0.87  2.15  1.70  0.02 |  0.79  2.76  2.20  0.03
Westelijk Havengebied          63 |  0.88  3.72  1.60  0.03 |  0.85  4.05  1.75  0.03 |  0.83  4.28  1.87  0.04 |  0.78  4.74  2.20  0.04
Landlust                       59 |  0.99  1.42  1.07  0.02 |  0.97  1.96  1.51  0.03 |  0.96  2.60  2.01  0.04 |  0.92  3.49  2.70  0.05
Nieuwmarkt/Lastage             58 |  0.92  2.82  1.94  0.02 |  0.88  3.34  2.37  0.03 |  0.76  4.71  3.35  0.04 |  0.70  5.29  3.88  0.05
Average (μ)                       |  0.95  2.18  1.51  0.02 |  0.93  2.75  1.92  0.02 |  0.90  3.38  2.41  0.03 |  0.85  4.24  3.08  0.04
Std. dev. (σ)                     |  0.04  0.69  0.28  0.01 |  0.04  0.81  0.41  0.01 |  0.06  0.89  0.59  0.01 |  0.07  0.99  0.77  0.01

Table 6
Predictive performance of the classifier in Amsterdam across different prediction horizons.
                                  |          5 min          |         15 min          |         30 min          |         60 min
District                 Avg. Pop |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec. |    F1  Acc. Prec.  Rec.
Oostelijk Havengebied         110 |  0.90  0.92  0.93  0.89 |  0.89  0.90  0.89  0.88 |  0.86  0.88  0.87  0.85 |  0.76  0.78  0.75  0.79
Buitenveldert-West             87 |  0.92  0.93  0.94  0.91 |  0.90  0.91  0.92  0.89 |  0.88  0.89  0.89  0.86 |  0.83  0.86  0.85  0.82
Burgwallen-Nieuwe Zijde        72 |  0.94  0.94  0.95  0.94 |  0.92  0.92  0.92  0.91 |  0.89  0.89  0.89  0.89 |  0.85  0.84  0.85  0.84
Jordaan                        71 |  0.91  0.94  0.94  0.89 |  0.87  0.92  0.89  0.85 |  0.84  0.90  0.87  0.81 |  0.82  0.89  0.82  0.82
Scheldebuurt                   71 |  0.91  0.93  0.93  0.90 |  0.88  0.90  0.90  0.87 |  0.85  0.88  0.88  0.83 |  0.79  0.83  0.83  0.77
Middenmeer                     66 |  0.85  0.94  0.90  0.81 |  0.79  0.91  0.81  0.77 |  0.73  0.90  0.80  0.70 |  0.68  0.83  0.69  0.69
IJburg West                    63 |  0.73  0.96  0.81  0.68 |  0.63  0.95  0.66  0.61 |  0.57  0.79  0.54  0.73 |  0.61  0.92  0.60  0.62
Westelijk Havengebied          63 |  0.91  0.89  0.94  0.90 |  0.89  0.87  0.93  0.88 |  0.90  0.89  0.92  0.90 |  0.87  0.85  0.89  0.86
Landlust                       59 |  0.94  0.94  0.93  0.94 |  0.93  0.93  0.92  0.93 |  0.92  0.92  0.91  0.92 |  0.87  0.87  0.87  0.87
Nieuwmarkt/Lastage             58 |  0.79  0.94  0.88  0.73 |  0.69  0.92  0.78  0.65 |  0.62  0.91  0.65  0.60 |  0.55  0.88  0.57  0.53
Average (μ)                       |  0.88  0.93  0.91  0.86 |  0.84  0.91  0.86  0.83 |  0.81  0.88  0.82  0.81 |  0.76  0.86  0.77  0.76
Std. dev. (σ)                     |  0.07  0.02  0.04  0.09 |  0.10  0.02  0.09  0.11 |  0.12  0.03  0.13  0.10 |  0.11  0.04  0.11  0.11

Figure 4: Average feature importance across all 4 prediction horizons of the regressor in Rotterdam (left) and Amsterdam (right).
Figure 5: Average feature importance across all 4 prediction horizons of the classifier in Rotterdam (left) and Amsterdam (right).

5. Conclusions
This study investigated spatio-temporal forecasting methods for density prediction in urban environments. By integrating spatial dependencies with temporal trends, the proposed methodology effectively captured localized patterns, generating accurate forecasts across multiple horizons. Key contributions include a flexible feature engineering pipeline that incorporates both intra- and inter-district interactions, alongside the application of efficient gradient boosting architectures. This approach enhances predictive accuracy while minimizing computational overhead, rendering it suitable for large-scale forecasting tasks. Experimental results demonstrated the efficacy of gradient-boosted density predictors for both regression and classification, exhibiting competitive performance across varying prediction horizons. Accuracy declines as the horizon grows (the average R² drops from 0.94 at 5 minutes to 0.84 at 60 minutes in Rotterdam and from 0.95 to 0.85 in Amsterdam, while the average F1 drops from 0.89 to 0.80 and from 0.88 to 0.76, respectively), yet the model remains robust, underscoring its adaptability and practical applicability in real-world mobility forecasting.
This research also unveils opportunities for future exploration. Specifically, the integration of feature interactions and their contributions to the overall model quality warrant further investigation. Incorporating external factors, such as weather conditions or public events, could potentially enhance predictive performance. Additionally, applying the proposed methodology to other geographical areas and diverse mobility scenarios would help validate its generalization capabilities and adaptability to varying contexts. Finally, a comprehensive experimental comparison with related work under specific settings is planned to facilitate fair benchmarking.
In conclusion, this study showcases the ability of data-driven approaches to effectively tackle spatio-temporal forecasting challenges. By leveraging the inherent spatial segmentation of cities into districts, the methodology enables the extraction of localized temporal patterns, facilitating more informed decision-making and contributing to smarter, more efficient urban planning from the perspective of mobility data science [21].

Acknowledgments
This work was supported in part by the Horizon Framework Programme of the European Union under grant agreement No. 101093051 (EMERALDS; https://www.emeralds-horizon.eu/).

Declaration on Generative AI
During the preparation of this work, the authors used Large Language Models in order to paraphrase and reword. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

References
[1] M. N. Mladenović, Mobility as a Service, in: R. Vickerman (Ed.), International Encyclopedia of Transportation, Elsevier, 2021, pp. 12–18. doi:10.1016/B978-0-08-102671-7.10607-4.
[2] F. O’Donncha, Y. Hu, et al., A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales, Ecological Informatics 69 (2022) 101687. doi:10.1016/j.ecoinf.2022.101687.
[3] B. Yu, H. Yin, et al., Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3634–3640. doi:10.24963/ijcai.2018/505.
[4] C. Zheng, X. Fan, et al., GMAN: A Graph Multi-Attention Network for Traffic Prediction, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 1234–1241. doi:10.1609/aaai.v34i01.5477.
[5] Y. Yang, M. Jin, et al., A Survey on Diffusion Models for Time Series and Spatio-Temporal Data, arXiv:2404.18886 (2024). doi:10.48550/arXiv.2404.18886.
[6] Y. Li, R. Yu, et al., Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting, arXiv:1707.01926 (2017). doi:10.48550/arXiv.1707.01926.
[7] N. Pelekis, Y. Theodoridis, Mobility Data Management and Exploration, Springer, New York, NY, 2014. doi:10.1007/978-1-4939-0392-4.
[8] W. Liao, B. Zeng, et al., Taxi demand forecasting based on the temporal multimodal information fusion graph neural network, Applied Intelligence 52 (2022) 12077–12090. doi:10.1007/s10489-021-03128-1.
[9] X. Geng, Y. Li, et al., Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) 3656–3663. doi:10.1609/aaai.v33i01.33013656.
[10] Z. Wu, S. Pan, et al., A Comprehensive Survey on Graph Neural Networks, IEEE Transactions on Neural Networks and Learning Systems 32 (2019) 4–24. doi:10.1109/TNNLS.2020.2978386.
[11] L. Grinsztajn, E. Oyallon, et al., Why do tree-based models still outperform deep learning on typical tabular data?, in: Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22), 2022, pp. 507–520. doi:10.48550/arXiv.2207.08815.
[12] H. Feng, Predicting the Dynamic Demand of Bike-Sharing System in Chicago with Divvy Operation Data: A Data-Driven approach for bike-sharing demand forecasting, in: Proceedings of the 5th International Conference on E-Commerce, E-Business and E-Government, ICEEG ’21, 2021, pp. 30–34. doi:10.1145/3466029.3466035.
[13] Y. Xiao, W. Kong, et al., Short-Term Demand Forecasting of Urban Online Car-Hailing Based on the K-Nearest Neighbor Model, Sensors 22 (2022). doi:10.3390/s22239456.
[14] S. Sohrabi, A. Ermagun, Dynamic bike sharing traffic prediction using spatiotemporal pattern detection, Transportation Research Part D: Transport and Environment 90 (2021) 102647. doi:10.1016/j.trd.2020.102647.
[15] S. Hankey, W. Zhang, et al., Predicting bicycling and walking traffic using street view imagery and destination data, Transportation Research Part D: Transport and Environment 90 (2021) 102651. doi:10.1016/j.trd.2020.102651.
[16] H. T. K. Le, R. Buehler, et al., Correlates of the Built Environment and Active Travel: Evidence from 20 US Metropolitan Areas, Environmental Health Perspectives 126 (2018). doi:10.1289/EHP3389.
[17] H. Ge, S. Li, et al., Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand, Sustainability 14 (2022). doi:10.3390/su14127371.
[18] A. Luo, B. Shangguan, et al., Spatial-Temporal Diffusion Convolutional Network: A Novel Framework for Taxi Demand Forecasting, ISPRS Int. J. Geo Inf. 11 (2022) 193. doi:10.3390/ijgi11030193.
[19] Y. Li, Y. Zheng, et al., Traffic prediction in a bike-sharing system, in: Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’15, 2015. doi:10.1145/2820783.2820837.
[20] Y. Jin, X. Ye, et al., Demand Forecasting of Online Car-Hailing With Stacking Ensemble Learning Approach and Large-Scale Datasets, IEEE Access 8 (2020) 199513–199522. doi:10.1109/ACCESS.2020.3034355.
[21] M. Mokbel, M. Sakr, et al., Mobility Data Science: Perspectives and Challenges, ACM Trans. Spatial Algorithms Syst. 10 (2024). doi:10.1145/3652158.