=Paper=
{{Paper
|id=Vol-3946/BMDA_paper4
|storemode=property
|title=Shared Micro-mobility Demand Forecasting using Gradient Boosting methods
|pdfUrl=https://ceur-ws.org/Vol-3946/BMDA-4.pdf
|volume=Vol-3946
|authors=Antonios Tziorvas,George S. Theodoropoulos,Yannis Theodoridis
}}
==Shared Micro-mobility Demand Forecasting using Gradient Boosting methods==
<pdf width="1500px">https://ceur-ws.org/Vol-3946/BMDA-4.pdf</pdf>
<pre>
                         Shared Micro-mobility Demand Forecasting using Gradient
                         Boosting methods
                         Antonios Tziorvas1,*, George S. Theodoropoulos1,† and Yannis Theodoridis1,†
                         1 Department of Informatics, University of Piraeus, Piraeus, Greece


                                            Abstract
                                            Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent
                                            Transportation Systems. By leveraging data fusion and analytics techniques, traffic density estimation serves as a key intermediate
                                            measure for identifying and predicting emerging demand patterns. In this paper, we propose two gradient boosting model variations,
                                            one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes
                                            up to one hour. Our approach effectively integrates spatial and temporal features, enabling accurate predictions that are essential for
                                            improving the efficiency of shared (micro-)mobility services. To evaluate the effectiveness of our approach, we utilize open shared
                                            mobility data derived from e-scooters and e-bikes networks in two Dutch metropolitan areas. These real-world datasets enable us
                                            to validate our approach and demonstrate its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our
                                            methodology offers novel insights on urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization
                                            and thus, contributing to more sustainable, efficient, and liveable cities.

                                            Keywords
                                            Gradient Boosting, Demand Forecasting, Spatial/Temporal Features, Open shared mobility data, E-scooters/E-bikes, Urban micro-mobility
                                            management, Intelligent Transportation Systems


                         1. Introduction                                                                                               nificant computational demands, limiting their real-world
                                                                                                                                       applicability [10]. While optimization techniques such as
                         Urban shared mobility, also known as Mobility-as-a-Service                                                    compression and pruning can reduce model size and infer-
                         (MaaS) [1], integrates transportation modes, such as pub-                                                     ence time, they often come at the cost of decreased predictive
                         lic transit, micromobility services (e.g., bike- and scooter-                                                 accuracy.
                         sharing), and commute-based models (e.g., carpooling). The                                                       This trade-off between complexity, accuracy, and effi-
                         need for accurate mobility pattern forecasting is growing                                                     ciency underscores the need for alternative methodologies.
                         rapidly as real-time data analytics in Intelligent Transporta-                                                In this work, we propose a novel approach based on gradient-
                         tion Systems (ITS) help alleviate congestion, reduce travel                                                   boosted trees, a class of models known for their superior
                         times, and enhance road safety in increasingly complex ur-                                                    performance on structured tabular data [11]. Our method
                         ban environments. In shared mobility services, including                                                      effectively forecasts micro-mobility demand across spatial
                         ride-hailing and bike-sharing, predicting demand across                                                       and temporal dimensions while maintaining computational
                         spatial and temporal dimensions is essential for efficient re-                                                efficiency, making it suitable for large-scale deployment.
                         source allocation, reduced waiting times, and optimized fleet                                                    In our approach, we provide a robust feature extraction
                         deployment. This insight supports smart city initiatives by                                                   pipeline that can capture both spatial and temporal depen-
                         informing human-centric urban infrastructure design and                                                       dencies and integrate them into our model. We also present
                         facilitating sustainable development through data-driven de-                                                  a gradient boosting ML algorithm capable of predicting the
                         cisions in areas, such as energy consumption, public transit                                                  demand in a given area, either in the form of levels (e.g.,
                         planning, and emergency services.                                                                             ’Low’, ’Medium’, ’High’) through a classifier, or in absolute
                            Several Machine Learning (ML) approaches have been                                                         demand values through a regressor. The performance of
                         proposed for detecting and forecasting spatio-temporal pat-                                                   our model is evaluated using two real-world micro-mobility
                         terns in timeseries, including Long Short-Term Memory                                                         datasets and the results turn out to be quite promising since
                         (LSTM) networks [2], Graph Neural Networks (GNNs) [3, 4],                                                     our approach can effectively adapt to the uniqueness of
                         and Diffusion-based Models [5, 6]. However, these models                                                      each area and adequately model its intricacies.
                         often suffer from high computational complexity due to the                                                       The rest of this paper is structured as follows: Section 2
                         intricate nature of spatio-temporal data, which combines                                                      reviews related work, Section 3 introduces our forecasting
                         spatial correlations with temporal dependencies [7].                                                          methodology, including the feature extraction process and
                            A common representation for such data is the Spatio-                                                       training configuration. Section 4 describes the experimental
                         Temporal Graph (STG), where nodes denote spatial locations                                                    setup and presents the results of both predictive models,
                         and edges encode relationships over time. Capturing both                                                      offering detailed performance metrics. Finally, Section 5
                         spatial and temporal dependencies requires advanced archi-                                                    concludes the paper with a summary of the key contribu-
                         tectures, such as Spatio-Temporal Graph Neural Networks                                                       tions and directions for future research.
                         (STGNNs) [8] or Spatio-Temporal Graph Convolutional Net-
                         works (STGCNs) [9]. However, these methods impose sig-
                                                                                                                                       2. Related Work
                         Published in the Proceedings of the Workshops of the EDBT/ICDT 2025
                         Joint Conference (March 25-28, 2025), Barcelona, Spain                                                        A wide variety of modeling approaches have been proposed
                         *
                           Corresponding author.
                         †                                                                                                             to tackle the challenges related to spatio-temporal forecast-
                           These authors contributed equally.
                         atzio@unipi.gr (A. Tziorvas); gstheo@unipi.gr
                                                                                                                                       ing. In this section, we will identify and review previous
                         (G. S. Theodoropoulos); ytheod@unipi.gr (Y. Theodoridis)                                                      approaches in the context of shared mobility demand fore-
                          0009-0005-0037-6264 (A. Tziorvas); 0000-0003-4547-6646                                                      casting by categorizing them in three groups: (traditional
                         (G. S. Theodoropoulos); 0000-0003-2589-7881 (Y. Theodoridis)                                                  or advanced) ML approaches, deep learning models, and
                                        © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License
                                        Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
ensemble and hybrid methods, respectively.                        extended timeseries sequences.
                                                                     Li et al. [19] proposed a hierarchical framework for proac-
2.1. ML Approaches                                                tive bike redistribution in bike-sharing systems. A bipar-
                                                                  tite clustering algorithm groups stations, followed by city-
Traditional machine learning (ML) methods provide a sta-          wide rental prediction using a Gradient Boosting Regression
ble analytical foundation for structured predictive prob-         Tree. Rental proportions and inter-cluster transitions are
lems. Feng et al. [12] developed a predictive model for           then estimated via a multi-similarity-based model to predict
bike-sharing demand in Chicago using Poisson regression,          station-level rentals and returns. Experiments on real-world
incorporating time, weather, population density, and activ-       datasets from New York City and Washington D.C. indi-
ity density as features. Additionally, a Random Forest model      cate significant performance gains over baseline models,
was employed to enhance predictive accuracy through en-           particularly during periods of atypical demand.
semble learning, aiding in demand-supply optimization and
the identification of potential new station locations. Simi-
                                                                  2.3. Ensemble and Hybrid Models
larly, Xiao et al. [13] applied clustering techniques to fore-
cast short-term car demand, utilizing an improved k-nearest       The concept of ensemble learning, which involves combin-
neighbor (kNN) model for online car-hailing services in           ing the predictions of multiple simple models to form a
Hefei City, China. Their results highlight the effectiveness      new, more robust model, has been extensively explored in
of clustering methodologies in enhancing demand predic-           the literature and has been shown to produce better results
tion accuracy.                                                    compared to its individual constituent models. Building on
   A notable advancement in bike-share traffic prediction is      this idea, Yuming et al. [20] developed a stacking ensemble
the two-step pattern recognition model proposed by Sohrabi        learning framework that integrates the predictions of three
& Ermagun [14]. This approach first generates traffic pro-        distinct base learners: Random Forest, LightGBM, and Long
files, where bike station traffic is represented as timeseries    Short-Term Memory (LSTM) networks. These base learn-
profiles with 𝑡-minute intervals and an overlap length 𝑂.         ers were trained in parallel using a common set of input
The kNN algorithm is then applied to identify spatiotem-          features, including temporal, spatial, and weather-related
poral similarities, incorporating historical traffic data, tem-   variables. The predictions generated by each base learner
poral factors (e.g., weather, weekdays), and spatial char-        were then aggregated by a Support Vector Regression (SVR)
acteristics (e.g., socio-demographics, land use, infrastruc-      meta-learner that produced the final demand forecast val-
ture). By leveraging weighted Euclidean distance metrics,         ues.
this method enables accurate short- and long-term traffic
forecasting at both station and system-wide levels,               2.4. Positioning of Our Work with Respect to State-of-the-Art
demonstrating the efficacy of integrating spatiotemporal features
with ML techniques.
   Smith et al. [15] employ ML to predict bicycle and pedes-      Our approach diverges from these approaches in multiple
trian traffic patterns across 20 U.S. metropolitan statistical    ways. Firstly, our methodology employs a unified modeling
areas (MSAs). Building on previous efforts by Le et al. [16],     workflow that can tackle the problem in two alternative
where stepwise linear regression models were developed            ways, by either regression or classification, allowing us to
based on Census-derived, neighborhood-level covariates,           adjust to specific needs. Additionally, our feature engineer-
this study incorporates traffic counts aggregated from 4,145      ing pipeline integrates both spatial and temporal features
locations alongside novel street-level data, including Google     which are fed to a tuned gradient boosting model, enabling
Street View imagery and Point of Interest (PoI) data. The         us to better identify the intricacies of urban micro-mobility.
authors evaluate several ML regressors, including linear,         Lastly, by utilizing two real-world datasets for evaluation,
ridge, lasso, bagging, gradient boosting, and random forest,      our method can be adequately validated under different sce-
reporting significant improvements in predictive accuracy         narios, ensuring it is effective and robust. Nevertheless, as
over traditional linear regression, particularly when using       part of our future research, we plan to align related work
augmented datasets.                                               with our approach for a fair comparison under the same
                                                                  settings over the same real datasets.
2.2. Deep Learning Models
Ge et al. [17] introduced a Self-Attention ConvLSTM               3. Our Approach for Efficient Shared Micro-Mobility Demand Forecasting
(SA-ConvLSTM) model to enhance the accuracy of ConvLSTM
for online car-hailing demand forecasting. By converting
car-hailing trajectories into grid-based images, the model
incorporates a self-attention module to capture long-range
spatiotemporal dependencies, leveraging pairwise similarity       In this section, we define the shared mobility demand fore-
scores across input and memory positions.                         casting problem and present our proposed approach.
   Luo et al. [18] proposed a Spatial-Temporal Diffusion
Convolutional Network (ST-DCN) to address limitations             3.1. Problem Definition
in modeling dynamic spatiotemporal dependencies for taxi
                                                                  Generally speaking, a spatio-temporal timeseries is a se-
demand forecasting. Their approach integrates a two-phase
                                                                  quence of data points, where each one is associated with
graph diffusion convolutional network with an attention
                                                                  a time index and a spatial location. For the purposes of
mechanism to model spatial dependencies, while a temporal
                                                                  our work, an area of interest (a city, a region, etc.) is par-
convolution module captures long-term trends, including
                                                                  titioned into a set of geographical districts, where each
recent, daily, and weekly patterns. The use of stacked convo-
                                                                  district 𝐴 is represented as a polygon. We denote $D_t^A$ as
lution layers further enhances the model’s ability to process
the percentile-encoded demand of micro-mobility vehicles
within district 𝐴 at time 𝑡. Given 𝜏 past observations for


district 𝐴, we construct its spatio-temporal timeseries as          [Figure 2, y-axis: Demand (nr. of shared vehicles)]
$X_t^A = \{D_{t-\tau}^A, D_{t-\tau+1}^A, \ldots, D_{t-1}^A, D_t^A\}$. The problem
addressed in this paper is to predict the micro-mobility demand
value $D_{t+m}^A$ for district 𝐴, 𝑚 time steps ahead, given $X_t^A$,
i.e., the current along with the past 𝜏 observations.
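
The problem definition above can be illustrated with a minimal Python sketch. It assumes the demand of one district is already available as a plain list sampled at a fixed rate; the function name `make_windows` and the toy values are hypothetical and not taken from the paper. Each sample pairs the window X_t^A (the current value plus the past tau values) with the target D_{t+m}^A.

```python
from typing import List, Tuple


def make_windows(
    demand: List[float], tau: int, m: int
) -> List[Tuple[List[float], float]]:
    """Pair each window [D_{t-tau}, ..., D_t] with its target D_{t+m}."""
    if tau < 0 or m < 1:
        raise ValueError("tau must be >= 0 and m must be >= 1")
    samples = []
    # Index t needs tau earlier observations and one value m steps ahead.
    for t in range(tau, len(demand) - m):
        window = demand[t - tau : t + 1]  # tau past values plus the current one
        target = demand[t + m]            # demand m time steps ahead
        samples.append((window, target))
    return samples


# Hypothetical demand counts for a single district (illustrative only).
demand_a = [10, 12, 15, 14, 18, 21, 19]
samples = make_windows(demand_a, tau=2, m=2)
```

With `tau=2` and `m=2`, every window holds three values and the target lies two steps after the last value of the window.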
   We will exemplify the above definition through a real
case extracted from our real-world dataset (dataset details         [Figure 2, x-axis: Time of the week (Mon to Sun)]
are presented in Section 4). As illustrated in Figure 1, the
problem at hand is to predict (as accurately as possible) the       Figure 2: Average hourly demand across the week in Rotterdam
anticipated demand values of each district’s demand                 Centrum. The coloring indicates the relative demand intensity
timeseries (in orange) given the current and some past demand       and, along with the peaks, aids in highlighting daily and weekly
values (in blue), taking into account relevant (temporal,           usage patterns.
spatial, context) features. To efficiently achieve this goal,
seasonal and other recurring patterns hidden in these timeseries should be
disclosed. For instance, looking at the timeseries of two           modeled as an adjacency graph that allows us to extract
major districts in Rotterdam that are displayed in Figure 1,        the neighbors of each district. Two districts are defined as
we identify seasonal commuting patterns: the highest and            neighbors if they share a common boundary1 . The demands
most spread peak occurs on Friday, due to a combination of          of the target as well as of the neighboring districts are repre-
leisure and work time. This is made more evident in Figure          sented by the respective spatiotemporal timeseries (step 2),
2, which showcases the combination of the hour-of-day and           as defined in Section 3.1, which are fed into a feature extrac-
day-of-week features in Rotterdam Centrum district.                 tion pipeline (step 3), to be detailed in Section 3.3. Finally,
[Figure 1 plots; x-axis: Timestamp (2024-09 to 2025-01)]           all these features are combined into a single, augmented
                                                                    dataset (step 4) to be used for the purposes of training our
                                                                    two models, the classifier and the regressor (step 5).

                                                                    3.3. Feature Extraction
                                                                    Given a timeseries $X_t^A$ of a target district 𝐴, feature
                                                                    extraction is performed to enhance the spatio-temporal modeling
                                                                    capabilities of the proposed method. The extracted
                                                                    features are categorized into four primary groups: time-related,
                                                                    lagged, rolling, and exponentially weighted features. These
                                                                    features are derived not only from the target district but also
                                                                    from its neighboring districts, incorporating both spatial
Figure 1: Two indicative timeseries for two regions of Rotterdam:   and temporal dependencies.
Delfshaven (top) and Rotterdam Centrum (bottom). The x-axis            Time-related features capture temporal patterns influencing the
denotes time, while the y-axis indicates the measured demand        target variable. Key attributes include the hour of the day
values.                                                             (0-23) to identify daily patterns like peak periods, and the
                                                                    day of the week (0-6, with 0 as Monday) to capture weekly
                                                                    trends. Monthly variations are represented by the month
3.2. Methodology                                                    (0-11, with 0 as January), accounting for seasonal behaviors.
                                                                    Additionally, the minute component (0-59) highlights finer
We propose the so-called Gradient Boosting Demand Predictor,        time dependencies and intra-hour variations.
in two variations, differentiated by their output type:                The second category of features, referred to as lagged
       • An 𝑁-class classifier (per horizon per district)           features, is designed to capture the temporal dependencies
       • A regressor (per horizon per district)                     between past observations of the timeseries and its current
                                                                    state. For each district, a set of lagged features is generated
   The purpose of the classifier is to classify the future          by systematically shifting the original timeseries
demand of a district into predefined levels (e.g., ’Low’,           observations backward in time. Formally, given a timeseries $X_t^A$
’Medium’, ’High’), whereas the regressor aims to predict            representing the observed values of a target district 𝐴 at
the actual demand values for the respective horizon. In             time 𝑡, the lagged feature at a temporal offset 𝜏 is defined as
other words, for an area of interest partitioned into 𝑀             $X_{t-\tau}^A$. A range of temporal offsets is selected to ensure that
districts with the goal of 𝑇 individual prediction horizons, both   both short-term and long-term dependencies are captured.
architectures require training 𝑇 × 𝑀 models, where each             Specifically, the lagged features are computed at intervals
model predicts the output for a specific district-horizon pair.     of 1, 5, 10, and 15 minutes, capturing short-term temporal
The reasoning behind training individual models per district        dependencies across multiple time horizons.
is to allow each model to specialize in its respective spatial
region, focusing exclusively on its unique spatial and
temporal patterns. Nevertheless, in our future work we plan to      1 The inclusion of neighboring districts is justified by the limited spatial
assess the efficiency of a single global model for all districts      range and temporal duration of trips usually made by e-bikes and
(where the district ID will be given as input to the model).          e-scooters. Due to their physical constraints, micro-mobility vehicles are
   As illustrated in Figure 3, the initial step of our                 unlikely to reach distant districts within short timeframes, so considering
methodology is the partitioning of the area of interest into districts,  only neighboring districts is sufficient for effective spatio-temporal
                                                                        modeling.
            Figure 3: Illustration of the proposed Gradient Boosting Demand Predictor methodology
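
The first steps of this workflow, the neighbor graph and the set of district-horizon models, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: districts are toy polygons given as vertex lists, two districts count as neighbors when they share an identical boundary edge (real district polygons would typically need a geometry library), and the horizon values and helper names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[float, float]
Edge = FrozenSet[Point]


def boundary_edges(polygon: List[Point]) -> Set[Edge]:
    """Return the undirected edges of a polygon given as a vertex list."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def build_adjacency(districts: Dict[str, List[Point]]) -> Dict[str, Set[str]]:
    """Mark two districts as neighbors if they share at least one edge."""
    edges = {name: boundary_edges(poly) for name, poly in districts.items()}
    adjacency: Dict[str, Set[str]] = {name: set() for name in districts}
    for a, b in product(districts, repeat=2):
        if a != b and edges[a] & edges[b]:
            adjacency[a].add(b)
    return adjacency


# Toy unit squares laid out in a row: A | B | C (illustrative only).
districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 0), (3, 0), (3, 1), (2, 1)],
}
adjacency = build_adjacency(districts)

# Assumed horizons in minutes; one model is trained per (district, horizon).
horizons = [5, 15, 30, 60]
tasks = [(district, horizon) for district in districts for horizon in horizons]
```

Here `tasks` enumerates the T × M district-horizon pairs, each of which would get its own classifier or regressor.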


       Table 1
       Summary table of the extracted features used in our approach
Category         Feature Name                        Description
Base             target                              Target variable for the current district at the present time step
                 neighbor_*                          Value of the same variable for each district neighboring the target
Time-Related     hour-of-day                         Hour of the day of the timestamp (0-23)
                 minute-of-hour                      Minute of the hour of the timestamp (0-59)
                 day-of-week                         Day of the week of the timestamp (0=Monday, 6=Sunday)
                 month-of-year                       Month of the year of the timestamp (0=January, 11=December)
Lagged           neighbor_lag_{1, 5, 10, 15}         Lagged value ({1, 5, 10, 15}-step(s) back) of the target variable for neighboring districts
                 target_lag_{1, 5, 10, 15}           Lagged value ({1, 5, 10, 15}-step(s) back) of the target variable for the current district
Rolling          neighbor_rolling_mean_{5, 10, 15}   Rolling mean of neighbors over the last {5, 10, 15} observations
                 target_rolling_mean_{5, 10, 15}     Rolling mean of the target variable over the last {5, 10, 15} observations
EWMA             neighbor_ewm_{5, 10, 15}            Exponentially weighted moving average of neighbors with a span of {5, 10, 15}
                 target_ewm_{5, 10, 15}              Exponentially weighted moving average of the target variable with a span of {5, 10, 15}
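
The time-related, lagged, and rolling rows of Table 1 can be sketched in plain Python as follows. This is a hedged illustration, not the released implementation: the helper names, the per-neighbor column naming (`neighbor_<name>_lag_<k>`), the choice to include the current observation in the rolling window, and the toy data are all assumptions.

```python
from datetime import datetime
from typing import Dict, List, Optional


def time_features(ts: datetime) -> Dict[str, int]:
    """Calendar features using the encodings listed in Table 1."""
    return {
        "hour-of-day": ts.hour,         # 0-23
        "minute-of-hour": ts.minute,    # 0-59
        "day-of-week": ts.weekday(),    # 0=Monday, 6=Sunday
        "month-of-year": ts.month - 1,  # 0=January, 11=December
    }


def lag(series: List[float], t: int, k: int) -> Optional[float]:
    """Value k steps before index t, or None without enough history."""
    return series[t - k] if t - k >= 0 else None


def rolling_mean(series: List[float], t: int, window: int) -> Optional[float]:
    """Mean of the last `window` observations ending at t (assumed inclusive)."""
    if t - window + 1 < 0:
        return None
    return sum(series[t - window + 1 : t + 1]) / window


def feature_row(ts: datetime, target: List[float],
                neighbors: Dict[str, List[float]], t: int) -> Dict[str, object]:
    """Build one feature row for index t from the target and its neighbors."""
    row: Dict[str, object] = dict(time_features(ts))
    all_series = {"target": target}
    all_series.update({f"neighbor_{n}": s for n, s in neighbors.items()})
    for prefix, series in all_series.items():
        row[prefix] = series[t]
        for k in (1, 5, 10, 15):
            row[f"{prefix}_lag_{k}"] = lag(series, t, k)
        for w in (5, 10, 15):
            row[f"{prefix}_rolling_mean_{w}"] = rolling_mean(series, t, w)
    return row


# Toy series: 20 consecutive observations (illustrative only).
target = [float(v) for v in range(100, 120)]
neighbors = {"north": [2 * v for v in target]}
row = feature_row(datetime(2024, 11, 11, 8, 15), target, neighbors, t=15)
```

The exponentially weighted rows are left out here to keep the sketch short.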


   This shifting operation effectively creates additional fea-            evolving patterns, including sudden variations induced by
ture representations of the past states of the system, allow-             external factors such as weather conditions, road incidents,
ing the model to learn historical patterns that influence                 or public events, while maintaining a stable representation
future predictions. By incorporating lagged features into                 of long-term trends.
the predictive framework, the approach enhances its ability                  The features used in our methodology are summarized
to capture recurrent patterns, trends, and autocorrelations               in Table 1.
inherent in spatio-temporal mobility data.
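
The EWMA recursion used for the exponentially weighted features, EWMA_t = α X_t + (1 − α) EWMA_{t−1} with α = 2/(N + 1), can be checked with a small Python sketch. Seeding the recursion with the first observation is an assumption made here for illustration; the function name and values are hypothetical.

```python
from typing import List


def ewma(series: List[float], span: int) -> List[float]:
    """Recursive EWMA with smoothing factor alpha = 2 / (span + 1)."""
    if not series:
        return []
    alpha = 2.0 / (span + 1)
    out = [float(series[0])]  # assumption: seed with the first observation
    for x in series[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out


# With span N = 3 the smoothing factor is alpha = 2 / 4 = 0.5.
smoothed = ewma([10, 20, 30, 30], span=3)
```

For this input the recursion yields 10, 15, 22.5 and 26.25: each new value moves halfway from the previous average toward the latest observation, so recent changes dominate while older values fade geometrically.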
   The rolling features provide a smoothed representation
of the data over various intervals, capturing localized fluc-             4. Experimental Study
tuations that are critical for accurate forecasting. These
                                                                          This section describes the data and preprocessing steps used
features are computed using sliding windows, where sta-
                                                                          in our experimental study (Section 4.1) as well as our find-
tistical measures are aggregated over a sequence of recent
                                                                          ings (Section 4.2). The experiments were conducted on a
time steps to encapsulate short-term trends and variability
                                                                          server with an AMD Epyc 64-Core CPU and an Nvidia A100
in mobility patterns. For each district, a rolling window of
                                                                          GPU with 40GB of memory. Our source code is publicly
varying sizes—ranging from 5 to 15 minutes in 5-minute
                                                                          available for reproducibility purposes.2
increments—is applied to the timeseries data. Within each
window, the mean is computed to capture central tendency
and, over larger windows, the degree of short-term variabil-              4.1. Experimental Setup
ity.                                                                      In our study, we utilize shared micro-mobility data provided
   Finally, the objective of the exponentially weighted                   by Deelfiets Nederland, a popular micro-mobility service
features is to prioritize recent observations while retaining             provider in the Netherlands, through an open API3. The data
information from historical data. This is achieved through                consist of real-time GPS positions for vehicles in the
the application of exponentially weighted techniques, which               Netherlands, which we query every 60 seconds, creating a dataset
assign progressively smaller weights to older data points,                of sufficient granularity for our use-case. The dataset in-
ensuring that the model places greater emphasis on more                   cludes the latitude, longitude, vehicle type and company
recent trends. Among these methods, the Exponential                       name for each vehicle. For the purposes of this study, we
Weighted Moving Average (EWMA) is particularly effec-                     utilize mobility data from two major metropolitan areas,
tive, as it assigns exponentially decreasing weights to past              Amsterdam and Rotterdam, focusing specifically on the ten
observations, thereby capturing temporal dependencies in                  most densely populated districts in each city. These districts
a manner that balances adaptability and historical context.               exhibit diverse urban characteristics and distinct spatio-
For a given timeseries 𝑋𝑡 , the EWMA at time 𝑡 is computed                temporal patterns, making them well-suited for evaluating
recursively as 𝐸𝑊 𝑀 𝐴𝑡 = 𝛼𝑋𝑡 + (1 − 𝛼)𝐸𝑊 𝑀 𝐴𝑡−1                           the model’s adaptability to localized mobility trends. The
where 𝛼 represents the smoothing factor, typically defined
as 𝛼 = 𝑁2+1 for a given window size 𝑁 . This formula-                     2
                                                                              https://github.com/DataStories-UniPi/Shared-Mobility.git
tion allows the predictive framework to swiftly adapt to                  3
                                                                              https://api.deelfietsdashboard.nl/dashboard-api/public/vehicles_in_
                                                                              public_space
temporal coverage of our data is about 2 months (November 11, 2024 - January 15, 2025), and
5 months (August 06, 2024 - January 15, 2025), respectively. Since the original data source is
a stream instead of a static dataset, we manually append a temporal indicator (timestamp) to
the data returned by each subsequent request.
   To derive demand estimates at the district level, a spatial filtering and aggregation
procedure is applied. First, a point-polygon intersection is performed to associate each
mobility trace with its corresponding district boundaries, discarding traces that fall outside
the predefined study areas. Next, the filtered traces are aggregated based on their unique
spatio-temporal identifiers (i.e., timestamp and district ID), summing the occurrences within
each region to obtain demand estimates. This transformation preserves the spatial and temporal
integrity of the data while ensuring compatibility with the forecasting models. In its final
form, the dataset consists of 89,022 observations across 97 districts in Amsterdam and 218,325
observations across 22 districts in Rotterdam.
   To validate our models, we use a 70-30 temporal train-test split. To ensure effective
training of the classifier, additional preprocessing steps were taken in addition to the
feature extraction process described above. To this end, a quantile-based discretization
function was employed to transform continuous density values into 3 discrete demand levels
(Low, Medium, High, as defined in the following paragraph). This transformation enhances the
classifier’s ability to model class distinctions and mitigates the challenges associated with
imbalanced class distributions.
   We define 'Medium' demand as the range where demand levels are centered around 50% of the
value of the day with the highest demand, which serves as the baseline value. Specifically, for
a given threshold $d \in (0, 50)$, the range $[50 - d, 50 + d)$ is categorized as 'Medium'
(normal) demand. Demand levels below (above) this range are classified as 'Low' ('High',
respectively) demand. In our experiments, we set $d = 17$, so that the demand levels correspond
to three roughly equally sized ranges. This threshold provides a balanced distribution of data
across the defined demand categories, facilitating a robust analysis of varying demand patterns.
   The optimization framework employed in this study incorporates Bayesian Optimization due to
its ability to balance exploration and exploitation effectively. This approach facilitates the
exploration of the search space for better solutions while simultaneously exploiting regions
with high potential, thereby reducing the number of iterations required for optimization. Its
suitability is particularly evident when dealing with extensive experimental evaluations, such
as the computationally intensive training of ML models, as it leverages prior knowledge to
guide sampling and optimize resources efficiently. Additionally, the probabilistic framework of
Bayesian Optimization enables the automatic adjustment of the exploration-exploitation
trade-off through its acquisition functions.

4.2. Experimental Results
In this section, we assess the overall performance of our models across multiple districts and
timestamps. Table 2 summarizes the training and inference times of our models across different
prediction horizons in the two cities of interest. Training time corresponds to the overall
time required to train the respective model, whereas inference time corresponds to the average
time taken to generate a specific prediction for a given time horizon. Training is fast, taking
roughly 1-2 seconds or less, while inference time is on the order of 1 microsecond. This
analysis showcases the computational efficiency of the models, providing insights into their
scalability and suitability for real-world spatio-temporal forecasting tasks.

Table 2
Training and inference times of the two models across different horizons.
City        Model        Prediction Horizon (min)   Training Time (sec)   Inference Time (µsec)
Rotterdam   Regressor     5                         1.17 ± 0.43           0.99 ± 0.15
Rotterdam   Regressor    15                         0.35 ± 1.04           1.07 ± 0.30
Rotterdam   Regressor    30                         0.22 ± 0.97           0.97 ± 0.09
Rotterdam   Regressor    60                         0.47 ± 1.10           1.10 ± 0.24
Rotterdam   Classifier    5                         2.21 ± 1.26           1.11 ± 0.17
Rotterdam   Classifier   15                         1.10 ± 1.15           1.15 ± 0.21
Rotterdam   Classifier   30                         1.00 ± 1.10           1.10 ± 0.18
Rotterdam   Classifier   60                         1.07 ± 1.19           1.19 ± 0.16
Amsterdam   Regressor     5                         0.68 ± 0.22           0.95 ± 0.12
Amsterdam   Regressor    15                         1.21 ± 0.95           0.95 ± 0.18
Amsterdam   Regressor    30                         0.28 ± 0.94           0.94 ± 0.13
Amsterdam   Regressor    60                         0.29 ± 0.92           0.92 ± 0.08
Amsterdam   Classifier    5                         1.40 ± 0.69           1.01 ± 0.13
Amsterdam   Classifier   15                         0.81 ± 1.10           1.10 ± 0.26
Amsterdam   Classifier   30                         1.22 ± 1.07           1.07 ± 0.28
Amsterdam   Classifier   60                         1.07 ± 1.02           1.02 ± 0.21

   Tables 3-6 present the predictive performance of the proposed spatio-temporal forecasting
models evaluated at four prediction horizons: 5, 15, 30, and 60 minutes. The quality metrics we
used include $R^2$, RMSE, MAE and sMAPE for the regressor and F1-score, Accuracy, Recall and
Precision for the classifier. These metrics collectively assess the model’s predictive
accuracy, robustness, and relative performance across districts with varying population sizes
and temporal dynamics.
   Aggregated metrics are added to summarize the overall performance across all districts,
offering insights into the model’s general effectiveness and consistency across varying
prediction horizons. Both architectures perform well at both short-term (5-15 min.) and
long-term (30-60 min.) forecasting. From the summary statistics we can deduce that both models
consistently perform well for the vast majority of the districts. Indicatively, in Rotterdam
the regressor’s $R^2$ ranges on average from 0.94 down to 0.84 as we move from lower to higher
prediction horizons. As for the classifier, the F1-score ranges on average from 0.89 to 0.80,
respectively.

4.3. A Note on Feature Importance
In order to assess the effect of the underlying features on the quality of prediction of either
the classifier or the regressor, we present the average feature importance across all
prediction horizons for both models. The feature importance values were computed using
XGBoost’s internal feature importance computation algorithm. Given the minimal contributions of
some features (importance values below $10^{-4}$), the y-axis of the corresponding plots has
been log-scaled to improve visualization. The results of this analysis are grouped by city and
illustrated in Figures 4 and 5.
   In both figures we can see that the average contribution of the features related to the
target is higher than that of the neighbor-related features, as expected. Overall, in
decreasing order, the most important features appear to be the target_ewm_*, the
target_rolling_mean_*, and the target_lag_*, denoting the EWMA, the rolling mean, and the
lagged value, respectively,
      Table 3
      Predictive performance of the regressor in Rotterdam across different prediction horizons.
                                                     5 min                                   15 min                                30 min                               60 min
District                   Avg Pop
                                       R2        RMSE       MAE      sMAPE       R2      RMSE      MAE       sMAPE     R2      RMSE     MAE       sMAPE     R2      RMSE     MAE      sMAPE
Kralingen-Crooswijk          507       0.97       9.70      6.94       0.01      0.96    11.00      7.86      0.02     0.95    13.17     9.81      0.02     0.90    18.11    13.45     0.03
Rotterdam Centrum            502       0.99      15.25     11.18       0.02      0.98    20.33     14.77      0.02     0.97    22.95    17.37      0.03     0.93    34.66    26.62     0.04
Hillegersberg-Schiebroek     476       0.80      17.36     12.67       0.02      0.76    18.82     14.06      0.03     0.73    20.03    15.20      0.03     0.68    21.79    16.78     0.03
Delfshaven                   347       0.98       6.43      4.70       0.01      0.97     8.03      5.75      0.02     0.96    10.13     7.27      0.02     0.92    13.78     9.97     0.03
Feijenoord                   347       0.97       6.31      4.61       0.01      0.92    10.09      7.72      0.02     0.94     9.29     7.00      0.02     0.92    10.31     7.63     0.02
Noord                        307       0.88      15.69     10.11       0.03      0.85    17.49     11.61      0.03     0.81    19.76    13.61      0.04     0.73    23.96    16.87     0.04
Prins Alexander              303       0.99       6.04      4.37       0.01      0.99     5.91      4.26      0.01     0.98     6.76     4.92      0.01     0.98     7.99     5.87     0.02
Charlois                     242       0.97       5.24      3.84       0.02      0.97     5.93      4.42      0.02     0.96     6.89     5.20      0.02     0.94     7.84     5.92     0.03
IJsselmonde                  184       0.94       5.25      3.25       0.02      0.90     6.58      4.18      0.03     0.89     7.11     4.50      0.03     0.84     8.37     5.57     0.04
Spaanse Polder                36       0.90       2.18      1.36       0.03      0.69     3.88      2.18      0.05     0.87     2.53     1.76      0.04     0.54     4.75     3.43     0.07
                                       0.94       8.94      6.30       0.02      0.90    10.80      7.68      0.02     0.90    11.86     8.66      0.03     0.84    15.16    11.21     0.03
Average (𝜇 ± 𝜎)                       ± 0.06     ± 5.29    ± 3.77     ± 0.01    ± 0.10   ± 5.98    ± 4.41    ± 0.01   ± 0.08   ± 6.86   ± 5.17    ± 0.01   ± 0.14   ± 9.40   ± 7.22   ± 0.02
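
As an illustration of the regression metrics reported above, the following minimal sketch computes R2, RMSE, MAE and sMAPE in plain Python. The function name `regression_metrics` is hypothetical and not taken from the authors' code. The paper does not state which sMAPE variant it uses, so the fraction-scaled definition below (absolute error divided by the mean of |y| and |y_hat|) is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and sMAPE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Assumed sMAPE variant: |error| / mean(|y|, |y_hat|), averaged over all
    # points; pairs where both values are zero contribute nothing.
    smape = sum(
        abs(e) / ((abs(t) + abs(p)) / 2)
        for e, t, p in zip(errors, y_true, y_pred)
        if abs(t) + abs(p) > 0
    ) / n

    # R2 is undefined when the target is constant.
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"r2": r2, "rmse": rmse, "mae": mae, "smape": smape}
```

Under this definition, sMAPE is a fraction rather than a percentage, which is at least compatible with the magnitude of the values shown in the table.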


      Table 4
      Predictive performance of the classifier in Rotterdam across different prediction horizons.
                                                          5 min                               15 min                               30 min                               60 min
District                   Avg Pop
                                         F1        Acc.      Prec.      Rec.       F1      Acc.      Prec.     Rec.     F1      Acc.     Prec.     Rec.      F1      Acc.     Prec.    Rec.
Kralingen-Crooswijk           507       0.92   0.92   0.92   0.92   0.90   0.90   0.90   0.90   0.89   0.88   0.88   0.88   0.85   0.85   0.85   0.85
Rotterdam Centrum             502       0.91   0.99   0.96   0.87   0.87   0.98   0.94   0.80   0.83   0.98   0.92   0.78   0.75   0.97   0.84   0.70
Hillegersberg-Schiebroek      476       0.75   0.95   0.82   0.71   0.73   0.94   0.79   0.70   0.71   0.94   0.77   0.67   0.67   0.93   0.76   0.63
Delfshaven                    347       0.91   0.96   0.91   0.91   0.89   0.95   0.90   0.89   0.86   0.93   0.86   0.87   0.80   0.90   0.81   0.80
Feijenoord                    347       0.88   0.95   0.90   0.87   0.87   0.94   0.89   0.85   0.84   0.93   0.88   0.81   0.74   0.90   0.82   0.71
Noord                         307       0.82   0.99   0.85   0.80   0.71   0.98   0.74   0.69   0.55   0.91   0.55   0.63   0.59   0.97   0.62   0.58
Prins Alexander               303       0.97   0.98   0.97   0.97   0.97   0.98   0.97   0.97   0.96   0.98   0.96   0.96   0.96   0.97   0.96   0.95
Charlois                      242       0.91   0.91   0.92   0.91   0.90   0.90   0.90   0.90   0.89   0.89   0.89   0.89   0.87   0.87   0.87   0.87
IJsselmonde                   184       0.92   0.97   0.92   0.92   0.91   0.97   0.91   0.91   0.90   0.97   0.90   0.91   0.89   0.97   0.89   0.90
Spaanse Polder                 36       0.95   0.99   0.98   0.92   0.90   0.98   0.94   0.87   0.93   0.98   0.96   0.90   0.91   0.98   0.95   0.88
                                        0.89   0.96   0.91   0.88   0.87   0.95   0.89   0.85   0.84   0.94   0.86   0.83   0.80   0.93   0.84   0.79
Average (𝜇 ± 𝜎)                        ± 0.06 ± 0.03 ± 0.05 ± 0.07 ± 0.08 ± 0.03 ± 0.07 ± 0.09 ± 0.12 ± 0.04 ± 0.12 ± 0.11 ± 0.11 ± 0.05 ± 0.10 ± 0.12
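
The three demand levels that the classifier predicts are defined in Section 4.1 through a threshold d around 50% of a baseline value. A minimal sketch of that mapping follows. The function name `demand_level` is hypothetical, and treating the baseline as a single positive number (the demand value of the busiest day) is an assumption about how the authors compute it.

```python
def demand_level(demand, baseline, d=17):
    """Map a demand value to 'Low', 'Medium' or 'High'.

    The demand is first expressed as a percentage of `baseline`;
    the half-open range [50 - d, 50 + d) is 'Medium'.
    """
    if not 0 < d < 50:
        raise ValueError("d must lie in the open interval (0, 50)")
    if baseline <= 0:
        raise ValueError("baseline must be positive")

    pct = 100.0 * demand / baseline
    if pct < 50 - d:
        return "Low"
    if pct < 50 + d:
        return "Medium"
    return "High"
```

With d = 17 the boundaries fall at 33% and 67% of the baseline, which splits the 0-100% scale into three ranges of nearly equal width.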


of the target variable. Focusing on the time-related features
only, hour-of-day and day-of-week show the relatively highest
importance in both cases, underlining the presence of
seasonal patterns.

5. Conclusions
This study investigated spatio-temporal forecasting methods
for density prediction in urban environments. By integrating
spatial dependencies with temporal trends, the pro-


      Table 5
      Predictive performance of the regressor in Amsterdam across different prediction horizons.
                                                      5 min                                  15 min                                30 min                               60 min
District                   Avg. Pop
                                        R2       RMSE       MAE       sMAPE       R2     RMSE       MAE      sMAPE      R2     RMSE     MAE       sMAPE     R2      RMSE     MAE      sMAPE
Oostelijk Havengebied        110        0.97      2.34       1.83      0.02      0.96     2.75      2.16      0.02     0.95     3.24     2.53      0.02     0.92     4.16     3.24     0.03
Buitenveldert-West            87        0.97      2.12       1.57      0.02      0.95     2.66      1.96      0.02     0.93     3.28     2.40      0.02     0.87     4.37     3.17     0.03
Burgwallen-Nieuwe Zijde       72        0.98      1.91       1.41      0.02      0.96     2.55      1.91      0.03     0.93     3.38     2.58      0.03     0.89     4.35     3.35     0.04
Jordaan                       71        0.98      2.31       1.78      0.02      0.94     4.04      2.75      0.03     0.91     4.67     3.42      0.04     0.85     6.14     4.66     0.05
Scheldebuurt                  71        0.98      1.56       1.19      0.01      0.96     2.06      1.57      0.02     0.94     2.66     2.02      0.02     0.89     3.50     2.64     0.03
Middenmeer                    66        0.96      1.94       1.46      0.02      0.94     2.28      1.73      0.02     0.90     2.89     2.20      0.03     0.86     3.55     2.71     0.03
IJburg West                   63        0.93      1.62       1.28      0.02      0.91     1.83      1.45      0.02     0.87     2.15     1.70      0.02     0.79     2.76     2.20     0.03
Westelijk Havengebied         63        0.88      3.72       1.60      0.03      0.85     4.05      1.75      0.03     0.83     4.28     1.87      0.04     0.78     4.74     2.20     0.04
Landlust                      59        0.99      1.42       1.07      0.02      0.97     1.96      1.51      0.03     0.96     2.60     2.01      0.04     0.92     3.49     2.70     0.05
Nieuwmarkt/Lastage            58        0.92      2.82       1.94      0.02      0.88     3.34      2.37      0.03     0.76     4.71     3.35      0.04     0.70     5.29     3.88     0.05
                                        0.95      2.18       1.51      0.02      0.93     2.75      1.92      0.02     0.90     3.38     2.41      0.03     0.85     4.24     3.08     0.04
Average (𝜇 ± 𝜎)                        ± 0.04    ± 0.69     ± 0.28    ± 0.01    ± 0.04   ± 0.81    ± 0.41    ± 0.01   ± 0.06   ± 0.89   ± 0.59    ± 0.01   ± 0.07   ± 0.99   ± 0.77   ± 0.01
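
The district-level demand values behind these results come from the spatial filtering and aggregation step described in Section 4.1. The sketch below illustrates that step with a plain ray-casting point-in-polygon test and a counter keyed by (timestamp, district ID). All names (`point_in_polygon`, `aggregate_demand`) and the input layout are hypothetical; the authors' implementation may rely on a geospatial library instead.

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def aggregate_demand(traces, districts):
    """Count vehicles per (timestamp, district_id).

    `traces` is an iterable of (timestamp, lon, lat) tuples and `districts`
    maps district_id -> polygon. Traces outside every district are dropped.
    """
    counts = Counter()
    for timestamp, lon, lat in traces:
        for district_id, polygon in districts.items():
            if point_in_polygon(lon, lat, polygon):
                counts[(timestamp, district_id)] += 1
                break  # assumption: districts do not overlap
    return counts
```

Each key of the resulting counter corresponds to one observation of the final dataset, i.e., the number of vehicles seen in one district at one timestamp.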


      Table 6
      Predictive performance of the classifier in Amsterdam across different prediction horizons.
                                                          5 min                               15 min                                30 min                              60 min
District                    Avg. Pop
                                            F1      Acc.      Prec.      Rec.      F1       Acc.     Prec.     Rec.      F1      Acc.     Prec.    Rec.      F1      Acc.     Prec.    Rec.
Oostelijk Havengebied         110        0.90   0.92   0.93   0.89   0.89   0.90   0.89   0.88   0.86   0.88   0.87   0.85   0.76   0.78   0.75   0.79
Buitenveldert-West             87        0.92   0.93   0.94   0.91   0.90   0.91   0.92   0.89   0.88   0.89   0.89   0.86   0.83   0.86   0.85   0.82
Burgwallen-Nieuwe Zijde        72        0.94   0.94   0.95   0.94   0.92   0.92   0.92   0.91   0.89   0.89   0.89   0.89   0.85   0.84   0.85   0.84
Jordaan                        71        0.91   0.94   0.94   0.89   0.87   0.92   0.89   0.85   0.84   0.90   0.87   0.81   0.82   0.89   0.82   0.82
Scheldebuurt                   71        0.91   0.93   0.93   0.90   0.88   0.90   0.90   0.87   0.85   0.88   0.88   0.83   0.79   0.83   0.83   0.77
Middenmeer                     66        0.85   0.94   0.90   0.81   0.79   0.91   0.81   0.77   0.73   0.90   0.80   0.70   0.68   0.83   0.69   0.69
IJburg West                    63        0.73   0.96   0.81   0.68   0.63   0.95   0.66   0.61   0.57   0.79   0.54   0.73   0.61   0.92   0.60   0.62
Westelijk Havengebied          63        0.91   0.89   0.94   0.90   0.89   0.87   0.93   0.88   0.90   0.89   0.92   0.90   0.87   0.85   0.89   0.86
Landlust                       59        0.94   0.94   0.93   0.94   0.93   0.93   0.92   0.93   0.92   0.92   0.91   0.92   0.87   0.87   0.87   0.87
Nieuwmarkt/Lastage             58        0.79   0.94   0.88   0.73   0.69   0.92   0.78   0.65   0.62   0.91   0.65   0.60   0.55   0.88   0.57   0.53
                                         0.88   0.93   0.91   0.86   0.84   0.91   0.86   0.83   0.81   0.88   0.82   0.81   0.76   0.86   0.77   0.76
Average (𝜇 ± 𝜎)                         ± 0.07 ± 0.02 ± 0.04 ± 0.09 ± 0.10 ± 0.02 ± 0.09 ± 0.11 ± 0.12 ± 0.03 ± 0.13 ± 0.10 ± 0.11 ± 0.04 ± 0.11 ± 0.11
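
For reference, the classification metrics listed above can be computed as in the following sketch. The paper does not say how per-class scores are combined across the three demand levels, so the macro averaging used here is an assumption, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred, labels=("Low", "Medium", "High")):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1_scores = [], [], []

    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)

        # Scores default to 0.0 when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1_scores) / k,
    }
```

Macro averaging weights every class equally, so a high accuracy combined with a lower F1-score can indicate that one of the demand levels is rare and harder to predict.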
          Figure 4: Average feature importance across all 4 prediction horizons of the regressor in Rotterdam (left) and Amsterdam
          (right)
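
The feature families discussed in Section 4.3 (target_lag_*, target_rolling_mean_* and target_ewm_*) can be illustrated with the minimal sketch below. The helper names are hypothetical, windows are expressed in time steps rather than minutes, and seeding the EWMA with the first observation is an assumption; only the recursion and the choice alpha = 2 / (N + 1) come from the text.

```python
def ewma(series, window):
    """EWMA_t = alpha * X_t + (1 - alpha) * EWMA_{t-1}, alpha = 2 / (window + 1)."""
    alpha = 2.0 / (window + 1)
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(float(x))  # assumption: seed with the first value
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def rolling_mean(series, window):
    """Mean of the last `window` values (fewer at the start of the series)."""
    means = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


def lag(series, k):
    """Value observed k steps earlier; None where no history exists yet."""
    padding = [None] * min(k, len(series))
    return padding + list(series[: max(0, len(series) - k)])
```

For example, with one observation per minute, `ewma(counts, 5)` would correspond to a 5-minute exponentially weighted feature for a district's count series `counts`.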


          Figure 5: Average feature importance across all 4 prediction horizons of the classifier in Rotterdam (left) and Amsterdam
          (right)
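
The averaging across prediction horizons and the log-scaled axis mentioned for these figures can be sketched as follows. The input layout (one dictionary of importance scores per horizon), the function names and the clipping floor are assumptions for illustration; the importance values themselves would come from the trained models.

```python
import math


def average_importance(per_horizon):
    """Average each feature's importance over all horizons.

    `per_horizon` maps horizon (minutes) -> {feature_name: importance}.
    A feature missing from a horizon counts as 0 for that horizon.
    """
    features = sorted({f for scores in per_horizon.values() for f in scores})
    n = len(per_horizon)
    return {
        f: sum(scores.get(f, 0.0) for scores in per_horizon.values()) / n
        for f in features
    }


def log10_for_plot(averaged, floor=1e-6):
    """log10 of each averaged importance, clipped at `floor` to avoid log(0)."""
    return {f: math.log10(max(v, floor)) for f, v in averaged.items()}
```

Plotting the log-transformed values keeps features with very small importance visible next to the dominant ones.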


posed methodology effectively captured localized patterns, generating accurate forecasts across
multiple horizons. Key contributions include a flexible feature engineering pipeline that
incorporates both intra- and inter-district interactions, alongside the application of
efficient gradient boosting architectures. This approach enhances predictive accuracy while
minimizing computational overhead, rendering it suitable for large-scale forecasting tasks.
Experimental results demonstrated the efficacy of gradient-boosted density predictors for both
regression and classification, exhibiting competitive performance across varying prediction
horizons. Although accuracy slightly declines for longer horizons, the model remains robust,
underscoring its adaptability and practical applicability in real-world mobility forecasting.
   This research also unveils opportunities for future exploration. Specifically, the
integration of feature interactions and their contributions to the overall model quality
warrant further investigation. Incorporating external factors, such as weather conditions or
public events, could potentially enhance predictive performance. Additionally, applying the
proposed methodology to other geographical areas and diverse mobility scenarios would help
validate its generalization capabilities and adaptability to varying contexts. Finally, a
comprehensive experimental comparison with related work under specific settings is planned to
facilitate fair benchmarking.
   In conclusion, this study showcases the ability of data-driven approaches to effectively
tackle spatio-temporal forecasting challenges. By leveraging the inherent spatial segmentation
of cities into districts, the methodology enables the extraction of localized temporal
patterns, facilitating more informed decision-making and contributing to smarter, more
efficient urban planning from the perspective of mobility data science [21].

Acknowledgments
This work was supported in part by the Horizon Framework Programme of the European Union under
grant agreement No. 101093051 (EMERALDS; https://www.emeralds-horizon.eu/).

Declaration on Generative AI
During the preparation of this work, the authors used Large Language Models in order to
paraphrase and reword. After using this tool/service, the authors reviewed and edited the
content as needed and take full responsibility for the publication’s content.

References
 [1] M. N. Mladenović, Mobility as a Service, in: R. Vickerman (Ed.), International Encyclopedia
     of Transportation, Elsevier, 2021, pp. 12–18. doi:10.1016/B978-0-08-102671-7.10607-4.
 [2] F. O’Donncha, Y. Hu, et al., A spatio-temporal LSTM model to forecast across multiple
     temporal and spatial scales, Ecological Informatics 69 (2022) 101687.
     doi:10.1016/j.ecoinf.2022.101687.
 [3] B. Yu, H. Yin, et al., Spatio-Temporal Graph Convolutional Networks: A Deep Learning
     Framework for Traffic Forecasting, in: Proceedings of the 27th International Joint
     Conference on Artificial Intelligence, 2018, p. 3634–3640. doi:10.24963/ijcai.2018/505.
 [4] C. Zheng, X. Fan, et al., GMAN: A Graph Multi-Attention Network for Traffic Prediction,
     Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 1234–1241.
     doi:10.1609/aaai.v34i01.5477.
 [5] Y. Yang, M. Jin, et al., A Survey on Diffusion Models for Time Series and Spatio-Temporal
     Data, ArXiv abs/2404.18886 (2024). doi:10.48550/arXiv.2404.18886.
 [6] Y. Li, R. Yu, et al., Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic
     Forecasting, arXiv: 1707.01926 (2017). doi:10.48550/arXiv.1707.01926.
 [7] N. Pelekis, Y. Theodoridis, Mobility Data Management and Exploration, Springer, New York,
     NY, 2014. doi:10.1007/978-1-4939-0392-4.
 [8] W. Liao, B. Zeng, et al., Taxi demand forecasting based on the temporal multimodal
     information fusion graph neural network, Applied Intelligence 52 (2022) 12077–12090.
     doi:10.1007/s10489-021-03128-1.
 [9] X. Geng, Y. Li, et al., Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing
     Demand Forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019)
     3656–3663. doi:10.1609/aaai.v33i01.33013656.
[10] Z. Wu, S. Pan, et al., A Comprehensive Survey on Graph Neural Networks, IEEE Transactions
     on Neural Networks and Learning Systems 32 (2019) 4–24. doi:10.1109/TNNLS.2020.2978386.
[11] L. Grinsztajn, E. Oyallon, et al., Why do tree-based models still outperform deep learning
     on typical tabular data?, in: Proceedings of the 36th International Conference on Neural
     Information Processing Systems, (NIPS ’22), 2022, pp. 507–520.
     doi:10.48550/arXiv.2207.08815.
[12] H. Feng, Predicting the Dynamic Demand of Bike-Sharing System in Chicago with Divvy
     Operation Data: A Data-Driven approach for bike-sharing demand forecasting, in: Proceedings
     of the 5th International Conference on E-Commerce, E-Business and E-Government, ICEEG ’21,
     2021, p. 30–34. doi:10.1145/3466029.3466035.
[13] Y. Xiao, W. Kong, et al., Short-Term Demand Forecasting of Urban Online Car-Hailing Based
     on the K-Nearest Neighbor Model, Sensors 22 (2022). doi:10.3390/s22239456.
[14] S. Sohrabi, A. Ermagun, Dynamic bike sharing traffic prediction using spatiotemporal
     pattern detection, Transportation Research Part D: Transport and Environment 90 (2021)
     102647. doi:10.1016/j.trd.2020.102647.
[15] S. Hankey, W. Zhang, et al., Predicting bicycling and walking traffic using street view
     imagery and destination data, Transportation Research Part D: Transport and Environment 90
     (2021) 102651. doi:10.1016/j.trd.2020.102651.
[16] H. T. K. Le, R. Buehler, et al., Correlates of the Built Environment and Active Travel:
     Evidence from 20 US Metropolitan Areas, Environmental Health Perspectives 126 (2018).
     doi:10.1289/EHP3389.
[17] H. Ge, S. Li, et al., Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term
     Online Car-Hailing Demand, Sustainability 14 (2022). doi:10.3390/su14127371.
[18] A. Luo, B. Shangguan, et al., Spatial-Temporal Diffusion Convolutional Network: A Novel
     Framework for Taxi Demand Forecasting, ISPRS Int. J. Geo Inf. 11 (2022) 193.
     doi:10.3390/ijgi11030193.
[19] Y. Li, Y. Zheng, et al., Traffic prediction in a bike-sharing system, in: Proceedings of
     the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information
     Systems, Sigspatial ’15, 2015. doi:10.1145/2820783.2820837.
[20] Y. Jin, X. Ye, et al., Demand Forecasting of Online Car-Hailing With Stacking Ensemble
     Learning Approach and Large-Scale Datasets, IEEE Access 8 (2020) 199513–199522.
     doi:10.1109/ACCESS.2020.3034355.
[21] M. Mokbel, M. Sakr, et al., Mobility Data Science: Perspectives and Challenges, ACM Trans.
     Spatial Algorithms Syst. 10 (2024). doi:10.1145/3652158.

</pre>