<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Shared Micro-mobility Demand Forecasting using Gradient Boosting methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonios Tziorvas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George S. Theodoropoulos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Theodoridis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, University of Piraeus</institution>
          ,
          <addr-line>Piraeus</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent Transportation Systems. By leveraging data fusion and analytics techniques, traffic density estimation serves as a key intermediate measure for identifying and predicting emerging demand patterns. In this paper, we propose two gradient boosting model variations, one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes up to one hour. Our approach effectively integrates spatial and temporal features, enabling accurate predictions that are essential for improving the efficiency of shared (micro-)mobility services. To evaluate the effectiveness of our approach, we utilize open shared mobility data derived from e-scooter and e-bike networks in two Dutch metropolitan areas. These real-world datasets enable us to validate our approach and demonstrate its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our methodology offers novel insights on urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization and thus contributing to more sustainable, efficient, and liveable cities.</p>
      </abstract>
      <kwd-group>
        <kwd>Gradient Boosting</kwd>
        <kwd>Demand Forecasting</kwd>
        <kwd>Spatial/Temporal Features</kwd>
        <kwd>Open shared mobility data</kwd>
        <kwd>E-scooters/E-bikes</kwd>
        <kwd>Urban micro-mobility management</kwd>
        <kwd>Intelligent Transportation Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Urban shared mobility, also known as Mobility-as-a-Service
(MaaS) [1], integrates transportation modes such as
public transit, micromobility services (e.g., bike- and
scooter-sharing), and commute-based models (e.g., carpooling). The
need for accurate mobility pattern forecasting is growing
rapidly as real-time data analytics in Intelligent
Transportation Systems (ITS) help alleviate congestion, reduce travel
times, and enhance road safety in increasingly complex
urban environments. In shared mobility services, including
ride-hailing and bike-sharing, predicting demand across
spatial and temporal dimensions is essential for efficient
resource allocation, reduced waiting times, and optimized fleet
deployment. This insight supports smart city initiatives by
informing human-centric urban infrastructure design and
facilitating sustainable development through data-driven
decisions in areas such as energy consumption, public transit
planning, and emergency services.</p>
      <p>Several Machine Learning (ML) approaches have been
proposed for detecting and forecasting spatio-temporal
patterns in timeseries, including Long Short-Term Memory
(LSTM) networks [2], Graph Neural Networks (GNNs) [3, 4],
and Diffusion-based Models [5, 6]. However, these models
often suffer from high computational complexity due to the
intricate nature of spatio-temporal data, which combines
spatial correlations with temporal dependencies [7].</p>
      <p>A common representation for such data is the
Spatio-Temporal Graph (STG), where nodes denote spatial locations
and edges encode relationships over time. Capturing both
spatial and temporal dependencies requires advanced
architectures, such as Spatio-Temporal Graph Neural Networks
(STGNNs) [8] or Spatio-Temporal Graph Convolutional
Networks (STGCNs) [9]. However, these methods impose
significant computational demands, limiting their real-world
applicability [10]. While optimization techniques such as
compression and pruning can reduce model size and
inference time, they often come at the cost of decreased predictive
accuracy.</p>
      <p>This trade-off between complexity, accuracy, and
efficiency underscores the need for alternative methodologies.
In this work, we propose a novel approach based on
gradient-boosted trees, a class of models known for their superior
performance on structured tabular data [11]. Our method
effectively forecasts micro-mobility demand across spatial
and temporal dimensions while maintaining computational
efficiency, making it suitable for large-scale deployment.</p>
      <p>In our approach, we provide a robust feature extraction
pipeline that can capture both spatial and temporal
dependencies and integrate them into our model. We also present
a gradient boosting ML algorithm capable of predicting the
demand in a given area, either in the form of levels (e.g.,
’Low’, ’Medium’, ’High’) through a classifier, or in absolute
demand values through a regressor. The performance of
our model is evaluated using two real-world micro-mobility
datasets, and the results are promising: our approach
adapts effectively to the characteristics of each area
and adequately models its intricacies.</p>
      <p>The rest of this paper is structured as follows. Section 2
reviews related work. Section 3 introduces our forecasting
methodology, including the feature extraction process and
training configuration. Section 4 describes the experimental
setup and presents the results of both predictive models,
offering detailed performance metrics. Finally, Section 5
concludes the paper with a summary of the key
contributions and directions for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>A wide variety of modeling approaches have been proposed
to tackle the challenges related to spatio-temporal
forecasting. In this section, we review previous
approaches to shared mobility demand
forecasting, grouped into three categories: (traditional
or advanced) ML approaches, deep learning models, and
ensemble and hybrid methods.</p>
      <sec id="sec-2-1">
        <title>2.1. ML Approaches</title>
        <p>Traditional machine learning (ML) methods provide a
stable analytical foundation for structured predictive
problems. Feng et al. [12] developed a predictive model for
bike-sharing demand in Chicago using Poisson regression,
incorporating time, weather, population density, and
activity density as features. Additionally, a Random Forest model
was employed to enhance predictive accuracy through
ensemble learning, aiding in demand-supply optimization and
the identification of potential new station locations.
Similarly, Xiao et al. [13] applied clustering techniques to
forecast short-term car demand, utilizing an improved k-nearest
neighbor (kNN) model for online car-hailing services in
Hefei City, China. Their results highlight the effectiveness
of clustering methodologies in enhancing demand
prediction accuracy.</p>
        <p>A notable advancement in bike-share traffic prediction is
the two-step pattern recognition model proposed by Sohrabi
&amp; Ermagun [14]. This approach first generates traffic
profiles, where bike station traffic is represented as timeseries
profiles built from intervals of a given length (in minutes) with a
given overlap length.
The kNN algorithm is then applied to identify
spatiotemporal similarities, incorporating historical traffic data,
temporal factors (e.g., weather, weekdays), and spatial
characteristics (e.g., socio-demographics, land use,
infrastructure). By leveraging weighted Euclidean distance metrics,
this method enables accurate short- and long-term traffic
forecasting at both station and system-wide levels,
demonstrating the efficacy of integrating spatiotemporal features
with ML techniques.</p>
        <p>Smith et al. [15] employ ML to predict bicycle and
pedestrian traffic patterns across 20 U.S. metropolitan statistical
areas (MSAs). Building on previous efforts by Le et al. [16],
where stepwise linear regression models were developed
based on Census-derived, neighborhood-level covariates,
this study incorporates traffic counts aggregated from 4,145
locations alongside novel street-level data, including Google
Street View imagery and Point of Interest (PoI) data. The
authors evaluate several ML regressors, including linear,
ridge, lasso, bagging, gradient boosting, and random forest,
reporting significant improvements in predictive accuracy
over traditional linear regression, particularly when using
augmented datasets.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Deep Learning Models</title>
        <p>Ge et al. [17] introduced a Self-Attention ConvLSTM
(SAConvLSTM) model to enhance the accuracy of ConvLSTM
for online car-hailing demand forecasting. By converting
car-hailing trajectories into grid-based images, the model
incorporates a self-attention module to capture long-range
spatiotemporal dependencies, leveraging pairwise similarity
scores across input and memory positions.</p>
        <p>Luo et al. [18] proposed a Spatial-Temporal Diffusion
Convolutional Network (ST-DCN) to address limitations
in modeling dynamic spatiotemporal dependencies for taxi
demand forecasting. Their approach integrates a two-phase
graph diffusion convolutional network with an attention
mechanism to model spatial dependencies, while a temporal
convolution module captures long-term trends, including
recent, daily, and weekly patterns. The use of stacked
convolution layers further enhances the model’s ability to process
extended timeseries sequences.</p>
        <p>Li et al. [19] proposed a hierarchical framework for
proactive bike redistribution in bike-sharing systems. A
bipartite clustering algorithm groups stations, followed by
citywide rental prediction using a Gradient Boosting Regression
Tree. Rental proportions and inter-cluster transitions are
then estimated via a multi-similarity-based model to predict
station-level rentals and returns. Experiments on real-world
datasets from New York City and Washington D.C.
indicate significant performance gains over baseline models,
particularly during periods of atypical demand.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Ensemble and Hybrid Models</title>
        <p>The concept of ensemble learning, which involves
combining the predictions of multiple simple models to form a
new, more robust model, has been extensively explored in
the literature and has been shown to produce better results
compared to its individual constituent models. Building on
this idea, Yuming et al. [20] developed a stacking ensemble
learning framework that integrates the predictions of three
distinct base learners: Random Forest, LightGBM, and Long
Short-Term Memory (LSTM) networks. These base
learners were trained in parallel using a common set of input
features, including temporal, spatial, and weather-related
variables. The predictions generated by each base learner
were then aggregated by a Support Vector Regression (SVR)
meta-learner that produced the final demand forecast
values.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Positioning of Our Work with Respect to State-of-the-Art</title>
        <p>Our approach differs from the works above in several
ways. First, our methodology employs a unified modeling
workflow that can tackle the problem in two alternative
ways, by either regression or classification, allowing us to
adjust to specific needs. Second, our feature
engineering pipeline integrates both spatial and temporal features,
which are fed to a tuned gradient boosting model, enabling
us to better capture the intricacies of urban micro-mobility.
Finally, by utilizing two real-world datasets for evaluation,
we validate our method under different
scenarios and assess its effectiveness and robustness. Nevertheless, as
part of our future research, we plan to align related work
with our approach for a fair comparison under the same
settings over the same real datasets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. Our Approach for Efficient Shared Micro-Mobility Demand Forecasting</title>
      <p>In this section, we define the shared mobility demand
forecasting problem and present our proposed approach.</p>
      <sec id="sec-5-1">
        <title>3.1. Problem Definition</title>
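        <p>Generally speaking, a spatio-temporal timeseries is a
sequence of data points, where each one is associated with
a time index and a spatial location. For the purposes of
our work, an area of interest (a city, a region, etc.) is
partitioned into a set of geographical districts, where each
district <inline-formula><tex-math>A</tex-math></inline-formula> is represented as a polygon. We denote <inline-formula><tex-math>x_t^A</tex-math></inline-formula> as
the percentile-encoded demand of micro-mobility vehicles
within district <inline-formula><tex-math>A</tex-math></inline-formula> at time <inline-formula><tex-math>t</tex-math></inline-formula>. Given <inline-formula><tex-math>w</tex-math></inline-formula> past observations for
district <inline-formula><tex-math>A</tex-math></inline-formula>, we construct its spatio-temporal timeseries as
<inline-formula><tex-math>X_t^A = \{x_{t-w}^A, x_{t-w+1}^A, \ldots, x_{t-1}^A, x_t^A\}</tex-math></inline-formula>. The problem
addressed in this paper is to predict the micro-mobility demand
value <inline-formula><tex-math>x_{t+h}^A</tex-math></inline-formula> for district <inline-formula><tex-math>A</tex-math></inline-formula>, <inline-formula><tex-math>h</tex-math></inline-formula> time steps ahead, given <inline-formula><tex-math>X_t^A</tex-math></inline-formula>,
i.e., the current along with the past <inline-formula><tex-math>w</tex-math></inline-formula> observations.</p>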
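        <p>As an illustration of this definition, the following minimal Python sketch builds (history, target) pairs from the demand series of a single district. The function name <monospace>make_samples</monospace>, its parameters <monospace>lookback</monospace> and <monospace>horizon</monospace>, and the toy values are illustrative assumptions rather than part of the released implementation.</p>
        <preformat>
```python
def make_samples(series, lookback, horizon):
    """Build (history, target) pairs from one district's demand series.

    history holds the current value plus `lookback` past values;
    target is the value `horizon` steps after the current one.
    Names and signature are hypothetical, chosen for this sketch only.
    """
    samples = []
    # t is the index of the "current" observation.
    for t in range(lookback, len(series) - horizon):
        history = series[t - lookback : t + 1]
        target = series[t + horizon]
        samples.append((history, target))
    return samples


# Toy demand values for a single district (made up for illustration).
demand = [3, 4, 6, 5, 7, 9, 8, 6]
pairs = make_samples(demand, lookback=2, horizon=3)
for history, target in pairs:
    print(history, "->", target)
```
        </preformat>
        <p>Each pair uses only values observed up to the current time step as input, so the target never leaks into its own history.</p>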
        <p>We will exemplify the above definition through a real
case extracted from our real-world dataset (dataset details
are presented in Section 4). As illustrated in Figure 1, the
problem at hand is to predict (as accurately as possible) the
anticipated demand values of each district’s demand
timeseries (in orange) given the current and some past demand
values (in blue), taking into account relevant (temporal,
spatial, context) features. To achieve this goal efficiently,
seasonal and other patterns hidden in these timeseries should be
uncovered. For instance, looking at the timeseries of two
major districts in Rotterdam that are displayed in Figure 1,
we identify seasonal commuting patterns: the highest and
most spread peak occurs on Friday, due to a combination of
leisure and work time. This is made more evident in Figure
2, which showcases the combination of the hour-of-day and
day-of-week features in Rotterdam Centrum district.</p>
      </sec>
      <sec id="sec-5-2">
        <title>3.2. Methodology</title>
        <p>We propose the so-called Gradient Boosting Demand
Predictor, in two variations, differentiated by their output type:</p>
        <list list-type="bullet">
          <list-item><p>A multi-class classifier (per horizon per district)</p></list-item>
          <list-item><p>A regressor (per horizon per district)</p></list-item>
        </list>
        <p>The purpose of the classifier is to classify the future
demand of a district into predefined levels (e.g., ’Low’,
’Medium’, ’High’), whereas the regressor aims to predict
the actual demand values for the respective horizon. In
other words, for an area of interest partitioned into <inline-formula><tex-math>D</tex-math></inline-formula>
districts with the goal of <inline-formula><tex-math>H</tex-math></inline-formula> individual prediction horizons, both
architectures require training <inline-formula><tex-math>D \times H</tex-math></inline-formula> models, where each
model predicts the output for a specific district-horizon pair.
The reasoning behind training individual models per district
is to allow each model to specialize in its respective spatial
region, focusing exclusively on its unique spatial and
temporal patterns. Nevertheless, in our future work we plan to
assess the efficiency of a single global model for all districts
(where the district ID will be given as input to the model).</p>
        <p>As illustrated in Figure 3, the initial step of our
methodology is the partitioning of the area of interest into districts,
modeled as an adjacency graph that allows us to extract
the neighbors of each district. Two districts are defined as
neighbors if they share a common boundary<sup>1</sup>. The demands
of the target as well as of the neighboring districts are
represented by the respective spatiotemporal timeseries (step 2),
as defined in Section 3.1, which are fed into a feature
extraction pipeline (step 3), to be detailed in Section 3.3. Finally,
all these features are combined into a single, augmented
dataset (step 4) to be used for the purposes of training our
two models, the classifier and the regressor (step 5).</p>
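        <p>The adjacency graph of the first step can be sketched as follows. This is a minimal, standard-library-only illustration that assumes districts are given as vertex lists and that neighboring polygons share at least one identical boundary edge; real district boundaries would normally be handled with a geometry library. The names <monospace>edges</monospace> and <monospace>build_adjacency</monospace> and the toy squares are hypothetical.</p>
        <preformat>
```python
from itertools import combinations


def edges(polygon):
    """Return the undirected boundary edges of a polygon (list of vertices)."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}


def build_adjacency(districts):
    """Map each district name to the set of districts sharing a boundary edge.

    Simplifying assumption: shared boundaries appear as identical edges
    in both polygons (no partial overlaps along an edge).
    """
    boundary = {name: edges(poly) for name, poly in districts.items()}
    neighbors = {name: set() for name in districts}
    for a, b in combinations(districts, 2):
        if not boundary[a].isdisjoint(boundary[b]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


# Three toy unit squares: A and B touch along x = 1, C is isolated.
districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}
adjacency = build_adjacency(districts)
print(adjacency)
```
        </preformat>
        <p>The resulting dictionary gives, for each target district, the neighbors whose timeseries contribute the neighbor features.</p>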
      </sec>
      <sec id="sec-5-3">
        <title>3.3. Feature Extraction</title>
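        <p>Given a timeseries <inline-formula><tex-math>X_t^A</tex-math></inline-formula> of a target district <inline-formula><tex-math>A</tex-math></inline-formula>, feature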
extraction is performed to enhance the spatio-temporal modeling
capabilities of the proposed method. The extracted
features are categorized into four primary groups: time-related,
lagged, rolling, and exponentially weighted features. These
features are derived not only from the target district but also
from its neighboring districts, incorporating both spatial
and temporal dependencies.</p>
        <p>
          Time-related features capture temporal patterns influencing the
target variable. Key attributes include the hour of the day
(0-23) to identify daily patterns like peak periods, and the
day of the week (0-6, with 0 as Monday) to capture weekly
trends. Monthly variations are represented by the month
(0-11, with 0 as January), accounting for seasonal behaviors.
Additionally, the minute component (0-59) highlights finer
time dependencies and intra-hour variations.
        </p>
        <p>The second category of features, referred to as lagged
features, is designed to capture the temporal dependencies
between past observations of the timeseries and its current
state. For each district, a set of lagged features is generated
by systematically shifting the original timeseries
observations backward in time. Formally, given a timeseries <inline-formula><tex-math>x_t^A</tex-math></inline-formula>
representing the observed values of a target district <inline-formula><tex-math>A</tex-math></inline-formula> at
time <inline-formula><tex-math>t</tex-math></inline-formula>, the lagged feature at a temporal offset <inline-formula><tex-math>k</tex-math></inline-formula> is defined as
<inline-formula><tex-math>x_{t-k}^A</tex-math></inline-formula>. A range of temporal offsets is selected to ensure that
both short-term and long-term dependencies are captured.</p>
        <p>
          Specifically, the lagged features are computed at intervals
of 1, 5, 10, and 15 minutes, capturing short-term temporal
dependencies across multiple time horizons.
This shifting operation effectively creates additional
feature representations of the past states of the system,
allowing the model to learn historical patterns that influence
future predictions. By incorporating lagged features into
the predictive framework, the approach enhances its ability
to capture recurrent patterns, trends, and autocorrelations
inherent in spatio-temporal mobility data.
        </p>
        <p><sup>1</sup> The inclusion of neighboring districts is justified by the limited spatial
range and temporal duration of trips usually made by e-bikes and
e-scooters. Due to their physical constraints, micro-mobility vehicles are
unlikely to reach distant districts within short timeframes, so
considering only neighboring districts is a sufficient choice
for effective spatio-temporal modeling.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>Features used in our methodology.</p></caption>
          <table>
            <thead>
              <tr><th>Feature</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>target</td><td>Target variable for the current district at the present time step</td></tr>
              <tr><td>neighbor_*</td><td>Feature variable of each of the target’s neighbors in neighboring districts</td></tr>
              <tr><td>hour-of-day</td><td>Hour of the day of the timestamp (0-23)</td></tr>
              <tr><td>minute-of-hour</td><td>Minute of the hour of the timestamp (0-59)</td></tr>
              <tr><td>day-of-week</td><td>Day of the week of the timestamp (0=Monday, 6=Sunday)</td></tr>
              <tr><td>month-of-year</td><td>Month of the year of the timestamp (0=January, 11=December)</td></tr>
              <tr><td>neighbor_lag_{1, 5, 10, 15}</td><td>Lagged value ({1, 5, 10, 15}-step(s) back) of the target variable for neighboring districts</td></tr>
              <tr><td>target_lag_{1, 5, 10, 15}</td><td>Lagged value ({1, 5, 10, 15}-step(s) back) of the target variable for the current district</td></tr>
              <tr><td>neighbor_rolling_mean_{5, 10, 15}</td><td>Rolling mean of neighbors over the last {5, 10, 15} observations</td></tr>
              <tr><td>target_rolling_mean_{5, 10, 15}</td><td>Rolling mean of the target variable over the last {5, 10, 15} observations</td></tr>
              <tr><td>neighbor_ewm_{5, 10, 15}</td><td>Exponentially weighted moving average of neighbors with a span of {5, 10, 15}</td></tr>
              <tr><td>target_ewm_{5, 10, 15}</td><td>Exponentially weighted moving average of the target variable with a span of {5, 10, 15}</td></tr>
            </tbody>
          </table>
        </table-wrap>
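        <p>The lagged features described above can be sketched in plain Python as follows. The helper <monospace>lag_features</monospace> is a hypothetical name, and filling positions without enough history with <monospace>None</monospace> is an assumption of this sketch; the column names follow the target_lag_* naming used for the target district.</p>
        <preformat>
```python
def lag_features(series, offsets=(1, 5, 10, 15)):
    """Return one feature row per time step with lagged copies of the series.

    For offset k, the row at time t holds series[t - k]; rows that do not
    have k earlier observations get None (an assumption of this sketch).
    """
    rows = []
    for t in range(len(series)):
        row = {"target": series[t]}
        for k in offsets:
            row[f"target_lag_{k}"] = series[t - k] if t >= k else None
        rows.append(row)
    return rows


# Toy series 0, 1, ..., 19 so that each lag is easy to read off.
rows = lag_features(list(range(20)))
print(rows[16])
```
        </preformat>
        <p>The same shifting can be applied to the timeseries of each neighboring district to obtain the corresponding neighbor lag features.</p>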
        <p>The rolling features provide a smoothed representation
of the data over various intervals, capturing localized
fluctuations that are critical for accurate forecasting. These
features are computed using sliding windows, where
statistical measures are aggregated over a sequence of recent
time steps to encapsulate short-term trends and variability
in mobility patterns. For each district, a rolling window of
varying sizes—ranging from 5 to 15 minutes in 5-minute
increments—is applied to the timeseries data. Within each
window, the mean is computed to capture central tendency
and, over larger windows, the degree of short-term
variability.</p>
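        <p>A minimal sketch of the rolling mean is given below. It assumes that each window ends at, and includes, the current observation, and that positions with fewer observations than the window size are left undefined; both choices, as well as the name <monospace>rolling_mean</monospace>, are assumptions of this example.</p>
        <preformat>
```python
def rolling_mean(series, window):
    """Mean over the last `window` observations, current one included.

    Positions with fewer than `window` observations yield None
    (an assumption of this sketch).
    """
    out = []
    running = 0.0
    for t, value in enumerate(series):
        running += value
        if t >= window:
            # Drop the observation that just left the window.
            running -= series[t - window]
        out.append(running / window if t >= window - 1 else None)
    return out


demand = [2, 4, 6, 8, 10, 12]
smoothed = rolling_mean(demand, window=5)
print(smoothed)  # [None, None, None, None, 6.0, 8.0]
```
        </preformat>
        <p>Calling the function with window sizes of 5, 10, and 15 yields the three rolling-mean features per district.</p>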
        <p>Finally, the objective of the exponentially weighted
features is to prioritize recent observations while retaining
information from historical data. This is achieved through
the application of exponentially weighted techniques, which
assign progressively smaller weights to older data points,
ensuring that the model places greater emphasis on more
recent trends. Among these methods, the Exponentially
Weighted Moving Average (EWMA) is particularly
effective, as it assigns exponentially decreasing weights to past
observations, thereby capturing temporal dependencies in
a manner that balances adaptability and historical context.</p>
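        <p>For a given timeseries with observation <inline-formula><tex-math>x_t</tex-math></inline-formula> at time <inline-formula><tex-math>t</tex-math></inline-formula>, the EWMA at time <inline-formula><tex-math>t</tex-math></inline-formula> is computed
recursively as
<disp-formula><tex-math>\mathrm{EWMA}_t = \alpha\, x_t + (1 - \alpha)\, \mathrm{EWMA}_{t-1}</tex-math></disp-formula>
where <inline-formula><tex-math>\alpha</tex-math></inline-formula> represents the smoothing factor, typically defined
as <inline-formula><tex-math>\alpha = \frac{2}{N+1}</tex-math></inline-formula> for a given window size <inline-formula><tex-math>N</tex-math></inline-formula>. This
formulation allows the predictive framework to swiftly adapt to
evolving patterns, including sudden variations induced by
external factors such as weather conditions, road incidents,
or public events, while maintaining a stable representation
of long-term trends.</p>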
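        <p>The EWMA recursion can be written in a few lines of Python. In this sketch the smoothing factor is 2 divided by the span plus one, and the recursion is initialized with the first observation; the initialization and the function name <monospace>ewma</monospace> are assumptions of this example. Under these assumptions the result should match what pandas computes with <monospace>ewm(span=..., adjust=False).mean()</monospace>.</p>
        <preformat>
```python
def ewma(series, span):
    """Exponentially weighted moving average with alpha = 2 / (span + 1).

    The first output equals the first observation (assumed initialization);
    each later output blends the new value with the previous average.
    """
    alpha = 2.0 / (span + 1)
    out = []
    prev = None
    for x in series:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


# With span = 3 the smoothing factor is 0.5.
print(ewma([1.0, 2.0, 3.0], span=3))  # [1.0, 1.5, 2.25]
```
        </preformat>
        <p>Smaller spans give a larger smoothing factor, so the average reacts faster to recent changes; larger spans give a smoother, slower-moving feature.</p>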
        <p>The features used in our methodology are summarized
in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Experimental Study</title>
      <p>This section describes the data and preprocessing steps used
in our experimental study (Section 4.1) as well as our
findings (Section 4.2). The experiments were conducted on a
server with an AMD EPYC 64-core CPU and an NVIDIA A100
GPU with 40 GB of memory. Our source code is publicly
available for reproducibility purposes.<sup>2</sup></p>
      <sec id="sec-6-1">
        <title>4.1. Experimental Setup</title>
        <p>In our study, we utilize shared micro-mobility data provided
by Deelfiets Nederland, a popular micro-mobility service
provider in the Netherlands, through an open API<sup>3</sup>. The data
consist of real-time GPS positions of vehicles in the
Netherlands, which we query every 60 seconds, creating a dataset
of sufficient granularity for our use case. The dataset
includes the latitude, longitude, vehicle type, and company
name for each vehicle. For the purposes of this study, we
utilize mobility data from two major metropolitan areas,
Amsterdam and Rotterdam, focusing specifically on the ten
most densely populated districts in each city. These districts
exhibit diverse urban characteristics and distinct
spatiotemporal patterns, making them well-suited for evaluating
the model’s adaptability to localized mobility trends. The
temporal coverage of our data is about 2 months (November
11, 2024 - January 15, 2025) for Amsterdam and 5 months (August 06, 2024
- January 15, 2025) for Rotterdam. Since the original data
source is a stream instead of a static dataset, we manually
append a timestamp to the data returned by each request.</p>
        <p><sup>2</sup> https://github.com/DataStories-UniPi/Shared-Mobility.git</p>
        <p><sup>3</sup> https://api.deelfietsdashboard.nl/dashboard-api/public/vehicles_in_public_space</p>
        <p>To derive demand estimates at the district level, a
spatial filtering and aggregation procedure is applied. First, a
point-polygon intersection is performed to associate each
mobility trace with its corresponding district boundaries,
discarding traces that fall outside the predefined study
areas. Next, the filtered traces are aggregated based on their
unique spatio-temporal identifiers (i.e., timestamp and
district ID), summing the occurrences within each region to
obtain demand estimates. This transformation preserves the
spatial and temporal integrity of the data while ensuring
compatibility with the forecasting models. In its final form,
the dataset consists of 89,022 observations across 97 districts
in Amsterdam and 218,325 observations across 22 districts
in Rotterdam.</p>
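        <p>The filtering and aggregation procedure can be sketched as follows. This standard-library-only example uses a simple ray-casting test as a stand-in for the point-polygon intersection of a geometry library, and treats districts as non-overlapping polygons; the function names, the (timestamp, x, y) trace layout, and the toy coordinates are all assumptions of this sketch.</p>
        <preformat>
```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (vertex list)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside


def aggregate_demand(traces, districts):
    """Count traces per (timestamp, district); traces outside all districts are dropped."""
    counts = Counter()
    for timestamp, x, y in traces:
        for district_id, polygon in districts.items():
            if point_in_polygon(x, y, polygon):
                counts[(timestamp, district_id)] += 1
                break  # districts are assumed not to overlap
    return counts


districts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
traces = [("t0", 1, 1), ("t0", 1.5, 0.5), ("t0", 3, 1), ("t0", 9, 9), ("t1", 3, 1)]
counts = aggregate_demand(traces, districts)
print(dict(counts))
```
        </preformat>
        <p>Each key of the resulting counter is one spatio-temporal identifier, and its value is the demand estimate for that district and timestamp.</p>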
        <p>To validate our models, we use a 70-30 temporal train-test
split. To ensure effective training of the
classifier, additional preprocessing steps were taken apart
from the feature extraction process described above. To
this end, a quantile-based discretization function was
employed to transform continuous density values into three
discrete demand levels (Low, Medium, High, as
defined in the following paragraph). This transformation
enhances the classifier’s ability to model class distinctions
and mitigates the challenges associated with imbalanced
class distributions.</p>
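        <p>A temporal split differs from a random split in that the rows are not shuffled: the earliest observations form the training set and the most recent ones the test set. A minimal sketch, assuming the rows are already sorted by timestamp and using the hypothetical helper <monospace>temporal_split</monospace>:</p>
        <preformat>
```python
def temporal_split(rows, train_percent=70):
    """Split time-ordered rows into (train, test) without shuffling.

    Assumes `rows` is sorted by timestamp; integer arithmetic avoids
    floating-point rounding when computing the cut position.
    """
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


observations = list(range(10))  # stand-in for ten time-ordered rows
train, test = temporal_split(observations)
print(train, test)
```
        </preformat>
        <p>Because every test row is later than every training row, the evaluation mimics forecasting on unseen future data.</p>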
        <p>We define ’Medium’ demand as the range where demand
levels are centered around 50% of the value of the day with
the highest demand, which serves as the baseline value.
Specifically, for a given threshold <inline-formula><tex-math>\theta \in (0, 50)</tex-math></inline-formula>, the range
<inline-formula><tex-math>[50 - \theta, 50 + \theta)</tex-math></inline-formula> is categorized as ’Medium’ demand. Demand
levels below this range are classified as ’Low’ and levels above it as ’High’
demand. In our experiments, we set <inline-formula><tex-math>\theta = 17</tex-math></inline-formula>, so
that the demand levels correspond to three approximately equally
sized ranges. This threshold provides a balanced distribution
of data across the defined demand categories, facilitating a
robust analysis of varying demand patterns.</p>
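        <p>The thresholding rule can be illustrated with the short sketch below. It assumes that demand is first expressed as a percentage of a baseline value supplied by the caller; how that baseline is derived from the data, the function name <monospace>demand_level</monospace>, and the toy numbers are assumptions of this example.</p>
        <preformat>
```python
def demand_level(value, baseline, threshold=17):
    """Map a demand value to 'Low', 'Medium', or 'High'.

    The value is converted to a percentage of `baseline`; the band
    [50 - threshold, 50 + threshold) is 'Medium', anything below it
    is 'Low', and anything at or above its upper end is 'High'.
    """
    percent = 100.0 * value / baseline
    if percent >= 50 + threshold:
        return "High"
    if percent >= 50 - threshold:
        return "Medium"
    return "Low"


baseline = 200  # hypothetical baseline demand
levels = [demand_level(v, baseline) for v in (40, 66, 100, 134, 190)]
print(levels)  # ['Low', 'Medium', 'Medium', 'High', 'High']
```
        </preformat>
        <p>With a threshold of 17, the three bands cover percentages below 33, from 33 up to (but not including) 67, and from 67 upward.</p>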
        <p>The optimization framework employed in this study
incorporates Bayesian Optimization due to its ability to
balance exploration and exploitation effectively. This approach
facilitates the exploration of the search space for better
solutions while simultaneously exploiting regions with high
potential, thereby reducing the number of iterations required
for optimization. Its suitability is particularly evident when
dealing with extensive experimental evaluations, such as
the computationally intensive training of ML models, as
it leverages prior knowledge to guide sampling and
optimize resources efficiently. Additionally, the probabilistic
framework of Bayesian Optimization enables the automatic
adjustment of the exploration-exploitation trade-off through
its acquisition functions.</p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Experimental Results</title>
        <p>In this section, we assess the overall performance of our
models across multiple districts and timestamps. Table 2
summarizes the training and inference times of our
models across different prediction horizons in the two cities
of interest. Training time corresponds to the overall time
required to train the respective model, whereas inference
time corresponds to the average time taken to generate a
single prediction for a given time horizon. Training
is fast, taking 1-2
seconds or less per model, while inference
time is on the order of 1 microsecond. This
analysis showcases the computational efficiency of the
models, providing insights into their scalability and suitability
for real-world spatio-temporal forecasting tasks.</p>
      </sec>
      <sec id="sec-6-3">
        <title>4.3. A Note on Feature Importance</title>
        <p>In order to assess the effect of the underlying features on the
prediction quality of either the classifier or the regressor,
we present the average feature importance across all
prediction horizons for both models. The feature importance
values were computed using XGBoost’s internal feature
importance computation algorithm. Given the minimal
contributions of some features (importance values below <inline-formula><tex-math>10^{-4}</tex-math></inline-formula>),
the y-axis of the corresponding plots has been log-scaled
to improve visualization. The results of this analysis are
grouped by city and illustrated in Figures 4 and 5.</p>
        <p>In both figures we can see that the average contribution
of the features related to the target is higher than those of
the neighbors, as expected. Overall, in decreasing order, the
most important features appear to be the target_ewm_*, the
target_rolling_mean_*, and the target_lag_*, denoting the
EWMA, the rolling mean, and the lagged value, respectively,</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusions</title>
      <p>This study investigated spatio-temporal forecasting
methods for density prediction in urban environments. By
integrating spatial dependencies with temporal trends, the
proposed methodology effectively captured localized patterns,
generating accurate forecasts across multiple horizons. Key
contributions include a flexible feature engineering pipeline
that incorporates both intra- and inter-district interactions,
alongside the application of efficient gradient boosting
architectures. This approach enhances predictive accuracy
while minimizing computational overhead, rendering it
suitable for large-scale forecasting tasks. Experimental results
demonstrated the efficacy of gradient-boosted density
predictors for both regression and classification, exhibiting
competitive performance across varying prediction horizons.
Although accuracy slightly declines for longer horizons, the
model remains robust, underscoring its adaptability and
practical applicability in real-world mobility forecasting.</p>
      <p>This research also unveils opportunities for future
exploration. Specifically, the integration of feature interactions
and their contributions to the overall model quality warrant
further investigation. Incorporating external factors, such
as weather conditions or public events, could potentially
enhance predictive performance. Additionally, applying
the proposed methodology to other geographical areas and
diverse mobility scenarios would help validate its
generalization capabilities and adaptability to varying contexts.
Finally, a comprehensive experimental comparison with
related work under specific settings is planned to facilitate
fair benchmarking.</p>
      <p>In conclusion, this study showcases the ability of
datadriven approaches to efectively tackle spatio-temporal
forecasting challenges. By leveraging the inherent spatial
segmentation of cities into districts, the methodology enables
the extraction of localized temporal patterns, facilitating
more informed decision-making and contributing to smarter,
more efficient urban planning from the perspective of
mobility data science [21].</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported in part by the Horizon
Framework Programme of the European Union under grant
agreement No. 101093051 (EMERALDS;
https://www.emeralds-horizon.eu/).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Large
Language Models in order to paraphrase and reword. After
using this tool/service, the authors reviewed and edited
the content as needed and take full responsibility for the
publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Mladenović</surname>
          </string-name>
          ,
          <article-title>Mobility as a Service</article-title>
          , in: R. Vickerman (Ed.),
          <source>International Encyclopedia of Transportation</source>
          , Elsevier,
          <year>2021</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>18</lpage>
          . doi:10.1016/B978-0-08-102671-7.10607-4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>O'Donncha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          , et al.,
          <article-title>A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales</article-title>
          ,
          <source>Ecological Informatics</source>
          <volume>69</volume>
          (
          <year>2022</year>
          )
          101687. doi:10.1016/j.ecoinf.2022.101687.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yin</surname>
          </string-name>
          , et al.,
          <article-title>Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting</article-title>
          , in:
          <source>Proceedings of the 27th International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3634</fpage>
          -
          <lpage>3640</lpage>
          . doi:10.24963/ijcai.2018/505.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fan</surname>
          </string-name>
          , et al.,
          <article-title>GMAN: A Graph Multi-Attention Network for Traffic Prediction</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>1234</fpage>
          -
          <lpage>1241</lpage>
          . doi:10.1609/aaai.v34i01.5477.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jin</surname>
          </string-name>
          , et al.,
          <article-title>A Survey on Diffusion Models for Time Series and Spatio-Temporal Data</article-title>
          ,
          <source>ArXiv abs/2404.18886</source>
          (
          <year>2024</year>
          ). doi:10.48550/arXiv.2404.18886.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          , et al.,
          <article-title>Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting</article-title>
          ,
          <source>arXiv:1707.01926</source>
          (
          <year>2017</year>
          ). doi:10.48550/arXiv.1707.01926.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pelekis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          ,
          <source>Mobility Data Management and Exploration</source>
          , Springer, New York, NY,
          <year>2014</year>
          . doi:10.1007/978-1-4939-0392-4.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zeng</surname>
          </string-name>
          , et al.,
          <article-title>Taxi demand forecasting based on the temporal multimodal information fusion graph neural network</article-title>
          ,
          <source>Applied Intelligence</source>
          <volume>52</volume>
          (
          <year>2022</year>
          )
          <fpage>12077</fpage>
          -
          <lpage>12090</lpage>
          . doi:10.1007/s10489-021-03128-1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>33</volume>
          (
          <year>2019</year>
          )
          <fpage>3656</fpage>
          -
          <lpage>3663</lpage>
          . doi:10.1609/aaai.v33i01.33013656.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          , et al.,
          <article-title>A Comprehensive Survey on Graph Neural Networks</article-title>
          ,
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          <volume>32</volume>
          (
          <year>2019</year>
          )
          <fpage>4</fpage>
          -
          <lpage>24</lpage>
          . doi:10.1109/TNNLS.2020.2978386.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Grinsztajn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Oyallon</surname>
          </string-name>
          , et al.,
          <article-title>Why do tree-based models still outperform deep learning on typical tabular data?</article-title>
          ,
          in:
          <source>Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>507</fpage>
          -
          <lpage>520</lpage>
          . doi:10.48550/arXiv.2207.08815.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Predicting the Dynamic Demand of Bike-Sharing System in Chicago with Divvy Operation Data: A Data-Driven approach for bike-sharing demand forecasting</article-title>
          , in:
          <source>Proceedings of the 5th International Conference on E-Commerce, E-Business and E-Government, ICEEG '21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>34</lpage>
          . doi:10.1145/3466029.3466035.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Kong</surname>
          </string-name>
          , et al.,
          <article-title>Short-Term Demand Forecasting of Urban Online Car-Hailing Based on the K-Nearest Neighbor Model</article-title>
          ,
          <source>Sensors</source>
          <volume>22</volume>
          (
          <year>2022</year>
          ). doi:10.3390/s22239456.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sohrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ermagun</surname>
          </string-name>
          ,
          <article-title>Dynamic bike sharing traffic prediction using spatiotemporal pattern detection</article-title>
          ,
          <source>Transportation Research Part D: Transport and Environment</source>
          <volume>90</volume>
          (
          <year>2021</year>
          )
          102647. doi:10.1016/j.trd.2020.102647.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hankey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Predicting bicycling and walking traffic using street view imagery and destination data</article-title>
          ,
          <source>Transportation Research Part D: Transport and Environment</source>
          <volume>90</volume>
          (
          <year>2021</year>
          )
          102651. doi:10.1016/j.trd.2020.102651.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H. T. K.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Buehler</surname>
          </string-name>
          , et al.,
          <article-title>Correlates of the Built Environment and Active Travel: Evidence from 20 US Metropolitan Areas</article-title>
          ,
          <source>Environmental Health Perspectives</source>
          <volume>126</volume>
          (
          <year>2018</year>
          ). doi:10.1289/EHP3389.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          , et al.,
          <article-title>Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand</article-title>
          ,
          <source>Sustainability</source>
          <volume>14</volume>
          (
          <year>2022</year>
          ). doi:10.3390/su14127371.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shangguan</surname>
          </string-name>
          , et al.,
          <article-title>Spatial-Temporal Diffusion Convolutional Network: A Novel Framework for Taxi Demand Forecasting</article-title>
          ,
          <source>ISPRS Int. J. Geo Inf.</source>
          <volume>11</volume>
          (
          <year>2022</year>
          )
          193. doi:10.3390/ijgi11030193.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , et al.,
          <article-title>Traffic prediction in a bike-sharing system</article-title>
          , in:
          <source>Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '15</source>
          ,
          <year>2015</year>
          . doi:10.1145/2820783.2820837.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ye</surname>
          </string-name>
          , et al.,
          <article-title>Demand Forecasting of Online Car-Hailing With Stacking Ensemble Learning Approach and Large-Scale Datasets</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>199513</fpage>
          -
          <lpage>199522</lpage>
          . doi:10.1109/ACCESS.2020.3034355.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mokbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sakr</surname>
          </string-name>
          , et al.,
          <article-title>Mobility Data Science: Perspectives and Challenges</article-title>
          ,
          <source>ACM Trans. Spatial Algorithms Syst.</source>
          <volume>10</volume>
          (
          <year>2024</year>
          ). doi:10.1145/3652158.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>