Shared Micro-mobility Demand Forecasting using Gradient Boosting methods

Antonios Tziorvas, Department of Informatics, University of Piraeus, Piraeus, Greece

George S. Theodoropoulos, Department of Informatics, University of Piraeus, Piraeus, Greece

Yannis Theodoridis, Department of Informatics, University of Piraeus, Piraeus, Greece

Keywords: Gradient Boosting, Demand Forecasting, Spatial/Temporal Features, Open shared mobility data, E-scooters/E-bikes, Urban micro-mobility management, Intelligent Transportation Systems

Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent Transportation Systems. By leveraging data fusion and analytics techniques, traffic density estimation serves as a key intermediate measure for identifying and predicting emerging demand patterns. In this paper, we propose two gradient boosting model variations, one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes up to one hour. Our approach integrates spatial and temporal features, enabling accurate predictions that are essential for improving the efficiency of shared (micro-)mobility services. To evaluate our approach, we utilize open shared mobility data derived from e-scooter and e-bike networks in two Dutch metropolitan areas. These real-world datasets enable us to validate our approach and demonstrate its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our methodology offers novel insights on urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization and thus contributing to more sustainable, efficient, and liveable cities.

Introduction

Urban shared mobility, also known as Mobility-as-a-Service (MaaS) [1], integrates transportation modes such as public transit, micromobility services (e.g., bike- and scooter-sharing), and commute-based models (e.g., carpooling). The need for accurate mobility pattern forecasting is growing rapidly as real-time data analytics in Intelligent Transportation Systems (ITS) help alleviate congestion, reduce travel times, and enhance road safety in increasingly complex urban environments. In shared mobility services, including ride-hailing and bike-sharing, predicting demand across spatial and temporal dimensions is essential for efficient resource allocation, reduced waiting times, and optimized fleet deployment. This insight supports smart city initiatives by informing human-centric urban infrastructure design and facilitating sustainable development through data-driven decisions in areas such as energy consumption, public transit planning, and emergency services.

Several Machine Learning (ML) approaches have been proposed for detecting and forecasting spatio-temporal patterns in timeseries, including Long Short-Term Memory (LSTM) networks [2], Graph Neural Networks (GNNs) [3,4], and Diffusion-based Models [5,6]. However, these models often suffer from high computational complexity due to the intricate nature of spatio-temporal data, which combines spatial correlations with temporal dependencies [7].

A common representation for such data is the Spatio-Temporal Graph (STG), where nodes denote spatial locations and edges encode relationships over time. Capturing both spatial and temporal dependencies requires advanced architectures, such as Spatio-Temporal Graph Neural Networks (STGNNs) [8] or Spatio-Temporal Graph Convolutional Networks (STGCNs) [9]. However, these methods impose significant computational demands, limiting their real-world applicability [10]. While optimization techniques such as compression and pruning can reduce model size and inference time, they often come at the cost of decreased predictive accuracy.

This trade-off between complexity, accuracy, and efficiency underscores the need for alternative methodologies. In this work, we propose a novel approach based on gradient-boosted trees, a class of models known for their superior performance on structured tabular data [11]. Our method effectively forecasts micro-mobility demand across spatial and temporal dimensions while maintaining computational efficiency, making it suitable for large-scale deployment.

In our approach, we provide a robust feature extraction pipeline that captures both spatial and temporal dependencies and integrates them into our model. We also present a gradient boosting ML algorithm capable of predicting the demand in a given area, either in the form of levels (e.g., 'Low', 'Medium', 'High') through a classifier, or in absolute demand values through a regressor. The performance of our model is evaluated using two real-world micro-mobility datasets, and the results are promising: our approach adapts to the uniqueness of each area and adequately models its intricacies.

The rest of this paper is structured as follows: Section 2 reviews related work. Section 3 introduces our forecasting methodology, including the feature extraction process and training configuration. Section 4 describes the experimental setup and presents the results of both predictive models, offering detailed performance metrics. Finally, Section 5 concludes the paper with a summary of the key contributions and directions for future research.

Related Work

A wide variety of modeling approaches have been proposed to tackle the challenges related to spatio-temporal forecasting. In this section, we review previous approaches in the context of shared mobility demand forecasting, categorizing them in three groups: (traditional or advanced) ML approaches, deep learning models, and ensemble and hybrid methods.

ML Approaches

Traditional machine learning (ML) methods provide a stable analytical foundation for structured predictive problems. Feng et al. [12] developed a predictive model for bike-sharing demand in Chicago using Poisson regression, incorporating time, weather, population density, and activity density as features. Additionally, a Random Forest model was employed to enhance predictive accuracy through ensemble learning, aiding in demand-supply optimization and the identification of potential new station locations. Similarly, Xiao et al. [13] applied clustering techniques to forecast short-term car demand, utilizing an improved k-nearest neighbor (kNN) model for online car-hailing services in Hefei City, China. Their results highlight the effectiveness of clustering methodologies in enhancing demand prediction accuracy.

A notable advancement in bike-share traffic prediction is the two-step pattern recognition model proposed by Sohrabi & Ermagun [14]. This approach first generates traffic profiles, where bike station traffic is represented as timeseries profiles with $t$-minute intervals and an overlap length $O$. The kNN algorithm is then applied to identify spatiotemporal similarities, incorporating historical traffic data, temporal factors (e.g., weather, weekdays), and spatial characteristics (e.g., socio-demographics, land use, infrastructure). By leveraging weighted Euclidean distance metrics, this method enables accurate short- and long-term traffic forecasting at both station and system-wide levels, demonstrating the efficacy of integrating spatiotemporal features with ML techniques.

Smith et al. [15] employ ML to predict bicycle and pedestrian traffic patterns across 20 U.S. metropolitan statistical areas (MSAs). Building on previous efforts by Le et al. [16], where stepwise linear regression models were developed based on Census-derived, neighborhood-level covariates, this study incorporates traffic counts aggregated from 4,145 locations alongside novel street-level data, including Google Street View imagery and Point of Interest (PoI) data. The authors evaluate several ML regressors, including linear, ridge, lasso, bagging, gradient boosting, and random forest, reporting significant improvements in predictive accuracy over traditional linear regression, particularly when using augmented datasets.

Deep Learning Models

Ge et al. [17] introduced a Self-Attention ConvLSTM (SA-ConvLSTM) model to enhance the accuracy of ConvLSTM for online car-hailing demand forecasting. By converting car-hailing trajectories into grid-based images, the model incorporates a self-attention module to capture long-range spatiotemporal dependencies, leveraging pairwise similarity scores across input and memory positions.

Luo et al. [18] proposed a Spatial-Temporal Diffusion Convolutional Network (ST-DCN) to address limitations in modeling dynamic spatiotemporal dependencies for taxi demand forecasting. Their approach integrates a two-phase graph diffusion convolutional network with an attention mechanism to model spatial dependencies, while a temporal convolution module captures long-term trends, including recent, daily, and weekly patterns. The use of stacked convolution layers further enhances the model's ability to process extended timeseries sequences.

Li et al. [19] proposed a hierarchical framework for proactive bike redistribution in bike-sharing systems. A bipartite clustering algorithm groups stations, followed by citywide rental prediction using a Gradient Boosting Regression Tree. Rental proportions and inter-cluster transitions are then estimated via a multi-similarity-based model to predict station-level rentals and returns. Experiments on real-world datasets from New York City and Washington D.C. indicate significant performance gains over baseline models, particularly during periods of atypical demand.

Ensemble and Hybrid Models

The concept of ensemble learning, which involves combining the predictions of multiple simple models to form a new, more robust model, has been extensively explored in the literature and has been shown to produce better results compared to its individual constituent models. Building on this idea, Yuming et al. [20] developed a stacking ensemble learning framework that integrates the predictions of three distinct base learners: Random Forest, LightGBM, and Long Short-Term Memory (LSTM) networks. These base learners were trained in parallel using a common set of input features, including temporal, spatial, and weather-related variables. The predictions generated by each base learner were then aggregated by a Support Vector Regression (SVR) meta-learner that produced the final demand forecast values.

Positioning of Our Work with Respect to State-of-the-Art

Our approach diverges from these approaches in multiple ways. Firstly, our methodology employs a unified modeling workflow that can tackle the problem in two alternative ways, by either regression or classification, allowing us to adjust to specific needs. Additionally, our feature engineering pipeline integrates both spatial and temporal features which are fed to a tuned gradient boosting model, enabling us to better identify the intricacies of urban micro-mobility. Lastly, by utilizing two real-world datasets for evaluation, our method can be adequately validated under different scenarios, ensuring it is effective and robust. Nevertheless, as part of our future research, we plan to align related work with our approach for a fair comparison under the same settings over the same real datasets.

Our Approach for Efficient Shared Micro-Mobility Demand Forecasting

In this section, we define the shared mobility demand forecasting problem and present our proposed approach.

Problem Definition

Generally speaking, a spatio-temporal timeseries is a sequence of data points, each associated with a time index and a spatial location. For the purposes of our work, an area of interest (a city, a region, etc.) is partitioned into a set of geographical districts, where each district $A$ is represented as a polygon. We denote by $D^A_t$ the percentile-encoded demand of micro-mobility vehicles within district $A$ at time $t$. Given $\tau$ past observations for district $A$, we construct its spatio-temporal timeseries as

$$X^A_t = \{D^A_{t-\tau}, D^A_{t-\tau+1}, \ldots, D^A_{t-1}, D^A_t\}.$$

The problem addressed in this paper is to predict the micro-mobility demand value $D^A_{t+m}$ for district $A$, $m$ time steps ahead, given $X^A_t$, i.e., the current observation along with the past $\tau$ observations. We exemplify the above definition through a real case extracted from our real-world dataset (dataset details are presented in Section 4). As illustrated in Figure 1, the problem at hand is to predict (as accurately as possible) the anticipated demand values of each district's demand timeseries (in orange) given the current and some past demand values (in blue), taking into account relevant (temporal, spatial, context) features. To achieve this goal efficiently, the seasonal and other recurring patterns hidden in these timeseries should be uncovered. For instance, looking at the timeseries of two major districts in Rotterdam that are displayed in Figure 1, we identify seasonal commuting patterns: the highest and most spread peak occurs on Friday, due to a combination of leisure and work time. This is made more evident in Figure 2, which showcases the combination of the hour-of-day and day-of-week features in the Rotterdam Centrum district.
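The problem definition above can be illustrated with a minimal sketch that turns one district's demand series into (input window, target) pairs. The function name `make_supervised` and the toy demand values are hypothetical and serve only to make the indices $\tau$ and $m$ concrete; this is not the authors' implementation.

```python
def make_supervised(series, tau, m):
    """Build (window, target) pairs for one district.

    window = [D_{t-tau}, ..., D_{t-1}, D_t]  (tau past values plus the current one)
    target = D_{t+m}                         (the value m steps ahead)
    """
    pairs = []
    # t must have tau values before it and m values after it.
    for t in range(tau, len(series) - m):
        window = series[t - tau : t + 1]
        target = series[t + m]
        pairs.append((window, target))
    return pairs


# Toy one-minute demand values (illustrative, not taken from the datasets).
demand = [10, 12, 15, 14, 18, 21, 19, 16]
pairs = make_supervised(demand, tau=2, m=3)
```

With `tau=2` and `m=3`, the first pair uses the observations at positions 0 to 2 as input and the observation at position 5 as the target.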

Methodology

We propose the so-called Gradient Boosting Demand Predictor, in two variations, differentiated by their output type:

• An $N$-class classifier (per horizon per district)
• A regressor (per horizon per district)

The purpose of the classifier is to classify the future demand of a district into predefined levels (e.g., 'Low', 'Medium', 'High'), whereas the regressor aims to predict the actual demand values for the respective horizon. In other words, for an area of interest partitioned into $M$ districts with the goal of $T$ individual prediction horizons, both architectures require training $T \times M$ models, where each model predicts the output for a specific district-horizon pair. The reasoning behind training individual models per district is to allow each model to specialize in its respective spatial region, focusing exclusively on its unique spatial and temporal patterns. Nevertheless, in our future work we plan to assess the efficiency of a single global model for all districts (where the district ID will be given as input to the model).

As illustrated in Figure 3, the initial step of our methodology is the partitioning of the area of interest into districts, modeled as an adjacency graph that allows us to extract the neighbors of each district. Two districts are defined as neighbors if they share a common boundary. The demands of the target as well as of the neighboring districts are represented by the respective spatiotemporal timeseries (step 2), as defined in Section 3.1, which are fed into a feature extraction pipeline (step 3), to be detailed in Section 3.3. Finally, all these features are combined into a single, augmented dataset (step 4) to be used for the purposes of training our two models, the classifier and the regressor (step 5).
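The neighbor definition in step 1 can be sketched as follows. This is a simplified illustration under a strong assumption: district polygons are given as vertex lists, and a shared boundary shows up as an identical edge (the same two consecutive vertices) in both polygons. Real district geometries would normally call for a GIS library; the names `polygon_edges` and `build_adjacency` and the toy squares are hypothetical.

```python
from itertools import combinations


def polygon_edges(vertices):
    """Return the undirected edges of a closed polygon as a set of vertex pairs."""
    n = len(vertices)
    return {frozenset((vertices[i], vertices[(i + 1) % n])) for i in range(n)}


def build_adjacency(districts):
    """Map each district to the set of districts it shares at least one edge with."""
    edges = {name: polygon_edges(verts) for name, verts in districts.items()}
    adjacency = {name: set() for name in districts}
    for a, b in combinations(districts, 2):
        if edges[a] & edges[b]:  # a common edge means a common boundary
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Toy layout: A and B share an edge; C only touches B at a single corner.
districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(2, 1), (3, 1), (3, 2), (2, 2)],
}
adjacency = build_adjacency(districts)
```

In this sketch, touching at a single corner does not count as sharing a boundary; whether the original pipeline treats corner contact as adjacency is not stated in the text.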

Feature Extraction

Given a timeseries $X^A_t$ of a target district $A$, feature extraction is performed to enhance the spatio-temporal modeling capabilities of the proposed method. The extracted features are categorized into four primary groups: time-related, lagged, rolling, and exponentially weighted features. These features are derived not only from the target district but also from its neighboring districts, incorporating both spatial and temporal dependencies.

Time-related features capture temporal patterns influencing the target variable. Key attributes include the hour of the day (0-23) to identify daily patterns like peak periods, and the day of the week (0-6, with 0 as Monday) to capture weekly trends. Monthly variations are represented by the month (0-11, with 0 as January), accounting for seasonal behaviors. Additionally, the minute component (0-59) highlights finer time dependencies and intra-hour variations.
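As a small illustration of the encodings described above, the following sketch derives the four time-related features from a timestamp using Python's standard library. The feature names follow Table 1; the helper name `time_features` and the example timestamp are hypothetical.

```python
from datetime import datetime


def time_features(ts):
    """Extract the four time-related features from a datetime."""
    return {
        "hour-of-day": ts.hour,          # 0-23
        "minute-of-hour": ts.minute,     # 0-59
        "day-of-week": ts.weekday(),     # 0=Monday, 6=Sunday
        "month-of-year": ts.month - 1,   # shifted so that 0=January, 11=December
    }


# Friday, November 15, 2024, at 17:45.
features = time_features(datetime(2024, 11, 15, 17, 45))
```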

The second category of features, referred to as lagged features, is designed to capture the temporal dependencies between past observations of the timeseries and its current state. For each district, a set of lagged features is generated by systematically shifting the original timeseries observations backward in time. Formally, given a timeseries $X^A_t$ representing the observed values of a target district $A$ at time $t$, the lagged feature at a temporal offset $\tau$ is defined as $X^A_{t-\tau}$. A range of temporal offsets is selected to ensure that dependencies at several time scales are captured. Specifically, the lagged features are computed at intervals of 1, 5, 10, and 15 minutes, capturing short-term temporal dependencies across multiple time horizons. This shifting operation effectively creates additional feature representations of the past states of the system, allowing the model to learn historical patterns that influence future predictions. By incorporating lagged features into the predictive framework, the approach enhances its ability to capture recurrent patterns, trends, and autocorrelations inherent in spatio-temporal mobility data.
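The shifting operation can be sketched in a few lines. The sketch assumes one observation per minute, so that a lag of $k$ steps equals $k$ minutes, and it marks positions without enough history as `None`. The helper names are hypothetical; the feature names follow Table 1.

```python
def lag(series, k):
    """Shift a series back by k steps; the first k positions have no history."""
    return ([None] * k + list(series))[: len(series)]


def lagged_features(series, offsets=(1, 5, 10, 15)):
    """Build one lagged column per offset, named as in Table 1."""
    return {f"target_lag_{k}": lag(series, k) for k in offsets}


# Twenty toy one-minute observations: 100, 101, ..., 119.
demand = list(range(100, 120))
feats = lagged_features(demand)
```

For example, `feats["target_lag_5"][5]` holds the value observed five steps earlier, i.e. the first observation of the series.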

The rolling features provide a smoothed representation of the data over various intervals, capturing localized fluctuations that are critical for accurate forecasting. These features are computed using sliding windows, where statistical measures are aggregated over a sequence of recent time steps to encapsulate short-term trends and variability in mobility patterns. For each district, rolling windows of 5, 10, and 15 minutes are applied to the timeseries data. Within each window, the mean is computed to capture central tendency and, over larger windows, the degree of short-term variability.
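A minimal sketch of the rolling mean follows. It assumes that each window ends at, and includes, the current observation, and that positions with fewer than `window` observations are left undefined; the text does not specify either choice. The function name is hypothetical.

```python
def rolling_mean(series, window):
    """Mean over the last `window` observations, including the current one."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = series[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


demand = [1, 2, 3, 4, 5, 6]
rolling_5 = rolling_mean(demand, window=5)
```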

Finally, the objective of the exponentially weighted features is to prioritize recent observations while retaining information from historical data. This is achieved through the application of exponentially weighted techniques, which assign progressively smaller weights to older data points, ensuring that the model places greater emphasis on more recent trends. Among these methods, the Exponential Weighted Moving Average (EWMA) is particularly effective, as it assigns exponentially decreasing weights to past observations, thereby capturing temporal dependencies in a manner that balances adaptability and historical context. For a given timeseries $X_t$, the EWMA at time $t$ is computed recursively as

$$\mathrm{EWMA}_t = \alpha X_t + (1 - \alpha)\,\mathrm{EWMA}_{t-1},$$

where $\alpha$ represents the smoothing factor, typically defined as $\alpha = \frac{2}{N+1}$ for a given window size $N$. This formulation allows the predictive framework to swiftly adapt to evolving patterns, including sudden variations induced by external factors such as weather conditions, road incidents, or public events, while maintaining a stable representation of long-term trends.
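The recursion can be written out directly. The sketch below assumes that the average is seeded with the first observation, which the text does not specify; the function name `ewma` is hypothetical.

```python
def ewma(series, span):
    """Exponentially weighted moving average with alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    out = []
    for x in series:
        if not out:
            out.append(float(x))  # assumption: seed with the first observation
        else:
            out.append(alpha * x + (1 - alpha) * out[-1])
    return out


# span = 3 gives alpha = 0.5, which keeps the arithmetic easy to follow.
smoothed = ewma([10, 20, 30], span=3)
```

With $\alpha = 0.5$, the second value is $0.5 \cdot 20 + 0.5 \cdot 10 = 15$ and the third is $0.5 \cdot 30 + 0.5 \cdot 15 = 22.5$. Under the same seeding, this recursion matches what pandas computes with `Series.ewm(span=N, adjust=False).mean()`.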

The features used in our methodology are summarized in Table 1.

Experimental Study

This section describes the data and preprocessing steps used in our experimental study (Section 4.1) as well as our findings (Section 4.2). The experiments were conducted on a server with an AMD Epyc 64-Core CPU and an Nvidia A100 GPU with 40GB of memory. Our source code is publicly available for reproducibility purposes.

Experimental Setup

In our study, we utilize shared micro-mobility data provided by Deelfiets Nederland, a popular micro-mobility service provider in the Netherlands, through an open API. The data consists of real-time GPS positions of vehicles in the Netherlands, which we query every 60 seconds, yielding a dataset of sufficient granularity for our use case. The dataset includes the latitude, longitude, vehicle type, and company name for each vehicle. For the purposes of this study, we utilize mobility data from two major metropolitan areas, Amsterdam and Rotterdam, focusing specifically on the ten most densely populated districts in each city. These districts exhibit diverse urban characteristics and distinct spatiotemporal patterns, making them well-suited for evaluating the model's adaptability to localized mobility trends. The temporal coverage of our data is about 2 months for Amsterdam (November 11, 2024 - January 15, 2025) and about 5 months for Rotterdam (August 06, 2024 - January 15, 2025). Since the original data source is a stream rather than a static dataset, we manually attach a timestamp to the data returned by each request.

To derive demand estimates at the district level, a spatial filtering and aggregation procedure is applied. First, a point-polygon intersection is performed to associate each mobility trace with its corresponding district boundaries, discarding traces that fall outside the predefined study areas. Next, the filtered traces are aggregated based on their unique spatio-temporal identifiers (i.e., timestamp and district ID), summing the occurrences within each region to obtain demand estimates. This transformation preserves the spatial and temporal integrity of the data while ensuring compatibility with the forecasting models. In its final form, the dataset consists of 89,022 observations across 97 districts in Amsterdam and 218,325 observations across 22 districts in Rotterdam.
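The filtering and aggregation procedure can be sketched with a standard ray-casting point-in-polygon test followed by a count per (timestamp, district) pair. This is an illustrative sketch, not the authors' code: it treats coordinates as planar, assumes districts do not overlap, and uses hypothetical names (`point_in_polygon`, `aggregate_demand`) and toy data.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle `inside` at each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_demand(traces, districts):
    """Count traces per (timestamp, district); traces outside all districts are dropped."""
    counts = Counter()
    for timestamp, x, y in traces:
        for district_id, polygon in districts.items():
            if point_in_polygon(x, y, polygon):
                counts[(timestamp, district_id)] += 1
                break  # assumption: districts do not overlap
    return counts


districts = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
traces = [
    ("2024-11-11T08:00", 0.5, 0.5),
    ("2024-11-11T08:00", 1.5, 1.0),
    ("2024-11-11T08:00", 3.0, 1.0),
    ("2024-11-11T08:01", 3.5, 0.5),
    ("2024-11-11T08:01", 9.0, 9.0),  # outside both districts, discarded
]
counts = aggregate_demand(traces, districts)
```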

To validate our models, we use a 70-30 temporal train-test split. To ensure effective training of the classifier, additional preprocessing steps were taken apart from the feature extraction process described above. To this end, a quantile-based discretization function was employed to transform continuous density values into 3 discrete demand levels (Low, Medium, High, as they will be defined in the following paragraph). This transformation enhances the classifier's ability to model class distinctions and mitigates the challenges associated with imbalanced class distributions.
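A temporal split differs from a random split in that every training observation precedes every test observation. The following minimal sketch assumes rows of the form (timestamp, value); the function name and toy rows are hypothetical.

```python
def temporal_split(rows, train_percent=70):
    """Split rows chronologically: the earliest part trains, the latest part tests."""
    ordered = sorted(rows, key=lambda row: row[0])  # sort by timestamp
    cut = len(ordered) * train_percent // 100
    return ordered[:cut], ordered[cut:]


# Ten toy rows of (minute index, demand value).
rows = [(t, 100 + t) for t in range(10)]
train, held_out = temporal_split(rows)
```

Keeping the chronological order avoids leaking future observations into training, which matters here because lagged and rolling features overlap between consecutive rows.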

We define 'Medium' demand as the range where demand levels are centered around 50% of the value of the day with the highest demand, which serves as the baseline value. Specifically, for a given threshold $d \in (0, 50)$, the range $[50 - d, 50 + d)$ is categorized as 'Medium' demand. Demand levels below (above) this range are classified as 'Low' ('High', respectively) demand. In our experiments, we set $d = 17$, so that the three demand levels correspond to approximately equally sized ranges. This threshold provides a balanced distribution of data across the defined demand categories, facilitating a robust analysis of varying demand patterns.
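The level definition above translates into a simple thresholding rule. The sketch below assumes the input is already expressed on the 0-100 percentile scale used in the text; the function name `demand_level` is hypothetical.

```python
def demand_level(percentile, d=17):
    """Map a percentile-encoded demand value to 'Low', 'Medium', or 'High'."""
    if not 0 < d < 50:
        raise ValueError("d must lie in the open interval (0, 50)")
    if percentile < 50 - d:
        return "Low"
    if percentile < 50 + d:   # the 'Medium' range is [50 - d, 50 + d)
        return "Medium"
    return "High"


levels = [demand_level(p) for p in (10, 33, 50, 66.9, 67, 95)]
```

With $d = 17$ the boundaries fall at 33 and 67, so the three ranges $[0, 33)$, $[33, 67)$, and $[67, 100]$ have widths 33, 34, and 33.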

The optimization framework employed in this study incorporates Bayesian Optimization due to its ability to balance exploration and exploitation effectively. This approach facilitates the exploration of the search space for better solutions while simultaneously exploiting regions with high potential, thereby reducing the number of iterations required for optimization. Its suitability is particularly evident when dealing with extensive experimental evaluations, such as the computationally intensive training of ML models, as it leverages prior knowledge to guide sampling and optimize resources efficiently. Additionally, the probabilistic framework of Bayesian Optimization enables the automatic adjustment of the exploration-exploitation trade-off through its acquisition functions.

Experimental Results

In this section, we assess the overall performance of our models across multiple districts and timestamps. Table 2 summarizes the training and inference times of our models across different prediction horizons in the two cities of interest. Training time corresponds to the overall time required to train the respective model, whereas inference time corresponds to the average time taken to generate a specific prediction for a given time horizon. Training is fast, taking 1-2 seconds or less per model, while inference time is in the order of 1 microsecond per prediction. This analysis showcases the computational efficiency of the models, providing insights into their scalability and suitability for real-world spatio-temporal forecasting tasks.

Table 2: Training and inference times of the two models across different horizons.

Tables 3-6 present the predictive performance of the proposed spatio-temporal forecasting models evaluated at four prediction horizons: 5, 15, 30, and 60 minutes. The quality metrics we used include $R^2$, RMSE, MAE and sMAPE for the regressor and F1-score, Accuracy, Precision and Recall for the classifier. These metrics collectively assess the model's predictive accuracy, robustness, and relative performance across districts with varying population sizes and temporal dynamics.

Aggregated metrics are added to summarize the overall performance across all districts, offering insights into the model's general effectiveness and consistency across varying prediction horizons. Both architectures perform well at both short-term (5-15 min.) and long-term (30-60 min.) forecasting, and the summary statistics show that this holds for the vast majority of the districts. Indicatively, the regressor's average $R^2$ drops from 0.94 to 0.84 in Rotterdam (0.95 to 0.85 in Amsterdam) as we move from the 5-minute to the 60-minute horizon. As for the classifier, the average F1-score drops from 0.89 to 0.80 in Rotterdam (0.88 to 0.76 in Amsterdam).
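For reference, the regression metrics reported in Tables 3 and 5 can be computed as in the sketch below. The exact sMAPE variant used in the paper is not stated; this sketch assumes the common form that divides the absolute error by the mean of the absolute true and predicted values and reports a fraction rather than a percentage. The function name and toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R^2, RMSE, MAE, and sMAPE for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # zero for a constant series
    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        smape_terms.append(abs(t - p) / denom if denom else 0.0)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "sMAPE": sum(smape_terms) / n,
    }


metrics = regression_metrics([1, 2, 3], [1, 2, 4])
```

In this toy case only the last prediction is off by one, giving $R^2 = 1 - 1/2 = 0.5$ and an MAE of $1/3$.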

A Note on Feature Importance

In order to assess the effect of the underlying features on the prediction quality of either the classifier or the regressor, we present the average feature importance across all prediction horizons for both models. The feature importance values were computed using XGBoost's internal feature importance computation algorithm. Given the minimal contributions of some features (importance values below $10^{-4}$), the y-axis of the corresponding plots has been log-scaled to improve visualization. The results of this analysis are grouped by city and illustrated in Figures 4 and 5.

In both figures, we can see that the average contribution of the features related to the target is higher than that of the neighbors' features, as expected. Overall, in decreasing order, the most important features appear to be target_ewm_*, target_rolling_mean_*, and target_lag_*, denoting the EWMA, the rolling mean, and the lagged value of the target variable, respectively.

Conclusions

This study investigated spatio-temporal forecasting methods for density prediction in urban environments. By integrating spatial dependencies with temporal trends, the proposed methodology effectively captured localized patterns, generating accurate forecasts across multiple horizons. Key contributions include a flexible feature engineering pipeline that incorporates both intra- and inter-district interactions, alongside the application of efficient gradient boosting architectures. This approach enhances predictive accuracy while minimizing computational overhead, rendering it suitable for large-scale forecasting tasks. Experimental results demonstrated the efficacy of gradient-boosted density predictors for both regression and classification, exhibiting competitive performance across varying prediction horizons. Although accuracy slightly declines for longer horizons, the model remains robust, underscoring its adaptability and practical applicability in real-world mobility forecasting.

This research also unveils opportunities for future exploration. Specifically, the integration of feature interactions and their contributions to the overall model quality warrant further investigation. Incorporating external factors, such as weather conditions or public events, could potentially enhance predictive performance. Additionally, applying the proposed methodology to other geographical areas and diverse mobility scenarios would help validate its generalization capabilities and adaptability to varying contexts. Finally, a comprehensive experimental comparison with related work under specific settings is planned to facilitate fair benchmarking.

In conclusion, this study showcases the ability of data-driven approaches to effectively tackle spatio-temporal forecasting challenges. By leveraging the inherent spatial segmentation of cities into districts, the methodology enables the extraction of localized temporal patterns, facilitating more informed decision-making and contributing to smarter, more efficient urban planning from the perspective of mobility data science [21].

Figure 1: Two indicative timeseries for two regions of Rotterdam: Delfshaven (top) and Rotterdam Centrum (bottom). The x-axis denotes time, while the y-axis indicates the measured demand values.

Figure 2: Average hourly demand across the week in Rotterdam Centrum. The coloring indicates the relative demand intensity and, along with the peaks, aids in highlighting daily and weekly usage patterns.

Figure 3: Illustration of the proposed Gradient Boosting Density Predictor methodology.

| Category | Feature | Description |
|---|---|---|
| | […] | […] the current district at the present time step |
| | neighbor_* | Feature variable of each of the target's neighbors in neighboring districts |
| Time-Related | hour-of-day | Hour of the day of the timestamp (0-23) |
| Time-Related | minute-of-hour | Minute of the hour of the timestamp (0-59) |
| Time-Related | day-of-week | Day of the week of the timestamp (0=Monday, 6=Sunday) |
| Time-Related | month-of-year | Month of the year of the timestamp (0=January, 11=December) |
| Lagged | neighbor_lag_{1, 5, 10, 15} | Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for neighboring districts |
| Lagged | target_lag_{1, 5, 10, 15} | Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for the current district |
| Rolling | neighbor_rolling_mean_{5, 10, 15} | Rolling mean of neighbors over the last {5, 10, 15} observations |
| Rolling | target_rolling_mean_{5, 10, 15} | Rolling mean of the target variable over the last {5, 10, 15} observations |
| EWMA | neighbor_ewm_{5, 10, 15} | Exponentially weighted moving average of neighbors with a span of {5, 10, 15} |
| EWMA | target_ewm_{5, 10, 15} | Exponentially weighted moving average of the target variable with a span of {5, 10, 15} |

Figure 4: Average feature importance across all 4 prediction horizons of the regressor in Rotterdam (left) and Amsterdam (right).

Table 1: Summary table of the extracted features used in our approach.

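Table 3: Predictive performance of the regressor in Rotterdam across different prediction horizons.

| District | Avg Pop | R² (5 min) | RMSE (5 min) | MAE (5 min) | sMAPE (5 min) | R² (15 min) | RMSE (15 min) | MAE (15 min) | sMAPE (15 min) | R² (30 min) | RMSE (30 min) | MAE (30 min) | sMAPE (30 min) | R² (60 min) | RMSE (60 min) | MAE (60 min) | sMAPE (60 min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kralingen-Crooswijk | 507 | 0.97 | 9.70 | 6.94 | 0.01 | 0.96 | 11.00 | 7.86 | 0.02 | 0.95 | 13.17 | 9.81 | 0.02 | 0.90 | 18.11 | 13.45 | 0.03 |
| Rotterdam Centrum | 502 | 0.99 | 15.25 | 11.18 | 0.02 | 0.98 | 20.33 | 14.77 | 0.02 | 0.97 | 22.95 | 17.37 | 0.03 | 0.93 | 34.66 | 26.62 | 0.04 |
| Hillegersberg-Schiebroek | 476 | 0.80 | 17.36 | 12.67 | 0.02 | 0.76 | 18.82 | 14.06 | 0.03 | 0.73 | 20.03 | 15.20 | 0.03 | 0.68 | 21.79 | 16.78 | 0.03 |
| Delfshaven | 347 | 0.98 | 6.43 | 4.70 | 0.01 | 0.97 | 8.03 | 5.75 | 0.02 | 0.96 | 10.13 | 7.27 | 0.02 | 0.92 | 13.78 | 9.97 | 0.03 |
| Feijenoord | 347 | 0.97 | 6.31 | 4.61 | 0.01 | 0.92 | 10.09 | 7.72 | 0.02 | 0.94 | 9.29 | 7.00 | 0.02 | 0.92 | 10.31 | 7.63 | 0.02 |
| Noord | 307 | 0.88 | 15.69 | 10.11 | 0.03 | 0.85 | 17.49 | 11.61 | 0.03 | 0.81 | 19.76 | 13.61 | 0.04 | 0.73 | 23.96 | 16.87 | 0.04 |
| Prins Alexander | 303 | 0.99 | 6.04 | 4.37 | 0.01 | 0.99 | 5.91 | 4.26 | 0.01 | 0.98 | 6.76 | 4.92 | 0.01 | 0.98 | 7.99 | 5.87 | 0.02 |
| Charlois | 242 | 0.97 | 5.24 | 3.84 | 0.02 | 0.97 | 5.93 | 4.42 | 0.02 | 0.96 | 6.89 | 5.20 | 0.02 | 0.94 | 7.84 | 5.92 | 0.03 |
| IJsselmonde | 184 | 0.94 | 5.25 | 3.25 | 0.02 | 0.90 | 6.58 | 4.18 | 0.03 | 0.89 | 7.11 | 4.50 | 0.03 | 0.84 | 8.37 | 5.57 | 0.04 |
| Spaanse Polder | 36 | 0.90 | 2.18 | 1.36 | 0.03 | 0.69 | 3.88 | 2.18 | 0.05 | 0.87 | 2.53 | 1.76 | 0.04 | 0.54 | 4.75 | 3.43 | 0.07 |
| Average (μ ± σ) | | 0.94 ± 0.06 | 8.94 ± 5.29 | 6.30 ± 3.77 | 0.02 ± 0.01 | 0.90 ± 0.10 | 10.80 ± 5.98 | 7.68 ± 4.41 | 0.02 ± 0.01 | 0.90 ± 0.08 | 11.86 ± 6.86 | 8.66 ± 5.17 | 0.03 ± 0.01 | 0.84 ± 0.14 | 15.16 ± 9.40 | 11.21 ± 7.22 | 0.03 ± 0.02 |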

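Table 4: Predictive performance of the classifier in Rotterdam across different prediction horizons.

| District | Avg Pop | F1 (5 min) | Acc. (5 min) | Prec. (5 min) | Rec. (5 min) | F1 (15 min) | Acc. (15 min) | Prec. (15 min) | Rec. (15 min) | F1 (30 min) | Acc. (30 min) | Prec. (30 min) | Rec. (30 min) | F1 (60 min) | Acc. (60 min) | Prec. (60 min) | Rec. (60 min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kralingen-Crooswijk | 507 | 0.92 | 0.92 | 0.92 | 0.92 | 0.90 | 0.90 | 0.90 | 0.90 | 0.89 | 0.88 | 0.88 | 0.88 | 0.85 | 0.85 | 0.85 | 0.85 |
| Rotterdam Centrum | 502 | 0.91 | 0.99 | 0.96 | 0.87 | 0.87 | 0.98 | 0.94 | 0.80 | 0.83 | 0.98 | 0.92 | 0.78 | 0.75 | 0.97 | 0.84 | 0.70 |
| Hillegersberg-Schiebroek | 476 | 0.75 | 0.95 | 0.82 | 0.71 | 0.73 | 0.94 | 0.79 | 0.70 | 0.71 | 0.94 | 0.77 | 0.67 | 0.67 | 0.93 | 0.76 | 0.63 |
| Delfshaven | 347 | 0.91 | 0.96 | 0.91 | 0.91 | 0.89 | 0.95 | 0.90 | 0.89 | 0.86 | 0.93 | 0.86 | 0.87 | 0.80 | 0.90 | 0.81 | 0.80 |
| Feijenoord | 347 | 0.88 | 0.95 | 0.90 | 0.87 | 0.87 | 0.94 | 0.89 | 0.85 | 0.84 | 0.93 | 0.88 | 0.81 | 0.74 | 0.90 | 0.82 | 0.71 |
| Noord | 307 | 0.82 | 0.99 | 0.85 | 0.80 | 0.71 | 0.98 | 0.74 | 0.69 | 0.55 | 0.91 | 0.55 | 0.63 | 0.59 | 0.97 | 0.62 | 0.58 |
| Prins Alexander | 303 | 0.97 | 0.98 | 0.97 | 0.97 | 0.97 | 0.98 | 0.97 | 0.97 | 0.96 | 0.98 | 0.96 | 0.96 | 0.96 | 0.97 | 0.96 | 0.95 |
| Charlois | 242 | 0.91 | 0.91 | 0.92 | 0.91 | 0.90 | 0.90 | 0.90 | 0.90 | 0.89 | 0.89 | 0.89 | 0.89 | 0.87 | 0.87 | 0.87 | 0.87 |
| IJsselmonde | 184 | 0.92 | 0.97 | 0.92 | 0.92 | 0.91 | 0.97 | 0.91 | 0.91 | 0.90 | 0.97 | 0.90 | 0.91 | 0.89 | 0.97 | 0.89 | 0.90 |
| Spaanse Polder | 36 | 0.95 | 0.99 | 0.98 | 0.92 | 0.90 | 0.98 | 0.94 | 0.87 | 0.93 | 0.98 | 0.96 | 0.90 | 0.91 | 0.98 | 0.95 | 0.88 |
| Average (μ ± σ) | | 0.89 ± 0.06 | 0.96 ± 0.03 | 0.91 ± 0.05 | 0.88 ± 0.07 | 0.87 ± 0.08 | 0.95 ± 0.03 | 0.89 ± 0.07 | 0.85 ± 0.09 | 0.84 ± 0.12 | 0.94 ± 0.04 | 0.86 ± 0.12 | 0.83 ± 0.11 | 0.80 ± 0.11 | 0.93 ± 0.05 | 0.84 ± 0.10 | 0.79 ± 0.12 |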

of the target variable. Focusing on the time-related features only, hour-of-day and day-of-week show the highest relative importance in both cases, underscoring the presence of seasonal patterns.
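As an illustration of the time-related features discussed above, the following minimal Python sketch derives hour-of-day and day-of-week from an observation timestamp. The function name `time_features` and the dictionary keys are hypothetical and are not taken from the authors' implementation; the timestamp is an arbitrary example.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive simple calendar features from an observation timestamp.

    Hypothetical helper for illustration only; feature names are assumptions.
    """
    return {
        "hour_of_day": ts.hour,       # integer in 0..23
        "day_of_week": ts.weekday(),  # Monday = 0 ... Sunday = 6
    }


# Arbitrary example timestamp: Saturday, 9 March 2024, 17:35.
example = time_features(datetime(2024, 3, 9, 17, 35))
print(example)
```

Tree-based learners can split directly on such integer-coded features, so no further encoding is strictly required for a sketch of this kind.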

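Table 5: Predictive performance of the regressor in Amsterdam across different prediction horizons.

| District | Avg. Pop | R² (5 min) | RMSE (5 min) | MAE (5 min) | sMAPE (5 min) | R² (15 min) | RMSE (15 min) | MAE (15 min) | sMAPE (15 min) | R² (30 min) | RMSE (30 min) | MAE (30 min) | sMAPE (30 min) | R² (60 min) | RMSE (60 min) | MAE (60 min) | sMAPE (60 min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Oostelijk Havengebied | 110 | 0.97 | 2.34 | 1.83 | 0.02 | 0.96 | 2.75 | 2.16 | 0.02 | 0.95 | 3.24 | 2.53 | 0.02 | 0.92 | 4.16 | 3.24 | 0.03 |
| Buitenveldert-West | 87 | 0.97 | 2.12 | 1.57 | 0.02 | 0.95 | 2.66 | 1.96 | 0.02 | 0.93 | 3.28 | 2.40 | 0.02 | 0.87 | 4.37 | 3.17 | 0.03 |
| Burgwallen-Nieuwe Zijde | 72 | 0.98 | 1.91 | 1.41 | 0.02 | 0.96 | 2.55 | 1.91 | 0.03 | 0.93 | 3.38 | 2.58 | 0.03 | 0.89 | 4.35 | 3.35 | 0.04 |
| Jordaan | 71 | 0.98 | 2.31 | 1.78 | 0.02 | 0.94 | 4.04 | 2.75 | 0.03 | 0.91 | 4.67 | 3.42 | 0.04 | 0.85 | 6.14 | 4.66 | 0.05 |
| Scheldebuurt | 71 | 0.98 | 1.56 | 1.19 | 0.01 | 0.96 | 2.06 | 1.57 | 0.02 | 0.94 | 2.66 | 2.02 | 0.02 | 0.89 | 3.50 | 2.64 | 0.03 |
| Middenmeer | 66 | 0.96 | 1.94 | 1.46 | 0.02 | 0.94 | 2.28 | 1.73 | 0.02 | 0.90 | 2.89 | 2.20 | 0.03 | 0.86 | 3.55 | 2.71 | 0.03 |
| IJburg West | 63 | 0.93 | 1.62 | 1.28 | 0.02 | 0.91 | 1.83 | 1.45 | 0.02 | 0.87 | 2.15 | 1.70 | 0.02 | 0.79 | 2.76 | 2.20 | 0.03 |
| Westelijk Havengebied | 63 | 0.88 | 3.72 | 1.60 | 0.03 | 0.85 | 4.05 | 1.75 | 0.03 | 0.83 | 4.28 | 1.87 | 0.04 | 0.78 | 4.74 | 2.20 | 0.04 |
| Landlust | 59 | 0.99 | 1.42 | 1.07 | 0.02 | 0.97 | 1.96 | 1.51 | 0.03 | 0.96 | 2.60 | 2.01 | 0.04 | 0.92 | 3.49 | 2.70 | 0.05 |
| Nieuwmarkt/Lastage | 58 | 0.92 | 2.82 | 1.94 | 0.02 | 0.88 | 3.34 | 2.37 | 0.03 | 0.76 | 4.71 | 3.35 | 0.04 | 0.70 | 5.29 | 3.88 | 0.05 |
| Average (μ ± σ) | | 0.95 ± 0.04 | 2.18 ± 0.69 | 1.51 ± 0.28 | 0.02 ± 0.01 | 0.93 ± 0.04 | 2.75 ± 0.81 | 1.92 ± 0.41 | 0.02 ± 0.01 | 0.90 ± 0.06 | 3.38 ± 0.89 | 2.41 ± 0.59 | 0.03 ± 0.01 | 0.85 ± 0.07 | 4.24 ± 0.99 | 3.08 ± 0.77 | 0.04 ± 0.01 |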
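For readers who wish to reproduce the regression metrics reported per district and horizon (R², RMSE, MAE, sMAPE), a minimal Python sketch follows. It assumes the common sMAPE definition with denominator (|y| + |ŷ|)/2, expressed as a fraction rather than a percentage; the exact variant used in this work may differ. The function name `regression_metrics` and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE and sMAPE for two equally long sequences.

    sMAPE here is mean(|y - yhat| / ((|y| + |yhat|) / 2)), an assumed variant.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if y_true is constant

    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        # Treat the 0/0 case (both values zero) as a perfect prediction.
        smape_terms.append(abs(t - p) / denom if denom > 0 else 0.0)
    smape = sum(smape_terms) / n

    return {"r2": r2, "rmse": rmse, "mae": mae, "smape": smape}


# Toy vehicle counts for one district (illustrative values, not from the data).
y_true = [100, 110, 120, 130]
y_pred = [102, 108, 123, 129]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Because RMSE and MAE are expressed in vehicles, they scale with district population, whereas R² and sMAPE are scale-free and therefore easier to compare across districts of different size.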

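Table 6: Predictive performance of the classifier in Amsterdam across different prediction horizons.

| District | Avg. Pop | F1 (5 min) | Acc. (5 min) | Prec. (5 min) | Rec. (5 min) | F1 (15 min) | Acc. (15 min) | Prec. (15 min) | Rec. (15 min) | F1 (30 min) | Acc. (30 min) | Prec. (30 min) | Rec. (30 min) | F1 (60 min) | Acc. (60 min) | Prec. (60 min) | Rec. (60 min) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Oostelijk Havengebied | 110 | 0.90 | 0.92 | 0.93 | 0.89 | 0.89 | 0.90 | 0.89 | 0.88 | 0.86 | 0.88 | 0.87 | 0.85 | 0.76 | 0.78 | 0.75 | 0.79 |
| Buitenveldert-West | 87 | 0.92 | 0.93 | 0.94 | 0.91 | 0.90 | 0.91 | 0.92 | 0.89 | 0.88 | 0.89 | 0.89 | 0.86 | 0.83 | 0.86 | 0.85 | 0.82 |
| Burgwallen-Nieuwe Zijde | 72 | 0.94 | 0.94 | 0.95 | 0.94 | 0.92 | 0.92 | 0.92 | 0.91 | 0.89 | 0.89 | 0.89 | 0.89 | 0.85 | 0.84 | 0.85 | 0.84 |
| Jordaan | 71 | 0.91 | 0.94 | 0.94 | 0.89 | 0.87 | 0.92 | 0.89 | 0.85 | 0.84 | 0.90 | 0.87 | 0.81 | 0.82 | 0.89 | 0.82 | 0.82 |
| Scheldebuurt | 71 | 0.91 | 0.93 | 0.93 | 0.90 | 0.88 | 0.90 | 0.90 | 0.87 | 0.85 | 0.88 | 0.88 | 0.83 | 0.79 | 0.83 | 0.83 | 0.77 |
| Middenmeer | 66 | 0.85 | 0.94 | 0.90 | 0.81 | 0.79 | 0.91 | 0.81 | 0.77 | 0.73 | 0.90 | 0.80 | 0.70 | 0.68 | 0.83 | 0.69 | 0.69 |
| IJburg West | 63 | 0.73 | 0.96 | 0.81 | 0.68 | 0.63 | 0.95 | 0.66 | 0.61 | 0.57 | 0.79 | 0.54 | 0.73 | 0.61 | 0.92 | 0.60 | 0.62 |
| Westelijk Havengebied | 63 | 0.91 | 0.89 | 0.94 | 0.90 | 0.89 | 0.87 | 0.93 | 0.88 | 0.90 | 0.89 | 0.92 | 0.90 | 0.87 | 0.85 | 0.89 | 0.86 |
| Landlust | 59 | 0.94 | 0.94 | 0.93 | 0.94 | 0.93 | 0.93 | 0.92 | 0.93 | 0.92 | 0.92 | 0.91 | 0.92 | 0.87 | 0.87 | 0.87 | 0.87 |
| Nieuwmarkt/Lastage | 58 | 0.79 | 0.94 | 0.88 | 0.73 | 0.69 | 0.92 | 0.78 | 0.65 | 0.62 | 0.91 | 0.65 | 0.60 | 0.55 | 0.88 | 0.57 | 0.53 |
| Average (μ ± σ) | | 0.88 ± 0.07 | 0.93 ± 0.02 | 0.91 ± 0.04 | 0.86 ± 0.09 | 0.84 ± 0.10 | 0.91 ± 0.02 | 0.86 ± 0.09 | 0.83 ± 0.11 | 0.81 ± 0.12 | 0.88 ± 0.03 | 0.82 ± 0.13 | 0.81 ± 0.10 | 0.76 ± 0.11 | 0.86 ± 0.04 | 0.77 ± 0.11 | 0.76 ± 0.11 |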
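The classification metrics in the tables (F1, accuracy, precision, recall) can be sketched as follows. The sketch assumes macro-averaging over classes, which is only one possible convention and is not confirmed by the text here; the function name `classification_metrics` and the class labels "low" and "high" are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro-averaging (unweighted mean over classes) is an assumption.
    """
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []

    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Toy, imbalanced demand labels (illustrative values, not from the data).
y_true = ["low"] * 8 + ["high", "high"]
y_pred = ["low"] * 8 + ["high", "low"]
scores = classification_metrics(y_true, y_pred)
print(scores)
```

In this toy example accuracy exceeds the macro-averaged F1 because the rare class is partly missed. Under the macro-averaging assumption, class imbalance is one possible explanation for districts where accuracy is much higher than F1.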

The inclusion of neighboring districts is justified by the limited spatial range and temporal duration of trips usually made by e-bikes and e-scooters. Due to their physical constraints, micro-mobility vehicles are unlikely to reach distant districts within short timeframes, so considering only neighboring districts is sufficient for effective spatio-temporal modeling.

https://github.com/DataStories-UniPi/Shared-Mobility.git

https://api.deelfietsdashboard.nl/dashboard-api/public/vehicles_in_public_space
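To make the neighboring-district idea concrete, the following minimal Python sketch builds lagged count features for a target district and its neighbors only. It is an illustration under stated assumptions, not the authors' feature pipeline: the function name `neighbour_lag_features`, the adjacency mapping, the lag choices, and the count values are all hypothetical.

```python
def neighbour_lag_features(counts, neighbours, district, t, lags=(1, 2)):
    """Collect lagged counts for a district and its adjacent districts.

    counts:     dict mapping district name -> list of counts per time step
    neighbours: dict mapping district name -> list of adjacent districts
    t:          index of the time step to predict
    """
    if t < max(lags):
        raise ValueError("not enough history for the requested lags")

    features = {}
    for name in [district] + list(neighbours[district]):
        for lag in lags:
            features[f"{name}_lag{lag}"] = counts[name][t - lag]
    return features


# Toy counts per time step (illustrative values, not from the data).
counts = {
    "Rotterdam Centrum": [500, 505, 498, 510],
    "Noord": [300, 310, 305, 307],
    "Delfshaven": [340, 350, 345, 347],
    "Charlois": [240, 241, 242, 243],
}
# Illustrative adjacency; treat as an assumption, not a verified map.
neighbours = {"Rotterdam Centrum": ["Noord", "Delfshaven"]}

feats = neighbour_lag_features(counts, neighbours, "Rotterdam Centrum", t=3)
print(feats)
```

Districts outside the adjacency list (here, Charlois) contribute no features, which keeps the feature vector small while retaining the local spatial context.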

Acknowledgments

This work was supported in part by the Horizon Framework Programme of the European Union under grant agreement No. 101093051 (EMERALDS; https://www.emeraldshorizon.eu/).

Declaration on Generative AI

During the preparation of this work, the authors used Large Language Models in order to paraphrase and reword. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.
