
Shared Micro-mobility Demand Forecasting using Gradient Boosting methods

Antonios Tziorvas

George S. Theodoropoulos

Yannis Theodoridis

Department of Informatics, University of Piraeus, Piraeus, Greece

Urban demand forecasting plays a critical role in optimizing routing, dispatching, and congestion management within Intelligent Transportation Systems. By leveraging data fusion and analytics techniques, traffic density estimation serves as a key intermediate measure for identifying and predicting emerging demand patterns. In this paper, we propose two gradient boosting model variations, one for classification and one for regression, both capable of generating demand forecasts at various temporal horizons, from 5 minutes up to one hour. Our approach effectively integrates spatial and temporal features, enabling accurate predictions that are essential for improving the efficiency of shared (micro-)mobility services. To evaluate the effectiveness of our approach, we utilize open shared mobility data derived from e-scooter and e-bike networks in two Dutch metropolitan areas. These real-world datasets enable us to validate our approach and demonstrate its effectiveness in capturing the complexities of modern urban mobility. Ultimately, our methodology offers novel insights into urban micro-mobility management, helping to tackle the challenges arising from rapid urbanization and thus contributing to more sustainable, efficient, and liveable cities.

Keywords: Gradient Boosting, Demand Forecasting, Spatial/Temporal Features, Open shared mobility data, E-scooters/E-bikes, Urban micro-mobility management, Intelligent Transportation Systems

1. Introduction

Urban shared mobility, also known as Mobility-as-a-Service (MaaS) [1], integrates transportation modes such as public transit, micro-mobility services (e.g., bike- and scooter-sharing), and commute-based models (e.g., carpooling). The need for accurate mobility pattern forecasting is growing rapidly, as real-time data analytics in Intelligent Transportation Systems (ITS) help alleviate congestion, reduce travel times, and enhance road safety in increasingly complex urban environments. In shared mobility services, including ride-hailing and bike-sharing, predicting demand across spatial and temporal dimensions is essential for efficient resource allocation, reduced waiting times, and optimized fleet deployment. This insight supports smart city initiatives by informing human-centric urban infrastructure design and facilitating sustainable development through data-driven decisions in areas such as energy consumption, public transit planning, and emergency services.

Several Machine Learning (ML) approaches have been proposed for detecting and forecasting spatio-temporal patterns in timeseries, including Long Short-Term Memory (LSTM) networks [2], Graph Neural Networks (GNNs) [3, 4], and Diffusion-based Models [5, 6]. However, these models often suffer from high computational complexity due to the intricate nature of spatio-temporal data, which combines spatial correlations with temporal dependencies [7].

A common representation for such data is the Spatio-Temporal Graph (STG), where nodes denote spatial locations and edges encode relationships over time. Capturing both spatial and temporal dependencies requires advanced architectures, such as Spatio-Temporal Graph Neural Networks (STGNNs) [8] or Spatio-Temporal Graph Convolutional Networks (STGCNs) [9]. However, these methods impose significant computational demands, limiting their real-world applicability [10]. While optimization techniques such as compression and pruning can reduce model size and inference time, they often come at the cost of decreased predictive accuracy.

This trade-off between complexity, accuracy, and efficiency underscores the need for alternative methodologies. In this work, we propose a novel approach based on gradient-boosted trees, a class of models known for their superior performance on structured tabular data [11]. Our method effectively forecasts micro-mobility demand across spatial and temporal dimensions while maintaining computational efficiency, making it suitable for large-scale deployment.

In our approach, we provide a robust feature extraction pipeline that captures both spatial and temporal dependencies and integrates them into our model. We also present a gradient boosting ML algorithm capable of predicting the demand in a given area, either as demand levels (e.g., ’Low’, ’Medium’, ’High’) through a classifier, or as absolute demand values through a regressor. The performance of our model is evaluated on two real-world micro-mobility datasets, and the results are promising, since our approach effectively adapts to the particularities of each area and adequately models its intricacies.

The rest of this paper is structured as follows: Section 2 reviews related work; Section 3 introduces our forecasting methodology, including the feature extraction process and training configuration; Section 4 describes the experimental setup and presents the results of both predictive models, offering detailed performance metrics. Finally, Section 5 concludes the paper with a summary of the key contributions and directions for future research.

2. Related Work

A wide variety of modeling approaches have been proposed to tackle the challenges related to spatio-temporal forecasting. In this section, we review previous approaches to shared mobility demand forecasting, grouped into three categories: (traditional or advanced) ML approaches, deep learning models, and ensemble and hybrid methods.

2.1. ML Approaches

Traditional machine learning (ML) methods provide a stable analytical foundation for structured predictive problems. Feng et al. [12] developed a predictive model for bike-sharing demand in Chicago using Poisson regression, incorporating time, weather, population density, and activity density as features. Additionally, a Random Forest model was employed to enhance predictive accuracy through ensemble learning, aiding in demand-supply optimization and the identification of potential new station locations. Similarly, Xiao et al. [13] applied clustering techniques to forecast short-term car demand, utilizing an improved k-nearest neighbor (kNN) model for online car-hailing services in Hefei City, China. Their results highlight the effectiveness of clustering methodologies in enhancing demand prediction accuracy.

A notable advancement in bike-share traffic prediction is the two-step pattern recognition model proposed by Sohrabi & Ermagun [14]. This approach first generates traffic profiles, in which bike station traffic is represented as timeseries profiles with fixed-length intervals (in minutes) and a given overlap length. The kNN algorithm is then applied to identify spatio-temporal similarities, incorporating historical traffic data, temporal factors (e.g., weather, weekdays), and spatial characteristics (e.g., socio-demographics, land use, infrastructure). By leveraging weighted Euclidean distance metrics, this method enables accurate short- and long-term traffic forecasting at both station and system-wide levels, demonstrating the efficacy of integrating spatio-temporal features with ML techniques.

Hankey et al. [15] employ ML to predict bicycle and pedestrian traffic patterns across 20 U.S. metropolitan statistical areas (MSAs). Building on previous efforts by Le et al. [16], where stepwise linear regression models were developed based on Census-derived, neighborhood-level covariates, this study incorporates traffic counts aggregated from 4,145 locations alongside novel street-level data, including Google Street View imagery and Point of Interest (PoI) data. The authors evaluate several ML regressors, including linear, ridge, lasso, bagging, gradient boosting, and random forest, reporting significant improvements in predictive accuracy over traditional linear regression, particularly when using augmented datasets.

2.2. Deep Learning Models

Ge et al. [17] introduced a Self-Attention ConvLSTM (SAConvLSTM) model to enhance the accuracy of ConvLSTM for online car-hailing demand forecasting. By converting car-hailing trajectories into grid-based images, the model incorporates a self-attention module to capture long-range spatiotemporal dependencies, leveraging pairwise similarity scores across input and memory positions.

Luo et al. [18] proposed a Spatial-Temporal Diffusion Convolutional Network (ST-DCN) to address limitations in modeling dynamic spatio-temporal dependencies for taxi demand forecasting. Their approach integrates a two-phase graph diffusion convolutional network with an attention mechanism to model spatial dependencies, while a temporal convolution module captures long-term trends, including recent, daily, and weekly patterns. The use of stacked convolution layers further enhances the model’s ability to process extended timeseries sequences.

Li et al. [19] proposed a hierarchical framework for proactive bike redistribution in bike-sharing systems. A bipartite clustering algorithm groups stations, followed by citywide rental prediction using a Gradient Boosting Regression Tree. Rental proportions and inter-cluster transitions are then estimated via a multi-similarity-based model to predict station-level rentals and returns. Experiments on real-world datasets from New York City and Washington D.C. indicate significant performance gains over baseline models, particularly during periods of atypical demand.

2.3. Ensemble and Hybrid Models

The concept of ensemble learning, which combines the predictions of multiple simple models into a new, more robust model, has been extensively explored in the literature and has been shown to produce better results than its individual constituent models. Building on this idea, Jin et al. [20] developed a stacking ensemble learning framework that integrates the predictions of three distinct base learners: Random Forest, LightGBM, and LSTM networks. These base learners were trained in parallel on a common set of input features, including temporal, spatial, and weather-related variables. The predictions of the base learners were then aggregated by a Support Vector Regression (SVR) meta-learner, which produced the final demand forecast values.

2.4. Positioning of Our Work with Respect to State-of-the-Art

Our work diverges from these approaches in multiple ways. First, our methodology employs a unified modeling workflow that can tackle the problem in two alternative ways, by either regression or classification, allowing it to adjust to specific needs. Additionally, our feature engineering pipeline integrates both spatial and temporal features, which are fed to a tuned gradient boosting model, enabling us to better identify the intricacies of urban micro-mobility. Lastly, by utilizing two real-world datasets for evaluation, our method can be validated under different scenarios, supporting its effectiveness and robustness. Nevertheless, as part of our future research, we plan to evaluate related work and our approach under the same settings over the same real datasets for a fair comparison.

3. Our Approach for Efficient Shared Micro-Mobility Demand Forecasting

In this section, we define the shared mobility demand forecasting problem and present our proposed approach.

3.1. Problem Definition

Generally speaking, a spatio-temporal timeseries is a sequence of data points, each associated with a time index and a spatial location. For the purposes of our work, an area of interest (a city, a region, etc.) is partitioned into a set of geographical districts, each represented as a polygon. We denote as $x_{i,t}$ the percentile-encoded demand of micro-mobility vehicles within district $i$ at time $t$. Given $p$ past observations for district $i$, we construct its spatio-temporal timeseries as $X_i = \{x_{i,t-p}, x_{i,t-p+1}, \ldots, x_{i,t-1}, x_{i,t}\}$. The problem addressed in this paper is to predict the micro-mobility demand value $x_{i,t+h}$ for district $i$, $h$ time steps ahead, given $X_i$, i.e., the current value $x_{i,t}$ along with the $p$ past observations.
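To make the definition concrete, the sketch below turns one district's demand series into supervised (input, target) pairs. Each input holds the current value together with the p preceding values, and each target is the value h steps ahead. It is a minimal illustration under our own assumptions: a plain Python list with one value per time step and a hypothetical helper `make_samples`. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def make_samples(series: Sequence[float], p: int, h: int) -> List[Tuple[List[float], float]]:
    """Build (window, target) pairs for a single district.

    Each window holds the p past values plus the current one,
    [x_{t-p}, ..., x_{t-1}, x_t], and the target is x_{t+h}.
    """
    samples = []
    # t needs p values before it and h values after it
    for t in range(p, len(series) - h):
        window = list(series[t - p : t + 1])
        samples.append((window, series[t + h]))
    return samples


demand = [3, 5, 4, 6, 8, 7, 9]  # toy per-step demand for one district
pairs = make_samples(demand, p=2, h=1)
print(pairs[0])  # ([3, 5, 4], 6)
```

Changing `h` while keeping the same windows yields the training data for a different horizon. This matches the per-horizon models described in Section 3.2.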

We exemplify the above definition through a real case extracted from our dataset (dataset details are presented in Section 4). As illustrated in Figure 1, the problem at hand is to predict, as accurately as possible, the anticipated values of each district’s demand timeseries (in orange), given the current and some past demand values (in blue) and taking into account relevant temporal, spatial, and contextual features. To achieve this goal efficiently, seasonal and other recurring patterns hidden in these timeseries should be uncovered. For instance, in the timeseries of the two major Rotterdam districts displayed in Figure 1, we identify seasonal commuting patterns: the highest and most spread-out peak occurs on Friday, due to a combination of leisure and work trips. This is made more evident in Figure 2, which shows the combination of the hour-of-day and day-of-week features in the Rotterdam Centrum district.
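The hour-of-day by day-of-week view shown in Figure 2 can be approximated by averaging demand per (day, hour) cell. The sketch below is a hypothetical helper built on the standard library only. The record format and the name `weekly_profile` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def weekly_profile(records):
    """Mean demand per (day-of-week, hour-of-day) cell for one district.

    records: iterable of (timestamp, demand) pairs.
    Day-of-week follows datetime.weekday(): 0 = Monday, 6 = Sunday.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, demand in records:
        cell = (ts.weekday(), ts.hour)
        sums[cell] += demand
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


records = [
    (datetime(2024, 11, 15, 17, 0), 40),   # Friday 17:00
    (datetime(2024, 11, 15, 17, 30), 50),  # Friday 17:30
    (datetime(2024, 11, 18, 9, 0), 20),    # Monday 09:00
]
profile = weekly_profile(records)
print(profile[(4, 17)])  # 45.0
```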

3.2. Methodology

We propose the so-called Gradient Boosting Demand Predictor in two variations, differentiated by their output type:
• A $K$-class classifier (per horizon per district)
• A regressor (per horizon per district)

The purpose of the classifier is to classify the future demand of a district into predefined levels (e.g., ’Low’, ’Medium’, ’High’), whereas the regressor aims to predict the actual demand values for the respective horizon. In other words, for an area of interest partitioned into $N$ districts, with the goal of $H$ individual prediction horizons, both variants require training $N \times H$ models, where each model predicts the output for a specific district-horizon pair. The reasoning behind training individual models per district is to allow each model to specialize in its respective spatial region, focusing exclusively on its unique spatial and temporal patterns. Nevertheless, in our future work we plan to assess the efficiency of a single global model for all districts, where the district ID will be given as input to the model.
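The one-model-per-(district, horizon) layout can be sketched as follows. The authors' models are XGBoost-based (see Section 4.3). To keep the example dependency-free, we swap in a deliberately minimal gradient boosting regressor that uses depth-1 trees (stumps) under squared loss. The class `StumpBoostingRegressor`, the helper `train_all` and the toy data are hypothetical. Only the layout of one independent model per district-horizon pair mirrors the text. The classifier variant would follow the same layout with a classification loss.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


class StumpBoostingRegressor:
    """Minimal gradient boosting for squared error using depth-1 trees (stumps).

    For squared loss the negative gradient is the residual y - F(x), so each
    round fits a stump to the current residuals and adds a shrunken copy of it.
    """

    def __init__(self, n_rounds: int = 50, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    @staticmethod
    def _fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
        best = None  # (sse, feature, threshold, left value, right value)
        for j in range(len(X[0])):
            for thr in sorted({row[j] for row in X}):
                left = [ri for row, ri in zip(X, r) if row[j] <= thr]
                right = [ri for row, ri in zip(X, r) if row[j] > thr]
                if not right:
                    continue  # the largest value leaves the right side empty
                lv, rv = sum(left) / len(left), sum(right) / len(right)
                sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
                if best is None or sse < best[0]:
                    best = (sse, j, thr, lv, rv)
        if best is None:  # every feature is constant: fall back to the mean residual
            m = sum(r) / len(r)
            return (0, float("inf"), m, m)
        return best[1:]

    @staticmethod
    def _stump_value(stump: Stump, row: Sequence[float]) -> float:
        j, thr, lv, rv = stump
        return lv if row[j] <= thr else rv

    def fit(self, X, y):
        self.base = sum(y) / len(y)  # initial prediction: the mean target
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = self._fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [pi + self.learning_rate * self._stump_value(stump, row) for pi, row in zip(pred, X)]
        return self

    def predict_one(self, row: Sequence[float]) -> float:
        return self.base + sum(self.learning_rate * self._stump_value(s, row) for s in self.stumps)


def train_all(datasets: Dict[Tuple[Hashable, int], Tuple[list, list]]) -> Dict[Tuple[Hashable, int], StumpBoostingRegressor]:
    """Train one independent model per (district, horizon) key."""
    return {key: StumpBoostingRegressor().fit(X, y) for key, (X, y) in datasets.items()}


X = [[0.0], [1.0], [2.0], [3.0]]
datasets = {
    ("Centrum", 5): (X, [1.0, 1.0, 5.0, 5.0]),
    ("Centrum", 15): (X, [2.0, 2.0, 2.0, 8.0]),
}
models = train_all(datasets)
print(len(models))  # 2
```

With N districts and H horizons, `datasets` would hold N × H keys. Each model sees only its own district's features, which is the specialization argument made above.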

As illustrated in Figure 3, the first step of our methodology is to partition the area of interest into districts, modeled as an adjacency graph that allows us to extract the neighbors of each district. Two districts are defined as neighbors if they share a common boundary.¹ The demand values of the target district and of its neighboring districts are represented by the respective spatio-temporal timeseries (step 2), as defined in Section 3.1. These are fed into a feature extraction pipeline (step 3), detailed in Section 3.3. Finally, all these features are combined into a single, augmented dataset (step 4), which is used to train our two models, the classifier and the regressor (step 5).
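Building the adjacency graph can be sketched in plain Python. This version assumes that the district polygons come from a common tessellation, so adjacent districts share identical vertex pairs along their common boundary. Real boundary data would usually call for a geometry library predicate instead, for example shapely's `touches`, which also accepts single-point contact. The helper names and toy polygons are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Point = Tuple[float, float]


def polygon_edges(polygon: List[Point]) -> Set[FrozenSet[Point]]:
    """Undirected edges of a polygon given as a vertex list (closed or open ring)."""
    pts = polygon[:-1] if polygon[0] == polygon[-1] else polygon
    return {frozenset((pts[k], pts[(k + 1) % len(pts)])) for k in range(len(pts))}


def district_adjacency(districts: Dict[str, List[Point]]) -> Dict[str, Set[str]]:
    """Neighbors = districts that share at least one identical boundary edge."""
    edges = {name: polygon_edges(poly) for name, poly in districts.items()}
    neighbors: Dict[str, Set[str]] = {name: set() for name in districts}
    for a, b in combinations(districts, 2):
        if edges[a] & edges[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


districts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
print(district_adjacency(districts))  # {'A': {'B'}, 'B': {'A'}, 'C': set()}
```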

3.3. Feature Extraction

Given the timeseries $X_i$ of a target district $i$, feature extraction is performed to enhance the spatio-temporal modeling capabilities of the proposed method. The extracted features are categorized into four primary groups: time-related, lagged, rolling, and exponentially weighted features. These features are derived not only from the target district but also from its neighboring districts, thereby incorporating both spatial and temporal dependencies.

Time-related features capture temporal patterns influencing the target variable. Key attributes include the hour of the day (0-23), to identify daily patterns such as peak periods, and the day of the week (0-6, with 0 as Monday), to capture weekly trends. Monthly variations are represented by the month (0-11, with 0 as January), accounting for seasonal behaviors. Additionally, the minute of the hour (0-59) captures finer, intra-hour variations.
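The four calendar encodings just described map directly onto Python's `datetime` attributes. The sketch below shows that mapping. The helper name `time_features` is hypothetical, and the dictionary keys reuse the feature names listed in Table 1.

```python
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Calendar features with the encodings described in the text."""
    return {
        "hour-of-day": ts.hour,         # 0-23
        "minute-of-hour": ts.minute,    # 0-59
        "day-of-week": ts.weekday(),    # 0 = Monday, 6 = Sunday
        "month-of-year": ts.month - 1,  # 0 = January, 11 = December
    }


print(time_features(datetime(2025, 1, 15, 8, 42)))
# {'hour-of-day': 8, 'minute-of-hour': 42, 'day-of-week': 2, 'month-of-year': 0}
```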

The second category of features, referred to as lagged features, is designed to capture the temporal dependencies between past observations of the timeseries and its current state. For each district, a set of lagged features is generated by systematically shifting the original timeseries observations backward in time. Formally, given the timeseries $x_{i,t}$ of the observed values of a target district $i$ at time $t$, the lagged feature at a temporal offset $k$ is defined as $x_{i,t-k}$. A range of temporal offsets is selected to ensure that both short-term and long-term dependencies are captured.
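The shift operation can be sketched as follows for a single district series. The sketch assumes one observation per minute, since the data are polled every 60 seconds (Section 4.1), so a k-step lag is a k-minute lag. The helper `lag_features` and its `prefix` argument are hypothetical. Calling it on a neighbor's series with a different prefix would produce the neighbor lags. The offsets 1, 5, 10 and 15 are the ones used in the text.

```python
def lag_features(series, lags=(1, 5, 10, 15), prefix="target"):
    """Return one dict of lagged values per time step.

    Lag k at position t is series[t - k]; positions with t - k < 0 get None,
    because no past observation exists yet (such rows are typically dropped).
    """
    rows = []
    for t in range(len(series)):
        rows.append({f"{prefix}_lag_{k}": (series[t - k] if t - k >= 0 else None) for k in lags})
    return rows


rows = lag_features(list(range(20)))  # toy series with x_t = t
print(rows[15])  # {'target_lag_1': 14, 'target_lag_5': 10, 'target_lag_10': 5, 'target_lag_15': 0}
```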

Specifically, the lagged features are computed at offsets of 1, 5, 10, and 15 minutes, capturing short-term temporal dependencies across multiple time horizons. This shifting operation effectively creates additional feature representations of the past states of the system, allowing the model to learn historical patterns that influence future predictions. By incorporating lagged features into the predictive framework, the approach enhances its ability to capture recurrent patterns, trends, and autocorrelations inherent in spatio-temporal mobility data.

¹ The inclusion of neighboring districts is justified by the limited spatial range and temporal duration of trips usually made by e-bikes and e-scooters. Due to their physical constraints, micro-mobility vehicles are unlikely to reach distant districts within short timeframes, so considering only neighboring districts is sufficient for effective spatio-temporal modeling.

Table 1: Features used in our methodology.

Feature                              Description
target                               Target variable for the current district at the present time step
neighbor_*                           Feature variable of each of the target's neighbors in neighboring districts
hour-of-day                          Hour of the day of the timestamp (0-23)
minute-of-hour                       Minute of the hour of the timestamp (0-59)
day-of-week                          Day of the week of the timestamp (0=Monday, 6=Sunday)
month-of-year                        Month of the year of the timestamp (0=January, 11=December)
neighbor_lag_{1, 5, 10, 15}          Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for neighboring districts
target_lag_{1, 5, 10, 15}            Lagged value ({1, 5, 10, 15} step(s) back) of the target variable for the current district
neighbor_rolling_mean_{5, 10, 15}    Rolling mean of neighbors over the last {5, 10, 15} observations
target_rolling_mean_{5, 10, 15}      Rolling mean of the target variable over the last {5, 10, 15} observations
neighbor_ewm_{5, 10, 15}             Exponentially weighted moving average of neighbors with a span of {5, 10, 15}
target_ewm_{5, 10, 15}               Exponentially weighted moving average of the target variable with a span of {5, 10, 15}

The rolling features provide a smoothed representation of the data over various intervals, capturing localized fluctuations that are critical for accurate forecasting. These features are computed using sliding windows, in which the observations of a sequence of recent time steps are aggregated to encapsulate short-term trends in mobility patterns. For each district, rolling windows of 5, 10, and 15 minutes are applied to the timeseries data. Within each window, the mean is computed to capture the local central tendency of demand. Larger windows yield smoother estimates that are less sensitive to short-term variability.
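A trailing rolling mean over the last w observations, with the current step included, can be sketched as follows. The sketch assumes one observation per minute, so the windows of 5, 10 and 15 observations equal 5, 10 and 15 minutes. The helper name is hypothetical. Leaving incomplete windows empty is our own choice; the text does not say how the series start is handled.

```python
def rolling_means(series, windows=(5, 10, 15), prefix="target"):
    """Trailing rolling means over the last w observations (inclusive of t).

    Positions with fewer than w observations are left as None, i.e. a full
    window is required before a value is emitted.
    """
    rows = []
    for t in range(len(series)):
        row = {}
        for w in windows:
            if t + 1 >= w:
                window = series[t + 1 - w : t + 1]
                row[f"{prefix}_rolling_mean_{w}"] = sum(window) / w
            else:
                row[f"{prefix}_rolling_mean_{w}"] = None
        rows.append(row)
    return rows


rows = rolling_means([float(v) for v in range(1, 16)])
print(rows[-1])  # {'target_rolling_mean_5': 13.0, 'target_rolling_mean_10': 10.5, 'target_rolling_mean_15': 8.0}
```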

Finally, the objective of the exponentially weighted features is to prioritize recent observations while retaining information from historical data. This is achieved by assigning progressively smaller weights to older data points, so that the model places greater emphasis on more recent trends. Among these methods, the Exponentially Weighted Moving Average (EWMA) is particularly effective, as it assigns exponentially decreasing weights to past observations, thereby capturing temporal dependencies in a manner that balances adaptability and historical context.

For a given timeseries $x_{i,t}$, the EWMA at time $t$ is computed recursively as $S_{i,t} = \alpha x_{i,t} + (1 - \alpha) S_{i,t-1}$, where $\alpha$ represents the smoothing factor, typically defined as $\alpha = 2/(w+1)$ for a given window size (span) $w$. This formulation allows the predictive framework to swiftly adapt to evolving patterns, including sudden variations induced by external factors such as weather conditions, road incidents, or public events, while maintaining a stable representation of long-term trends.
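The recursion can be written in a few lines. With spans of 5, 10 and 15, the smoothing factor is 1/3, 2/11 and 1/8, respectively. The sketch initialises the recursion with the first observation. This is an assumption, since the text does not state an initial value, but it matches pandas' `ewm(span=..., adjust=False)`. The helper name `ewma` is hypothetical.

```python
def ewma(series, span):
    """Recursive EWMA: S_t = alpha * x_t + (1 - alpha) * S_{t-1}, alpha = 2 / (span + 1).

    Initialised with S_0 = x_0.
    """
    alpha = 2.0 / (span + 1)
    out = []
    s = None
    for x in series:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out


print(ewma([0.0, 10.0, 10.0], span=3))  # [0.0, 5.0, 7.5]
```

A larger span shrinks the smoothing factor, so a sudden jump in demand is absorbed more slowly. The three spans give the model both a fast-reacting and a more stable view of the same series.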

The features used in our methodology are summarized in Table 1.

4. Experimental Study

This section describes the data and preprocessing steps used in our experimental study (Section 4.1), our findings (Section 4.2), and a note on feature importance (Section 4.3). The experiments were conducted on a server with an AMD EPYC 64-core CPU and an NVIDIA A100 GPU with 40 GB of memory. Our source code is publicly available for reproducibility purposes.²

4.1. Experimental Setup

In our study, we utilize shared micro-mobility data provided through an open API³ by Deelfiets Nederland, a popular micro-mobility service provider in the Netherlands. The data consist of real-time GPS positions of vehicles in the Netherlands, which we query every 60 seconds, yielding a dataset of sufficient granularity for our use case. For each vehicle, the dataset includes its latitude, longitude, vehicle type, and company name. For the purposes of this study, we utilize mobility data from two major metropolitan areas, Amsterdam and Rotterdam, focusing specifically on the ten most densely populated districts in each city. These districts exhibit diverse urban characteristics and distinct spatio-temporal patterns, making them well suited for evaluating the model’s adaptability to localized mobility trends. The temporal coverage of our data is about 2 months for Amsterdam (November 11, 2024 - January 15, 2025) and about 5 months for Rotterdam (August 06, 2024 - January 15, 2025). Since the original data source is a stream rather than a static dataset, we append a timestamp to the response of each request.

² https://github.com/DataStories-UniPi/Shared-Mobility.git
³ https://api.deelfietsdashboard.nl/dashboard-api/public/vehicles_in_public_space

To derive demand estimates at the district level, a spatial filtering and aggregation procedure is applied. First, a point-polygon intersection is performed to associate each mobility trace with its corresponding district boundaries, discarding traces that fall outside the predefined study areas. Next, the filtered traces are aggregated based on their unique spatio-temporal identifiers (i.e., timestamp and district ID), summing the occurrences within each region to obtain demand estimates. This transformation preserves the spatial and temporal integrity of the data while ensuring compatibility with the forecasting models. In its final form, the dataset consists of 89,022 observations across 97 districts in Amsterdam and 218,325 observations across 22 districts in Rotterdam.
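The point-polygon intersection and aggregation steps can be sketched with a ray-casting point-in-polygon test, which is one standard technique. The text does not say which spatial tooling was used, so this choice is an assumption. The helper names and the (timestamp, lon, lat) trace layout are also assumptions. A point lying exactly on a shared boundary may be assigned to either district.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle on each polygon edge crossed by a ray from (x, y) towards +x."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_demand(traces, districts):
    """Count traces per (timestamp, district); traces outside every district are discarded.

    traces: iterable of (timestamp, lon, lat); districts: name -> list of (lon, lat) vertices.
    """
    counts = Counter()
    for ts, lon, lat in traces:
        for name, poly in districts.items():
            if point_in_polygon(lon, lat, poly):
                counts[(ts, name)] += 1
                break  # assumes districts do not overlap
    return counts


districts = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)], "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
traces = [("t0", 0.5, 0.5), ("t0", 0.2, 0.8), ("t0", 1.5, 0.5), ("t0", 5, 5), ("t1", 0.5, 0.5)]
counts = aggregate_demand(traces, districts)
print(counts[("t0", "A")])  # 2
```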

To validate our models, we use a temporal 70/30 train-test split. To ensure effective training of the classifier, additional preprocessing steps were taken beyond the feature extraction process described above. To this end, a quantile-based discretization function was employed to transform continuous density values into three discrete demand levels (’Low’, ’Medium’, ’High’, as defined in the following paragraph). This transformation enhances the classifier’s ability to model class distinctions and mitigates the challenges associated with imbalanced class distributions.
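A temporal split differs from a random split because the test set must lie strictly after the training set. Otherwise future observations could leak into training. The sketch below is a hypothetical helper that assumes each sample carries a sortable timestamp as its first element.

```python
def temporal_split(samples, train_frac=0.7):
    """Chronological train/test split: sort by time, then cut; no shuffling.

    samples: list of (timestamp, features, label) tuples, in any order.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


samples = [(t, [float(t)], t % 3) for t in (5, 2, 9, 0, 7, 1, 8, 3, 6, 4)]
train, test = temporal_split(samples)
print([s[0] for s in test])  # [7, 8, 9]
```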

We define ’Medium’ demand as the range where demand levels are centered around 50% of the value of the day with the highest demand, which serves as the baseline value. Specifically, for a given threshold $\theta \in (0, 50)$, the range $[50 - \theta, 50 + \theta)$ is categorized as ’Medium’ demand. Demand levels below (above) this range are classified as ’Low’ (’High’, respectively) demand. In our experiments, we set $\theta = 17$, so that the three demand levels correspond to approximately equally sized ranges, namely [0, 33), [33, 67), and [67, 100]. This threshold provides a balanced distribution of data across the defined demand categories, facilitating a robust analysis of varying demand patterns.
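The three-level mapping can be sketched as a simple threshold function. The sketch assumes that the caller supplies the baseline, that is the peak-day demand value, and that demand is expressed as a percentage of it. The text leaves the exact computation of the baseline open. The helper name is hypothetical.

```python
def demand_level(value, baseline, theta=17.0):
    """Map demand to 'Low' / 'Medium' / 'High' relative to a baseline value.

    The value is converted to a percentage of the baseline; 'Medium' covers
    [50 - theta, 50 + theta), everything below is 'Low', above is 'High'.
    """
    pct = 100.0 * value / baseline
    if pct < 50.0 - theta:
        return "Low"
    if pct < 50.0 + theta:
        return "Medium"
    return "High"


print([demand_level(v, baseline=200) for v in (40, 100, 150)])  # ['Low', 'Medium', 'High']
```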

For hyperparameter tuning, we employ Bayesian Optimization, due to its ability to balance exploration and exploitation effectively. This approach explores the search space for better solutions while simultaneously exploiting regions with high potential, thereby reducing the number of iterations required. Its suitability is particularly evident for expensive evaluations, such as the computationally intensive training of ML models, as it leverages prior evaluations to guide sampling and use resources efficiently. Additionally, the probabilistic framework of Bayesian Optimization enables automatic adjustment of the exploration-exploitation trade-off through its acquisition functions.
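The exploration-exploitation trade-off is easiest to see in the acquisition function. The text does not name the surrogate model or the acquisition function. The sketch below therefore assumes Expected Improvement (EI), one common choice, applied to a loss to be minimised. The candidate's surrogate posterior is assumed Gaussian with mean `mu` and standard deviation `sigma`. This is only the scoring step, not a full optimizer.

```python
import math


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected Improvement (minimisation) of a candidate with posterior N(mu, sigma^2).

    best: lowest loss observed so far; xi: small margin favouring exploration.
    The first term rewards a low predicted loss (exploitation); the second
    grows with sigma and rewards uncertain regions (exploration).
    """
    if sigma <= 0.0:
        return max(best - mu - xi, 0.0)
    z = (best - mu - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu - xi) * cdf + sigma * pdf


confident = expected_improvement(mu=0.29, sigma=0.01, best=0.30)
uncertain = expected_improvement(mu=0.35, sigma=0.10, best=0.30)
print(uncertain > confident)  # True
```

In this toy case, a candidate with a worse predicted loss but high uncertainty scores higher than a confident candidate with a marginal gain. This is the automatic exploration behaviour mentioned above.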

4.2. Experimental Results

In this section, we assess the overall performance of our models across multiple districts and timestamps. Table 2 summarizes the training and inference times of our models across different prediction horizons in the two cities of interest. Training time corresponds to the overall time required to train the respective model, whereas inference time corresponds to the average time taken to generate a single prediction for a given horizon. Training is fast, taking at most 1-2 seconds, while inference time is on the order of 1 microsecond. These results demonstrate the computational efficiency of the models and indicate their scalability and suitability for real-world spatio-temporal forecasting tasks.

4.3. A Note on Feature Importance

To assess the effect of the underlying features on the prediction quality of the classifier and the regressor, we present the average feature importance across all prediction horizons for both models. The feature importance values were computed using XGBoost’s built-in feature importance computation. Because some features contribute minimally (importance values below $10^{-4}$), the y-axis of the corresponding plots is log-scaled to improve visualization. The results of this analysis are grouped by city and illustrated in Figures 4 and 5.
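Averaging importances across horizons needs one decision: what to do with features that a given horizon's model never used. XGBoost's `Booster.get_score` omits features that appear in no split. The sketch below therefore treats missing entries as zero importance. The input format (one dict per horizon model) and the helper name are assumptions for illustration.

```python
def average_importance(per_horizon):
    """Average each feature's importance over horizon models, sorted descending.

    per_horizon: list of dicts (feature -> importance), one per horizon model;
    a feature absent from a dict counts as 0 for that horizon.
    """
    features = set().union(*per_horizon)
    avg = {f: sum(d.get(f, 0.0) for d in per_horizon) / len(per_horizon) for f in features}
    return sorted(avg.items(), key=lambda kv: kv[1], reverse=True)


per_horizon = [
    {"target_ewm_5": 0.6, "target_lag_1": 0.3, "neighbor_lag_1": 0.1},
    {"target_ewm_5": 0.5, "target_lag_1": 0.4},
]
ranking = average_importance(per_horizon)
print([name for name, _ in ranking])  # ['target_ewm_5', 'target_lag_1', 'neighbor_lag_1']
```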

In both figures, the average contribution of the features related to the target district is higher than that of the neighbor-related features, as expected. Overall, in decreasing order of importance, the most important features are target_ewm_*, target_rolling_mean_*, and target_lag_*, denoting the EWMA, the rolling mean, and the lagged value of the target variable, respectively.

5. Conclusions

This study investigated spatio-temporal forecasting methods for density prediction in urban environments. By integrating spatial dependencies with temporal trends, the proposed methodology effectively captured localized patterns, generating accurate forecasts across multiple horizons. Key contributions include a flexible feature engineering pipeline that incorporates both intra- and inter-district interactions, alongside the application of efficient gradient boosting architectures. This approach enhances predictive accuracy while minimizing computational overhead, rendering it suitable for large-scale forecasting tasks. Experimental results demonstrated the efficacy of gradient-boosted density predictors for both regression and classification, exhibiting competitive performance across varying prediction horizons. Although accuracy slightly declines for longer horizons, the model remains robust, underscoring its adaptability and practical applicability in real-world mobility forecasting.

This research also unveils opportunities for future exploration. Specifically, the integration of feature interactions and their contributions to the overall model quality warrant further investigation. Incorporating external factors, such as weather conditions or public events, could potentially enhance predictive performance. Additionally, applying the proposed methodology to other geographical areas and diverse mobility scenarios would help validate its generalization capabilities and adaptability to varying contexts. Finally, a comprehensive experimental comparison with related work under common settings is planned to facilitate fair benchmarking.

In conclusion, this study showcases the ability of data-driven approaches to effectively tackle spatio-temporal forecasting challenges. By leveraging the inherent spatial segmentation of cities into districts, the methodology enables the extraction of localized temporal patterns, facilitating more informed decision-making and contributing to smarter, more efficient urban planning from the perspective of mobility data science [21].

Acknowledgments

This work was supported in part by the Horizon Framework Programme of the European Union under grant agreement No. 101093051 (EMERALDS; https://www.emeraldshorizon.eu/).

Declaration on Generative AI

During the preparation of this work, the authors used Large Language Models in order to paraphrase and reword. After using this tool/service, the authors reviewed and edited the content as needed and take full responsibility for the publication’s content.

[1] M. N. Mladenović, Mobility as a Service, in: R. Vickerman (Ed.), International Encyclopedia of Transportation, Elsevier, 2021, pp. 12-18. doi:10.1016/B978-0-08-102671-7.10607-4.

[2] F. O'Donncha, Hu, et al., A spatio-temporal LSTM model to forecast across multiple temporal and spatial scales, Ecological Informatics 69 (2022) 101687. doi:10.1016/j.ecoinf.2022.101687.

[3] Yu, Yin, et al., Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3634-3640. doi:10.24963/ijcai.2018/505.

[4] Zheng, Fan, et al., GMAN: A Graph Multi-Attention Network for Traffic Prediction, Proceedings of the AAAI Conference on Artificial Intelligence 34 (2020) 1234-1241. doi:10.1609/aaai.v34i01.5477.

[5] Yang, Jin, et al., A Survey on Diffusion Models for Time Series and Spatio-Temporal Data, arXiv abs/2404.18886 (2024). doi:10.48550/arXiv.2404.18886.

[6] Li, Yu, et al., Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting, arXiv:1707.01926 (2017). doi:10.48550/arXiv.1707.01926.

[7] Pelekis, Theodoridis, Mobility Data Management and Exploration, Springer, New York, NY, 2014. doi:10.1007/978-1-4939-0392-4.

[8] Liao, Zeng, et al., Taxi demand forecasting based on the temporal multimodal information fusion graph neural network, Applied Intelligence 52 (2022) 12077-12090. doi:10.1007/s10489-021-03128-1.

[9] Geng, Li, et al., Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting, Proceedings of the AAAI Conference on Artificial Intelligence 33 (2019) 3656-3663. doi:10.1609/aaai.v33i01.33013656.

[10] Wu, Pan, et al., A Comprehensive Survey on Graph Neural Networks, IEEE Transactions on Neural Networks and Learning Systems 32 (2021) 4-24. doi:10.1109/TNNLS.2020.2978386.

[11] Grinsztajn, Oyallon, et al., Why do tree-based models still outperform deep learning on typical tabular data?, in: Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS '22), 2022, pp. 507-520. doi:10.48550/arXiv.2207.08815.

[12] Feng, Predicting the Dynamic Demand of Bike-Sharing System in Chicago with Divvy Operation Data: A Data-Driven approach for bike-sharing demand forecasting, in: Proceedings of the 5th International Conference on E-Commerce, E-Business and E-Government, ICEEG '21, 2021, pp. 30-34. doi:10.1145/3466029.3466035.

[13] Xiao, Kong, et al., Short-Term Demand Forecasting of Urban Online Car-Hailing Based on the K-Nearest Neighbor Model, Sensors 22 (2022). doi:10.3390/s22239456.

[14] Sohrabi, Ermagun, Dynamic bike sharing traffic prediction using spatiotemporal pattern detection, Transportation Research Part D: Transport and Environment 90 (2021) 102647. doi:10.1016/j.trd.2020.102647.

[15] Hankey, Zhang, et al., Predicting bicycling and walking traffic using street view imagery and destination data, Transportation Research Part D: Transport and Environment 90 (2021) 102651. doi:10.1016/j.trd.2020.102651.

[16] H. T. K. Le, R. Buehler, et al., Correlates of the Built Environment and Active Travel: Evidence from 20 US Metropolitan Areas, Environmental Health Perspectives 126 (2018). doi:10.1289/EHP3389.

[17] Ge, Li, et al., Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand, Sustainability 14 (2022). doi:10.3390/su14127371.

[18] Luo, Shangguan, et al., Spatial-Temporal Diffusion Convolutional Network: A Novel Framework for Taxi Demand Forecasting, ISPRS Int. J. Geo-Inf. 11 (2022) 193. doi:10.3390/ijgi11030193.

[19] Li, Zheng, et al., Traffic prediction in a bike-sharing system, in: Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '15, 2015. doi:10.1145/2820783.2820837.

[20] Jin, Ye, et al., Demand Forecasting of Online Car-Hailing With Stacking Ensemble Learning Approach and Large-Scale Datasets, IEEE Access 8 (2020) 199513-199522. doi:10.1109/ACCESS.2020.3034355.

[21] Mokbel, Sakr, et al., Mobility Data Science: Perspectives and Challenges, ACM Trans. Spatial Algorithms Syst. 10 (2024). doi:10.1145/3652158.