<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3652158</article-id>
      <title-group>
        <article-title>Short-Term Traffic Congestion Prediction Using Machine Learning: XGBoost vs. Deep Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>George S. Theodoropoulos</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zoi Nikolarakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yannis Theodoridis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Muenster</institution>
          ,
          <addr-line>Leonardo-Campus 3, Muenster, 48149</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Piraeus</institution>
          ,
          <addr-line>Karaoli ke Dimitriou 80, Piraeus, 185 34</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <volume>8</volume>
      <fpage>24</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>Recurring large-scale traffic congestion is one of the most important challenges that modern cities face, necessitating accurate prediction models to enable timely interventions. While Deep Learning approaches such as Graph Convolutional Networks (GCNs) and Long Short-Term Memory networks (LSTMs) have been widely adopted for capturing spatio-temporal dependencies, they are often accompanied by high computational costs and lack interpretability. This paper presents a comprehensive comparison of these deep learning methods against a Gradient Boosting approach (XGBoost) enhanced by systematic feature engineering for short-term traffic congestion prediction, formulated as a binary classification problem. We evaluate the models on two distinct datasets, the public PEMS-BAY and the proprietary Dutch NDW, across prediction horizons ranging up to 1 hour (in 10-minute intervals). Our results demonstrate that the XGBoost-based model significantly outperforms GCN and LSTM in terms of prediction accuracy (F1 score) while requiring substantially fewer computational resources, achieving up to 3 orders of magnitude savings in inference time. Additionally, we highlight the inherent explainability of the tree-based approach, which provides actionable insights into congestion propagation patterns, offering a practical and transparent solution for real-world urban traffic management systems.</p>
      </abstract>
      <kwd-group>
        <kwd>traffic congestion prediction</kwd>
        <kwd>XGBoost</kwd>
        <kwd>graph convolutional networks</kwd>
        <kwd>LSTM</kwd>
        <kwd>urban traffic management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Urban traffic congestion has become an increasingly pressing issue in cities around the world, affecting
millions of commuters daily and costing economies billions of dollars each year [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As cities continue
to grow and more people move into urban areas, the problem of gridlock during peak hours has only
gotten worse, leading to longer commute times, higher fuel consumption, and increased air pollution.
For city planners and traffic engineers, finding ways to predict when and where congestion will occur
has become essential for developing effective traffic management strategies. Being able to anticipate
traffic jams before they happen allows for timely interventions such as adjusting traffic signal timings,
implementing dynamic lane management, or providing real-time navigation alternatives to drivers
through mobile applications. However, accurately forecasting congestion remains a challenging task
because traffic patterns are influenced by numerous factors including time of day, day of week, weather
conditions, special events, and the interconnected nature of road networks where a problem in one
location often quickly spreads to others. Figure 1 visually presents such congestion events in an
urban traffic network, where a sensor is experiencing congestion (i.e., the average speed is below the
congestion threshold that is represented by the red dotted line) while others are not.
      </p>
      <p>In recent years, researchers have explored various machine learning approaches to tackle this complex
prediction problem, with many studies focusing on deep learning techniques that can capture both
spatial relationships between road segments and temporal patterns in traffic flow [2]. Despite their
demonstrated potential, these methods often require substantial computational resources and can
be difficult to interpret, making it hard for transportation professionals to understand why certain
predictions are made.</p>
      <p>Our work adopts a different approach by employing an efficient machine-learning solution (XGBoost)
and comparing it with deep learning models, namely Graph Convolutional Networks (GCNs) and Long
Short-Term Memory networks (LSTMs), to determine which provides the most effective solution for
real-world traffic management applications. In this paper, we present a comprehensive evaluation of these
three approaches for short-term traffic congestion prediction, defined as forecasting whether specific
road segments will experience significant slowdowns within the next 10–60 minutes. We formalize
this as a binary classification problem where congestion is identified based on speed measurements
falling below a certain threshold relative to historical patterns at each location. Our experiments on
two different datasets show that systematic feature engineering leads to an XGBoost model that is
more accurate, efficient, and interpretable, turning what would otherwise be a black-box system into a
valuable diagnostic tool for improving urban mobility.</p>
      <p>The remainder of this paper is structured as follows: Section 2 reviews related work on traffic
state prediction, contrasting traditional statistical methods with modern deep learning architectures.
Section 3 formally defines the short-term traffic congestion prediction problem as a binary classification
task. Section 4 describes the deep learning baselines used in this study, detailing the GCN and LSTM
approaches. Section 5 presents the proposed XGBoost-based methodology, emphasizing the systematic
five-phase feature engineering pipeline. Section 6 reports the experimental results on the PEMS-BAY
and NDW datasets, evaluating prediction accuracy and computational efficiency. Finally, Section 7
concludes the paper and discusses the implications for practical urban traffic management systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Early efforts to understand and mitigate the problems that urban traffic congestion introduces focused
on broad conceptual frameworks, as seen in Afrin &amp; Yodo’s [3] comprehensive survey. They emphasized
the need for holistic, sustainable solutions that integrate infrastructure, traffic management, and policy,
highlighting how data-driven approaches and smart mobility concepts are becoming essential for
building resilient transportation systems.</p>
      <p>Alghamdi et al. [4] demonstrated the utility of ARIMA models for short-term traffic forecasting,
showing they can effectively capture linear, stationary patterns in historical traffic data with reasonable
computational efficiency. However, they also noted a key limitation: these models often struggle
with the complex, non-linear dynamics inherent in real-world traffic flow. This observation spurred
significant interest in machine learning and deep learning techniques, which could better handle the
spatio-temporal complexity of urban networks.</p>
      <p>Early deep learning approaches focused on modeling temporal dependencies. Liu et al. [5] were
among the first to successfully integrate spatial awareness directly into temporal modeling with their
Conv-LSTM architecture. By embedding convolutional operations within LSTM units, their model
learned spatial correlations alongside temporal dynamics, significantly outperforming standard LSTMs
and traditional time-series methods for short-term traffic flow prediction. This established a crucial
principle: effective traffic prediction requires joint modeling of space and time.</p>
      <p>Subsequent research refined this spatio-temporal modeling. Zhao et al. [6] took a significant step
forward by introducing the Temporal Graph Convolutional Network (T-GCN), which explicitly leveraged
the underlying road network topology using GCNs combined with Gated Recurrent Units (GRUs).
This approach proved highly effective, outperforming both statistical models and earlier deep learning
baselines by more accurately representing how traffic states propagate through connected road segments.</p>
      <p>Further advancements focused on optimizing the representation of traffic data. Guo et al. [7]
developed an Optimized Graph Convolutional Recurrent Neural Network (OGCRNN), incorporating
an optimization strategy to improve feature propagation and reduce noise within the graph structure,
leading to even higher prediction accuracy. Ranjan et al. [8] combined CNN, LSTM, and Transpose
CNN layers to generate high-resolution congestion maps for city-wide networks, demonstrating strong
performance for large-scale real-time monitoring. Nagy &amp; Simon [9] made a particularly important
contribution by emphasizing the spatial propagation of congestion itself; they showed that explicitly
modeling how congestion spreads across the network, rather than just predicting isolated points,
significantly boosts forecasting accuracy. This highlighted the need for models to understand traffic as
a connected system where a jam in one area directly impacts neighboring segments.</p>
      <p>A critical challenge in applying these models to congestion prediction specifically, often framed
as a binary classification problem (congested vs. non-congested), is the inherent class imbalance, as
congestion events are typically rare (e.g., less than 15–20% of observations). Chen et al. [10] directly
addressed this with their Periodic Convolutional Neural Network (PCNN). They explicitly modeled
traffic’s strong daily and weekly periodicity using a time-series folding technique to create periodic
input matrices, making the model robust to non-stationary patterns. Crucially, PCNN treated congestion
as a binary state defined by a speed threshold (e.g., below 70% of historical baselines), similar to many
operational definitions. By prioritizing recall for congested states through weighted loss functions and
processing multi-grained temporal windows, PCNN achieved strong F1-scores (87.3%) for short-term
(30-min) congestion detection on NYC data, outperforming LSTMs and GCNs.</p>
      <p>More recently, research has increasingly tackled the practical constraint of sparse sensor coverage, a
common reality where only a fraction of road segments have fixed sensors. Li et al. [11] proposed a
Multi-Task Graph Neural Network (MT-GNN) specifically designed for this scenario. Their key insight
was using multi-task learning: the model jointly predicted continuous speed (the auxiliary task) and
binary congestion states (the primary task). This approach enriched the features used for congestion
classification, helping overcome the data scarcity at any single point.</p>
      <p>Liu et al. [12] further advanced solutions for partial sensing with their Spatio-Temporal Partial
Sensing (STPS) framework. Recognizing that unsensed locations exhibit different traffic distributions
than observed ones, they introduced a learned spatial transfer matrix based on graph attention. This
matrix quantifies how congestion propagates from sensed to unsensed segments, effectively modeling
spatial dependencies even with incomplete data. To handle non-stationarity (like sudden incidents),
they used rank-based speed features relative to historical distributions (e.g., “slower than 95% of past
observations”) instead of raw speeds.</p>
      <p>The importance of carefully selecting evaluation metrics for these tasks cannot be overstated. Naidu,
Zuva, &amp; Sibanda [13] provided a vital systematic review, stressing that inappropriate metrics (e.g.,
relying solely on overall accuracy for imbalanced congestion data) can lead to misleading conclusions.
They underscored the need to align metrics like F1-score or precision-recall curves with the specific
problem objectives and data characteristics, which is essential for fairly comparing models like PCNN,
MT-GNN, and STPS.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Problem Definition</title>
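      <p>This work addresses the critical challenge of short-term traffic congestion prediction in urban road
networks, specifically formulated as a spatio-temporal binary classification problem. Given an urban
network of fixed-point traffic sensors monitoring real-time speed measurements, we define congestion
at location <inline-formula><tex-math>s</tex-math></inline-formula> and time <inline-formula><tex-math>t</tex-math></inline-formula> as a significant deviation from typical traffic conditions, formalised through the
following condition:
<disp-formula><tex-math><![CDATA[c(s, t) = \begin{cases} 1 & \text{if } v(s, t) < \alpha \cdot \operatorname{quantile}_{q}\bigl(v(s, \cdot)\bigr) \\ 0 & \text{otherwise} \end{cases}]]></tex-math></disp-formula>
where <inline-formula><tex-math>v(s, t)</tex-math></inline-formula> denotes the traffic speed at sensor <inline-formula><tex-math>s</tex-math></inline-formula>, <inline-formula><tex-math>\alpha</tex-math></inline-formula> is the congestion severity threshold, <inline-formula><tex-math>q</tex-math></inline-formula> represents the
high-speed quantile reference, and <inline-formula><tex-math>\operatorname{quantile}_{q}(v(s, \cdot))</tex-math></inline-formula> is the <inline-formula><tex-math>q</tex-math></inline-formula>-quantile of the historical speed distribution
for sensor <inline-formula><tex-math>s</tex-math></inline-formula>. Without loss of generality and according to domain experts’ feedback, we set <inline-formula><tex-math>\alpha = 0.65</tex-math></inline-formula>
and <inline-formula><tex-math>q = 0.9</tex-math></inline-formula>, which treats acute congestion events as speeds more than 35% below the
typical (90th-percentile) speed at each location, as shown in the sample time series in Figure 1.</p>
      <p>Regarding the prediction horizon, the objective of this work is to forecast future congestion states
<inline-formula><tex-math>c(s, t + h)</tex-math></inline-formula> for arbitrary sensor locations <inline-formula><tex-math>s</tex-math></inline-formula> and prediction horizons <inline-formula><tex-math>h</tex-math></inline-formula>, where <inline-formula><tex-math>h \in \{10, 20, \ldots, 60\}</tex-math></inline-formula>
minutes. This requires modeling the complex spatio-temporal dynamics of traffic flow, a task
complicated by spatial propagation, temporal patterns, non-stationarity, and class imbalance
(congestion events typically constitute less than 15% of observations).</p>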
      <p>The above formulation extends beyond conventional traffic state prediction (which typically estimates
continuous speed or flow values) by explicitly framing congestion as a binary classification problem
with an operational threshold that directly corresponds to traffic management interventions. This binary
formulation enables direct alignment with real-world decision-making: when congestion is predicted
with sufficient confidence, traffic engineers can implement specific interventions such as dynamic signal
timing, ramp metering, or incident response deployment.</p>
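      <p>As an illustration of the labelling rule and the horizon shift defined in this section, the following Python sketch derives binary congestion labels for a single sensor. It is a minimal example under stated assumptions, not the implementation used in this study: the function names, the linear-interpolation quantile, and the sample speeds are hypothetical, while the defaults 0.65 and 0.9 mirror the threshold factor and quantile chosen above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def quantile(values, q):
    """q-quantile with linear interpolation between order statistics (an assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def congestion_labels(speeds, alpha=0.65, q=0.9):
    """Label a reading 1 when its speed is below alpha times the sensor's q-quantile speed."""
    threshold = alpha * quantile(speeds, q)
    return [1 if v < threshold else 0 for v in speeds]


def horizon_targets(labels, steps_ahead):
    """Target for time index i is the label observed steps_ahead samples later."""
    return labels[steps_ahead:]


# Hypothetical speed readings (kph) for one sensor, sampled every 5 minutes.
speeds = [100, 98, 102, 60, 55, 97, 101, 99, 100, 103, 100]
labels = congestion_labels(speeds)
targets_10min = horizon_targets(labels, steps_ahead=2)  # 10 minutes = 2 samples
```
]]></preformat>
      <p>With 5-minute sampling, a 10-minute horizon corresponds to a shift of two samples; with 1-minute sampling the same horizon would correspond to ten.</p>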
    </sec>
    <sec id="sec-4">
      <title>4. Deep Learning Models for Traffic Congestion Prediction</title>
      <p>The sections that follow describe two distinct deep learning modeling strategies that are used to forecast
congestion at individual sensor locations. The first strategy employs a GCN, which first aggregates
information from neighboring sensors based on their spatial proximity and then applies a temporal
processing step to capture the dynamics of traffic flow. The second strategy builds on this spatial
foundation by adding a recurrent LSTM architecture that further refines the representation of traffic
over successive time steps. Together, these approaches illustrate the effectiveness of deep learning
models in predicting congestion across different horizon values.</p>
      <sec id="sec-4-1">
        <title>4.1. The GCN Approach</title>
        <p>The GCN model is specifically engineered to leverage the spatial relationships between traffic sensors
while capturing temporal traffic patterns. The methodology exploits both dynamic traffic measurements
and static geographical information through a specialized graph convolution layer that incorporates
the precomputed weighted adjacency matrix. This design allows the model to identify how traffic
conditions propagate through the road network based on physical proximity between sensors.</p>
        <p>The network structure begins with the input layer that receives the spatiotemporal feature matrix,
followed by the graph convolution operation that applies spatial filtering across the sensor network. The
output from the graph convolution layer is then processed through feature aggregation mechanisms that
combine information across both spatial and temporal dimensions before passing to the final prediction
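layers. This formulation conceptually follows the GCN architecture proposed by Yu et al. [14].</p>
        <p>Spatial Dependency Modeling The spatial relationships between sensors are encoded in a weighted
adjacency matrix derived from geographical distances between sensor locations. This matrix is
transformed using a Gaussian Radial Basis Function (RBF) kernel to emphasize closer relationships while
diminishing the influence of distant sensors. The kernel function is defined as <inline-formula><tex-math>w_{ij} = \exp(-d_{ij}^{2} / \sigma^{2})</tex-math></inline-formula>,
where <inline-formula><tex-math>d_{ij}</tex-math></inline-formula> represents the geographical distance between sensors <inline-formula><tex-math>i</tex-math></inline-formula> and <inline-formula><tex-math>j</tex-math></inline-formula>, and <inline-formula><tex-math>\sigma</tex-math></inline-formula> is a scaling parameter.</p>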
        <p>The adjacency matrix is row-normalized to create a weighted representation where the influence of
neighboring sensors is proportional to their spatial proximity. This normalized matrix is then used in
the graph convolution operation to ensure that each sensor’s representation is updated based on its
immediate neighbors with appropriate weighting, effectively modeling traffic flow propagation across
the network.</p>
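        <p>The construction of the weighted adjacency matrix can be sketched as follows. This is an illustrative example only: the distance matrix, the value of the scaling parameter, the inclusion of self-weights on the diagonal, and the helper names are assumptions rather than details reported for the GCN used here.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def rbf_adjacency(distances, sigma):
    """Apply the Gaussian RBF kernel exp(-d^2 / sigma^2), then row-normalise."""
    adjacency = []
    for row in distances:
        weights = [math.exp(-(d * d) / (sigma * sigma)) for d in row]
        total = sum(weights)
        adjacency.append([w / total for w in weights])
    return adjacency


def aggregate(adjacency, values):
    """Neighbour-weighted average of a per-sensor quantity (e.g. current speed)."""
    return [sum(w * v for w, v in zip(row, values)) for row in adjacency]


# Hypothetical pairwise distances (km) between three sensors.
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 3.0],
    [4.0, 3.0, 0.0],
]
adj = rbf_adjacency(distances, sigma=2.0)
smoothed = aggregate(adj, [50.0, 50.0, 50.0])
```
]]></preformat>
        <p>Because every row sums to one, the aggregation is a weighted average, so a uniform speed field is left unchanged and nearby sensors contribute more than distant ones.</p>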
        <p>Feature Processing The model incorporates various feature types processed through the graph
convolution layer. Dynamic features include current speed measurements for each sensor and
neighbor-aggregated speed values weighted by spatial distance. The temporal component is captured through
cyclical features (sin/cos transforms) for minute-of-day and day-of-week, which encode the periodic
nature of traffic patterns.</p>
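        <p>A minimal sketch of the sin/cos encoding for minute-of-day and day-of-week is given below. The helper names and the feature ordering are hypothetical; the periods of 1440 minutes and 7 days follow directly from the two quantities being encoded.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def cyclical(value, period):
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(minute_of_day, day_of_week):
    """Four features: sin/cos of minute-of-day followed by sin/cos of day-of-week."""
    return cyclical(minute_of_day, 1440) + cyclical(day_of_week, 7)


# 23:55 and 00:00 are 1435 minutes apart as raw numbers but adjacent on the circle.
gap = math.dist(cyclical(1435, 1440), cyclical(0, 1440))
```
]]></preformat>
        <p>The encoding keeps times on either side of midnight (or of the week boundary) close together, which a raw integer feature would not.</p>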
        <p>Static geographical coordinates (latitude and longitude) are processed separately and then fused with
the dynamic features after graph convolution. This fusion occurs at a dedicated feature combination
layer, allowing the model to leverage both the spatial relationships captured through the graph structure
and the absolute geographical positioning of sensors.</p>
        <p>Model Training The GCN model is trained using binary cross-entropy loss, with early stopping
implemented to prevent overfitting. During training, the model processes fixed-length historical
windows of traffic data to predict congestion states at multiple future time horizons. The training
procedure maintains the chronological order of data, with the 80-10-10 split ensuring that no future
information leaks into the training process.</p>
        <p>The model is trained independently for each sensor location, allowing it to capture the unique traffic
patterns at each specific location. The spatial relationships encoded in the adjacency matrix enable the
model to leverage information from neighboring sensors while maintaining sensor-specific prediction
capabilities.</p>
        <p>Decision Threshold Optimisation Since the classifier outputs a vector of estimated congestion
probabilities, an appropriate decision threshold must be selected. This threshold is identified by
computing the F1 score on the test set for each candidate threshold and choosing the value that maximizes
the F1 metric. This process results in a model that exhibits greater robustness and generalization across
unseen data, while also striking a balance between precision and recall scores.</p>
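        <p>The threshold sweep can be illustrated with the short Python sketch below. The candidate grid (0.05 to 0.95 in steps of 0.05), the tie-breaking rule (first best candidate wins), and the toy labels and probabilities are assumptions made for the example; the text above specifies only that the threshold maximizing the F1 score is chosen.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def f1_score(y_true, y_pred):
    """F1 for the positive (congested) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, candidates):
    """Return (threshold, f1) for the candidate that maximises F1."""
    best_thr, best_f1 = candidates[0], -1.0
    for thr in candidates:
        preds = [1 if p >= thr else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_thr, best_f1 = thr, score
    return best_thr, best_f1


# Hypothetical labels and predicted congestion probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 0]
probs = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.55, 0.05]
candidates = [i / 100 for i in range(5, 100, 5)]
thr, score = best_threshold(y_true, probs, candidates)
```
]]></preformat>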
      </sec>
      <sec id="sec-4-2">
        <title>4.2. The LSTM Approach</title>
        <p>The LSTM approach, which draws inspiration from the work presented in Zhao et al. [6], extends the spatial
modeling foundation established by the GCN framework with enhanced temporal processing capabilities.
While leveraging the same graph-based spatial dependency modeling as the GCN (detailed in Section 4.1),
this architecture introduces sequential temporal processing to capture the dynamic evolution of traffic
patterns. The hybrid design processes spatial relationships through graph convolution before feeding
spatially-enhanced representations into an LSTM layer, creating a unified spatiotemporal model.</p>
        <p>Spatial-Temporal Integration The spatial dependency modeling follows the GCN implementation
described above, utilizing the same distance-based adjacency matrix transformed via Gaussian RBF
kernel and row-normalized for spatial weighting. However, the LSTM architecture introduces a critical
modification: the graph convolution layer processes input features at each timestamp independently,
generating spatially-aware representations that are then sequentially fed into the LSTM. This two-stage
processing (spatial → temporal) enables the model to first contextualize traffic conditions within the road
network before analyzing their temporal evolution, effectively capturing how congestion propagates
through both space and time.</p>
        <p>Sequence Processing The LSTM layer processes the sequence of spatially-enhanced representations
from historical time windows, maintaining a hidden state that evolves with each time step to capture
temporal dependencies. Unlike the GCN’s static spatial aggregation, the LSTM explicitly models the
progression of traffic states through its memory cells, making it particularly effective at capturing
congestion formation patterns, propagation speeds, and dissipation dynamics. The model processes
fixed-length historical sequences to predict congestion states at multiple future horizons, with its
stateful architecture allowing it to retain relevant information from earlier time steps while discarding
transient fluctuations.</p>
        <p>Model Training Training follows the same fundamental protocol as the GCN approach (binary
cross-entropy loss, early stopping, and chronological 80-10-10 data split), but with sequence-specific
adaptations. The model processes sliding windows of historical traffic data where each window contains
multiple time steps of spatially-processed features. Speed features are normalized using the same
StandardScaler methodology described in Section 4.1, ensuring consistent scaling across sensors and
time periods. While maintaining per-sensor training to capture location-specific patterns, the LSTM
benefits from the spatial information propagated through the graph convolution layer, creating a
balanced approach that leverages both local sensor characteristics and network-wide traffic dynamics.</p>
        <p>Decision Threshold Optimisation The optimal decision threshold is identified in the same way as
with GCN, ensuring robustness and comparability between the two methods.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. XGBoost for Traffic Congestion Prediction</title>
      <p>In this section, we present our XGBoost-based methodology for congestion prediction. Unlike the
GCN and LSTM approaches (detailed in Sections 4.1 and 4.2, respectively), which inherently capture
spatial dependencies through graph-based architectures and temporal dynamics via sequence modeling,
this tree-based approach relies on explicit feature engineering to encode spatiotemporal patterns. While
deep learning models automatically learn representations from raw sensor data, XGBoost requires careful
construction of lag features and interaction terms to model traffic propagation effects, which highlights
a fundamental methodological contrast between representation learning and feature engineering
paradigms.</p>
      <p>For the XGBoost model, we implement a systematic five-phase ablation methodology designed to
iteratively refine feature engineering and model optimization. This pipeline consists of a sequential
series of feature-engineering and hyper-parameter tuning steps that progressively narrow the predictor
from all sensors toward a minimal but high-performing configuration. Each stage builds on the results
of the previous one, resulting in a final model that provides the most accurate predictions. A visual
illustration of this workflow is presented in Figure 2.</p>
      <p>Phase 1: Baseline Model Construction The first phase establishes a reference model using all
available sensor locations as features, augmented with five basic temporal indicators, namely day, hour,
minute, weekend status, and rush-hour status. Sample weights address class imbalance by inversely scaling with
class frequency. This baseline provides initial feature importance rankings and performance metrics
that guide subsequent ablation steps, while establishing a critical comparison point for evaluating the
value of iterative refinement.</p>
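      <p>One common way to obtain sample weights that scale inversely with class frequency is shown below. The exact weighting formula used in Phase 1 is not stated, so the "balanced" normalisation in this sketch (total weight equal to the number of samples) and the function name are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by n_samples / (n_classes * count_of_its_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hypothetical label vector with 20% congested samples.
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)
```
]]></preformat>
      <p>In this example each congested sample receives four times the weight of a non-congested one. Such a vector could then be supplied to the learner, for instance through the sample_weight argument accepted by XGBoost's scikit-learn style fit method.</p>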
      <p>Phase 2: Primary Feature Selection Leveraging importance scores from Phase 1, this phase
identifies the most predictive features by training models with reduced sets (3, 5, or 10 features). The
optimal feature count is determined by maximizing F1 score, significantly reducing dimensionality
while preserving predictive power. This step isolates key sensor locations and temporal patterns critical
for congestion prediction at each specific site, contrasting with the GCN/LSTM’s continuous spatial
aggregation.</p>
      <p>Phase 3: Advanced Feature Engineering This phase introduces lagged features at multiple intervals
(5–720 minutes) to explicitly model temporal dependencies that deep learning architectures capture
implicitly. The expanded set includes:
• Short-term dynamics (5–60 minute lags)
• Mid-term patterns (720-minute lag)
• Daily/weekly periodicity
These engineered features compensate for XGBoost’s lack of inherent temporal modeling, directly
encoding how congestion evolves from free-flow to peak conditions and thereby addressing a key
limitation of non-sequential tree-based models.</p>
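      <p>The lagged features of Phase 3 can be sketched in plain Python as follows. The column naming scheme, the use of None for unavailable history, and the toy series are illustrative assumptions; only the idea of looking back a fixed number of minutes comes from the description above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def minutes_to_steps(minutes, sampling_interval):
    """Convert a lag in minutes to a number of samples; the lag must align with the sampling grid."""
    if minutes % sampling_interval != 0:
        raise ValueError("lag is not a multiple of the sampling interval")
    return minutes // sampling_interval


def lag_features(series, lags):
    """One row per time index holding the values observed `lag` samples earlier."""
    rows = []
    for t in range(len(series)):
        rows.append({f"lag_{lag}": (series[t - lag] if t >= lag else None) for lag in lags})
    return rows


# Hypothetical 5-minute speed series; lags of 5 and 10 minutes are 1 and 2 samples.
speeds = [60, 58, 55, 40, 35, 50]
lags = [minutes_to_steps(m, 5) for m in (5, 10)]
rows = lag_features(speeds, lags)
```
]]></preformat>
      <p>Under the same convention, the 720-minute lag corresponds to 144 samples at 5-minute resolution and to 720 samples at 1-minute resolution.</p>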
      <p>Phase 4: Fine-Grained Feature Selection Systematic selection across 5–50 features identifies the
minimal optimal set by evaluating performance at each cardinality. The optimal count is determined
at the point of diminishing returns, where additional features no longer yield significant accuracy
improvements (represented in this work by the more robust F1 score metric that takes both precision
and recall into account). This phase ensures computational efficiency while eliminating redundant
signals, contrasting with GCN/LSTM’s fixed architectural constraints.</p>
      <p>Phase 5: Final Model and Threshold Optimization A comprehensive grid search optimizes
hyperparameters across:
• Learning rate [0.01, 0.1]
• Tree depth [3, 5, 7, 9]
• Boosting rounds [100, 300, 500, 700]</p>
      <p>The configuration maximizing validation F1 score is selected, completing the ablation pipeline. Lastly,
the optimal decision threshold is established using the same procedure employed for the aforementioned
deep-learning models.</p>
      <p>Overall, this multiphase approach demonstrates that sequential refinement creates synergistic
performance gains that cannot be easily obtained through isolated optimizations.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Experimental Study</title>
      <p>This section presents the comprehensive evaluation of the proposed traffic congestion prediction models.
All experiments were conducted on an Intel Xeon-based server equipped with an NVIDIA A100 GPU
with 40GB of VRAM, with the three methods under consideration utilising the GPU for both training and
inference. Regarding data, the publicly available PEMS-BAY dataset and the proprietary Dutch National
Data Warehouse (NDW) dataset were used, each providing unique spatio-temporal characteristics and
allowing for comprehensive validation under diverse traffic conditions.</p>
      <p>Our evaluation protocol employs a static model approach. The model is trained once on the training
set (80% of chronologically ordered data), with the test set (remaining 20%) being used only for final
evaluation without any model updates. This protocol reflects a practical scenario where the model is
trained offline once and deployed for inference. In a real-time deployment scenario, the model could be
periodically retrained (e.g., weekly) to incorporate new data, but this is beyond the scope of the current
evaluation.</p>
      <sec id="sec-6-1">
        <title>6.1. Datasets Used</title>
        <p>PEMS-BAY<sup>1</sup> is a widely adopted open-access dataset sourced from the California Department of
Transportation’s Performance Measurement System (PeMS). It comprises 325 loop-detector sensors
deployed across the San Francisco Bay Area highway network, recording vehicular speed at 5-minute
intervals from January 2017 to June 2017 (6 months total). This yields 52,115 samples per sensor
(16,937,375 samples network-wide) with an average speed of 62.62 mph. Critically, the dataset provides
precise geocoordinates and sensor IDs, facilitating the construction of a directional adjacency matrix that
captures spatial dependencies between sensors. For our experiments, 18 target sensors were randomly
selected to ensure unbiased evaluation of the model’s generalization capability. As a benchmark
dataset in traffic forecasting literature, PEMS-BAY’s open availability enables reproducibility and direct
comparison with state-of-the-art methods.</p>
        <p>NDW<sup>2</sup> is a proprietary dataset provided by the Dutch National Data Warehouse, containing
high-resolution traffic measurements from 204 loop detectors along two major highways (A20 and A4) in
Rotterdam, the Netherlands. Collected at 1-minute intervals over 13 months (April 2022 – May 2023), it
delivers 544,320 samples per sensor (112,129,920 samples network-wide) with an average speed of 94.31
kph. The dataset uniquely captures complex traffic dynamics across critical infrastructure including
merging zones, bottlenecks, and free-flow segments, while encompassing seasonal variations,
weather-related disruptions (e.g., rain/snow events), and incident-induced anomalies. Unlike PEMS, NDW’s
target sensors were strategically defined by local traffic management experts as four Locations of Interest
(LOIs), i.e., nodes identified as congestion initiators where early intervention can prevent network-wide
spillback. This expert-curated selection presents a more challenging prediction scenario but offers
higher real-world utility for traffic management. Due to its proprietary nature, NDW access is restricted
to approved research collaborations with Dutch transportation authorities.</p>
        <p>Both datasets are visually presented in Figure 3. Their comparative characteristics are summarized in
Table 1.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Experimental Results</title>
        <p>This section presents an analysis of the comparative performance of the three proposed approaches for
short-term traffic congestion prediction. We evaluate each model across the two distinct datasets (NDW
and PEMS-BAY) at multiple prediction horizons (10–60 minutes), with performance measured using
precision, recall, and F1 score after decision threshold tuning. The results reveal significant differences
in predictive capabilities, computational efficiency, and robustness across horizons.</p>
        <sec id="sec-6-2-1">
          <p><sup>1</sup> https://pems.dot.ca.gov <sup>2</sup> https://english.ndw.nu</p>
          <p>Table 2 presents the complete experimental results, including the optimal decision thresholds,
performance metrics, and computational characteristics for all configurations. The most striking observation
is the consistent superiority of the XGBoost approach across all prediction horizons and both datasets.</p>
          <p>Specifically, XGBoost achieves the highest F1 scores across all
prediction horizons on both datasets. On the NDW dataset, its F1 scores range from
0.70 (10-minute horizon) to 0.44 (60-minute horizon), compared with 0.50 to 0.43 for GCN and
0.51 to 0.43 for LSTM; the margin is largest at short horizons and narrows to 0.01 at 60 minutes.
Similarly, on the PEMS-BAY dataset, XGBoost’s F1 scores decline from 0.86 to 0.74, while those of
GCN and LSTM fluctuate in the 0.72–0.74 range. This consistent advantage supports the
effectiveness of the systematic feature engineering approach, particularly the incorporation of
multi-scale temporal lag features that capture both short-term dynamics and periodic patterns. Additionally,
XGBoost appears to be more robust overall: the standard deviation of its F1 scores is consistently
lower than that of its deep-learning competitors, especially at shorter prediction horizons.</p>
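          <p>To illustrate what multi-scale temporal lag features can look like, the following minimal sketch builds lagged speed features for a single sensor from a 1-minute series. The lag offsets (a few recent minutes, one day, one week) and the function name <monospace>build_lag_features</monospace> are assumptions made for illustration only; they are not the configuration used in the experiments.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, Optional, Sequence

# Hypothetical lag offsets in minutes (assumptions, not the paper's settings):
# short-term dynamics plus daily and weekly periodicity.
SHORT_LAGS = (1, 5, 10, 30)
PERIODIC_LAGS = (24 * 60, 7 * 24 * 60)


def build_lag_features(
    speeds: Sequence[float],
    t: int,
    lags: Sequence[int] = SHORT_LAGS + PERIODIC_LAGS,
) -> Optional[Dict[str, float]]:
    """Return lagged speeds for time index t, or None if history is too short."""
    if t - max(lags) < 0 or t >= len(speeds):
        return None
    # One feature per lag: the speed observed `lag` minutes before t.
    return {f"speed_lag_{lag}m": speeds[t - lag] for lag in lags}
```
]]></preformat>
          <p>Under these assumptions, each training row combines the most recent observations with the values seen at the same time on the previous day and in the previous week, which is one simple way to expose both short-term dynamics and periodic patterns to a tree-based model.</p>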
          <p>Regarding the optimal decision thresholds, XGBoost consistently uses high values
(0.76–0.86) across all horizons, indicating a preference for high-confidence predictions that favor
precision (0.39–0.84). In contrast, GCN and LSTM use considerably lower thresholds (0.15–0.45),
prioritizing recall (0.55–0.88) at the expense of precision. This trade-off aligns with their architectural designs:
the deep learning models’ spatial propagation mechanisms inherently favor capturing congestion
spread (high recall), while XGBoost’s feature-based approach enables more discriminative decision
boundaries. Notably, the XGBoost models maintain recall above 0.55 even at the 60-minute horizon on
NDW, demonstrating their ability to balance both metrics effectively.</p>
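          <p>The effect of the decision threshold on precision, recall, and F1 can be sketched as follows. This is a minimal example that assumes the threshold is chosen by a simple grid search maximizing F1 on held-out data; the tuning procedure, the step size, and the function names <monospace>precision_recall_f1</monospace> and <monospace>tune_threshold</monospace> are illustrative assumptions rather than the procedure used in the experiments.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the positive (congested) class."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def tune_threshold(
    y_true: Sequence[int], y_prob: Sequence[float], step: float = 0.01
) -> Tuple[float, float]:
    """Grid-search the threshold in (0, 1) that maximizes F1 (assumed criterion)."""
    best_t, best_f1 = 0.5, -1.0
    for i in range(1, int(round(1 / step))):
        t = i * step
        f1 = precision_recall_f1(y_true, y_prob, t)[2]
        if f1 > best_f1:  # keep the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1
```
]]></preformat>
          <p>Raising the threshold generally trades recall for precision, so a model whose tuned threshold is high (as reported for XGBoost) makes fewer but more confident congestion predictions, whereas a low tuned threshold (as reported for GCN and LSTM) flags more events at the cost of more false alarms.</p>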
          <p>The computational characteristics, however, reveal a stark contrast between the approaches.
XGBoost is substantially more efficient in the inference phase: it processes individual
predictions in 0.10–1.10  s/sample, while GCN and LSTM require 57.5–210.1  s/sample. This
advantage of up to three orders of magnitude in inference speed is critical for real-time traffic management systems,
where prediction latency directly impacts intervention effectiveness.</p>
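          <p>The size of this gap can be checked with a short calculation on the per-sample times quoted above. The sketch assumes that both ranges are expressed in the same time unit, which is consistent with the reported three-orders-of-magnitude figure; the variable names are illustrative.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math

# Per-sample inference times as quoted in the text.
# Assumption: both ranges use the same time unit.
xgb_low, xgb_high = 0.10, 1.10
dl_low, dl_high = 57.5, 210.1

# Most conservative ratio (fastest deep model vs. slowest XGBoost run)
# and most favorable ratio (slowest deep model vs. fastest XGBoost run).
min_speedup = dl_low / xgb_high
max_speedup = dl_high / xgb_low

print(f"speedup range: {min_speedup:.0f}x to {max_speedup:.0f}x")
print(
    "orders of magnitude: "
    f"{math.log10(min_speedup):.1f} to {math.log10(max_speedup):.1f}"
)
```
]]></preformat>
          <p>Under that assumption the speedup lies between roughly 52x and 2101x, that is, between about 1.7 and 3.3 orders of magnitude, so the three-orders figure corresponds to the most favorable configurations.</p>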
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. A Note on Explainability</title>
        <p>Another key advantage of the XGBoost model over black-box deep learning models is its inherent
explainability. Through iterative feature ablation, the model is reduced to a minimal, identifiable set of sensors, which reveals
how congestion propagates between specific road segments at precise time intervals. For example, it
can uncover patterns such as morning rush-hour disturbances at arterial junctions causing downstream
bottlenecks within 15–20 minutes, or evening patterns that spread from commercial to residential zones. This
clarity lets traffic engineers directly identify critical relationships (e.g., how one interchange sequentially
affects three segments) and distinguish congestion catalysts from buffers. The model’s simplicity makes
these insights immediately actionable without data science expertise, enabling targeted interventions
such as timed signal adjustments.</p>
        <p>In a more detailed example, sensors located at the A4 highway’s merging zone near the Benelux
tunnel and at the A20 highway’s interchange with the A13 consistently displayed high feature importance
scores (ranked among the top 3–5 features) when predicting congestion at three of the four LOI target
sensors. These sensors were selected as important predictors regardless of the prediction
horizon (10, 20, or 30 minutes), suggesting they represent genuine congestion initiation points rather
than transient correlations. Critically, these high-importance sensors correspond precisely to locations
that traffic engineering experts have identified as known bottlenecks, i.e., areas where traffic flow
naturally degrades due to lane reductions, merging maneuvers, and geometric design. The feature
importance analysis revealed that congestion at these upstream locations typically acts as a leading
indicator of downstream LOI congestion with a time lag of approximately 15–25 minutes, which
aligns closely with empirical observations of traffic wave propagation speeds on Dutch highways. This
finding suggests that the XGBoost model has learned physically meaningful relationships that mirror
real-world traffic dynamics.</p>
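        <p>The idea of checking whether the same sensors rank highly at every horizon can be sketched as follows. The importance values below are toy numbers invented purely for illustration, and the sensor labels and function names are hypothetical; in practice the per-horizon scores would come from the trained models (for instance, from a gradient boosting library’s feature importance output).</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def stable_features(per_horizon: Dict[int, Dict[str, float]], k: int) -> Set[str]:
    """Return features that appear in the top k for every prediction horizon."""
    top_sets = [set(top_k_features(imp, k)) for imp in per_horizon.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Toy importance scores per horizon (minutes); values are made up.
toy_importances = {
    10: {"A4_merge": 0.30, "A20_A13": 0.25, "s17": 0.20, "s42": 0.05},
    20: {"A4_merge": 0.28, "A20_A13": 0.27, "s42": 0.15, "s17": 0.04},
    30: {"A4_merge": 0.26, "A20_A13": 0.24, "s08": 0.18, "s17": 0.10},
}

print(sorted(stable_features(toy_importances, k=3)))
```
]]></preformat>
        <p>With these toy values only the two hypothetical bottleneck sensors remain in the top three at all horizons, which mirrors the kind of horizon-independent consistency described above.</p>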
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions &amp; Future Work</title>
      <p>This study shows that a well-engineered XGBoost model consistently outperforms deep learning
architectures such as GCNs and LSTMs in short-term traffic congestion forecasting, across both public and
proprietary datasets. The model achieved higher F1 scores at all prediction horizons while requiring
far fewer computational resources, enabling deployment on standard hardware and frequent updates. For
transportation agencies, this means advanced predictive capabilities are accessible without costly
GPU infrastructure. The model’s transparency also supports collaboration between data scientists and
traffic managers, and allows municipalities of all sizes to implement effective, timely interventions
against congestion.</p>
      <p>Future work will focus on validating generalizability across more large-scale urban datasets, exploring
incremental retraining to handle shifting traffic patterns, and investigating deployment on edge hardware
for decentralized, low-latency inference within traffic infrastructure.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was supported partially by the Horizon Europe R&amp;I programme EMERALDS under the GA
No. 101093051 and the University of Piraeus Research Center (UPRC). The authors also acknowledge
Mr. Erik-Sander Smits (ARANE, NL) for his expert feedback.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used LLMs in order to conduct grammar and spelling
checks. After using these tools, the authors reviewed and edited the content as needed and take full
responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
  </back>
</article>