=Paper=
{{Paper
|id=Vol-3827/paper6
|storemode=property
|title=SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces
|pdfUrl=https://ceur-ws.org/Vol-3827/paper6.pdf
|volume=Vol-3827
|authors=Jinhyeok Choi,Heehyeon Kim,Minhyeong An,Joyce Jiyoung Whang
|dblpUrl=https://dblp.org/rec/conf/strl/ChoiKAW24
}}
==SpoT-Mamba: Learning Long-Range Dependency on Spatio-Temporal Graphs with Selective State Spaces==
Jinhyeok Choi¹, Heehyeon Kim¹, Minhyeong An¹ and Joyce Jiyoung Whang¹,*

¹School of Computing, KAIST, Daejeon, Republic of Korea

STRL'24: Third International Workshop on Spatio-Temporal Reasoning and Learning, 5 August 2024, Jeju, South Korea

*Corresponding author. cjh0507@kaist.ac.kr (J. Choi); heehyeon@kaist.ac.kr (H. Kim); amh0360@kaist.ac.kr (M. An); jjwhang@kaist.ac.kr (J. J. Whang)
Abstract

Spatio-temporal graph (STG) forecasting is a critical task with extensive real-world applications, including traffic and weather forecasting. Although several recent methods have been proposed to model complex dynamics in STGs, addressing long-range spatio-temporal dependencies remains a significant challenge, leading to limited performance gains. Inspired by a recently proposed state space model named Mamba, which has shown remarkable capability in capturing long-range dependencies, we propose a new STG forecasting framework named SpoT-Mamba. SpoT-Mamba generates node embeddings by scanning various node-specific walk sequences. Based on the node embeddings, it conducts temporal scans to capture long-range spatio-temporal dependencies. Experimental results on a real-world traffic forecasting dataset demonstrate the effectiveness of SpoT-Mamba.

Keywords

Spatio-Temporal Graphs, Traffic Forecasting, Selective State Spaces, Random Walks
1. Introduction

Predictive learning methods on time series play a crucial role in diverse applications, such as traffic and weather forecasting. The intricate relationships and dynamic nature of time series are often represented as graphs, specifically spatio-temporal graphs (STGs), where node attributes evolve over time. Recently, spatio-temporal graph neural networks (STGNNs) have emerged as a powerful tool for capturing both spatial and temporal dependencies in STGs [1, 2, 3]. Many of these methods employ graph neural networks (GNNs) to exploit spatial dependencies inherent in the graph structures, integrating them with recurrent units or convolutions to capture temporal dependencies [2, 4, 5, 6, 7, 8, 9]. Although these approaches capture spatio-temporal dependencies within STGs and perform well on predictive learning tasks, they often struggle with long-range temporal dependencies among distant time steps [10, 11]. STGs often exhibit repetitive patterns over both short and long periods, and modeling these patterns is critical for precise predictions [12, 13]. Therefore, several methods have adopted the self-attention mechanism of transformer layers [14] rather than recurrent units to better exploit global temporal information [10, 11, 15]. However, the significant computational overhead and complexity of attention mechanisms remain major concerns [10, 12, 16, 17].

Meanwhile, structured state space sequence (S4) models have emerged as a promising approach for sequence modeling with linear scaling in sequence length [18]. These models combine the advantages of recurrent and convolutional neural networks, enabling them to handle long-range dependencies without relying on attention. However, because they cannot select information depending on the input data, they have shown limited performance. A recent study introduced a new S4 model, named Mamba, that overcomes this issue with a selection mechanism that filters information in an input-dependent manner [19]. Mamba has demonstrated notable performance over transformers across various types of sequence data, including language, audio, and genomics. In addition, several studies have explored replacing the transformer with Mamba in graph transformer frameworks [20, 21, 22].

In this paper, our focus lies on the predictive learning task on STGs, specifically STG forecasting. For STG forecasting, it is vital to capture the evolving behavior of individual nodes over time and how these changes propagate throughout the entire graph. Furthermore, leveraging these dynamics over long spatial and temporal ranges plays a crucial role in dealing with the intricate spatio-temporal correlations in STGs [1, 3, 23]. Building upon these insights and recent advances, we introduce SpoT-Mamba, a new Spatio-Temporal graph forecasting framework with a Mamba-based sequence modeling architecture. With Mamba blocks, SpoT-Mamba extracts structural information of each node by scanning multi-way walk sequences and effectively captures long-range temporal dependencies with temporal scans. Experiments on a real-world dataset demonstrate that SpoT-Mamba achieves promising performance in STG forecasting. The official implementation of SpoT-Mamba is available at https://github.com/bdi-lab/SpoT-Mamba.
2. Preliminaries

State Space Model (SSM) The state space model (SSM) assumes that a dynamic system can be represented by its state at time step $t$ [18]. SSM defines the evolution of a dynamic system's state with two equations: $h'(t) = \mathbf{A}h(t) + \mathbf{B}x(t)$ and $y(t) = \mathbf{C}h(t) + \mathbf{D}x(t)$, where $h(t) \in \mathbb{R}^{D}$ denotes the latent state, $x(t) \in \mathbb{R}$ represents the input signal, $y(t) \in \mathbb{R}$ denotes the output signal, and $\mathbf{A} \in \mathbb{R}^{D \times D}$, $\mathbf{B} \in \mathbb{R}^{D \times 1}$, $\mathbf{C} \in \mathbb{R}^{1 \times D}$, and $\mathbf{D} \in \mathbb{R}$ are learnable parameters. SSM learns how to transform the input signal $x(t)$ into the latent state $h(t)$, which is used to model the system dynamics and predict its output $y(t)$.

Discretized SSM To adapt SSM to discrete input sequences instead of continuous signals, discretization is applied with a step size $\Delta$. The discretized SSM is defined in a recurrent form: $h_t = \bar{\mathbf{A}}h_{t-1} + \bar{\mathbf{B}}x_t$ and $y_t = \mathbf{C}h_t$, where $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ are learnable parameters approximated from $\mathbf{A}$ and $\mathbf{B}$ using a bilinear method with the step size $\Delta$ [18]. The term $\mathbf{D}x_t$ is omitted from the equations since it can be treated as a skip connection. This formulation captures temporal dependencies efficiently, resulting in computations similar to those in recurrent neural networks.
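To make the recurrence concrete, the following is a minimal NumPy sketch of the discretized scan $h_t = \bar{\mathbf{A}}h_{t-1} + \bar{\mathbf{B}}x_t$, $y_t = \mathbf{C}h_t$. The function and variable names are ours, and the matrices are random stand-ins rather than parameters discretized and learned as in [18].

```python
import numpy as np

def ssm_recurrent_scan(A_bar, B_bar, C, x):
    """Run the discretized SSM recurrence over a length-L scalar input.

    A_bar: (D, D) discretized state matrix; B_bar: (D, 1) discretized
    input matrix; C: (1, D) output matrix; x: (L,) float input sequence.
    Returns y: (L,) output sequence."""
    h = np.zeros((A_bar.shape[0], 1))   # initial latent state h_0 = 0
    y = np.empty_like(x)
    for t, x_t in enumerate(x):
        h = A_bar @ h + B_bar * x_t     # h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = (C @ h).item()           # y_t = C h_t
    return y
```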
Meanwhile, due to its linear time-invariant (LTI) property, the discretized SSM can be reformulated as a discrete convolution: $\mathbf{y} = \bar{\mathbf{K}} * \mathbf{x}$ with $\bar{\mathbf{K}} = (\mathbf{C}\bar{\mathbf{B}}, \mathbf{C}\bar{\mathbf{A}}\bar{\mathbf{B}}, \ldots, \mathbf{C}\bar{\mathbf{A}}^{L-1}\bar{\mathbf{B}}) \in \mathbb{R}^{L}$, where $\mathbf{x} \in \mathbb{R}^{L}$ denotes the input sequence, $\mathbf{y} \in \mathbb{R}^{L}$ denotes the output sequence, $*$ indicates the convolution operation, and $L$ is the sequence length. This representation facilitates parallel training for SSM, thereby enhancing training efficiency.

The recurrent and convolutional representations of SSM for sequence modeling enable parallel training and linear scaling in sequence length. To further reduce the computational cost of SSM, structured state space sequence (S4) models have been proposed [18]. S4 models address the fundamental bottleneck of SSM, the repeated matrix multiplications, by employing a low-rank correction to stably diagonalize the transition matrix $\mathbf{A}$.
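Continuing the sketch above, the LTI kernel $\bar{\mathbf{K}}$ can be materialized naively and checked against the recurrent scan; a real S4 implementation computes this kernel far more efficiently via the diagonalization described in [18].

```python
def ssm_convolution(A_bar, B_bar, C, x):
    """Equivalent convolutional form y = K * x with
    K = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)."""
    L = len(x)
    K = np.empty(L)
    A_pow = np.eye(A_bar.shape[0])         # A_bar^0
    for k in range(L):
        K[k] = (C @ A_pow @ B_bar).item()  # K_k = C A_bar^k B_bar
        A_pow = A_bar @ A_pow
    return np.convolve(x, K)[:L]           # causal (one-sided) convolution

# Sanity check: both forms produce the same output sequence
rng = np.random.default_rng(0)
A_bar = 0.2 * rng.standard_normal((4, 4))
B_bar, C = rng.standard_normal((4, 1)), rng.standard_normal((1, 4))
x = rng.standard_normal(16)
assert np.allclose(ssm_convolution(A_bar, B_bar, C, x),
                   ssm_recurrent_scan(A_bar, B_bar, C, x))
```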
Mamba S4 models have demonstrated remarkable performance in handling long-range dependencies in continuous signal data, such as audio and time series. However, S4 models struggle to handle discrete and information-dense data such as text [19]. This limitation arises from the LTI property inherent in the convolutional form of SSMs. While the LTI property enables linear-time sequence modeling for S4 models, it requires that the learnable matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, as well as the step size $\Delta$, remain unchanged across all time steps. Consequently, S4 models cannot selectively recall previous tokens or combine them with the current token, treating each token in the input sequence uniformly. In contrast, transformers dynamically adjust attention scores based on the input sequence, allowing them to effectively focus on different parts of the sequence [14]. To address both the lack of selectivity in S4 models and the efficiency bottleneck of attention in sequence modeling, a recent study introduced a new S4 model called Mamba, which removes the LTI constraints [19]. Mamba incorporates a selection mechanism that allows its learnable parameters to interact dynamically with the input sequence. This mechanism is achieved by making the learnable parameters $\mathbf{B}$ and $\mathbf{C}$, as well as the step size $\Delta$, functions of the input sequence. Therefore, Mamba can selectively recall or ignore information in an input-dependent manner while maintaining linear scalability in sequence length.
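The toy sketch below illustrates the selection idea only: $\mathbf{B}_t$, $\mathbf{C}_t$, and $\Delta_t$ become functions of the current input, so the recurrence is no longer LTI and no single convolution kernel exists. It is a single-channel simplification of ours with a crude Euler discretization; Mamba's actual parameterization and hardware-aware scan are described in [19].

```python
import numpy as np

def selective_scan(A, x, w_B, w_C, w_delta):
    """Toy input-dependent SSM. A: (D, D); x: (L,) scalar inputs;
    w_B, w_C: (D,) projections; w_delta: scalar projection."""
    D = A.shape[0]
    h = np.zeros(D)
    y = np.empty_like(x)
    for t, x_t in enumerate(x):
        delta_t = np.log1p(np.exp(w_delta * x_t))  # softplus keeps delta_t > 0
        B_t, C_t = w_B * x_t, w_C * x_t            # input-dependent B_t, C_t
        A_bar = np.eye(D) + delta_t * A            # crude Euler discretization
        h = A_bar @ h + delta_t * B_t * x_t        # selective state update
        y[t] = C_t @ h
    return y
```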
Inspired by these recent advancements in Mamba, we propose a Mamba-based sequence modeling architecture for predictive learning tasks on STGs. Our approach employs the selection mechanism to handle the dynamic changes in STGs, capturing long-range spatio-temporal dependencies. In addition, it addresses the computational inefficiencies of transformer-based STGNNs [10, 11, 15].

Spatio-Temporal Graph Forecasting A spatio-temporal graph (STG) is defined as $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{X})$, where $\mathcal{V}$ is a set of $N$ nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is a set of edges, $\mathcal{X} = [\mathbf{X}_1, \ldots, \mathbf{X}_T]$ is a sequence of observed data for all nodes at each historical time step, and $T$ is the length of the sequence. Here, $\mathbf{X}_t \in \mathbb{R}^{N \times D_{\text{in}}}$ denotes the observed data at time step $t$, where $D_{\text{in}}$ denotes the dimension of the input node attributes. STG forecasting aims to predict future observations for $T'$ time steps, given historical observations for the previous $T$ time steps. This is formulated as $[\mathbf{X}_{t-T+1}, \ldots, \mathbf{X}_t] \xrightarrow{f(\cdot)} [\mathbf{X}_{t+1}, \ldots, \mathbf{X}_{t+T'}]$, where $f(\cdot)$ represents the STG forecasting model.
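In code, the problem definition fixes nothing but tensor shapes. The hedged stub below only encodes this mapping; the names and the choice $D_{\text{in}} = D_{\text{out}} = 1$ (flow only) are our assumptions for a PEMS04-style setting.

```python
import torch

T, T_prime, N, D_in, D_out = 12, 12, 307, 1, 1  # T = T' = 12 as in Section 4

def forecast(f: torch.nn.Module, history: torch.Tensor) -> torch.Tensor:
    """history: (T, N, D_in) tensor stacking [X_{t-T+1}, ..., X_t];
    returns a (T', N, D_out) tensor for [X_{t+1}, ..., X_{t+T'}]."""
    assert history.shape == (T, N, D_in)
    prediction = f(history)
    assert prediction.shape == (T_prime, N, D_out)
    return prediction
```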
3. Spatio-Temporal Graph Forecasting with SpoT-Mamba

We propose SpoT-Mamba (Figure 1), which captures spatial dependencies from node-specific walk sequences and learns temporal dependencies across time steps by leveraging Mamba-based sequence modeling. By utilizing the selection mechanism of Mamba blocks, SpoT-Mamba can selectively propagate or forget information in an input-dependent manner in both the temporal and spatial domains.

Figure 1: Overview of SpoT-Mamba. The first row represents the node-specific walk sequence embedding (multi-way walk sequence generation from the graph structure, followed by walk sequence embedding with Mamba blocks via bi-directional scans, pointwise convolution, and an MLP producing $\mathbf{w}_i \in \mathbb{R}^{D}$). The second row represents the overall procedure of STG forecasting (periodicity and feature embedding of the ground-truth time series, temporal scan with Mamba blocks, and spatial self-attention and regression producing the predicted time series). $\mathbf{W} \in \mathbb{R}^{N \times D}$ denotes the node embeddings for all nodes in the graph, and $\mathbf{Z}_{t'} \in \mathbb{R}^{N \times 4D}$ denotes one of the outcomes from the temporal scan, corresponding to time step $t'$.

Multi-way Walk Sequence In STGs, the temporal sequences for nodes are naturally defined by the time-series data. On the other hand, since the topological structure does not have a specific order, a tailored method is required to define the spatial sequences of nodes in graphs. Hence, we employ three well-known walk algorithms, depth-first search (DFS), breadth-first search (BFS), and random walks (RW), to extract diverse local and global structural information from each node's neighborhood. The walk sequences of length $K$ for node $v_i$ under these walk algorithms are denoted by $\mathcal{W}_{\text{DFS}}(i)$, $\mathcal{W}_{\text{BFS}}(i)$, and $\mathcal{W}_{\text{RW}}(i)$, respectively. These node-specific walk sequences are extracted $M$ times to exploit more comprehensive structural information.
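To ground the construction, here is a minimal sketch of the three walk generators over an adjacency-list graph. Truncating or padding to length $K$ and repeating each walk $M$ times follow the description above, while the tie-breaking and padding choices are our assumptions, not details taken from the paper or the released code.

```python
import random
from collections import deque

def pad(walk, K):
    # Repeat the last node if the reachable set is exhausted before K steps
    return walk + [walk[-1]] * (K - len(walk))

def bfs_walk(adj, start, K):
    """Breadth-first walk of length K from `start`."""
    walk, seen, queue = [], {start}, deque([start])
    while queue and len(walk) < K:
        v = queue.popleft()
        walk.append(v)
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return pad(walk, K)

def dfs_walk(adj, start, K):
    """Depth-first walk of length K from `start` (iterative preorder)."""
    walk, seen, stack = [], set(), [start]
    while stack and len(walk) < K:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        walk.append(v)
        stack.extend(u for u in adj[v] if u not in seen)
    return pad(walk, K)

def random_walk(adj, start, K):
    """Uniform random walk of length K (assumes every node has a neighbor)."""
    walk = [start]
    while len(walk) < K:
        walk.append(random.choice(adj[walk[-1]]))
    return walk

def multi_way_walks(adj, i, K=20, M=2):
    """M walks of each type for node v_i (K = 20, M in {2, 4} in Section 4)."""
    return {"BFS": [bfs_walk(adj, i, K) for _ in range(M)],
            "DFS": [dfs_walk(adj, i, K) for _ in range(M)],
            "RW":  [random_walk(adj, i, K) for _ in range(M)]}
```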
Walk Sequence Embedding SpoT-Mamba generates embeddings for node-specific walk sequences by scanning each sequence. Here, SpoT-Mamba performs bi-directional scans through Mamba blocks, which makes the model robust to permutations and captures the long-range spatial dependency of a sequence more effectively [21]. Then, SpoT-Mamba aggregates the representations of the node-specific walk sequences with a pointwise convolution. This allows the model to incorporate representations of neighboring nodes to generate a walk sequence embedding for each type of walk sequence. Subsequently, SpoT-Mamba integrates the walk sequence embeddings into a node embedding $\mathbf{w}_i \in \mathbb{R}^{D}$ using a multi-layer perceptron (MLP), where $i$ represents the node index and $D$ denotes the embedding dimension. Rather than simply stacking GNN layers, we employ Mamba-based sequence modeling to generate node embeddings from diverse types of walk sequences. Therefore, our approach effectively captures local and long-range dependencies within the graph by scanning the neighborhood structure of each node.
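The module below is one plausible PyTorch reading of this stage under stated assumptions: `scan_fwd` and `scan_bwd` stand in for Mamba blocks (any `(batch, length, D) -> (batch, length, D)` module works, e.g. `Mamba(d_model=D)` from the official `mamba_ssm` package), the last scanned position summarizes a walk, and a `Conv1d` with kernel size 1 plays the role of the pointwise convolution over the $M$ walks. The exact shapes and ordering in the released implementation may differ.

```python
import torch
import torch.nn as nn

class WalkSequenceEmbedding(nn.Module):
    """Sketch: bi-directional walk scans -> pointwise conv over M walks ->
    MLP fusing the three walk types into w_i in R^D."""

    def __init__(self, scan_fwd: nn.Module, scan_bwd: nn.Module, D: int, M: int):
        super().__init__()
        self.scan_fwd, self.scan_bwd = scan_fwd, scan_bwd
        self.pointwise = nn.Conv1d(M, 1, kernel_size=1)  # aggregate the M walks
        self.mlp = nn.Sequential(nn.Linear(3 * D, D), nn.ReLU(), nn.Linear(D, D))

    def bi_scan(self, seq: torch.Tensor) -> torch.Tensor:
        """seq: (M, K, D) token embeddings of M walks -> (M, D) summaries."""
        out = self.scan_fwd(seq) + self.scan_bwd(seq.flip(dims=[1]))
        return out[:, -1, :]                    # last position as walk summary

    def forward(self, bfs, dfs, rw):            # each: (M, K, D)
        per_type = []
        for seq in (bfs, dfs, rw):
            s = self.bi_scan(seq)               # (M, D)
            s = self.pointwise(s.unsqueeze(0))  # (1, M, D) -> (1, 1, D)
            per_type.append(s.flatten())        # (D,)
        return self.mlp(torch.cat(per_type))    # (3D,) -> node embedding w_i
```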
Temporal Scan with Mamba Blocks We adopt learnable day-of-week and timestamp-of-day embeddings to capture the repetitive patterns over both short and long periods in STGs [24]. SpoT-Mamba concatenates these embeddings with the node embedding at each time step to effectively model the temporal dynamics of the historical observations. Then, it performs selective recurrent scans with Mamba blocks across the sequence of embeddings along the time axis, observing changes in temporal dynamics over time. This process helps identify the portions of the sequence that are critical for forecasting and captures periodic patterns at both short- and long-term intervals. Consequently, the results effectively encompass spatio-temporal dependencies, thereby enriching the predictive capabilities.
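A hedged sketch of the inputs to the temporal scan: learnable day-of-week and timestamp-of-day embeddings concatenated per time step with the node embedding and a linear projection of the raw observations. The per-step width of $4D$ matches $\mathbf{Z}_{t'} \in \mathbb{R}^{N \times 4D}$ in Figure 1, but the projection and the exact concatenation order are our assumptions.

```python
import torch
import torch.nn as nn

class TemporalScanInputs(nn.Module):
    """Build (T, N, 4D) inputs for the temporal scan: projected features,
    node embedding, day-of-week and timestamp-of-day embeddings (cf. [24])."""

    def __init__(self, D: int, D_in: int, steps_per_day: int = 288):
        super().__init__()                         # 288 = 5-minute steps per day
        self.proj = nn.Linear(D_in, D)             # project raw observations
        self.dow = nn.Embedding(7, D)              # day of week
        self.tod = nn.Embedding(steps_per_day, D)  # timestamp of day

    def forward(self, x, w, dow_idx, tod_idx):
        """x: (T, N, D_in); w: (N, D) node embeddings; dow_idx, tod_idx: (T,)."""
        T, N = x.shape[0], x.shape[1]
        temb = torch.cat([self.dow(dow_idx), self.tod(tod_idx)], dim=-1)  # (T, 2D)
        return torch.cat([self.proj(x),                                   # (T, N, D)
                          w.unsqueeze(0).expand(T, -1, -1),               # (T, N, D)
                          temb.unsqueeze(1).expand(-1, N, -1)], dim=-1)   # (T, N, 4D)
```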
STG Forecasting of SpoT-Mamba Finally, SpoT-Mamba enhances the representations of nodes scanned along the temporal axis by incorporating global information from the entire graph at each time step through transformer layers. An MLP is then applied to forecast the attributes of each node for future time steps. To accurately predict the temporal trajectory while ensuring robustness to outliers that deviate significantly from the expected trajectory, we train SpoT-Mamba with the Huber loss, which is less sensitive to outliers while maintaining the smoothness of the squared error for small errors [24].
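For reference, the Huber loss with threshold $\delta$ is (the specific $\delta$ used for SpoT-Mamba is not stated in the paper):

```latex
\mathcal{L}_{\delta}(y, \hat{y}) =
\begin{cases}
\frac{1}{2}\,(y - \hat{y})^{2} & \text{if } |y - \hat{y}| \le \delta, \\
\delta\,\bigl(|y - \hat{y}| - \tfrac{1}{2}\delta\bigr) & \text{otherwise,}
\end{cases}
```

which is quadratic for small errors and linear beyond $\delta$, giving the outlier robustness described above.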
4. Experiments

We compare the performance of SpoT-Mamba with state-of-the-art baselines. Additionally, we conduct ablation studies to further demonstrate the effectiveness of SpoT-Mamba.

4.1. Dataset and Experimental Setup

Dataset We evaluate SpoT-Mamba on PEMS04 [25], a real-world traffic flow forecasting benchmark, following the experimental setup in [24]. PEMS04 contains highway traffic flow data collected by the California Department of Transportation's Performance Measurement System (PEMS). The nodes represent sensors, and edges are created when two sensors are on the same road. Traffic data in PEMS04 is collected every 5 minutes. We set the input and prediction intervals to 1 hour, corresponding to $T = T' = 12$. The statistics of PEMS04 are shown in Table 1. In the experiments, PEMS04 is divided into training, validation, and test sets in a 6:2:2 ratio in temporal order.

Table 1: Statistics of the PEMS04 dataset.

| $|\mathcal{V}|$ | $|\mathcal{E}|$ | #Time Steps | Time Interval | Time Range |
|---|---|---|---|---|
| 307 | 338 | 16,992 | 5 min. | 01/2018 - 02/2018 |

Baselines We compare SpoT-Mamba with several baselines covering various methods, including GNNs and transformers. For STGNNs, we consider GWNet [9], DCRNN [26], AGCRN [27], GTS [28], and MTGNN [29]. For attention-based methods, we include STAEformer [24], GMAN [30], and PDformer [31]. Additionally, other methods such as HI [16], STNorm [32], and STID [33] are also considered.

Evaluation Metrics We use three standard metrics for traffic flow prediction: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). MAE is the unweighted mean of the absolute differences between predictions and ground-truth values. RMSE is the square root of the average of the squared differences. MAPE is similar to MAE but normalizes each error by the corresponding ground-truth value and expresses it as a percentage. For all three metrics, lower values indicate better performance. Additionally, we rank all the methods used in our experiments on each evaluation metric and compute their average ranks.
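The three metrics are standard; the NumPy sketch below is ours, and the small `eps` guard in MAPE is a simplification of the zero-flow masking commonly used on these benchmarks.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs(y_true - y_pred)
                           / np.maximum(np.abs(y_true), eps))
```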
Implementation Details SpoT-Mamba is implemented using the Deep Graph Library [34] and PyTorch [35]. For the transformer, we utilize the off-the-shelf transformer encoder available in PyTorch, and for Mamba, we employ the official implementation [19] and apply pre-normalization. We train SpoT-Mamba for 300 epochs using the Adam optimizer [36], with early stopping if there is no improvement over 20 epochs. Additionally, we apply learning rate decay, reducing the learning rate at the 20th, 40th, and 60th epochs. To determine the optimal hyperparameters for SpoT-Mamba, we conduct a grid search over $M \in \{2, 4\}$, learning rates in $\{0.001, 0.0005\}$, weight decays in $\{0.001, 0.0001\}$, and learning rate decay rates in $\{0.1, 0.5\}$. We fix the feed-forward dimension to 256, $D = 32$, $K = 20$, the dropout probability to 0.1, the batch size to 32, and the number of layers for both Mamba and the transformer to 3. All experiments are conducted on a GeForce RTX 3090 with 24GB memory.

4.2. Traffic Forecasting Performance

Table 2 shows the traffic forecasting performance of the baselines and SpoT-Mamba on PEMS04 in terms of MAE, RMSE, and MAPE. Note that we report the baseline results from [24] since we strictly followed the experimental settings described in [24]. For each evaluation metric, we also report the rank of each method. We observe that SpoT-Mamba consistently achieves high rankings across all metrics (MAE, RMSE, and MAPE), recording the highest average rank among all methods. This suggests the effectiveness of Mamba's selective recurrent scan in modeling spatio-temporal dependencies. Compared to the other metrics, SpoT-Mamba demonstrates its best performance on MAPE, achieving the highest ranking. Among the baselines, STAEformer [24] shows the performance most comparable to SpoT-Mamba.

Table 2: Traffic forecasting performance on PEMS04. "Avg." denotes the average rank across the three evaluation metrics.

| Method | MAE | Rank | RMSE | Rank | MAPE | Rank | Avg. |
|---|---|---|---|---|---|---|---|
| HI | 42.35 | 13 | 61.66 | 13 | 29.92 | 13 | 13 |
| GWNet | 18.53 | 5 | 29.92 | 1 | 12.89 | 6 | 4 |
| DCRNN | 19.63 | 11 | 31.26 | 8 | 13.59 | 11 | 10 |
| AGCRN | 19.38 | 9 | 31.25 | 7 | 13.40 | 9 | 8.33 |
| STGCN | 19.57 | 10 | 31.38 | 9 | 13.44 | 10 | 9.67 |
| GTS | 20.96 | 12 | 32.95 | 12 | 14.66 | 12 | 12 |
| MTGNN | 19.17 | 8 | 31.70 | 11 | 13.37 | 8 | 9 |
| STNorm | 18.96 | 6 | 30.98 | 6 | 12.69 | 5 | 5.67 |
| GMAN | 19.14 | 7 | 31.60 | 10 | 13.19 | 7 | 8 |
| PDformer | 18.36 | 3 | 30.03 | 3 | 12.00 | 3 | 3 |
| STID | 18.38 | 4 | 29.95 | 2 | 12.04 | 4 | 3.33 |
| STAEformer | 18.22 | 1 | 30.18 | 5 | 11.98 | 2 | 2.67 |
| SpoT-Mamba | 18.31 | 2 | 30.11 | 4 | 11.86 | 1 | 2.33 |
4.3. Qualitative Analysis & Ablation Studies

In Figure 2, we visualize the predictions of SpoT-Mamba and the ground-truth time series on PEMS04. For this visualization, predictions are made for four randomly selected nodes, starting from an arbitrary time step within the test split and covering 576 consecutive time steps (equivalent to two days). Since we set $T = T' = 12$, multiple predictions are concatenated to cover this duration; when multiple predictions exist at a single time step, we average them. We observe that the predicted time series closely aligns with the ground-truth data.

Figure 2: Traffic flow forecasting results (traffic flow vs. time step) for four randomly selected nodes in PEMS04. The blue line represents the ground truth, and the orange line denotes the predictions made by SpoT-Mamba.
Additionally, we conduct ablation studies of SpoT-Mamba on PEMS04. We replace the Mamba blocks used for the walk sequence embedding (indicated as Walk Scan) and for scanning along the time axis (indicated as Temporal Scan) with transformer encoders. Note that when replacing the Mamba blocks for the Temporal Scan with a transformer encoder, we reduce the batch size from 32 to 8 due to out-of-memory issues. Results are shown in Table 3. We observe differences in performance depending on which type of scan module is replaced. Specifically, when the Walk Scan is conducted by a transformer encoder, the overall performance of SpoT-Mamba decreases (first and second rows). On the other hand, replacing only the Mamba blocks for the Temporal Scan with a transformer encoder shows negligible performance differences (third row).

Table 3: Ablation studies of SpoT-Mamba on PEMS04, varying the types of the Walk Scan and Temporal Scan modules.

| Walk Scan | Temporal Scan | MAE | RMSE | MAPE |
|---|---|---|---|---|
| Transformer | Transformer | 18.41 | 30.32 | 12.12 |
| Transformer | Mamba | 18.69 | 30.17 | 12.28 |
| Mamba | Transformer | 18.29 | 30.06 | 11.93 |
| Mamba | Mamba | 18.31 | 30.11 | 11.86 |

This disparity can be attributed to the inherent differences between Mamba and the transformer, along with the application of learnable embeddings that impose biases on the sequence. Mamba blocks scan inputs recurrently, inherently considering the sequence order. In contrast, the transformer encoder does not recognize the input sequence order by itself. Furthermore, while SpoT-Mamba utilizes learnable embeddings for temporal sequences, i.e., day-of-week and timestamp-of-day, it does not apply such embeddings to walk sequences. As a result, the transformer encoder struggles to perceive the order in walk sequences, despite performing well with temporal sequences.
5. Conclusion and Future Work

In this paper, we explore STG forecasting and introduce a new Mamba-based STG forecasting model, SpoT-Mamba. SpoT-Mamba utilizes Mamba blocks to scan multi-way walk sequences and temporal sequences. This approach allows the model to effectively capture long-range spatio-temporal dependencies in STGs, enhancing forecasting accuracy on complex graph structures. SpoT-Mamba shows promising results on the real-world traffic forecasting benchmark PEMS04. For future work, we will extend SpoT-Mamba to handle graphs with complex relations [37, 38] and evolving graphs [39, 40].

Acknowledgments

This research was supported by an NRF grant funded by MSIT 2022R1A2C4001594 (Extendable Graph Representation Learning) and an IITP grant funded by MSIT 2022-0-00369 (Development of AI Technology to support Expert Decision-making that can Explain the Reasons/Grounds for Judgment Results based on Expert Knowledge).
References

[1] Z. Diao, X. Wang, D. Zhang, Y. Liu, K. Xie, S. He, Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting, in: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019, pp. 890–897. doi:10.1609/aaai.v33i01.3301890.
[2] D. Cao, Y. Wang, J. Duan, C. Zhang, X. Zhu, C. Huang, Y. Tong, B. Xu, J. Bai, J. Tong, Q. Zhang, Spectral temporal graph neural network for multivariate time-series forecasting, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, volume 33, 2020, pp. 17766–17778. URL: https://proceedings.neurips.cc/paper_files/paper/2020/file/cdf6581cb7aca4b7e19ef136c6e601a5-Paper.pdf.
[3] M. Li, Z. Zhu, Spatial-temporal fusion graph neural networks for traffic flow forecasting, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021, pp. 4189–4196. doi:10.1609/aaai.v35i5.16542.
[4] Y. Chen, I. Segovia, Y. R. Gel, Z-gcnets: Time zigzags at graph convolutional networks for time series forecasting, in: Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 1684–1694. URL: https://proceedings.mlr.press/v139/chen21o.html.
[5] K. Guo, Y. Hu, Y. Sun, S. Qian, J. Gao, B. Yin, Hierarchical graph convolution network for traffic forecasting, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021, pp. 151–159. doi:10.1609/aaai.v35i1.16088.
[6] B. Lu, X. Gan, H. Jin, L. Fu, H. Zhang, Spatiotemporal adaptive gated graph convolution network for urban traffic flow forecasting, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1025–1034. doi:10.1145/3340531.3411894.
[7] J. Ye, L. Sun, B. Du, Y. Fu, H. Xiong, Coupled layer-wise graph convolution for transportation demand prediction, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021, pp. 4617–4625. doi:10.1609/aaai.v35i5.16591.
[8] L. Zhao, Y. Song, C. Zhang, Y. Liu, P. Wang, T. Lin, M. Deng, H. Li, T-gcn: A temporal graph convolutional network for traffic prediction, IEEE Transactions on Intelligent Transportation Systems 21 (2020) 3848–3858. doi:10.1109/TITS.2019.2935152.
[9] Z. Wu, S. Pan, G. Long, J. Jiang, C. Zhang, Graph wavenet for deep spatial-temporal graph modeling, in: Proceedings of the 28th International Joint Conference on Artificial Intelligence, 2019, pp. 1907–1913. URL: https://dl.acm.org/doi/abs/10.5555/3367243.3367303.
[10] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, W. Zhang, Informer: Beyond efficient transformer for long sequence time-series forecasting, in: Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021, pp. 11106–11115. doi:10.1609/aaai.v35i12.17325.
[11] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, R. Jin, FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting, in: Proceedings of the 39th International Conference on Machine Learning, 2022, pp. 27268–27286. URL: https://proceedings.mlr.press/v162/zhou22g.html.
[12] S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, S. Dustdar, Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting, in: Proceedings of the 10th International Conference on Learning Representations, 2022. URL: https://openreview.net/forum?id=0EXmFzUn5I.
[13] G. Lai, W.-C. Chang, Y. Yang, H. Liu, Modeling long- and short-term temporal patterns with deep neural networks, in: Proceedings of the 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018, pp. 95–104. doi:10.1145/3209978.3210006.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 2017, pp. 6000–6010.
[15] S. Guo, Y. Lin, N. Feng, C. Song, H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019, pp. 922–929. doi:10.1609/aaai.v33i01.3301922.
[16] Y. Cui, J. Xie, K. Zheng, Historical inertia: A neglected but powerful baseline for long sequence time-series forecasting, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 2965–2969. doi:10.1145/3459637.3482120.
[17] R.-G. Cirstea, B. Yang, C. Guo, T. Kieu, S. Pan, Towards spatio-temporal aware traffic time series forecasting, in: Proceedings of the IEEE 38th International Conference on Data Engineering, 2022, pp. 2900–2913. doi:10.1109/ICDE53745.2022.00262.
[18] A. Gu, K. Goel, C. Re, Efficiently modeling long sequences with structured state spaces, arXiv preprint arXiv:2111.00396 (2022). doi:10.48550/arXiv.2111.00396.
[19] A. Gu, T. Dao, Mamba: Linear-time sequence modeling with selective state spaces, arXiv preprint arXiv:2312.00752 (2023). doi:10.48550/arXiv.2312.00752.
[20] C. Wang, O. Tsepa, J. Ma, B. Wang, Graph-mamba: Towards long-range graph sequence modeling with selective state spaces, arXiv preprint arXiv:2402.00789 (2024). doi:10.48550/arXiv.2402.00789.
[21] A. Behrouz, F. Hashemi, Graph mamba: Towards learning on graphs with state space models, arXiv preprint arXiv:2402.08678 (2024). doi:10.48550/arXiv.2402.08678.
[22] L. Li, H. Wang, W. Zhang, A. Coster, Stg-mamba: Spatial-temporal graph learning via selective state space model, arXiv preprint arXiv:2403.12418 (2024). doi:10.48550/arXiv.2403.12418.
[23] Z. Fang, Q. Long, G. Song, K. Xie, Spatial-temporal graph ode networks for traffic flow forecasting, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 364–373. doi:10.1145/3447548.3467430.
[24] H. Liu, Z. Dong, R. Jiang, J. Deng, J. Deng, Q. Chen, X. Song, Spatio-temporal adaptive embedding makes vanilla transformer sota for traffic forecasting, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 4125–4129. doi:10.1145/3583780.3615160.
[25] C. Song, Y. Lin, S. Guo, H. Wan, Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting, in: Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020, pp. 914–921. doi:10.1609/aaai.v34i01.5438.
[26] Y. Li, R. Yu, C. Shahabi, Y. Liu, Diffusion convolutional recurrent neural network: Data-driven traffic forecasting, in: Proceedings of the 6th International Conference on Learning Representations, 2018. URL: https://openreview.net/forum?id=SJiHXGWAZ.
[27] L. Bai, L. Yao, C. Li, X. Wang, C. Wang, Adaptive graph convolutional recurrent network for traffic forecasting, in: Proceedings of the 34th International Conference on Neural Information Processing Systems, 2020, pp. 17804–17815. URL: https://dl.acm.org/doi/abs/10.5555/3495724.3497218.
[28] C. Shang, J. Chen, J. Bi, Discrete graph structure learning for forecasting multiple time series, in: Proceedings of the 9th International Conference on Learning Representations, 2021. URL: https://openreview.net/forum?id=WEHSlH5mOk.
[29] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, pp. 753–763. doi:10.1145/3394486.3403118.
[30] C. Zheng, X. Fan, C. Wang, J. Qi, Gman: A graph multi-attention network for traffic prediction, in: Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020, pp. 1234–1241. doi:10.1609/aaai.v34i01.5477.
[31] J. Jiang, C. Han, W. X. Zhao, J. Wang, Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction, in: Proceedings of the 37th AAAI Conference on Artificial Intelligence, 2023, pp. 4365–4373. doi:10.1609/aaai.v37i4.25556.
[32] J. Deng, X. Chen, R. Jiang, X. Song, I. W. Tsang, St-norm: Spatial and temporal normalization for multivariate time series forecasting, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 269–278. doi:10.1145/3447548.3467330.
[33] Z. Shao, Z. Zhang, F. Wang, W. Wei, Y. Xu, Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022, pp. 4454–4458. doi:10.1145/3511808.3557702.
[34] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai, T. Xiao, T. He, G. Karypis, J. Li, Z. Zhang, Deep graph library: A graph-centric, highly-performant package for graph neural networks, arXiv preprint arXiv:1909.01315 (2020). doi:10.48550/arXiv.1909.01315.
[35] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, Pytorch: An imperative style, high-performance deep learning library, in: Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, pp. 8024–8035. URL: https://proceedings.neurips.cc/paper_files/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf.
[36] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: Proceedings of the 3rd International Conference on Learning Representations, 2015. URL: https://doi.org/10.48550/arXiv.1412.6980.
[37] H. Kim, J. Choi, J. J. Whang, Dynamic relation-attentive graph neural networks for fraud detection, in: Proceedings of the 2023 IEEE International Conference on Data Mining Workshops (ICDMW), 2023, pp. 1092–1096. doi:10.1109/ICDMW60847.2023.00143.
[38] C. Chung, J. Lee, J. J. Whang, Representation learning on hyper-relational and numeric knowledge graphs with transformers, in: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023, pp. 310–322.
[39] X. Huang, Y. Yang, Y. Wang, C. Wang, Z. Zhang, J. Xu, L. Chen, M. Vazirgiannis, Dgraph: A large-scale financial dataset for graph anomaly detection, in: Proceedings of the 36th International Conference on Neural Information Processing Systems, 2022, pp. 22765–22777. URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/8f1918f71972789db39ec0d85bb31110-Paper-Datasets_and_Benchmarks.pdf.
[40] J. Lee, C. Chung, J. J. Whang, InGram: Inductive knowledge graph embedding via relation graphs, in: Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 18796–18809.