                         Deep Spatio-Temporal Encoding: Achieving Higher Accuracy by
                         Aligning with External Real-World Data
                         Chen Jiang1,* , Wenlu Wang2 , Jingjing Li3 , Naiqing Pan1 and Wei-Shinn Ku1
                         1
                           Auburn University, Auburn, AL, USA
                         2
                           Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
                         3
                           Meta, Menlo Park, CA, USA


                                          Abstract
                                          Spatio-temporal deep learning has drawn a lot of attention since many downstream real-world applications can benefit from accurate
                                          predictions. For example, accurate prediction of heavy rainfall events is essential for effective urban water usage, flooding warning, and
                                          mitigation. In this paper, we propose a strategy to leverage spatially connected real-world features to enhance prediction accuracy.
                                          Specifically, we leverage spatially connected real-world climate data to predict heavy rainfall risks in a broad range in our case study. We
                                          experimentally ascertain that our Trans-Graph Convolutional Network (TGCN) accurately predicts heavy rainfall risks and real estate
                                          trends, demonstrating the advantage of incorporating external spatially-connected real-world data to improve model performance, and
                                          it shows that this proposed study has a significant potential to enhance spatio-temporal prediction accuracy, aiding in efficient urban
                                          water usage, flooding risk warning, and fair housing in real estate.

                                          Keywords
                 Spatial-temporal Analysis, Deep Learning, Transformer



Figure 1: An example of spatial and temporal features in the case study of precipitation prediction.


1. Introduction

Spatio-temporal predictions have been extensively studied due to their impact on real-world applications [1, 2, 3, 4, 5]. For example, heavy rainfall events can cause significant damage to infrastructure and pose serious threats to human safety. Predicting these events with greater accuracy allows better preparation and response [6], ultimately saving lives and reducing the economic impact of such events.
   Deep learning methods, such as deep spatio-temporal prediction models [7, 8], have improved the performance of rainfall forecasting over the years. However, the role of external data in enhancing prediction accuracy is still controversial: some argue that external data can provide more useful information to the prediction model, while others claim that it introduces more noise and complexity into the learning process. In this study, we propose to improve spatio-temporal predictions by combining spatially-linked external real-world data with a Trans-Graph Convolutional Network (TGCN) that learns the spatio-temporal dependencies from the combined data. Since utilizing multi-source real-world data has been shown to be more likely to lead to higher accuracy [9], our study aims to introduce a fresh perspective on integrating external real-world data into the proposed framework. We use heavy rainfall prediction as a case study for our method. Overall, we aim to provide accurate spatio-temporal predictions by leveraging as much information as possible, enabling better decision-making for a broad range of spatio-temporal applications, while offering a novel angle and a comprehensive evaluation that demonstrate the feasibility of integrating additional external real-world data without customizing transformer attention mechanisms. Our approach is experimentally validated by predicting heavy rainfall events and real estate hotspots.
   The traditional method for predicting heavy rainfall involves manually engineering features from weather data, including temperature, pressure, and humidity. Meteorologists rely on their expertise to interpret this data, observing and analyzing atmospheric factors to forecast future weather patterns. However, this traditional approach is time-consuming, labor-intensive, and susceptible to human error, especially when dealing with large datasets. As data grows, it becomes increasingly challenging to analyze large amounts of information by hand.
   Previous research has investigated using deep learning for precipitation prediction [10, 11] with promising results. However, some limitations can be addressed to further enhance deep model performance. One area with room for improvement is leveraging spatial dependencies. To tackle this challenge, we propose a model that integrates both Graph Convolutional Networks (GCNs) and a Transformer, enabling external spatially-linked data to be combined for spatio-temporal predictions.
   Specifically, we employ a GCN to analyze the adjacency matrix at the grid level and generate correlations between grid elements. The GCN captures the spatial relationships and dependencies among neighboring grid points, allowing for a comprehensive understanding of the data's

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy
* Corresponding author.
czj0042@auburn.edu (C. Jiang); wenlu.wang@tamucc.edu (W. Wang); jingjingli@meta.com (J. Li); nzp0030@auburn.edu (N. Pan); weishinn@auburn.edu (W. Ku)
ORCID: 0009-0000-6888-6643 (C. Jiang); 0000-0002-4829-1068 (W. Wang); 0000-0002-1465-7738 (N. Pan); 0000-0001-8636-4689 (W. Ku)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
spatial dynamics. We then utilize a Transformer model to encode the temporal precipitation data and combine it with the spatial correlations obtained from the GCNs. By combining the GCNs and the Transformer within the proposed TGCN model, we create a framework that harnesses both the spatial and temporal dimensions of the data.


2. Related Work

2.1. Graph Neural Networks

Graph Convolutional Networks (GCNs) are a type of deep learning model designed to process data represented in a graph structure, such as social or sensor networks [12]. GCNs have demonstrated their effectiveness in various applications, including node classification, link prediction, and recommendation systems [13, 14, 15, 16]. The concept of Graph Neural Networks (GNNs) was initially introduced in [17] and further expanded upon in subsequent research by [18]. GNNs, a type of recurrent neural network (RNN), iteratively propagate information from neighboring nodes until reaching a stable fixed point. This iterative process has traditionally been computationally expensive, but recent studies, such as [19], have made significant improvements in this area. Inspired by the success of Convolutional Neural Networks (CNNs) in computer vision, which extract high-level features from images using convolution and pooling layers, current models aim to adapt these layers to directly process graph inputs. Graph convolution layers can be categorized into two types: spectral graph convolution and localized graph convolution, as discussed in [20]. Early research primarily focused on spectral graph convolutions, pioneered by [21]. The current state-of-the-art model, GCN, further simplified the graph convolution operation by employing a localized first-order approximation. However, spectral methods require operations on the entire graph Laplacian during training, which can be computationally expensive; several subsequent works, such as FastGCN [22], have aimed to alleviate this issue.
   Recently, researchers have explored the application of GCNs to time series prediction. For example, spatio-temporal GCN-based approaches have been proposed for traffic flow prediction [23], and time-aware topological information has been integrated into GCNs using the mathematical framework of zigzag persistence [24].

2.2. Spatial Temporal Prediction

In this section, we discuss various existing temporal and spatial-temporal forecasting methods. For example, Recurrent Neural Networks (RNNs), especially long short-term memory (LSTM) [25], have gained popularity in time series forecasting [26]. Convolutional Neural Networks (CNNs) and their variant, Temporal Convolutional Networks (TCNs), are another option for sequence prediction [27], offering parallel computation compared to RNNs [28]. In recent years, researchers have explored Transformers and their variants in time series forecasting, achieving state-of-the-art performance in tasks like energy consumption and stock market prediction [29, 30, 31]. Designing a model capable of comprehensively capturing both spatial and temporal patterns represents another emerging trend in spatial-temporal prediction tasks [32, 33]. For example, [33] introduced a spatial-temporal graph neural network for predicting traffic flow.

Figure 2: Design Flow of the Trans-Graph Convolutional Prediction Model: The Trans-Graph Convolutional Prediction Model incorporates a transformer layer for time-series precipitation data, a GCN for local climate features and spatial relationships among grid points, and a four-layer MLP model for the final prediction.


3. Methodology

In this section, we detail our model architecture and the benefits of our design.

3.1. Overview

The architecture we propose, illustrated in Figure 2, incorporates a combination of techniques to enhance the prediction model. We begin by utilizing a transformer encoder to effectively encode the time series precipitation data, and then integrate local climate features into the model, enabling a comprehensive understanding of the factors influencing heavy rainfall.
   To address spatial dependencies and relationships among grid points, a GCN is introduced. This GCN learns the spatial dependencies within the dataset, considering the interconnectedness of grids based on their spatial locations. By leveraging the GCN, the model becomes capable of capturing and integrating spatial information, thereby enhancing prediction accuracy.
   The latent code, which combines the encoded time series precipitation data and the spatially connected local climate features learned through the GCN, is fed into a multi-layer perceptron (MLP) for prediction. This integrated architecture allows the MLP model to leverage the fused information, including temporal precipitation data, other climate features, and spatial factors, to effectively learn and infer future heavy rainfall areas.

3.2. Model Architecture

3.2.1. Preliminaries

Our proposed TGCN model consists of an Encoder, GCNs, and multi-layer perceptron (MLP) layers. The major component of the transformer is multi-head self-attention:

    Attention(Q, K, V) = softmax(QK^T / √d_k) V                                   (1)

where K and V are matrices that store the keys and values, Q is the query that is matched against the set of keys, and d_k is the dimensionality of the keys.

3.2.2. Transformer-based Encoder

We have developed a predictive model using the Transformer architecture, tailored for heavy rainfall forecasts. Unlike traditional methods that only use past rainfall data, our model factors in numerous external variables to boost accuracy. We examine local features, including geography, atmospheric conditions (pressure, temperature, wind), humidity, and topography, all of which influence heavy rainfall likelihood in a specific area. Therefore, we have developed a transformer-based prediction model [34] that incorporates GCNs to process the spatial features. By doing so, our model can capture the spatial relationships among various features in a graph structure, such as the dependencies between grid point locations and their corresponding climate data. The integration of the GCNs enhances our model's ability to capture both temporal and spatial information. Our model design starts with a transformer encoder capturing temporal precipitation patterns, followed by embedding this data and merging it with local climate data like moisture and



humidity. We enhance prediction accuracy with this added context.

3.2.3. Graph Convolutional Networks

GCNs have received considerable attention in recent years and have shown impressive performance in various applications. In this study, we aim to improve the performance of our model by integrating a GCN on top of a Transformer encoder model. The GCN model is specifically designed to capture the spatial relationships between each node in the graph and enhance the overall representation of the input data.

Figure 3: Graph Convolutional Network Architecture: The input data consists of the spatial relation matrix and spatially connected climate data. The nodes in the figure are for illustrative purposes.

   As illustrated in Figure 3, GCNs learn a linear transformation of the feature vectors of each node in a graph, which is then used to update the node features by aggregating information from the node's neighbors. Mathematically, this can be expressed as:

    h_{v_i}^{(l+1)} = σ( Σ_{v_j ∈ N(v_i)} (1/c_{ij}) W^{(l+1)} h_{v_j}^{(l)} )    (2)

In the equation, h_{v_i}^{(l+1)} represents the feature vector of node v_i at layer l+1, W^{(l+1)} denotes the learnable weight matrix for layer l+1, N(v_i) represents the set of neighbors of node v_i, and c_{ij} is a normalization constant that ensures proper scaling of the aggregated information. The function σ denotes a non-linear activation function, which introduces non-linearity into the model; in our specific case, we utilize the ReLU activation function. This equation can be interpreted as calculating a weighted sum of the feature vectors of the neighbors of node v_i at layer l, where the weights are determined by the learned weight matrix W^{(l+1)}. A non-linear activation function is then applied to obtain the updated feature vector h_{v_i}^{(l+1)} for node v_i at layer l+1. This process is repeated across multiple layers to learn expressive representations of the graph data.
   For the final prediction, we utilize a four-layer MLP model that combines time series data with other features, effectively leveraging both temporal and spatial information captured by our model for more accurate predictions.
   By leveraging the transformer architecture, incorporating GCNs, and utilizing a four-layer MLP model, our approach enables the effective integration of temporal and spatial information for improved prediction accuracy.

3.2.4. Jointly Learning

As illustrated in Figure 2, we propose to map temporal data and non-temporal data into the same latent space and merge the latent vectors for the subsequent prediction task.
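The joint design described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' released code: the module names, hidden dimensions, mean-pooling of the encoder output, and the single GCN layer are all assumptions chosen only to show how the temporal and spatial branches are fused before the four-layer MLP.

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One propagation step in the spirit of Equation 2: aggregate
    normalized neighbor features, then apply a ReLU non-linearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_norm, h):
        # a_norm: (N, N) normalized adjacency; h: (N, in_dim) node features
        return torch.relu(a_norm @ self.linear(h))

class TGCNSketch(nn.Module):
    """Hypothetical fusion of a transformer encoder (temporal branch)
    with a GCN (spatial branch), followed by a four-layer MLP."""
    def __init__(self, t_feat, c_feat, d_model=32, gcn_dim=32):
        super().__init__()
        self.proj = nn.Linear(t_feat, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.gcn = SimpleGCNLayer(c_feat, gcn_dim)
        self.mlp = nn.Sequential(                 # four linear layers
            nn.Linear(d_model + gcn_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x_t, x_c, a_norm):
        # x_t: (N, T, t_feat) temporal precipitation; x_c: (N, c_feat) climate
        h_t = self.encoder(self.proj(x_t)).mean(dim=1)   # temporal latent
        h_c = self.gcn(a_norm, x_c)                      # spatial latent
        h_m = torch.cat([h_t, h_c], dim=-1)              # merged latent
        return torch.sigmoid(self.mlp(h_m)).squeeze(-1)  # per-grid probability
```

The key design point is that both branches land in the same latent space before concatenation, so the MLP sees temporal and spatial information jointly rather than as separate predictions.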
   To encode the local climate features and capture the spa-         4.1.1. Precipitation Dataset
tial dependencies among the grid points for data π‘₯𝑐 , we
                                                                     Our precipitation dataset is sourced from the NOAA HRRR
employ a GCN to learn the relationships and dependencies
                                                                     dataset2 , offering real-time climate data at a 3 km spatial
within the spatial domain. The output hidden features at
                                         (𝐿)                         resolution and 1-hour temporal resolution. This dataset [35]
a specific layer 𝐿 can be denoted as β„Žπ‘ . Equation 2 is              encompasses total precipitation, precipitation rate, and nine
applied in this context. Assuming we use 𝐿𝑐 layers in total,         additional climate variables, including humidity (%), mois-
and we use the final layer to summarize climate information,         ture availability (%), pressure (Pa), wind speed (m/s), and
which is defined as:                                                 total cloud cover (%). Simulated brightness temperature data
                                         𝑐                           is acquired from the GOES 11 satellite 3 . The precipitation
                              hc = β„Ž(𝐿
                                    𝑣
                                       )
                                                               (3)
                                                                     data consist of the following three types:
           (0)
where β„Žπ‘£ = π‘₯𝑐                                                                β€’ Temporal precipitation data, denoted as π‘₯𝑑 , as shown
   In this equation, β„Žπ‘ represents the hidden features at layer                in Table 1 and Figure 5. It captures the historical
𝐿, which are obtained by applying the ReLU activation func-                    patterns and fluctuations in precipitation over time.
                                                   (𝐿) (πΏβˆ’1)
tion to the sum of the weighted input features π‘Šπ‘ β„Žπ‘                           Specifically, we define the temporal precipitation
                     (𝐿)
and the bias term 𝑏𝑐 from Equation 2.                                          rate and total accumulated precipitation over the
   We encode temporal precipitation data using a trans-                        past 6 hours as π‘₯β„Ž , which consists of 𝑁 timestamps:
former encoder [34],
                                                                                                 π‘₯𝑑 = {π‘₯1𝑑 , π‘₯2𝑑 , ..., π‘₯𝑁
                                                                                                                         𝑑 }
                 ht = 𝑇 π‘Ÿπ‘Žπ‘›π‘ π‘“ π‘œπ‘Ÿπ‘šπ‘’π‘ŸπΈπ‘›π‘π‘œπ‘‘π‘’π‘Ÿ(π‘₯𝑑 )                (4)             π‘₯𝑖𝑑 π‘–βˆˆ{1..𝑁 } represents the average price for the 𝑖-th
                                                                               timestamp.
                              ht ∈ R𝑑𝑑                         (5)
                                                                             β€’ Local climate data π‘₯𝑐 : The dataset comprises twelve
. Since π‘₯𝑑 and π‘₯𝑐 are encoded as ht and hc , we define the                     local climate variables, including temperature, hu-
merged hidden state as hm                                                      midity, wind speed, atmospheric pressure, and vari-
                                                                               ous other meteorological factors.
                       hm = 𝐢𝑂𝑁 𝐢𝐴𝑇 (ht , hc )                 (6)           β€’ Spatial location data π‘₯𝑠 : Each grid point in the
                                                                               dataset represents a specific location within the
To further process the merged information, we use another
                                                                               study area, such as a region or a cell. To represent the
multi-layer perceptron specifically trained for the predic-
                                                                               relationships between these grid points, we used an
tion task. Similarly, we define the 𝑙-th layer network as
                                                                               adjacency matrix. In the adjacency matrix, a value
(assuming 𝐿𝑛 layers in total)
                                                                               of 0 indicates that two grid points are not neigh-
                                                                               bors, while a value of 1 denotes their neighboring
                   β„Ž(𝑙)       (𝑙) (π‘™βˆ’1)
                    𝑛 = 𝑅𝑒𝐿𝑒(π‘Šπ‘› β„Žπ‘›      + 𝑏(𝑙)
                                           𝑛 )                 (7)
                                                                               relationship.
                 (0)             (π‘™βˆ’1)
   where β„Žπ‘› = hm , and β„Žπ‘›        is the input of the (l-1)-th
                             (𝑙)      (𝑙)                            4.1.2. Real-estate Dataset
layer in the i-th position. π‘Šπ‘› and 𝑏𝑛 are model parame-
ters.                                                                The real estate dataset captures the dynamics of the U.S. real
We use the output from the last layer for prediction                 estate market by collecting spatially correlated data from
                                             𝑛
                                                                     multiple sources. It consists of 7,436 neighborhoods, 567
                         𝑦¯ = π‘ π‘–π‘”π‘šπ‘œπ‘–π‘‘(β„Ž(𝐿
Loss is measured with the binary cross-entropy (BCE) loss:

    loss = BCE_loss(ȳ, y)    (9)

The BCE loss can be formulated as follows:

    BCE_loss = −(1/N) Σ_{i=1}^{N} [y_i log(p_i) + (1 − y_i) log(1 − p_i)]    (10)

where N is the total number of samples, y_i is the true label for sample i, p_i is the predicted probability for sample i, and log denotes the natural logarithm.

4. Experimental Validation

4.1. Datasets

Our data and code are publicly available¹. In our dataset, the train and test split ratio is 7:3.

… cities, 304 counties, 225 metros, and 50 states across the U.S. The data are connected through spatial locations, forming a multi-level spatial hierarchy. The dataset consists of three main components: census data, pricing history, and school district information. Some statistics about the real estate dataset:

    • Spatial Hierarchy Levels: The dataset includes a multi-level spatial hierarchy, with information at the state, metro, county, city, and neighborhood levels.
    • Census Data: The census data consist of 16 variables related to various aspects of housing prices, personal income, demographics, and spatial information.
    • Pricing History: The dataset includes the temporal housing price history for each neighborhood, spanning from 1996 to 2019.
    • School District Information: The dataset incorporates school district information: the number of school districts present in each county within the studied area, as well as the top school district(s) within the region.

¹ https://github.com/jiang28/Deep-Spatio-Temporal-Encoding
² https://rapidrefresh.noaa.gov/hrrr/
³ https://www.goes.noaa.gov/
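For concreteness, the BCE loss of Eq. (10) can be sanity-checked with a short NumPy sketch (illustrative only, not the paper's training code):

```python
import numpy as np

def bce_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy, Eq. (10), averaged over the N samples.

    y_true: true labels y_i in {0, 1}
    p_pred: predicted probabilities p_i in (0, 1)
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 to guard against log(0).
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Confident, correct predictions yield a small loss; confident
# mistakes are penalized heavily by the log terms.
print(bce_loss([1, 0, 1], [0.9, 0.1, 0.8]))  # ~0.1446
```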
    GridID | Longitude | Latitude | Grid Points | Grid Spacing | Vertical Level
    1      | 122.71    | 21.13    | 1799 × 1059 | 3 km         | 50

    Time Stamps                  | 2022/09/23 00:00 | 2022/09/23 01:00 | 2022/09/23 02:00 | ... | 2022/10/02 00:00
    Precipitation rate (mm/hour) | 0.0              | 0.72             | 0.94             | ... | 0
    Total Precipitation (mm)     | 0.01             | 1.88             | 4.3              | ... | 31.61

    Table 1
    Temporal data format. It has data on the grid id, longitude, latitude, grid points, grid spacing, vertical level, timestamps, precipitation rate, and total precipitation.
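For illustration, one grid's record in the Table 1 format might be held in memory as a simple structure (the field names here are hypothetical, not the dataset's actual schema):

```python
# Minimal sketch of one grid's temporal record (Table 1).
# Field names are illustrative, not the dataset's actual schema.
record = {
    "grid_id": 1,
    "longitude": 122.71,
    "latitude": 21.13,
    "grid_points": (1799, 1059),   # grid dimensions
    "grid_spacing_km": 3,
    "vertical_levels": 50,
    "timestamps": ["2022/09/23 00:00", "2022/09/23 01:00", "2022/09/23 02:00"],
    "precip_rate_mm_per_hr": [0.0, 0.72, 0.94],
    "total_precip_mm": [0.01, 1.88, 4.3],
}

# The hourly series must stay aligned with the timestamps.
assert (len(record["timestamps"])
        == len(record["precip_rate_mm_per_hr"])
        == len(record["total_precip_mm"]))
```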



To facilitate the task of predicting real estate hotspots, the dataset is classified into two classes based on the house price increase rate for each neighborhood: 1 for hotspots and 0 for non-hotspots. The detailed settings of the Real-estate Dataset can be found in [36].

4.2. Evaluation Metrics

We evaluate the performance of a classification system using various metrics, including Accuracy, Recall, Precision, F1-score, and ROC. These metrics are calculated from the number of true positives (tp), false positives (fp), false negatives (fn), and true negatives (tn). Accuracy measures the proportion of observations, both positive and negative, that were correctly classified by the system:

    acc = (tp + tn) / (tp + fp + tn + fn)

Recall measures the proportion of true positives that were correctly identified by the system:

    recall = tp / (tp + fn)

Precision measures the proportion of identified positives that were actually true positives:

    precision = tp / (tp + fp)

F1-score is the harmonic mean of precision and recall, and provides a single measure of the system's accuracy on the dataset:

    F1 = 2 * (precision * recall) / (precision + recall)

The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the performance of a binary classifier system. It is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR):

    TPR = tp / (tp + fn)
    FPR = fp / (fp + tn)

Overall, these metrics provide a comprehensive evaluation of a classification system's performance and can help identify areas for improvement.

4.3. Heavy Rainfall Prediction

Study Area: Figure 4 presents the location of the study area, which consists of 10,000 grids across the state of Florida in the U.S.

Figure 4: The study area consists of 10,000 grids across South Florida in the United States. The figure shows the observed precipitation values in each county within this area.

Figure 5: Study area precipitation rate heatmap: 100×100 grid region on September 28, 2022, at 13:00 (mm/s).

Our study identifies heavy rainfall risk areas based on the precipitation rate. Following the United States Geological Survey (USGS) standard⁴, we define the heavy rainfall risk as follows:

    Class = 0, if R < 4 mm/hr
    Class = 1, if R ≥ 4 mm/hr

Recognizing the significance of the precipitation rate as a critical factor, our objective is to pinpoint areas that are susceptible to heavy rainfall within the next hour. The classification into two classes simplifies the problem and provides a clear distinction between areas with different levels of heavy rainfall risk. Using a 4 mm/hour

⁴ https://www.usgs.gov/
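The evaluation metrics defined in Section 4.2 reduce to simple ratios of the four confusion-matrix counts; a minimal sketch, evaluated on hypothetical counts:

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision, F1, and ROC-curve rates (Section 4.2)."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)        # equals the TPR plotted on the ROC curve
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)           # false positive rate for the ROC curve
    return {"acc": acc, "recall": recall, "precision": precision,
            "f1": f1, "tpr": recall, "fpr": fpr}

# Hypothetical confusion counts, for illustration only.
m = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(m["acc"], m["precision"])  # 0.85 0.8
```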
    Model       | Accuracy | Precision (0 / 1) | Recall (0 / 1) | F1-score (0 / 1) | ROC
    RF          | 81%      | 0.79 / 0.84       | 0.79 / 0.81    | 0.87 / 0.81      | 0.813
    SVM         | 77.2%    | 0.77 / 0.78       | 0.76 / 0.78    | 0.77 / 0.78      | 0.772
    DT          | 76.5%    | 0.73 / 0.77       | 0.77 / 0.73    | 0.75 / 0.75      | 0.754
    LR          | 90%      | 0.91 / 0.90       | 0.90 / 0.91    | 0.91 / 0.90      | 0.904
    MLP         | 87.8%    | 0.82 / 0.94       | 0.93 / 0.84    | 0.87 / 0.89      | 0.879
    LSTM        | 86.6%    | 0.79 / 0.96       | 0.96 / 0.79    | 0.86 / 0.87      | 0.874
    Transformer | 93.5%    | 0.88 / 0.98       | 0.98 / 0.90    | 0.93 / 0.94      | 0.941
    TGCN (Ours) | 95.6%    | 0.93 / 0.97       | 0.97 / 0.94    | 0.95 / 0.96      | 0.954

    Table 2
    Model performance on the Real Estate dataset; Precision, Recall, and F1-score are reported per class (class 0 / class 1). The proposed model achieves an accuracy of 95.6%.

    Model       | Accuracy | Precision (0 / 1) | Recall (0 / 1) | F1-score (0 / 1) | ROC
    RF          | 74.4%    | 0.65 / 0.78       | 0.56 / 0.84    | 0.60 / 0.81      | 0.701
    SVM         | 54.1%    | 0.28 / 0.63       | 0.20 / 0.73    | 0.23 / 0.67      | 0.461
    DT          | 80.5%    | 0.91 / 0.74       | 0.69 / 0.93    | 0.78 / 0.82      | 0.807
    LR          | 78.8%    | 0.85 / 0.74       | 0.70 / 0.87    | 0.77 / 0.80      | 0.87
    MLP         | 80.3%    | 0.83 / 0.78       | 0.77 / 0.84    | 0.80 / 0.81      | 0.804
    LSTM        | 83.1%    | 0.87 / 0.80       | 0.79 / 0.88    | 0.83 / 0.84      | 0.832
    Transformer | 83.4%    | 0.85 / 0.82       | 0.82 / 0.85    | 0.83 / 0.83      | 0.835
    TGCN (Ours) | 86.6%    | 0.90 / 0.83       | 0.82 / 0.91    | 0.86 / 0.87      | 0.867

    Table 3
    Model performance on the Precipitation dataset; Precision, Recall, and F1-score are reported per class (class 0 / class 1). The proposed model achieves an accuracy of 86.6%.



threshold, we classify areas as either low-risk (labeled 0) or high-risk (labeled 1). For example, out of 10,000 grid points in the study area, 4,798 have a potential for heavy rain risk, while 5,202 do not. This classification simplifies decision-making and resource allocation.

4.4. Baselines

We use the following baseline methods:

    • Random Forest (RF) [37]
    • Support Vector Machine (SVM) [38]
    • Decision Tree (DT) [39]
    • Linear Regression (LR) [40]
    • Multilayer Perceptron (MLP) [41]
    • Long Short-Term Memory (LSTM) [25]
    • Transformer [34]

5. Performance Analysis

Based on the results presented in Table 2 and Table 3, we can analyze the performance of the different models on the Real Estate dataset and the Precipitation dataset, respectively.

In Table 2, the proposed model outperforms all the baseline models with an accuracy of 95.6%. The proposed model also exhibits the highest precision for both classes (0 and 1), achieving 0.93 and 0.97, respectively, and it demonstrates high recall values for both classes as well. The F1 scores are also higher for the proposed model than for the baselines, indicating a better balance between precision and recall. The TGCN model's performance is further reflected in its ROC score of 0.954, which indicates its ability to discriminate between the two classes effectively.

Table 3 shows that the proposed model again achieves the highest accuracy, 86.6%. As on the Real Estate dataset, the TGCN model demonstrates superior precision and recall values for both classes compared to the baseline models. It achieves precision scores of 0.90 and 0.83 for classes 0 and 1, respectively, along with recall scores of 0.82 for class 0 and 0.91 for class 1. The F1 scores also indicate the TGCN model's overall better performance, and its ROC score is 0.867.

These results demonstrate that the proposed TGCN model consistently outperforms the other models on both datasets in terms of accuracy, precision, recall, F1 score, and ROC score. The TGCN model's ability to capture temporal, non-temporal, and spatial information through its integration of the transformer layer and the graph convolutional network contributes to its strong performance in identifying and predicting hotspots and heavy rainfall areas.

6. Conclusion

The accurate prediction of heavy rainfall events is crucial for effective urban water usage, disaster response, and mitigation efforts. This paper proposed a prediction model that leverages spatially connected features and real-world climate data to predict heavy rainfall risks across a broad range. Through extensive experimentation, it was observed that the TGCN model outperformed the other machine learning methods in forecasting both heavy rainfall events and real estate trends.

7. Future Work and Limitations

While this study successfully demonstrated the effectiveness of the proposed TGCN model in predicting heavy rainfall risks, there are several avenues for future research and improvement.

We plan to incorporate more diverse and comprehensive datasets, including additional meteorological and geographical features. This expansion has the potential to enhance
the accuracy and generalizability of the TGCN model. Furthermore, we are considering the integration of real-time data streams and the utilization of advanced data fusion techniques to further enhance the model's forecasting capabilities.

Acknowledgement

This work was partially supported by the National Science Foundation (NSF) under Grant No. 2318641. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Science Foundation.

References

 [1] A.-T. Kuo, H. Chen, W.-S. Ku, Bert-trip: Effective and scalable trip representation using attentive contrast learning, in: 2023 IEEE 39th International Conference on Data Engineering (ICDE), IEEE Computer Society, 2023, pp. 612–623.
 [2] P.-Y. Ting, T. Wada, Y.-L. Chiu, M.-T. Sun, K. Sakai, W.-S. Ku, A. A.-K. Jeng, J.-S. Hwu, Freeway travel time prediction using deep hybrid model – taking Sun Yat-sen freeway as an example, IEEE Transactions on Vehicular Technology 69 (2020) 8257–8266.
 [3] A. Datta, S. Banerjee, A. O. Finley, A. E. Gelfand, Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets, Journal of the American Statistical Association 111 (2016) 800–812.
 [4] B. Gräler, E. J. Pebesma, G. B. Heuvelink, Spatio-temporal interpolation using gstat, R J. 8 (2016) 204.
 [5] Z. Diao, X. Wang, D. Zhang, Y. Liu, K. Xie, S. He, Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 890–897.
 [6] K. Kitchat, M.-H. Lin, H.-S. Chen, M.-T. Sun, K. Sakai, W.-S. Ku, T. Surasak, A deep reinforcement learning system for the allocation of epidemic prevention materials based on DDPG, Expert Systems with Applications 242 (2024) 122763.
 [7] F. Amato, F. Guignard, S. Robert, M. Kanevski, A novel framework for spatio-temporal prediction of environmental data using deep learning, Scientific Reports 10 (2020) 22243.
 [8] H. Liu, X. Mi, Y. Li, Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network, Energy Conversion and Management 166 (2018) 120–131.
 [9] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, Q. Tian, Accurate medium-range global weather forecasting with 3D neural networks, Nature (2023) 1–6.
[10] A. Moraux, S. Dewitte, B. Cornelis, A. Munteanu, A deep learning multimodal method for precipitation estimation, Remote Sensing 13 (2021) 3278.
[11] X. Shi, Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-k. Wong, W.-c. Woo, Deep learning for precipitation nowcasting: A benchmark and a new model, Advances in Neural Information Processing Systems 30 (2017).
[12] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[13] L. Cai, B. Yan, G. Mai, K. Janowicz, R. Zhu, TransGCN: Coupling transformation assumptions with graph convolutional networks for link prediction, in: Proceedings of the 10th International Conference on Knowledge Capture, 2019, pp. 131–138.
[14] L. Gan, X. Yang, N. Narisetty, F. Liang, Bayesian joint estimation of multiple graphical models, Advances in Neural Information Processing Systems 32 (2019).
[15] H. Gao, Z. Wang, S. Ji, Large-scale learnable graph convolutional networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1416–1424.
[16] H. Wang, M. Zhao, X. Xie, W. Li, M. Guo, Knowledge graph convolutional networks for recommender systems, in: The World Wide Web Conference, 2019, pp. 3307–3313.
[17] M. Gori, G. Monfardini, F. Scarselli, A new model for learning in graph domains, in: IEEE International Joint Conference on Neural Networks, volume 2, IEEE, 2005, pp. 729–734.
[18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Trans. Neural Networks 20 (2009) 61–80.
[19] Y. Li, D. Tarlow, M. Brockschmidt, R. S. Zemel, Gated graph sequence neural networks, in: ICLR, 2016.
[20] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in: SIGKDD, ACM, 2018, pp. 974–983.
[21] J. Bruna, W. Zaremba, A. Szlam, Y. LeCun, Spectral networks and locally connected networks on graphs, in: ICLR, 2014.
[22] J. Chen, T. Ma, C. Xiao, FastGCN: Fast learning with graph convolutional networks via importance sampling, in: ICLR, OpenReview.net, 2018.
[23] B. Yu, H. Yin, Z. Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, arXiv preprint arXiv:1709.04875 (2017).
[24] S. Guo, Y. Lin, N. Feng, C. Song, H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 922–929.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[26] S. McNally, J. Roche, S. Caton, Predicting the price of bitcoin using machine learning, in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, 2018, pp. 339–343.
[27] S. D. Yeddula, C. Jiang, B. Hui, W.-S. Ku, Traffic accident hotspot prediction using temporal convolutional networks: A spatio-temporal approach, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1–4.
[28] A. Borovykh, S. Bohte, C. W. Oosterlee, Conditional time series forecasting with convolutional neural networks, arXiv preprint arXiv:1703.04691 (2017).
[29] S. Wu, X. Xiao, Q. Ding, P. Zhao, Y. Wei, J. Huang, Adversarial sparse transformer for time series forecasting, Advances in Neural Information Processing Systems 33 (2020) 17105–17115.
[30] J. Yoo, Y. Soun, Y.-c. Park, U. Kang, Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2037–2045.
[31] Y. Liu, S. Wang, J. Chen, B. Chen, X. Wang, D. Hao, L. Sun, Rice yield prediction and model interpretation based on satellite and climatic indicators using a transformer method, Remote Sensing 14 (2022) 5045.
[32] Z. Lin, M. Li, Z. Zheng, Y. Cheng, C. Yuan, Self-attention ConvLSTM for spatiotemporal prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 11531–11538.
[33] X. Wang, Y. Ma, Y. Wang, W. Jin, X. Wang, J. Tang, C. Jia, J. Yu, Traffic flow prediction via spatial temporal graph neural network, in: Proceedings of The Web Conference 2020, 2020, pp. 1082–1092.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
[35] C. Jiang, W. Wang, N. Pan, W.-S. Ku, A multimodal geo dataset for high-resolution precipitation forecasting, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1–4.
[36] C. Jiang, J. Li, W. Wang, W.-S. Ku, Modeling real estate dynamics using temporal encoding, in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021, pp. 516–525.
[37] T. K. Ho, Random decision forests, in: Proceedings of the 3rd International Conference on Document Analysis and Recognition, volume 1, IEEE, 1995, pp. 278–282.
[38] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144–152.
[39] W.-Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011) 14–23.
[40] J. A. Nelder, R. W. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society: Series A (General) 135 (1972) 370–384.
[41] L. B. Almeida, C1.2 Multilayer perceptrons, Handbook of Neural Computation (1997).