Deep Spatio-Temporal Encoding: Achieving Higher Accuracy by Aligning with External Real-World Data

Chen Jiang1,*, Wenlu Wang2, Jingjing Li3, Naiqing Pan1 and Wei-Shinn Ku1
1 Auburn University, Auburn, AL, USA
2 Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
3 Meta, Menlo Park, CA, USA

Abstract
Spatio-temporal deep learning has drawn a lot of attention since many downstream real-world applications can benefit from accurate predictions. For example, accurate prediction of heavy rainfall events is essential for effective urban water usage, flood warning, and mitigation. In this paper, we propose a strategy that leverages spatially connected real-world features to enhance prediction accuracy. Specifically, in our case study we leverage spatially connected real-world climate data to predict heavy rainfall risks over a broad area. We experimentally ascertain that our Trans-Graph Convolutional Network (TGCN) accurately predicts heavy rainfall risks and real estate trends, demonstrating the advantage of incorporating external spatially connected real-world data to improve model performance. The results indicate that the proposed approach has significant potential to enhance spatio-temporal prediction accuracy, aiding efficient urban water usage, flood risk warning, and fair housing in real estate.

Keywords
Spatial-temporal Analysis, Deep Learning, Transformer

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
* Corresponding author.
czj0042@auburn.edu (C. Jiang); wenlu.wang@tamucc.edu (W. Wang); jingjingli@meta.com (J. Li); nzp0030@auburn.edu (N. Pan); weishinn@auburn.edu (W. Ku)
ORCID: 0009-0000-6888-6643 (C. Jiang); 0000-0002-4829-1068 (W. Wang); 0000-0002-1465-7738 (N. Pan); 0000-0001-8636-4689 (W. Ku)
Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Spatio-temporal predictions have been extensively studied due to their impact on real-world applications [1, 2, 3, 4, 5]. For example, heavy rainfall events can cause significant damage to infrastructure and pose serious threats to human safety. Predicting these events with greater accuracy allows better preparation and response [6], ultimately saving lives and reducing the economic impact of such events.

Deep learning methods, such as deep spatio-temporal prediction models [7, 8], have improved the performance of rainfall forecasting over the years. However, the role of external data in enhancing prediction accuracy is still controversial. Some argue that external data can provide more useful information for the prediction model, while others claim that external data can introduce more noise and complexity to the learning process. In this study, we propose to improve spatio-temporal predictions by combining spatially linked external real-world data with a TGCN that learns the spatio-temporal dependencies from the combined data. As it has been shown that utilizing multi-source real-world data is more likely to lead to higher accuracy [9], our study aims to introduce a fresh perspective on integrating external real-world data into the proposed framework.

We use heavy rainfall prediction as a case study for our proposed method. Overall, we aim to provide accurate spatio-temporal predictions by leveraging as much information as possible, enabling better decision-making for a broad range of spatio-temporal applications, while at the same time offering a novel angle and a comprehensive evaluation to demonstrate the feasibility of integrating additional external real-world data without the necessity of customizing transformer attention mechanisms. Our approach is experimentally validated by predicting heavy rainfall events and real estate hotspots.

Figure 1: An example of spatial and temporal features in the case study of precipitation prediction.

The traditional method for predicting heavy rainfall involves manually engineering features from weather data, including temperature, pressure, humidity, etc. Meteorologists rely on their expertise to interpret this data and forecast future weather patterns. This process entails observing and analyzing atmospheric factors to predict weather patterns. However, this traditional approach is time-consuming, labor-intensive, and susceptible to human error, especially when dealing with large datasets. As data grows, it becomes increasingly challenging to analyze large amounts of information by hand.

Previous research has investigated using deep learning for precipitation prediction [10, 11] with promising results. However, some limitations can be addressed to significantly enhance deep model performance. One area with room for enhancement is leveraging spatial dependencies. To tackle this challenge, we propose a model that integrates both Graph Convolutional Networks (GCNs) and a Transformer. This model enables combining external spatially linked data for spatio-temporal predictions.

Specifically, we employ a GCN to analyze the adjacency matrix on a grid level and generate correlations between grid elements. The GCN captures the spatial relationships and dependencies among neighboring grid points, allowing for a comprehensive understanding of the data's spatial dynamics. We then utilize a Transformer model to encode the temporal precipitation data and combine it with the spatial correlations obtained from the GCNs. By combining the GCNs and the Transformer within the proposed TGCN model, we create a framework that harnesses both the spatial and temporal dimensions of the data.
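The grid-level adjacency matrix mentioned above (also used for the spatial location data in Section 4.1.1) can be illustrated with a small sketch. This is our own minimal example, not the paper's code: the function name and the 4-neighbor choice are assumptions, since the paper only specifies a 0/1 neighbor encoding.

```python
def grid_adjacency(rows, cols):
    """Build a binary adjacency matrix for a rows x cols grid.

    Entry (i, j) is 1 when grid cells i and j share an edge
    (4-neighborhood), and 0 otherwise, matching the 0/1 encoding
    the paper describes for its spatial location data.
    """
    n = rows * cols
    adj = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                  # neighbor below
                j = (r + 1) * cols + c
                adj[i][j] = adj[j][i] = 1
            if c + 1 < cols:                  # neighbor to the right
                j = r * cols + (c + 1)
                adj[i][j] = adj[j][i] = 1
    return adj

adj = grid_adjacency(3, 3)
# The center cell of a 3x3 grid has exactly four neighbors.
print(sum(adj[4]))  # 4
```

Such a matrix is symmetric by construction, and each row degree equals the number of in-grid neighbors of that cell.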
2. Related Work

2.1. Graph Neural Networks

Graph Convolutional Networks (GCNs) are a type of deep learning model designed to process data represented in a graph structure, such as social or sensor networks [12]. GCNs have demonstrated their effectiveness in various applications, including node classification, link prediction, and recommendation systems [13, 14, 15, 16]. The concept of Graph Neural Networks (GNNs) was initially introduced in [17] and further expanded upon in subsequent research by [18]. GNNs, a type of recurrent neural network (RNN), iteratively propagate information from neighboring nodes until reaching a stable fixed point. This iterative process has traditionally been computationally expensive, but recent studies, such as [19], have made significant improvements in this area.

Inspired by the success of Convolutional Neural Networks (CNNs) in computer vision, which extract high-level features from images using convolution and pooling layers, current models aim to adapt these layers to directly process graph inputs. GCNs can be categorized into two types of graph convolution layers: spectral graph convolution and localized graph convolution, as discussed in [20]. Early research primarily focused on spectral graph convolutions, pioneered by [21]. The current state-of-the-art model, GCN, further simplified the graph convolution operation by employing a localized first-order approximation. However, spectral methods require operations on the entire graph Laplacian during training, which can be computationally expensive. Several subsequent works, such as FastGCN [22], have aimed to alleviate this issue. Recently, researchers have explored the application of GCNs to time series prediction. For example, spatio-temporal GCN-based approaches have been proposed for traffic flow prediction [23], as has the integration of time-aware topological information into GCNs using the mathematical framework of zigzag persistence [24].

2.2. Spatial Temporal Prediction

In this section, we discuss various existing temporal and spatial-temporal forecasting methods. Recurrent Neural Networks (RNNs), especially long short-term memory (LSTM) networks [25], have gained popularity in time series forecasting [26]. Convolutional Neural Networks (CNNs) and their variant, Temporal Convolutional Networks (TCNs), are another option for sequence prediction [27], offering parallel computation compared to RNNs [28]. In recent years, researchers have explored Transformers and their variants in time series forecasting, achieving state-of-the-art performance in tasks such as energy consumption and stock market prediction [29, 30, 31]. Designing a model capable of comprehensively capturing both spatial and temporal patterns represents another emerging trend in spatial-temporal prediction tasks [32, 33]. For example, [33] introduced a spatial-temporal graph neural network for predicting traffic flow.

3. Methodology

In this section, we detail our model architecture and the benefits of our design.

3.1. Overview

The architecture we propose, illustrated in Figure 2, incorporates a combination of techniques to enhance the prediction model. We begin by utilizing a transformer encoder to effectively encode the time series precipitation data, and then integrate local climate features into the model, enabling a comprehensive understanding of the factors influencing heavy rainfall.

To address spatial dependencies and relationships among grid points, a GCN is introduced. This GCN learns the spatial dependencies within the dataset, considering the interconnectedness of grids based on their spatial locations. By leveraging the GCN, the model becomes capable of capturing and integrating spatial information, thereby enhancing prediction accuracy.

The latent code, which combines the encoded time series precipitation data and the spatially connected local climate features learned through the GCN, is fed into a multi-layer perceptron (MLP) for prediction. This integrated architecture allows the MLP model to leverage the fused information, including temporal precipitation data, other climate features, and spatial factors, to effectively learn and infer future heavy rainfall areas.

Figure 2: Design Flow of the Trans-Graph Convolutional Prediction Model: The Trans-Graph Convolutional Prediction Model incorporates a transformer layer for time-series precipitation data, a GCN for local climate features and spatial relationships among grid points, and a four-layer MLP model for the final prediction.

3.2. Model Architecture

3.2.1. Preliminaries

Our proposed TGCN model consists of a Transformer encoder, GCN, and multi-layer perceptron (MLP) layers. The major component of the transformer is multi-head self-attention:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}
\]

where K and V are matrices that store the keys and values, and Q is the query that is mapped against the set of keys.

3.2.2. Transformer-based Encoder

We have developed a predictive model using the Transformer architecture, tailored for heavy rainfall forecasts. Unlike traditional methods that only use past rainfall data, our model factors in numerous external variables to boost accuracy. We examine local features, including geography, atmospheric conditions (pressure, temperature, wind), humidity, and topography, all of which influence heavy rainfall likelihood in a specific area. Therefore, we have developed a transformer-based prediction model [34] that incorporates GCNs to process the spatial features. By doing so, our model can capture the spatial relationships among various features in a graph structure, such as the dependencies between grid point locations and their corresponding climate data. The integration of the GCNs enhances our model's ability to capture both temporal and spatial information. Our model design starts with a transformer encoder capturing temporal precipitation patterns, followed by embedding this data and merging it with local climate data such as moisture and humidity. We enhance prediction accuracy with this added context.

3.2.3. Graph Convolutional Networks

GCNs have received considerable attention in recent years and have shown impressive performance in various applications. In this study, we aim to improve the performance of our model by integrating a GCN on top of a Transformer encoder model. The GCN model is specifically designed to capture the spatial relationships between the nodes in the graph and enhance the overall representation of the input data.

As illustrated in Figure 3, GCNs learn a linear transformation of the feature vectors of each node in a graph, which is then used to update the node features by aggregating information from the node's neighbors. Mathematically, this can be expressed as:

\[
h_{v_i}^{(l+1)} = \sigma\!\left( \sum_{v_j \in \mathcal{N}(v_i)} \frac{1}{c_{ij}}\, W^{(l+1)} h_{v_j}^{(l)} \right) \tag{2}
\]

In this equation, $h_{v_i}^{(l+1)}$ represents the feature vector of node $v_i$ at layer $l+1$, $W^{(l+1)}$ denotes the learnable weight matrix for layer $l+1$, $\mathcal{N}(v_i)$ represents the set of neighbors of node $v_i$, and $c_{ij}$ is a normalization constant that ensures proper scaling of the aggregated information. The function $\sigma$ denotes a non-linear activation function, which introduces non-linearity into the model; in our specific case, we utilize the ReLU activation function. This equation can be interpreted as calculating a weighted sum of the feature vectors of the neighbors of node $v_i$ at layer $l$, where the weights are determined by the learned weight matrix $W^{(l+1)}$. A non-linear activation function is then applied to obtain the updated feature vector $h_{v_i}^{(l+1)}$ for node $v_i$ at layer $l+1$. This process is repeated across multiple layers to learn expressive representations of the graph data.

Figure 3: Graph Convolutional Network Architecture: The input data consists of the spatial relation matrix and spatially connected climate data. The nodes in the figure are for illustrative purposes.

For the final prediction, we utilize a four-layer MLP model that combines the time series data with the other features, effectively leveraging both the temporal and spatial information captured by our model for more accurate predictions. By leveraging the transformer architecture, incorporating GCNs, and utilizing a four-layer MLP model, our approach enables the effective integration of temporal and spatial information for improved prediction accuracy.

3.2.4. Jointly Learning

As illustrated in Figure 2, we propose to map temporal data and non-temporal data into the same latent space and merge the latent vectors for the subsequent prediction task.

To encode the local climate features and capture the spatial dependencies among the grid points for data $x_c$, we employ a GCN to learn the relationships and dependencies within the spatial domain. The output hidden features at a specific layer $L$ can be denoted as $h_c^{(L)}$; Equation 2 is applied in this context.
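A single propagation step of the form in Equation 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the symmetric-degree choice $c_{ij} = \sqrt{\deg(i)\deg(j)}$ are our assumptions, since the paper only describes $c_{ij}$ as a normalization constant.

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN propagation step (cf. Equation 2):
    h_i' = ReLU( sum_{j in N(i)} (1 / c_ij) * W @ h_j ),
    with c_ij = sqrt(deg(i) * deg(j)) chosen as the normalization
    constant (an assumption, not specified in the paper)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    out_dim = len(weight)
    new_feats = []
    for i in range(n):
        agg = [0.0] * out_dim
        for j in range(n):
            if adj[i][j]:
                c = math.sqrt(deg[i] * deg[j])
                # Accumulate W @ h_j, scaled by 1 / c_ij.
                for r in range(out_dim):
                    agg[r] += sum(w * h for w, h in zip(weight[r], features[j])) / c
        new_feats.append([max(0.0, a) for a in agg])  # ReLU
    return new_feats

# Two connected nodes with identity weights: each node simply
# receives its neighbor's features (c_ij = sqrt(1 * 1) = 1).
adj = [[0, 1], [1, 0]]
feats = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, feats, W))  # [[3.0, 4.0], [1.0, 2.0]]
```

Note that, like Equation 2, the sketch aggregates over neighbors only; adding self-loops to `adj` would let each node retain its own features as well.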
Assuming we use $L_c$ GCN layers in total, we use the final layer to summarize the climate information, which is defined as:

\[
\mathbf{h}_c = h_v^{(L_c)}, \qquad h_v^{(0)} = x_c \tag{3}
\]

In this equation, $\mathbf{h}_c$ represents the hidden features at the final layer $L_c$, obtained by applying the ReLU activation function to the sum of the weighted input features $W_c^{(L_c)} h_c^{(L_c-1)}$ and the bias term $b_c^{(L_c)}$, following Equation 2.

We encode the temporal precipitation data using a transformer encoder [34]:

\[
\mathbf{h}_t = \mathrm{TransformerEncoder}(x_t) \tag{4}
\]
\[
\mathbf{h}_t \in \mathbb{R}^{d_t} \tag{5}
\]

Since $x_t$ and $x_c$ are encoded as $\mathbf{h}_t$ and $\mathbf{h}_c$, we define the merged hidden state $\mathbf{h}_m$ as

\[
\mathbf{h}_m = \mathrm{CONCAT}(\mathbf{h}_t, \mathbf{h}_c) \tag{6}
\]

To further process the merged information, we use another multi-layer perceptron specifically trained for the prediction task. Similarly, we define the $l$-th layer of this network (assuming $L_n$ layers in total) as

\[
h_n^{(l)} = \mathrm{ReLU}\!\left(W_n^{(l)} h_n^{(l-1)} + b_n^{(l)}\right) \tag{7}
\]

where $h_n^{(0)} = \mathbf{h}_m$, and $h_n^{(l-1)}$ is the output of the $(l-1)$-th layer, serving as the input to the $l$-th layer. $W_n^{(l)}$ and $b_n^{(l)}$ are model parameters.

We use the output from the last layer for the prediction:

\[
\bar{y} = \mathrm{sigmoid}\!\left(h_n^{(L_n)}\right) \tag{8}
\]

The loss is measured with the binary cross-entropy (BCE) loss,

\[
loss = \mathrm{BCEloss}(\bar{y}, y) \tag{9}
\]

which can be formulated as follows:

\[
\mathrm{BCEloss} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] \tag{10}
\]

where $N$ is the total number of samples, $y_i$ is the true label for sample $i$, $p_i$ is the predicted probability for sample $i$, and $\log$ denotes the natural logarithm.
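The pipeline of Equations (1) and (4)-(10) can be sketched end to end. This is a minimal illustration under stated assumptions, not the paper's implementation: a single self-attention layer with mean pooling stands in for the full transformer encoder of Equation (4), the GCN output $\mathbf{h}_c$ is supplied directly, the MLP has toy weights, and all function names are ours.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (1):
    softmax(Q K^T / sqrt(d_k)) V, matrices given as lists of rows."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * V[j][d] for j, w in enumerate(w_row))
             for d in range(len(V[0]))] for w_row in weights]

def encode_temporal(x_t):
    """Stand-in for the TransformerEncoder of Eq. (4): one
    self-attention layer over the timestamps, then mean pooling.
    An illustrative simplification, not the paper's encoder."""
    attended = attention(x_t, x_t, x_t)
    n = len(attended)
    return [sum(row[d] for row in attended) / n
            for d in range(len(attended[0]))]

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def fuse_and_predict(h_t, h_c, layers):
    """Eqs. (6)-(8): h_m = CONCAT(h_t, h_c), ReLU MLP layers,
    then a sigmoid over the final scalar output."""
    h = h_t + h_c                          # Eq. (6)
    for W, b in layers[:-1]:
        h = relu(linear(W, b, h))          # Eq. (7)
    W, b = layers[-1]
    return 1.0 / (1.0 + math.exp(-linear(W, b, h)[0]))  # Eq. (8)

def bce_loss(preds, labels):
    """Binary cross-entropy, Eq. (10)."""
    n = len(preds)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / n

# Toy run: 3 timestamps of 2-dim precipitation features; a 2-dim
# climate code that would come from the GCN of Eq. (3).
x_t = [[0.0, 0.5], [0.7, 0.9], [0.1, 0.0]]
h_t = encode_temporal(x_t)
h_c = [2.0, 0.0]
layers = [
    ([[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]], [0.0, 0.0]),  # hidden
    ([[1.0, -1.0]], [0.0]),                                       # output
]
p = fuse_and_predict(h_t, h_c, layers)
print(0.0 < p < 1.0)                           # True
print(round(bce_loss([0.9, 0.2], [1, 0]), 4))  # 0.1643
```

The concatenation in `fuse_and_predict` is the whole fusion step: both encoders only need to agree on producing fixed-length vectors, which is why no custom attention mechanism is required to mix the external data in.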
4. Experimental Validation

4.1. Datasets

Our data and code are publicly available (https://github.com/jiang28/Deep-Spatio-Temporal-Encoding). In our dataset, the train and test split ratio is 7:3.

4.1.1. Precipitation Dataset

Our precipitation dataset is sourced from the NOAA HRRR dataset (https://rapidrefresh.noaa.gov/hrrr/), offering real-time climate data at a 3 km spatial resolution and a 1-hour temporal resolution. This dataset [35] encompasses total precipitation, precipitation rate, and nine additional climate variables, including humidity (%), moisture availability (%), pressure (Pa), wind speed (m/s), and total cloud cover (%). Simulated brightness temperature data is acquired from the GOES 11 satellite (https://www.goes.noaa.gov/). The precipitation data consist of the following three types:

- Temporal precipitation data, denoted as $x_t$, as shown in Table 1 and Figure 5. It captures the historical patterns and fluctuations in precipitation over time. Specifically, we define the temporal precipitation rate and total accumulated precipitation over the past 6 hours as $x_t$, which consists of $N$ timestamps: $x_t = \{x_t^1, x_t^2, ..., x_t^N\}$, where $x_t^i$, $i \in \{1..N\}$, represents the precipitation value for the $i$-th timestamp.
- Local climate data $x_c$: The dataset comprises twelve local climate variables, including temperature, humidity, wind speed, atmospheric pressure, and various other meteorological factors.
- Spatial location data $x_s$: Each grid point in the dataset represents a specific location within the study area, such as a region or a cell. To represent the relationships between these grid points, we use an adjacency matrix, in which a value of 0 indicates that two grid points are not neighbors, while a value of 1 denotes a neighboring relationship.

Table 1: Temporal data format. It has data on the grid id, longitude, latitude, grid points, grid spacing, vertical level, timestamps, total precipitation, and precipitation rate.

GridID | Longitude | Latitude | Grid Points | Grid Spacing | Vertical Level
1 | 122.71 | 21.13 | 1799 Γ— 1059 | 3 km | 50

Time Stamps | 2022/09/23 00:00 | 2022/09/23 01:00 | 2022/09/23 02:00 | ... | 2022/10/02 00:00
Precipitation rate (mm/hour) | 0.0 | 0.72 | 0.94 | ... | 0
Total Precipitation (mm) | 0.01 | 1.88 | 4.3 | ... | 31.61

4.1.2. Real-estate Dataset

The real estate dataset captures the dynamics of the U.S. real estate market by collecting spatially correlated data from multiple sources. It consists of 7,436 neighborhoods, 567 cities, 304 counties, 225 metros, and 50 states across the U.S. The data are connected through spatial locations, forming a multi-level spatial hierarchy. The dataset consists of three main components: census data, pricing history, and school district information. Here are some statistics about the real estate dataset:

- Spatial Hierarchy Levels: The dataset includes a multi-level spatial hierarchy, with information at the state, metro, county, city, and neighborhood levels.
- Census Data: The census data consists of 16 variables related to various aspects of housing prices, personal income, demographics, and spatial information.
- Pricing History: The dataset includes temporal housing price history for each neighborhood, spanning from 1996 to 2019.
- School District Information: The dataset incorporates school district information. It provides details on the number of school districts present in each county within the studied area. Additionally, the dataset includes information on the top school district(s) within the region.

To facilitate the task of predicting real estate hotspots, the dataset is classified into two classes based on the house price increase rate for each neighborhood: 1 for hotspots and 0 for non-hotspots. The detailed settings of the Real-estate Dataset can be found in [36].

4.2. Evaluation Metrics

We evaluate the performance of a classification system using various metrics, including Accuracy, Recall, Precision, F1-score, and ROC. These metrics are calculated based on the number of true positives ($t_p$), false positives ($f_p$), false negatives ($f_n$), and true negatives ($t_n$).
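These four counts fully determine the metrics; as a quick sketch of how they combine (the helper name is ours):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the Section 4.2 metrics from confusion counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)          # identical to the TPR of the ROC curve
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)             # x-axis of the ROC curve
    return {"accuracy": acc, "recall": recall, "precision": precision,
            "f1": f1, "tpr": recall, "fpr": fpr}

m = classification_metrics(tp=80, fp=20, fn=20, tn=80)
print(m["accuracy"], m["recall"])  # 0.8 0.8
```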
Accuracy measures the proportion of observations, both positive and negative, that were correctly classified by the system, and can be computed using the formula:

\[
acc = \frac{t_p + t_n}{t_p + f_p + t_n + f_n}
\]

Recall measures the proportion of true positives that were correctly identified by the system, and can be computed using the formula:

\[
recall = \frac{t_p}{t_p + f_n}
\]

Precision measures the proportion of identified positives that were actually true positives, and can be computed using the formula:

\[
precision = \frac{t_p}{t_p + f_p}
\]

F1-score is the harmonic mean of precision and recall, providing a single measure of the system's accuracy on the dataset, and can be computed using the formula:

\[
F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}
\]

Figure 4: The study area consists of 10,000 grids across South Florida in the United States. The figure shows the observed precipitation values in each county within this area.

Figure 5: Study Area Precipitation Rate Heatmap: 100Γ—100 grid region on September 28, 2022, at 13:00 (mm/s).

The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the performance of a binary classifier system. It is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR), which can be computed using the formulas:
\[
TPR = \frac{t_p}{t_p + f_n}, \qquad FPR = \frac{f_p}{f_p + t_n}
\]

Overall, these metrics provide a comprehensive evaluation of a classification system's performance and can help identify areas for improvement.

4.3. Heavy Rainfall Prediction

Study Area: Figure 4 presents the location of the study area in this study. It consists of 10,000 grids across the state of Florida in the U.S.

Our study identifies heavy rainfall risk areas based on precipitation rate. Following the United States Geological Survey (USGS) standard (https://www.usgs.gov/), we define the heavy rainfall risk as follows:

\[
\mathrm{Class} =
\begin{cases}
0, & \text{if } R < 4\ \mathrm{mm/hr} \\
1, & \text{if } R \geq 4\ \mathrm{mm/hr}
\end{cases}
\]

Recognizing the significance of precipitation rate as a critical factor, our objective is to pinpoint areas that are susceptible to encountering heavy rainfall within the next hour. The classification into two classes simplifies the problem and provides a clear distinction between areas with different levels of heavy rainfall risk. Using a 4 mm/hour threshold, we classify areas as either low-risk (labeled as 0) or high-risk (labeled as 1).

Table 2: When comparing model performance on the Real Estate dataset, the proposed model has achieved an accuracy of 95.6%.

Model | Accuracy | Precision (0) | Precision (1) | Recall (0) | Recall (1) | F1-score (0) | F1-score (1) | ROC
RF | 81% | 0.79 | 0.84 | 0.79 | 0.81 | 0.87 | 0.81 | 0.813
SVM | 77.2% | 0.77 | 0.78 | 0.76 | 0.78 | 0.77 | 0.78 | 0.772
DT | 76.5% | 0.73 | 0.77 | 0.77 | 0.73 | 0.75 | 0.75 | 0.754
LR | 90% | 0.91 | 0.90 | 0.90 | 0.91 | 0.91 | 0.90 | 0.904
MLP | 87.8% | 0.82 | 0.94 | 0.93 | 0.84 | 0.87 | 0.89 | 0.879
LSTM | 86.6% | 0.79 | 0.96 | 0.96 | 0.79 | 0.86 | 0.87 | 0.874
Transformer | 93.5% | 0.88 | 0.98 | 0.98 | 0.90 | 0.93 | 0.94 | 0.941
TGCN (Ours) | 95.6% | 0.93 | 0.97 | 0.97 | 0.94 | 0.95 | 0.96 | 0.954

Table 3: When comparing model performance on the Precipitation dataset, the proposed model has achieved an accuracy of 86.6%.

Model | Accuracy | Precision (0) | Precision (1) | Recall (0) | Recall (1) | F1-score (0) | F1-score (1) | ROC
RF | 74.4% | 0.65 | 0.78 | 0.56 | 0.84 | 0.60 | 0.81 | 0.701
SVM | 54.1% | 0.28 | 0.63 | 0.20 | 0.73 | 0.23 | 0.67 | 0.461
DT | 80.5% | 0.91 | 0.74 | 0.69 | 0.93 | 0.78 | 0.82 | 0.807
LR | 78.8% | 0.85 | 0.74 | 0.70 | 0.87 | 0.77 | 0.80 | 0.87
MLP | 80.3% | 0.83 | 0.78 | 0.77 | 0.84 | 0.80 | 0.81 | 0.804
LSTM | 83.1% | 0.87 | 0.80 | 0.79 | 0.88 | 0.83 | 0.84 | 0.832
Transformer | 83.4% | 0.85 | 0.82 | 0.82 | 0.85 | 0.83 | 0.83 | 0.835
TGCN (Ours) | 86.6% | 0.90 | 0.83 | 0.82 | 0.91 | 0.86 | 0.87 | 0.867
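The 4 mm/hour labeling rule can be sketched in a few lines (the constant and function names are ours):

```python
HEAVY_RAIN_THRESHOLD = 4.0  # mm/hr, per the USGS-based rule above

def label_grid(rates):
    """Label each grid point 1 (high risk) when its precipitation
    rate reaches 4 mm/hr, and 0 (low risk) otherwise."""
    return [1 if r >= HEAVY_RAIN_THRESHOLD else 0 for r in rates]

print(label_grid([0.0, 3.9, 4.0, 12.5]))  # [0, 0, 1, 1]
```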
For example, out of 10,000 grid points in the study area, 4,798 have a potential for heavy rain risk, while 5,202 do not. This classification simplifies decision-making and resource allocation.

4.4. Baselines

We use the following baseline methods:

- Random Forest (RF) [37]
- Support Vector Machine (SVM) [38]
- Decision Tree (DT) [39]
- Linear Regression (LR) [40]
- Multilayer Perceptron (MLP) [41]
- Long Short-Term Memory (LSTM) [25]
- Transformer [34]

5. Performance Analysis

Based on the results presented in Table 2 and Table 3, we can analyze the performance of the different models on the Real Estate dataset and the Precipitation dataset, respectively.

In Table 2, the proposed model outperforms all the baseline models with an accuracy of 95.6%. The proposed model also exhibits the highest precision for both classes (0 and 1), achieving 0.93 and 0.97, respectively. It demonstrates high recall values for both classes as well. The F1 scores are also higher for the proposed model compared to the baseline models, indicating a better balance between precision and recall. The TGCN model's performance is further reflected in the ROC score of 0.954, which indicates its ability to discriminate between the two classes effectively.

Table 3 shows that the proposed model again achieves the highest accuracy of 86.6%. Similar to the Real Estate dataset, the TGCN model demonstrates superior precision and recall values for both classes compared to the baseline models. It achieves precision scores of 0.9 and 0.83 for classes 0 and 1, respectively, along with recall scores of 0.82 for class 0 and 0.85 for class 1. The F1 scores also indicate the TGCN model's overall better performance. The ROC score for the TGCN model is 0.867.

These results demonstrate that the proposed TGCN model consistently outperforms the other models on both datasets in terms of accuracy, precision, recall, F1 score, and ROC score. The TGCN model's ability to capture temporal, non-temporal, and spatial information through its integration of the transformer layer and the graph convolutional network contributes to its good performance in identifying and predicting hotspots and heavy rainfall areas.

6. Conclusion

In conclusion, the accurate prediction of heavy rainfall events is crucial for effective urban water usage, disaster response, and mitigation efforts. This paper proposed a prediction model that leverages spatially connected features and real-world climate data to predict heavy rainfall risks across a broad range. Through extensive experimentation, it was observed that the TGCN model outperformed the other machine learning methods in forecasting both heavy rainfall events and real estate trends.

7. Future Work and Limitations

While this study successfully demonstrated the effectiveness of the proposed TGCN model in predicting heavy rainfall risks, there are several avenues for future research and improvement. We plan to incorporate more diverse and comprehensive datasets, including additional meteorological and geographical features. This expansion has the potential to enhance the accuracy and generalizability of the TGCN model. Furthermore, we are considering the integration of real-time data streams and the utilization of advanced data fusion techniques to further enhance the model's forecasting capabilities.

Acknowledgement

This work was partially supported by the National Science Foundation (NSF) under Grant No. 2318641.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Science Foundation.

References

[1] A.-T. Kuo, H. Chen, W.-S. Ku, Bert-trip: Effective and scalable trip representation using attentive contrast learning, in: 2023 IEEE 39th International Conference on Data Engineering (ICDE), IEEE Computer Society, 2023, pp. 612-623.
[2] P.-Y. Ting, T. Wada, Y.-L. Chiu, M.-T. Sun, K. Sakai, W.-S. Ku, A. A.-K. Jeng, J.-S. Hwu, Freeway travel time prediction using deep hybrid model - taking Sun Yat-sen freeway as an example, IEEE Transactions on Vehicular Technology 69 (2020) 8257-8266.
[3] A. Datta, S. Banerjee, A. O. Finley, A. E. Gelfand, Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets, Journal of the American Statistical Association 111 (2016) 800-812.
[4] B. GrΓ€ler, E. J. Pebesma, G. B. Heuvelink, Spatio-temporal interpolation using gstat, R Journal 8 (2016) 204.
[5] Z. Diao, X. Wang, D. Zhang, Y. Liu, K. Xie, S. He, Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 890-897.
[6] K. Kitchat, M.-H. Lin, H.-S. Chen, M.-T. Sun, K. Sakai, W.-S. Ku, T. Surasak, A deep reinforcement learning system for the allocation of epidemic prevention materials based on DDPG, Expert Systems with Applications 242 (2024) 122763.
[7] F. Amato, F. Guignard, S. Robert, M. Kanevski, A novel framework for spatio-temporal prediction of environmental data using deep learning, Scientific Reports 10 (2020) 22243.
[8] H. Liu, X. Mi, Y. Li, Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network, Energy Conversion and Management 166 (2018) 120-131.
[9] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, Q. Tian, Accurate medium-range global weather forecasting with 3D neural networks, Nature (2023) 1-6.
[10] A. Moraux, S. Dewitte, B. Cornelis, A. Munteanu, A deep learning multimodal method for precipitation estimation, Remote Sensing 13 (2021) 3278.
[11] X. Shi, Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-k. Wong, W.-c. Woo, Deep learning for precipitation nowcasting: A benchmark and a new model, Advances in Neural Information Processing Systems 30 (2017).
[12] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[13] L. Cai, B. Yan, G. Mai, K. Janowicz, R. Zhu, TransGCN: Coupling transformation assumptions with graph convolutional networks for link prediction, in: Proceedings of the 10th International Conference on Knowledge Capture, 2019, pp. 131-138.
[14] L. Gan, X. Yang, N. Narisetty, F. Liang, Bayesian joint estimation of multiple graphical models, Advances in Neural Information Processing Systems 32 (2019).
[15] H. Gao, Z. Wang, S. Ji, Large-scale learnable graph convolutional networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1416-1424.
[16] H. Wang, M. Zhao, X. Xie, W. Li, M. Guo, Knowledge graph convolutional networks for recommender systems, in: The World Wide Web Conference, 2019, pp. 3307-3313.
[17] M. Gori, G. Monfardini, F. Scarselli, A new model for learning in graph domains, in: IEEE International Joint Conference on Neural Networks, volume 2, IEEE, 2005, pp. 729-734.
[18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2009) 61-80.
[19] Y. Li, D. Tarlow, M. Brockschmidt, R. S. Zemel, Gated graph sequence neural networks, in: ICLR, 2016.
[20] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in: SIGKDD, ACM, 2018, pp. 974-983.
[21] J. Bruna, W. Zaremba, A. Szlam, Y. LeCun, Spectral networks and locally connected networks on graphs, in: ICLR, 2014.
[22] J. Chen, T. Ma, C. Xiao, FastGCN: Fast learning with graph convolutional networks via importance sampling, in: ICLR, OpenReview.net, 2018.
[23] B. Yu, H. Yin, Z. Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, arXiv preprint arXiv:1709.04875 (2017).
[24] S. Guo, Y. Lin, N. Feng, C. Song, H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 922-929.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735-1780.
[26] S. McNally, J. Roche, S. Caton, Predicting the price of bitcoin using machine learning, in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, 2018, pp. 339-343.
[27] S. D. Yeddula, C. Jiang, B. Hui, W.-S. Ku, Traffic accident hotspot prediction using temporal convolutional networks: A spatio-temporal approach, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1-4.
[28] A. Borovykh, S. Bohte, C. W. Oosterlee, Conditional time series forecasting with convolutional neural networks, arXiv preprint arXiv:1703.04691 (2017).
[29] S. Wu, X. Xiao, Q. Ding, P. Zhao, Y. Wei, J. Huang, Adversarial sparse transformer for time series forecasting, Advances in Neural Information Processing Systems 33 (2020) 17105-17115.
[30] J. Yoo, Y. Soun, Y.-c. Park, U. Kang, Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2037-2045.
[31] Y. Liu, S. Wang, J. Chen, B. Chen, X. Wang, D. Hao, L. Sun, Rice yield prediction and model interpretation based on satellite and climatic indicators using a transformer method, Remote Sensing 14 (2022) 5045.
[32] Z. Lin, M. Li, Z. Zheng, Y. Cheng, C. Yuan, Self-attention ConvLSTM for spatiotemporal prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 11531-11538.
[33] X. Wang, Y. Ma, Y. Wang, W. Jin, X. Wang, J. Tang, C. Jia, J. Yu, Traffic flow prediction via spatial temporal graph neural network, in: Proceedings of The Web Conference 2020, 2020, pp. 1082-1092.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
[35] C. Jiang, W. Wang, N. Pan, W.-S. Ku, A multimodal geo dataset for high-resolution precipitation forecasting, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1-4.
[36] C. Jiang, J. Li, W. Wang, W.-S. Ku, Modeling real estate dynamics using temporal encoding, in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021, pp. 516-525.
[37] T. K. Ho, Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition, volume 1, IEEE, 1995, pp. 278-282.
[38] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144-152.
[39] W.-Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011) 14-23.
[40] J. A. Nelder, R. W. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society: Series A (General) 135 (1972) 370-384.
[41] L. B. Almeida, C1.2 Multilayer perceptrons, Handbook of Neural Computation C 1 (1997).