Deep Spatio-Temporal Encoding: Achieving Higher Accuracy by Aligning with External Real-World Data

Chen Jiang1,*, Wenlu Wang2, Jingjing Li3, Naiqing Pan1 and Wei-Shinn Ku1
1 Auburn University, Auburn, AL, USA
2 Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
3 Meta, Menlo Park, CA, USA

Abstract
Spatio-temporal deep learning has drawn a lot of attention since many downstream real-world applications can benefit from accurate predictions. For example, accurate prediction of heavy rainfall events is essential for effective urban water usage, flood warning, and mitigation. In this paper, we propose a strategy that leverages spatially connected real-world features to enhance prediction accuracy. Specifically, in our case study we leverage spatially connected real-world climate data to predict heavy rainfall risks over a broad area. We experimentally ascertain that our Trans-Graph Convolutional Network (TGCN) accurately predicts heavy rainfall risks and real estate trends, demonstrating the advantage of incorporating external spatially connected real-world data to improve model performance. The results indicate that the proposed approach has significant potential to enhance spatio-temporal prediction accuracy, aiding efficient urban water usage, flood risk warning, and fair housing in real estate.

Keywords
Spatial-temporal Analysis, Deep Learning, Transformer

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
* Corresponding author.
czj0042@auburn.edu (C. Jiang); wenlu.wang@tamucc.edu (W. Wang); jingjingli@meta.com (J. Li); nzp0030@auburn.edu (N. Pan); weishinn@auburn.edu (W. Ku)
ORCID: 0009-0000-6888-6643 (C. Jiang); 0000-0002-4829-1068 (W. Wang); 0000-0002-1465-7738 (N. Pan); 0000-0001-8636-4689 (W. Ku)
Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

Spatio-temporal predictions have been extensively studied due to their impact on real-world applications [1, 2, 3, 4, 5]. For example, heavy rainfall events can cause significant damage to infrastructure and pose serious threats to human safety. Predicting these events with greater accuracy allows better preparation and response [6], ultimately saving lives and reducing the economic impact of such events.

Deep learning methods, such as deep spatio-temporal prediction models [7, 8], have improved the performance of rainfall forecasting over the years. However, the role of external data in enhancing prediction accuracy is still controversial. Some argue that external data can provide more useful information for the prediction model, while others claim that external data can introduce more noise and complexity to the learning process. In this study, we propose to improve spatio-temporal predictions by combining spatially linked external real-world data with a TGCN that learns the spatio-temporal dependencies from the combined data. As it has been shown that utilizing multi-source real-world data is more likely to lead to higher accuracy [9], our study aims to introduce a fresh perspective on integrating external real-world data into the proposed framework.

We use heavy rainfall prediction as a case study for our proposed method. Overall, we aim to provide accurate spatio-temporal predictions by leveraging as much information as possible, enabling better decision-making for a broad range of spatio-temporal applications, while at the same time offering a novel angle and a comprehensive evaluation to demonstrate the feasibility of integrating additional external real-world data without the necessity of customizing transformer attention mechanisms. Our approach is experimentally validated by predicting heavy rainfall events and real estate hotspots.

Figure 1: An example of spatial and temporal features in the case study of precipitation prediction.

The traditional method for predicting heavy rainfall involves manually engineering features from weather data, including temperature, pressure, humidity, etc. Meteorologists rely on their expertise to interpret this data and forecast future weather patterns. This process entails observing and analyzing atmospheric factors to predict weather patterns. However, this traditional approach is time-consuming, labor-intensive, and susceptible to human error, especially when dealing with large datasets. As data grows, it becomes increasingly challenging to analyze large amounts of information by hand.

Previous research has investigated using deep learning for precipitation prediction [10, 11] with promising results. However, some limitations can be addressed to significantly enhance deep model performance. One area with room for enhancement is leveraging spatial dependencies. To tackle this challenge, we propose a model that integrates both Graph Convolutional Networks (GCNs) and a Transformer. This model enables combining external spatially linked data for spatio-temporal predictions.

Specifically, we employ a GCN to analyze the adjacency matrix on a grid level and generate correlations between grid elements. The GCN captures the spatial relationships and dependencies among neighboring grid points, allowing for a comprehensive understanding of the data's spatial dynamics. We then utilize a Transformer model to encode the temporal precipitation data and combine it with the spatial correlations obtained from the GCNs. By combining the GCNs and the Transformer within the proposed TGCN model, we create a framework that harnesses both the spatial and temporal dimensions of the data.
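The grid-level adjacency matrix mentioned above (also used for the spatial location data in Section 4.1.1) can be illustrated with a small sketch. This is our own minimal example, not the paper's code: the function name and the 4-neighbor choice are assumptions, since the paper only specifies a 0/1 neighbor encoding.

```python
def grid_adjacency(rows, cols):
    """Build a binary adjacency matrix for a rows x cols grid.

    Entry (i, j) is 1 when grid cells i and j share an edge
    (4-neighborhood), and 0 otherwise, matching the 0/1 encoding
    the paper describes for its spatial location data.
    """
    n = rows * cols
    adj = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:                  # neighbor below
                j = (r + 1) * cols + c
                adj[i][j] = adj[j][i] = 1
            if c + 1 < cols:                  # neighbor to the right
                j = r * cols + (c + 1)
                adj[i][j] = adj[j][i] = 1
    return adj

adj = grid_adjacency(3, 3)
# The center cell of a 3x3 grid has exactly four neighbors.
print(sum(adj[4]))  # 4
```

Such a matrix is symmetric by construction, and each row degree equals the number of in-grid neighbors of that cell.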
2. Related Work

2.1. Graph Neural Networks

Graph Convolutional Networks (GCNs) are a type of deep learning model designed to process data represented in a graph structure, such as social or sensor networks [12]. GCNs have demonstrated their effectiveness in various applications, including node classification, link prediction, and recommendation systems [13, 14, 15, 16]. The concept of Graph Neural Networks (GNNs) was initially introduced in [17] and further expanded upon in subsequent research by [18]. GNNs, a type of recurrent neural network (RNN), iteratively propagate information from neighboring nodes until reaching a stable fixed point. This iterative process has traditionally been computationally expensive, but recent studies, such as [19], have made significant improvements in this area.

Inspired by the success of Convolutional Neural Networks (CNNs) in computer vision, which extract high-level features from images using convolution and pooling layers, current models aim to adapt these layers to directly process graph inputs. GCNs can be categorized into two types of graph convolution layers: spectral graph convolution and localized graph convolution, as discussed in [20]. Early research primarily focused on spectral graph convolutions, pioneered by [21]. The current state-of-the-art model, GCN, further simplified the graph convolution operation by employing a localized first-order approximation. However, spectral methods require operations on the entire graph Laplacian during training, which can be computationally expensive. Several subsequent works, such as FastGCN [22], have aimed to alleviate this issue. Recently, researchers have explored the application of GCNs to time series prediction. For example, spatio-temporal GCN-based approaches have been proposed for traffic flow prediction [23], as has the integration of time-aware topological information into GCNs using the mathematical framework of zigzag persistence [24].

2.2. Spatial Temporal Prediction

In this section, we discuss various existing temporal and spatial-temporal forecasting methods. Recurrent Neural Networks (RNNs), especially long short-term memory (LSTM) networks [25], have gained popularity in time series forecasting [26]. Convolutional Neural Networks (CNNs) and their variant, Temporal Convolutional Networks (TCNs), are another option for sequence prediction [27], offering parallel computation compared to RNNs [28]. In recent years, researchers have explored Transformers and their variants in time series forecasting, achieving state-of-the-art performance in tasks such as energy consumption and stock market prediction [29, 30, 31]. Designing a model capable of comprehensively capturing both spatial and temporal patterns represents another emerging trend in spatial-temporal prediction tasks [32, 33]. For example, [33] introduced a spatial-temporal graph neural network for predicting traffic flow.

3. Methodology

In this section, we detail our model architecture and the benefits of our design.

3.1. Overview

The architecture we propose, illustrated in Figure 2, incorporates a combination of techniques to enhance the prediction model. We begin by utilizing a transformer encoder to effectively encode the time series precipitation data, and then integrate local climate features into the model, enabling a comprehensive understanding of the factors influencing heavy rainfall.

To address spatial dependencies and relationships among grid points, a GCN is introduced. This GCN learns the spatial dependencies within the dataset, considering the interconnectedness of grids based on their spatial locations. By leveraging the GCN, the model becomes capable of capturing and integrating spatial information, thereby enhancing prediction accuracy.

The latent code, which combines the encoded time series precipitation data and the spatially connected local climate features learned through the GCN, is fed into a multi-layer perceptron (MLP) for prediction. This integrated architecture allows the MLP model to leverage the fused information, including temporal precipitation data, other climate features, and spatial factors, to effectively learn and infer future heavy rainfall areas.

Figure 2: Design Flow of the Trans-Graph Convolutional Prediction Model: The Trans-Graph Convolutional Prediction Model incorporates a transformer layer for time-series precipitation data, a GCN for local climate features and spatial relationships among grid points, and a four-layer MLP model for the final prediction.

3.2. Model Architecture

3.2.1. Preliminaries

Our proposed TGCN model consists of a Transformer encoder, GCN, and multi-layer perceptron (MLP) layers. The major component of the transformer is multi-head self-attention:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}
\]

where K and V are matrices that store the keys and values, and Q is the query that is mapped against the set of keys.

3.2.2. Transformer-based Encoder

We have developed a predictive model using the Transformer architecture, tailored for heavy rainfall forecasts. Unlike traditional methods that only use past rainfall data, our model factors in numerous external variables to boost accuracy. We examine local features, including geography, atmospheric conditions (pressure, temperature, wind), humidity, and topography, all of which influence heavy rainfall likelihood in a specific area. Therefore, we have developed a transformer-based prediction model [34] that incorporates GCNs to process the spatial features. By doing so, our model can capture the spatial relationships among various features in a graph structure, such as the dependencies between grid point locations and their corresponding climate data. The integration of the GCNs enhances our model's ability to capture both temporal and spatial information. Our model design starts with a transformer encoder capturing temporal precipitation patterns, followed by embedding this data and merging it with local climate data such as moisture and humidity. We enhance prediction accuracy with this added context.

3.2.3. Graph Convolutional Networks

GCNs have received considerable attention in recent years and have shown impressive performance in various applications. In this study, we aim to improve the performance of our model by integrating a GCN on top of a Transformer encoder model. The GCN model is specifically designed to capture the spatial relationships between the nodes in the graph and enhance the overall representation of the input data.

As illustrated in Figure 3, GCNs learn a linear transformation of the feature vectors of each node in a graph, which is then used to update the node features by aggregating information from the node's neighbors. Mathematically, this can be expressed as:

\[
h_{v_i}^{(l+1)} = \sigma\!\left( \sum_{v_j \in \mathcal{N}(v_i)} \frac{1}{c_{ij}}\, W^{(l+1)} h_{v_j}^{(l)} \right) \tag{2}
\]

In this equation, $h_{v_i}^{(l+1)}$ represents the feature vector of node $v_i$ at layer $l+1$, $W^{(l+1)}$ denotes the learnable weight matrix for layer $l+1$, $\mathcal{N}(v_i)$ represents the set of neighbors of node $v_i$, and $c_{ij}$ is a normalization constant that ensures proper scaling of the aggregated information. The function $\sigma$ denotes a non-linear activation function, which introduces non-linearity into the model; in our specific case, we utilize the ReLU activation function. This equation can be interpreted as calculating a weighted sum of the feature vectors of the neighbors of node $v_i$ at layer $l$, where the weights are determined by the learned weight matrix $W^{(l+1)}$. A non-linear activation function is then applied to obtain the updated feature vector $h_{v_i}^{(l+1)}$ for node $v_i$ at layer $l+1$. This process is repeated across multiple layers to learn expressive representations of the graph data.

Figure 3: Graph Convolutional Network Architecture: The input data consists of the spatial relation matrix and spatially connected climate data. The nodes in the figure are for illustrative purposes.

For the final prediction, we utilize a four-layer MLP model that combines the time series data with the other features, effectively leveraging both the temporal and spatial information captured by our model for more accurate predictions. By leveraging the transformer architecture, incorporating GCNs, and utilizing a four-layer MLP model, our approach enables the effective integration of temporal and spatial information for improved prediction accuracy.

3.2.4. Jointly Learning

As illustrated in Figure 2, we propose to map temporal data and non-temporal data into the same latent space and merge the latent vectors for the subsequent prediction task.

To encode the local climate features and capture the spatial dependencies among the grid points for data $x_c$, we employ a GCN to learn the relationships and dependencies within the spatial domain. The output hidden features at a specific layer $L$ can be denoted as $h_c^{(L)}$; Equation 2 is applied in this context.
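A single propagation step of the form in Equation 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name and the symmetric-degree choice $c_{ij} = \sqrt{\deg(i)\deg(j)}$ are our assumptions, since the paper only describes $c_{ij}$ as a normalization constant.

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN propagation step (cf. Equation 2):
    h_i' = ReLU( sum_{j in N(i)} (1 / c_ij) * W @ h_j ),
    with c_ij = sqrt(deg(i) * deg(j)) chosen as the normalization
    constant (an assumption, not specified in the paper)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    out_dim = len(weight)
    new_feats = []
    for i in range(n):
        agg = [0.0] * out_dim
        for j in range(n):
            if adj[i][j]:
                c = math.sqrt(deg[i] * deg[j])
                # Accumulate W @ h_j, scaled by 1 / c_ij.
                for r in range(out_dim):
                    agg[r] += sum(w * h for w, h in zip(weight[r], features[j])) / c
        new_feats.append([max(0.0, a) for a in agg])  # ReLU
    return new_feats

# Two connected nodes with identity weights: each node simply
# receives its neighbor's features (c_ij = sqrt(1 * 1) = 1).
adj = [[0, 1], [1, 0]]
feats = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, feats, W))  # [[3.0, 4.0], [1.0, 2.0]]
```

Note that, like Equation 2, the sketch aggregates over neighbors only; adding self-loops to `adj` would let each node retain its own features as well.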
Assuming we use $L_c$ GCN layers in total, we use the final layer to summarize the climate information, which is defined as:

\[
\mathbf{h}_c = h_v^{(L_c)}, \qquad h_v^{(0)} = x_c \tag{3}
\]

In this equation, $\mathbf{h}_c$ represents the hidden features at the final layer $L_c$, obtained by applying the ReLU activation function to the sum of the weighted input features $W_c^{(L_c)} h_c^{(L_c-1)}$ and the bias term $b_c^{(L_c)}$, following Equation 2.

We encode the temporal precipitation data using a transformer encoder [34]:

\[
\mathbf{h}_t = \mathrm{TransformerEncoder}(x_t) \tag{4}
\]
\[
\mathbf{h}_t \in \mathbb{R}^{d_t} \tag{5}
\]

Since $x_t$ and $x_c$ are encoded as $\mathbf{h}_t$ and $\mathbf{h}_c$, we define the merged hidden state $\mathbf{h}_m$ as

\[
\mathbf{h}_m = \mathrm{CONCAT}(\mathbf{h}_t, \mathbf{h}_c) \tag{6}
\]

To further process the merged information, we use another multi-layer perceptron specifically trained for the prediction task. Similarly, we define the $l$-th layer of this network (assuming $L_n$ layers in total) as

\[
h_n^{(l)} = \mathrm{ReLU}\!\left(W_n^{(l)} h_n^{(l-1)} + b_n^{(l)}\right) \tag{7}
\]

where $h_n^{(0)} = \mathbf{h}_m$, and $h_n^{(l-1)}$ is the output of the $(l-1)$-th layer, serving as the input to the $l$-th layer. $W_n^{(l)}$ and $b_n^{(l)}$ are model parameters.

We use the output from the last layer for the prediction:

\[
\bar{y} = \mathrm{sigmoid}\!\left(h_n^{(L_n)}\right) \tag{8}
\]

The loss is measured with the binary cross-entropy (BCE) loss,

\[
loss = \mathrm{BCEloss}(\bar{y}, y) \tag{9}
\]

which can be formulated as follows:

\[
\mathrm{BCEloss} = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] \tag{10}
\]

where $N$ is the total number of samples, $y_i$ is the true label for sample $i$, $p_i$ is the predicted probability for sample $i$, and $\log$ denotes the natural logarithm.
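The pipeline of Equations (1) and (4)-(10) can be sketched end to end. This is a minimal illustration under stated assumptions, not the paper's implementation: a single self-attention layer with mean pooling stands in for the full transformer encoder of Equation (4), the GCN output $\mathbf{h}_c$ is supplied directly, the MLP has toy weights, and all function names are ours.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (1):
    softmax(Q K^T / sqrt(d_k)) V, matrices given as lists of rows."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * V[j][d] for j, w in enumerate(w_row))
             for d in range(len(V[0]))] for w_row in weights]

def encode_temporal(x_t):
    """Stand-in for the TransformerEncoder of Eq. (4): one
    self-attention layer over the timestamps, then mean pooling.
    An illustrative simplification, not the paper's encoder."""
    attended = attention(x_t, x_t, x_t)
    n = len(attended)
    return [sum(row[d] for row in attended) / n
            for d in range(len(attended[0]))]

def relu(v):
    return [max(0.0, x) for x in v]

def linear(W, b, v):
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def fuse_and_predict(h_t, h_c, layers):
    """Eqs. (6)-(8): h_m = CONCAT(h_t, h_c), ReLU MLP layers,
    then a sigmoid over the final scalar output."""
    h = h_t + h_c                          # Eq. (6)
    for W, b in layers[:-1]:
        h = relu(linear(W, b, h))          # Eq. (7)
    W, b = layers[-1]
    return 1.0 / (1.0 + math.exp(-linear(W, b, h)[0]))  # Eq. (8)

def bce_loss(preds, labels):
    """Binary cross-entropy, Eq. (10)."""
    n = len(preds)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, labels)) / n

# Toy run: 3 timestamps of 2-dim precipitation features; a 2-dim
# climate code that would come from the GCN of Eq. (3).
x_t = [[0.0, 0.5], [0.7, 0.9], [0.1, 0.0]]
h_t = encode_temporal(x_t)
h_c = [2.0, 0.0]
layers = [
    ([[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]], [0.0, 0.0]),  # hidden
    ([[1.0, -1.0]], [0.0]),                                       # output
]
p = fuse_and_predict(h_t, h_c, layers)
print(0.0 < p < 1.0)                           # True
print(round(bce_loss([0.9, 0.2], [1, 0]), 4))  # 0.1643
```

The concatenation in `fuse_and_predict` is the whole fusion step: both encoders only need to agree on producing fixed-length vectors, which is why no custom attention mechanism is required to mix the external data in.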
4. Experimental Validation

4.1. Datasets

Our data and code are publicly available (https://github.com/jiang28/Deep-Spatio-Temporal-Encoding). In our dataset, the train and test split ratio is 7:3.

4.1.1. Precipitation Dataset

Our precipitation dataset is sourced from the NOAA HRRR dataset (https://rapidrefresh.noaa.gov/hrrr/), offering real-time climate data at a 3 km spatial resolution and a 1-hour temporal resolution. This dataset [35] encompasses total precipitation, precipitation rate, and nine additional climate variables, including humidity (%), moisture availability (%), pressure (Pa), wind speed (m/s), and total cloud cover (%). Simulated brightness temperature data is acquired from the GOES 11 satellite (https://www.goes.noaa.gov/). The precipitation data consist of the following three types:

- Temporal precipitation data, denoted as $x_t$, as shown in Table 1 and Figure 5. It captures the historical patterns and fluctuations in precipitation over time. Specifically, we define the temporal precipitation rate and total accumulated precipitation over the past 6 hours as $x_t$, which consists of $N$ timestamps: $x_t = \{x_t^1, x_t^2, ..., x_t^N\}$, where $x_t^i$, $i \in \{1..N\}$, represents the precipitation value for the $i$-th timestamp.
- Local climate data $x_c$: The dataset comprises twelve local climate variables, including temperature, humidity, wind speed, atmospheric pressure, and various other meteorological factors.
- Spatial location data $x_s$: Each grid point in the dataset represents a specific location within the study area, such as a region or a cell. To represent the relationships between these grid points, we use an adjacency matrix, in which a value of 0 indicates that two grid points are not neighbors, while a value of 1 denotes a neighboring relationship.

Table 1: Temporal data format. It has data on the grid id, longitude, latitude, grid points, grid spacing, vertical level, timestamps, total precipitation, and precipitation rate.

GridID | Longitude | Latitude | Grid Points | Grid Spacing | Vertical Level
1 | 122.71 | 21.13 | 1799 Γ— 1059 | 3 km | 50

Time Stamps | 2022/09/23 00:00 | 2022/09/23 01:00 | 2022/09/23 02:00 | ... | 2022/10/02 00:00
Precipitation rate (mm/hour) | 0.0 | 0.72 | 0.94 | ... | 0
Total Precipitation (mm) | 0.01 | 1.88 | 4.3 | ... | 31.61

4.1.2. Real-estate Dataset

The real estate dataset captures the dynamics of the U.S. real estate market by collecting spatially correlated data from multiple sources. It consists of 7,436 neighborhoods, 567 cities, 304 counties, 225 metros, and 50 states across the U.S. The data are connected through spatial locations, forming a multi-level spatial hierarchy. The dataset consists of three main components: census data, pricing history, and school district information. Here are some statistics about the real estate dataset:

- Spatial Hierarchy Levels: The dataset includes a multi-level spatial hierarchy, with information at the state, metro, county, city, and neighborhood levels.
- Census Data: The census data consists of 16 variables related to various aspects of housing prices, personal income, demographics, and spatial information.
- Pricing History: The dataset includes temporal housing price history for each neighborhood, spanning from 1996 to 2019.
- School District Information: The dataset incorporates school district information. It provides details on the number of school districts present in each county within the studied area. Additionally, the dataset includes information on the top school district(s) within the region.

To facilitate the task of predicting real estate hotspots, the dataset is classified into two classes based on the house price increase rate for each neighborhood: 1 for hotspots and 0 for non-hotspots. The detailed settings of the Real-estate Dataset can be found in [36].

4.2. Evaluation Metrics

We evaluate the performance of a classification system using various metrics, including Accuracy, Recall, Precision, F1-score, and ROC. These metrics are calculated based on the number of true positives ($t_p$), false positives ($f_p$), false negatives ($f_n$), and true negatives ($t_n$).
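These four counts fully determine the metrics; as a quick sketch of how they combine (the helper name is ours):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the Section 4.2 metrics from confusion counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)          # identical to the TPR of the ROC curve
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)             # x-axis of the ROC curve
    return {"accuracy": acc, "recall": recall, "precision": precision,
            "f1": f1, "tpr": recall, "fpr": fpr}

m = classification_metrics(tp=80, fp=20, fn=20, tn=80)
print(m["accuracy"], m["recall"])  # 0.8 0.8
```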
Accuracy measures the proportion of observations, both positive and negative, that were correctly classified by the system, and can be computed using the formula:

\[
acc = \frac{t_p + t_n}{t_p + f_p + t_n + f_n}
\]

Recall measures the proportion of true positives that were correctly identified by the system, and can be computed using the formula:

\[
recall = \frac{t_p}{t_p + f_n}
\]

Precision measures the proportion of identified positives that were actually true positives, and can be computed using the formula:

\[
precision = \frac{t_p}{t_p + f_p}
\]

F1-score is the harmonic mean of precision and recall, providing a single measure of the system's accuracy on the dataset, and can be computed using the formula:

\[
F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}
\]

Figure 4: The study area consists of 10,000 grids across South Florida in the United States. The figure shows the observed precipitation values in each county within this area.

Figure 5: Study Area Precipitation Rate Heatmap: 100Γ—100 grid region on September 28, 2022, at 13:00 (mm/s).

The ROC (Receiver Operating Characteristic) curve is a graphical plot that illustrates the performance of a binary classifier system. It is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR), which can be computed using the formulas:
\[
TPR = \frac{t_p}{t_p + f_n}, \qquad FPR = \frac{f_p}{f_p + t_n}
\]

Overall, these metrics provide a comprehensive evaluation of a classification system's performance and can help identify areas for improvement.

4.3. Heavy Rainfall Prediction

Study Area: Figure 4 presents the location of the study area in this study. It consists of 10,000 grids across the state of Florida in the U.S.

Our study identifies heavy rainfall risk areas based on precipitation rate. Following the United States Geological Survey (USGS) standard (https://www.usgs.gov/), we define the heavy rainfall risk as follows:

\[
\mathrm{Class} =
\begin{cases}
0, & \text{if } R < 4\ \mathrm{mm/hr} \\
1, & \text{if } R \geq 4\ \mathrm{mm/hr}
\end{cases}
\]

Recognizing the significance of precipitation rate as a critical factor, our objective is to pinpoint areas that are susceptible to encountering heavy rainfall within the next hour. The classification into two classes simplifies the problem and provides a clear distinction between areas with different levels of heavy rainfall risk. Using a 4 mm/hour threshold, we classify areas as either low-risk (labeled as 0) or high-risk (labeled as 1).

Table 2: When comparing model performance on the Real Estate dataset, the proposed model has achieved an accuracy of 95.6%.

Model | Accuracy | Precision (0) | Precision (1) | Recall (0) | Recall (1) | F1-score (0) | F1-score (1) | ROC
RF | 81% | 0.79 | 0.84 | 0.79 | 0.81 | 0.87 | 0.81 | 0.813
SVM | 77.2% | 0.77 | 0.78 | 0.76 | 0.78 | 0.77 | 0.78 | 0.772
DT | 76.5% | 0.73 | 0.77 | 0.77 | 0.73 | 0.75 | 0.75 | 0.754
LR | 90% | 0.91 | 0.90 | 0.90 | 0.91 | 0.91 | 0.90 | 0.904
MLP | 87.8% | 0.82 | 0.94 | 0.93 | 0.84 | 0.87 | 0.89 | 0.879
LSTM | 86.6% | 0.79 | 0.96 | 0.96 | 0.79 | 0.86 | 0.87 | 0.874
Transformer | 93.5% | 0.88 | 0.98 | 0.98 | 0.90 | 0.93 | 0.94 | 0.941
TGCN (Ours) | 95.6% | 0.93 | 0.97 | 0.97 | 0.94 | 0.95 | 0.96 | 0.954

Table 3: When comparing model performance on the Precipitation dataset, the proposed model has achieved an accuracy of 86.6%.

Model | Accuracy | Precision (0) | Precision (1) | Recall (0) | Recall (1) | F1-score (0) | F1-score (1) | ROC
RF | 74.4% | 0.65 | 0.78 | 0.56 | 0.84 | 0.60 | 0.81 | 0.701
SVM | 54.1% | 0.28 | 0.63 | 0.20 | 0.73 | 0.23 | 0.67 | 0.461
DT | 80.5% | 0.91 | 0.74 | 0.69 | 0.93 | 0.78 | 0.82 | 0.807
LR | 78.8% | 0.85 | 0.74 | 0.70 | 0.87 | 0.77 | 0.80 | 0.87
MLP | 80.3% | 0.83 | 0.78 | 0.77 | 0.84 | 0.80 | 0.81 | 0.804
LSTM | 83.1% | 0.87 | 0.80 | 0.79 | 0.88 | 0.83 | 0.84 | 0.832
Transformer | 83.4% | 0.85 | 0.82 | 0.82 | 0.85 | 0.83 | 0.83 | 0.835
TGCN (Ours) | 86.6% | 0.90 | 0.83 | 0.82 | 0.91 | 0.86 | 0.87 | 0.867
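The 4 mm/hour labeling rule can be sketched in a few lines (the constant and function names are ours):

```python
HEAVY_RAIN_THRESHOLD = 4.0  # mm/hr, per the USGS-based rule above

def label_grid(rates):
    """Label each grid point 1 (high risk) when its precipitation
    rate reaches 4 mm/hr, and 0 (low risk) otherwise."""
    return [1 if r >= HEAVY_RAIN_THRESHOLD else 0 for r in rates]

print(label_grid([0.0, 3.9, 4.0, 12.5]))  # [0, 0, 1, 1]
```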
For example, out of 10,000 grid points in the study area, 4,798 have a potential for heavy rain risk, while 5,202 do not. This classification simplifies decision-making and resource allocation.

4.4. Baselines

We use the following baseline methods:

- Random Forest (RF) [37]
- Support Vector Machine (SVM) [38]
- Decision Tree (DT) [39]
- Linear Regression (LR) [40]
- Multilayer Perceptron (MLP) [41]
- Long Short-Term Memory (LSTM) [25]
- Transformer [34]

5. Performance Analysis

Based on the results presented in Table 2 and Table 3, we can analyze the performance of the different models on the Real Estate dataset and the Precipitation dataset, respectively.

In Table 2, the proposed model outperforms all the baseline models with an accuracy of 95.6%. The proposed model also exhibits the highest precision for both classes (0 and 1), achieving 0.93 and 0.97, respectively. It demonstrates high recall values for both classes as well. The F1 scores are also higher for the proposed model compared to the baseline models, indicating a better balance between precision and recall. The TGCN model's performance is further reflected in the ROC score of 0.954, which indicates its ability to discriminate between the two classes effectively.

Table 3 shows that the proposed model again achieves the highest accuracy of 86.6%. Similar to the Real Estate dataset, the TGCN model demonstrates superior precision and recall values for both classes compared to the baseline models. It achieves precision scores of 0.9 and 0.83 for classes 0 and 1, respectively, along with recall scores of 0.82 for class 0 and 0.85 for class 1. The F1 scores also indicate the TGCN model's overall better performance. The ROC score for the TGCN model is 0.867.

These results demonstrate that the proposed TGCN model consistently outperforms the other models on both datasets in terms of accuracy, precision, recall, F1 score, and ROC score. The TGCN model's ability to capture temporal, non-temporal, and spatial information through its integration of the transformer layer and the graph convolutional network contributes to its good performance in identifying and predicting hotspots and heavy rainfall areas.

6. Conclusion

In conclusion, the accurate prediction of heavy rainfall events is crucial for effective urban water usage, disaster response, and mitigation efforts. This paper proposed a prediction model that leverages spatially connected features and real-world climate data to predict heavy rainfall risks across a broad range. Through extensive experimentation, it was observed that the TGCN model outperformed the other machine learning methods in forecasting both heavy rainfall events and real estate trends.

7. Future Work and Limitations

While this study successfully demonstrated the effectiveness of the proposed TGCN model in predicting heavy rainfall risks, there are several avenues for future research and improvement. We plan to incorporate more diverse and comprehensive datasets, including additional meteorological and geographical features. This expansion has the potential to enhance the accuracy and generalizability of the TGCN model. Furthermore, we are considering the integration of real-time data streams and the utilization of advanced data fusion techniques to further enhance the model's forecasting capabilities.

Acknowledgement

This work was partially supported by the National Science Foundation (NSF) under Grant No. 2318641.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Science Foundation.

References

[1] A.-T. Kuo, H. Chen, W.-S. Ku, Bert-trip: Effective and scalable trip representation using attentive contrast learning, in: 2023 IEEE 39th International Conference on Data Engineering (ICDE), IEEE Computer Society, 2023, pp. 612-623.
[2] P.-Y. Ting, T. Wada, Y.-L. Chiu, M.-T. Sun, K. Sakai, W.-S. Ku, A. A.-K. Jeng, J.-S. Hwu, Freeway travel time prediction using deep hybrid model - taking Sun Yat-sen freeway as an example, IEEE Transactions on Vehicular Technology 69 (2020) 8257-8266.
[3] A. Datta, S. Banerjee, A. O. Finley, A. E. Gelfand, Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets, Journal of the American Statistical Association 111 (2016) 800-812.
[4] B. GrΓ€ler, E. J. Pebesma, G. B. Heuvelink, Spatio-temporal interpolation using gstat, R Journal 8 (2016) 204.
[5] Z. Diao, X. Wang, D. Zhang, Y. Liu, K. Xie, S. He, Dynamic spatial-temporal graph convolutional neural networks for traffic forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 890-897.
[6] K. Kitchat, M.-H. Lin, H.-S. Chen, M.-T. Sun, K. Sakai, W.-S. Ku, T. Surasak, A deep reinforcement learning system for the allocation of epidemic prevention materials based on DDPG, Expert Systems with Applications 242 (2024) 122763.
[7] F. Amato, F. Guignard, S. Robert, M. Kanevski, A novel framework for spatio-temporal prediction of environmental data using deep learning, Scientific Reports 10 (2020) 22243.
[8] H. Liu, X. Mi, Y. Li, Smart deep learning based wind speed prediction model using wavelet packet decomposition, convolutional neural network and convolutional long short term memory network, Energy Conversion and Management 166 (2018) 120-131.
[9] K. Bi, L. Xie, H. Zhang, X. Chen, X. Gu, Q. Tian, Accurate medium-range global weather forecasting with 3D neural networks, Nature (2023) 1-6.
[10] A. Moraux, S. Dewitte, B. Cornelis, A. Munteanu, A deep learning multimodal method for precipitation estimation, Remote Sensing 13 (2021) 3278.
[11] X. Shi, Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-k. Wong, W.-c. Woo, Deep learning for precipitation nowcasting: A benchmark and a new model, Advances in Neural Information Processing Systems 30 (2017).
[12] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).
[13] L. Cai, B. Yan, G. Mai, K. Janowicz, R. Zhu, TransGCN: Coupling transformation assumptions with graph convolutional networks for link prediction, in: Proceedings of the 10th International Conference on Knowledge Capture, 2019, pp. 131-138.
[14] L. Gan, X. Yang, N. Narisetty, F. Liang, Bayesian joint estimation of multiple graphical models, Advances in Neural Information Processing Systems 32 (2019).
[15] H. Gao, Z. Wang, S. Ji, Large-scale learnable graph convolutional networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1416-1424.
[16] H. Wang, M. Zhao, X. Xie, W. Li, M. Guo, Knowledge graph convolutional networks for recommender systems, in: The World Wide Web Conference, 2019, pp. 3307-3313.
[17] M. Gori, G. Monfardini, F. Scarselli, A new model for learning in graph domains, in: IEEE International Joint Conference on Neural Networks, volume 2, IEEE, 2005, pp. 729-734.
[18] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, G. Monfardini, The graph neural network model, IEEE Transactions on Neural Networks 20 (2009) 61-80.
[19] Y. Li, D. Tarlow, M. Brockschmidt, R. S. Zemel, Gated graph sequence neural networks, in: ICLR, 2016.
[20] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, J. Leskovec, Graph convolutional neural networks for web-scale recommender systems, in: SIGKDD, ACM, 2018, pp. 974-983.
[21] J. Bruna, W. Zaremba, A. Szlam, Y. LeCun, Spectral networks and locally connected networks on graphs, in: ICLR, 2014.
[22] J. Chen, T. Ma, C. Xiao, FastGCN: Fast learning with graph convolutional networks via importance sampling, in: ICLR, OpenReview.net, 2018.
[23] B. Yu, H. Yin, Z. Zhu, Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting, arXiv preprint arXiv:1709.04875 (2017).
[24] S. Guo, Y. Lin, N. Feng, C. Song, H. Wan, Attention based spatial-temporal graph convolutional networks for traffic flow forecasting, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 2019, pp. 922-929.
[25] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735-1780.
[26] S. McNally, J. Roche, S. Caton, Predicting the price of bitcoin using machine learning, in: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), IEEE, 2018, pp. 339-343.
[27] S. D. Yeddula, C. Jiang, B. Hui, W.-S. Ku, Traffic accident hotspot prediction using temporal convolutional networks: A spatio-temporal approach, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1-4.
[28] A. Borovykh, S. Bohte, C. W. Oosterlee, Conditional time series forecasting with convolutional neural networks, arXiv preprint arXiv:1703.04691 (2017).
[29] S. Wu, X. Xiao, Q. Ding, P. Zhao, Y. Wei, J. Huang, Adversarial sparse transformer for time series forecasting, Advances in Neural Information Processing Systems 33 (2020) 17105-17115.
[30] J. Yoo, Y. Soun, Y.-c. Park, U. Kang, Accurate multivariate stock movement prediction via data-axis transformer with multi-level contexts, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 2037-2045.
[31] Y. Liu, S. Wang, J. Chen, B. Chen, X. Wang, D. Hao, L. Sun, Rice yield prediction and model interpretation based on satellite and climatic indicators using a transformer method, Remote Sensing 14 (2022) 5045.
[32] Z. Lin, M. Li, Z. Zheng, Y. Cheng, C. Yuan, Self-attention ConvLSTM for spatiotemporal prediction, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp. 11531-11538.
[33] X. Wang, Y. Ma, Y. Wang, W. Jin, X. Wang, J. Tang, C. Jia, J. Yu, Traffic flow prediction via spatial temporal graph neural network, in: Proceedings of The Web Conference 2020, 2020, pp. 1082-1092.
[34] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, arXiv preprint arXiv:1706.03762 (2017).
[35] C. Jiang, W. Wang, N. Pan, W.-S. Ku, A multimodal geo dataset for high-resolution precipitation forecasting, in: Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems, 2023, pp. 1-4.
[36] C. Jiang, J. Li, W. Wang, W.-S. Ku, Modeling real estate dynamics using temporal encoding, in: Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021, pp. 516-525.
[37] T. K. Ho, Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition, volume 1, IEEE, 1995, pp. 278-282.
[38] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992, pp. 144-152.
[39] W.-Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011) 14-23.
[40] J. A. Nelder, R. W. Wedderburn, Generalized linear models, Journal of the Royal Statistical Society: Series A (General) 135 (1972) 370-384.
[41] L. B. Almeida, C1.2 Multilayer perceptrons, Handbook of Neural Computation C 1 (1997).