=Paper=
{{Paper
|id=Vol-2844/ainst7
|storemode=property
|title=Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data
|pdfUrl=https://ceur-ws.org/Vol-2844/ainst7.pdf
|volume=Vol-2844
|authors=Athanasios Davvetas,Iraklis A. Klampanos
|dblpUrl=https://dblp.org/rec/conf/setn/DavvetasK20
}}
==Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data==
Athanasios Davvetas, Iraklis A. Klampanos
National Centre for Scientific Research "Demokritos", Institute of Informatics and Telecommunications, Athens, Greece
tdavvetas@iit.demokritos.gr, iaklampanos@iit.demokritos.gr

ABSTRACT

When observing a phenomenon, severe cases or anomalies are often characterised by deviation from the expected data distribution. However, non-deviating data samples may also implicitly lead to severe outcomes. In the case of unsupervised severe weather detection, these data samples can lead to mispredictions, since the predictors of severe weather are often not directly observed as features. We posit that incorporating external or auxiliary information, such as the outcome of an external task or an observation, can improve the decision boundaries of an unsupervised detection algorithm. In this paper, we increase the effectiveness of a clustering method for detecting cases of severe weather by learning augmented and linearly separable latent representations. We evaluate our solution against three individual cases of severe weather, namely windstorms, floods and tornado outbreaks.

CCS CONCEPTS

• Computing methodologies → Artificial intelligence; Machine learning; • Applied computing → Physical sciences and engineering.

[Figure 1: Data sample of the GHT variable at the 700 hPa pressure level]

KEYWORDS

Severe weather detection, representation learning, deep learning

1 INTRODUCTION
Anomalies occur in the majority of datasets. They are fairly rare and often challenging to detect in an unsupervised setting. Due to their lower frequency, the majority of normal samples introduces an implicit bias that results in biased predictions. From an unsupervised perspective, one can assume that these rare occurrences can be observed in the outliers of the data distribution. Yet, depending on the application, searching for samples that deviate from the expected data distribution may not improve the detection of an unsupervised method.

In some applications, the occurrence of anomalies might be expected, or it may not be trivial to detect deviation from the observed data distribution. An example of such an application is detecting cases of severe weather. Heavy rain or a windstorm may be considered normal, depending on the geographic region, the season, etc. These otherwise normal circumstances may lead to natural disasters, with costly damages or even fatalities, yet they cannot always be predicted by observing a physical quantity. To predict these types of occurrences, we need to incorporate external or auxiliary information that can effectively augment the observable features.

In this paper, we investigate the effects of incorporating external information in the form of an auxiliary task outcome. We achieve this by utilising a deep learning method called "Evidence Transfer", which incrementally manipulates the latent representations of an autoencoder according to external categorical evidence [3]. Evidence transfer allows for joint representation learning based on external categorical evidence retrieved from textual sources and weather re-analysis data. Evidence transfer successfully manipulates the initial learned representations, resulting in increased effectiveness during individual severe weather detection.

2 DATA AND METHODS

2.1 Weather Re-analysis Data

ERA-Interim [4] re-analysis data are produced with a sequential data assimilation scheme, during which prior information from a forecast model is combined with the available observations in order to estimate the state of the global atmosphere, allowing for a better description of past atmospheric conditions.

AINST2020, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Weather re-analysis data are gridded data (as shown in Figure 1) depicting atmospheric variables at various timestamps and pressure levels, leading to 4D variables. They cover a time period of up to 40 years, with finer than 1° spatial resolution and 6-hour temporal resolution for the global region. In our experiments, we used ERA-Interim data covering the time period from January 1, 1979 to May 31, 2018 with 6-hour temporal resolution (retrieved from the Research Data Archive of the National Center for Atmospheric Research in Boulder, Colorado¹). The spatial resolution is ≈0.7° × 0.7°, containing atmospheric variables across 37 vertical pressure levels ranging from 1 hPa to 1000 hPa. We reduce the region of the gridded data from the global region to a Cartesian domain that covers Europe. To reduce the domain of our data we used the pre-processor of the Weather Research and Forecasting (WRF) Model [8], named WPS. The new spatial resolution of our data is 64 × 64 cells of 75 km × 75 km in the west-east and south-north axes.

[Figure 2: Overview of the use of Evidence Transfer for joint representation learning over weather and textual evidence to improve the detection of severe weather events.]

In our study, the atmospheric variable of interest is the geopotential height (GHT), which can be seen as a gravity-adjusted height. GHT is often used for its predictive properties [6, 7, 9], as well as to extract weather patterns for other downstream tasks [5]. Severe weather can be predicted via sequences of patterns in the geopotential height (e.g. a cyclone can be observed as a circular pattern). To highlight useful high-level features, such as circular shapes and edges, we extract embeddings through a VGG-16 network pre-trained on ImageNet.
We feed the VGG-16 network with three different levels of GHT (500, 700 and 900 hPa), in a similar fashion to the RGB channels of an image. Therefore, a single data sample of shape 3 × 64 × 64 is transformed into an embedding of 64 × 64, resulting in a total of 4096 features.

2.2 Textual Evidence

We augment the weather-based embeddings by making use of textual evidence for historic severe weather events, found in Wikipedia. For example, to find severe heavy rain occurrences we search for recorded floods. We extract categorical evidence from textual sources of Wikipedia pages which associate a date with a severe weather event.

For our experiments we extract the following cases of extreme events in Europe: (1) costly or deadly hailstorms, (2) floods, (3) tornadoes and tornado outbreaks, and (4) severe windstorms. Each of these event types is treated as a binary classification task for predicting a specific severe weather case.
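Each extracted event is referenced only by its occurrence date, and the re-analysis data arrive in 6-hour increments, so labelling a whole event day yields four severe samples per day. A minimal stdlib sketch of this labelling (function names are illustrative, not from the paper's code):

```python
# Sketch: expand Wikipedia event dates into binary labels over 6-hour
# re-analysis timestamps. Events carry only a date, so the whole day
# (four 6-hour samples) is labelled severe.
from datetime import datetime, timedelta

def sample_timestamps(start: datetime, end: datetime):
    """All 6-hour re-analysis timestamps in [start, end)."""
    t, out = start, []
    while t < end:
        out.append(t)
        t += timedelta(hours=6)
    return out

def label_severe(timestamps, event_dates):
    """1 if the sample falls on an event day, else 0."""
    severe_days = {d.date() for d in event_dates}
    return [1 if t.date() in severe_days else 0 for t in timestamps]

ts = sample_timestamps(datetime(1999, 12, 25), datetime(1999, 12, 28))
labels = label_severe(ts, [datetime(1999, 12, 26)])  # e.g. windstorm "Lothar"
# Each event day contributes exactly four severe samples.
```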
The occurrence date is used both to reference the weather re-analysis data and to define the individual tasks. Since the events listed in Wikipedia do not typically supply exact times, we label the whole day of reference as severe; therefore, the minimum span of an event is one day, or four 6-hour samples (recall that the weather re-analysis data are provided in 6-hour increments).

For each of the aforementioned lists we extract the following fields (for simplicity, "Event" is used to represent each individual case of severe weather): Event Name, Event Type, Affected Countries, Location, Country Coordinates (Latitude), Country Coordinates (Longitude), Event Description. The fields regarding the event (name, type, location, description) are extracted from the Wikipedia pages, while the coordinates are retrieved by querying the GeoNames² API. For the majority of extracted events, country names are used to reference the spatial extent of an event, which is stored in the "Affected Countries" field. More detailed spatial information, such as city names or state names, is stored in the "Location" field when available³.

¹ http://rda.ucar.edu/datasets/ds627.0/
² https://www.geonames.org
³ Dataset available at: https://github.com/davidath/severe-weather-dataset

2.3 Evidence Transfer

Evidence transfer [3] is a deep learning method that incrementally manipulates the latent representations of an autoencoder according to external categorical evidence. In the context of evidence transfer, any categorical variable can be utilised as evidence. The most straightforward case of evidence is using the outcome of an auxiliary task. Evidence transfer has been developed with the notion that, in practice, the availability of external data is either not guaranteed, or we may observe the outcome of external processes without having explicit access to the corresponding dataset. It is a generic method for combining external evidence in the process of representation learning. It makes no assumptions regarding the nature or source of the external evidence. It is effective when introduced with meaningful evidence, robust against non-corresponding evidence, and modular due to its transfer learning nature.

Evidence transfer is a two-step method. During the initialisation step, an autoencoder is trained to reconstruct the input data of the primary task. To ensure robustness, an intermediate step is required, in which a small biased evidence autoencoder is trained to reconstruct each categorical evidence source. The evidence autoencoder is called "biased" due to the introduced limitation on the number of training iterations: meaningful evidence is able to converge within a small number of iterations, leading to a latent projection of the evidence, whereas non-corresponding evidence is not able to generalise and therefore produces a uniform-like distribution. During severe weather case detection we skip this step, since we know that the textual evidence is retrieved from meaningful sources.

During the transfer step, the initial latent representations are manipulated according to the external evidence through the joint optimisation of reconstructing the input, as well as reducing the cross entropy between an extended softmax layer of the latent space and the external evidence. The loss function of the initialisation step is shown in Equation 1, where the Structural Similarity Index (SSIM) is used as the reconstruction loss in order to retain the structural information of the data. In Equation 2 we show the evidence transfer step loss, where V is the set of categorical evidence sources and Q are the extended softmax layers.

ℓ_AE = L(X, X′) = (1/N) Σ_{i=1..N} SSIM(x^(i), x′^(i))    (1)

ℓ_EviTransf = ℓ_AE + λ · (1/K) Σ_{j=1..K} H(V_j, Q_j)    (2)
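The two losses above can be sketched in NumPy as follows. This is an illustrative re-implementation under simplifying assumptions (a single global SSIM window rather than the usual sliding-window SSIM, and one-hot evidence vectors), not the authors' code.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two flattened samples."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def l_ae(X, X_rec):
    """Equation (1): mean SSIM between inputs and reconstructions.
    (As a loss to minimise, one would use 1 - l_ae in practice.)"""
    return np.mean([ssim(x, xr) for x, xr in zip(X, X_rec)])

def cross_entropy(V, Q, eps=1e-12):
    """H(V, Q): cross entropy between one-hot evidence V and softmax output Q."""
    return -np.mean(np.sum(V * np.log(Q + eps), axis=1))

def l_evi_transf(X, X_rec, evidence, softmax_heads, lam=0.1):
    """Equation (2): joint loss over K categorical evidence sources."""
    ce = np.mean([cross_entropy(V, Q) for V, Q in zip(evidence, softmax_heads)])
    return l_ae(X, X_rec) + lam * ce
```

A perfect reconstruction gives an SSIM of 1, and a softmax head that exactly matches its one-hot evidence source contributes (near) zero cross entropy.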
2.4 Class Balancing

In our experiments, the original data consist of 57584 weather re-analysis samples in 6-hour increments, while the total number of severe weather samples without duplicate dates is only 3136 (less than 6% of the samples). To deal with imbalanced learning, we experiment with three different sampling strategies: (1) over-sampling the minority class, (2) under-sampling the majority class, and (3) a combination of over-sampling and under-sampling.

To over-sample the minority class we use SMOTE [2], which generates minority class samples along the line segments joining k-nearest neighbours. To under-sample the majority class we perform random under-sampling, although more sophisticated under-sampling methods, such as ENN [10] (which removes data samples that deviate from the majority of their k-nearest neighbours), can also be used. A combination of both strategies can be achieved by combining over-sampling with under-sampling, as in the SMOTEENN method [1].

In order to test the effectiveness of each sampling strategy, we experiment with using the primary task of learning representations to detect severe weather samples, combining all severe cases into a single class. We manipulate the initial learned space by incorporating the ground-truth labels (i.e. the binary task labels of predicting severe from non-severe weather samples). Incorporating evidence that exactly replicates the outcome of the primary task is not realistic; however, we use this scenario in order to investigate the best choice of sampling strategy without introducing implicit uncertainty from the choice of external categorical evidence. To test its generalisation, we split the ground truth into train and test sets and only use the evidence labels during training with evidence transfer.

2.5 Method overview

For all of our experiments, we follow the training procedure of evidence transfer. First, we train a denoising stacked autoencoder to reconstruct the primary task dataset, i.e. the weather re-analysis data. The initialisation step is completely unsupervised; no labels are used during this step. We consider an initial solution to our primary task, the "baseline" solution, during which we perform an unsupervised detection method on the initially retrieved latent representations. We then perform the same unsupervised detection method on the incrementally manipulated latent representations from evidence transfer in order to compare its effectiveness. We supply the additional evidence sources based on the textual severe weather dataset.

During the experimental investigation of the best sampling strategy, a one-class SVM was used as the unsupervised detection method. For detecting the individual severe weather cases we use k-means clustering with k=2 (prediction of severe or non-severe weather), except in a single case where agglomerative clustering was used instead (ground truth: windstorm, evidence: tornado).

3 EXPERIMENTAL EVALUATION

We experiment with individually detecting windstorms, floods and tornado outbreaks⁴. We avoid using the hail events due to the limited number of samples. We rotate between the different severe cases by selecting one case as the ground truth and alternating between using the rest as external evidence. For example, we select windstorm weather samples and a portion of non-severe samples as our primary task (ground truth), while another case, e.g. flood, is selected as the auxiliary task (external evidence). We further under-sample the remaining non-severe weather cases in order to match the number of severe weather samples.

Quantitative evaluation with the micro average of the precision, recall and F1-score metrics over the full dataset (train and test) is presented in Table 2. Our experiments indicate that under-sampling the majority class is the most fitting for our case. By reducing the redundancy in the majority class, evidence transfer can more effectively manipulate the initial representations. It additionally allows linear separation into two classes, whereas with over-sampling and the combined strategy the implicit bias overwhelms the latent space, resulting in a single inseparable cluster.

Table 2: Experimental evaluation of evidence transfer for severe weather case detection with three sampling strategies.

Baseline
Metric      Oversample   Undersample   Combine
Precision   0.51         0.53          0.51
Recall      0.51         0.53          0.51
F1-Score    0.51         0.53          0.51

Evidence Transfer
Metric      Oversample     Undersample    Combine
Precision   0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)
Recall      0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)
F1-Score    0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)

In Table 1, we report experimental results in terms of precision, recall and F1-score for the anomalous class. Introducing external evidence leads to linearly separable representations that increase the effectiveness of clustering, and therefore of detecting the severe weather samples. Even though evidence transfer is a scalable method that can use multiple sources of evidence, in this case it is not as effective, due to the ground truth and the external evidence contradicting each other for some portion of the data samples.

Table 1: Experimental evaluation of evidence transfer for individual severe weather case detection.

Windstorm (Baseline)
Metric      Flood   Tornado
Precision   0.61    0.66
Recall      0.71    0.87
F1-Score    0.66    0.75

Flood (Baseline)
Metric      Windstorm   Tornado
Precision   0.49        0.61
Recall      0.50        0.57
F1-Score    0.49        0.59

Tornado (Baseline)
Metric      Windstorm   Flood
Precision   0.26        0.24
Recall      0.62        1.00
F1-Score    0.36        0.38

Windstorm (Evidence Transfer)
Metric      Flood          Tornado
Precision   0.84 (+0.23)   0.79 (+0.13)
Recall      0.74 (+0.03)   1.00 (+0.13)
F1-Score    0.79 (+0.13)   0.88 (+0.13)

Flood (Evidence Transfer)
Metric      Windstorm      Tornado
Precision   0.68 (+0.19)   0.72 (+0.11)
Recall      0.92 (+0.42)   0.69 (+0.12)
F1-Score    0.78 (+0.29)   0.71 (+0.12)

Tornado (Evidence Transfer)
Metric      Windstorm      Flood
Precision   0.32 (+0.06)   0.28 (+0.04)
Recall      0.98 (+0.36)   0.69 (-0.31)
F1-Score    0.49 (+0.13)   0.40 (+0.02)

In our experiments, the final dataset consists of non-severe samples (≈500 after under-sampling, to balance the individual severe class), one severe class as the primary task or ground truth, and one as the external evidence. As an example, consider the task of predicting windstorm samples as ground truth and the task of predicting flood samples as external evidence. For the task of predicting windstorms, non-severe samples and flood samples are labelled as "normal". However, for the task of predicting floods, non-severe samples and windstorm samples are labelled as "normal". Therefore, the external evidence contradicts the ground truth on the non-severe samples. Introducing more sources of external evidence increases this contradiction for the non-severe samples, leading to increased uncertainty during clustering.

However, both quantitatively, as shown in Table 1, and qualitatively (ground truth: windstorm, evidence: flood, depicted in Figure 3), introducing a single source of evidence improves the outcome of the clustering method by pushing the latent representations to become linearly separable, thereby improving the effectiveness of both k-means and agglomerative clustering.

[Figure 3: (a) Baseline of the "Windstorm - Flood" combination. (b) Evidence transfer combination of "Windstorm - Flood". t-SNE 2D projections of the initial and Evidence Transfer representations of originally 10 features. The initial latent space consists of a "mixed" cluster that can be seen as a single class in an unsupervised setting. However, after evidence transfer, the latent representations are linearly separable, allowing for improved decision boundaries.]

4 FUTURE WORK AND CONCLUSIONS

In this paper, we investigated using evidence transfer to improve the primary task of detecting individual cases of severe weather. By incorporating auxiliary tasks extracted from textual sources, we effectively manipulated the latent space of an autoencoder using evidence transfer, in order to increase the effectiveness of severe weather detection. Making the latent representations incrementally linearly separable improved the effectiveness of k-means and agglomerative clustering. Additionally, we investigated the best sampling method for our imbalanced task of detecting severe cases with non-observable predictors, by evaluating the effectiveness of evidence transfer in one-class SVM (with linear kernel) prediction.

⁴ Code available at: https://github.com/davidath/severe-weather-detect
Future work is directed towards utilising the temporal aspect of the weather re-analysis data. In our experiments, we mostly focused on using embeddings extracted from an image recognition task. However, retrieving temporally-aware embeddings from the raw data, e.g. via a recurrent autoencoder, could improve the individual detection of severe weather cases by exploiting the temporal aspect of the data. Additionally, since the under-sampling strategy appears to perform better for this problem, it would be beneficial to increase the total number of severe weather samples from additional sources.

ACKNOWLEDGMENTS

This work has been supported by the Industrial Scholarships program of the Stavros Niarchos Foundation.

REFERENCES

[1] Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. 2004. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD Explor. Newsl. 6, 1 (June 2004), 20–29.
[2] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Int. Res. 16, 1 (June 2002).
[3] Athanasios Davvetas, Iraklis A. Klampanos, and Vangelis Karkaletsis. 2019. Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence. In The International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[4] D. P. Dee, S. M. Uppala, A. J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M. A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A. C. M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A. J. Geer, L. Haimberger, S. B. Healy, H. Hersbach, E. V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A. P. McNally, B. M. Monge-Sanz, J.-J. Morcrette, B.-K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J.-N. Thépaut, and F. Vitart. 2011. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society 137, 656 (2011), 553–597.
[5] Iraklis A. Klampanos, Athanasios Davvetas, Spyros Andronopoulos, Charalambos Pappas, Andreas Ikonomopoulos, and Vangelis Karkaletsis. 2018. Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events. Environmental Modelling & Software 102 (April 2018), 84–93.
[6] Mikhail A. Krinitskiy, Yulia A. Zyulyaeva, and Sergey K. Gulev. 2019. (9 2019). https://doi.org/10.6084/m9.figshare.9851099.v1
[7] T. N. Krishnamurti, K. Rajendran, T. S. V. Vijaya Kumar, Stephen Lord, Zoltan Toth, Xiaolei Zou, Steven Cocke, Jon E. Ahlquist, and I. Michael Navon. 2003. Improved Skill for the Anomaly Correlation of Geopotential Heights at 500 hPa. Monthly Weather Review 131, 6 (2003), 1082–1102.
[8] John Michalakes, Jimy Dudhia, D. Gill, Tom Henderson, J. Klemp, W. Skamarock, and Wei Wang. 2004. The Weather Research and Forecast Model: Software Architecture and Performance. 11th ECMWF Workshop on the Use of High Performance Computing in Meteorology.
[9] Murat Türkeş, U. M. Sümer, and G. Kiliç. 2002. Persistence and periodicity in the precipitation series of Turkey and associations with 500 hPa geopotential heights. Climate Research 21 (May 2002), 59–81.
[10] D. L. Wilson. 1972. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on Systems, Man, and Cybernetics SMC-2, 3 (1972), 408–421.