=Paper=
{{Paper
|id=Vol-2844/ainst7
|storemode=property
|title=Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data
|pdfUrl=https://ceur-ws.org/Vol-2844/ainst7.pdf
|volume=Vol-2844
|authors=Athanasios Davvetas,Iraklis A. Klampanos
|dblpUrl=https://dblp.org/rec/conf/setn/DavvetasK20
}}
==Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data==
Athanasios Davvetas, Iraklis A. Klampanos
National Centre for Scientific Research "Demokritos", Institute of Informatics and Telecommunications, Athens, Greece
tdavvetas@iit.demokritos.gr, iaklampanos@iit.demokritos.gr

ABSTRACT

When observing a phenomenon, severe cases or anomalies are often characterised by deviation from the expected data distribution. However, non-deviating data samples may also implicitly lead to severe outcomes. In the case of unsupervised severe weather detection, these data samples can lead to mispredictions, since the predictors of severe weather are often not directly observed as features. We posit that incorporating external or auxiliary information, such as the outcome of an external task or an observation, can improve the decision boundaries of an unsupervised detection algorithm. In this paper, we increase the effectiveness of a clustering method for detecting cases of severe weather by learning augmented and linearly separable latent representations. We evaluate our solution against three individual cases of severe weather, namely windstorms, floods and tornado outbreaks.

CCS CONCEPTS

• Computing methodologies → Artificial intelligence; Machine learning; • Applied computing → Physical sciences and engineering.

[Figure 1: Data sample of the GHT variable at the 700 hPa pressure level]

KEYWORDS

Severe weather detection, representation learning, deep learning

1 INTRODUCTION
Anomalies occur in the majority of datasets. They are fairly rare and often challenging to detect in an unsupervised setting. Due to their lower frequency, the majority of normal samples introduces an implicit bias that results in biased predictions. From an unsupervised perspective, one can assume that these rare occurrences can be observed in the outliers of the data distribution. Yet, depending on the application, searching for samples that deviate from the expected data distribution may not improve the detection of an unsupervised method.

In some applications, the occurrence of anomalies might be expected, or it may not be trivial to detect deviation from the observed data distribution. An example of such an application is detecting cases of severe weather. Heavy rain or a windstorm may be considered normal, depending on the geographic region, the season, etc. These otherwise normal circumstances may lead to natural disasters, with costly damages or even fatalities, yet they cannot always be predicted by observing a physical quantity. To predict these types of occurrences, we need to incorporate external or auxiliary information that can effectively augment the observable features.

In this paper, we investigate the effects of incorporating external information in the form of an auxiliary task outcome. We achieve this by utilising a deep learning method called "Evidence Transfer", which incrementally manipulates the latent representations of an autoencoder according to external categorical evidence [3]. Evidence transfer allows for joint representation learning based on external categorical evidence retrieved from textual sources and weather re-analysis data. Evidence transfer successfully manipulates the initial learned representations, resulting in increased effectiveness during individual severe weather detection.

2 DATA AND METHODS

2.1 Weather Re-analysis Data

ERA-Interim [4] re-analysis data are produced with a sequential data assimilation scheme, during which prior information from a forecast model is combined with the available observations in order to estimate the state of the global atmosphere, allowing for a better description of past atmospheric conditions.

AINST2020, September 02–04, 2020, Athens, Greece
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Weather re-analysis data are gridded data (as shown in Figure 1) depicting atmospheric variables at various timestamps and pressure levels, leading to 4D variables. They cover a time period of up to 40 years, with finer than 1° spatial resolution and 6-hour temporal resolution for the global region. In our experiments, we used ERA-Interim data covering the time period from January 1, 1979 to May 31, 2018 with 6-hour temporal resolution (retrieved from the Research Data Archive of the National Center for Atmospheric Research in Boulder, Colorado¹). The spatial resolution is ≈0.7° × 0.7°, containing atmospheric variables across 37 vertical pressure levels ranging from 1 hPa to 1000 hPa. We reduce the region of the gridded data from the global region to a Cartesian domain that covers Europe. To reduce the domain of our data we used the pre-processor of the Weather Research and Forecasting (WRF) Model [8], named WPS. The new spatial resolution of our data is 64 × 64 cells of 75 km × 75 km in the west-east and south-north axes.

[Figure 2: Overview of the use of Evidence Transfer for joint representation learning over weather and textual evidence to improve the detection of severe weather events.]

In our study, the atmospheric variable of interest is the geopotential height (GHT), which can be seen as a gravity-adjusted height. GHT is often used for its predictive properties [6, 7, 9], as well as to extract weather patterns for other downstream tasks [5]. Severe weather can be predicted via sequences of patterns in the geopotential height (e.g. a cyclone can be observed as a circular pattern). To highlight useful high-level features, such as circular shapes and edges, we extract embeddings through a VGG-16 network pre-trained on ImageNet.
We feed the VGG-16 network with three different levels of GHT (500, 700 and 900 hPa), in a similar fashion to the RGB channels of an image. Therefore, a single data sample of shape 3 × 64 × 64 is transformed into an embedding of 64 × 64, resulting in a total of 4096 features.

2.2 Textual Evidence

We augment the weather-based embeddings by making use of textual evidence for historic severe weather events, found in Wikipedia. For example, to find severe heavy rain occurrences we search for recorded floods. We extract categorical evidence from textual sources of Wikipedia pages which associate a date with a severe weather event.

For our experiments we extract the following cases of extreme events in Europe: (1) costly or deadly hailstorms, (2) floods, (3) tornadoes and tornado outbreaks, and (4) severe windstorms. Each of these event types is treated as a binary classification task for predicting a specific severe weather case.
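Each extracted event is referenced only by its occurrence date, and the re-analysis data arrive in 6-hour increments, so labelling a whole event day yields four severe samples per day. A minimal stdlib sketch of this labelling (function names are illustrative, not from the paper's code):

```python
# Sketch: expand Wikipedia event dates into binary labels over 6-hour
# re-analysis timestamps. Events carry only a date, so the whole day
# (four 6-hour samples) is labelled severe.
from datetime import datetime, timedelta

def sample_timestamps(start: datetime, end: datetime):
    """All 6-hour re-analysis timestamps in [start, end)."""
    t, out = start, []
    while t < end:
        out.append(t)
        t += timedelta(hours=6)
    return out

def label_severe(timestamps, event_dates):
    """1 if the sample falls on an event day, else 0."""
    severe_days = {d.date() for d in event_dates}
    return [1 if t.date() in severe_days else 0 for t in timestamps]

ts = sample_timestamps(datetime(1999, 12, 25), datetime(1999, 12, 28))
labels = label_severe(ts, [datetime(1999, 12, 26)])  # e.g. windstorm "Lothar"
# Each event day contributes exactly four severe samples.
```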
The occurrence date is used both to reference the weather re-analysis data and to define the individual tasks. Since the events listed in Wikipedia do not typically supply exact times, we label the whole day of reference as severe; therefore, the minimum span of an event is one day, or four 6-hour samples (recall that the weather re-analysis data are provided in 6-hour increments).

For each of the aforementioned lists we extract the following fields (for simplicity, "Event" is used to represent each individual case of severe weather): Event Name, Event Type, Affected Countries, Location, Country Coordinates (Latitude), Country Coordinates (Longitude), Event Description. The fields regarding the event (name, type, location, description) are extracted from the Wikipedia pages, while the coordinates are retrieved by querying the GeoNames² API. For the majority of extracted events, country names are used to reference the spatial extent of an event, which is stored in the "Affected Countries" field. More detailed spatial information, such as city names or state names, is stored in the "Location" field when available³.

¹ http://rda.ucar.edu/datasets/ds627.0/
² https://www.geonames.org
³ Dataset available at: https://github.com/davidath/severe-weather-dataset

2.3 Evidence Transfer

Evidence transfer [3] is a deep learning method that incrementally manipulates the latent representations of an autoencoder according to external categorical evidence. In the context of evidence transfer, any categorical variable can be utilised as evidence. The most straightforward case of evidence is using the outcome of an auxiliary task. Evidence transfer has been developed with the notion that, in practice, the availability of external data is either not guaranteed, or we may observe the outcome of external processes without having explicit access to the corresponding dataset. It is a generic method for combining external evidence in the process of representation learning. It makes no assumptions regarding the nature or source of the external evidence. It is effective when introduced with meaningful evidence, robust against non-corresponding evidence, and modular due to its transfer learning nature.

Evidence transfer is a two-step method. During the initialisation step, an autoencoder is trained to reconstruct the input data of the primary task. To ensure robustness, an intermediate step is required, in which a small biased evidence autoencoder is trained to reconstruct each categorical evidence source. The evidence autoencoder is called "biased" due to the introduced limitation on the number of training iterations: meaningful evidence is able to converge within a small number of iterations, leading to a latent projection of the evidence, whereas non-corresponding evidence is not able to generalise and therefore produces a uniform-like distribution. During severe weather case detection we skip this step, since we know that the textual evidence is retrieved from meaningful sources.

During the transfer step, the initial latent representations are manipulated according to the external evidence through the joint optimisation of reconstructing the input, as well as reducing the cross entropy between an extended softmax layer of the latent space and the external evidence. The loss function of the initialisation step is shown in Equation 1, where the Structural Similarity Index (SSIM) is used as the reconstruction loss in order to retain the structural information of the data. In Equation 2 we show the evidence transfer step loss, where V is the set of categorical evidence sources and Q are the extended softmax layers.

ℓ_AE = L(X, X′) = (1/N) Σ_{i=1..N} SSIM(x^(i), x′^(i))    (1)

ℓ_EviTransf = ℓ_AE + λ · (1/K) Σ_{j=1..K} H(V_j, Q_j)    (2)
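The two losses above can be sketched in NumPy as follows. This is an illustrative re-implementation under simplifying assumptions (a single global SSIM window rather than the usual sliding-window SSIM, and one-hot evidence vectors), not the authors' code.

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between two flattened samples."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

def l_ae(X, X_rec):
    """Equation (1): mean SSIM between inputs and reconstructions.
    (As a loss to minimise, one would use 1 - l_ae in practice.)"""
    return np.mean([ssim(x, xr) for x, xr in zip(X, X_rec)])

def cross_entropy(V, Q, eps=1e-12):
    """H(V, Q): cross entropy between one-hot evidence V and softmax output Q."""
    return -np.mean(np.sum(V * np.log(Q + eps), axis=1))

def l_evi_transf(X, X_rec, evidence, softmax_heads, lam=0.1):
    """Equation (2): joint loss over K categorical evidence sources."""
    ce = np.mean([cross_entropy(V, Q) for V, Q in zip(evidence, softmax_heads)])
    return l_ae(X, X_rec) + lam * ce
```

A perfect reconstruction gives an SSIM of 1, and a softmax head that exactly matches its one-hot evidence source contributes (near) zero cross entropy.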
2.4 Class Balancing

In our experiments, the original data consist of 57584 weather re-analysis samples in 6-hour increments, while the total number of severe weather samples without duplicate dates is only 3136 (less than 6% of the samples). To deal with imbalanced learning, we experiment with three different sampling strategies: (1) over-sampling the minority class, (2) under-sampling the majority class, and (3) a combination of over-sampling and under-sampling.

To over-sample the minority class we use SMOTE [2], which generates minority class samples along the line segments joining k-nearest neighbours. To under-sample the majority class we perform random under-sampling, although more sophisticated under-sampling methods, such as ENN [10] (which removes data samples that deviate from the majority of their k-nearest neighbours), can also be used. A combination of both strategies can be achieved by combining over-sampling with under-sampling, as in the SMOTEENN method [1].

In order to test the effectiveness of each sampling strategy, we experiment with using the primary task of learning representations to detect severe weather samples, combining all severe cases into a single class. We manipulate the initial learned space by incorporating the ground-truth labels (i.e. the binary task labels of predicting severe from non-severe weather samples). Incorporating evidence that exactly replicates the outcome of the primary task is not realistic; however, we use this scenario in order to investigate the best choice of sampling strategy without introducing implicit uncertainty from the choice of external categorical evidence. To test its generalisation, we split the ground truth into train and test sets and only use the evidence labels during training with evidence transfer.

2.5 Method overview

For all of our experiments, we follow the training procedure of evidence transfer. First, we train a denoising stacked autoencoder to reconstruct the primary task dataset, i.e. the weather re-analysis data. The initialisation step is completely unsupervised; no labels are used during this step. We consider an initial solution to our primary task, the "baseline" solution, during which we perform an unsupervised detection method on the initially retrieved latent representations. We then perform the same unsupervised detection method on the incrementally manipulated latent representations from evidence transfer in order to compare its effectiveness. We supply the additional evidence sources based on the textual severe weather dataset.

During the experimental investigation of the best sampling strategy, a one-class SVM was used as the unsupervised detection method. For detecting the individual severe weather cases we use k-means clustering with k=2 (prediction of severe or non-severe weather), except in a single case where agglomerative clustering was used instead (ground truth: windstorm, evidence: tornado).

3 EXPERIMENTAL EVALUATION

We experiment with individually detecting windstorms, floods and tornado outbreaks⁴. We avoid using the hail events due to the limited number of samples. We rotate between the different severe cases by selecting one case as the ground truth and alternating between using the rest as external evidence. For example, we select windstorm weather samples and a portion of non-severe samples as our primary task (ground truth), while another case, e.g. flood, is selected as the auxiliary task (external evidence). We further under-sample the remaining non-severe weather cases in order to match the number of severe weather samples.

Quantitative evaluation with the micro average of the precision, recall and F1-score metrics over the full dataset (train and test) is presented in Table 2. Our experiments indicate that under-sampling the majority class is the most fitting for our case. By reducing the redundancy in the majority class, evidence transfer can more effectively manipulate the initial representations. It additionally allows linear separation into two classes, whereas with over-sampling and the combined strategy the implicit bias overwhelms the latent space, resulting in a single inseparable cluster.

Table 2: Experimental evaluation of evidence transfer for severe weather case detection with three sampling strategies.

Baseline
Metric      Oversample   Undersample   Combine
Precision   0.51         0.53          0.51
Recall      0.51         0.53          0.51
F1-Score    0.51         0.53          0.51

Evidence Transfer
Metric      Oversample     Undersample    Combine
Precision   0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)
Recall      0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)
F1-Score    0.59 (+0.08)   0.82 (+0.29)   0.55 (+0.04)

In Table 1, we report experimental results in terms of precision, recall and F1-score for the anomalous class. Introducing external evidence leads to linearly separable representations that increase the effectiveness of clustering, and therefore of detecting the severe weather samples. Even though evidence transfer is a scalable method that can use multiple sources of evidence, in this case it is not as effective, due to the ground truth and the external evidence contradicting each other for some portion of the data samples.

Table 1: Experimental evaluation of evidence transfer for individual severe weather case detection.

Windstorm (Baseline)
Metric      Flood   Tornado
Precision   0.61    0.66
Recall      0.71    0.87
F1-Score    0.66    0.75

Flood (Baseline)
Metric      Windstorm   Tornado
Precision   0.49        0.61
Recall      0.50        0.57
F1-Score    0.49        0.59

Tornado (Baseline)
Metric      Windstorm   Flood
Precision   0.26        0.24
Recall      0.62        1.00
F1-Score    0.36        0.38

Windstorm (Evidence Transfer)
Metric      Flood          Tornado
Precision   0.84 (+0.23)   0.79 (+0.13)
Recall      0.74 (+0.03)   1.00 (+0.13)
F1-Score    0.79 (+0.13)   0.88 (+0.13)

Flood (Evidence Transfer)
Metric      Windstorm      Tornado
Precision   0.68 (+0.19)   0.72 (+0.11)
Recall      0.92 (+0.42)   0.69 (+0.12)
F1-Score    0.78 (+0.29)   0.71 (+0.12)

Tornado (Evidence Transfer)
Metric      Windstorm      Flood
Precision   0.32 (+0.06)   0.28 (+0.04)
Recall      0.98 (+0.36)   0.69 (-0.31)
F1-Score    0.49 (+0.13)   0.40 (+0.02)

In our experiments, the final dataset consists of non-severe samples (≈500 after under-sampling, to balance the individual severe class), one severe class as the primary task or ground truth, and one as the external evidence. As an example, consider the task of predicting windstorm samples as ground truth and the task of predicting flood samples as external evidence. For the task of predicting windstorms, non-severe samples and flood samples are labelled as "normal". However, for the task of predicting floods, non-severe samples and windstorm samples are labelled as "normal". Therefore, the external evidence contradicts the ground truth on the non-severe samples. Introducing more sources of external evidence increases this contradiction for the non-severe samples, leading to increased uncertainty during clustering.

However, both quantitatively, as shown in Table 1, and qualitatively (ground truth: windstorm, evidence: flood, depicted in Figure 3), introducing a single source of evidence improves the outcome of the clustering method by pushing the latent representations to become linearly separable, thereby improving the effectiveness of both k-means and agglomerative clustering.

[Figure 3: (a) Baseline of the "Windstorm - Flood" combination. (b) Evidence transfer combination of "Windstorm - Flood". t-SNE 2D projections of the initial and Evidence Transfer representations of originally 10 features. The initial latent space consists of a "mixed" cluster that can be seen as a single class in an unsupervised setting. However, after evidence transfer, the latent representations are linearly separable, allowing for improved decision boundaries.]

4 FUTURE WORK AND CONCLUSIONS

In this paper, we investigated using evidence transfer to improve the primary task of detecting individual cases of severe weather. By incorporating auxiliary tasks extracted from textual sources, we effectively manipulated the latent space of an autoencoder using evidence transfer, in order to increase the effectiveness of severe weather detection. Making the latent representations incrementally linearly separable improved the effectiveness of k-means and agglomerative clustering. Additionally, we investigated the best sampling method for our imbalanced task of detecting severe cases with non-observable predictors, by evaluating the effectiveness of evidence transfer in one-class SVM (with linear kernel) prediction.

⁴ Code available at: https://github.com/davidath/severe-weather-detect
Future work is directed towards utilising the temporal aspect of the weather re-analysis data. In our experiments, we mostly focused on using embeddings extracted from an image recognition task. However, retrieving temporally-aware embeddings from the raw data, e.g. via a recurrent autoencoder, could improve the individual detection of severe weather cases by exploiting the temporal aspect of the data. Additionally, since the under-sampling strategy appears to perform better for this problem, it would be beneficial to increase the total number of severe weather samples from additional sources.

ACKNOWLEDGMENTS

This work has been supported by the Industrial Scholarships program of the Stavros Niarchos Foundation.

REFERENCES

[1] Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. 2004. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. SIGKDD Explor. Newsl. 6, 1 (June 2004), 20–29.
[2] Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. 2002. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Int. Res. 16, 1 (June 2002).
[3] Athanasios Davvetas, Iraklis A. Klampanos, and Vangelis Karkaletsis. 2019. Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence. In The International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
[4] D. P. Dee, S. M. Uppala, A. J. Simmons, P. Berrisford, P. Poli, S. Kobayashi, U. Andrae, M. A. Balmaseda, G. Balsamo, P. Bauer, P. Bechtold, A. C. M. Beljaars, L. van de Berg, J. Bidlot, N. Bormann, C. Delsol, R. Dragani, M. Fuentes, A. J. Geer, L. Haimberger, S. B. Healy, H. Hersbach, E. V. Hólm, L. Isaksen, P. Kållberg, M. Köhler, M. Matricardi, A. P. McNally, B. M. Monge-Sanz, J.-J. Morcrette, B.-K. Park, C. Peubey, P. de Rosnay, C. Tavolato, J.-N. Thépaut, and F. Vitart. 2011. The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society 137, 656 (2011), 553–597.
[5] Iraklis A. Klampanos, Athanasios Davvetas, Spyros Andronopoulos, Charalambos Pappas, Andreas Ikonomopoulos, and Vangelis Karkaletsis. 2018. Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events. Environmental Modelling & Software 102 (April 2018), 84–93.
[6] Mikhail A. Krinitskiy, Yulia A. Zyulyaeva, and Sergey K. Gulev. 2019. (9 2019). https://doi.org/10.6084/m9.figshare.9851099.v1
[7] T. N. Krishnamurti, K. Rajendran, T. S. V. Vijaya Kumar, Stephen Lord, Zoltan Toth, Xiaolei Zou, Steven Cocke, Jon E. Ahlquist, and I. Michael Navon. 2003. Improved Skill for the Anomaly Correlation of Geopotential Heights at 500 hPa. Monthly Weather Review 131, 6 (2003), 1082–1102.
[8] John Michalakes, Jimy Dudhia, D. Gill, Tom Henderson, J. Klemp, W. Skamarock, and Wei Wang. 2004. The Weather Research and Forecast Model: Software Architecture and Performance. 11th ECMWF Workshop on the Use of High Performance Computing in Meteorology.
[9] Murat Türkeş, U. M. Sümer, and G. Kiliç. 2002. Persistence and periodicity in the precipitation series of Turkey and associations with 500 hPa geopotential heights. Climate Research 21 (May 2002), 59–81.
[10] D. L. Wilson. 1972. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data. IEEE Transactions on Systems, Man, and Cybernetics SMC-2, 3 (1972), 408–421.