Multivariate Time Series-based Solar Flare Prediction by Functional Network Embedding and Sequence Modeling

Shah Muhammad Hamdi 1,*, Abu Fuad Ahmad 2 and Soukaina Filali Boubrahimi 1

1 Utah State University, Logan, UT, 84322, USA
2 New Mexico State University, Las Cruces, NM, 88003, USA

Abstract
Major flaring events on the Sun can have hazardous impacts on both space and ground-based infrastructure. An effective approach to predicting that a solar active region (AR) is likely to flare after a period of time is to leverage multivariate time series (MVTS) of the AR magnetic field parameters. Existing MVTS-based flare prediction models either train traditional classifiers with preset statistical features of the univariate time series instances, or train deep sequence models based on the Recurrent Neural Network (RNN) or Long Short-Term Memory (LSTM) network. While the former approach depends on hand-engineered features, the latter uses only the temporal dimension of the MVTS instances. The variables of an MVTS depend not only on their own historical values but also on the other variables. In this work, we use the dynamic functional network representation of the MVTS instances to leverage higher-order relationships of the variables through Graph Convolutional Network (GCN) embedding. In addition to finding spatial (inter-variable) patterns through functional network embedding, our model captures local and global temporal patterns through LSTM networks. Our experiments on a real-life solar flare dataset exhibit better prediction performance than other baseline methods.

Keywords: Solar flare prediction, Multivariate time series, GCN, LSTM

AMLTS'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA.
* Corresponding author: s.hamdi@usu.edu (S. M. Hamdi); fuad@nmsu.edu (A. F. Ahmad); soukaina.boubrahimi@usu.edu (S. F. Boubrahimi).

1. Introduction

Solar flares are characterized by sudden bursts of magnetic flux in the solar corona and heliosphere. Extreme Ultra-Violet (EUV), X-ray, and gamma-ray emissions caused by major flaring events can have disastrous effects on our technology-dependent society. The risks to life and infrastructure in both space and on the ground include radiation exposure-based health risks for astronauts, disruption of GPS and radio communication, and damage to electronic devices. The economic damage of such extreme solar events can rise to trillions of dollars [1]. In 2015, the White House released the National Space Weather Strategy and Space Weather Action Plan [2] as a roadmap for research aimed at predicting and mitigating the effects of solar eruptive activities.

In recent years, multiple research efforts of the heliophysics community have aimed to predict solar flares from the current and historic magnetic field states of solar active regions. Due to the absence of a direct theoretical relationship between magnetic field influx and flare occurrence in active regions (AR), solar physics researchers rely on data science-based approaches for predicting solar flares. The data is collected by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory (SDO). Near-continuous-time images captured by the instruments of HMI contain spatiotemporal magnetic field data of the active regions. The prediction of solar flares, i.e., identifying active regions that will potentially flare after a period of time, requires time series modeling of the magnetic field data. For that, the spatiotemporal magnetic field data of active regions are mapped into multiple MVTS instances [3]. The variables of the MVTS instances represent solar magnetic field parameters (e.g., flux, current, helicity, Lorentz force). The time series corresponding to the magnetic field parameters are extracted based on two time windows: the observation window (the time window of data collection) and the prediction window (the time window after the data collection and before the flare occurrence).
Each MVTS instance is labeled as one of six classes β€” Q, A, B, C, M, and X β€” where Q represents flare-quiet active regions, and the other labels represent flaring events of increasing intensity. Among these classes, X- and M-class flares are considered the most intense flaring events.

In comparison to the earlier single timestamp-based magnetic field vector classification models, recent MVTS-based models are more effective for predicting flaring activities [3]. MVTS classification models targeting flare prediction are divided into two categories: (1) statistical feature-based methods [4], and (2) end-to-end deep learning-based methods [5]. The models of the first category work in two steps. First, low-dimensional representations of the MVTS instances are calculated from concatenation/aggregation of summarization statistics (e.g., mean, standard deviation, skewness, kurtosis) of the univariate time series components. Then, traditional classifiers (e.g., kNN, SVM) are trained with the labeled MVTS representations. This two-step process relies heavily on hand-engineered statistical features and on the choice of downstream classifiers, which complicates the application of these models to datasets with varying properties. In the second category, RNN/LSTM-based deep sequence models are trained by sequentially feeding vectors of the magnetic field parameters into sequence model cells and optimizing the cell weights through gradient descent-based backpropagation. While the deep learning models ensure end-to-end learning, bypassing the dependency on hand-engineered features, they utilize only the time dimension of the MVTS instances, and this limited usage of the underlying patterns results in poor classification performance.

In this work, we propose a deep learning-based MVTS classification approach for solar flare prediction that leverages the fact that MVTS data is rich not only in the temporal dimension, but also in the spatial dimension, which encodes inter-variable relationships [6]. For learning higher-order relationships of the MVTS variables, we use functional networks, where nodes represent variables and edges represent positive correlation of the time series of the corresponding variables. Each MVTS instance is divided into equal-length temporal windows, and an edge-weighted functional network is constructed for each window. We train a Graph Convolutional Network (GCN) to learn a representation of each functional network. In addition, we use two LSTM networks for learning representations of the temporal dimension within and between the windows. Our model significantly outperforms existing MVTS-based flare prediction models on a dataset containing MVTS instances of solar events of different flare classes.

The contributions made by this paper are listed below.

1. Leveraging higher-order inter-variable relationships of the MVTS instances by GCN-based dynamic functional network embedding.
2. Utilizing local and global patterns of the temporal dimension of the MVTS instances through LSTM-based within-window and between-window sequence learning.
3. Experimentally demonstrating the better performance of our model in comparison with state-of-the-art baselines on a benchmark solar flare prediction dataset.

2. Related Work

While the current approaches to flare prediction are mostly based on data science, the earliest flare prediction system was an expert system named THEO that required human inputs [7]. The Space Environment Center (SEC) of the National Oceanic and Atmospheric Administration (NOAA) adopted THEO in 1987. To distinguish flare classes, THEO was provided input data on sunspots and magnetic field properties.

Due to the abundance of magnetic field data collected by NASA's recent missions, research efforts in flare prediction over the last two decades are based on data science rather than purely theoretical modeling. Data science-based approaches stemmed from both linear and nonlinear statistics.
Based on the type of dataset used, these approaches are subdivided into two classes: line-of-sight magnetogram-based models and vector magnetogram-based models. Solar active regions are represented by parameters of either photospheric magnetic field data that contain only the line-of-sight component of the magnetic field, or the full-disk photospheric vector magnetic field. Following NASA's launch of SDO in 2010, the HMI instrument has been mapping the full-disk vector magnetic field every 12 minutes [8]. Most of the recent models use the near-continuous stream of vector magnetogram data from SDO, while the earlier models (dated before 2010) mostly used line-of-sight magnetic data.

The objective of the linear statistical models was to find the active region magnetic field features that are highly correlated with flare occurrences. Cui et al. [9] and Jing et al. [10] used line-of-sight magnetogram data to find correlation-based statistical relationships between magnetic field parameters and flare occurrences. Even before the launch of SDO, Leka and Barnes [11] collected and curated vector magnetogram data from the Mees Solar Observatory on the summit of Mount Haleakala, and used linear discriminant analysis (LDA) for classifying flaring events.

Nonlinear statistical models are mostly machine learning classifiers based on tree induction, kernel methods, neural networks, and so on. On line-of-sight magnetogram-based active region datasets, Song et al. [12] used logistic regression, Yu et al. [13] used the C4.5 decision tree, Ahmed et al. [14] used a fully connected neural network, and Al-Ghraibah et al. [15] used the relevance vector machine as classification models. Bobra et al. [16] used a Support Vector Machine (SVM) on SDO-based vector magnetogram data for classifying flaring and non-flaring active regions.
Nishizuka et al. [17] used both line-of-sight and vector magnetograms and compared the performance of three classifiers: kNN, SVM, and Extremely Randomized Trees (ERT). Other examples of solar flare prediction on non-sequential data include various applications of convolutional neural networks (ConvNets) on SDO AIA/HMI images [18, 19, 20, 21].

Angryk et al. [3] introduced temporal window-based flare prediction, which extends the earlier single timestamp-based models. The authors published an MVTS-based active region dataset, where each MVTS instance records magnetic field data for a preset observation time and uniform sampling rate, and is labeled by the flare class that occurred after a given prediction time. Among the MVTS classification approaches, Hamdi et al. [4] used statistical summarization of the component univariate time series for training a kNN classifier, Ma et al. [22] applied MVTS decision trees that approached the problem using clustering as a preprocessing step, and Muzaheed et al. [5] used LSTM-based deep sequence modeling for end-to-end flare classification that automated the feature learning process, avoiding hand-engineered statistical features.

Unlike the previous models based on traditional ML and deep sequence learning, in this work we present a model that leverages temporal as well as spatial relationships of the MVTS instances. Our model learns MVTS representations in an end-to-end fashion, and utilizes higher-order inter-variable relationships as well as local and global temporal changes.

3. MVTS representation learning by functional network and sequence embedding

3.1. Notations and Preliminaries

3.1.1. MVTS and Sub-MVTS

Each solar active region resulting in different flare classes (or staying as a flare-quiet region) after a given prediction window represents a solar event. The solar event $i$ is represented by an MVTS instance $S^{(i)}$ and associated with a class label $y^{(i)}$. The class label $y^{(i)}$ represents the flare-quiet state or flare classes of different intensities. The MVTS instance $S^{(i)} \in \mathbb{R}^{T \times N}$ is a collection of univariate time series of $N$ magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter over an observation period $T$. We denote the vector of the $t$-th timestamp as $x^{<t>} \in \mathbb{R}^N$, and the time series of the $k$-th parameter as $P_k \in \mathbb{R}^T$. After the observation period $T$ and prediction period $\Delta$, the event is labeled by the active region state (flare quiet or one of the flare classes). The active region state of a particular timestamp is found from the NOAA records of flaring events. Fig. 1 shows the MVTS-based data model of a solar event. Each MVTS instance is divided into $\eta$ equal-length windows such that $T = \eta\tau$, where $\tau$ denotes the window length. The sub-MVTS is denoted by $s \in \mathbb{R}^{\tau \times N}$, and $s$ is a subsequence of $S$.

Figure 1: Multivariate time series instance with predefined observation window ($T$) and prediction window ($\Delta$), and the corresponding flare class label [5].
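To make the data model concrete, the following sketch slices one MVTS instance into $\eta$ equal-length sub-MVTS windows. It is a minimal NumPy illustration using the dataset dimensions reported later in the paper ($T = 60$, $N = 25$) and an illustrative $\eta$; the array contents are dummy values.

```python
import numpy as np

T, N = 60, 25        # observation points and parameters, as in the dataset (Sec. 4.1)
eta = 4              # number of windows (a model hyperparameter)
tau = T // eta       # window length, so that T = eta * tau

S = np.random.randn(T, N)  # one MVTS instance S in R^{T x N} (dummy values)

# eta sub-MVTS s in R^{tau x N}, each a contiguous subsequence of S
windows = [S[j * tau:(j + 1) * tau, :] for j in range(eta)]
assert all(w.shape == (tau, N) for w in windows)
```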
3.1.2. Node-attributed functional network

A functional network is an undirected, edge-weighted graph defined as $G = (V, E, W, X)$, where the set of nodes $V = \{P_1, ..., P_N\}$ denotes the magnetic field parameters, $W: E \rightarrow \mathbb{R}$ is a function mapping edges to their weights, and the node attribute matrix $X \in \mathbb{R}^{N \times \tau}$ contains the time series of each node in the sub-MVTS, i.e., $X = s^T$. The functional network is defined on the sub-MVTS, and the weight $w_{ij}$ of edge $e_{ij}$ (between node pair $P_i$ and $P_j$) represents the statistical similarity of the $\tau$-length time series of $P_i$ and $P_j$. Every functional network derived from an MVTS dataset has the same node set $V$.

3.1.3. Graph Convolution

For learning the representations of node-attributed functional networks, we use the Graph Convolutional Network (GCN). GCN is a widely used graph neural network [23] that learns node representations from a graph through layer-wise neighborhood aggregation. Graph convolution of layer $l$ aggregates the representations of $l$-hop neighbors. GCN updates the representation of node $v$ in a graph $G = (V, E, W, X)$ by the following equations.

$$h_v^{[0]} = x_v \tag{1}$$

$$h_v^{[l+1]} = ReLU\Big(W_g^{[l]} \sum_{u \in N(v)} \frac{w_{uv} h_u^{[l]}}{|N(v)|} + B_g^{[l]} h_v^{[l]}\Big), \quad \forall l \in \{0, 1, ..., L-1\} \tag{2}$$

$$z_v = h_v^{[L]} \tag{3}$$

$$z_G = \frac{1}{|V|} \sum_{v \in V} z_v \tag{4}$$

Here, $L$ is the number of GCN layers, $x_v \in \mathbb{R}^\tau$ is the attribute vector of node $v$, $h_v^{[l]} \in \mathbb{R}^{d_g}$ is the representation of node $v$ in layer $l$, $W_g^{[l]} \in \mathbb{R}^{d_g \times d_g}$ is the weight matrix of layer $l$, $B_g^{[l]} \in \mathbb{R}^{d_g}$ is the bias vector of layer $l$, $N(v)$ is the set of neighbor nodes of node $v$, $w_{uv}$ is the weight of the edge between node $v$ and its neighbor $u$, $z_v$ is the final representation of node $v$ after $L$ iterations of neighborhood aggregation, and $z_G$ is the graph representation found by averaging the node representations.
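The propagation rule of Eqs. 1-4 can be sketched in PyTorch as follows. This is a minimal reading of the equations (dense weighted adjacency, degree-normalized neighbor aggregation plus a separate self-term, ReLU, and mean pooling over nodes), not the released implementation; the class and variable names are ours.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One application of Eq. 2: ReLU(W_g * (weighted neighbor mean) + B_g * h_v)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_g = nn.Linear(in_dim, out_dim, bias=False)  # neighbor transform W_g
        self.B_g = nn.Linear(in_dim, out_dim, bias=False)  # self transform B_g

    def forward(self, h, A):
        # h: (N, in_dim) node representations; A: (N, N) non-negative weighted adjacency.
        deg = (A > 0).sum(dim=1, keepdim=True).clamp(min=1)  # |N(v)| per node
        agg = (A @ h) / deg                                  # sum_u w_uv h_u / |N(v)|
        return torch.relu(self.W_g(agg) + self.B_g(h))

class GCN(nn.Module):
    """Two-layer GCN (L = 2) with mean pooling, following Eqs. 1-4."""
    def __init__(self, tau, d_g_hidden, d_g):
        super().__init__()
        self.layer1 = GCNLayer(tau, d_g_hidden)  # h^{[0]} = x_v has dimension tau (Eq. 1)
        self.layer2 = GCNLayer(d_g_hidden, d_g)

    def forward(self, X, A):
        # X: (N, tau) node attribute matrix; A: (N, N) adjacency of the functional network.
        h = self.layer2(self.layer1(X, A), A)  # z_v = h^{[L]} (Eq. 3)
        return h.mean(dim=0)                   # z_G: mean over nodes (Eq. 4)
```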
3.1.4. Sequence embedding through LSTM

Long short-term memory (LSTM) networks [24] are frequently used for sequence representation learning, which facilitates various tasks such as sequence classification, sequence-to-sequence translation, and so on. We use LSTM networks for learning low-dimensional representations of the MVTS instances. The MVTS (and sub-MVTS) instances are sequences of $N$-dimensional timestamp vectors. The timestamp vector $x^{<t>} \in \mathbb{R}^N$ represents the magnetic field state of the active region ($N$ parameter values) at timestamp $t$. We denote the inputs to the LSTM cells as $[x^{<1>}, x^{<2>}, ..., x^{<\gamma>}]$, the cell state representations as $[c^{<0>}, c^{<1>}, ..., c^{<\gamma>}]$, and the hidden state representations as $[h^{<0>}, h^{<1>}, ..., h^{<\gamma>}]$, where $\gamma$ is the last timestamp of the sequence. After randomly initializing $c^{<0>}$ and $h^{<0>}$, we update the cell state and hidden state of timestamp $t$ by the following LSTM equations [24].

$$\tilde{c}^{<t>} = \tanh(W_c[h^{<t-1>}, x^{<t>}] + b_c) \tag{5}$$
$$\Gamma_u = \sigma(W_u[h^{<t-1>}, x^{<t>}] + b_u) \tag{6}$$
$$\Gamma_f = \sigma(W_f[h^{<t-1>}, x^{<t>}] + b_f) \tag{7}$$
$$\Gamma_o = \sigma(W_o[h^{<t-1>}, x^{<t>}] + b_o) \tag{8}$$
$$c^{<t>} = \Gamma_u \odot \tilde{c}^{<t>} + \Gamma_f \odot c^{<t-1>} \tag{9}$$
$$h^{<t>} = \Gamma_o \odot \tanh(c^{<t>}) \tag{10}$$

We denote the number of dimensions of the cell state representation $c^{<t>}$ and hidden state representation $h^{<t>}$ of the LSTM cell as $d_s$. The concatenation of the hidden state of the previous timestamp and the input of the current timestamp is $[h^{<t-1>}, x^{<t>}] \in \mathbb{R}^{d_s+N}$. The candidate cell state representation is $\tilde{c}^{<t>} \in \mathbb{R}^{d_s}$. The weight matrices are $W_c, W_u, W_f, W_o \in \mathbb{R}^{d_s \times (d_s+N)}$, and the bias terms are $b_c, b_u, b_f, b_o \in \mathbb{R}^{d_s}$. The subscripts $u$, $f$, and $o$ represent the activations of the update gate, forget gate, and output gate respectively, $\odot$ denotes elementwise multiplication, and $\sigma$ represents the sigmoid activation. Finally, we consider $h^{<\gamma>}$ as the final representation of the input MVTS.

3.2. Data Preprocessing

3.2.1. Node-level normalization

Since the magnetic field parameter values are recorded on different scales, we perform z-score normalization. Suppose that $M$ MVTS instances, each with $N$ parameters and $T$ time points, are represented by a third-order tensor $\mathcal{X} \in \mathbb{R}^{M \times N \times T}$, where the three modes represent events, parameters/nodes, and timestamps. For the better performance of the GCN-based graph embedding, we perform node-level z-normalization as a preprocessing step in the following three steps (see the sketch after this list).

1. We perform mode-2 matricization, i.e., we reshape the tensor so that the mode-2 (parameter/node) fibers become the columns of the matrix. The matrix is denoted by $X_{(2)} \in \mathbb{R}^{MT \times N}$, and its columns by $P_1, P_2, ..., P_N$.

2. For each column $P_j$, we perform z-normalization as follows:
$$x_k^{(j)} = \frac{x_k^{(j)} - \mu^{(j)}}{\sigma^{(j)}}$$
Here, $x_k^{(j)}$ is the $k$-th value of the column $P_j$, where $1 \leq k \leq MT$, $\mu^{(j)}$ is the mean of the column $P_j$, and $\sigma^{(j)}$ is the standard deviation of the column $P_j$.

3. We reshape the matrix $X_{(2)} \in \mathbb{R}^{MT \times N}$ back to the third-order tensor $\mathcal{X} \in \mathbb{R}^{M \times N \times T}$.
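A compact NumPy sketch of the three steps above is given below; the function name and the epsilon guard against constant columns are ours.

```python
import numpy as np

def node_level_znorm(X, eps=1e-8):
    """Z-normalize each parameter over all events and timestamps.

    X: third-order tensor of shape (M, N, T) -- events x parameters x timestamps.
    """
    M, N, T = X.shape
    # Step 1: mode-2 matricization -> (M*T, N); column j holds all values of P_j.
    X2 = X.transpose(0, 2, 1).reshape(M * T, N)
    # Step 2: z-normalize each column (eps guards constant columns).
    X2 = (X2 - X2.mean(axis=0)) / (X2.std(axis=0) + eps)
    # Step 3: reshape back to the third-order tensor (M, N, T).
    return X2.reshape(M, T, N).transpose(0, 2, 1)
```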
3.2.2. Functional network construction

We calculate the Pearson correlation matrix $C \in \mathbb{R}^{N \times N}$ for the sub-MVTS $s \in \mathbb{R}^{\tau \times N}$. In the correlation matrix, $C_{ij}$ represents the Pearson correlation coefficient (in the range $[-1, 1]$) between the $\tau$-length time series $P_i$ and $P_j$. The symmetric matrix $C$ can be considered an adjacency matrix of a graph of $N$ nodes. We apply a sparsity threshold of 0 so that only edges with positive weight (node pairs with positive correlation) are considered for the functional network construction. We denote the resulting sparse correlation matrix as the adjacency matrix $A \in \mathbb{R}^{N \times N}$. Although the functional network defined over a sub-MVTS encodes inter-variable interactions within a small temporal window, the adjacency matrix alone is not enough for the completeness of the data, since the negative correlation coefficients are discarded. To avoid this loss of information, in addition to the adjacency matrix (graph structure), we extract the node attribute matrix $X = s^T$. In $X \in \mathbb{R}^{N \times \tau}$, each row represents the node attributes in the form of a $\tau$-length time series (normalized in the previous step).
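A minimal NumPy sketch of this construction for one window follows; the removal of self-loops is our assumption (the self-contribution is handled separately by the $B_g$ term of Eq. 2).

```python
import numpy as np

def functional_network(s):
    """Functional network of one sub-MVTS s of shape (tau, N).

    Returns the thresholded adjacency A (N x N) and node attribute matrix X (N x tau).
    """
    C = np.corrcoef(s.T)           # Pearson correlations of the N column time series
    A = np.where(C > 0, C, 0.0)    # sparsity threshold 0: keep positive correlations
    np.fill_diagonal(A, 0.0)       # our assumption: no self-loops (Eq. 2 has B_g h_v)
    X = s.T                        # node attributes X = s^T
    return A, X
```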
3.3. MVTS representation learning

In Fig. 2, we show the components of MVTS representation learning. First, the window embedding learns the local spatiotemporal changes of the sub-MVTS instances through the models denoted as $GCN$ and $LSTM_s$; then, the whole MVTS embedding learns the global temporal changes of the local (window) representations through the model denoted as $LSTM_f$.

Figure 2: GCN-based node-attributed functional network embedding and LSTM-based local and global sequence embedding. For showing the functional network construction process, the parameter set $\{P_1, P_2, ..., P_N\}$ of the MVTS instance is shown as $\{A, B, C, D, E, F\}$.

3.3.1. Window embedding

Our model learns the representation of each window $s$ (sub-MVTS) of the MVTS instance $S$ through GCN-based node-attributed functional network embedding and LSTM-based local sequence modeling.

β€’ GCN-based functional network embedding: We input the node-attributed functional network $G(V, E, W, X)$ to a two-layer GCN. The initial node attributes are set as $X = s^T$ (Eq. 1). In the first layer, each node is embedded into a $d_g'$-dimensional space through 1-hop neighborhood aggregation, and after the second layer, each node is embedded into a $d_g$-dimensional space through 2-hop neighborhood aggregation (Eqs. 2, 3). Finally, the whole-graph representation $z_G \in \mathbb{R}^{d_g}$ is computed through mean pooling (Eq. 4).

β€’ LSTM-based sub-MVTS embedding: The sub-MVTS $s = [x^{<1>}, ..., x^{<\tau>}]$, where $x^{<t>} \in \mathbb{R}^N$, is sequentially input to $LSTM_s$ (Eqs. 5-10), and we extract the last hidden representation $z_s = h_s^{<\tau>}$, where $z_s \in \mathbb{R}^{d_s}$.

For the window embedding, we concatenate $z_G \in \mathbb{R}^{d_g}$ and $z_s \in \mathbb{R}^{d_s}$. Therefore, the window representation is $z_w \in \mathbb{R}^{d_g+d_s}$.

3.3.2. Whole MVTS embedding

After each of the $\eta$ windows is represented as a $(d_g+d_s)$-dimensional vector, we feed the sequence $[z_w^{<1>}, ..., z_w^{<\eta>}]$ into $LSTM_f$ for global temporal change modeling. Note that $LSTM_f$ and $LSTM_s$ have different learnable parameter sets (e.g., $W_u^s$, $W_u^f$, etc.), although in this work the number of dimensions $d_s$ of the cell state and hidden state is kept the same for both. We extract the final hidden state representation $z_f = h_f^{<\eta>}$, where $z_f \in \mathbb{R}^{d_s}$. We input $z_f$ into a linear (fully connected) layer with parameters $W_F \in \mathbb{R}^{n_c \times d_s}$ and $b_F \in \mathbb{R}^{n_c}$, where $n_c$ is the number of classes. After this layer, we have an $n_c$-dimensional representation of the whole MVTS instance of event $i$:

$$z^{(i)} = ReLU(W_F z_f + b_F) \tag{11}$$

Finally, we input $z^{(i)} \in \mathbb{R}^{n_c}$ into a softmax layer, whose number of units is equal to the number of classes. The softmax layer gives us the normalized class probabilities $\hat{y}^{(i)} \in \mathbb{R}^{n_c}$:

$$\hat{y}_j^{(i)} = \frac{e^{z_j^{(i)}}}{\sum_{j'=1}^{n_c} e^{z_{j'}^{(i)}}} \tag{12}$$

The predicted labels of the training MVTS instances are matched against the true labels, and the Adam optimizer [25] updates the weight and bias parameters of $GCN$, $LSTM_s$, $LSTM_f$, and the fully connected layer through the backpropagation algorithm. Algorithm 1 shows the training procedure of the proposed GCN-LSTM-based MVTS representation learning.

Algorithm 1: Training of GCN-LSTM-based MVTS representation learning

Input: Training set $\mathcal{D}$ consisting of functional network adjacency matrices $X_{adj} \in \mathbb{R}^{n_{train} \times \eta \times N \times N}$ and node attribute matrices $X_{nat} \in \mathbb{R}^{n_{train} \times \eta \times N \times \tau}$, one-hot training labels $y_{train} \in \mathbb{R}^{n_{train} \times n_c}$, number of epochs $n_{epochs}$, learning rate $\alpha$, and weight decay factor $\lambda$ of the Adam optimizer.
Output: Learned parameters of $GCN$, $LSTM_s$, and $LSTM_f$.

1: Randomly initialize the parameter set $\mathcal{W}$, which contains the $GCN$, $LSTM_s$, and $LSTM_f$ parameters
2: for each of the $n_{epochs}$ training epochs do
3:   for MVTS instance $i = 1, 2, ..., n_{train}$ do
4:     Window matrix $Z_w = [0]^{\eta \times (d_g+d_s)}$
5:     for window $j = 1, 2, ..., \eta$ do
6:       $A \leftarrow X_{adj}[i, j, :, :]$
7:       $X \leftarrow X_{nat}[i, j, :, :]$
8:       $z_G \leftarrow GCN(A, X)$  // Eqs. 1-4 ($L = 2$)
9:       $z_s \leftarrow LSTM_s(X^T)$  // Eqs. 5-10
10:      $Z_w[j, :] \leftarrow Concat(z_G, z_s)$
11:    end for
12:    $z_f \leftarrow LSTM_f(Z_w)$  // Eqs. 5-10
13:    $z_f \leftarrow Linear(z_f)$  // Eq. 11
14:    $z^{(i)} \leftarrow Softmax(z_f)$  // Eq. 12
15:    $\mathcal{L} \leftarrow NLLLoss(z^{(i)}, y_{train}^{(i)})$  // negative log-likelihood loss
16:    Update $\mathcal{W}$ minimizing $\mathcal{L}$ by Adam($\alpha$, $\lambda$)
17:  end for
18: end for
19: return $\mathcal{W}$
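A condensed PyTorch sketch of the forward pass of Algorithm 1 for a single MVTS instance is shown below. It reuses the GCN module sketched in Section 3.1.3 and the hyperparameter values listed in Section 4.2; it is an illustration of the procedure, not the released implementation (available at the GitHub repository cited in Section 4).

```python
import torch
import torch.nn as nn

class GCNLSTM(nn.Module):
    """Window embedding (GCN + LSTM_s) followed by whole-MVTS embedding (LSTM_f)."""
    def __init__(self, n_params=25, tau=15, d_g_hidden=64, d_g=4, d_s=128, n_classes=4):
        super().__init__()
        self.gcn = GCN(tau, d_g_hidden, d_g)  # the two-layer GCN sketched in Sec. 3.1.3
        self.lstm_s = nn.LSTM(n_params, d_s, batch_first=True)   # within-window
        self.lstm_f = nn.LSTM(d_g + d_s, d_s, batch_first=True)  # between-window
        self.linear = nn.Linear(d_s, n_classes)

    def forward(self, A_wins, X_wins):
        # A_wins: (eta, N, N) adjacencies; X_wins: (eta, N, tau) node attributes.
        z_w = []
        for A, X in zip(A_wins, X_wins):
            z_G = self.gcn(X, A)                          # graph embedding, (d_g,)
            _, (h_s, _) = self.lstm_s(X.T.unsqueeze(0))   # sub-MVTS as (1, tau, N)
            z_w.append(torch.cat([z_G, h_s[-1, 0]]))      # window embedding z_w
        _, (h_f, _) = self.lstm_f(torch.stack(z_w).unsqueeze(0))
        return torch.relu(self.linear(h_f[-1, 0]))        # z^{(i)} of Eq. 11, (n_classes,)

# Training as in Algorithm 1: negative log-likelihood on softmax outputs (Eq. 12).
model = GCNLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-3)
loss_fn = nn.NLLLoss()
# for each epoch, for each instance i:
#     logits = model(A_wins, X_wins)
#     loss = loss_fn(torch.log_softmax(logits, dim=0).unsqueeze(0), label)  # label: (1,)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```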
4. Experiments

In this section, we demonstrate our experimental findings. We compared the performance of our model with six other MVTS-based flare prediction baselines on a benchmark dataset. We used PyTorch 1.10.0 with CUDA 11.1 for implementing our GCN-LSTM-based MVTS classifier. The source code of our model and the experimental dataset are available at our GitHub repository: https://github.com/FuadAhmad/GCN-LSTM.

4.1. Dataset

As the benchmark dataset of our experiments, we used the solar flare prediction dataset published by Angryk et al. [3]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (for the full list of parameters, see [16]). The time series are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The MVTS instances are labeled according to the flaring event that occurred after 12 hours. Therefore, the dataset has $T = 60$ observation points and $N = 25$ dimensions in the timestamp vectors, while the prediction window is $\Delta = 12$ hours. Our experimental dataset consists of 1,540 MVTS instances evenly distributed across four classes (X, M, BC, and Q), where BC represents events from both the B and C classes (less intense flares). We split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for testing).

4.2. Baseline methods

We evaluated our GCN-LSTM-based MVTS classification model against six baselines.

β€’ Flattened vector method (FLT): This is a naive method, where each 60 Γ— 25 MVTS instance is flattened into a 1,500-dimensional vector.

β€’ Vector of last timestamp (LTV): This method was introduced by Bobra et al. [16], where vector magnetogram data (the feature space of all magnetic field parameters) were used for classification. Since the last timestamp of the MVTS is temporally nearest to the flaring event, we sampled the vector of the last timestamp (25-dimensional) to train the classifier.

β€’ Time series summarization-based MVTS representation (TS-SUM): This method, proposed by Hamdi et al. [4], summarizes each individual time series of length $T$ by eight statistical features: the mean, standard deviation, skewness, and kurtosis of the original time series and of its first-order derivative (see the sketch at the end of this subsection). As a result, we get an 8 Γ— 25-dimensional vector space, which is used for training the downstream classifier.

β€’ Long short-term memory (LSTM): This LSTM-based approach was proposed by Muzaheed et al. [5]. Each MVTS instance is considered a $T$-length sequence of timestamp vectors $x^{<t>} \in \mathbb{R}^N$. After sequentially feeding the LSTM model with each timestamp vector, the last hidden representation is considered the MVTS representation. Following the same experimental setting, we use 128 dimensions for both the cell state and hidden state, 500 training epochs, and a learning rate of 0.01 in stochastic gradient descent.

β€’ Recurrent Neural Network (RNN): As the fifth baseline, we replace the LSTM cells of the model of [5] with standard RNN cells. Similar to the experimental setting of [5], we use 128 RNN hidden dimensions, 1,000 training epochs, and a learning rate of 0.01 in stochastic gradient descent.

β€’ Random Convolutional Kernel Transform (ROCKET): We use ROCKET [26] as the sixth baseline for MVTS-based solar event classification. ROCKET was shown to be the best-performing algorithm in the MVTS classification benchmarking study by Ruiz et al. [27], which included 26 MVTS datasets of the UEA archive [28]. ROCKET uses a large number of random convolution kernels in conjunction with a linear classifier (ridge regression or logistic regression), where each kernel is applied to each univariate time series instance. Similar to the experimental setting of [27], we used 10,000 kernels in ROCKET.

The first three baselines are embedding-followed-by-classification methods. After performing the embedding of the MVTS instances with those methods, we use a logistic regression classifier with L2 regularization. In all the experiments, we split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for testing). In the experiments with the proposed GCN-LSTM model, we use the following hyperparameters: number of windows $\eta$: 4; window length $\tau$: 15; hidden dimensions $d_g'$ in the first GCN layer: 64; node embedding dimensions $d_g$ in the second GCN layer: 4; dimensions $d_s$ of the cell state and hidden state representations of both $LSTM_s$ and $LSTM_f$: 128; training epochs: 100; Adam learning rate $\alpha$: $10^{-4}$; and weight decay (regularization factor) $\lambda$: $10^{-3}$.
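For concreteness, the TS-SUM baseline representation described in the list above can be computed as in the following sketch (assuming SciPy statistics; the function name is ours).

```python
import numpy as np
from scipy.stats import skew, kurtosis

def ts_sum_features(S):
    """TS-SUM-style summary of one MVTS instance S of shape (T, N).

    Eight statistics per univariate series: mean, std, skewness, and kurtosis
    of the series and of its first-order derivative -> vector of length 8*N.
    """
    def stats(x):  # statistics taken column-wise (per parameter)
        return np.concatenate([x.mean(axis=0), x.std(axis=0), skew(x), kurtosis(x)])
    return np.concatenate([stats(S), stats(np.diff(S, axis=0))])
```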
4.3. Multiclass classification performance

In Table 1, we show the classification performance of our GCN-LSTM-based MVTS classifier along with that of the baseline methods. For a comprehensive classification report, we show accuracy along with the precision, recall, and F1 of each class. We performed five experiments with different train/test sets sampled by stratified holdout (two-thirds for training and one-third for testing) and report the mean and standard deviation over the experiments.

Table 1: Multiclass classification performance of the proposed method and the baselines

Measure        | FLT           | LTV           | TS-SUM        | RNN           | LSTM          | ROCKET        | GCN-LSTM
Accuracy       | 0.259 Β± 0.012 | 0.323 Β± 0.02  | 0.609 Β± 0.091 | 0.427 Β± 0.025 | 0.628 Β± 0.03  | 0.742 Β± 0.021 | 0.817 Β± 0.014
Precision (X)  | 0.232 Β± 0.024 | 0.342 Β± 0.041 | 0.712 Β± 0.054 | 0.534 Β± 0.031 | 0.757 Β± 0.028 | 0.92 Β± 0.034  | 0.932 Β± 0.022
Recall (X)     | 0.264 Β± 0.053 | 0.392 Β± 0.043 | 0.772 Β± 0.024 | 0.631 Β± 0.028 | 0.947 Β± 0.023 | 0.981 Β± 0.016 | 0.99 Β± 0.023
F1 (X)         | 0.244 Β± 0.032 | 0.362 Β± 0.04  | 0.741 Β± 0.034 | 0.582 Β± 0.019 | 0.841 Β± 0.014 | 0.952 Β± 0.028 | 0.961 Β± 0.013
Precision (M)  | 0.254 Β± 0.012 | 0.324 Β± 0.033 | 0.522 Β± 0.031 | 0.411 Β± 0.014 | 0.594 Β± 0.018 | 0.661 Β± 0.042 | 0.803 Β± 0.054
Recall (M)     | 0.26 Β± 0.023  | 0.331 Β± 0.061 | 0.552 Β± 0.022 | 0.402 Β± 0.03  | 0.544 Β± 0.014 | 0.704 Β± 0.038 | 0.824 Β± 0.063
F1 (M)         | 0.257 Β± 0.026 | 0.327 Β± 0.042 | 0.537 Β± 0.023 | 0.406 Β± 0.029 | 0.568 Β± 0.02  | 0.687 Β± 0.028 | 0.811 Β± 0.033
Precision (BC) | 0.232 Β± 0.044 | 0.263 Β± 0.024 | 0.453 Β± 0.033 | 0.282 Β± 0.031 | 0.495 Β± 0.013 | 0.58 Β± 0.026  | 0.682 Β± 0.03
Recall (BC)    | 0.241 Β± 0.053 | 0.212 Β± 0.02  | 0.472 Β± 0.014 | 0.261 Β± 0.021 | 0.409 Β± 0.023 | 0.573 Β± 0.052 | 0.664 Β± 0.05
F1 (BC)        | 0.236 Β± 0.041 | 0.234 Β± 0.024 | 0.462 Β± 0.041 | 0.271 Β± 0.031 | 0.448 Β± 0.031 | 0.577 Β± 0.031 | 0.673 Β± 0.032
Precision (Q)  | 0.324 Β± 0.034 | 0.343 Β± 0.044 | 0.583 Β± 0.045 | 0.483 Β± 0.024 | 0.603 Β± 0.024 | 0.81 Β± 0.046  | 0.831 Β± 0.018
Recall (Q)     | 0.251 Β± 0.042 | 0.362 Β± 0.071 | 0.663 Β± 0.034 | 0.413 Β± 0.042 | 0.683 Β± 0.023 | 0.724 Β± 0.034 | 0.772 Β± 0.021
F1 (Q)         | 0.282 Β± 0.014 | 0.352 Β± 0.013 | 0.62 Β± 0.043  | 0.445 Β± 0.032 | 0.64 Β± 0.024  | 0.771 Β± 0.036 | 0.798 Β± 0.017

From the results, it is visible that the GCN-LSTM-based MVTS classification model outperforms all other baselines on all performance measures. In the overall evaluation, ROCKET achieves the second-best performance, while the LSTM model comes third. The GCN-LSTM model achieves around 20% higher accuracy than the LSTM model, which demonstrates the importance of learning MVTS representations in both the spatial and temporal domains rather than in the temporal domain alone. Among the shallow ML models, TS-SUM performs better than the FLT and LTV models. In general, the high performance of TS-SUM, RNN, LSTM, ROCKET, and GCN-LSTM demonstrates the importance of time series representations of solar events.
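The per-class measures of Table 1 follow the standard definitions; they can be computed with scikit-learn, e.g. (a sketch; the helper name is ours, and the label names follow the dataset classes):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def per_class_report(y_true, y_pred, classes=("X", "M", "BC", "Q")):
    """Accuracy plus per-class precision/recall/F1, as reported in Table 1."""
    acc = accuracy_score(y_true, y_pred)
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=list(classes))
    return acc, {c: (p[i], r[i], f1[i]) for i, c in enumerate(classes)}
```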
4.4. Classification with varying training set size

To verify the adaptability of our model to bigger training datasets, we experimented with varying training set sizes. We varied the training set size from 10% to 90% of the dataset size, while testing the models on the remaining instances (Fig. 3). We performed stratified train/test sampling for each training set size, and evaluated the classification performance of the classifiers five times with five distinct samples of training and test sets. In Figs. 3a and 3b, we plot the mean accuracy values and mean F1 (X class) values over all runs of the different train/test samples for the different training set sizes. GCN-LSTM consistently outperforms the other baselines in all settings of training set size. ROCKET is the second-best-performing classifier in this experiment, and especially in the F1 measure ROCKET exhibits robustness similar to GCN-LSTM. With only 10% training data, GCN-LSTM achieves 70% classification accuracy, while the third-best-performing LSTM model reaches that level of performance only by using 90% training data. Although all models gain more accuracy with a gradual increase of the training set size, we observe more consistently increasing patterns in the deep learning and kernel-based methods, e.g., GCN-LSTM, ROCKET, LSTM, and RNN. This suggests that with sufficiently large datasets, deep learning models can outperform the traditional classifiers and embedding methods by a larger margin. The time series summarization-based method TS-SUM shows promising performance throughout the experiments, but the generalization capability of this model can be limited on more complex datasets due to its less flexible learning methodology based on hand-engineered features. Compared to the deep learning-based and time series-based methods, the LTV and FLT models perform poorly, which demonstrates the importance of time series in avoiding underfitting.

Figure 3: Multiclass classification with varying train set size: (a) multiclass classification accuracy with increasing training set size; (b) F1 of the X class with increasing training set size.

4.5. Binary classification performance

In addition to classifying the solar active regions into different flare classes, a major use case in data-driven flare prediction is binary classification, i.e., distinguishing major flaring events from minor flaring events or flare-quiet events. In this experiment, we considered X- and M-class MVTS instances as flaring events, while we considered all other instances (Q and BC) as non-flaring events. In Fig. 4, we show the mean binary classification performance of all models over five different train/test samples in terms of accuracy, precision, recall, and F1 of the flaring (XM) and non-flaring (QBC) classes. It is clearly visible that the GCN-LSTM model outperforms all other baselines. We report the performance of the two best-performing models numerically along with their bars. On all performance metrics, GCN-LSTM achieves on average ∼8% better performance than the second-best-performing ROCKET algorithm. In general, we observe performance similar to that of the multiclass classification. Although one deep learning model, the RNN-based model, performed worse than the TS-SUM method, the RNN-based model is an end-to-end classification model, which might outperform TS-SUM with more training data, a more complex model, and more efficient hyperparameter tuning.

Figure 4: Binary classification performance of all baselines (mean accuracy, and precision, recall, and F1 of the XM and QBC classes).

4.6. Embedding performance

Visualization of high-dimensional data in 2D/3D space is a well-known method of demonstrating the effectiveness of learned representations. To investigate the quality of the learned MVTS representations, we provide a visualization of t-SNE [29]-transformed MVTS representations extracted by the final layer of the GCN-LSTM model. Similar to Section 4.3, the stratified holdout strategy is used to pre-train the model, and all instances are projected into the t-SNE-reduced 2D space (Fig. 5). The 2D projection exhibits discernible clustering of the MVTS instances. Some meaningful insights are observed in the t-SNE scatter plot: (1) the patterns of the four classes are easily recognizable, (2) flare-quiet events (Q) and minor flaring events (B and C) are comparatively similar, (3) X- and M-class flares exhibit significant dissimilarity from the other classes, (4) some flare-quiet events are similar to the minor flaring events, (5) a few minor flares show characteristics similar to M-class flares, and (6) the characteristics of the X-class flares are exclusive, and instances of the other classes do not show similarity with the X-class instances.

Figure 5: t-SNE embedding of the GCN-LSTM-generated representations of all MVTS instances in the dataset.
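The projection of Fig. 5 can be reproduced from the learned representations with scikit-learn's t-SNE, e.g. (a sketch; Z and y denote the representation matrix and label array, which the reader must supply):

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Z: (n_instances, d) final-layer MVTS representations; y: array of class labels.
def plot_tsne(Z, y, classes=("X", "M", "BC", "Q")):
    Z2 = TSNE(n_components=2).fit_transform(Z)
    for c in classes:
        mask = (y == c)
        plt.scatter(Z2[mask, 0], Z2[mask, 1], s=10, label=c)
    plt.xlabel("t-SNE dimension 1"); plt.ylabel("t-SNE dimension 2")
    plt.legend(title="Class"); plt.show()
```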
5. Conclusion

In this work, we presented an end-to-end deep learning-based flare prediction model for multivariate time series (MVTS)-represented datasets that leverages inter-variable relationships through graph convolutional network-based functional network embedding, and local and global temporal change modeling through LSTM-based sequence embedding. In contrast to other MVTS classification models applied to flare prediction, our model utilizes both spatial and temporal features of the MVTS instances, and does not depend on predefined statistical features. Our experiments on a real-life solar flare prediction dataset demonstrate the superior performance of our model in both multiclass and binary MVTS classification.

In the future, we look forward to designing more efficient models using techniques such as (1) learning attention coefficients in the spatial and temporal feature spaces, (2) customizing transformer models for MVTS representations, and (3) analyzing the effects of univariate sequence embedding on MVTS representation learning. We will also apply our models to other MVTS-based solar event datasets (e.g., solar energetic particles) [30], and to MVTS datasets generated from other sources such as functional MRI (fMRI)-based time series of brain regions [31].

6. Acknowledgments

This project has been supported in part by funding from the CISE and GEO directorates under NSF awards #2153379 and #2204363.

References

[1] J. Eastwood, E. Biffis, M. Hapgood, L. Green, M. Bisi, R. Bentley, R. Wicks, L.-A. McKinnell, M. Gibbs, C. Burnett, The economic impact of space weather: Where do we stand?, Risk Analysis 37 (2017) 206–218.
[2] National Science and Technology Council, National space weather action plan, https://obamawhitehouse.archives.gov/sites/default/files/microsites/ostp/final_nationalspaceweatheractionplan_20151028.pdf, 2015. [Accessed: 10-Feb-2022].
[3] R. A. Angryk, P. C. Martens, B. Aydin, D. Kempton, S. S. Mahajan, S. Basodi, A. Ahmadzadeh, X. Cai, S. F. Boubrahimi, S. M. Hamdi, et al., Multivariate time series dataset for space weather data analytics, Scientific Data 7 (2020) 1–13.
[4] S. M. Hamdi, D. Kempton, R. Ma, S. F. Boubrahimi, R. A. Angryk, A time series classification-based approach for solar flare prediction, in: 2017 IEEE Intl. Conf. on Big Data (Big Data), IEEE, 2017, pp. 2543–2551.
[5] A. A. M. Muzaheed, S. M. Hamdi, S. F. Boubrahimi, Sequence model-based end-to-end solar flare classification from multivariate time series data, in: 20th IEEE Intl. Conf. on Machine Learning and Applications, ICMLA 2021, Pasadena, CA, USA, December 13-16, 2021, IEEE, 2021, pp. 435–440.
[6] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: KDD '20: The 26th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 753–763.
[7] P. S. McIntosh, The classification of sunspot groups, Solar Physics 125 (1990) 251–267.
[8] J. P. Mason, J. Hoeksema, Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms, The Astrophysical Journal 723 (2010) 634.
[9] Y. Cui, R. Li, L. Zhang, Y. He, H. Wang, Correlation between solar flare productivity and photospheric magnetic field properties, Solar Physics 237 (2006) 45–59.
[10] J. Jing, H. Song, V. Abramenko, C. Tan, H. Wang, The statistical relationship between the photospheric magnetic parameters and the flare productivity of active regions, The Astrophysical Journal 644 (2006) 1273.
[11] K. Leka, G. Barnes, Photospheric magnetic field properties of flaring versus flare-quiet active regions. II. Discriminant analysis, The Astrophysical Journal 595 (2003) 1296.
[12] H. Song, C. Tan, J. Jing, H. Wang, V. Yurchyshyn, V. Abramenko, Statistical assessment of photospheric magnetic features in imminent solar flare predictions, Solar Physics 254 (2009) 101–125.
[13] D. Yu, X. Huang, H. Wang, Y. Cui, Short-term solar flare prediction using a sequential supervised learning method, Solar Physics 255 (2009) 91–105.
[14] O. W. Ahmed, R. Qahwaji, T. Colak, P. A. Higgins, P. T. Gallagher, D. S. Bloomfield, Solar flare prediction using advanced feature extraction, machine learning, and feature selection, Solar Physics (2013) 1–19.
[15] A. Al-Ghraibah, L. Boucheron, R. McAteer, An automated classification approach to ranking photospheric proxies of magnetic energy build-up, Astronomy & Astrophysics 579 (2015) A64.
[16] M. G. Bobra, S. Couvidat, Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm, The Astrophysical Journal 798 (2015) 135.
[17] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, S. Watari, M. Ishii, Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms, The Astrophysical Journal 835 (2017) 156.
[18] Y. Zheng, X. Li, X. Wang, Solar flare prediction with the hybrid deep convolutional neural network, The Astrophysical Journal 885 (2019) 73.
[19] X. Li, Y. Zheng, X. Wang, L. Wang, Predicting solar flares using a novel deep convolutional neural network, The Astrophysical Journal 891 (2020) 10.
[20] E. Park, Y.-J. Moon, S. Shin, K. Yi, D. Lim, H. Lee, G. Shin, Application of the deep convolutional neural network to the forecast of solar flare occurrence using full-disk solar magnetograms, The Astrophysical Journal 869 (2018) 91.
[21] N. Nishizuka, Y. Kubo, K. Sugiura, M. Den, M. Ishii, Operational solar flare prediction model using Deep Flare Net, Earth, Planets and Space 73 (2021) 1–12.
[22] R. Ma, S. F. Boubrahimi, S. M. Hamdi, R. A. Angryk, Solar flare prediction using multivariate time series decision trees, in: 2017 IEEE Intl. Conf. on Big Data, Big Data 2017, Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2569–2578.
[23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th Intl. Conf. on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017.
[24] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[25] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: 3rd Intl. Conf. on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[26] A. Dempster, F. Petitjean, G. I. Webb, ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels, Data Mining and Knowledge Discovery 34 (2020) 1454–1495.
[27] A. P. Ruiz, M. Flynn, J. Large, M. Middlehurst, A. Bagnall, The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Mining and Knowledge Discovery 35 (2021) 401–449.
[28] A. J. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, E. J. Keogh, The UEA multivariate time series classification archive, 2018, CoRR abs/1811.00075 (2018). URL: http://arxiv.org/abs/1811.00075.
[29] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008).
[30] S. F. Boubrahimi, S. M. Hamdi, R. Ma, R. A. Angryk, On the mining of the minimal set of time series data shapelets, in: IEEE Intl. Conf. on Big Data, Big Data 2020, Atlanta, GA, USA, December 10-13, 2020, IEEE, 2020, pp. 493–502.
[31] S. M. Hamdi, B. Aydin, S. F. Boubrahimi, R. A. Angryk, L. C. Krishnamurthy, R. D. Morris, Biomarker detection from fMRI-based complete functional connectivity networks, in: IEEE Intl. Conf. on Artificial Intelligence and Knowledge Engineering, AIKE 2018, Laguna Hills, CA, USA, September 26-28, 2018, IEEE, 2018, pp. 17–24.