<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multivariate Time Series-based Solar Flare Prediction by Functional Network Embedding and Sequence Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shah Muhammad Hamdi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abu Fuad Ahmad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soukaina Filali Boubrahimi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>New Mexico State University</institution>
          ,
          <addr-line>Las Cruces, NM, 88003</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Utah State University</institution>
          ,
          <addr-line>Logan, UT, 84322</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Major flaring events on the Sun can have hazardous impacts on both space-based and ground-based infrastructure. An effective approach to predicting that a solar active region (AR) is likely to flare after a period of time is to leverage multivariate time series (MVTS) of the AR magnetic field parameters. Existing MVTS-based flare prediction models are based either on training traditional classifiers with preset statistical features of the univariate time series instances, or on training deep sequence models based on the Recurrent Neural Network (RNN) or the Long Short-Term Memory (LSTM) network. While the former approach is limited by hand-engineered features, the latter uses only the temporal dimension of the MVTS instances. The variables of an MVTS depend not only on their historical values but also on the other variables. In this work, we use the dynamic functional network representation of the MVTS instances to leverage higher-order relationships of the variables through Graph Convolution Network (GCN) embedding. In addition to finding spatial (inter-variable) patterns through functional network embedding, our model uses local and global temporal patterns through LSTM networks. Our experiments on a real-life solar flare dataset exhibit better prediction performance than other baseline methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Solar flare prediction</kwd>
        <kwd>Multivariate time series</kwd>
        <kwd>GCN</kwd>
        <kwd>LSTM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Solar flares are characterized by sudden bursts of magnetic flux in the solar corona and heliosphere. Extreme Ultra-Violet (EUV), X-ray, and gamma-ray emissions caused by major flaring events can have disastrous effects on our technology-dependent society. The risks to life and infrastructure in both space and on the ground include radiation exposure-based health risks for astronauts, disruption of GPS and radio communication, and damage to electronic devices. The economic damage of such extreme solar events can rise up to trillions of dollars [1]. In 2015, the White House released the National Space Weather Strategy and Space Weather Action Plan [2] as a roadmap for research aimed at predicting and mitigating the effects of solar eruptive activities.</p>
      <p>In recent years, multiple research efforts of the heliophysics community aim to predict solar flares from the current and historic magnetic field states of solar active regions. Due to the absence of a direct theoretical relationship between magnetic field influx and flare occurrence in active regions (AR), solar physics researchers rely on data science-based approaches for predicting solar flares. The data is collected by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory. Near-continuous-time images captured by the instruments of HMI contain spatiotemporal magnetic field data of the active regions. The prediction of solar flares, which will identify active regions that will potentially flare after a period of time, requires time series modeling of the magnetic field data. For that, spatiotemporal magnetic field data of active regions are mapped into multiple MVTS instances [3]. The variables of the MVTS instances represent solar magnetic field parameters (e.g., flux, current, helicity, Lorentz force). The time series corresponding to the magnetic field parameters are extracted based on two time windows: the observation window (the time window of data collection) and the prediction window (the time window after the data collection and before the flare occurrence). Each MVTS instance is labeled as one of six classes - Q, A, B, C, M, and X - where Q represents flare-quiet active regions, and the other labels represent flaring events of increasing intensity. Among these classes, X- and M-class flares are considered the most intense flaring events.</p>
      <p>In comparison to the earlier single timestamp-based magnetic field vector classification models, recent MVTS-based models are more effective for predicting flaring activities [3]. MVTS classification models targeting flare prediction are divided into two categories: (1) statistical feature-based methods [4], and (2) end-to-end deep learning-based methods [5]. The models of the first category work in two steps. Firstly, low-dimensional representations of the MVTS instances are calculated from concatenation/aggregation of summarization statistics (e.g., mean, standard deviation, skewness, kurtosis) of the univariate time series components. Secondly, traditional classifiers (e.g., kNN, SVM) are trained with the labeled MVTS representations. This two-step process relies heavily on hand-engineered statistical features and the choice of downstream classifiers, which eventually complicates the application of these models to datasets with varying properties. In the second category, RNN/LSTM-based deep sequence models are trained by sequentially feeding vectors representing magnetic field parameters into sequence model cells, and optimizing the cell weights through gradient descent-based backpropagation. While the deep learning models ensure end-to-end learning bypassing the dependency on hand-engineered features, they utilize only the time dimension of the MVTS instances, and this limited usage of the underlying patterns results in poor classification performance.</p>
      <p>In this work, we propose a deep learning-based MVTS classification approach for solar flare prediction leveraging the fact that MVTS data is rich not only in the temporal dimension, but also in the spatial dimension, which encodes inter-variable relationships [6]. For learning higher-order relationships of the MVTS variables, we use functional networks, where nodes represent variables, and edges represent positive correlation of the time series of the corresponding variables. The MVTS instance is divided into equal-length temporal windows, and an edge-weighted functional network is constructed for each window. We train a Graph Convolution Network (GCN) to learn the representation of each functional network. In addition, we use two LSTM networks for learning representations of the temporal dimension within and between the windows. Our model significantly outperforms existing MVTS-based flare prediction models on a dataset containing MVTS instances of solar events of different flare classes.</p>
      <p>The contributions made by this paper are listed below. 1. Leveraging higher-order inter-variable relationships of the MVTS instances by GCN-based dynamic functional network embedding. 2. Utilizing local and global patterns of the temporal dimension of the MVTS instances through LSTM-based within-window and between-window sequence learning. 3. Experimentally demonstrating the better performance of our model in comparison with the state-of-the-art baselines on a benchmark solar flare prediction dataset.</p>
      <p>AMLTS'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA. * Corresponding author (S. M. Hamdi). fuad@nmsu.edu (A. F. Ahmad); soukaina.boubrahimi@usu.edu (S. F. Boubrahimi). ORCID: 0000-0002-9303-7835 (S. M. Hamdi); 0000-0001-5693-6383 (S. F. Boubrahimi). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>While the current approaches of flare prediction are mostly based on data science, the earliest flare prediction system was an expert system named THEO that required human inputs [7]. The Space Environment Center (SEC) of the National Oceanic and Atmospheric Administration (NOAA) adopted the system THEO in 1987. To distinguish flare classes, THEO was provided input data of sunspots and magnetic field properties.</p>
      <p>Due to the abundance of magnetic field data collected by NASA's recent missions, research efforts in flare prediction over the last two decades are based on data science rather than on purely theoretical modeling. Data science-based approaches stemmed from both linear and nonlinear statistics. Based on the type of dataset used, these approaches are subdivided into two classes: line-of-sight magnetogram-based models and vector magnetogram-based models. Solar active regions are represented by the parameters of either photospheric magnetic field data that contain only the line-of-sight component of the magnetic field or the full-disk photospheric vector magnetic field. Following NASA's launch of SDO in 2010, the HMI instrument has been mapping the full-disk vector magnetic field every 12 minutes [8]. Most of the recent models use the near-continuous stream of vector magnetogram data found from SDO, while the earlier models (dated before 2010) mostly used line-of-sight magnetic data.</p>
      <p>The objective of the linear statistical models was to find the active region magnetic field features that are highly correlated with flare occurrences. Cui et al. [9] and Jing et al. [10] used line-of-sight magnetogram data to find correlation-based statistical relationships between magnetic field parameters and flare occurrences. Even before the launch of SDO, Leka and Barnes [11] collected and curated vector magnetogram data from Mees Solar Observatory on the summit of Mount Haleakala, and used linear discriminant analysis (LDA) for classifying flaring events.</p>
      <p>Nonlinear statistical models are mostly machine learning classifiers based on tree induction, kernel methods, neural networks, and so on. On the line-of-sight magnetogram-based active region datasets, Song et al. [12] used logistic regression, Yu et al. [13] used the C4.5 decision tree, Ahmed et al. [14] used the fully connected neural network, and Al-Ghraibah et al. [15] used the relevance vector machine as classification models. Bobra et al. [16] used the Support Vector Machine (SVM) on SDO-based vector magnetogram data for classifying flaring and non-flaring active regions. Nishizuka et al. [17] used both line-of-sight and vector magnetograms and compared the performance of three classifiers - kNN, SVM, and Extremely Randomized Tree (ERT). Other examples of solar flare prediction on non-sequential data include various applications of convolutional neural networks (ConvNet) on SDO AIA/HMI images [18, 19, 20, 21].</p>
      <sec id="sec-2-1">
        <title>3. MVTS representation learning by functional network and sequence embedding</title>
        <p>[Figure 1: MVTS-based data model of a solar event. The parameter time series P1, P2, ..., PN of a Multivariate Time Series (MVTS) instance are recorded over the observation window (T); after the prediction window (Δ), the instance is labeled by the flare occurrence along time (t).]</p>
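        <p>As a concrete illustration of the windowing in this data model, the following sketch (an illustrative example, not the authors' code) splits one MVTS instance into equal-length windows; the sizes N = 25, T = 60, and the 4 windows of length 15 are taken from the dataset and hyperparameter descriptions later in the paper:</p>

```python
import numpy as np

# Hypothetical MVTS instance: N = 25 magnetic field parameters
# observed over T = 60 timestamps (12 h at 12-minute cadence).
N, T = 25, 60
M = np.random.randn(N, T)

# Split into eta equal-length windows of length tau, so that T = eta * tau.
eta, tau = 4, 15
sub_mvts = [M[:, w * tau:(w + 1) * tau] for w in range(eta)]

print(len(sub_mvts), sub_mvts[0].shape)  # 4 windows, each (25, 15)
```

        <p>Each of these sub-MVTS blocks later yields one functional network and one local sequence embedding.</p>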
        <sec id="sec-2-1-1">
          <title>3.1. Notations and Preliminaries</title>
          <p>Angryk et al. [3] introduced temporal window-based flare prediction, which extends the earlier single timestamp-based models. The authors published an MVTS-based active region dataset, where each MVTS instance records magnetic field data for a preset observation time and uniform sampling rate, and is labeled by the flare classes that occurred after a given prediction time. Among the MVTS classification approaches, Hamdi et al. [4] used statistical summarization of the component univariate time series for training a kNN classifier, Ma et al. [22] applied MVTS decision trees that approached the problem using clustering as a preprocessing step, and Muzaheed et al. [5] used LSTM-based deep sequence modeling for end-to-end flare classification that automated the feature learning process, avoiding hand-engineered statistical features.</p>
          <p>Unlike previous models based on traditional ML and deep sequence learning, in this work, we present a model that leverages temporal as well as spatial relationships of the MVTS instances. Our model learns MVTS representations in an end-to-end fashion, and utilizes higher-order inter-variable relationships along with local and global temporal changes.</p>
          <p>3.1.1. MVTS and Sub-MVTS. Each solar active region resulting in different flare classes (or staying as a flare-quiet region) after a given prediction window represents a solar event. The solar event e is represented by an MVTS instance M(e), and is associated with a class label y(e). The class label y(e) represents the flare-quiet state or flare classes of different intensities. The active region state of a particular timestamp is found from the NOAA records of flaring events. The MVTS instance M(e) ∈ R^(N×T) is a collection of univariate time series of N magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter for an observation period T. We denote the vector of the t-th timestamp as x&lt;t&gt; ∈ R^N, and the time series represented by the j-th parameter as s_j ∈ R^T. After the observation period T and prediction period Δ, the event is labeled by the active region state (flare quiet or different flare classes). Fig. 1 shows the MVTS-based data model of a solar event. Each MVTS instance is divided into η equal-length windows such that T = ητ, where τ denotes the window length. The sub-MVTS is denoted by M_w ∈ R^(N×τ), and is a subsequence of M(e).</p>
          <p>3.1.2. Node-attributed functional network. A functional network is an undirected and edge-weighted graph, defined as G = (V, E, W, X), where the set of nodes V = {v_1, ..., v_N} denotes the magnetic field parameters, W : E → R is a function mapping edges to their weights, and the node attribute matrix X ∈ R^(N×τ) contains the time series of each node in the sub-MVTS, i.e., X = M_w. The functional network is defined on the sub-MVTS, and the weight w_ij of edge e_ij (between the node pair v_i and v_j) represents the statistical similarity of the τ-length time series of v_i and v_j. Each functional network derived from an MVTS dataset has the same node set V.</p>
          <p>3.1.3. Graph Convolution. For learning the representations of node-attributed functional networks, we use the Graph Convolution Network (GCN). GCN is a widely used graph neural network [23] that learns node representations from a graph through layer-wise neighborhood aggregation. Graph convolution of layer l aggregates the representations of l-hop neighbors. GCN updates the representation of node v in a graph G = (V, E, W, X) by the following equations:</p>
          <p>h_v^[0] = x_v (1)</p>
          <p>h_v^[l+1] = σ( W^[l] Σ_{u ∈ N(v)} h_u^[l] / |N(v)| + B^[l] h_v^[l] ), ∀l ∈ {0, 1, ..., L−1} (2)</p>
          <p>z_v = h_v^[L] (3)</p>
          <p>z_G = (1 / |V|) Σ_{v ∈ V} z_v (4)</p>
        </sec>
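        <p>The neighborhood aggregation of Eqs. 1-4 can be sketched in NumPy as follows (an illustrative sketch, not the authors' PyTorch implementation; the ReLU activation, the toy binary graph, and the layer sizes are assumptions made for the example):</p>

```python
import numpy as np

def gcn_layer(H, A, W, B):
    """One graph convolution (Eq. 2): h_v <- sigma(W * mean of neighbor
    representations + B * h_v). H: (n_nodes, d_in), A: (n_nodes, n_nodes)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)  # |N(v)|, guard isolated nodes
    neigh = (A @ H) / deg                              # sum_u h_u / |N(v)|
    return np.maximum(neigh @ W.T + H @ B.T, 0)        # ReLU assumed as sigma

rng = np.random.default_rng(0)
n, tau, d1, d2 = 6, 15, 8, 4                # toy sizes; the paper uses d' = 64, d = 4
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T              # symmetric toy graph, no self-loops
X = rng.standard_normal((n, tau))           # node attributes = sub-MVTS rows (Eq. 1)

H1 = gcn_layer(X, A, rng.standard_normal((d1, tau)), rng.standard_normal((d1, tau)))
H2 = gcn_layer(H1, A, rng.standard_normal((d2, d1)), rng.standard_normal((d2, d1)))
z_G = H2.mean(axis=0)                       # Eq. 4: mean pooling over nodes
print(z_G.shape)                            # (4,)
```

        <p>Two applications of gcn_layer correspond to the two-layer (2-hop) GCN used for window embedding later in the paper.</p>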
        <sec id="sec-2-1-2">
          <title>3.1.4. Sequence embedding through LSTM</title>
          <p>In Eqs. 1-4, L is the number of GCN layers, x_v ∈ R^τ is the attribute vector of node v, h_v^[l] is the representation of node v in layer l, W^[l] and B^[l] are the weight matrix and bias vector of layer l, σ is the activation function, N(v) is the set of nodes sharing an edge with node v, z_v is the final representation of node v after L iterations of neighborhood aggregation, and z_G is the graph representation found by averaging the node representations.</p>
          <p>[Figure 2: Components of MVTS representation learning. For showing the functional network construction process, the parameter set {P1, P2, ..., PN} of the MVTS instance has been shown as {A, B, C, D, E, F}. The MVTS instance is divided into three windows (η = 3), each of τ length; each sub-MVTS yields an edge-weighted functional network with a node attribute matrix, embedded by a GCN, and a local sequence embedded by LSTMs; the concatenated window representations zw&lt;1&gt;, zw&lt;2&gt;, zw&lt;3&gt; are fed to LSTMf, followed by linear and softmax layers.]</p>
          <p>Long short-term memory (LSTM) networks [24] are frequently used for sequence representation learning, which facilitates various tasks such as sequence classification, sequence-to-sequence translation, and so on. We use LSTM networks for learning low-dimensional representations of MVTS instances. The MVTS (and sub-MVTS) instances are sequences of N-dimensional timestamp vectors. The timestamp vector x&lt;t&gt; ∈ R^N represents the magnetic field state of the active region (N parameter values) in timestamp t. The cell state and hidden state of timestamp t are updated by the following LSTM equations [24]:</p>
          <p>c̃&lt;t&gt; = tanh(W_c [h&lt;t−1&gt;, x&lt;t&gt;] + b_c) (5)</p>
          <p>Γ_u = σ(W_u [h&lt;t−1&gt;, x&lt;t&gt;] + b_u) (6)</p>
          <p>Γ_f = σ(W_f [h&lt;t−1&gt;, x&lt;t&gt;] + b_f) (7)</p>
          <p>Γ_o = σ(W_o [h&lt;t−1&gt;, x&lt;t&gt;] + b_o) (8)</p>
          <p>c&lt;t&gt; = Γ_u ⊙ c̃&lt;t&gt; + Γ_f ⊙ c&lt;t−1&gt; (9)</p>
          <p>h&lt;t&gt; = Γ_o ⊙ tanh(c&lt;t&gt;) (10)</p>
          <p>We denote the number of dimensions of the cell state representation c&lt;t&gt; and the hidden state representation h&lt;t&gt; of the LSTM cell as d. The concatenation of the hidden state of the previous timestamp and the input of the current timestamp is [h&lt;t−1&gt;, x&lt;t&gt;] ∈ R^(N+d). The candidate cell state representation is c̃&lt;t&gt; ∈ R^d. The weight matrices are W_c, W_u, W_f, W_o ∈ R^(d×(N+d)), and the bias terms are b_c, b_u, b_f, b_o ∈ R^d.</p>
        </sec>
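        <p>The LSTM recurrence of Eqs. 5-10 can be sketched directly in NumPy (an illustrative sketch with toy dimensions and random weights, not the trained model; the paper uses d = 128):</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    """One cell update following Eqs. 5-10."""
    concat = np.concatenate([h_prev, x_t])          # [h<t-1>, x<t>] in R^(d+N)
    c_tilde = np.tanh(Wc @ concat + bc)             # Eq. 5: candidate cell state
    gu, gf, go = (sigmoid(W @ concat + b)           # Eqs. 6-8: gate activations
                  for W, b in ((Wu, bu), (Wf, bf), (Wo, bo)))
    c_t = gu * c_tilde + gf * c_prev                # Eq. 9
    h_t = go * np.tanh(c_t)                         # Eq. 10
    return h_t, c_t

rng = np.random.default_rng(1)
N, d, T = 25, 8, 60                                 # toy d; the paper uses d = 128
params = [rng.standard_normal((d, d + N)) * 0.1 for _ in range(4)]
biases = [np.zeros(d) for _ in range(4)]
h, c = np.zeros(d), np.zeros(d)                     # random/zero initialization
for t in range(T):                                  # sequentially feed timestamp vectors
    h, c = lstm_step(h, c, rng.standard_normal(N), *params, *biases)
print(h.shape)                                      # final representation h<T>, shape (8,)
```

        <p>The last hidden state h after the loop plays the role of the sequence representation h&lt;T&gt; described in the next section.</p>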
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Data Preprocessing</title>
        <p>We denote the inputs to the LSTM cells as [x&lt;1&gt;, x&lt;2&gt;, x&lt;3&gt;, ..., x&lt;T&gt;], the cell state representations as [c&lt;0&gt;, c&lt;1&gt;, c&lt;2&gt;, ..., c&lt;T−1&gt;], and the hidden state representations as [h&lt;0&gt;, h&lt;1&gt;, h&lt;2&gt;, ..., h&lt;T&gt;], where T is the last timestamp of the sequence. The subscripts u, f, and o represent the update gate, forget gate, and output gate respectively, ⊙ refers to elementwise multiplication, and σ represents the sigmoid activation. After randomly initializing c&lt;0&gt; and h&lt;0&gt;, we update the cell state and hidden state of timestamp t by the LSTM equations (Eq. 5-10) [24]. Finally, we consider h&lt;T&gt; as the final representation of the input MVTS.</p>
        <p>3.2.1. Node-level normalization. Since the magnetic field parameter values are recorded in different scales, we perform z-score normalization. Suppose that E MVTS instances, each with N parameters and T time points, are represented by a third-order tensor 𝒯 ∈ R^(E×N×T), whose three modes represent events, parameters/nodes, and timestamps. For the better performance of the GCN-based graph embedding, we perform node-level z-normalization as a preprocessing step, in the following three steps. 1. We perform mode-2 matricization, i.e., reshaping the tensor so that the mode-2 (parameter/node) fibers become the columns of the matrix. The matrix is denoted by T_(2) ∈ R^(ET×N), and its columns are denoted by c_1, c_2, ..., c_N. 2. For each column c_j, we perform z-normalization as follows: x_i(c_j) ← (x_i(c_j) − μ(c_j)) / σ(c_j), where x_i(c_j) is the i-th value of the column c_j, 1 ≤ i ≤ ET, μ(c_j) is the mean of the column c_j, and σ(c_j) is the standard deviation of the column c_j. 3. We reshape the matrix T_(2) ∈ R^(ET×N) back to the third-order tensor 𝒯 ∈ R^(E×N×T).</p>
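        <p>The three matricization steps above can be sketched as follows (a NumPy sketch with toy tensor sizes; the paper's tensor is E × 25 × 60):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
E, N, T = 10, 5, 12                       # toy sizes
tensor = rng.standard_normal((E, N, T)) * 3.0 + 7.0   # parameters on different scales

# 1. Mode-2 matricization: node fibers become columns -> (E*T, N) matrix.
mat = tensor.transpose(0, 2, 1).reshape(E * T, N)

# 2. Z-normalize each column (i.e., each parameter/node).
mat = (mat - mat.mean(axis=0)) / mat.std(axis=0)

# 3. Reshape back to the (E, N, T) third-order tensor.
normed = mat.reshape(E, T, N).transpose(0, 2, 1)

print(normed.shape)  # (10, 5, 12); each parameter now has zero mean and unit variance
```

        <p>After this step, every node's pooled values across all events and timestamps have zero mean and unit standard deviation.</p>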
        <p>3.2.2. Functional network construction. We calculate the Pearson correlation matrix C ∈ R^(N×N) for the sub-MVTS M_w ∈ R^(N×τ). In the correlation matrix, c_ij represents the Pearson correlation coefficient (in the range [-1, 1]) between the τ-length time series s_i and s_j. The symmetric matrix C can be considered as an adjacency matrix of a graph of N nodes. We apply a sparsity threshold of 0 so that only edges with positive weight (node pairs with positive correlation) are considered for functional network construction. We denote the sparse correlation matrix as the adjacency matrix A ∈ R^(N×N). Although the functional network defined over a sub-MVTS encodes inter-variable interactions within a small temporal window, the adjacency matrix alone is not enough for the completeness of the data, since negative correlation coefficients are discarded. To avoid this data loss, in addition to the adjacency matrix (graph structure), we extract the node attribute matrix X = M_w. In X ∈ R^(N×τ), each row represents the node attributes in the form of a τ-length time series (normalized in the previous step).</p>
        <p>3.3. MVTS representation learning. In Fig. 2, we show the components of MVTS representation learning. Firstly, the window embedding learns the local spatiotemporal changes of the sub-MVTS instances through the models denoted as GCN and LSTM_s, and finally, the whole MVTS embedding learns the global temporal changes of the local (window) representations through the model denoted as LSTM_f.</p>
        <p>3.3.1. Window embedding. Our model learns the representation of the window M_w (sub-MVTS) of the MVTS instance M(e) through GCN-based node-attributed functional network embedding and LSTM-based local sequence modeling.</p>
        <p>• GCN-based functional network embedding: We input the node-attributed functional network G = (V, E, W, X) to a two-layer GCN. The initial node attributes are set as h_v^[0] = x_v (Eq. 1). In the first layer, each node is embedded into a d′-dimensional space through 1-hop neighborhood aggregation, and after the second layer, each node is embedded into a d-dimensional space through 2-hop neighborhood aggregation (Eq. 2, 3). Finally, the whole graph representation z_G ∈ R^d is computed through mean pooling (Eq. 4).</p>
        <p>• LSTM-based sub-MVTS embedding: The sub-MVTS M_w = [x&lt;1&gt;, ..., x&lt;τ&gt;], where x&lt;t&gt; ∈ R^N, is sequentially input to LSTM_s (Eq. 5-10), and we extract the last hidden representation z_s = h&lt;τ&gt;, where z_s ∈ R^d.</p>
        <p>For the window embedding, we concatenate z_G ∈ R^d and z_s ∈ R^d. Therefore, the window representation is z_w ∈ R^(2d).</p>
        <p>3.3.2. Whole MVTS embedding. After each of the η windows is represented as a 2d-dimensional vector, we feed the sequential data [z_w&lt;1&gt;, ..., z_w&lt;η&gt;] into LSTM_f for global temporal change modeling. Note that LSTM_s and LSTM_f have different learnable parameter sets (e.g., W_c, b_c, etc.), although in this work the number of dimensions (d) of the cell state and hidden state is kept the same. We extract the final hidden state representation z_f = h&lt;η&gt;, where z_f ∈ R^d. We input z_f into a linear (fully connected) layer, whose parameters are W_y ∈ R^(K×d) and b_y ∈ R^K, where K is the number of classes. After this layer, we have a K-dimensional representation of the whole MVTS instance of event e:</p>
        <p>y(e) = W_y z_f + b_y (11)</p>
        <p>ŷ_i(e) = exp(y_i(e)) / Σ_{k=1}^{K} exp(y_k(e)) (12)</p>
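        <p>The full forward pass described above (functional network construction, window embedding, whole-MVTS embedding, and the softmax head of Eqs. 11-12) composes as in the following sketch. The GCN and LSTM components are replaced by simple stand-in maps; only the control flow and the correlation-thresholded adjacency follow the text, and all dimension values are toy assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, eta, tau, d, K = 25, 60, 4, 15, 8, 4   # toy d; the paper uses d' = 64, d = 128

def gcn_embed(A_w, X_w):
    """Stand-in for the two-layer GCN of Eqs. 1-4 (mean over nodes, truncated to d)."""
    return (A_w @ X_w).mean(axis=0)[:d]

def seq_embed(X):
    """Stand-in for LSTM_s / LSTM_f of Eqs. 5-10 (mean over time, truncated to d)."""
    return X.mean(axis=1)[:d]

M = rng.standard_normal((N, T))              # one (normalized) MVTS instance
Z_w = np.zeros((eta, 2 * d))                 # window matrix
for w in range(eta):
    X_w = M[:, w * tau:(w + 1) * tau]        # sub-MVTS of window w
    C = np.corrcoef(X_w)                     # Pearson correlation (Sec. 3.2.2)
    A_w = np.where(C > 0, C, 0.0)            # sparsity threshold of 0
    Z_w[w] = np.concatenate([gcn_embed(A_w, X_w), seq_embed(X_w)])

z_f = seq_embed(Z_w.T)                       # global representation over windows
W_y, b_y = rng.standard_normal((K, d)), np.zeros(K)
y = W_y @ z_f + b_y                          # Eq. 11: linear layer
y_hat = np.exp(y) / np.exp(y).sum()          # Eq. 12: softmax probabilities
print(y_hat.shape)                           # (4,), sums to 1
```

        <p>Replacing the stand-in embedders with the GCN and LSTM sketches given earlier recovers the structure of the actual model.</p>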
        <p>Finally, we input y(e) ∈ R^K into a softmax layer, whose number of units is equal to the number of classes. The softmax layer gives us the normalized class probabilities, and we finally get ŷ(e) ∈ R^K (Eq. 12).</p>
        <p>The predicted labels of the training MVTS instances are matched against the true labels, and the Adam optimizer [25] updates the weight and bias parameter values of GCN, LSTM_s, LSTM_f, and the fully connected layer through the backpropagation algorithm. Algorithm 1 shows the training procedure of the proposed GCN-LSTM-based MVTS representation learning.</p>
        <p>Algorithm 1: Training of GCN-LSTM-based MVTS representation learning.
Input: Training set D consisting of functional network adjacency matrices A ∈ R^(E×η×N×N) and node attribute matrices X ∈ R^(E×η×N×τ), one-hot training labels Y ∈ R^(E×K), number of epochs n_epoch, learning rate α, and weight decay factor of the Adam optimizer λ.
Output: Learned parameters of GCN, LSTM_s, and LSTM_f.
1: Randomly initialize the parameter set Θ, which contains the GCN, LSTM_s, and LSTM_f parameters
2: for number of training epochs n_epoch do
3: for MVTS instance e = 1, 2, ..., E do
4: Window matrix Z_w = [0]_(η×2d)
5: for window w = 1, 2, ..., η do
6: A_w ← A[e, w, :, :]
7: X_w ← X[e, w, :, :]
8: z_G ← GCN(A_w, X_w) //Eq. 1-4 (L = 2)
9: z_s ← LSTM_s(X_w) //Eq. 5-10
10: Z_w[w, :] ← (z_G, z_s)
11: end for
12: z_f ← LSTM_f(Z_w) //Eq. 5-10
13: y(e) ← Linear(z_f) //Eq. 11
14: ŷ(e) ← Softmax(y(e)) //Eq. 12
15: //negative log likelihood loss calculation
16: ℒ ← NLLLoss(ŷ(e), Y(e))
17: Update Θ by minimizing ℒ with Adam(α, λ)
18: end for
19: end for
20: return Θ</p>
        <p>We used PyTorch 1.10.0 with CUDA 11.1 for implementing our GCN-LSTM-based MVTS classifier. The source code of our model and the experimental dataset are available at our GitHub repository.1</p>
        <p>As the benchmark dataset of our experiments, we used the solar flare prediction dataset published by Angryk et al. [3]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (for the full list of parameters, see [16]). The time series are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The MVTS instances are labeled according to the flaring event that occurred after 12 hours. Therefore, the dataset has the number of observation points T = 60 and the number of dimensions in the timestamp vectors N = 25, while the prediction window is Δ = 12 hours. Our experimental dataset consists of 1,540 MVTS instances evenly distributed across four classes (X, M, BC, and Q), where BC represents events from both B and C classes (less intense flares). We split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for the test).</p>
        <p>4.2. Baseline methods. We evaluated our GCN-LSTM-based MVTS classification model against six baselines.</p>
        <p>• Flattened vector method (FLT): This is a naive method, where each 60 × 25 MVTS instance is flattened into a 1,500-dimensional vector.</p>
        <p>• Vector of last timestamp (LTV): This method was introduced by Bobra et al. [16], where vector magnetogram data (the feature space of all magnetic field parameters) were used for classification. Since the last timestamp of the MVTS is temporally nearest to the flaring event, we sampled the vector of the last timestamp (25-dimensional) to train the classifier.</p>
        <p>• Time series summarization-based MVTS representation (TS-SUM): This method, proposed by Hamdi et al. [4], summarizes each individual time series of length T by eight statistical features: the mean, standard deviation, skewness, and kurtosis of the original time series and of its first-order derivative. As a result, we get an 8 × 25-dimensional vector space, which is used for training the downstream classifier.</p>
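        <p>The TS-SUM summarization described above can be sketched as follows (an illustrative NumPy sketch; the exact feature ordering and kurtosis convention of [4] are assumptions here):</p>

```python
import numpy as np

def summarize(ts):
    """Eight statistical features: mean, std, skewness, kurtosis of the
    series and of its first-order derivative (excess kurtosis assumed)."""
    def four_stats(x):
        mu, sd = x.mean(), x.std()
        z = (x - mu) / sd
        return [mu, sd, (z ** 3).mean(), (z ** 4).mean() - 3.0]
    return np.array(four_stats(ts) + four_stats(np.diff(ts)))

rng = np.random.default_rng(5)
M = rng.standard_normal((25, 60))          # one MVTS instance (N = 25, T = 60)
features = np.concatenate([summarize(row) for row in M])
print(features.shape)                      # (200,): 8 features x 25 parameters
```

        <p>The resulting 200-dimensional vectors are what the downstream logistic regression classifier is trained on in this baseline.</p>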
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>In this section, we demonstrate our experimental findings. We compared the performance of our model with six other MVTS-based flare prediction baselines on a benchmark dataset.</p>
      <p>• Long short-term memory (LSTM): This LSTM-based approach was proposed by Muzaheed et al. [5]. Each MVTS instance was considered as a T-length sequence of timestamp vectors x&lt;t&gt; ∈ R^N. After sequentially feeding the LSTM model with each timestamp vector, the last hidden representation was considered as the MVTS representation. Following the same experimental setting, we use 128 as the number of both cell state and hidden state dimensions, 500 training epochs, and a learning rate of 0.01 in stochastic gradient descent.</p>
      <p>In the experiments of the proposed GCN-LSTM model, we have the following hyperparameters: number of windows η: 4, window length τ: 15, number of hidden dimensions d′ in the first GCN layer: 64, number of node embedding dimensions d in the second GCN layer: 4, number of dimensions d in the cell state and hidden state representations of both LSTM_s and LSTM_f: 128, number of training epochs: 100, Adam learning rate α: 10^−4, and weight decay (regularization factor) λ: 10^−3.</p>
      <p>gradient descent as 0.01.
• Recurrent Neural Network (RNN): As the fifth 4.3. Multiclass classification performance
baseline, we replace LSTM cells of the model of
[5] with standard RNN cells. Similar to the ex- In Table 1, we show the classification performances of
perimental setting of [5], we use the number of our GCN-LSTM-based MVTS classifier along with that of
RNN hidden dimensions as 128, the number of the baseline methods. For a comprehensive classification
training epochs as 1,000, and the learning rate in report, we show accuracy along with precision, recall,
stochastic gradient descent as 0.01. and F1 of each class. We performed five experiments
• Random Convolutional Kernel Transform with diferent train/test sets sampled by stratified
hold(ROCKET): We use ROCKET [26] as the sixth out (two-thirds for training and one-third for the test) and
baseline for MVTS-based solar event classifica- reported the mean and standard deviation of the
experition. ROCKET was shown as the best performing ments. From the results, it is visible that the
GCN-LSTMalgorithm in the MVTS classification benchmark- based MVTS classification model outperforms all other
ing study by Ruiz et al [27], which included 26 baselines in all the performance measures. In overall
MVTS datasets of the UEA archive [28]. ROCKET evaluation, ROCKET achieves second-bast performance,
uses a large number of random convolution ker- while the LSTM model becomes third. GCN-LSTM model
nels in conjunction with a linear classifier (ridge achieves around 20% more accuracy in comparison with
regression or logistic regression), where each ker- the LSTM model, which proves the importance of
learnnel is applied to each univariate time series in- ing MVTS representations in both spatial and temporal
stance. Similar to the experimental setting of domains rather than learning only from the temporal
[27], we used the number of kernels in ROCKET domain. Among shallow ML models, TS-SUM performs
as 10,000. better than FLT and LTV models. In general, the high
performances of TS-SUM, RNN, LSTM, ROCKET, and
GCN-LSTM prove the importance of time series
representations of solar events.</p>
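The random-kernel transform behind the ROCKET baseline can be sketched in a simplified form. This is a minimal illustration, not the paper's setup: full ROCKET [26] also draws random dilations and paddings per kernel, and `rocket_features` and the toy data here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)

def rocket_features(X, n_kernels=50):
    """Map each univariate series (rows of X) to 2 features per random
    kernel: the max and the proportion of positive values (PPV) of the
    convolution output. Dilation/padding from full ROCKET are omitted."""
    n, length = X.shape
    feats = np.zeros((n, 2 * n_kernels))
    for k in range(n_kernels):
        w = rng.normal(size=rng.choice([7, 9, 11]))
        w -= w.mean()                       # zero-mean random kernel
        b = rng.uniform(-1.0, 1.0)          # random bias
        for i in range(n):
            c = np.convolve(X[i], w, mode="valid") + b
            feats[i, 2 * k] = c.max()
            feats[i, 2 * k + 1] = (c > 0).mean()
    return feats

# toy data: class-1 series carry a bump that random kernels can detect
X = rng.normal(size=(60, 50))
y = np.repeat([0, 1], 30)
X[y == 1, 20:30] += 3.0

feats = rocket_features(X)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(feats, y)
```

As in [26], the transform itself is fixed; only the linear (ridge) classifier on top of the random features is trained.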
      <p>The first three baselines are embedding-followed-by-classification methods. After performing the embedding of MVTS instances using those methods, we use a logistic regression classifier with L2 regularization. In all the experiments, we split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for testing).</p>
      <p>Figure 3: (a) Multiclass classification accuracy with increasing training data; (b) F1 (X class) with increasing training data. Models: FLT, LTV, TS-SUM, RNN, LSTM, ROCKET, GCN-LSTM.</p>
      <p>4.4. Classification varying train set size</p>
      <p>To verify the adaptability of our model to bigger training datasets, we experimented by varying the training set size. We varied the training set size from 10% to 90% of the dataset size, while testing the models on the rest of the instances (Fig. 3). We performed stratified train/test sampling with a given training set size, and evaluated the classification performance of the classifiers five times with five distinct samples of training and test sets. In Fig. 3a and 3b, we plotted the mean accuracy values and mean F1 (X class) values found over all runs of different train/test samples with different training data sizes. GCN-LSTM consistently outperforms the other baselines at all training set sizes. ROCKET is the second-best performing classifier in this experiment, and especially in the F1 measure ROCKET exhibits robustness similar to GCN-LSTM. With only 10% training data, GCN-LSTM achieved 70% classification accuracy, while the third-best performing LSTM model reached that level of performance only with 90% training data. Although all models gain accuracy with a gradual increase of the training set size, we observe more consistently increasing patterns in the deep learning and kernel-based methods, e.g., GCN-LSTM, ROCKET, LSTM, and RNN. This suggests that with sufficiently large datasets, deep learning models can outperform the traditional classifiers and embedding methods by a larger margin. The time series summarization-based method TS-SUM shows promising performance throughout the experiments, but the generalization capability of this model can be limited on a more complex dataset due to its less flexible learning methodology based on hand-engineered features. Compared to the deep learning-based and time series-based methods, the LTV and FLT models perform poorly, which shows the importance of time series in avoiding underfitting.</p>
      <p>4.5. Binary classification performance</p>
      <p>In addition to classifying the solar active regions into different flare classes, a major use case in data-driven flare prediction is binary classification, i.e., distinguishing major flaring events from minor flaring and flare-quiet events. In this experiment, we considered X- and M-class MVTS instances as flaring events, while we considered all other instances (Q and BC) as non-flaring events. In Fig. 4, we show the mean binary classification performance of all models over five different train/test samples in terms of accuracy, precision, recall, and F1 of the flaring and non-flaring classes. It is clearly visible that the GCN-LSTM model outperforms all other baselines. We report the performance of the two best-performing models in numbers along with their bars. In all performance metrics, GCN-LSTM achieves on average ∼8% better performance than the second-best performing ROCKET algorithm. In general, we observe similar performance of the models as in multiclass classification. Although one deep learning model, i.e., the RNN-based model, performed worse than the TS-SUM method, the RNN-based model is an end-to-end classification model, which might outperform TS-SUM with more training data, a more complex model, and more efficient hyperparameter tuning.</p>
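The binary relabeling (X and M as flaring; BC and Q as non-flaring) amounts to a simple mapping applied before scoring. The labels and predictions below are hypothetical toy values, not results from the paper:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# hypothetical multiclass labels and predictions
y_true = np.array(["X", "M", "BC", "Q", "X", "Q", "M", "BC"])
y_pred = np.array(["X", "M", "BC", "M", "X", "Q", "M", "BC"])

def to_binary(y):
    """Map X/M to the flaring class (1), BC/Q to non-flaring (0)."""
    return np.isin(y, ["X", "M"]).astype(int)

yt, yp = to_binary(y_true), to_binary(y_pred)
acc = accuracy_score(yt, yp)
# per-class precision/recall/F1, flaring class first
prec, rec, f1, _ = precision_recall_fscore_support(
    yt, yp, labels=[1, 0], zero_division=0)
```

Accuracy, precision, recall, and F1 of both the flaring and non-flaring classes can then be averaged over the five train/test samples as in Fig. 4.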
      <p>Figure 5: t-SNE projection of the learned MVTS representations (classes X, M, BC, and Q) in the t-SNE-reduced 2D space.</p>
      <p>sequence embedding. In contrast to other MVTS classification models applied for flare prediction, our model utilizes both spatial and temporal features of the MVTS instances, and does not depend on predefined statistical features. Our experiments on a real-life solar flare prediction dataset demonstrate the superior performance of our model in multiclass and binary MVTS classification.</p>
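The spatial step of the model, a graph convolution over each functional network snapshot, follows the propagation rule of Kipf and Welling [23]. A minimal NumPy sketch with a toy adjacency matrix and dimensions (not the paper's configuration):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W),
    i.e., the renormalized propagation rule of Kipf & Welling."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],                   # toy functional network:
              [1, 0, 0, 1],                   # 4 variables (nodes)
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))                   # 3 input features per node
W = rng.normal(size=(3, 5))                   # learnable weight matrix
Z = gcn_layer(A, H, W)                        # (4, 5) node embeddings
```

In the full model, such node embeddings computed per time step would then feed the LSTM that captures the local and global temporal patterns.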
      <p>
        In the future, we look forward to designing more efficient models using techniques such as (1) learning attention coefficients in the spatial and temporal feature spaces, (2) customizing transformer models for MVTS representations, and (3) analyzing the effects of univariate sequence embedding on MVTS representation learning. We will also apply our models to other MVTS-based solar event datasets (e.g., solar energetic particles) [30], and to MVTS datasets generated from other sources such as functional MRI (fMRI)-based time series of brain regions [31].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4.6. Embedding performance</title>
      <p>
        Visualization of high-dimensional data in 2D/3D space is a well-known method of demonstrating the effectiveness of learned representations. To investigate the quality of the learned MVTS representations, we provide a visualization of the t-SNE [29]-transformed MVTS representations extracted from the final layer of the GCN-LSTM model. Similar to section 4.3, the stratified holdout strategy is used to train the model, and all instances are projected to the t-SNE-reduced 2D space (Fig. 5). The 2D projection exhibits discernible clustering of the MVTS instances. Some meaningful insights are observed from the t-SNE scatter plot, such as: (1) patterns of the four classes are easily recognizable, (2) flare-quiet events (Q) and minor flaring events (B and C) are comparatively similar, (3) X- and M-class flares exhibit significant dissimilarity from the other classes, (4) some flare-quiet events are similar to the minor flaring events, (5) a few minor flares show characteristics similar to M-class flares, and (6) the characteristics of the X-class flares are exclusive, and instances of other classes do not show similarity to X-class instances.
      </p>
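The t-SNE projection used here can be reproduced with scikit-learn. The embeddings below are random stand-ins for the final-layer GCN-LSTM representations; only the shapes and the projection call are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# hypothetical 16-d embeddings for four classes, 25 instances each
emb = np.vstack([rng.normal(loc=c, size=(25, 16)) for c in range(4)])

xy = TSNE(n_components=2, perplexity=30,
          init="pca", random_state=0).fit_transform(emb)
print(xy.shape)  # each MVTS instance becomes a 2-D point for plotting
```

Coloring the resulting points by class label produces a scatter plot of the kind shown in Fig. 5.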
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we presented an end-to-end deep learning-based flare prediction model for multivariate time series (MVTS)-represented datasets that leverages inter-variable relationships through graph convolutional network-based functional network embedding, and local and global temporal change modeling through LSTM-based
</p>
      <p>6. Acknowledgments</p>
      <p>This project has been supported in part by funding from the CISE and GEO directorates under NSF awards #2153379 and #2204363.</p>
      <p>[6] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: KDD '20: The 26th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 753–763.</p>
      <p>[7] P. S. McIntosh, The classification of sunspot groups, Solar Physics 125 (1990) 251–267.</p>
      <p>[8] J. P. Mason, J. Hoeksema, Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms, The Astrophysical Journal 723 (2010) 634.</p>
      <p>[9] Y. Cui, R. Li, L. Zhang, Y. He, H. Wang, Correlation between solar flare productivity and photospheric magnetic field properties, Solar Physics 237 (2006) 45–59.</p>
      <p>[10] J. Jing, H. Song, V. Abramenko, C. Tan, H. Wang, The statistical relationship between the photospheric magnetic parameters and the flare productivity of active regions, The Astrophysical Journal 644 (2006) 1273.</p>
      <p>[11] K. Leka, G. Barnes, Photospheric magnetic field properties of flaring versus flare-quiet active regions. II. Discriminant analysis, The Astrophysical Journal 595 (2003) 1296.</p>
      <p>[12] H. Song, C. Tan, J. Jing, H. Wang, V. Yurchyshyn, V. Abramenko, Statistical assessment of photospheric magnetic features in imminent solar flare predictions, Solar Physics 254 (2009) 101–125.</p>
      <p>[13] D. Yu, X. Huang, H. Wang, Y. Cui, Short-term solar flare prediction using a sequential supervised learning method, Solar Physics 255 (2009) 91–105.</p>
      <p>[14] O. W. Ahmed, R. Qahwaji, T. Colak, P. A. Higgins, P. T. Gallagher, D. S. Bloomfield, Solar flare prediction using advanced feature extraction, machine learning, and feature selection, Solar Physics (2013) 1–19.</p>
      <p>[15] A. Al-Ghraibah, L. Boucheron, R. McAteer, An automated classification approach to ranking photospheric proxies of magnetic energy build-up, Astronomy &amp; Astrophysics 579 (2015) A64.</p>
      <p>[16] M. G. Bobra, S. Couvidat, Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm, The Astrophysical Journal 798 (2015) 135.</p>
      <p>[17] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, S. Watari, M. Ishii, Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms, Astrophysical Journal 835 (2017) 156.</p>
      <p>[18] Y. Zheng, X. Li, X. Wang, Solar flare prediction with the hybrid deep convolutional neural network, The Astrophysical Journal 885 (2019) 73.</p>
      <p>[19] X. Li, Y. Zheng, X. Wang, L. Wang, Predicting solar flares using a novel deep convolutional neural network, The Astrophysical Journal 891 (2020) 10.</p>
      <p>[20] E. Park, Y.-J. Moon, S. Shin, K. Yi, D. Lim, H. Lee, G. Shin, Application of the deep convolutional neural network to the forecast of solar flare occurrence using full-disk solar magnetograms, The Astrophysical Journal 869 (2018) 91.</p>
      <p>[21] N. Nishizuka, Y. Kubo, K. Sugiura, M. Den, M. Ishii, Operational solar flare prediction model using deep flare net, Earth, Planets and Space 73 (2021) 1–12.</p>
      <p>[22] R. Ma, S. F. Boubrahimi, S. M. Hamdi, R. A. Angryk, Solar flare prediction using multivariate time series decision trees, in: 2017 IEEE Intl. Conf. on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2569–2578.</p>
      <p>[23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th Intl. Conf. on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017.</p>
      <p>[24] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780.</p>
      <p>[25] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: 3rd Intl. Conf. on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conf. Track Proc., 2015.</p>
      <p>[26] A. Dempster, F. Petitjean, G. I. Webb, ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels, Data Min. Knowl. Discov. 34 (2020) 1454–1495.</p>
      <p>[27] A. P. Ruiz, M. Flynn, J. Large, M. Middlehurst, A. Bagnall, The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Mining and Knowledge Discovery 35 (2021) 401–449.</p>
      <p>[28] A. J. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, E. J. Keogh, The UEA multivariate time series classification archive, 2018, CoRR abs/1811.00075 (2018). URL: http://arxiv.org/abs/1811.00075. arXiv:1811.00075.</p>
      <p>[29] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008).</p>
      <p>[30] S. F. Boubrahimi, S. M. Hamdi, R. Ma, R. A. Angryk, On the mining of the minimal set of time series data shapelets, in: IEEE Intl. Conf. on Big Data, Big Data 2020, Atlanta, GA, USA, December 10-13, 2020, IEEE, 2020, pp. 493–502.</p>
      <p>[31] S. M. Hamdi, B. Aydin, S. F. Boubrahimi, R. A. Angryk, L. C. Krishnamurthy, R. D. Morris, Biomarker detection from fMRI-based complete functional connectivity networks, in: IEEE Intl. Conf. on Artificial Intelligence and Knowledge Engineering, AIKE 2018, Laguna Hills, CA, USA, September 26-28, 2018, IEEE, 2018, pp. 17–24.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Eastwood</surname>
          </string-name>
          , E. Biffis,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hapgood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>McKinnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gibbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burnett</surname>
          </string-name>
          ,
          <article-title>The economic impact of space weather: Where do we stand?</article-title>
          ,
          <source>Risk Analysis</source>
          <volume>37</volume>
          (
          <year>2017</year>
          )
          <fpage>206</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>National Science and Technology Council</collab>
          ,
          <article-title>National space weather action plan</article-title>
          , https://obamawhitehouse.archives. gov/sites/default/files/microsites/ostp/final_ nationalspaceweatheractionplan_20151028.pdf ,
          <year>2015</year>
          . [Accessed: 10-Feb-2022].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aydin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmadzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          , et al.,
          <article-title>Multivariate time series dataset for space weather data analytics</article-title>
          ,
          <source>Scientific data 7</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <article-title>A time series classification-based approach for solar flare prediction</article-title>
          ,
          <source>in: 2017 IEEE Intl. Conf. on Big Data (Big Data)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>2543</fpage>
          -
          <lpage>2551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A. M.</given-names>
            <surname>Muzaheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <article-title>Sequence model-based end-to-end solar flare classification from multivariate time series data</article-title>
          ,
          <source>in: 20th IEEE Intl. Conf. on Machine Learning and Applications, ICMLA</source>
          <year>2021</year>
          , Pasadena, CA, USA, December 13-16, 2021
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>