<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multivariate Time Series-based Solar Flare Prediction by Functional Network Embedding and Sequence Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shah Muhammad Hamdi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abu Fuad Ahmad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soukaina Filali Boubrahimi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>New Mexico State University</institution>
          ,
          <addr-line>Las Cruces, NM, 88003</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Utah State University</institution>
          ,
          <addr-line>Logan, UT, 84322</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Major flaring events on the Sun can have hazardous impacts on both space-based and ground-based infrastructure. An effective approach to predicting that a solar active region (AR) is likely to flare after a period of time is to leverage multivariate time series (MVTS) of the AR magnetic field parameters. Existing MVTS-based flare prediction models are based either on training traditional classifiers with preset statistical features of the univariate time series instances, or on training deep sequence models based on the Recurrent Neural Network (RNN) or the Long Short-Term Memory (LSTM) network. While the former approach is limited by hand-engineered features, the latter uses only the temporal dimension of the MVTS instances. The variables of an MVTS depend not only on their historical values but also on the other variables. In this work, we use the dynamic functional network representation of the MVTS instances to leverage higher-order relationships of the variables through Graph Convolution Network (GCN) embedding. In addition to finding spatial (inter-variable) patterns through functional network embedding, our model uses local and global temporal patterns through LSTM networks. Our experiments on a real-life solar flare dataset exhibit better prediction performance than other baseline methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Solar flare prediction</kwd>
        <kwd>Multivariate time series</kwd>
        <kwd>GCN</kwd>
        <kwd>LSTM</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Solar flares are characterized by sudden bursts of magnetic flux in the solar corona and heliosphere. Extreme Ultra-Violet (EUV), X-ray, and gamma-ray emissions caused by major flaring events can have disastrous effects on our technology-dependent society. The risks to life and infrastructure in both space and on the ground include radiation exposure-based health risks for astronauts, disruption of GPS and radio communication, and damage to electronic devices. The economic damage of such extreme solar events can rise up to trillions of dollars [1]. In 2015, the White House released the National Space Weather Strategy and Space Weather Action Plan [2] as a roadmap for research aimed at predicting and mitigating the effects of solar eruptive activities.</p>
      <p>In recent years, multiple research efforts of the heliophysics community aim to predict solar flares from the current and historic magnetic field states of solar active regions. Due to the absence of a direct theoretical relationship between magnetic field influx and flare occurrence in active regions (AR), solar physics researchers rely on data science-based approaches for predicting solar flares. The data is collected by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory. Near-continuous-time images captured by the instruments of HMI contain spatiotemporal magnetic field data of the active regions. The prediction of solar flares, which will identify active regions that will potentially flare after a period of time, requires time series modeling of the magnetic field data. For that, spatiotemporal magnetic field data of active regions are mapped into multiple MVTS instances [3]. The variables of the MVTS instances represent solar magnetic field parameters (e.g., flux, current, helicity, Lorentz force). The time series corresponding to the magnetic field parameters are extracted based on two time windows: the observation window (the time window of data collection) and the prediction window (the time window after the data collection and before the flare occurrence). Each MVTS instance is labeled as one of six classes - Q, A, B, C, M, and X - where Q represents flare-quiet active regions, and the other labels represent flaring events of increasing intensity. Among these classes, X- and M-class flares are considered the most intense flaring events.</p>
      <p>In comparison to the earlier single timestamp-based magnetic field vector classification models, recent MVTS-based models are more effective for predicting flaring activities [3]. MVTS classification models targeting flare prediction are divided into two categories: (1) statistical feature-based methods [4], and (2) end-to-end deep learning-based methods [5]. The models of the first category work in two steps. Firstly, low-dimensional representations of the MVTS instances are calculated from concatenation/aggregation of summarization statistics (e.g., mean, standard deviation, skewness, kurtosis) of the univariate time series components. Secondly, traditional classifiers (e.g., kNN, SVM) are trained with the labeled MVTS representations. This two-step process relies heavily on hand-engineered statistical features and the choice of downstream classifiers, which eventually complicates the application of these models to datasets with varying properties. In the second category, RNN/LSTM-based deep sequence models are trained by sequentially feeding vectors representing magnetic field parameters into sequence model cells, and optimizing the cell weights through gradient descent-based backpropagation. While the deep learning models ensure end-to-end learning bypassing the dependency on hand-engineered features, they utilize only the time dimension of the MVTS instances, and this limited usage of the underlying patterns results in poor classification performance.</p>
      <p>In this work, we propose a deep learning-based MVTS classification approach for solar flare prediction leveraging the fact that MVTS data is rich not only in the temporal dimension, but also in the spatial dimension, which encodes inter-variable relationships [6]. For learning higher-order relationships of the MVTS variables, we use functional networks, where nodes represent variables, and edges represent positive correlation of the time series of the corresponding variables. The MVTS instance is divided into equal-length temporal windows, and an edge-weighted functional network is constructed for each window. We train a Graph Convolution Network (GCN) to learn the representation of each functional network. In addition, we use two LSTM networks for learning representations of the temporal dimension within and between the windows. Our model significantly outperforms existing MVTS-based flare prediction models on a dataset containing MVTS instances of solar events of different flare classes.</p>
      <p>The contributions made by this paper are listed below. 1. Leveraging higher-order inter-variable relationships of the MVTS instances by GCN-based dynamic functional network embedding. 2. Utilizing local and global patterns of the temporal dimension of the MVTS instances through LSTM-based within-window and between-window sequence learning. 3. Experimentally demonstrating the better performance of our model in comparison with the state-of-the-art baselines on a benchmark solar flare prediction dataset.</p>
      <p>AMLTS'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA. * Corresponding author (S. M. Hamdi). fuad@nmsu.edu (A. F. Ahmad); soukaina.boubrahimi@usu.edu (S. F. Boubrahimi). ORCID: 0000-0002-9303-7835 (S. M. Hamdi); 0000-0001-5693-6383 (S. F. Boubrahimi). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>While the current approaches of flare prediction are mostly based on data science, the earliest flare prediction system was an expert system named THEO that required human inputs [7]. The Space Environment Center (SEC) of the National Oceanic and Atmospheric Administration (NOAA) adopted the system THEO in 1987. To distinguish flare classes, THEO was provided input data of sunspots and magnetic field properties.</p>
      <p>Due to the abundance of magnetic field data collected by NASA's recent missions, research efforts in flare prediction over the last two decades are based on data science rather than on purely theoretical modeling. Data science-based approaches stemmed from both linear and nonlinear statistics. Based on the type of dataset used, these approaches are subdivided into two classes: line-of-sight magnetogram-based models and vector magnetogram-based models. Solar active regions are represented by the parameters of either photospheric magnetic field data that contain only the line-of-sight component of the magnetic field or the full-disk photospheric vector magnetic field. Following NASA's launch of SDO in 2010, the HMI instrument has been mapping the full-disk vector magnetic field every 12 minutes [8]. Most of the recent models use the near-continuous stream of vector magnetogram data found from SDO, while the earlier models (dated before 2010) mostly used line-of-sight magnetic data.</p>
      <p>The objective of the linear statistical models was to find the active region magnetic field features that are highly correlated with flare occurrences. Cui et al. [9] and Jing et al. [10] used line-of-sight magnetogram data to find correlation-based statistical relationships between magnetic field parameters and flare occurrences. Even before the launch of SDO, Leka and Barnes [11] collected and curated vector magnetogram data from Mees Solar Observatory on the summit of Mount Haleakala, and used linear discriminant analysis (LDA) for classifying flaring events.</p>
      <p>Nonlinear statistical models are mostly machine learning classifiers based on tree induction, kernel methods, neural networks, and so on. On the line-of-sight magnetogram-based active region datasets, Song et al. [12] used logistic regression, Yu et al. [13] used the C4.5 decision tree, Ahmed et al. [14] used the fully connected neural network, and Al-Ghraibah et al. [15] used the relevance vector machine as classification models. Bobra et al. [16] used the Support Vector Machine (SVM) on SDO-based vector magnetogram data for classifying flaring and non-flaring active regions. Nishizuka et al. [17] used both line-of-sight and vector magnetograms and compared the performance of three classifiers - kNN, SVM, and Extremely Randomized Tree (ERT). Other examples of solar flare prediction on non-sequential data include various applications of convolutional neural networks (ConvNet) on SDO AIA/HMI images [18, 19, 20, 21].</p>
      <sec id="sec-2-1">
        <title>3. MVTS representation learning by functional network and sequence embedding</title>
        <p>[Figure 1: MVTS-based data model of a solar event. The parameter time series P1, P2, ..., PN of a Multivariate Time Series (MVTS) instance are recorded over the observation window (T); after the prediction window (Δ), the instance is labeled by the flare occurrence along time (t).]</p>
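        <p>As a concrete illustration of the windowing in this data model, the following sketch (an illustrative example, not the authors' code) splits one MVTS instance into equal-length windows; the sizes N = 25, T = 60, and the 4 windows of length 15 are taken from the dataset and hyperparameter descriptions later in the paper:</p>

```python
import numpy as np

# Hypothetical MVTS instance: N = 25 magnetic field parameters
# observed over T = 60 timestamps (12 h at 12-minute cadence).
N, T = 25, 60
M = np.random.randn(N, T)

# Split into eta equal-length windows of length tau, so that T = eta * tau.
eta, tau = 4, 15
sub_mvts = [M[:, w * tau:(w + 1) * tau] for w in range(eta)]

print(len(sub_mvts), sub_mvts[0].shape)  # 4 windows, each (25, 15)
```

        <p>Each of these sub-MVTS blocks later yields one functional network and one local sequence embedding.</p>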
        <sec id="sec-2-1-1">
          <title>3.1. Notations and Preliminaries</title>
          <p>Angryk et al. [3] introduced temporal window-based flare prediction, which extends the earlier single timestamp-based models. The authors published an MVTS-based active region dataset, where each MVTS instance records magnetic field data for a preset observation time and uniform sampling rate, and is labeled by the flare classes that occurred after a given prediction time. Among the MVTS classification approaches, Hamdi et al. [4] used statistical summarization of the component univariate time series for training a kNN classifier, Ma et al. [22] applied MVTS decision trees that approached the problem using clustering as a preprocessing step, and Muzaheed et al. [5] used LSTM-based deep sequence modeling for end-to-end flare classification that automated the feature learning process, avoiding hand-engineered statistical features.</p>
          <p>Unlike previous models based on traditional ML and deep sequence learning, in this work, we present a model that leverages temporal as well as spatial relationships of the MVTS instances. Our model learns MVTS representations in an end-to-end fashion, and utilizes higher-order inter-variable relationships along with local and global temporal changes.</p>
          <p>3.1.1. MVTS and Sub-MVTS. Each solar active region resulting in different flare classes (or staying as a flare-quiet region) after a given prediction window represents a solar event. The solar event e is represented by an MVTS instance M(e), and is associated with a class label y(e). The class label y(e) represents the flare-quiet state or flare classes of different intensities. The active region state of a particular timestamp is found from the NOAA records of flaring events. The MVTS instance M(e) ∈ R^(N×T) is a collection of univariate time series of N magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter for an observation period T. We denote the vector of the t-th timestamp as x&lt;t&gt; ∈ R^N, and the time series represented by the j-th parameter as s_j ∈ R^T. After the observation period T and prediction period Δ, the event is labeled by the active region state (flare quiet or different flare classes). Fig. 1 shows the MVTS-based data model of a solar event. Each MVTS instance is divided into η equal-length windows such that T = ητ, where τ denotes the window length. The sub-MVTS is denoted by M_w ∈ R^(N×τ), and is a subsequence of M(e).</p>
          <p>3.1.2. Node-attributed functional network. A functional network is an undirected and edge-weighted graph, defined as G = (V, E, W, X), where the set of nodes V = {v_1, ..., v_N} denotes the magnetic field parameters, W : E → R is a function mapping edges to their weights, and the node attribute matrix X ∈ R^(N×τ) contains the time series of each node in the sub-MVTS, i.e., X = M_w. The functional network is defined on the sub-MVTS, and the weight w_ij of edge e_ij (between the node pair v_i and v_j) represents the statistical similarity of the τ-length time series of v_i and v_j. Each functional network derived from an MVTS dataset has the same node set V.</p>
          <p>3.1.3. Graph Convolution. For learning the representations of node-attributed functional networks, we use the Graph Convolution Network (GCN). GCN is a widely used graph neural network [23] that learns node representations from a graph through layer-wise neighborhood aggregation. Graph convolution of layer l aggregates the representations of l-hop neighbors. GCN updates the representation of node v in a graph G = (V, E, W, X) by the following equations:</p>
          <p>h_v^[0] = x_v (1)</p>
          <p>h_v^[l+1] = σ( W^[l] Σ_{u ∈ N(v)} h_u^[l] / |N(v)| + B^[l] h_v^[l] ), ∀l ∈ {0, 1, ..., L−1} (2)</p>
          <p>z_v = h_v^[L] (3)</p>
          <p>z_G = (1 / |V|) Σ_{v ∈ V} z_v (4)</p>
        </sec>
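        <p>The neighborhood aggregation of Eqs. 1-4 can be sketched in NumPy as follows (an illustrative sketch, not the authors' PyTorch implementation; the ReLU activation, the toy binary graph, and the layer sizes are assumptions made for the example):</p>

```python
import numpy as np

def gcn_layer(H, A, W, B):
    """One graph convolution (Eq. 2): h_v <- sigma(W * mean of neighbor
    representations + B * h_v). H: (n_nodes, d_in), A: (n_nodes, n_nodes)."""
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)  # |N(v)|, guard isolated nodes
    neigh = (A @ H) / deg                              # sum_u h_u / |N(v)|
    return np.maximum(neigh @ W.T + H @ B.T, 0)        # ReLU assumed as sigma

rng = np.random.default_rng(0)
n, tau, d1, d2 = 6, 15, 8, 4                # toy sizes; the paper uses d' = 64, d = 4
A = (rng.random((n, n)) > 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T              # symmetric toy graph, no self-loops
X = rng.standard_normal((n, tau))           # node attributes = sub-MVTS rows (Eq. 1)

H1 = gcn_layer(X, A, rng.standard_normal((d1, tau)), rng.standard_normal((d1, tau)))
H2 = gcn_layer(H1, A, rng.standard_normal((d2, d1)), rng.standard_normal((d2, d1)))
z_G = H2.mean(axis=0)                       # Eq. 4: mean pooling over nodes
print(z_G.shape)                            # (4,)
```

        <p>Two applications of gcn_layer correspond to the two-layer (2-hop) GCN used for window embedding later in the paper.</p>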
        <sec id="sec-2-1-2">
          <title>3.1.4. Sequence embedding through LSTM</title>
          <p>In Eqs. 1-4, L is the number of GCN layers, x_v ∈ R^τ is the attribute vector of node v, h_v^[l] is the representation of node v in layer l, W^[l] and B^[l] are the weight matrix and bias vector of layer l, σ is the activation function, N(v) is the set of nodes sharing an edge with node v, z_v is the final representation of node v after L iterations of neighborhood aggregation, and z_G is the graph representation found by averaging the node representations.</p>
          <p>[Figure 2: Components of MVTS representation learning. For showing the functional network construction process, the parameter set {P1, P2, ..., PN} of the MVTS instance has been shown as {A, B, C, D, E, F}. The MVTS instance is divided into three windows (η = 3), each of τ length; each sub-MVTS yields an edge-weighted functional network with a node attribute matrix, embedded by a GCN, and a local sequence embedded by LSTMs; the concatenated window representations zw&lt;1&gt;, zw&lt;2&gt;, zw&lt;3&gt; are fed to LSTMf, followed by linear and softmax layers.]</p>
          <p>Long short-term memory (LSTM) networks [24] are frequently used for sequence representation learning, which facilitates various tasks such as sequence classification, sequence-to-sequence translation, and so on. We use LSTM networks for learning low-dimensional representations of MVTS instances. The MVTS (and sub-MVTS) instances are sequences of N-dimensional timestamp vectors. The timestamp vector x&lt;t&gt; ∈ R^N represents the magnetic field state of the active region (N parameter values) in timestamp t. The cell state and hidden state of timestamp t are updated by the following LSTM equations [24]:</p>
          <p>c̃&lt;t&gt; = tanh(W_c [h&lt;t−1&gt;, x&lt;t&gt;] + b_c) (5)</p>
          <p>Γ_u = σ(W_u [h&lt;t−1&gt;, x&lt;t&gt;] + b_u) (6)</p>
          <p>Γ_f = σ(W_f [h&lt;t−1&gt;, x&lt;t&gt;] + b_f) (7)</p>
          <p>Γ_o = σ(W_o [h&lt;t−1&gt;, x&lt;t&gt;] + b_o) (8)</p>
          <p>c&lt;t&gt; = Γ_u ⊙ c̃&lt;t&gt; + Γ_f ⊙ c&lt;t−1&gt; (9)</p>
          <p>h&lt;t&gt; = Γ_o ⊙ tanh(c&lt;t&gt;) (10)</p>
          <p>We denote the number of dimensions of the cell state representation c&lt;t&gt; and the hidden state representation h&lt;t&gt; of the LSTM cell as d. The concatenation of the hidden state of the previous timestamp and the input of the current timestamp is [h&lt;t−1&gt;, x&lt;t&gt;] ∈ R^(N+d). The candidate cell state representation is c̃&lt;t&gt; ∈ R^d. The weight matrices are W_c, W_u, W_f, W_o ∈ R^(d×(N+d)), and the bias terms are b_c, b_u, b_f, b_o ∈ R^d.</p>
        </sec>
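        <p>The LSTM recurrence of Eqs. 5-10 can be sketched directly in NumPy (an illustrative sketch with toy dimensions and random weights, not the trained model; the paper uses d = 128):</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x_t, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    """One cell update following Eqs. 5-10."""
    concat = np.concatenate([h_prev, x_t])          # [h<t-1>, x<t>] in R^(d+N)
    c_tilde = np.tanh(Wc @ concat + bc)             # Eq. 5: candidate cell state
    gu, gf, go = (sigmoid(W @ concat + b)           # Eqs. 6-8: gate activations
                  for W, b in ((Wu, bu), (Wf, bf), (Wo, bo)))
    c_t = gu * c_tilde + gf * c_prev                # Eq. 9
    h_t = go * np.tanh(c_t)                         # Eq. 10
    return h_t, c_t

rng = np.random.default_rng(1)
N, d, T = 25, 8, 60                                 # toy d; the paper uses d = 128
params = [rng.standard_normal((d, d + N)) * 0.1 for _ in range(4)]
biases = [np.zeros(d) for _ in range(4)]
h, c = np.zeros(d), np.zeros(d)                     # random/zero initialization
for t in range(T):                                  # sequentially feed timestamp vectors
    h, c = lstm_step(h, c, rng.standard_normal(N), *params, *biases)
print(h.shape)                                      # final representation h<T>, shape (8,)
```

        <p>The last hidden state h after the loop plays the role of the sequence representation h&lt;T&gt; described in the next section.</p>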
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Data Preprocessing</title>
        <p>We denote the inputs to the LSTM cells as [x&lt;1&gt;, x&lt;2&gt;, x&lt;3&gt;, ..., x&lt;T&gt;], the cell state representations as [c&lt;0&gt;, c&lt;1&gt;, c&lt;2&gt;, ..., c&lt;T−1&gt;], and the hidden state representations as [h&lt;0&gt;, h&lt;1&gt;, h&lt;2&gt;, ..., h&lt;T&gt;], where T is the last timestamp of the sequence. The subscripts u, f, and o represent the update gate, forget gate, and output gate respectively, ⊙ refers to elementwise multiplication, and σ represents the sigmoid activation. After randomly initializing c&lt;0&gt; and h&lt;0&gt;, we update the cell state and hidden state of timestamp t by the LSTM equations (Eq. 5-10) [24]. Finally, we consider h&lt;T&gt; as the final representation of the input MVTS.</p>
        <p>3.2.1. Node-level normalization. Since the magnetic field parameter values are recorded in different scales, we perform z-score normalization. Suppose that E MVTS instances, each with N parameters and T time points, are represented by a third-order tensor 𝒯 ∈ R^(E×N×T), whose three modes represent events, parameters/nodes, and timestamps. For the better performance of the GCN-based graph embedding, we perform node-level z-normalization as a preprocessing step, in the following three steps. 1. We perform mode-2 matricization, i.e., reshaping the tensor so that the mode-2 (parameter/node) fibers become the columns of the matrix. The matrix is denoted by T_(2) ∈ R^(ET×N), and its columns are denoted by c_1, c_2, ..., c_N. 2. For each column c_j, we perform z-normalization as follows: x_i(c_j) ← (x_i(c_j) − μ(c_j)) / σ(c_j), where x_i(c_j) is the i-th value of the column c_j, 1 ≤ i ≤ ET, μ(c_j) is the mean of the column c_j, and σ(c_j) is the standard deviation of the column c_j. 3. We reshape the matrix T_(2) ∈ R^(ET×N) back to the third-order tensor 𝒯 ∈ R^(E×N×T).</p>
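        <p>The three matricization steps above can be sketched as follows (a NumPy sketch with toy tensor sizes; the paper's tensor is E × 25 × 60):</p>

```python
import numpy as np

rng = np.random.default_rng(2)
E, N, T = 10, 5, 12                       # toy sizes
tensor = rng.standard_normal((E, N, T)) * 3.0 + 7.0   # parameters on different scales

# 1. Mode-2 matricization: node fibers become columns -> (E*T, N) matrix.
mat = tensor.transpose(0, 2, 1).reshape(E * T, N)

# 2. Z-normalize each column (i.e., each parameter/node).
mat = (mat - mat.mean(axis=0)) / mat.std(axis=0)

# 3. Reshape back to the (E, N, T) third-order tensor.
normed = mat.reshape(E, T, N).transpose(0, 2, 1)

print(normed.shape)  # (10, 5, 12); each parameter now has zero mean and unit variance
```

        <p>After this step, every node's pooled values across all events and timestamps have zero mean and unit standard deviation.</p>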
        <p>3.2.2. Functional network construction. We calculate the Pearson correlation matrix C ∈ R^(N×N) for the sub-MVTS M_w ∈ R^(N×τ). In the correlation matrix, c_ij represents the Pearson correlation coefficient (in the range [-1, 1]) between the τ-length time series s_i and s_j. The symmetric matrix C can be considered as an adjacency matrix of a graph of N nodes. We apply a sparsity threshold of 0 so that only edges with positive weight (node pairs with positive correlation) are considered for functional network construction. We denote the sparse correlation matrix as the adjacency matrix A ∈ R^(N×N). Although the functional network defined over a sub-MVTS encodes inter-variable interactions within a small temporal window, the adjacency matrix alone is not enough for the completeness of the data, since negative correlation coefficients are discarded. To avoid this data loss, in addition to the adjacency matrix (graph structure), we extract the node attribute matrix X = M_w. In X ∈ R^(N×τ), each row represents the node attributes in the form of a τ-length time series (normalized in the previous step).</p>
        <p>3.3. MVTS representation learning. In Fig. 2, we show the components of MVTS representation learning. Firstly, the window embedding learns the local spatiotemporal changes of the sub-MVTS instances through the models denoted as GCN and LSTM_s, and finally, the whole MVTS embedding learns the global temporal changes of the local (window) representations through the model denoted as LSTM_f.</p>
        <p>3.3.1. Window embedding. Our model learns the representation of the window M_w (sub-MVTS) of the MVTS instance M(e) through GCN-based node-attributed functional network embedding and LSTM-based local sequence modeling.</p>
        <p>• GCN-based functional network embedding: We input the node-attributed functional network G = (V, E, W, X) to a two-layer GCN. The initial node attributes are set as h_v^[0] = x_v (Eq. 1). In the first layer, each node is embedded into a d′-dimensional space through 1-hop neighborhood aggregation, and after the second layer, each node is embedded into a d-dimensional space through 2-hop neighborhood aggregation (Eq. 2, 3). Finally, the whole graph representation z_G ∈ R^d is computed through mean pooling (Eq. 4).</p>
        <p>• LSTM-based sub-MVTS embedding: The sub-MVTS M_w = [x&lt;1&gt;, ..., x&lt;τ&gt;], where x&lt;t&gt; ∈ R^N, is sequentially input to LSTM_s (Eq. 5-10), and we extract the last hidden representation z_s = h&lt;τ&gt;, where z_s ∈ R^d.</p>
        <p>For the window embedding, we concatenate z_G ∈ R^d and z_s ∈ R^d. Therefore, the window representation is z_w ∈ R^(2d).</p>
        <p>3.3.2. Whole MVTS embedding. After each of the η windows is represented as a 2d-dimensional vector, we feed the sequential data [z_w&lt;1&gt;, ..., z_w&lt;η&gt;] into LSTM_f for global temporal change modeling. Note that LSTM_s and LSTM_f have different learnable parameter sets (e.g., W_c, b_c, etc.), although in this work the number of dimensions (d) of the cell state and hidden state is kept the same. We extract the final hidden state representation z_f = h&lt;η&gt;, where z_f ∈ R^d. We input z_f into a linear (fully connected) layer, whose parameters are W_y ∈ R^(K×d) and b_y ∈ R^K, where K is the number of classes. After this layer, we have a K-dimensional representation of the whole MVTS instance of event e:</p>
        <p>y(e) = W_y z_f + b_y (11)</p>
        <p>ŷ_i(e) = exp(y_i(e)) / Σ_{k=1}^{K} exp(y_k(e)) (12)</p>
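        <p>The full forward pass described above (functional network construction, window embedding, whole-MVTS embedding, and the softmax head of Eqs. 11-12) composes as in the following sketch. The GCN and LSTM components are replaced by simple stand-in maps; only the control flow and the correlation-thresholded adjacency follow the text, and all dimension values are toy assumptions:</p>

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, eta, tau, d, K = 25, 60, 4, 15, 8, 4   # toy d; the paper uses d' = 64, d = 128

def gcn_embed(A_w, X_w):
    """Stand-in for the two-layer GCN of Eqs. 1-4 (mean over nodes, truncated to d)."""
    return (A_w @ X_w).mean(axis=0)[:d]

def seq_embed(X):
    """Stand-in for LSTM_s / LSTM_f of Eqs. 5-10 (mean over time, truncated to d)."""
    return X.mean(axis=1)[:d]

M = rng.standard_normal((N, T))              # one (normalized) MVTS instance
Z_w = np.zeros((eta, 2 * d))                 # window matrix
for w in range(eta):
    X_w = M[:, w * tau:(w + 1) * tau]        # sub-MVTS of window w
    C = np.corrcoef(X_w)                     # Pearson correlation (Sec. 3.2.2)
    A_w = np.where(C > 0, C, 0.0)            # sparsity threshold of 0
    Z_w[w] = np.concatenate([gcn_embed(A_w, X_w), seq_embed(X_w)])

z_f = seq_embed(Z_w.T)                       # global representation over windows
W_y, b_y = rng.standard_normal((K, d)), np.zeros(K)
y = W_y @ z_f + b_y                          # Eq. 11: linear layer
y_hat = np.exp(y) / np.exp(y).sum()          # Eq. 12: softmax probabilities
print(y_hat.shape)                           # (4,), sums to 1
```

        <p>Replacing the stand-in embedders with the GCN and LSTM sketches given earlier recovers the structure of the actual model.</p>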
        <p>Finally, we input y(e) ∈ R^K into a softmax layer, whose number of units is equal to the number of classes. The softmax layer gives us the normalized class probabilities, and we finally get ŷ(e) ∈ R^K (Eq. 12).</p>
        <p>The predicted labels of the training MVTS instances are matched against the true labels, and the Adam optimizer [25] updates the weight and bias parameter values of GCN, LSTM_s, LSTM_f, and the fully connected layer through the backpropagation algorithm. Algorithm 1 shows the training procedure of the proposed GCN-LSTM-based MVTS representation learning.</p>
        <p>Algorithm 1: Training of GCN-LSTM-based MVTS representation learning.
Input: Training set D consisting of functional network adjacency matrices A ∈ R^(E×η×N×N) and node attribute matrices X ∈ R^(E×η×N×τ), one-hot training labels Y ∈ R^(E×K), number of epochs n_epoch, learning rate α, and weight decay factor of the Adam optimizer λ.
Output: Learned parameters of GCN, LSTM_s, and LSTM_f.
1: Randomly initialize the parameter set Θ, which contains the GCN, LSTM_s, and LSTM_f parameters
2: for number of training epochs n_epoch do
3: for MVTS instance e = 1, 2, ..., E do
4: Window matrix Z_w = [0]_(η×2d)
5: for window w = 1, 2, ..., η do
6: A_w ← A[e, w, :, :]
7: X_w ← X[e, w, :, :]
8: z_G ← GCN(A_w, X_w) //Eq. 1-4 (L = 2)
9: z_s ← LSTM_s(X_w) //Eq. 5-10
10: Z_w[w, :] ← (z_G, z_s)
11: end for
12: z_f ← LSTM_f(Z_w) //Eq. 5-10
13: y(e) ← Linear(z_f) //Eq. 11
14: ŷ(e) ← Softmax(y(e)) //Eq. 12
15: //negative log likelihood loss calculation
16: ℒ ← NLLLoss(ŷ(e), Y(e))
17: Update Θ by minimizing ℒ with Adam(α, λ)
18: end for
19: end for
20: return Θ</p>
        <p>We used PyTorch 1.10.0 with CUDA 11.1 for implementing our GCN-LSTM-based MVTS classifier. The source code of our model and the experimental dataset are available at our GitHub repository.1</p>
        <p>As the benchmark dataset of our experiments, we used the solar flare prediction dataset published by Angryk et al. [3]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (for the full list of parameters, see [16]). The time series are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The MVTS instances are labeled according to the flaring event that occurred after 12 hours. Therefore, the dataset has the number of observation points T = 60 and the number of dimensions in the timestamp vectors N = 25, while the prediction window is Δ = 12 hours. Our experimental dataset consists of 1,540 MVTS instances evenly distributed across four classes (X, M, BC, and Q), where BC represents events from both B and C classes (less intense flares). We split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for the test).</p>
        <p>4.2. Baseline methods. We evaluated our GCN-LSTM-based MVTS classification model against six baselines.</p>
        <p>• Flattened vector method (FLT): This is a naive method, where each 60 × 25 MVTS instance is flattened into a 1,500-dimensional vector.</p>
        <p>• Vector of last timestamp (LTV): This method was introduced by Bobra et al. [16], where vector magnetogram data (the feature space of all magnetic field parameters) were used for classification. Since the last timestamp of the MVTS is temporally nearest to the flaring event, we sampled the vector of the last timestamp (25-dimensional) to train the classifier.</p>
        <p>• Time series summarization-based MVTS representation (TS-SUM): This method, proposed by Hamdi et al. [4], summarizes each individual time series of length T by eight statistical features: the mean, standard deviation, skewness, and kurtosis of the original time series and of its first-order derivative. As a result, we get an 8 × 25-dimensional vector space, which is used for training the downstream classifier.</p>
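        <p>The TS-SUM summarization described above can be sketched as follows (an illustrative NumPy sketch; the exact feature ordering and kurtosis convention of [4] are assumptions here):</p>

```python
import numpy as np

def summarize(ts):
    """Eight statistical features: mean, std, skewness, kurtosis of the
    series and of its first-order derivative (excess kurtosis assumed)."""
    def four_stats(x):
        mu, sd = x.mean(), x.std()
        z = (x - mu) / sd
        return [mu, sd, (z ** 3).mean(), (z ** 4).mean() - 3.0]
    return np.array(four_stats(ts) + four_stats(np.diff(ts)))

rng = np.random.default_rng(5)
M = rng.standard_normal((25, 60))          # one MVTS instance (N = 25, T = 60)
features = np.concatenate([summarize(row) for row in M])
print(features.shape)                      # (200,): 8 features x 25 parameters
```

        <p>The resulting 200-dimensional vectors are what the downstream logistic regression classifier is trained on in this baseline.</p>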
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>In this section, we demonstrate our experimental findings. We compared the performance of our model with six other MVTS-based flare prediction baselines on a benchmark dataset.</p>
      <p>• Long short-term memory (LSTM): This LSTM-based approach was proposed by Muzaheed et al. [5]. Each MVTS instance was considered as a T-length sequence of timestamp vectors x&lt;t&gt; ∈ R^N. After sequentially feeding the LSTM model with each timestamp vector, the last hidden representation was considered as the MVTS representation. Following the same experimental setting, we use 128 as the number of both cell state and hidden state dimensions, 500 training epochs, and a learning rate of 0.01 in stochastic gradient descent.</p>
      <p>In the experiments of the proposed GCN-LSTM model, we have the following hyperparameters: number of windows η: 4, window length τ: 15, number of hidden dimensions d′ in the first GCN layer: 64, number of node embedding dimensions d in the second GCN layer: 4, number of dimensions d in the cell state and hidden state representations of both LSTM_s and LSTM_f: 128, number of training epochs: 100, Adam learning rate α: 10^−4, and weight decay (regularization factor) λ: 10^−3.</p>
      <p>gradient descent as 0.01.
• Recurrent Neural Network (RNN): As the fifth 4.3. Multiclass classification performance
baseline, we replace LSTM cells of the model of
[5] with standard RNN cells. Similar to the ex- In Table 1, we show the classification performances of
perimental setting of [5], we use the number of our GCN-LSTM-based MVTS classifier along with that of
RNN hidden dimensions as 128, the number of the baseline methods. For a comprehensive classification
training epochs as 1,000, and the learning rate in report, we show accuracy along with precision, recall,
stochastic gradient descent as 0.01. and F1 of each class. We performed five experiments
• Random Convolutional Kernel Transform with diferent train/test sets sampled by stratified
hold(ROCKET): We use ROCKET [26] as the sixth out (two-thirds for training and one-third for the test) and
baseline for MVTS-based solar event classifica- reported the mean and standard deviation of the
experition. ROCKET was shown as the best performing ments. From the results, it is visible that the
GCN-LSTMalgorithm in the MVTS classification benchmark- based MVTS classification model outperforms all other
ing study by Ruiz et al [27], which included 26 baselines in all the performance measures. In overall
MVTS datasets of the UEA archive [28]. ROCKET evaluation, ROCKET achieves second-bast performance,
uses a large number of random convolution ker- while the LSTM model becomes third. GCN-LSTM model
nels in conjunction with a linear classifier (ridge achieves around 20% more accuracy in comparison with
regression or logistic regression), where each ker- the LSTM model, which proves the importance of
learnnel is applied to each univariate time series in- ing MVTS representations in both spatial and temporal
stance. Similar to the experimental setting of domains rather than learning only from the temporal
[27], we used the number of kernels in ROCKET domain. Among shallow ML models, TS-SUM performs
as 10,000. better than FLT and LTV models. In general, the high
performances of TS-SUM, RNN, LSTM, ROCKET, and
GCN-LSTM prove the importance of time series
representations of solar events.</p>
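The random-kernel transform behind the ROCKET baseline can be sketched in a simplified form. This is a minimal illustration, not the paper's setup: full ROCKET [26] also draws random dilations and paddings per kernel, and `rocket_features` and the toy data here are hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)

def rocket_features(X, n_kernels=50):
    """Map each univariate series (rows of X) to 2 features per random
    kernel: the max and the proportion of positive values (PPV) of the
    convolution output. Dilation/padding from full ROCKET are omitted."""
    n, length = X.shape
    feats = np.zeros((n, 2 * n_kernels))
    for k in range(n_kernels):
        w = rng.normal(size=rng.choice([7, 9, 11]))
        w -= w.mean()                       # zero-mean random kernel
        b = rng.uniform(-1.0, 1.0)          # random bias
        for i in range(n):
            c = np.convolve(X[i], w, mode="valid") + b
            feats[i, 2 * k] = c.max()
            feats[i, 2 * k + 1] = (c > 0).mean()
    return feats

# toy data: class-1 series carry a bump that random kernels can detect
X = rng.normal(size=(60, 50))
y = np.repeat([0, 1], 30)
X[y == 1, 20:30] += 3.0

feats = rocket_features(X)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(feats, y)
```

As in [26], the transform itself is fixed; only the linear (ridge) classifier on top of the random features is trained.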
      <p>The first three baselines are embedding-followed-by-classification methods. After performing the embedding of MVTS instances using those methods, we use a logistic regression classifier with L2 regularization. In all the experiments, we split the dataset into train and test sets using the stratified holdout method (two-thirds for training and one-third for testing).</p>
      <p>Figure 3: (a) Multiclass classification accuracy with increasing training data; (b) F1 (X class) with increasing training data. Models: FLT, LTV, TS-SUM, RNN, LSTM, ROCKET, GCN-LSTM.</p>
      <p>4.4. Classification varying train set size</p>
      <p>To verify the adaptability of our model to bigger training datasets, we experimented by varying the training set size. We varied the training set size from 10% to 90% of the dataset size, while testing the models on the rest of the instances (Fig. 3). We performed stratified train/test sampling with a given training set size, and evaluated the classification performance of the classifiers five times with five distinct samples of training and test sets. In Fig. 3a and 3b, we plotted the mean accuracy values and mean F1 (X class) values found over all runs of different train/test samples with different training data sizes. GCN-LSTM consistently outperforms the other baselines at all training set sizes. ROCKET is the second-best performing classifier in this experiment, and especially in the F1 measure ROCKET exhibits robustness similar to GCN-LSTM. With only 10% training data, GCN-LSTM achieved 70% classification accuracy, while the third-best performing LSTM model reached that level of performance only with 90% training data. Although all models gain accuracy with a gradual increase of the training set size, we observe more consistently increasing patterns in the deep learning and kernel-based methods, e.g., GCN-LSTM, ROCKET, LSTM, and RNN. This suggests that with sufficiently large datasets, deep learning models can outperform the traditional classifiers and embedding methods by a larger margin. The time series summarization-based method TS-SUM shows promising performance throughout the experiments, but the generalization capability of this model can be limited on a more complex dataset due to its less flexible learning methodology based on hand-engineered features. Compared to the deep learning-based and time series-based methods, the LTV and FLT models perform poorly, which shows the importance of time series in avoiding underfitting.</p>
      <p>4.5. Binary classification performance</p>
      <p>In addition to classifying the solar active regions into different flare classes, a major use case in data-driven flare prediction is binary classification, i.e., distinguishing major flaring events from minor flaring and flare-quiet events. In this experiment, we considered X- and M-class MVTS instances as flaring events, while we considered all other instances (Q and BC) as non-flaring events. In Fig. 4, we show the mean binary classification performance of all models over five different train/test samples in terms of accuracy, precision, recall, and F1 of the flaring and non-flaring classes. It is clearly visible that the GCN-LSTM model outperforms all other baselines. We report the performance of the two best-performing models in numbers along with their bars. In all performance metrics, GCN-LSTM achieves on average ∼8% better performance than the second-best performing ROCKET algorithm. In general, we observe similar performance of the models as in multiclass classification. Although one deep learning model, i.e., the RNN-based model, performed worse than the TS-SUM method, the RNN-based model is an end-to-end classification model, which might outperform TS-SUM with more training data, a more complex model, and more efficient hyperparameter tuning.</p>
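The binary relabeling (X and M as flaring; BC and Q as non-flaring) amounts to a simple mapping applied before scoring. The labels and predictions below are hypothetical toy values, not results from the paper:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# hypothetical multiclass labels and predictions
y_true = np.array(["X", "M", "BC", "Q", "X", "Q", "M", "BC"])
y_pred = np.array(["X", "M", "BC", "M", "X", "Q", "M", "BC"])

def to_binary(y):
    """Map X/M to the flaring class (1), BC/Q to non-flaring (0)."""
    return np.isin(y, ["X", "M"]).astype(int)

yt, yp = to_binary(y_true), to_binary(y_pred)
acc = accuracy_score(yt, yp)
# per-class precision/recall/F1, flaring class first
prec, rec, f1, _ = precision_recall_fscore_support(
    yt, yp, labels=[1, 0], zero_division=0)
```

Accuracy, precision, recall, and F1 of both the flaring and non-flaring classes can then be averaged over the five train/test samples as in Fig. 4.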
      <p>Figure 5: t-SNE projection of the learned MVTS representations (classes X, M, BC, and Q) in the t-SNE-reduced 2D space.</p>
      <p>sequence embedding. In contrast to other MVTS classification models applied for flare prediction, our model utilizes both spatial and temporal features of the MVTS instances, and does not depend on predefined statistical features. Our experiments on a real-life solar flare prediction dataset demonstrate the superior performance of our model in multiclass and binary MVTS classification.</p>
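The spatial step of the model, a graph convolution over each functional network snapshot, follows the propagation rule of Kipf and Welling [23]. A minimal NumPy sketch with a toy adjacency matrix and dimensions (not the paper's configuration):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W),
    i.e., the renormalized propagation rule of Kipf & Welling."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)    # ReLU activation

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1, 0],                   # toy functional network:
              [1, 0, 0, 1],                   # 4 variables (nodes)
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))                   # 3 input features per node
W = rng.normal(size=(3, 5))                   # learnable weight matrix
Z = gcn_layer(A, H, W)                        # (4, 5) node embeddings
```

In the full model, such node embeddings computed per time step would then feed the LSTM that captures the local and global temporal patterns.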
      <p>
        In the future, we look forward to designing more efficient models using techniques such as (1) learning attention coefficients in the spatial and temporal feature spaces, (2) customizing transformer models for MVTS representations, and (3) analyzing the effects of univariate sequence embedding on MVTS representation learning. We will also apply our models to other MVTS-based solar event datasets (e.g., solar energetic particles) [30], and to MVTS datasets generated from other sources such as functional MRI (fMRI)-based time series of brain regions [31].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4.6. Embedding performance</title>
      <p>
        Visualization of high-dimensional data in 2D/3D space is a well-known method of demonstrating the effectiveness of learned representations. To investigate the quality of the learned MVTS representations, we provide a visualization of the t-SNE [29]-transformed MVTS representations extracted from the final layer of the GCN-LSTM model. Similar to section 4.3, the stratified holdout strategy is used to train the model, and all instances are projected to the t-SNE-reduced 2D space (Fig. 5). The 2D projection exhibits discernible clustering of the MVTS instances. Some meaningful insights are observed from the t-SNE scatter plot, such as: (1) patterns of the four classes are easily recognizable, (2) flare-quiet events (Q) and minor flaring events (B and C) are comparatively similar, (3) X- and M-class flares exhibit significant dissimilarity from the other classes, (4) some flare-quiet events are similar to the minor flaring events, (5) a few minor flares show characteristics similar to M-class flares, and (6) the characteristics of the X-class flares are exclusive, and instances of other classes do not show similarity to X-class instances.
      </p>
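The t-SNE projection used here can be reproduced with scikit-learn. The embeddings below are random stand-ins for the final-layer GCN-LSTM representations; only the shapes and the projection call are illustrative:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# hypothetical 16-d embeddings for four classes, 25 instances each
emb = np.vstack([rng.normal(loc=c, size=(25, 16)) for c in range(4)])

xy = TSNE(n_components=2, perplexity=30,
          init="pca", random_state=0).fit_transform(emb)
print(xy.shape)  # each MVTS instance becomes a 2-D point for plotting
```

Coloring the resulting points by class label produces a scatter plot of the kind shown in Fig. 5.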
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we presented an end-to-end deep learning-based flare prediction model for multivariate time series (MVTS)-represented datasets that leverages inter-variable relationships through graph convolutional network-based functional network embedding, and local and global temporal change modeling through LSTM-based
</p>
      <p>6. Acknowledgments</p>
      <p>This project has been supported in part by funding from the CISE and GEO directorates under NSF awards #2153379 and #2204363.</p>
      <p>[6] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, C. Zhang, Connecting the dots: Multivariate time series forecasting with graph neural networks, in: KDD '20: The 26th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, ACM, 2020, pp. 753–763.</p>
      <p>[7] P. S. McIntosh, The classification of sunspot groups, Solar Physics 125 (1990) 251–267.</p>
      <p>[8] J. P. Mason, J. Hoeksema, Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms, The Astrophysical Journal 723 (2010) 634.</p>
      <p>[9] Y. Cui, R. Li, L. Zhang, Y. He, H. Wang, Correlation between solar flare productivity and photospheric magnetic field properties, Solar Physics 237 (2006) 45–59.</p>
      <p>[10] J. Jing, H. Song, V. Abramenko, C. Tan, H. Wang, The statistical relationship between the photospheric magnetic parameters and the flare productivity of active regions, The Astrophysical Journal 644 (2006) 1273.</p>
      <p>[11] K. Leka, G. Barnes, Photospheric magnetic field properties of flaring versus flare-quiet active regions. II. Discriminant analysis, The Astrophysical Journal 595 (2003) 1296.</p>
      <p>[12] H. Song, C. Tan, J. Jing, H. Wang, V. Yurchyshyn, V. Abramenko, Statistical assessment of photospheric magnetic features in imminent solar flare predictions, Solar Physics 254 (2009) 101–125.</p>
      <p>[13] D. Yu, X. Huang, H. Wang, Y. Cui, Short-term solar flare prediction using a sequential supervised learning method, Solar Physics 255 (2009) 91–105.</p>
      <p>[14] O. W. Ahmed, R. Qahwaji, T. Colak, P. A. Higgins, P. T. Gallagher, D. S. Bloomfield, Solar flare prediction using advanced feature extraction, machine learning, and feature selection, Solar Physics (2013) 1–19.</p>
      <p>[15] A. Al-Ghraibah, L. Boucheron, R. McAteer, An automated classification approach to ranking photospheric proxies of magnetic energy build-up, Astronomy &amp; Astrophysics 579 (2015) A64.</p>
      <p>[16] M. G. Bobra, S. Couvidat, Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm, The Astrophysical Journal 798 (2015) 135.</p>
      <p>[17] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, S. Watari, M. Ishii, Solar flare prediction model with three machine-learning algorithms using ultraviolet brightening and vector magnetograms, Astrophysical Journal 835 (2017) 156.</p>
      <p>[18] Y. Zheng, X. Li, X. Wang, Solar flare prediction with the hybrid deep convolutional neural network, The Astrophysical Journal 885 (2019) 73.</p>
      <p>[19] X. Li, Y. Zheng, X. Wang, L. Wang, Predicting solar flares using a novel deep convolutional neural network, The Astrophysical Journal 891 (2020) 10.</p>
      <p>[20] E. Park, Y.-J. Moon, S. Shin, K. Yi, D. Lim, H. Lee, G. Shin, Application of the deep convolutional neural network to the forecast of solar flare occurrence using full-disk solar magnetograms, The Astrophysical Journal 869 (2018) 91.</p>
      <p>[21] N. Nishizuka, Y. Kubo, K. Sugiura, M. Den, M. Ishii, Operational solar flare prediction model using deep flare net, Earth, Planets and Space 73 (2021) 1–12.</p>
      <p>[22] R. Ma, S. F. Boubrahimi, S. M. Hamdi, R. A. Angryk, Solar flare prediction using multivariate time series decision trees, in: 2017 IEEE Intl. Conf. on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2569–2578.</p>
      <p>[23] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: 5th Intl. Conf. on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, 2017.</p>
      <p>[24] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780.</p>
      <p>[25] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, in: 3rd Intl. Conf. on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conf. Track Proc., 2015.</p>
      <p>[26] A. Dempster, F. Petitjean, G. I. Webb, ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels, Data Min. Knowl. Discov. 34 (2020) 1454–1495.</p>
      <p>[27] A. P. Ruiz, M. Flynn, J. Large, M. Middlehurst, A. Bagnall, The great multivariate time series classification bake off: a review and experimental evaluation of recent algorithmic advances, Data Mining and Knowledge Discovery 35 (2021) 401–449.</p>
      <p>[28] A. J. Bagnall, H. A. Dau, J. Lines, M. Flynn, J. Large, A. Bostrom, P. Southam, E. J. Keogh, The UEA multivariate time series classification archive, 2018, CoRR abs/1811.00075 (2018). URL: http://arxiv.org/abs/1811.00075. arXiv:1811.00075.</p>
      <p>[29] L. Van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008).</p>
      <p>[30] S. F. Boubrahimi, S. M. Hamdi, R. Ma, R. A. Angryk, On the mining of the minimal set of time series data shapelets, in: IEEE Intl. Conf. on Big Data, Big Data 2020, Atlanta, GA, USA, December 10-13, 2020, IEEE, 2020, pp. 493–502.</p>
      <p>[31] S. M. Hamdi, B. Aydin, S. F. Boubrahimi, R. A. Angryk, L. C. Krishnamurthy, R. D. Morris, Biomarker detection from fMRI-based complete functional connectivity networks, in: IEEE Intl. Conf. on Artificial Intelligence and Knowledge Engineering, AIKE 2018, Laguna Hills, CA, USA, September 26-28, 2018, IEEE, 2018, pp. 17–24.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Eastwood</surname>
          </string-name>
          , E. Biffis,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hapgood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>McKinnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gibbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burnett</surname>
          </string-name>
          ,
          <article-title>The economic impact of space weather: Where do we stand?</article-title>
          ,
          <source>Risk Analysis</source>
          <volume>37</volume>
          (
          <year>2017</year>
          )
          <fpage>206</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>National Science and Technology Council</collab>
          ,
          <article-title>National space weather action plan</article-title>
          , https://obamawhitehouse.archives. gov/sites/default/files/microsites/ostp/final_ nationalspaceweatheractionplan_20151028.pdf ,
          <year>2015</year>
          . [Accessed: 10-Feb-2022].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aydin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmadzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          , et al.,
          <article-title>Multivariate time series dataset for space weather data analytics</article-title>
          ,
          <source>Scientific data 7</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <article-title>A time series classification-based approach for solar flare prediction</article-title>
          ,
          <source>in: 2017 IEEE Intl. Conf. on Big Data (Big Data)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>2543</fpage>
          -
          <lpage>2551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. A. M.</given-names>
            <surname>Muzaheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <article-title>Sequence model-based end-to-end solar flare classification from multivariate time series data</article-title>
          ,
          <source>in: 20th IEEE Intl. Conf. on Machine Learning and Applications, ICMLA</source>
          <year>2021</year>
          , Pasadena, CA, USA, December 13-16, 2021
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>