<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting Multivariate Time Series of the Magnetic Field Parameters of the Solar Events</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khaznah Alshammari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shah Muhammad Hamdi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Ahsan Muhummad Muzaheed</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Soukaina Filali Boubrahimi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>New Mexico State University</institution>
          ,
          <addr-line>Las Cruces, NM, 88003</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Utah State University</institution>
          ,
          <addr-line>Logan, UT, 84322</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Solar magnetic field parameters are frequently used by solar physicists in analyzing and predicting solar events (e.g., flares and coronal mass ejections). Temporal observation of the magnetic field parameters, i.e., their multivariate time series (MVTS) representation, facilitates relating magnetic field states to the occurrence of solar events. Forecasting the MVTS of solar magnetic field parameters means predicting future magnetic field parameter values from their historical values, regardless of the event labels. In this paper, we propose a deep sequence-to-sequence (seq2seq) learning approach based on batch normalization and a Long Short-Term Memory (LSTM) network for MVTS forecasting of the magnetic field parameters of solar events. To the best of our knowledge, this is the first work that addresses the forecasting of magnetic field parameters rather than the classification of events based on MVTS representations of those parameters. Experimental results on a real-life MVTS-based solar event dataset demonstrate that our batch normalization-based model outperforms naive sequence models in forecasting performance.</p>
      </abstract>
      <kwd-group>
<kwd>Multivariate Time Series Forecasting</kwd>
        <kwd>Solar Physics</kwd>
        <kwd>Solar Magnetic Field Parameters</kwd>
        <kwd>LSTM</kwd>
        <kwd>Batch Normalization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Solar events are characterized by magnetic field parameter values of the solar corona, such as helicity, flux, and Lorentz force. These magnetic field parameter values indicate the occurrence of extreme solar events such as solar flares, coronal mass ejections (CME), and eruptions of solar energetic particles (SEP) [<xref ref-type="bibr" rid="ref1">1</xref>]. These events are caused by a sudden burst of magnetic flux from the corona. The X-ray radiation of such extreme solar events can have devastating effects on life and infrastructure in space and on the ground, such as disruption in GPS and radio communication, damage to electronic devices, and radiation exposure-based health risks to astronauts. The cost associated with infrastructure damage after extreme solar events can rise up to trillions of dollars [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>In recent years, the prediction of solar events within a predefined time window has become an important challenge in the heliophysics community. Since the theoretical relationship between magnetic field influx and the occurrence of extreme events in solar active regions (AR) is not yet established, space weather researchers depend on data science-based approaches for predicting solar events. The primary data source used in these efforts is the imagery captured by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory (SDO). HMI images (captured in near-continuous time) contain spatiotemporal magnetic field data of solar active regions. For performing temporal window-based flare prediction of an AR instance, the spatiotemporal magnetic field data of that region is mapped into a multivariate time series (MVTS) instance [<xref ref-type="bibr" rid="ref3">3</xref>]. MVTS instances, collected with a uniform sampling rate throughout a preset observation period, are labeled with multiple event classes (e.g., flare classes), and machine learning-based classifiers are trained with the labeled MVTS instances to predict the occurrences of the events after a preset prediction window. Although multiple research efforts [<xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>] addressed MVTS-based solar event prediction, forecasting of the MVTS-represented magnetic field parameters is yet to be explored.</p>
      <p>In this work, we aim to forecast the future values of the magnetic field parameters, given their past values in the MVTS representations. In the case of a sudden data gap, i.e., an interruption in the communication between the satellite and the ground receiver, MVTS forecasting of the magnetic field parameters can play an important role in extrapolation. To the best of our knowledge, this is the first attempt to forecast the solar magnetic field parameters. We use a deep sequence-to-sequence (seq2seq) learning model based on batch normalization and a Long Short-Term Memory (LSTM) network that is trained with input-output pairs of examples, where the inputs are formed by sampling the MVTS instances over an observation window, and the outputs are formed by sampling the MVTS instances over a prediction window that follows the observation window. Our LSTM-based encoder-decoder model is trained with a backpropagation algorithm based on mini-batch gradient descent optimization, minimizing the Mean Squared Error (MSE) between the observed MVTS (input) and the predicted MVTS (output).</p>
    </sec>
    <sec id="sec-related-work">
      <title>2. Related Work</title>
      <p>Recent research efforts on solar event prediction are mostly based on data science. Data-driven extreme solar event prediction models stem from linear and nonlinear statistics. Datasets used in these models were collected from line-of-sight magnetogram and vector magnetogram data. A line-of-sight magnetogram contains only the line-of-sight component of the magnetic field, while a vector magnetogram contains the full-disk magnetic field data [<xref ref-type="bibr" rid="ref7">7</xref>]. NASA launched the Solar Dynamics Observatory (SDO) in 2010. Since then, SDO's instrument, the Helioseismic and Magnetic Imager (HMI), has been mapping the full-disk vector magnetic field every 12 minutes [<xref ref-type="bibr" rid="ref1">1</xref>]. Most of the recent prediction models use the near-continuous stream of vector magnetogram data obtained from SDO [<xref ref-type="bibr" rid="ref8">8</xref>]. Magnetic field parameters (e.g., helicity, flux, etc.) were developed with the goal of finding a relationship between the photospheric magnetic field behavior and solar activity, which usually occurs in the solar chromosphere and the transition region of the solar corona.</p>
      <p>Deep learning-based sequence-to-sequence models using Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) architectures have been used successfully in multiple Natural Language Processing (NLP) tasks such as machine translation [<xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>] and text summarization [<xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>]. Since multivariate time series are high-dimensional sequence data, MVTS forecasting has previously been addressed by different seq2seq models [<xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>]. In [<xref ref-type="bibr" rid="ref15">15</xref>], batch normalization showed promising improvements on a sentiment classification task, where a batch-normalized variant of the LSTM architecture is used and each LSTM cell's input, hidden state, and cell state are normalized during training. Inspired by encoder-decoder-based machine translation models, in this work we treat the MVTS forecasting of solar magnetic field parameters as a sequence-to-sequence learning task and use a batch normalization-based LSTM architecture to capture long-term dependencies of the multi-dimensional sequence data.</p>
    </sec>
    <sec id="sec-methodology">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Notations</title>
        <p>Each solar active region, which results in different event occurrences after a given prediction window, represents an event instance. An event instance is represented by an MVTS instance M. The MVTS instance M ∈ R^(T×N) is a collection of the individual time series of N magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter over an observation period T. In the MVTS instance M = {m_1, m_2, ..., m_T}, each m_t ∈ R^N is a timestamp vector. We divide the dataset into (X, Y) pairs, where X = M[1 : t_ob, :] ∈ R^(t_ob × N), Y = M[t_ob + 1 : T, :] ∈ R^(t_pred × N), t_ob is the observation time, and t_pred is the prediction time.</p>
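        <p>As a minimal sketch of this slicing, the following NumPy snippet cuts stacked MVTS instances into (X, Y) pairs along the time axis; the random array is only a stand-in for the real dataset, and the variable names are ours.</p>
        <preformat>
import numpy as np

# Hypothetical stand-in: 1,540 MVTS instances, T = 60 time steps, N = 25 parameters.
mvts = np.random.randn(1540, 60, 25)

t_ob, t_pred = 40, 20                 # observation and prediction windows (T = t_ob + t_pred)

X = mvts[:, :t_ob, :]                 # X = M[1 : t_ob, :]     shape (1540, 40, 25)
Y = mvts[:, t_ob:t_ob + t_pred, :]    # Y = M[t_ob+1 : T, :]   shape (1540, 20, 25)
        </preformat>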
      <sec id="sec-1-1">
        <title>3.2. LSTM and Batch Normalization-based</title>
      </sec>
      <sec id="sec-1-2">
        <title>MVTS Forecasting</title>
        <p>to 1. We found batch normalization to be significant in
maximizing the performance of MVTS forecasting for the
magnetic field parameters of the solar events, which we
demonstrate in more detail in the experiments section.</p>
      </sec>
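        <p>As a rough illustration of this architecture, the following Keras sketch stacks batch normalization, an encoder LSTM, a Repeat Vector layer, a decoder LSTM, and a time-distributed dense output; with t_ob = 40, t_pred = 20, and N = 25 it maps a batch of shape (b, 40, 25) to a forecast of shape (b, 20, 25). The hidden size of 64, the exact placement of the batch normalization layers, and the use of tf.keras are our assumptions for illustration and are not taken verbatim from the released implementation.</p>
        <preformat>
from tensorflow import keras
from tensorflow.keras import layers

T_OB, T_PRED, N = 40, 20, 25   # observation window, prediction window, parameters

# Illustrative batch-normalized seq2seq LSTM forecaster (layer sizes are assumptions).
model = keras.Sequential([
    keras.Input(shape=(T_OB, N)),
    layers.BatchNormalization(),               # normalize encoder inputs per mini-batch
    layers.LSTM(64, activation="elu"),         # encoder: last hidden state summarizes the window
    layers.BatchNormalization(),
    layers.RepeatVector(T_PRED),               # copy the encoder state once per forecast step
    layers.LSTM(64, return_sequences=True),    # decoder: one hidden state per forecast step
    layers.TimeDistributed(layers.Dense(N)),   # dense layer applied to every temporal slice
])
model.compile(optimizer="sgd", loss="mse", metrics=["mae"])
        </preformat>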
      <sec id="sec-1-3">
        <title>3.3. Evaluation Metrics</title>
        <p>In this section, we present a batch normalization-based
implementation of the encoder-decoder model that uses We used Mean Absolute Error (MAE), Mean Squared
LSTM architecture and compare it with other baseline Error (MSE), and Root Mean Squared Error (RMSE) to
sequence models of naive stochastic gradient descent report our model results. The evaluation metrics (MAE,
implementation (without batch normalization). There MSE, and RMSE) measure the amount of error in
statistiare diferent deep sequence learning models, which are cal models. They assess the average squared diference
frequently applied in machine translation, and they can between the observed and predicted values.
be adapted for time series forecasting. In this study, we Mean Absolute Error (MAE) is the average over
analyze two seq2seq models: the batch normalization- the absolute values of the diferences between predicted
based seq2seq LSTM Model (BN seq2seq LSTM), and the representations and ground truth representations.
seq2seq models based on LSTM/GRU/RNN, and compare 
their forecasting results.   = 1 ∑︁ | − ˆ|</p>
        <p>Fig. 1 depicts our seq2seq-based model that uses batch  =1
normalization and LSTM architecture. First, in the
encoder LSTM cells, the value of each time step is used as where  is the ground truth value and ˆ is the predicted
input to the encoder LSTM cell together with the previ- value.
ous cell state  and hidden state ℎ, the process repeats Mean Squared Error (MSE) is defined as the mean
until the last cell state  and hidden state ℎ are generated. or average of the square of the diference between actual
Then, the decoder LSTM cell uses the last cell state  and and predicted values.
hidden state ℎ from the encoder as the initial states for the 
decoder LSTM cell. The last hidden state of the encoder   = 1 ∑︁( − ˆ)2
is also copied  times using a Repeat Vector layer ac-  =1
cording to the length of the forecasting window, and each
copy is inputted into the decoder LSTM cell together with Root Mean Squared Error (RMSE) is the diference
the previous cell state  and hidden state ℎ. The decoder between forecast and corresponding observed values,
outputs hidden states for all the  time steps and the where each diference is squared and averaged over the
hidden states are connected to the final Time-distributed- sample space. It denotes the square root of the MSE.
dense layer in order to produce the final output sequence.</p>
        <p>
          The time-distributed-dense layer allows to apply a dense ⎯ 
layer to every temporal slice of the input. We use this   = ⎷⎸⎸ 1 ∑︁( − ˆ)2
ifnal layer to process the output from the LSTM hidden =1
layer. Every input shape is three-dimensional, and the
ifrst dimension of the input is considered to be the tem- Experiments
poral dimension. This means that we need to configure We compared the batch normalization-based seq2seq
the last LSTM layer prior to the time-distributed-dense LSTM model with the baseline models on multivariate
layer to return output sequences. The output shape will time series forecasting of magnetic field parameters of
be three-dimensional as well, which means that if the a solar events dataset. The source code of our model
time-distributed-dense layer is the output layer, then for and the experimental dataset are available on our GitHub
predicting a sequence we need to reshape the final rep- repository 1.
resentation into a three-dimensional shape [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. In the
batch normalization-based seq2seq LSTM Model, we use 3.4. Dataset Description
mini-batches to feed the data into the model. Batch nor- As the benchmark dataset of our experiments, we used
malization is a useful method for making deep neural the MVTS-based solar flare prediction data set published
network training faster and more robust, and it normal- by Angryk et al [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Each MVTS instance in the dataset
izes the input activations to avoid gradient explosion is made up of 25 time series of active region magnetic
caused by the activation function ELU (Exponential Lin- ifeld parameters (a full list can be found in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]). The time
ear Unit) in the encoder [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The batch normalization series instances are recorded at 12 minutes intervals for a
layer applies a transformation that maintains the mean
output close to 0 and the output standard deviation close 1https://github.com/Kalshammari/BN_Seq2Seq
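        <p>The three metrics can be computed directly with NumPy; the following sketch applies them over a whole forecast array (the array shapes are illustrative).</p>
        <preformat>
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error over all timestamps and parameters."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean Squared Error over all timestamps and parameters."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return np.sqrt(mse(y_true, y_pred))

# Illustrative usage on arrays shaped (instances, t_pred, N), e.g. (308, 20, 25).
y_true = np.zeros((308, 20, 25))
y_pred = np.zeros((308, 20, 25))
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
        </preformat>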
      </sec>
    </sec>
    <sec id="sec-experiments">
      <title>Experiments</title>
      <p>We compared the batch normalization-based seq2seq LSTM model with the baseline models on multivariate time series forecasting of the magnetic field parameters of a solar events dataset. The source code of our model and the experimental dataset are available in our GitHub repository (<ext-link ext-link-type="uri" xlink:href="https://github.com/Kalshammari/BN_Seq2Seq">https://github.com/Kalshammari/BN_Seq2Seq</ext-link>).</p>
      <sec id="sec-3-4">
        <title>3.4. Dataset Description</title>
        <p>As the benchmark dataset for our experiments, we used the MVTS-based solar flare prediction dataset published by Angryk et al. [<xref ref-type="bibr" rid="ref3">3</xref>]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (a full list can be found in [<xref ref-type="bibr" rid="ref1">1</xref>]). The time series are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The dataset therefore has T = 60 observation points and N = 25 dimensions in the timestamp vectors, while the event occurrence window is 12 hours. Our experimental dataset consists of 1,540 MVTS instances that are evenly distributed across four flare classes (X, M, BC, and Q). We discarded the class labels to fit the dataset for MVTS forecasting [<xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>], where each MVTS instance is divided into input and output (ground truth) sequences according to the observation window (t_ob) and the prediction window (t_pred). In our experiments, t_ob = 40 and t_pred = 20, while T = t_ob + t_pred.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Train/test splitting method</title>
        <p>We performed random sampling for the train/test splitting, where we use the stratified holdout method (80% for training and 20% for testing) with six different random seeds, and we report the mean error rates along with the standard deviation. The train and test datasets are z-normalized, since the magnetic field parameter values appear on different scales. The shapes of the train and test datasets are as follows.</p>
        <list list-type="bullet">
          <list-item><p>X_train shape: (1232, 40, 25) and y_train shape: (1232, 20, 25)</p></list-item>
          <list-item><p>X_test shape: (308, 40, 25) and y_test shape: (308, 20, 25)</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Baseline Models</title>
        <p>We evaluated our model against LSTM-, RNN-, and GRU-based seq2seq implementations. In the forward pass, we input the first t_ob vectors of each MVTS to the encoder cells (LSTM/RNN/GRU) to produce the encoded hidden state. That encoded hidden state is the input to the decoder cells of the same type. The decoder then predicts the next 25-dimensional timestamp vectors for each timestamp in t_pred and matches the predictions with the ground truth to perform stochastic gradient descent-based backpropagation. In all three models, the number of dimensions in the cell state and hidden state representations is 25, the number of training epochs is 5, and the learning rate of stochastic gradient descent is 0.01.</p>
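        <p>A sketch of such a baseline is shown below, with the cell type swappable between LSTM, GRU, and a simple RNN. The state size of 25, the 5 training epochs, and the 0.01 learning rate follow the settings above, while the Keras-based structure and function name are our illustrative assumptions.</p>
        <preformat>
from tensorflow import keras
from tensorflow.keras import layers

def naive_seq2seq(cell="LSTM", t_ob=40, t_pred=20, n=25):
    """Baseline encoder-decoder without batch normalization (illustrative sketch)."""
    rnn = {"LSTM": layers.LSTM, "GRU": layers.GRU, "RNN": layers.SimpleRNN}[cell]
    model = keras.Sequential([
        keras.Input(shape=(t_ob, n)),
        rnn(25),                               # encoder with 25-dimensional state
        layers.RepeatVector(t_pred),
        rnn(25, return_sequences=True),        # decoder with 25-dimensional state
        layers.TimeDistributed(layers.Dense(n)),
    ])
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")
    return model

# Example usage: naive_seq2seq("GRU").fit(X_tr, Y_tr, epochs=5, batch_size=10)
        </preformat>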
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Performance of LSTM and Batch Normalization-based seq2seq model</title>
        <p>When we apply the LSTM and batch normalization-based seq2seq model, we perform the following steps. First, we extract (X, Y) pairs from all 1,540 MVTS instances, where the length of each input X is t_ob = 40, the length of each output Y is t_pred = 20, and each timestamp vector is 25-dimensional. In the encoder step, the input is of size (b, 40, 25), where b (= 10) is the batch size of the MVTS instances. For each encoder LSTM cell, the vector of each time step is used as the input to the encoder LSTM cell together with the previous cell state c and hidden state h, and the process repeats until the last cell state c and hidden state h are generated. The decoder LSTM cell uses the last cell state c and hidden state h from the encoder as its initial states. The last hidden state of the encoder is also copied 20 times using the Repeat Vector layer, and each copy is fed into the decoder LSTM cell together with the previous cell state c and hidden state h. The decoder outputs a hidden state for each of the 20 time steps, and these hidden states are connected to a time-distributed dense layer to generate the final forecasting output, which is of size (b, 20, 25). We used Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) to report our model performance, and we report the mean and standard deviation of these measures in Table 1. As Table 1 shows, our approach of deep sequence-to-sequence learning based on batch normalization and a Long Short-Term Memory (LSTM) network significantly outperformed the baseline methods. Batch normalization makes a difference by a large margin, producing errors near 0, whereas the traditional seq2seq models yield large error values due to the absence of batch normalization.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Conclusion</title>
      <p>We propose a batch normalization-based deep seq2seq model for multivariate time series forecasting of the magnetic field parameters of solar events. Unlike previous works on MVTS-based event classification, we perform forecasting of the magnetic field parameter values irrespective of the MVTS labels. We compare our model with other seq2seq implementations based on LSTM, GRU, and RNN. Our proposed approach significantly improves the MAE, MSE, and RMSE results of MVTS forecasting on a benchmark solar magnetic field parameter dataset.</p>
      <p>For future research, we plan to develop machine learning models for MVTS forecasting that leverage the MVTS labels. We aim to use the forecasting models for augmenting (creating synthetic examples of) the MVTS instances of minority classes (rare events). In addition, to utilize the inter-variable dependencies of the MVTS instances for the task of forecasting, we plan to incorporate graph construction (e.g., functional network computation from the correlation matrices of the MVTS instances) and graph neural network (GNN)-based representation learning.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This project has been supported in part by funding from
CISE and GEO directorates under NSF awards #2153379
and #2204363.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Bobra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Couvidat</surname>
          </string-name>
          ,
          <article-title>Solar flare prediction using sdo/hmi vector magnetic field data with a machine-learning algorithm</article-title>
          ,
          <source>The Astrophysical Journal</source>
          <volume>798</volume>
          (
          <year>2015</year>
          )
          <fpage>135</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Eastwood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Biffis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hapgood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Bentley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>McKinnell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gibbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burnett</surname>
          </string-name>
          ,
          <article-title>The economic impact of space weather: Where do we stand?</article-title>
          ,
          <source>Risk Analysis</source>
          <volume>37</volume>
          (
          <year>2017</year>
          )
          <fpage>206</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Martens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aydin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Mahajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Basodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmadzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Filali</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          , et al.,
          <article-title>Multivariate time series dataset for space weather data analytics</article-title>
          ,
          <source>Scientific Data</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <article-title>A time series classification-based approach for solar flare prediction</article-title>
          ,
          <source>in: 2017 IEEE Intl. Conf. on Big Data (Big Data)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>2543</fpage>
          -
          <lpage>2551</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Muzaheed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <article-title>Sequence model-based end-to-end solar flare classification from multivariate time series data</article-title>
          ,
          <source>in: 20th IEEE Intl. Conf. on Machine Learning and Applications, ICMLA</source>
          <year>2021</year>
          , Pasadena, CA, USA, December
          <volume>13</volume>
          -
          <issue>16</issue>
          ,
          <year>2021</year>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>435</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <article-title>Solar flare prediction using multivariate time series decision trees</article-title>
          ,
          <source>in: 2017 IEEE Intl. Conf. on Big Data, BigData</source>
          <year>2017</year>
          , Boston, MA, USA, December
          <volume>11</volume>
          -
          <issue>14</issue>
          ,
          <year>2017</year>
          , IEEE Computer Society,
          <year>2017</year>
          , pp.
          <fpage>2569</fpage>
          -
          <lpage>2578</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S. F.</given-names>
            <surname>Boubrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Aydin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kempton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Angryk</surname>
          </string-name>
          ,
          <article-title>Spatio-temporal interpolation methods for solar events metadata</article-title>
          ,
          <source>in: 2016 IEEE Intl. Conf. on Big Data (Big Data)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>3149</fpage>
          -
          <lpage>3157</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoeksema</surname>
          </string-name>
          ,
          <article-title>Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms</article-title>
          ,
          <source>The Astrophysical Journal</source>
          <volume>723</volume>
          (
          <year>2010</year>
          )
          <fpage>634</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          ,
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yousefi-Azar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hamey</surname>
          </string-name>
          ,
          <article-title>Text summarization using unsupervised deep learning</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>68</volume>
          (
          <year>2017</year>
          )
          <fpage>93</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <article-title>9</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Finch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Utiyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sumita</surname>
          </string-name>
          ,
          <article-title>Agreement on target-bidirectional lstms for sequence-to-sequence learning</article-title>
          ,
          <source>in: Proc. of the AAAI Conf. on Artificial Intelligence, February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, Arizona, USA, AAAI Press,
          <year>2016</year>
          , pp.
          <fpage>2630</fpage>
          -
          <lpage>2637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Scherrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kosovichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bogart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hoeksema</surname>
          </string-name>
          , Y. Liu,
          <string-name>
            <given-names>T.</given-names>
            <surname>Duvall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schrijver</surname>
          </string-name>
          , et al.,
          <article-title>The helioseismic and magnetic imager (hmi) investigation for the solar dynamics observatory (sdo</article-title>
          ),
          <source>Solar Physics</source>
          <volume>275</volume>
          (
          <year>2012</year>
          )
          <fpage>207</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Margarit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Subramaniam</surname>
          </string-name>
          ,
          <article-title>A batch-normalized recurrent network for sentiment classification</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          (
          <year>2016</year>
          )
          <fpage>2</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brownlee</surname>
          </string-name>
          ,
          <article-title>Long short-term memory networks with python: develop sequence prediction models with deep learning</article-title>
          ,
          <source>Machine Learning Mastery</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Santurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsipras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ilyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          <article-title>How does batch normalization help optimization?</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>31</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>