Forecasting Multivariate Time Series of the Magnetic Field Parameters of the Solar Events

Khaznah Alshammari1,*, Shah Muhammad Hamdi2, Ali Ahsan Muhummad Muzaheed1 and Soukaina Filali Boubrahimi2

1 New Mexico State University, Las Cruces, NM, 88003, USA
2 Utah State University, Logan, UT, 84322, USA

Abstract
Solar magnetic field parameters are frequently used by solar physicists in analyzing and predicting solar events (e.g., flares, coronal mass ejections, etc.). Temporal observation of the magnetic field parameters, i.e., their multivariate time series (MVTS) representation, facilitates finding relationships between magnetic field states and the occurrence of solar events. Forecasting the MVTS of solar magnetic field parameters means predicting future parameter values from historical values, regardless of the event labels. In this paper, we propose a deep sequence-to-sequence (seq2seq) learning approach based on batch normalization and a Long Short-Term Memory (LSTM) network for MVTS forecasting of the magnetic field parameters of solar events. To the best of our knowledge, this is the first work that addresses the forecasting of magnetic field parameters rather than the classification of events based on MVTS representations of those parameters. Experimental results on a real-life MVTS-based solar event dataset demonstrate that our batch normalization-based model outperforms naive sequence models in forecasting performance.

Keywords
Multivariate Time Series Forecasting, Solar Physics, Solar Magnetic Field Parameters, LSTM, Batch Normalization

1. Introduction

Solar events are characterized by magnetic field parameter values on the solar corona such as helicity, flux, Lorentz force, etc. These magnetic field parameter values indicate the occurrence of extreme solar events such as solar flares, coronal mass ejections (CME), and eruptions of solar energetic particles (SEP) [1]. These events are caused by a sudden burst of magnetic flux from the corona. The X-ray radiation of such extreme solar events can have devastating effects on life and infrastructure in space and on the ground, such as disruption of GPS and radio communication, damage to electronic devices, and radiation exposure-based health risks to astronauts. The cost associated with infrastructure damage after extreme solar events can rise to trillions of dollars [2]. In recent years, the prediction of solar events within a predefined time window has become an important challenge in the heliophysics community.
Since the theoretical relationship between magnetic field influx and the occurrence of extreme events in solar active regions (AR) is not yet established, space weather researchers depend on data science-based approaches for predicting solar events. The primary data source used in these efforts is the images captured by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory (SDO). HMI images (captured in near-continuous time) contain spatiotemporal magnetic field data of solar active regions. For performing temporal window-based flare prediction of an AR instance, the spatiotemporal magnetic field data of that region is mapped into a multivariate time series (MVTS) instance [3]. MVTS instances, collected with a uniform sampling rate throughout a preset observation period, are labeled with multiple event classes (e.g., flare classes), and machine learning-based classifiers are trained with the labeled MVTS instances to predict the occurrences of the events after a preset prediction window. Although multiple research efforts [4, 5, 6] addressed MVTS-based solar event prediction, forecasting of MVTS-represented magnetic field parameters is yet to be explored.

In this work, we aim to forecast the future values of the magnetic field parameters, given past values in the MVTS representations. In case of a sudden data gap, i.e., an interruption in the communication between the satellite and the ground receiver, MVTS forecasting of magnetic field parameters can play an important role in extrapolation. To the best of our knowledge, this is the first attempt to forecast the solar magnetic field parameters. We use a deep sequence-to-sequence learning model based on batch normalization and a Long Short-Term Memory (LSTM) network that is trained with input-output pairs of examples, where the inputs are formed by sampling the MVTS instances over an observation window, and the outputs are formed by sampling the MVTS instances over a prediction window (which follows the observation window). Our LSTM-based encoder-decoder model is trained with a backpropagation algorithm based on mini-batch gradient descent optimization, minimizing the Mean Squared Error (MSE) between the ground truth MVTS and the predicted MVTS.

AMLTS'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA. Contact: kalshamm@nmsu.edu (K. Alshammari); s.hamdi@usu.edu (S. M. Hamdi); muzaheed@nmsu.edu (A. A. M. Muzaheed); soukaina.boubrahimi@usu.edu (S. F. Boubrahimi). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in CEUR Workshop Proceedings (CEUR-WS.org).

Figure 1: LSTM and batch normalization-based seq2seq model for MVTS forecasting

2. Related Work

Recent research efforts on solar event prediction are mostly based on data science. Data-driven extreme solar event prediction models stem from linear and nonlinear statistics. Datasets used in these models were collected from line-of-sight magnetogram and vector magnetogram data. A line-of-sight magnetogram contains only the line-of-sight component of the magnetic field, while a vector magnetogram contains the full-disk magnetic field data [7]. NASA launched the Solar Dynamics Observatory (SDO) in 2010. Since then, SDO's instrument Helioseismic and Magnetic Imager (HMI) has been mapping the full-disk vector magnetic field every 12 minutes [1]. Most of the recent prediction models use the near-continuous stream of vector magnetogram data obtained from SDO [8]. Magnetic field parameters (e.g., helicity, flux, etc.) were developed with the goal of finding a relationship between the photospheric magnetic field behavior and solar activity, which usually occurs in the solar chromosphere and the transition region of the solar corona.

Deep learning-based sequence-to-sequence models using Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) architectures have been used successfully in multiple Natural Language Processing (NLP) tasks such as machine translation [9, 10] and text summarization [11, 12]. Since multivariate time series are high-dimensional sequence data, MVTS forecasting has previously been addressed by different seq2seq models [13, 14]. In [15], batch normalization showed promising improvements in the sentiment classification task, where a batch-normalized variant of the LSTM architecture is used and each LSTM cell's input, hidden state, and cell state are normalized during training. Inspired by encoder-decoder-based machine translation models, in this work we consider the MVTS forecasting of solar magnetic field parameters as a sequence-to-sequence learning task and use a batch normalization-based LSTM architecture to capture long-term dependencies of multi-dimensional sequence data.

3. Methodology

3.1. Notations

Each solar active region that results in different event occurrences after a given prediction window represents an event instance. The event instance $i$ is represented by an MVTS instance $mvts_i$. The MVTS instance $mvts_i \in \mathbb{R}^{T \times N}$ is a collection of individual time series of $N$ magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter over an observation period $T$. In the MVTS instance $mvts_i = \{v_{t_1}, v_{t_2}, \ldots, v_{t_T}\}$, $v_{t_i} \in \mathbb{R}^N$ represents a timestamp vector. We divide the dataset into $(X, Y)$ pairs, where $X_i = mvts_i[t_1 : t_{obs}, :] \in \mathbb{R}^{t_{obs} \times N}$ and $Y_i = mvts_i[t_{obs}+1 : t_T, :] \in \mathbb{R}^{t_{pred} \times N}$; here $t_{obs}$ is the observation time and $t_{pred}$ is the prediction time.
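To make the pair extraction concrete, the following NumPy sketch splits each MVTS instance into an observation window $X$ and a prediction window $Y$ (the variable names and the random placeholder data are ours, for illustration; the released code may differ):

```python
import numpy as np

T, N = 60, 25           # observation period and number of magnetic field parameters
t_obs, t_pred = 40, 20  # observation and prediction windows; T = t_obs + t_pred

def make_xy_pair(mvts):
    """Split one MVTS instance of shape (T, N) into an input window X of shape
    (t_obs, N) and an output (ground truth) window Y of shape (t_pred, N)."""
    assert mvts.shape == (T, N)
    X = mvts[:t_obs, :]   # first t_obs timestamp vectors
    Y = mvts[t_obs:, :]   # remaining t_pred timestamp vectors
    return X, Y

# Stacking all pairs gives X of shape (1540, 40, 25) and Y of shape (1540, 20, 25).
dataset = np.random.randn(1540, T, N)  # placeholder for the real MVTS instances
X, Y = map(np.stack, zip(*(make_xy_pair(m) for m in dataset)))
```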
3.2. LSTM and Batch Normalization-based MVTS Forecasting

In this section, we present a batch normalization-based implementation of the encoder-decoder model that uses the LSTM architecture, and we compare it with baseline sequence models implemented with naive stochastic gradient descent (without batch normalization). There are several deep sequence learning models that are frequently applied in machine translation, and they can be adapted for time series forecasting. In this study, we analyze the batch normalization-based seq2seq LSTM model (BN seq2seq LSTM) and seq2seq models based on LSTM/GRU/RNN, and we compare their forecasting results.

Fig. 1 depicts our seq2seq-based model that uses batch normalization and the LSTM architecture. First, in the encoder, the value of each time step is used as input to the encoder LSTM cell together with the previous cell state $c$ and hidden state $h$; the process repeats until the last cell state $c$ and hidden state $h$ are generated. Then, the decoder LSTM cell uses the last cell state $c$ and hidden state $h$ from the encoder as its initial states. The last hidden state of the encoder is also copied $t_{pred}$ times using a Repeat Vector layer, according to the length of the forecasting window, and each copy is input into the decoder LSTM cell together with the previous cell state $c$ and hidden state $h$. The decoder outputs hidden states for all $t_{pred}$ time steps, and these hidden states are connected to a final time-distributed dense layer in order to produce the final output sequence. The time-distributed dense layer applies a dense layer to every temporal slice of its input; we use this final layer to process the output of the LSTM hidden layer. Every input shape is three-dimensional, and the first dimension of the input is considered the temporal dimension. This means that we need to configure the last LSTM layer prior to the time-distributed dense layer to return output sequences. The output shape will be three-dimensional as well, which means that if the time-distributed dense layer is the output layer, then for predicting a sequence we need to reshape the final representation into a three-dimensional shape [16]. In the batch normalization-based seq2seq LSTM model, we use mini-batches to feed the data into the model. Batch normalization is a useful method for making deep neural network training faster and more robust; it normalizes the input activations to avoid gradient explosion caused by the ELU (Exponential Linear Unit) activation function in the encoder [17]. The batch normalization layer applies a transformation that keeps the mean output close to 0 and the output standard deviation close to 1. We found batch normalization to be significant in maximizing the performance of MVTS forecasting for the magnetic field parameters of the solar events, which we demonstrate in more detail in the experiments section.
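The architecture can be sketched in Keras; the RepeatVector and time-distributed dense layer names suggest a TensorFlow/Keras implementation, though this is our assumption. The exact placement of the batch normalization layer and the optimizer configuration below are illustrative choices; the released code in the GitHub repository referenced in the experiments section remains the authoritative implementation.

```python
import tensorflow as tf
from tensorflow.keras.layers import (LSTM, BatchNormalization, Dense,
                                     Input, RepeatVector, TimeDistributed)

t_obs, t_pred, N = 40, 20, 25  # window lengths and parameter count (Section 3.1)
n_units = 25                   # hidden/cell state dimensionality

# Encoder: ELU-activated LSTM; batch normalization is applied to its output.
enc_in = Input(shape=(t_obs, N))
enc_out, state_h, state_c = LSTM(n_units, activation="elu",
                                 return_state=True)(enc_in)
enc_out = BatchNormalization()(enc_out)

# The last encoder hidden state is copied t_pred times by a RepeatVector
# layer, and the decoder is initialized with the encoder's final states.
dec_in = RepeatVector(t_pred)(enc_out)
dec_out = LSTM(n_units, activation="elu", return_sequences=True)(
    dec_in, initial_state=[state_h, state_c])

# Time-distributed dense layer: maps each decoder hidden state to an
# N-dimensional timestamp vector, so the output has shape (b, t_pred, N).
out = TimeDistributed(Dense(N))(dec_out)

model = tf.keras.Model(enc_in, out)
# Mini-batch gradient descent minimizing MSE; the optimizer settings here
# are assumptions, not values reported in the paper.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="mse", metrics=["mae"])
```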
3.3. Evaluation Metrics

We use Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) to report our model results. These evaluation metrics measure the amount of error in statistical models by quantifying the difference between observed and predicted values.

Mean Absolute Error (MAE) is the average of the absolute values of the differences between predicted representations and ground truth representations:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

where $y_i$ is the ground truth value and $\hat{y}_i$ is the predicted value.

Mean Squared Error (MSE) is defined as the mean of the squares of the differences between actual and predicted values:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Root Mean Squared Error (RMSE) measures the difference between forecast and corresponding observed values, where each difference is squared and averaged over the sample space; it is the square root of the MSE:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
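These metrics can be computed directly with NumPy; the following helper functions (ours, for illustration) average over all instances, time steps, and parameters:

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error averaged over all forecast values."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean Squared Error averaged over all forecast values."""
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    """Root Mean Squared Error: the square root of the MSE."""
    return np.sqrt(mse(y, y_hat))

# Example on forecast windows of shape (n_instances, t_pred, N):
y_true = np.zeros((308, 20, 25))
y_pred = np.full_like(y_true, 0.1)
print(mae(y_true, y_pred), mse(y_true, y_pred), rmse(y_true, y_pred))
# -> approximately 0.1, 0.01, 0.1
```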
Experiments

We compared the batch normalization-based seq2seq LSTM model with the baseline models on multivariate time series forecasting of the magnetic field parameters of a solar events dataset. The source code of our model and the experimental dataset are available in our GitHub repository (https://github.com/Kalshammari/BN_Seq2Seq).

3.4. Dataset Description

As the benchmark dataset of our experiments, we used the MVTS-based solar flare prediction dataset published by Angryk et al. [3]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (a full list can be found in [1]). The time series are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The dataset thus has $T = 60$ observation points and $N = 25$ dimensions in the timestamp vectors, while the event occurrence window is 12 hours. Our experimental dataset consists of 1,540 MVTS instances that are evenly distributed across four flare classes (X, M, BC, and Q). We discarded the class labels to fit the dataset for MVTS forecasting [5, 4], where each MVTS instance is divided into input and output (ground truth) sequences according to the observation window ($t_{obs}$) and prediction window ($t_{pred}$). In our experiments, $t_{obs} = 40$ and $t_{pred} = 20$, where $T = t_{obs} + t_{pred}$.

3.5. Train/test Splitting Method

We performed random sampling for train/test splitting, where we use the stratified holdout method (80% for training and 20% for testing) with six different random seeds, and we report the mean error rates along with the standard deviation. The train and test datasets are z-normalized, since magnetic field parameter values appear on different scales. The shapes of the train and test datasets are as follows.

• X_train shape: (1232, 40, 25) and y_train shape: (1232, 20, 25)
• X_test shape: (308, 40, 25) and y_test shape: (308, 20, 25)
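A sketch of this splitting and normalization procedure is shown below, assuming scikit-learn's train_test_split for the stratified holdout and per-parameter z-scoring. The paper does not state where the normalization statistics come from, so computing them from the training split is an assumption on our part, as are the placeholder arrays:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholders standing in for the arrays built in Section 3.1; the real
# flare labels (X, M, BC, Q) come from the Angryk et al. dataset.
X = np.random.randn(1540, 40, 25)
Y = np.random.randn(1540, 20, 25)
flare_labels = np.repeat(["X", "M", "BC", "Q"], 385)  # 1540 / 4 per class

for seed in range(6):  # six random seeds; mean and std reported over runs
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, stratify=flare_labels, random_state=seed)

    # z-normalize each magnetic field parameter, since the parameters
    # appear on very different scales (training-split statistics assumed).
    mu = X_train.mean(axis=(0, 1), keepdims=True)
    sd = X_train.std(axis=(0, 1), keepdims=True) + 1e-8
    X_train, X_test = (X_train - mu) / sd, (X_test - mu) / sd
    Y_train, Y_test = (Y_train - mu) / sd, (Y_test - mu) / sd
    # Resulting shapes: X_train (1232, 40, 25), Y_train (1232, 20, 25),
    #                   X_test (308, 40, 25),  Y_test (308, 20, 25)
```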
3.6. Baseline Models

We evaluated our model against LSTM-, RNN-, and GRU-based seq2seq implementations. In the forward pass, we input the first $t_{obs}$ vectors of each MVTS to the encoder cells (LSTM/RNN/GRU) to produce the encoded hidden state. That encoded hidden state is the input to decoder cells of the same type. The decoder then predicts the next 25-dimensional timestamp vector for each timestamp in $t_{pred}$, and the prediction is matched with the ground truth to perform stochastic gradient descent-based backpropagation. In all three models, the number of dimensions in the cell state and hidden state representations was 25, the number of training epochs was 5, and the learning rate of stochastic gradient descent was 0.01.

3.7. Performance of LSTM and Batch Normalization-based seq2seq Model

When we apply the LSTM and batch normalization-based seq2seq model, we perform the following steps. First, we extract $(X, Y)$ pairs from all 1,540 MVTS instances, where the length of each input example $X$ is $t_{obs} = 40$, the length of each output $Y$ is $t_{pred} = 20$, and each timestamp vector is 25-dimensional. In the encoder step, the input is of size $(b, 40, 25)$, where $b (= 10)$ is the batch size of the MVTS instances. For each encoder LSTM cell, the vector of each time step is used as the input to the cell together with the previous cell state $c$ and hidden state $h$, and the process repeats until the last cell state $c$ and hidden state $h$ are generated. The decoder LSTM cell uses the last cell state $c$ and hidden state $h$ from the encoder as its initial states. The last hidden state of the encoder is also copied 20 times using the Repeat Vector layer, and each copy is input into the decoder LSTM cell together with the previous cell state $c$ and hidden state $h$. The decoder outputs a hidden state for each of the 20 time steps, and these hidden states are connected to a time-distributed dense layer to generate the final forecasting output, which is of size $(b, 20, 25)$. We used MAE, MSE, and RMSE to report our model performance, and we report the mean and standard deviation of these measures in Table 1. As Table 1 shows, our deep sequence-to-sequence learning approach based on batch normalization and the LSTM network significantly outperformed the baseline methods. It is visible that batch normalization makes a difference by a large margin, producing errors near 0, whereas the traditional seq2seq models result in large error values due to the absence of batch normalization.

Table 1: Forecasting performance of the batch normalization-based seq2seq (LSTM) model compared to the baselines

| Performance Metrics | Gradient Descent LSTM | Gradient Descent GRU | Gradient Descent RNN | BN seq2seq LSTM |
|---------------------|-----------------------|----------------------|----------------------|-----------------|
| Train MAE           | 14.481 ± 0.043        | 14.942 ± 0.052       | 15.578 ± 0.036       | 0.094 ± 0.002   |
| Test MAE            | 14.55 ± 0.103         | 15.042 ± 0.092       | 15.68 ± 0.107        | 0.057 ± 0.010   |
| Train MSE           | 18.238 ± 0.075        | 19.631 ± 0.062       | 21.269 ± 0.031       | 0.070 ± 0.003   |
| Test MSE            | 22.598 ± 0.251        | 24.906 ± 0.821       | 24.589 ± 0.726       | 0.002 ± 0.001   |
| Train RMSE          | 18.434 ± 0.039        | 19.126 ± 0.652       | 19.921 ± 0.821       | 0.265 ± 0.007   |
| Test RMSE           | 18.492 ± 0.348        | 19.245 ± 0.542       | 20.092 ± 0.672       | 0.005 ± 0.001   |
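Continuing the earlier sketches (the `model` from Section 3.2, the metric helpers from Section 3.3, and the splits from Section 3.5), training and evaluation can be expressed as follows; the epoch count for the BN model is not stated in the paper and is assumed here to match the baselines:

```python
# Train the BN seq2seq LSTM on the z-normalized training split.
# Batch size b = 10 is stated in the paper; epochs = 5 is an assumption.
model.fit(X_train, Y_train, batch_size=10, epochs=5,
          validation_data=(X_test, Y_test))

Y_hat = model.predict(X_test)  # forecast windows of shape (308, 20, 25)
print("Test MAE :", mae(Y_test, Y_hat))
print("Test MSE :", mse(Y_test, Y_hat))
print("Test RMSE:", rmse(Y_test, Y_hat))
```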
4. Conclusion

We propose a batch normalization-based deep seq2seq model for multivariate time series forecasting of the magnetic field parameters of solar events. Unlike previous work on MVTS-based event classification, we perform forecasting of the magnetic field parameter values irrespective of the MVTS labels. We compare our model with other seq2seq implementations based on LSTM, GRU, and RNN. Our proposed approach significantly improved the MAE, MSE, and RMSE results of MVTS forecasting on a benchmark solar magnetic field parameter dataset.

For future research, we plan to develop machine learning models for MVTS forecasting that leverage MVTS labels. We aim to use the forecasting models for augmenting (creating synthetic examples of) MVTS instances of minority classes (rare events). In addition, to utilize inter-variable dependencies of the MVTS instances for the forecasting task, we plan to incorporate graph construction (e.g., functional network computation from the correlation matrices of the MVTS instances) and graph neural network (GNN)-based representation learning.

Acknowledgments

This project has been supported in part by funding from the CISE and GEO directorates under NSF awards #2153379 and #2204363.

References

[1] M. G. Bobra, S. Couvidat, Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm, The Astrophysical Journal 798 (2015) 135.
[2] J. P. Eastwood, E. Biffis, M. A. Hapgood, L. Green, M. M. Bisi, R. D. Bentley, R. Wicks, L.-A. McKinnell, M. Gibbs, C. Burnett, The economic impact of space weather: Where do we stand?, Risk Analysis 37 (2017) 206–218.
[3] R. A. Angryk, P. C. Martens, B. Aydin, D. Kempton, S. S. Mahajan, S. Basodi, A. Ahmadzadeh, X. Cai, S. Filali Boubrahimi, S. M. Hamdi, et al., Multivariate time series dataset for space weather data analytics, Scientific Data 7 (2020) 1–13.
[4] S. M. Hamdi, D. Kempton, R. Ma, S. F. Boubrahimi, R. A. Angryk, A time series classification-based approach for solar flare prediction, in: 2017 IEEE Intl. Conf. on Big Data (Big Data), IEEE, 2017, pp. 2543–2551.
[5] A. Muzaheed, S. M. Hamdi, S. F. Boubrahimi, Sequence model-based end-to-end solar flare classification from multivariate time series data, in: 20th IEEE Intl. Conf. on Machine Learning and Applications (ICMLA 2021), Pasadena, CA, USA, December 13-16, 2021, IEEE, 2021, pp. 435–440.
[6] R. Ma, S. F. Boubrahimi, S. M. Hamdi, R. A. Angryk, Solar flare prediction using multivariate time series decision trees, in: 2017 IEEE Intl. Conf. on Big Data (Big Data 2017), Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2569–2578.
[7] S. F. Boubrahimi, B. Aydin, D. Kempton, R. Angryk, Spatio-temporal interpolation methods for solar events metadata, in: 2016 IEEE Intl. Conf. on Big Data (Big Data), IEEE, 2016, pp. 3149–3157.
[8] J. P. Mason, J. Hoeksema, Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms, The Astrophysical Journal 723 (2010) 634.
[9] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[11] M. Yousefi-Azar, L. Hamey, Text summarization using unsupervised deep learning, Expert Systems with Applications 68 (2017) 93–105.
[12] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are unsupervised multitask learners, OpenAI Blog 1 (2019) 9.
[13] L. Liu, A. M. Finch, M. Utiyama, E. Sumita, Agreement on target-bidirectional LSTMs for sequence-to-sequence learning, in: Proc. of the AAAI Conf. on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, AAAI Press, 2016, pp. 2630–2637.
[14] P. H. Scherrer, J. Schou, R. Bush, A. Kosovichev, R. Bogart, J. Hoeksema, Y. Liu, T. Duvall, J. Zhao, C. Schrijver, et al., The Helioseismic and Magnetic Imager (HMI) investigation for the Solar Dynamics Observatory (SDO), Solar Physics 275 (2012) 207–227.
[15] H. Margarit, R. Subramaniam, A batch-normalized recurrent network for sentiment classification, Advances in Neural Information Processing Systems (2016) 2–8.
[16] J. Brownlee, Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning, Machine Learning Mastery, 2017.
[17] S. Santurkar, D. Tsipras, A. Ilyas, A. Madry, How does batch normalization help optimization?, Advances in Neural Information Processing Systems 31 (2018).