=Paper=
{{Paper
|id=Vol-3375/paper4
|storemode=property
|title=Forecasting Multivariate Time Series of the Magnetic Field Parameters of the Solar Events
|pdfUrl=https://ceur-ws.org/Vol-3375/paper4.pdf
|volume=Vol-3375
|authors=Khaznah Alshammari,Shah Muhammad Hamdi,Ali Ahsan Muhummad Muzaheed,Soukaina Filali Boubrahimi
|dblpUrl=https://dblp.org/rec/conf/amlts/AlshammariHMB22
}}
==Forecasting Multivariate Time Series of the Magnetic Field Parameters of the Solar Events==
Khaznah Alshammari¹·*, Shah Muhammad Hamdi², Ali Ahsan Muhummad Muzaheed¹ and Soukaina Filali Boubrahimi²
¹ New Mexico State University, Las Cruces, NM, 88003, USA
² Utah State University, Logan, UT, 84322, USA

AMLTS'22: Workshop on Applied Machine Learning Methods for Time Series Forecasting, co-located with the 31st ACM International Conference on Information and Knowledge Management (CIKM), October 17-21, 2022, Atlanta, USA
kalshamm@nmsu.edu (K. Alshammari); s.hamdi@usu.edu (S. M. Hamdi); muzaheed@nmsu.edu (A. A. M. Muzaheed); soukaina.boubrahimi@usu.edu (S. F. Boubrahimi)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
Abstract
Solar magnetic field parameters are frequently used by solar physicists in analyzing and predicting solar events (e.g., flares, coronal mass ejections, etc.). Temporal observation of the magnetic field parameters, i.e., their multivariate time series (MVTS) representation, facilitates finding relationships of magnetic field states to the occurrence of solar events. Forecasting MVTS of solar magnetic field parameters is the prediction of future magnetic field parameter values based on historical values, regardless of the event labels. In this paper, we propose a deep sequence-to-sequence (seq2seq) learning approach based on batch normalization and a Long Short-Term Memory (LSTM) network for MVTS forecasting of magnetic field parameters of solar events. To the best of our knowledge, this is the first work that addresses the forecasting of magnetic field parameters rather than the classification of events based on MVTS representations of those parameters. The experimental results on a real-life MVTS-based solar event dataset demonstrate that our batch normalization-based model outperforms naive sequence models in forecasting performance.
Keywords
Multivariate Time Series Forecasting, Solar Physics, Solar Magnetic Field Parameters, LSTM, Batch Normalization
1. Introduction

Solar events are characterized by magnetic field parameter values on the solar corona, such as helicity, flux, and Lorentz force. These magnetic field parameter values indicate the occurrence of extreme solar events such as solar flares, coronal mass ejections (CMEs), and eruptions of solar energetic particles (SEPs) [1]. These events are caused by a sudden burst of magnetic flux from the corona. The X-ray radiation of such extreme solar events can have devastating effects on life and infrastructure in space and on the ground, such as disruption of GPS and radio communication, damage to electronic devices, and radiation exposure-based health risks to astronauts. The cost associated with infrastructure damage after extreme solar events can rise up to trillions of dollars [2].

In recent years, the prediction of solar events within a predefined time window has become an important challenge in the heliophysics community. Since the theoretical relationship between magnetic field influx and the occurrence of extreme events in solar active regions (ARs) is not yet established, space weather researchers depend on data science-based approaches for predicting solar events. The primary data source used in these efforts is the images captured by the Helioseismic and Magnetic Imager (HMI) housed in the Solar Dynamics Observatory (SDO). HMI images (captured in near-continuous time) contain spatiotemporal magnetic field data of solar active regions. For performing temporal window-based flare prediction of an AR instance, the spatiotemporal magnetic field data of that region is mapped into a multivariate time series (MVTS) instance [3]. MVTS instances, collected with a uniform sampling rate throughout a preset observation period, are labeled with multiple event classes (e.g., flare classes), and machine learning-based classifiers are trained with labeled MVTS instances to predict the occurrences of the events after a preset prediction window. Although multiple research efforts [4, 5, 6] addressed MVTS-based solar event prediction, forecasting of MVTS-represented magnetic field parameters is yet to be explored.

In this work, we aim to forecast the future values of the magnetic field parameters, given past values in the MVTS representations. In case of a sudden data gap, i.e., an interruption in the communication between the satellite and the ground receiver, MVTS forecasting of magnetic field parameters can play an important role in extrapolation. To the best of our knowledge, this is the first attempt to forecast the solar magnetic field parameters. We used a deep sequence-to-sequence learning model based on batch normalization and a Long Short-Term Memory (LSTM) network that is trained with input-output pairs of examples, where the inputs are formed by sampling the MVTS instances for an observation window, and the outputs are formed by sampling the MVTS instances for a prediction window (which follows the observation window). Our LSTM-based encoder-decoder model is trained with a backpropagation algorithm based on mini-batch gradient descent optimization, minimizing the Mean Squared Error (MSE) between the observed MVTS (input) and the predicted MVTS (output).
2. Related Work

Recent research efforts on solar event prediction are mostly based on data science. Data-driven extreme solar event prediction models stem from linear and nonlinear statistics. Datasets used in these models were collected from line-of-sight magnetogram and vector magnetogram data. A line-of-sight magnetogram contains only the line-of-sight component of the magnetic field, while a vector magnetogram contains the full-disk magnetic field data [7]. NASA launched the Solar Dynamics Observatory (SDO) in 2010. Since then, SDO's instrument, the Helioseismic and Magnetic Imager (HMI), has been mapping the full-disk vector magnetic field every 12 minutes [1]. Most of the recent prediction models use the near-continuous stream of vector magnetogram data obtained from SDO [8]. Magnetic field parameters (e.g., helicity, flux, etc.) were developed with the goal of finding a relationship between the photospheric magnetic field behavior and solar activity, which usually occurs in the solar chromosphere and the transition region of the solar corona.

Deep learning-based sequence-to-sequence models using Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), and Gated Recurrent Unit (GRU) architectures have been used successfully in multiple Natural Language Processing (NLP) tasks such as machine translation [9, 10] and text summarization [11, 12]. Since multivariate time series are high-dimensional sequence data, MVTS forecasting has previously been addressed by different seq2seq models [13, 14]. In [15], batch normalization has shown promising improvements in the sentiment classification task, where a batch-normalized variant of the LSTM architecture is used and each LSTM cell's input, hidden state, and cell state are normalized during training. Inspired by encoder-decoder-based machine translation models, in this work we consider the MVTS forecasting of solar magnetic field parameters as a sequence-to-sequence learning task and use a batch normalization-based LSTM architecture for capturing long-term dependencies of multi-dimensional sequence data.

3. Methodology

3.1. Notations

Each solar active region that results in different event occurrences after a given prediction window represents an event instance. The event instance $i$ is represented by an MVTS instance $mvts_i$. The MVTS instance $mvts_i \in \mathbb{R}^{T \times N}$ is a collection of individual time series of $N$ magnetic field parameters, where each time series contains periodic observation values of the corresponding parameter for an observation period $T$. In the MVTS instance $mvts_i = \{v_{t_1}, v_{t_2}, \ldots, v_{t_T}\}$, each $v_{t_j} \in \mathbb{R}^N$ represents a timestamp vector. We divide the dataset into $(X_i, Y_i)$ pairs, where $X_i = mvts_i[t_1 : t_{obs}, :] \in \mathbb{R}^{t_{obs} \times N}$, $Y_i = mvts_i[t_{obs}+1 : t_T, :] \in \mathbb{R}^{t_{pred} \times N}$, $t_{obs}$ is the observation time, and $t_{pred}$ is the prediction time.
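To make the notation concrete, below is a minimal NumPy sketch of the $(X_i, Y_i)$ pair extraction (our own illustration; the variable names and the random placeholder data are not from the original code):

    import numpy as np

    T, N = 60, 25                     # observation period and number of parameters
    t_obs, t_pred = 40, 20            # T = t_obs + t_pred
    mvts_i = np.random.randn(T, N)    # placeholder for one real MVTS instance

    X_i = mvts_i[:t_obs, :]           # observation window, shape (40, 25)
    Y_i = mvts_i[t_obs:, :]           # prediction window, shape (20, 25)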
3.2. LSTM and Batch Normalization-based MVTS Forecasting

In this section, we present a batch normalization-based implementation of the encoder-decoder model that uses the LSTM architecture and compare it with baseline sequence models that use a naive stochastic gradient descent implementation (without batch normalization). There are different deep sequence learning models that are frequently applied in machine translation, and they can be adapted for time series forecasting. In this study, we analyze two kinds of seq2seq models, the batch normalization-based seq2seq LSTM model (BN seq2seq LSTM) and the seq2seq models based on LSTM/GRU/RNN, and compare their forecasting results.

Figure 1: LSTM and Batch normalization-based seq2seq model for MVTS forecasting

Fig. 1 depicts our seq2seq-based model that uses batch normalization and the LSTM architecture. First, in the encoder, the value of each time step is used as input to the encoder LSTM cell together with the previous cell state $c$ and hidden state $h$; the process repeats until the last cell state $c$ and hidden state $h$ are generated. Then, the decoder LSTM cell uses the last cell state $c$ and hidden state $h$ from the encoder as its initial states. The last hidden state of the encoder is also copied $t_{pred}$ times using a Repeat Vector layer according to the length of the forecasting window, and each copy is inputted into the decoder LSTM cell together with the previous cell state $c$ and hidden state $h$. The decoder outputs hidden states for all the $t_{pred}$ time steps, and the hidden states are connected to the final time-distributed dense layer in order to produce the final output sequence. The time-distributed dense layer applies a dense layer to every temporal slice of the input. We use this final layer to process the output from the LSTM hidden layer. Every input shape is three-dimensional, and the first dimension of the input is considered to be the temporal dimension. This means that we need to configure the last LSTM layer prior to the time-distributed dense layer to return output sequences. The output shape will be three-dimensional as well, which means that if the time-distributed dense layer is the output layer, then for predicting a sequence we need to reshape the final representation into a three-dimensional shape [16]. In the batch normalization-based seq2seq LSTM model, we use mini-batches to feed the data into the model. Batch normalization is a useful method for making deep neural network training faster and more robust; it normalizes the input activations to avoid gradient explosion caused by the ELU (Exponential Linear Unit) activation function in the encoder [17]. The batch normalization layer applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. We found batch normalization to be significant in maximizing the performance of MVTS forecasting for the magnetic field parameters of the solar events, which we demonstrate in more detail in the experiments section.
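A minimal Keras sketch of this encoder-decoder is given below. It is our own reconstruction from the description above (the placement of the BatchNormalization layers and the 25-unit state size are assumptions; the authors' exact configuration is in their repository):

    from tensorflow.keras.layers import (LSTM, BatchNormalization, Dense,
                                         Input, RepeatVector, TimeDistributed)
    from tensorflow.keras.models import Model

    t_obs, t_pred, n_params = 40, 20, 25

    inputs = Input(shape=(t_obs, n_params))
    x = BatchNormalization()(inputs)              # normalize encoder inputs
    # Encoder: keep only the final hidden state h and cell state c.
    _, h, c = LSTM(25, activation="elu", return_state=True)(x)
    # Copy the last hidden state t_pred times for the decoder.
    x = RepeatVector(t_pred)(h)
    # Decoder: initialized with the encoder's (h, c); returns all t_pred states.
    x = LSTM(25, activation="elu", return_sequences=True)(x, initial_state=[h, c])
    x = BatchNormalization()(x)
    # Map each decoder hidden state to a 25-dimensional timestamp vector.
    outputs = TimeDistributed(Dense(n_params))(x)

    model = Model(inputs, outputs)

Setting return_sequences=True on the decoder is what lets the time-distributed dense layer receive one hidden state per forecast step, as described above.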
3.3. Evaluation Metrics

We used Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) to report our model results. These evaluation metrics measure the amount of error in statistical models by quantifying the difference between the observed and predicted values.

Mean Absolute Error (MAE) is the average of the absolute values of the differences between the predicted representations and the ground truth representations:

$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

where $y_i$ is the ground truth value and $\hat{y}_i$ is the predicted value.

Mean Squared Error (MSE) is defined as the mean of the squares of the differences between the actual and predicted values:

$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

Root Mean Squared Error (RMSE) measures the difference between forecast and corresponding observed values, where each difference is squared and averaged over the sample space. It is the square root of the MSE:

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
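For concreteness, a small NumPy sketch of these three metrics (our own illustration) follows:

    import numpy as np

    def mae(y_true, y_pred):
        # Mean of the absolute differences.
        return np.mean(np.abs(y_true - y_pred))

    def mse(y_true, y_pred):
        # Mean of the squared differences.
        return np.mean((y_true - y_pred) ** 2)

    def rmse(y_true, y_pred):
        # Square root of the MSE.
        return np.sqrt(mse(y_true, y_pred))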
Experiments

We compared the batch normalization-based seq2seq LSTM model with the baseline models on multivariate time series forecasting of magnetic field parameters of a solar events dataset. The source code of our model and the experimental dataset are available in our GitHub repository: https://github.com/Kalshammari/BN_Seq2Seq.

3.4. Dataset Description

As the benchmark dataset of our experiments, we used the MVTS-based solar flare prediction dataset published by Angryk et al. [3]. Each MVTS instance in the dataset is made up of 25 time series of active region magnetic field parameters (a full list can be found in [1]). The time series instances are recorded at 12-minute intervals for a total duration of 12 hours (60 time steps). The dataset therefore has the number of observation points $T = 60$ and the number of dimensions in the timestamp vectors $N = 25$, while the event occurrence window is 12 hours. Our experimental dataset consists of 1,540 MVTS instances that are evenly distributed across four flare classes (X, M, BC, and Q). We discarded the class labels to fit the dataset for MVTS forecasting [5, 4], where each MVTS instance is divided into input and output (ground truth) sequences according to the observation window ($t_{obs}$) and prediction window ($t_{pred}$). In our experiments, $t_{obs} = 40$ and $t_{pred} = 20$, while $T = t_{obs} + t_{pred}$.
Table 1
Forecasting Performance of Batch Normalization-based seq2seq (LSTM) Model compared to the baselines

Performance Metric | Gradient Descent LSTM | Gradient Descent GRU | Gradient Descent RNN | BN seq2seq LSTM
Train MAE          | 14.481 ± 0.043        | 14.942 ± 0.052       | 15.578 ± 0.036       | 0.094 ± 0.002
Test MAE           | 14.55 ± 0.103         | 15.042 ± 0.092       | 15.68 ± 0.107        | 0.057 ± 0.010
Train MSE          | 18.238 ± 0.075        | 19.631 ± 0.062       | 21.269 ± 0.031       | 0.070 ± 0.003
Test MSE           | 22.598 ± 0.251        | 24.906 ± 0.821       | 24.589 ± 0.726       | 0.002 ± 0.001
Train RMSE         | 18.434 ± 0.039        | 19.126 ± 0.652       | 19.921 ± 0.821       | 0.265 ± 0.007
Test RMSE          | 18.492 ± 0.348        | 19.245 ± 0.542       | 20.092 ± 0.672       | 0.005 ± 0.001
3.5. Train/test splitting method

We performed random sampling for train/test splitting, where we use the stratified holdout method (80% for training and 20% for testing) with six different random seeds, and report the mean error rates along with standard deviations. The train and test datasets are z-normalized since the magnetic field parameter values appear on different scales. The shapes of the train and test datasets are as follows (a sketch of this splitting step is given after the list):

• X_train shape: (1232, 40, 25) and y_train shape: (1232, 20, 25)
• X_test shape: (308, 40, 25) and y_test shape: (308, 20, 25)
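A hedged sketch of this step (our own illustration; scikit-learn's train_test_split is one way to realize the stratified holdout, and normalizing with training-set statistics is our assumption):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.standard_normal((1540, 40, 25))      # placeholder observation windows
    Y = rng.standard_normal((1540, 20, 25))      # placeholder prediction windows
    labels = rng.integers(0, 4, size=1540)       # placeholder flare-class labels
    seed = 0                                     # one of the six random seeds

    # 80/20 holdout, stratified on the (otherwise discarded) flare classes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.2, stratify=labels, random_state=seed)

    # z-normalize each magnetic field parameter.
    mean = X_train.mean(axis=(0, 1), keepdims=True)
    std = X_train.std(axis=(0, 1), keepdims=True)
    X_train, X_test = (X_train - mean) / std, (X_test - mean) / std
    y_train, y_test = (y_train - mean) / std, (y_test - mean) / std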
3.6. Baseline Models

We evaluated our model against LSTM, RNN, and GRU-based seq2seq implementations. In the forward pass, we input the first $t_{obs}$ vectors of each MVTS to the encoder cells (LSTM/RNN/GRU) to produce the encoded hidden state. That encoded hidden state is the input to the decoder cells of the same type. The decoder then predicts the next 25-dimensional timestamp vectors for each timestamp in $t_{pred}$ and matches the prediction with the ground truth to perform stochastic gradient descent-based backpropagation. In all three models, the number of dimensions in the cell state and hidden state representations was 25, the number of epochs in training was 5, and the learning rate in stochastic gradient descent was 0.01. A sketch of one such baseline follows.
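The following is a minimal sketch of the GRU baseline under the description above (our own illustration, without batch normalization; the LSTM and RNN baselines differ only in the cell type):

    from tensorflow.keras.layers import (GRU, Dense, Input, RepeatVector,
                                         TimeDistributed)
    from tensorflow.keras.models import Model

    inputs = Input(shape=(40, 25))
    # Encoder GRU: keep only the final hidden state (a GRU has no cell state).
    _, h = GRU(25, return_state=True)(inputs)
    # Decoder GRU: initialized with the encoder state, one step per forecast point.
    x = GRU(25, return_sequences=True)(RepeatVector(20)(h), initial_state=h)
    outputs = TimeDistributed(Dense(25))(x)

    baseline = Model(inputs, outputs)
    baseline.compile(optimizer="sgd", loss="mse")   # naive SGD, as in the text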
3.7. Performance of LSTM and Batch Normalization-based seq2seq model

When we apply the LSTM and batch normalization-based seq2seq model, we perform the following steps. First, we extract $(X, Y)$ pairs from all 1,540 MVTS instances, where the length of each example $X$ is $t_{obs} = 40$, the length of each output $Y$ is $t_{pred} = 20$, and each timestamp vector is 25-dimensional. In the encoder step, the input is of size $(n, 40, 25)$, where $n (= 10)$ is the batch size of the MVTS instances. For each encoder LSTM cell, the vector of each time step is used as the input to the encoder LSTM cell together with the previous cell state $c$ and hidden state $h$, and the process repeats until the last cell state $c$ and hidden state $h$ are generated. The decoder LSTM cell uses the last cell state $c$ and hidden state $h$ from the encoder as its initial states. The last hidden state of the encoder is also copied 20 times using the Repeat Vector layer, and each copy is inputted into the decoder LSTM cell together with the previous cell state $c$ and hidden state $h$. The decoder outputs a hidden state for each of the 20 time steps, and these hidden states are connected to a time-distributed dense layer to generate the final forecasting output, which is of size $(n, 20, 25)$. We used Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) to report our model performance results. We report the mean and standard deviation of the performance measures in Table 1. We found that our approach of deep sequence-to-sequence learning based on batch normalization and a Long Short-Term Memory (LSTM) network significantly outperformed the baseline methods, as Table 1 shows. It is visible that batch normalization makes a difference by a large margin, producing errors near 0, whereas the traditional seq2seq models result in large error values due to the absence of batch normalization.
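Putting the pieces together, a hedged end-to-end training and forecasting sketch (reusing model, X_train, y_train, X_test, and y_test from the earlier sketches; applying the batch size of 10 from this section, and the epoch and learning-rate settings reported for the baselines, to the BN model is our assumption):

    import numpy as np
    from tensorflow.keras.optimizers import SGD

    model.compile(optimizer=SGD(learning_rate=0.01), loss="mse", metrics=["mae"])
    model.fit(X_train, y_train, batch_size=10, epochs=5,
              validation_data=(X_test, y_test))

    # Forecast the next 20 timestamps for each of the 308 test MVTS instances.
    y_hat = model.predict(X_test)                   # shape (308, 20, 25)
    print("Test RMSE:", np.sqrt(np.mean((y_test - y_hat) ** 2)))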
4. Conclusion

We propose a batch normalization-based deep seq2seq model for multivariate time series forecasting of magnetic field parameters of solar events. Unlike previous works on MVTS-based event classification, we perform forecasting of magnetic field parameter values irrespective of MVTS labels. We compare our model with other seq2seq implementations based on LSTM, GRU, and RNN. Our proposed approach significantly improved the MAE, MSE, and RMSE results of MVTS forecasting on a benchmark solar magnetic field parameter dataset.

For future research, we plan to develop machine learning models for MVTS forecasting that leverage MVTS labels. We aim to use the forecasting models for augmenting (creating synthetic examples of) MVTS instances of minority classes (rare events). In addition, to utilize inter-variable dependencies of the MVTS instances for the task of forecasting, we plan to incorporate graph construction (e.g., functional network computation from the correlation matrices of the MVTS instances) and graph neural network (GNN)-based representation learning.

Acknowledgments

This project has been supported in part by funding from the CISE and GEO directorates under NSF awards #2153379 and #2204363.

References

[1] M. G. Bobra, S. Couvidat, Solar flare prediction using SDO/HMI vector magnetic field data with a machine-learning algorithm, The Astrophysical Journal 798 (2015) 135.
[2] J. P. Eastwood, E. Biffis, M. A. Hapgood, L. Green, M. M. Bisi, R. D. Bentley, R. Wicks, L.-A. McKinnell, M. Gibbs, C. Burnett, The economic impact of space weather: Where do we stand?, Risk Analysis 37 (2017) 206–218.
[3] R. A. Angryk, P. C. Martens, B. Aydin, D. Kempton, S. S. Mahajan, S. Basodi, A. Ahmadzadeh, X. Cai, S. Filali Boubrahimi, S. M. Hamdi, et al., Multivariate time series dataset for space weather data analytics, Scientific Data 7 (2020) 1–13.
[4] S. M. Hamdi, D. Kempton, R. Ma, S. F. Boubrahimi, R. A. Angryk, A time series classification-based approach for solar flare prediction, in: 2017 IEEE Intl. Conf. on Big Data (Big Data), IEEE, 2017, pp. 2543–2551.
[5] A. Muzaheed, S. M. Hamdi, S. F. Boubrahimi, Sequence model-based end-to-end solar flare classification from multivariate time series data, in: 20th IEEE Intl. Conf. on Machine Learning and Applications, ICMLA 2021, Pasadena, CA, USA, December 13-16, 2021, IEEE, 2021, pp. 435–440.
[6] R. Ma, S. F. Boubrahimi, S. M. Hamdi, R. A. Angryk, Solar flare prediction using multivariate time series decision trees, in: 2017 IEEE Intl. Conf. on Big Data, BigData 2017, Boston, MA, USA, December 11-14, 2017, IEEE Computer Society, 2017, pp. 2569–2578.
[7] S. F. Boubrahimi, B. Aydin, D. Kempton, R. Angryk, Spatio-temporal interpolation methods for solar events metadata, in: 2016 IEEE Intl. Conf. on Big Data (Big Data), IEEE, 2016, pp. 3149–3157.
[8] J. P. Mason, J. Hoeksema, Testing automated solar flare forecasting with 13 years of Michelson Doppler Imager magnetograms, The Astrophysical Journal 723 (2010) 634.
[9] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[10] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[11] M. Yousefi-Azar, L. Hamey, Text summarization using unsupervised deep learning, Expert Systems with Applications 68 (2017) 93–105.
[12] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are unsupervised multitask learners, OpenAI Blog 1 (2019) 9.
[13] L. Liu, A. M. Finch, M. Utiyama, E. Sumita, Agreement on target-bidirectional LSTMs for sequence-to-sequence learning, in: Proc. of the AAAI Conf. on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, AAAI Press, 2016, pp. 2630–2637.
[14] P. H. Scherrer, J. Schou, R. Bush, A. Kosovichev, R. Bogart, J. Hoeksema, Y. Liu, T. Duvall, J. Zhao, C. Schrijver, et al., The Helioseismic and Magnetic Imager (HMI) investigation for the Solar Dynamics Observatory (SDO), Solar Physics 275 (2012) 207–227.
[15] H. Margarit, R. Subramaniam, A batch-normalized recurrent network for sentiment classification, Advances in Neural Information Processing Systems (2016) 2–8.
[16] J. Brownlee, Long Short-Term Memory Networks With Python: Develop Sequence Prediction Models with Deep Learning, Machine Learning Mastery, 2017.
[17] S. Santurkar, D. Tsipras, A. Ilyas, A. Madry, How does batch normalization help optimization?, Advances in Neural Information Processing Systems 31 (2018).