<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Solar flare prediction with temporal convolutional networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>D. Krynauw</string-name>
          <email>dewaldkrynauw123@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>S. Lotz</string-name>
          <email>slotz@sansa.org.za</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Artificial Intelligence Research (CAIR)</institution>
          ,
          <country country="ZA">South Africa</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Multilingual Speech Technologies (MuST), North-West University</institution>
          ,
          <country country="ZA">South Africa</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>South African National Space Agency (SANSA), Space Science directorate</institution>
          ,
          <addr-line>Hermanus</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        Sequences are typically modelled with recurrent architectures, but a growing
body of research finds that convolutional architectures also work well for sequence
modelling [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We explore the performance of Temporal Convolutional Networks
(TCNs) when applied to an important sequence modelling task: solar flare
prediction. We take this approach, as our future goal is to apply techniques developed
for probing and interpreting general convolutional neural networks (CNNs) to
solar flare prediction.
      </p>
      <p>
        Severe space weather events originate near sunspots and are caused by
solar flares (broadband bursts of electromagnetic energy) and the accompanying
coronal mass ejections (plumes of magnetised gas projected outwards into space).
These space weather phenomena can damage spacecraft, communications and
electric power systems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We follow Liu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] in trying to predict future
flares from past observations of the sun, and specifically from various images
and magnetograms of identified active regions (ARs), and parameters derived
from these. This is framed as a binary classification task that asks: will an AR
produce a Υ-class flare within the next 24 hours? In the current work we focus
on ≥M5.0-class flares. These are potentially more harmful, but also easier to
predict, than lower-class flares.
      </p>
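      <p>
        As an illustration of this labelling step (a sketch under assumed conventions,
not the authors' exact pipeline): a GOES flare class string encodes peak soft X-ray
flux, so a sample is positive when any flare in the following 24 hours reaches the
M5.0 flux level. The names below (GOES_SCALE, flare_flux, label_sample) are ours.
      </p>
      <preformat>
```python
# Illustrative sketch of the binary labelling described above; GOES_SCALE,
# flare_flux and label_sample are our own hypothetical names. A GOES class
# string such as 'M5.3' encodes peak X-ray flux: M5.3 -> 5.3e-5 W/m^2.
GOES_SCALE = {"A": 1e-8, "B": 1e-7, "C": 1e-6, "M": 1e-5, "X": 1e-4}

def flare_flux(flare_class):
    """Convert a GOES class string such as 'M5.3' to peak flux in W/m^2."""
    return GOES_SCALE[flare_class[0]] * float(flare_class[1:])

def label_sample(flares_next_24h, threshold="M5.0"):
    """Return 1 if any flare in the next 24 hours reaches the threshold class."""
    limit = flare_flux(threshold)
    return int(any(flare_flux(f) >= limit for f in flares_next_24h))
```
      </preformat>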
      <p>
        The dataset used in this work is open source and compiled by Liu et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
It consists of the Space Weather HMI Active Region Patches (SHARP) data
produced by the Helioseismic and Magnetic Imager (HMI) on Solar Dynamics
Observatory (SDO) and an additional 15 parameters from Jonas et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]; plus
another 9 from Nishizuka et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] related to the flaring history. Due to the
unbalanced nature of the task, the True Skill Statistic (TSS) metric is most commonly
used to determine the effectiveness of a model, as suggested by Bloomfield et
al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. As of this writing, Liu et al. have produced the best result on this dataset: a
test TSS of 0.858 for a vanilla Long Short-Term Memory (LSTM) network, and 0.877 for
an LSTM extended with additional attention layers and fully connected layers.
      </p>
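      <p>
        The TSS is recall minus the false-alarm rate; a minimal implementation
(the function name is ours) is:
      </p>
      <preformat>
```python
# Minimal implementation of the True Skill Statistic:
#   TSS = TP / (TP + FN) - FP / (FP + TN)
# i.e. recall minus false-alarm rate. It ranges from -1 to 1 (1 = perfect,
# 0 = no skill) and, unlike accuracy, is insensitive to class imbalance.
def true_skill_statistic(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn) - fp / (fp + tn)
```
      </preformat>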
      <p>
        We replicate the LSTM only as a sanity check of the results from Liu et al.,
as our focus is on developing and optimising a TCN [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to determine whether
similar performance is achievable. A vanilla LSTM was trained and evaluated
with different numbers of layers (1, 5, 10), and varying batch sizes and learning
rates. After optimising on the training and validation set, TSS was measured
on the test set, averaging over 3 seeds. After basic optimisation, a test TSS of
0.850 was achieved, which is close to that achieved by Liu et al. No
dropout or weight decay was used.</p>
      <p>
        A TCN is in essence a 1D Fully Convolutional Network (FCN) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] with dilated
causal convolutions. This network architecture is not new and is based on the
time-delay neural network published 30 years ago by Waibel et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], with
the addition of zero-padding to ensure all layers are of equal size. There are
essentially two ways to increase the receptive field of the TCN: increasing the
number of levels (that is, the number of residual blocks [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), or the kernel size.
      </p>
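      <p>
        This trade-off can be made concrete. Assuming the configuration of Bai et al.'s
reference implementation (two dilated causal convolutions per residual block, with the
dilation doubling at each level), the receptive field grows as follows:
      </p>
      <preformat>
```python
# Receptive field of a TCN, assuming the configuration of Bai et al.'s
# reference implementation: each level is one residual block containing two
# dilated causal convolutions of kernel size k, with the dilation doubling
# per level (1, 2, 4, ...). A convolution at dilation d widens the receptive
# field by (k - 1) * d time steps.
def tcn_receptive_field(levels, kernel_size):
    field = 1
    for level in range(levels):
        dilation = 2 ** level
        field += 2 * (kernel_size - 1) * dilation  # two convolutions per block
    return field
```
      </preformat>
      <p>
        Under these assumptions, 7 levels with kernel size 2 span 255 time steps,
while a single level spans only 3.
      </p>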
      <p>
        We implement the TCN using publicly available source code
(https://github.com/locuslab/TCN), in PyTorch. All models are trained with a weighted cross-entropy loss
function to combat the unbalanced data, and Adam [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is used as the optimiser.
After a set of initial probing runs to determine well-performing network
hyperparameters, a more in-depth optimisation of the TCN was conducted by
searching over a wider range of learning rates. The results are logged and graphed using
the “Weights &amp; Biases” API, and a full report of the results is available.
      </p>
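      <p>
        One common weighting scheme (an assumption on our part; the exact weights
used are not specified here) sets each class weight inversely proportional to its
frequency, with the resulting values passed to PyTorch's
torch.nn.CrossEntropyLoss(weight=...):
      </p>
      <preformat>
```python
# One common scheme (our assumption, not necessarily the exact weights used
# here): weight each class inversely to its frequency, so the rare positive
# class contributes as much to the loss as the abundant negative class. The
# resulting list can be passed to torch.nn.CrossEntropyLoss(weight=...).
from collections import Counter

def inverse_frequency_weights(labels, num_classes=2):
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]
```
      </preformat>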
      <p>A grid search over different levels, kernel sizes and hidden dimensions
(channels) was performed, and the two best networks on the validation set were selected
for further learning-rate refinement. At first, a hidden dimension of 128 was selected,
but it showed no significant improvement and was reduced to the number of input
features (20). Initially, the results suggested that the more levels the TCN has, the
better it performs, but after optimising over different learning rates, the same TSS
can be achieved with shallower networks using smaller learning rates and longer
training.</p>
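      <p>
        The selection step can be sketched as follows, where evaluate is a hypothetical
stand-in for a full training-and-validation run that returns validation TSS for a
configuration:
      </p>
      <preformat>
```python
# Sketch of the selection step described above: enumerate every combination
# and keep the two configurations scoring the highest validation TSS.
# `evaluate` is a hypothetical stand-in for a full training-and-validation run.
import itertools

def top_two_configs(evaluate, levels, kernel_sizes, hidden_dims):
    configs = itertools.product(levels, kernel_sizes, hidden_dims)
    return sorted(configs, key=evaluate, reverse=True)[:2]
```
      </preformat>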
      <p>The TCN was able to reach average validation TSS of 0.838 and average
test TSS of 0.848 with 7 levels and a kernel size of 2 (with dropout and weight
decay). A TCN with many levels comes at a large computational cost relative
to the LSTM. The best-performing TCN took 1 hour to train, compared to 10
minutes for the LSTM, using the same hardware. Optimising the TCN further
(using smaller learning rates, training longer and no regularisation), the shallow
networks of 1 level were able to obtain an average validation TSS of 0.711 and
test TSS of 0.886, with some individual networks reaching up to 0.910 test TSS.
These 1-level TCNs average around 17 minutes of training time, which is more
comparable to (though still slower than) the vanilla LSTM.</p>
      <p>
        We applied TCNs to solar flare prediction, an architecture that, to our
knowledge, has not yet been used for this task. Results indicate that TCNs perform on
par with the LSTMs used by Liu et al., which are currently considered state of
the art. This is important as we are specifically interested in developing models
that can be probed and interpreted, and LSTMs are very difficult to analyse.
Our work confirms the statement by Bai et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], predicting that TCNs should
perform similarly to vanilla LSTMs.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolter</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koltun</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling</article-title>
          (
          <year>2018</year>
          ), http://arxiv.org/abs/1803.01271
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>D.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daly</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daglis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kappenman</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panasyuk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Effects of Space Weather on Technology Infrastructure</article-title>
          .
          <source>Space Weather</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ) (
          <year>2004</year>
          ). https://doi.org/10.1029/2003SW000044
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bloomfield</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Higgins</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McAteer</surname>
            ,
            <given-names>R.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gallagher</surname>
            ,
            <given-names>P.T.</given-names>
          </string-name>
          :
          <article-title>Toward reliable benchmarking of solar flare forecasting methods</article-title>
          .
          <source>Astrophysical Journal Letters</source>
          <volume>747</volume>
          (
          <issue>2</issue>
          ),
          <fpage>L41</fpage>
          (
          <year>2012</year>
          ). https://doi.org/10.1088/2041-8205/747/2/L41
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deep residual learning for image recognition</article-title>
          . In:
          <source>Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition</source>
          (
          <year>2016</year>
          ). https://doi.org/10.1109/CVPR.2016.90
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Jonas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bobra</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shankar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Todd Hoeksema</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recht</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Flare Prediction Using Photospheric and Coronal Image Data</article-title>
          .
          <source>Solar Physics</source>
          <volume>293</volume>
          (
          <issue>3</issue>
          ),
          <fpage>48</fpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1007/s11207-018-1258-9
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          (
          <year>2014</year>
          ), http://arxiv.org/abs/1412.6980
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.T.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Predicting Solar Flares Using a Long Short-term Memory Network</article-title>
          .
          <source>The Astrophysical Journal</source>
          <volume>877</volume>
          (
          <issue>2</issue>
          ),
          <fpage>121</fpage>
          (
          <year>2019</year>
          ). https://doi.org/10.3847/1538-4357/ab1b3c
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nishizuka</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiura</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Den</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watari</surname>
            ,
            <given-names>S.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ishii</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Solar Flare Prediction Using Machine Learning with Multiwavelength Observations</article-title>
          .
          <source>Proceedings of the International Astronomical Union</source>
          <volume>13</volume>
          (
          <issue>S335</issue>
          ),
          <fpage>310</fpage>
          -
          <lpage>313</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1017/S1743921317007293
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Shelhamer</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darrell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fully Convolutional Networks for Semantic Segmentation</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          (
          <issue>4</issue>
          ),
          <fpage>640</fpage>
          -
          <lpage>651</lpage>
          (
          <year>2017</year>
          ). https://doi.org/10.1109/TPAMI.2016.2572683, http://ieeexplore.ieee.org/document/7478072/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Waibel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanazawa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shikano</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          :
          <article-title>Phoneme Recognition Using Time-Delay Neural Networks</article-title>
          .
          <source>IEEE Transactions on Acoustics, Speech, and Signal Processing</source>
          (
          <year>1989</year>
          ). https://doi.org/10.1109/29.21701
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>