<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting in Africa</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kinyua Gikunda</string-name>
          <email>patrick.gikunda@dkut.ac.ke</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Jouandeau</string-name>
          <email>n@up8.edu</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dedan Kimathi University of Technology</institution>
          ,
          <addr-line>Nyeri</addr-line>
          ,
          <country country="KE">Kenya</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>Weather forecasting in Africa is hampered by sparse meteorological data and limited computational resources. This paper addresses these challenges by proposing lightweight deep learning (DL) for weather prediction and forecasting. We integrate active learning and transfer learning methods to enhance model training efficiency and accuracy. By focusing on the informativeness and representativeness of training samples, our approach significantly reduces the need for extensive and costly labeling. After training on a source dataset, model skills are transferred to target datasets, allowing for effective weather variable predictions with minimal data. Extensive experiments on three weather datasets demonstrate that our hybrid Transfer Active Learning method achieves classification accuracy similar to existing methods while using only 20% of the training samples. This study highlights the potential of advanced DL techniques to improve weather forecasting in Africa, despite the constraints of data scarcity and limited computational infrastructure.</p>
      </abstract>
      <kwd-group>
        <kwd>Weather Forecasting</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Active Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://csit.dkut.ac.ke/departments/it/dr-kinyua-gikunda/ (K. Gikunda); https://n.up8.site/ (N. Jouandeau)</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)</p>
      <p>
        The non-linear behavior of meteorological data poses significant challenges for weather prediction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Non-parametric learners such as Gaussian kernels offer flexibility but are hindered by their reliance on
local generalization and the exponential growth of input dimensionality.
      </p>
      <p>Deep Learning (DL) methods address these challenges by stacking multiple feature learning layers to
form deep representations, enhancing both computational and statistical efficiency. Recent
advancements have improved the representation of inputs with fewer parameters, allowing for effective feature
learning using both labeled and unlabeled data. Transfer Learning (TL), a process within DL, leverages
learned features to apply knowledge from one domain to another related domain, improving learning
efficiency and effectiveness. This makes DL particularly suitable for complex and dynamic fields like
weather prediction.</p>
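As a concrete illustration of re-using knowledge across related domains, the sketch below initialises a target model from source-task weights and fine-tunes it on a small target set. The softmax classifier, data shapes, and learning rate are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(P, Y):
    # categorical cross-entropy over one-hot labels Y
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

# Weights learned on a related *source* task (random placeholders here).
W_source = rng.normal(size=(8, 3))            # 8 features -> 3 classes

# Transfer: initialise the target model from the source weights.
W_target = W_source.copy()

# Tiny labeled *target* set; fine-tune with plain gradient descent.
X = rng.normal(size=(32, 8))
y = rng.integers(0, 3, size=32)
Y = np.eye(3)[y]                              # one-hot labels

loss_before = ce_loss(softmax(X @ W_target), Y)
for _ in range(300):
    P = softmax(X @ W_target)
    W_target -= 0.1 * (X.T @ (P - Y)) / len(X)  # gradient of cross-entropy
loss_after = ce_loss(softmax(X @ W_target), Y)
```

In practice the transferred layers would come from a network trained on a large source dataset, with only the later layers fine-tuned on the scarce target data.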
      <p>
        Deep learning methods, especially convolutional neural network (CNN)-based time series classifiers,
have proven highly effective for extracting temporal and spatial features from spatio-temporal weather
data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These methods offer faster and more accurate predictions and can handle large, complex
datasets from weather satellites and IoT devices [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Unlike traditional models, DL models do not require
extensive feature engineering, making them more adaptable and practical for weather forecasting
applications.
      </p>
      <p>The flexibility and robustness of DL approaches make them well-suited for the complexities of weather
data, which often exhibit non-linear and chaotic behavior. DL models, leveraging distributed and sparse
representations, can capture intricate data structures that traditional parametric and non-parametric
models struggle to represent effectively. This capability is crucial for processing high-dimensional
meteorological datasets, where capturing subtle patterns and correlations can significantly enhance
prediction accuracy.</p>
      <p>
        DL’s superior feature learning capabilities allow for better representation and understanding of
weather patterns, leading to improved prediction accuracy and reliability [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These techniques reduce
the need for manual data preprocessing and feature extraction, streamlining the forecasting process.
Moreover, DL methods excel at learning from vast amounts of data, continually improving predictive
performance as more data becomes available. Their scalability ensures that forecasting systems remain
efficient and effective even as data volumes grow, making DL particularly beneficial for weather
forecasting.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Transfer Learning and Active Learning</title>
      <p>To address the challenge of sparse training data in time series datasets, the proposed model incorporates
two primary DL techniques: Transfer Learning (TL) and Active Learning (AL).</p>
      <p>TL allows the model to leverage pre-existing knowledge from a related source task and apply it
to the target task. This technique enhances the model’s ability to generalize and perform well even
with limited data by re-using model skills. AL dynamically queries and selects the most informative
samples to add to the training set. It uses labeled data to provide critical information about class labels
or boundaries, while unlabeled data helps in understanding the base data distribution. This iterative
process improves the efficiency of learning by focusing on the most useful data points.</p>
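One round of the query-and-select loop described above can be sketched as follows; the linear probability model, pool size, and least-confidence score are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior(X, W):
    # class posteriors from a toy linear model via softmax
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = rng.normal(size=(4, 2))        # current model (4 features, 2 classes)
pool = rng.normal(size=(10, 4))    # unlabeled pool

P = posterior(pool, W)
uncertainty = 1.0 - P.max(axis=1)  # least-confidence score per pool sample
query = int(np.argmax(uncertainty))

# pool[query] would now be labeled by an annotator, moved to the training
# set, and the model retrained before the next query round.
```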
      <p>Before delving into the specifics of these techniques, it is essential to define the Time Series
Classification (TSC) problem.</p>
      <p>Definition 1. A univariate time series X = [x_1, x_2, ..., x_T] is an ordered set of real values. The length of
X is equal to the number of observable time-points T.</p>
      <p>Definition 2. A multivariate time series M = [X^1, X^2, ..., X^n] consists of n observations per time-point,
i.e. n univariate time series with X^i ∈ R^T.</p>
      <p>Definition 3. A dataset D = {(X_1, Y_1), (X_2, Y_2), ..., (X_N, Y_N)} is a collection of pairs (X_i, Y_i) where X_i could
be either a univariate (Ut) or a multivariate (Mt) time series, with Y_i as its corresponding label. For a dataset containing K classes, the label vector Y_i is
a one-hot vector of length K where each element j ∈ [1, K] is equal to 1 if the class of X_i is j and 0 otherwise.</p>
      <p>We can define Time Series Classification (TSC) as the task of mapping time-based inputs to a
probability distribution over a set of labels. The convolutions at the core of this mapping can be represented by the following
equation:
C_t = f(ω * X_{t−ℓ/2 : t+ℓ/2} + b) ∀ t ∈ [1, T] (1)
where C_t denotes the convolution result on a univariate time series X of length T with a filter ω of length ℓ, a
bias parameter b and a non-linear function f. Applying several filters on a time series will result in a
multivariate time series whose dimensions are equal to the number of filters used. Using the same filter
values ω and b, ConvNets can find the results for all time stamps t ∈ [1, T]. This is possible
by using weight sharing, which enables the model to learn feature detectors that are invariant across the
time array.</p>
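A minimal sketch of the convolution equation above on a univariate series, with zero padding so every time stamp t ∈ [1, T] receives an output; the filter values and identity activation are illustrative assumptions.

```python
import numpy as np

def conv1d_same(x, w, b=0.0, f=lambda z: z):
    """C_t = f(w . X_{t-l/2 : t+l/2} + b) at every time stamp (zero-padded)."""
    ell = len(w)                       # odd filter length assumed
    xp = np.pad(x, (ell // 2, ell // 2))
    return np.array([f(np.dot(w, xp[t:t + ell]) + b) for t in range(len(x))])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # univariate series, T = 7
w = np.array([0.25, 0.5, 0.25])                    # one shared filter, l = 3
out = conv1d_same(x, w)                            # same length as x

# Applying k different filters yields a k-dimensional multivariate series,
# as described in the text:
filters = [np.array([0.25, 0.5, 0.25]), np.array([-1.0, 0.0, 1.0])]
multi = np.stack([conv1d_same(x, wk) for wk in filters])   # shape (2, 7)
```

Note how the same w and b are reused at every time stamp — this is the weight sharing that makes the learned feature detectors shift-invariant.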
    </sec>
    <sec id="sec-3">
      <title>4. Deep Transfer Active Learning</title>
      <p>During target training, the model’s parameters are initialized using weights from a previous task,
represented as Θ ← Θ_s. After initializing the weights, a forward pass through the model is performed
using the function f(x, Θ), which computes the output for an input x. The output is a vector of estimated
probabilities for x belonging to each class. The prediction loss is then computed using a cost function,
such as the negative log likelihood. Using gradient descent, the weights are updated in a backward pass
to propagate the error. This iterative process of forward pass followed by backpropagation updates the
model’s parameters to minimize the loss on the training data. During testing, the model is evaluated on
unseen data. A forward pass is performed on the new input, followed by class prediction. The predicted
class corresponds to the one with the highest probability. For this, categorical cross-entropy is applied
as the loss function, denoted as:
L = − ∑_j Y_j log Ŷ_j (2)
where Y_j is the true label and Ŷ_j is the predicted probability for class j. This loss function measures the
performance of the classification model by comparing the predicted probabilities with the actual labels.</p>
      <p>AL is used to select the instances a model is most uncertain about, to improve learning efficiency. In
uncertainty sampling, the model aims to identify and learn from the most informative data points.
Three primary metrics used to define uncertainty are least confidence, sample margin, and entropy. To
take the entire output distribution into consideration, entropy is used as the metric, defined as:
H(x_i) = − ∑_j P(y_j | x_i) log P(y_j | x_i) (3)
Here, P(y_j | x_i) is the posterior probability of instance x_i belonging to class y_j. For binary classification,
the most uncertain instances are those with nearly equal probabilities for both classes.</p>
      <p>Besides uncertainty, considering the distribution of instances can enhance AL performance. Instance
diversity helps in selecting the most representative samples, thus improving query performance and
avoiding outliers.</p>
      <p>The correlation measure assesses the pairwise similarities of instances. The informativeness of
an instance is determined by its average similarity to its neighbors. For two instances x_i and x_u in the
unlabeled set U of size N, the correlation measure ρ is defined as:
ρ(x_i) = (1 / (N − 1)) ∑_{x_u ∈ U∖x_i} sim(x_i, x_u) (4)
The value of ρ(x_i) represents the density of x_i in the unlabeled set. Higher values indicate that an
instance is closely related to others, while lower values suggest outliers, which should be avoided for
labeling.</p>
      <p>To select the most informative and representative samples, a heuristic combination of the correlation
and uncertainty measures is employed. The most effective instance to label can be expressed as:
x̂ = arg max_i (H(x_i) ⋅ ρ(x_i)) (5)
This approach ensures that the selected samples are both uncertain and representative, enhancing the
learning process.</p>
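Under the assumption of an RBF kernel for the pairwise similarity sim(·,·) and toy class posteriors, the combined entropy-times-density selection rule can be sketched as:

```python
import numpy as np

def entropy(p, eps=1e-12):
    # H(x_i) = -sum_j P(y_j|x_i) log P(y_j|x_i); rows of p are class posteriors
    return -(p * np.log(p + eps)).sum(axis=1)

def density(X, gamma=1.0):
    # rho(x_i): average similarity of x_i to every other unlabeled instance,
    # using an RBF kernel as the (assumed) pairwise similarity measure
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-gamma * d2)
    return (sim.sum(axis=1) - 1.0) / (len(X) - 1)   # drop self-similarity

def query_index(probs, X):
    # argmax_i H(x_i) * rho(x_i): uncertain AND representative
    return int(np.argmax(entropy(probs) * density(X)))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])      # last point is an outlier
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]])  # model posteriors
idx = query_index(probs, X)   # picks index 0: maximally uncertain, dense region
```

The product rejects both the confident dense point (low entropy) and the uncertain outlier (low density), matching the selection behavior described above.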
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>
        Three datasets were used in the experiments: a) the RAUS¹ dataset, containing daily weather
observations from various Australian weather stations over a period of 10 years; b) KenCentralMet,
privately acquired daily weather observations from the Kenya Meteorological Department covering Central Kenya
over a period of 3 years (2012-2014); and c) MeteoNet, a meteorological dataset developed and made
available by the French national meteorological service. For each dataset, less than 20% of the
labeled samples was used as the initial training set. We compare the proposed DTAL
method, as detailed in the previous section, against: i) random selection of data samples to query;
ii) the QUIRE method, inspired by margin-based active learning from the minimax viewpoint, with
emphasis on selecting unlabeled instances that are both informative and representative [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; iii) the DFAL
method, which selects unlabeled samples with the smallest perturbation, since the distance between a sample
and its smallest adversarial example better approximates the original distance to the decision boundary
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; and iv) the Core-Set non-uncertainty-based AL method [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <table-wrap id="tab-1">
        <caption>
          <p>Precision (ℙ), recall (ℝ) and F-score (𝔽) of each query method on the three datasets.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>Metric</th>
              <th>Random</th>
              <th>DTAL</th>
              <th>QUIRE</th>
              <th>DFAL</th>
              <th>Core-Set</th>
            </tr>
          </thead>
          <tbody>
            <tr><td rowspan="3">RAUS</td><td>ℙ</td><td>81</td><td>80</td><td>89</td><td>83</td><td>79</td></tr>
            <tr><td>ℝ</td><td>80</td><td>85</td><td>84</td><td>82</td><td>83</td></tr>
            <tr><td>𝔽</td><td>79</td><td>85</td><td>81</td><td>80</td><td>84</td></tr>
            <tr><td rowspan="3">KenCentralMet</td><td>ℙ</td><td>64</td><td>68</td><td>67</td><td>60</td><td>65</td></tr>
            <tr><td>ℝ</td><td>67</td><td>64</td><td>68</td><td>62</td><td>65</td></tr>
            <tr><td>𝔽</td><td>62</td><td>67</td><td>67</td><td>64</td><td>68</td></tr>
            <tr><td rowspan="3">MeteoNet</td><td>ℙ</td><td>89</td><td>91</td><td>87</td><td>91</td><td>90</td></tr>
            <tr><td>ℝ</td><td>85</td><td>90</td><td>88</td><td>88</td><td>91</td></tr>
            <tr><td>𝔽</td><td>91</td><td>93</td><td>86</td><td>93</td><td>91</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>This paper demonstrates the efficacy of lightweight deep learning, integrating active and transfer
learning, for weather prediction in Africa. Our hybrid Transfer Active Learning method significantly
enhances forecasting accuracy with minimal data, using only a small portion of the training samples
compared to existing methods. Despite the challenges of data scarcity and limited computational resources,
our approach shows promise in providing good weather forecasts essential for effective decision-making
and resource management in Africa. Future work will focus on refining these techniques and validating
their practical benefits in real-world applications.</p>
      <p>1. https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package</p>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dimes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shiferaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Twomlow</surname>
          </string-name>
          ,
          <article-title>Coping better with current climatic variability in the rain-fed farming systems of sub-saharan africa: an essential first step in adapting to future climate change?</article-title>
          ,
          <source>Agriculture, Ecosystems &amp; Environment</source>
          <volume>126</volume>
          (
          <year>2008</year>
          )
          <fpage>24</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Radeny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desalegn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mubiru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kyazze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Recha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kimeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Solomon</surname>
          </string-name>
          ,
          <article-title>Indigenous knowledge for seasonal weather and climate forecasting across east africa</article-title>
          ,
          <source>Climatic Change</source>
          <volume>156</volume>
          (
          <year>2019</year>
          )
          <fpage>509</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Benavides Cesar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Amaro e Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Manso Callejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-I.</given-names>
            <surname>Cira</surname>
          </string-name>
          ,
          <article-title>Review on spatio-temporal solar forecasting methods driven by in situ measurements or their combination with satellite and numerical weather prediction (nwp) estimates</article-title>
          ,
          <source>Energies</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>4341</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Data-driven approaches for meteorological time series prediction: a comparative study of the state-of-the-art computational intelligence techniques</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>105</volume>
          (
          <year>2018</year>
          )
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sharir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shashua</surname>
          </string-name>
          ,
          <article-title>On the expressive power of deep learning: A tensor analysis</article-title>
          ,
          <source>in: Conference on learning theory, PMLR</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>698</fpage>
          -
          <lpage>728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hadjout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sebaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martínez-Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Troncoso</surname>
          </string-name>
          ,
          <article-title>Deep learning for time series forecasting: a survey</article-title>
          ,
          <source>Big Data</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Machine learning methods in weather and climate applications: A survey</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>12019</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-G.</given-names>
            <surname>Ham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Toward a learnable climate model in the artificial intelligence era</article-title>
          ,
          <source>Advances in Atmospheric Sciences</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Active learning by querying informative and representative examples</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>23</volume>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ducoffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Precioso</surname>
          </string-name>
          ,
          <article-title>Adversarial active learning for deep networks: a margin based approach</article-title>
          ,
          <source>arXiv preprint arXiv:1802.09841</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Savarese</surname>
          </string-name>
          ,
          <article-title>Active learning for convolutional neural networks: A core-set approach</article-title>
          ,
          <source>arXiv preprint arXiv:1708.00489</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>