<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Forecasting in Africa</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kinyua Gikunda</string-name>
          <email>patrick.gikunda@dkut.ac.ke</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Jouandeau</string-name>
          <email>n@up8.edu</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dedan Kimathi University of Technology</institution>
          ,
          <addr-line>Nyeri</addr-line>
          ,
          <country country="KE">Kenya</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>Weather forecasting in Africa is hampered by sparse meteorological data and limited computational resources. This paper addresses these challenges by proposing lightweight deep learning (DL) for weather prediction and forecasting. We integrate active learning and transfer learning methods to enhance model training efficiency and accuracy. By focusing on the informativeness and representativeness of training samples, our approach significantly reduces the need for extensive and costly labeling. After training on a source dataset, model skills are transferred to target datasets, allowing for effective weather variable predictions with minimal data. Extensive experiments on three weather datasets demonstrate that our hybrid Transfer Active Learning method achieves classification accuracy similar to existing methods while using only 20% of the training samples. This study highlights the potential of advanced DL techniques to improve weather forecasting in Africa, despite the constraints of data scarcity and limited computational infrastructure.</p>
      </abstract>
      <kwd-group>
        <kwd>Weather Forecasting</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Active Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://csit.dkut.ac.ke/departments/it/dr-kinyua-gikunda/ (K. Gikunda); https://n.up8.site/ (N. Jouandeau)</p>
      <p>CEUR Workshop Proceedings (ceur-ws.org, ISSN 1613-0073)</p>
      <p>
        The non-linear behavior of meteorological data poses significant challenges for weather prediction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Non-parametric learners such as Gaussian kernels offer flexibility but are hindered by their reliance on
local generalization and the exponential growth of input dimensionality.
      </p>
      <p>Deep Learning (DL) methods address these challenges by stacking multiple feature learning layers to
form deep representations, enhancing both computational and statistical efficiency. Recent
advancements have improved the representation of inputs with fewer parameters, allowing for effective feature
learning using both labeled and unlabeled data. Transfer Learning (TL), a process within DL, leverages
learned features to apply knowledge from one domain to another related domain, improving learning
efficiency and effectiveness. This makes DL particularly suitable for complex and dynamic fields like
weather prediction.</p>
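As a concrete illustration of re-using knowledge across related domains, the sketch below initialises a target model from source-task weights and fine-tunes it on a small target set. The softmax classifier, data shapes, and learning rate are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_loss(P, Y):
    # categorical cross-entropy over one-hot labels Y
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

# Weights learned on a related *source* task (random placeholders here).
W_source = rng.normal(size=(8, 3))            # 8 features -> 3 classes

# Transfer: initialise the target model from the source weights.
W_target = W_source.copy()

# Tiny labeled *target* set; fine-tune with plain gradient descent.
X = rng.normal(size=(32, 8))
y = rng.integers(0, 3, size=32)
Y = np.eye(3)[y]                              # one-hot labels

loss_before = ce_loss(softmax(X @ W_target), Y)
for _ in range(300):
    P = softmax(X @ W_target)
    W_target -= 0.1 * (X.T @ (P - Y)) / len(X)  # gradient of cross-entropy
loss_after = ce_loss(softmax(X @ W_target), Y)
```

In practice the transferred layers would come from a network trained on a large source dataset, with only the later layers fine-tuned on the scarce target data.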
      <p>
        Deep learning methods, especially convolutional neural network (CNN)-based time series classifiers,
have proven highly effective for extracting temporal and spatial features from spatio-temporal weather
data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These methods offer faster and more accurate predictions and can handle large, complex
datasets from weather satellites and IoT devices [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Unlike traditional models, DL models do not require
extensive feature engineering, making them more adaptable and practical for weather forecasting
applications.
      </p>
      <p>The flexibility and robustness of DL approaches make them well-suited for the complexities of weather
data, which often exhibit non-linear and chaotic behavior. DL models, leveraging distributed and sparse
representations, can capture intricate data structures that traditional parametric and non-parametric
models struggle to represent effectively. This capability is crucial for processing high-dimensional
meteorological datasets, where capturing subtle patterns and correlations can significantly enhance
prediction accuracy.</p>
      <p>
        DL’s superior feature learning capabilities allow for better representation and understanding of
weather patterns, leading to improved prediction accuracy and reliability [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. These techniques reduce
the need for manual data preprocessing and feature extraction, streamlining the forecasting process.
Moreover, DL methods excel at learning from vast amounts of data, continually improving predictive
performance as more data becomes available. Their scalability ensures that forecasting systems remain
efficient and effective even as data volumes grow, making DL particularly beneficial for weather
forecasting.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Transfer Learning and Active Learning</title>
      <p>To address the challenge of sparse training data in time series datasets, the proposed model incorporates
two primary DL techniques: Transfer Learning (TL) and Active Learning (AL).</p>
      <p>TL allows the model to leverage pre-existing knowledge from a related source task and apply it
to the target task. This technique enhances the model’s ability to generalize and perform well even
with limited data by re-using model skills. AL dynamically queries and selects the most informative
samples to add to the training set. It uses labeled data to provide critical information about class labels
or boundaries, while unlabeled data helps in understanding the base data distribution. This iterative
process improves the efficiency of learning by focusing on the most useful data points.</p>
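One round of the query-and-select loop described above can be sketched as follows; the linear probability model, pool size, and least-confidence score are toy assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior(X, W):
    # class posteriors from a toy linear model via softmax
    z = X @ W
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

W = rng.normal(size=(4, 2))        # current model (4 features, 2 classes)
pool = rng.normal(size=(10, 4))    # unlabeled pool

P = posterior(pool, W)
uncertainty = 1.0 - P.max(axis=1)  # least-confidence score per pool sample
query = int(np.argmax(uncertainty))

# pool[query] would now be labeled by an annotator, moved to the training
# set, and the model retrained before the next query round.
```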
      <p>Before delving into the specifics of these techniques, it is essential to define the Time Series
Classification (TSC) problem.</p>
      <p>Definition 1. A univariate time series X = [x_1, x_2, ..., x_T] is an ordered set of real values. The length of
X is equal to the number of observable time-points T.</p>
      <p>Definition 2. A multivariate time series M = [X^1, X^2, ..., X^n] consists of n observations per time-point,
i.e. n univariate time series with X^i ∈ R^T.</p>
      <p>Definition 3. A dataset D = {(X_1, Y_1), (X_2, Y_2), ..., (X_N, Y_N)} is a collection of pairs (X_i, Y_i) where X_i could
be either a univariate (Ut) or a multivariate (Mt) time series, with Y_i as its corresponding label. For a dataset containing K classes, the label vector Y_i is
a one-hot vector of length K where each element j ∈ [1, K] is equal to 1 if the class of X_i is j and 0 otherwise.</p>
      <p>We can define Time Series Classification (TSC) as the task of mapping time-based inputs to a
probability distribution over a set of labels. The convolutions at the core of this mapping can be represented by the following
equation:
C_t = f(ω * X_{t−ℓ/2 : t+ℓ/2} + b) ∀ t ∈ [1, T] (1)
where C_t denotes the convolution result on a univariate time series X of length T with a filter ω of length ℓ, a
bias parameter b and a non-linear function f. Applying several filters on a time series will result in a
multivariate time series whose dimensions are equal to the number of filters used. Using the same filter
values ω and b, ConvNets can find the results for all time stamps t ∈ [1, T]. This is possible
by using weight sharing, which enables the model to learn feature detectors that are invariant across the
time array.</p>
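A minimal sketch of the convolution equation above on a univariate series, with zero padding so every time stamp t ∈ [1, T] receives an output; the filter values and identity activation are illustrative assumptions.

```python
import numpy as np

def conv1d_same(x, w, b=0.0, f=lambda z: z):
    """C_t = f(w . X_{t-l/2 : t+l/2} + b) at every time stamp (zero-padded)."""
    ell = len(w)                       # odd filter length assumed
    xp = np.pad(x, (ell // 2, ell // 2))
    return np.array([f(np.dot(w, xp[t:t + ell]) + b) for t in range(len(x))])

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # univariate series, T = 7
w = np.array([0.25, 0.5, 0.25])                    # one shared filter, l = 3
out = conv1d_same(x, w)                            # same length as x

# Applying k different filters yields a k-dimensional multivariate series,
# as described in the text:
filters = [np.array([0.25, 0.5, 0.25]), np.array([-1.0, 0.0, 1.0])]
multi = np.stack([conv1d_same(x, wk) for wk in filters])   # shape (2, 7)
```

Note how the same w and b are reused at every time stamp — this is the weight sharing that makes the learned feature detectors shift-invariant.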
    </sec>
    <sec id="sec-3">
      <title>4. Deep Transfer Active Learning</title>
      <p>During target training, the model’s parameters are initialized using weights from a previous task,
represented as Θ ← Θ_s. After initializing the weights, a forward pass through the model is performed
using the function f(x, Θ), which computes the output for an input x. The output is a vector of estimated
probabilities for x belonging to each class. The prediction loss is then computed using a cost function,
such as the negative log likelihood. Using gradient descent, the weights are updated in a backward pass
to propagate the error. This iterative process of forward pass followed by backpropagation updates the
model’s parameters to minimize the loss on the training data. During testing, the model is evaluated on
unseen data. A forward pass is performed on the new input, followed by class prediction. The predicted
class corresponds to the one with the highest probability. For this, categorical cross-entropy is applied
as the loss function, denoted as:
L = − ∑_j Y_j log Ŷ_j (2)
where Y_j is the true label and Ŷ_j is the predicted probability for class j. This loss function measures the
performance of the classification model by comparing the predicted probabilities with the actual labels.</p>
      <p>AL is used to select the instances a model is most uncertain about, to improve learning efficiency. In
uncertainty sampling, the model aims to identify and learn from the most informative data points.
Three primary metrics used to define uncertainty are least confidence, sample margin, and entropy. To
take the entire output distribution into consideration, entropy is used as the metric, defined as:
H(x_i) = − ∑_j P(y_j | x_i) log P(y_j | x_i) (3)
Here, P(y_j | x_i) is the posterior probability of instance x_i belonging to class y_j. For binary classification,
the most uncertain instances are those with nearly equal probabilities for both classes.</p>
      <p>Besides uncertainty, considering the distribution of instances can enhance AL performance. Instance
diversity helps in selecting the most representative samples, thus improving query performance and
avoiding outliers.</p>
      <p>The correlation measure assesses the pairwise similarities of instances. The informativeness of
an instance is determined by its average similarity to its neighbors. For two instances x_i and x_u in the
unlabeled set U of size N, the correlation measure ρ is defined as:
ρ(x_i) = (1 / (N − 1)) ∑_{x_u ∈ U∖x_i} sim(x_i, x_u) (4)
The value of ρ(x_i) represents the density of x_i in the unlabeled set. Higher values indicate that an
instance is closely related to others, while lower values suggest outliers, which should be avoided for
labeling.</p>
      <p>To select the most informative and representative samples, a heuristic combination of the correlation
and uncertainty measures is employed. The most effective instance to label can be expressed as:
x̂ = arg max_i (H(x_i) ⋅ ρ(x_i)) (5)
This approach ensures that the selected samples are both uncertain and representative, enhancing the
learning process.</p>
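Under the assumption of an RBF kernel for the pairwise similarity sim(·,·) and toy class posteriors, the combined entropy-times-density selection rule can be sketched as:

```python
import numpy as np

def entropy(p, eps=1e-12):
    # H(x_i) = -sum_j P(y_j|x_i) log P(y_j|x_i); rows of p are class posteriors
    return -(p * np.log(p + eps)).sum(axis=1)

def density(X, gamma=1.0):
    # rho(x_i): average similarity of x_i to every other unlabeled instance,
    # using an RBF kernel as the (assumed) pairwise similarity measure
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-gamma * d2)
    return (sim.sum(axis=1) - 1.0) / (len(X) - 1)   # drop self-similarity

def query_index(probs, X):
    # argmax_i H(x_i) * rho(x_i): uncertain AND representative
    return int(np.argmax(entropy(probs) * density(X)))

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])      # last point is an outlier
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]])  # model posteriors
idx = query_index(probs, X)   # picks index 0: maximally uncertain, dense region
```

The product rejects both the confident dense point (low entropy) and the uncertain outlier (low density), matching the selection behavior described above.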
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>
        Three datasets were used in the experiments: a) the RAUS¹ dataset, containing daily weather
observations from various Australian weather stations over a period of 10 years; b) KenCentralMet,
privately acquired daily weather observations from the Kenya Meteorological Department covering Central Kenya
over a period of 3 years (2012-2014); and c) MeteoNet, a meteorological dataset developed and made
available by the French national meteorological service. For each dataset, less than 20% of the
labeled samples was used as the initial training set. We compare the proposed DTAL
method, as detailed in the previous section, against: i) random selection of data samples to query;
ii) the QUIRE method, inspired by margin-based active learning from the minimax viewpoint, with
emphasis on selecting unlabeled instances that are both informative and representative [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; iii) the DFAL
method, which selects unlabeled samples with the smallest perturbation, since the distance between a sample
and its smallest adversarial example better approximates the original distance to the decision boundary
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; and iv) the Core-Set non-uncertainty-based AL method [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <table-wrap id="tab-1">
        <caption>
          <p>Precision (ℙ), recall (ℝ) and F-score (𝔽) of each query method on the three datasets.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Dataset</th>
              <th>Metric</th>
              <th>Random</th>
              <th>DTAL</th>
              <th>QUIRE</th>
              <th>DFAL</th>
              <th>Core-Set</th>
            </tr>
          </thead>
          <tbody>
            <tr><td rowspan="3">RAUS</td><td>ℙ</td><td>81</td><td>80</td><td>89</td><td>83</td><td>79</td></tr>
            <tr><td>ℝ</td><td>80</td><td>85</td><td>84</td><td>82</td><td>83</td></tr>
            <tr><td>𝔽</td><td>79</td><td>85</td><td>81</td><td>80</td><td>84</td></tr>
            <tr><td rowspan="3">KenCentralMet</td><td>ℙ</td><td>64</td><td>68</td><td>67</td><td>60</td><td>65</td></tr>
            <tr><td>ℝ</td><td>67</td><td>64</td><td>68</td><td>62</td><td>65</td></tr>
            <tr><td>𝔽</td><td>62</td><td>67</td><td>67</td><td>64</td><td>68</td></tr>
            <tr><td rowspan="3">MeteoNet</td><td>ℙ</td><td>89</td><td>91</td><td>87</td><td>91</td><td>90</td></tr>
            <tr><td>ℝ</td><td>85</td><td>90</td><td>88</td><td>88</td><td>91</td></tr>
            <tr><td>𝔽</td><td>91</td><td>93</td><td>86</td><td>93</td><td>91</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>This paper demonstrates the efficacy of lightweight deep learning, integrating active and transfer
learning, for weather prediction in Africa. Our hybrid Transfer Active Learning method significantly
enhances forecasting accuracy with minimal data, using only a small portion of the training samples
compared to existing methods. Despite the challenges of data scarcity and limited computational resources,
our approach shows promise in providing good weather forecasts essential for effective decision-making
and resource management in Africa. Future work will focus on refining these techniques and validating
their practical benefits in real-world applications.</p>
      <p>1. https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package</p>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dimes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shiferaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Twomlow</surname>
          </string-name>
          ,
          <article-title>Coping better with current climatic variability in the rain-fed farming systems of sub-saharan africa: an essential first step in adapting to future climate change?</article-title>
          ,
          <source>Agriculture, Ecosystems &amp; Environment</source>
          <volume>126</volume>
          (
          <year>2008</year>
          )
          <fpage>24</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Radeny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desalegn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mubiru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kyazze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Recha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kimeli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Solomon</surname>
          </string-name>
          ,
          <article-title>Indigenous knowledge for seasonal weather and climate forecasting across east africa</article-title>
          ,
          <source>Climatic Change</source>
          <volume>156</volume>
          (
          <year>2019</year>
          )
          <fpage>509</fpage>
          -
          <lpage>526</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Benavides Cesar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Amaro e Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Á.</given-names>
            <surname>Manso Callejo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-I.</given-names>
            <surname>Cira</surname>
          </string-name>
          ,
          <article-title>Review on spatio-temporal solar forecasting methods driven by in situ measurements or their combination with satellite and numerical weather prediction (nwp) estimates</article-title>
          ,
          <source>Energies</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>4341</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Data-driven approaches for meteorological time series prediction: a comparative study of the state-of-the-art computational intelligence techniques</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>105</volume>
          (
          <year>2018</year>
          )
          <fpage>155</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sharir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shashua</surname>
          </string-name>
          ,
          <article-title>On the expressive power of deep learning: A tensor analysis</article-title>
          ,
          <source>in: Conference on learning theory, PMLR</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>698</fpage>
          -
          <lpage>728</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Torres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hadjout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sebaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martínez-Álvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Troncoso</surname>
          </string-name>
          ,
          <article-title>Deep learning for time series forecasting: a survey</article-title>
          ,
          <source>Big Data</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Machine learning methods in weather and climate applications: A survey</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>12019</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-G.</given-names>
            <surname>Ham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <article-title>Toward a learnable climate model in the artificial intelligence era</article-title>
          ,
          <source>Advances in Atmospheric Sciences</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.-J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Active learning by querying informative and representative examples</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>23</volume>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ducoffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Precioso</surname>
          </string-name>
          ,
          <article-title>Adversarial active learning for deep networks: a margin based approach</article-title>
          ,
          <source>arXiv preprint arXiv:1802.09841</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Savarese</surname>
          </string-name>
          ,
          <article-title>Active learning for convolutional neural networks: A core-set approach</article-title>
          ,
          <source>arXiv preprint arXiv:1708.00489</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>