<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Complex Data Challenges in Earth
Observation, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.3390/app10217834</article-id>
      <title-group>
        <article-title>Efficient Spatio-temporal Weather Forecasting Using U-Net</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akshay Punjabi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pablo Izquierdo-Ayala</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <issue>2021</issue>
      <abstract>
        <p>Weather forecasting plays an essential role in multiple aspects of human daily life. Currently, physics-based numerical weather prediction is used to predict the weather, and it requires enormous amounts of computational resources. In recent years, deep-learning-based models have seen wide success in many weather-prediction-related tasks. In this paper we describe our experiments for the Weather4cast 2021 Challenge, where 8 hours of spatio-temporal weather data are predicted from an initial hour of spatio-temporal data. We focus on SmaAt-UNet, an efficient U-Net based autoencoder. With this model we achieve competent results whilst maintaining low computational resource usage. Furthermore, several approaches and possible future work are discussed at the end of the paper.</p>
      </abstract>
      <kwd-group>
        <kwd>weather4cast 2021</kwd>
        <kwd>weather forecast</kwd>
        <kwd>deep learning</kwd>
        <kwd>neural networks</kwd>
        <kwd>CNN</kwd>
        <kwd>U-Net</kwd>
        <kwd>SmaAt-UNet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Weather prediction is an art that can be traced back to Ancient History: around the year 650 B.C., the Babylonians were already using clouds and haloes to predict short-term weather variations. 2,600 years later, weather forecasting has changed substantially, but it still plays an active role in the development of our society, becoming a valuable asset in many situations, such as the creation of warnings prior to a severe storm [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
      <p>Most of these predictions are now generated through Numerical Weather Prediction (NWP) models, which provide estimates by means of various physical variables, such as atmospheric pressure, temperature, etc. While accurate, these models are often slow and require vast amounts of computational power, making them inaccessible to the public and impractical for short-term forecasts [<xref ref-type="bibr" rid="ref2">2</xref>].</p>
      <p>In recent years, with the rise of Machine Learning and the growing volume of ever higher-resolution data available, deep learning models have found major success in this domain and have even managed to rival the original NWP-based approaches [<xref ref-type="bibr" rid="ref3">3</xref>][<xref ref-type="bibr" rid="ref4">4</xref>]. These deep learning models do not rely on the current physical state of the atmosphere but instead use historical weather data to generate a future prediction.</p>
      <p>In this paper we focus on a Convolutional Neural Network (CNN) approach. Convolutional Neural Networks, such as U-Net [<xref ref-type="bibr" rid="ref5">5</xref>], are a type of Artificial Neural Network (ANN) commonly used to process image data. They are based on convolutions, a kernel operation that allows the model to capture local invariant features in a given image. These networks are used in a wide range of tasks, especially object detection [<xref ref-type="bibr" rid="ref6">6</xref>] and image classification [<xref ref-type="bibr" rid="ref7">7</xref>].</p>
      <p>The model employed in this work is a variant of U-Net called SmaAt-UNet [<xref ref-type="bibr" rid="ref8">8</xref>]. Both the model and its architecture are further described throughout the text.</p>
      <sec id="sec-1-2">
        <title>2. Weather4cast 2021 Challenge</title>
        <p>The Weather4cast 2021 Challenge [9] is a competition held by the Institute of Advanced Research in Artificial Intelligence (IARAI) [10] with the goal of generating short-term predictions of selected weather products based on meteorological satellite data-products from different regions of Europe. These data-products range from February 2019 to February 2021 and are obtained in collaboration with AEMET [11] / NWC SAF [12]. The challenge presents weather forecasting as a video frame prediction task, similarly to the Traffic4cast competitions at NeurIPS in 2019 [13] and 2020 [14], hosted by the same institute.</p>
        <p>The data consist of four target weather variables: temperature (on the accessible surface: cloud top or earth), convective rainfall rate, probability of occurrence of tropopause folding, and cloud mask. The weather products are encoded as separate channels in the weather images. Each weather image contains 256 x 256 pixels of a particular region, in which each pixel corresponds to an area of about 4 km x 4 km. The images are recorded at 15-minute intervals throughout a year.</p>
        <p>The goal is to predict the next 32 weather images (8 hours in 15-minute intervals) given 4 images (1 hour) of each of the regions provided.</p>
      </sec>
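      <p>As a concrete sketch of the input/output shapes implied by the task above (the variable names are illustrative, not from the challenge API):</p>

```python
import numpy as np

# One sample for one region: 4 input frames (1 hour at 15-minute steps),
# each holding the 4 weather products as channels, at 256 x 256 pixels.
frames_in, frames_out, channels, height, width = 4, 32, 4, 256, 256

x = np.zeros((frames_in, channels, height, width), dtype=np.float32)   # observed hour
y = np.zeros((frames_out, channels, height, width), dtype=np.float32)  # next 8 hours

print(x.shape)  # (4, 4, 256, 256)
print(y.shape)  # (32, 4, 256, 256)
```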
      <sec id="sec-1-1">
        <title>3. Methods</title>
        <p>There are several ways to approach this challenge, such as ConvLSTMs [<xref ref-type="bibr" rid="ref4">4</xref>], Graph Neural Networks (GNN) [15] and U-Nets [16]. In other similar competitions on spatio-temporal data, U-Net type architectures have shown the best results. For that reason we mainly base our work on U-Nets, especially on efficient U-Nets. The neural network architecture used in our work is a recent state-of-the-art model called SmaAt-UNet [<xref ref-type="bibr" rid="ref8">8</xref>] (see Section 3.1). Some preliminary tests were done on the U-Net++ [17], a U-Net based model with nested dense convolutional blocks, and on different backbones. Table 1 shows the number of parameters of each model. These larger autoencoders were not used, as they reported virtually the same results while requiring longer training times. In contrast, SmaAt-UNet is a much smaller and more efficient model. As a result, all further experiments were done with the SmaAt-UNet model.</p>
        <table-wrap id="table-1">
          <label>Table 1</label>
          <caption>
            <p>Number of parameters of each candidate model.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Backbone</th><th>Parameters</th></tr>
            </thead>
            <tbody>
              <tr><td>U-Net with DSC</td><td/><td>4 Million</td></tr>
              <tr><td>U-Net with CBAM and DSC (SmaAt-UNet)</td><td/><td>4.1 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b0 [19]</td><td>6 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b1</td><td>9.1 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b2</td><td>10.4 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b3</td><td>13.6 Million</td></tr>
              <tr><td>U-Net</td><td/><td>17.3 Million</td></tr>
              <tr><td>U-Net with CBAM</td><td/><td>17.4 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b4</td><td>20.8 Million</td></tr>
              <tr><td>U-Net++</td><td>Eficientnet-b5</td><td>31.9 Million</td></tr>
              <tr><td>U-Net++</td><td>SE-Resnext50 32x4d</td><td>51 Million</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-1-1-1">
          <title>3.1. SmaAt-UNet</title>
          <p>SmaAt-UNet is a novel model that extends the original encoder-decoder structure proposed in the U-Net architecture [<xref ref-type="bibr" rid="ref5">5</xref>]. The architecture can be seen in Figure 1. There are two major differences when compared to its forerunner.</p>
          <p>Firstly, the encoder contains a Convolutional Block Attention Module (CBAM) [18]. This module combines a channel attention module and a spatial attention module that enhance a given feature map.</p>
          <p>Secondly, all the regular convolutions present in the original U-Net version are replaced by Depthwise-Separable Convolutions (DSC), significantly reducing the number of parameters and hence making the model lightweight in comparison to the original version.</p>
          <p>This combination improves the performance of U-Net while significantly reducing its computational cost (≈ 17 Million parameters for U-Net versus ≈ 4 Million for its SmaAt counterpart, see Table 1), allowing us to obtain reasonable results in our resource-restricted environment.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experiments and Results</title>
      <p>Following the objective of the Weather4cast Core Competition, we trained and experimented with our models on regions R1 (Nile region), R2 (Eastern Europe) and R3 (South West Europe) to obtain an efficient and competent model for spatio-temporal weather forecasting.</p>
      <sec id="sec-2-1">
        <title>4.1. Data</title>
        <p>We employed the four data elements defined in Section 2 (temperature, convective rainfall rate, probability of occurrence of tropopause folding and cloud mask) and 3 additional static variables (latitude, longitude and elevation) provided by the organiser, adding up to 7 dimensions.</p>
        <p>We also modified the data structure. The original models would generate one single prediction given the 4 input frames and a lead-time component, which was then used as an index to extract each of the 32 individual images from the output prediction. Our models avoid using a lead-time component and instead generate the 32 individual predictions directly from the 4 input frames.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Experimental Settings</title>
        <p>Models were trained for 10 epochs using the MSE loss and the Adam optimizer [20], with a learning rate of 0.001 and a Cosine Annealing with Warm Restarts schedule [21]. The experiments were run through a Colab Pro subscription, which provides a single restricted Tesla P100 or restricted Tesla V100, and PyTorch v1.9 [22]. This platform limits its usage to a 24h time frame, after which any running code is abruptly terminated. This time frame is reduced if overused, which caused many disruptions in our training pipeline and required active monitoring.</p>
        <p>We also used 16-bit precision operations instead of the default 32-bit precision operations for faster training.</p>
        <p>Code and experiments are publicly available in our GitHub repository (github.com/Dauriel/weather4cast2021/).</p>
      </sec>
      <sec id="sec-2-3">
        <title>4.3. Quantitative results</title>
        <p>An extract of our results can be seen in Table 2, together with several baseline models against which we compare our findings.</p>
        <table-wrap id="table-2">
          <label>Table 2</label>
          <caption>
            <p>MSE of our models and of the baselines over the testing set.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Model</th><th>MSE</th></tr>
            </thead>
            <tbody>
              <tr><td>Persistence</td><td>1.000</td></tr>
              <tr><td>U-Net</td><td>0.669</td></tr>
              <tr><td>SmaAt-UNet</td><td>0.612</td></tr>
              <tr><td>SmaAt-UNet with CAWRS</td><td>0.597</td></tr>
              <tr><td>Best Ensemble SmaAt-UNet</td><td>0.572</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The Persistence model uses the last image of the sequence as the prediction image, under the assumption that the weather will not vary significantly from a given time point t to t+1. By running this approach we obtain a baseline MSE of 1.0. The U-Net model is a pretrained model provided by IARAI; it performs with an MSE of 0.669. Next is a single SmaAt-UNet, which already reduces the U-Net MSE down to 0.612, demonstrating the power of this lightweight architecture. Adding a Cosine Annealing scheduler with Warm Restarts [21] makes the model perform considerably better than the scheduler-free version. Finally, the best result was obtained through an ensemble of several SmaAt-UNet models, which reaches an MSE of 0.572 over the testing set. Our methods obtain a significantly lower MSE than the baseline models while keeping a low resource demand.</p>
      </sec>
      <sec id="sec-2-4">
        <title>4.4. Qualitative results</title>
        <p>In Figure 3 we visualize a prediction of cloud coverage obtained from one of the test sets, in particular for March 16th 2020. Due to the uncertainty of the future, the model does not really predict future positions of cloud coverage and instead regresses to the mean of all the possible outcomes.</p>
      </sec>
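      <p>The Persistence baseline described in Section 4.3 can be sketched as follows (a toy illustration with random data; the numbers it produces are not those of the challenge scoring):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# 4 observed frames and 32 target frames of one channel of a region.
inputs = rng.random((4, 256, 256))
targets = rng.random((32, 256, 256))

# Persistence: repeat the last observed frame for every lead time,
# assuming the weather does not change from time t to t+1.
prediction = np.repeat(inputs[-1][np.newaxis], 32, axis=0)

mse = np.mean((prediction - targets) ** 2)
print(prediction.shape)  # (32, 256, 256)
```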
    </sec>
    <sec id="sec-3">
      <title>5. Future Work</title>
      <p>In this section, we discuss some important considerations to be taken into account in future work.</p>
      <sec id="sec-3-2">
        <title>5.1. U-Net</title>
        <p>As we have seen in our experiments, and in other similar competitions on spatio-temporal data, U-Net type architectures have shown the best results when dealing with this type of dataset. This is due to the capability of U-Net to model the spatial characteristics of the data. However, temporal characteristics are not captured correctly by this architecture (see Figure 2 vs. Figure 3): over an increasing time frame, U-Net is not able to capture the temporality of the data, and predictions become considerably homogeneous in comparison to the ground truth. This condition is present in all of our predictions.</p>
        <p>Including some kind of "memory", that is, the use of Recurrent Neural Networks (LSTM [23], ConvLSTM [24], etc.), could allow the model to handle these temporal characteristics, improving the results substantially at the expense of a considerable increase in the required computational resources.</p>
      </sec>
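      <p>A toy illustration of this missing "memory" (ours, not an experiment from this work): a model that merely aggregates the input frames cannot distinguish two sequences containing the same frames in a different order, while even a one-unit recurrent cell can.</p>

```python
import math

def rnn_state(sequence, w=0.5, u=0.9):
    # A one-unit recurrent cell: the final state depends on input ORDER.
    h = 0.0
    for x in sequence:
        h = math.tanh(w * x + u * h)
    return h

a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]  # same frames, reversed order

print(sum(a) == sum(b))              # True: order-blind aggregation
print(rnn_state(a) == rnn_state(b))  # False: recurrence keeps order
```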
      <sec id="sec-3-3">
        <title>5.2. MSE loss</title>
        <p>Another problem is the use of the MSE loss. The MSE loss computes the average of the pixel values so that the error is minimized over all possible real prediction values. For this specific task, a loss function that does not amount to averaging the possible pixel values would perform significantly better. Some researchers have tried to address this problem with new loss functions such as the adversarial loss and the perceptual loss [25], which work well for natural images (e.g. ImageNet). However, these losses would probably perform poorly for these spatio-temporal physical variables. Moreover, modifying the loss function comes at the expense of longer and more expensive training.</p>
      </sec>
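      <p>The averaging behaviour of the MSE can be seen with a toy example (ours, not from the experiments): if a pixel is equally likely to end up clear (0) or cloudy (1), the single prediction with the lowest expected MSE is the blurry mean 0.5, not either sharp outcome.</p>

```python
# Two equally likely futures for one pixel: clear (0.0) or cloudy (1.0).
outcomes = [0.0, 1.0]

def expected_mse(prediction):
    return sum((prediction - o) ** 2 for o in outcomes) / len(outcomes)

print(expected_mse(0.0))  # 0.5  (sharp guess)
print(expected_mse(1.0))  # 0.5  (sharp guess)
print(expected_mse(0.5))  # 0.25 (the blurry mean wins under MSE)
```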
      <sec id="sec-3-4">
        <title>5.3. Invertible Neural Networks</title>
        <p>Given our main focus of creating efficient low-resource neural networks, we also studied the realm of Invertible Neural Networks (INN) [26].</p>
        <p>INNs enable memory-efficient training by re-computing intermediate activations in the backward pass, rather than storing them in memory during the forward pass [27]. This enables efficient large-scale generative modeling [28] and high-resolution medical image analysis [29].</p>
        <p>However, these networks proved to be difficult to train and showed very notable checkerboard artifacts, yielding very bad predictions. These results are in line with other papers about INNs in the literature [30].</p>
      </sec>
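      <p>The memory saving of INNs comes from invertible (e.g. additive) coupling: a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be stored. A minimal sketch with toy residual functions (ours, not an actual iUNet):</p>

```python
def f(z):
    return 2.0 * z + 1.0  # arbitrary deterministic residual function

def g(z):
    return z * z          # arbitrary deterministic residual function

def forward(x1, x2):
    # Additive coupling, as in reversible residual networks.
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):
    # Recover the inputs from the outputs; nothing needs to be stored.
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

print(inverse(*forward(3.0, 4.0)))  # (3.0, 4.0)
```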
      <sec id="sec-3-5">
        <title>5.4. Wind data as an optical flow</title>
        <p>Optical flow models are gaining a lot of interest in recent video-based tasks, such as video object detection [31] and video action recognition [32]. In fact, they are used in some of the state-of-the-art models for video action recognition [33].</p>
        <p>One approach could be to compute the optical flows between each time step of the spatio-temporal images using these optical flow neural networks. However, the wind speed magnitude and wind direction in the provided data could already be considered an optical flow, removing the need to compute it artificially. Using these two variables as an optical flow could boost the prediction score significantly and should be considered in further competitions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusion</title>
      <p>In this paper we present the findings obtained during our participation in the Weather4cast 2021 Competition. Our experiments show that the SmaAt-UNet model is a better alternative to the classical U-Net, as it improves the quality of the prediction and requires fewer resources to train than the original architecture. We achieved the best results by generating an ensembled prediction from several training checkpoints. We also discuss various improvements in the Future Work section (see Section 5). These ideas will be further developed for future competitions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref9">
        <mixed-citation>[9] Weather4cast 2021 Challenge, https://www.iarai.ac.at/weather4cast/, 2021. [Online; accessed 29-July-2021].</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Institute of Advanced Research in Artificial Intelligence (IARAI), https://www.iarai.ac.at, 2021. [Online; accessed 29-July-2021].</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] AEMET, http://www.aemet.es/, 2021. [Online; accessed 29-July-2021].</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] NWC SAF, https://www.nwcsaf.org/, 2021. [Online; accessed 29-July-2021].</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] D. P. Kreil, M. K. Kopp, D. Jonietz, M. Neun, A. Gruca, P. Herruzo, H. Martin, A. Soleymani, S. Hochreiter, The surprising efficiency of framing geo-spatial time series forecasting as a video prediction task – insights from the IARAI Traffic4cast competition at NeurIPS 2019, in: H. J. Escalante, R. Hadsell (Eds.), Proceedings of the NeurIPS 2019 Competition and Demonstration Track, volume 123 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 232-241. URL: http://proceedings.mlr.press/v123/kreil20a.html.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Traffic4cast 2020 Challenge, https://www.iarai.ac.at/traffic4cast/2020-competition/challenge/#challenge, 2021. [Online; accessed 29-July-2021].</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Q. Qi, P. H. Kwok, Traffic4cast 2020 – graph ensemble net and the importance of feature and loss function design for traffic prediction, arXiv preprint arXiv:2012.02115 (2020).</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Choi, Utilizing UNet for the future traffic map prediction task – Traffic4cast challenge 2020, arXiv preprint arXiv:2012.00125 (2020).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: Redesigning skip connections to exploit multiscale features in image segmentation, IEEE Transactions on Medical Imaging (2019).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] S. Woo, J. Park, J.-Y. Lee, I. S. Kweon, CBAM: Convolutional block attention module, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105-6114.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, CoRR abs/1412.6980 (2015).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts, arXiv: Learning (2017).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024-8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735-1780. doi:10.1162/neco.1997.9.8.1735.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, W.-C. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, in: C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 28, Curran Associates, Inc., 2015. URL: https://proceedings.neurips.cc/paper/2015/file/07563a3fe3bbe7e3ba84431ad9d055af-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] M. Mathieu, C. Couprie, Y. LeCun, Deep multi-scale video prediction beyond mean square error, CoRR abs/1511.05440 (2016).</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] I. Kobyzev, S. Prince, M. A. Brubaker, Normalizing flows: Introduction and ideas, ArXiv abs/1908.09257 (2019).</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] A. N. Gomez, M. Ren, R. Urtasun, R. B. Grosse, The reversible residual network: Backpropagation without storing activations, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper/2017/file/f9be311e65d81a9ad8150a60844bb94c-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] J. Donahue, K. Simonyan, Large scale adversarial representation learning, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper/2019/file/18cdf49ea54eec029238fcc95f76ce41-Paper.pdf.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] C. Etmann, R. Ke, C.-B. Schönlieb, iUNets: Learnable invertible up- and downsampling for large-scale inverse problems, in: 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP), 2020, pp. 1-6. doi:10.1109/MLSP49062.2020.9231874.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] J. Behrmann, P. Vicol, K.-C. Wang, R. B. Grosse, J. Jacobsen, Understanding and mitigating exploding inverses in invertible neural networks, in: AISTATS, 2021.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] H. Zhu, H. Wei, B. Li, X. Yuan, N. Kehtarnavaz, A review of video object detection: Datasets, metrics and methods, Applied Sciences 10 (2020).</mixed-citation>
      </ref>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Polger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Goldsmith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Przywarty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Bocchieri</surname>
          </string-name>
          ,
          <article-title>National weather service warning performance based on the wsr-88d,</article-title>
          <source>Bulletin of the American Meteorological Society</source>
          <volume>75</volume>
          (
          <year>1994</year>
          )
          <fpage>203</fpage>
          -
          <lpage>214</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Soman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zareipour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mandal</surname>
          </string-name>
          ,
          <article-title>A review of wind power and wind speed forecasting methods with different time horizons</article-title>
          ,
          <source>in: North American Power Symposium</source>
          <year>2010</year>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Q.-K.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-K.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <article-title>Multi-channel weather radar echo extrapolation with convolutional recurrent neural networks</article-title>
          ,
          <source>Remote Sensing</source>
          <volume>11</volume>
          (
          <year>2019</year>
          )
          <fpage>2303</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xingjian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-Y.</given-names>
            <surname>Yeung</surname>
          </string-name>
          , W.-K. Wong, W.-c. Woo,
          <article-title>Convolutional lstm network: A machine learning approach for precipitation nowcasting</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>802</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Ronneberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>U-Net: Convolutional networks for biomedical image segmentation</article-title>
          ,
          <source>in: International Conference on Medical Image Computing and Computer-Assisted Intervention</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>234</fpage>
          -
          <lpage>241</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          )
          <fpage>91</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>25</volume>
          (
          <year>2012</year>
          )
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Trebing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Stańczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mehrkanoon</surname>
          </string-name>
          , Smaatunet:
          <article-title>Precipitation nowcasting using a small attention-unet architecture</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          (
          <year>2021</year>
          ). URL: https://www.sciencedirect.com/science/article/pii/S0167865521000556. doi:10.1016/j.patrec.2021.01.036.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>