<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Convolutional Neural Network with U-Net Architecture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Pavlík</string-name>
          <email>peter.pavlik@kinit.sk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Viera Rozinajová</string-name>
          <email>viera.rozinajova@kinit.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Bou Ezzeddine</string-name>
          <email>anna.bou.ezzeddine@kinit.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, Brno University of Technology</institution>
          ,
          <addr-line>Božetěchova 1/2, Brno-Královo Pole, 612 00, Czechia</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kempelen Institute of Intelligent Technologies</institution>
          ,
          <addr-line>Mlynské Nivy II. 18890/5, Bratislava, 821 09</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Slovak Centre for Research of Artificial Intelligence - slovak.AI</institution>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years - as in many other domains - deep learning models have found their place in precipitation nowcasting. Many of these models are based on the U-Net architecture, which was originally developed for biomedical segmentation but is also useful for generating short-term forecasts, and is therefore applicable in the weather nowcasting domain. The existing U-Net-based models use sequential radar data mapped onto a 2-dimensional Cartesian grid as input and output. We propose to incorporate a third - vertical - dimension to better predict precipitation phenomena such as convective rainfall, and present our results here. We compare the nowcasting performance of two comparable U-Net models trained on two-dimensional and three-dimensional radar observation data. We show that using volumetric data results in a small but statistically significant reduction in prediction error.</p>
      </abstract>
      <kwd-group>
        <kwd>precipitation nowcasting</kwd>
        <kwd>radar imaging</kwd>
        <kwd>U-Net</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1. Introduction
ning various human activities and tasks such as
agriculture, construction building or winter road
maintenance. Nowcasting is defined by the World
Meteorological Agency as forecasting with local detail, by any
method, over a period from the present to six hours
ahead, including a detailed description of the present
weather [1].</p>
    </sec>
    <sec id="sec-2">
      <title>In practice, simpler - and therefore faster - models out</title>
      <p>perform complex Numerical Weather Prediction (NWP)
models at the task of precipitation nowcasting because</p>
    </sec>
    <sec id="sec-3">
      <title>NWP models cannot consider the latest observations due</title>
      <p>to their long inference time. The highly sophisticated
NWP models usually need hours to produce their
forecasts and so they are not able to take into consideration
the latest data observations. Even a simple model that
can quickly output a prediction will outperform the NWP
models at the task of precipitation nowcasting simply by
the fact that it can consider the present data.
Nowcasting models can work in conjunction with NWP models
and use their long-term forecasts as additional inputs to
further refine their nowcasts [ 1].</p>
    </sec>
    <sec id="sec-4">
      <title>Precipitation nowcasting is usually performed using temporal extrapolation of past data from weather radar</title>
      <p>nEvelop-O
of convective storms [7].</p>
    </sec>
    <sec id="sec-5">
      <title>We compare two models - a reference U-Net architec</title>
      <p>ture based on existing research [3, 4] and an alternative
with 2D convolutional layers replaced by 3D convolution.</p>
    </sec>
    <sec id="sec-6">
      <title>We evaluate their performance in the task of predicting</title>
      <p>The first deep learning approach applied to the task of
300 precipitation nowcasting was a ConvLSTM model
pre40 sented in [2] that outperformed the operational
optical250 lfow-based ROVER nowcasting system. Experiments
30 with other CNN architectures started, such as a
Con200 vGRU model from [15] or a U-Net-based architecture
20 dZB introduced in [16]. The U-Net architectures, originally
de150 veloped for segmentation of medical images [17], proved
10 to be quite popular with models such as RainNet[3] and
100 SmaAt-U-Net[4] further exploring this approach.</p>
      <p>0 The previously mentioned neural network regression
50 models trying to nowcast the future state of
precipita10 tion fields were afected by blurring. When using
tra0 0 50 100 150 200 250 300 ditional gridpoint-based verification statistics such as
Mean Squared Error (MSE) as the training loss function,
Figure 1: A single radar echo observation. The shown re- we face the so-called “double penalty problem”. A
forefrlaedcatirvi(tCyAvPaPluIe).sTrheperreesfelnetctrievfilteyctmivaitpy icsaopvtuerreladidato2vekrmaasbaotveel- cast of a precipitation feature that is correct in terms of
lite image of the appropriate area centered on the Malý Ja- intensity, size, and timing, but incorrect concerning
locavorník radar station generated using Google Earth Engine [8]. tion, results in very large mean square error [18]. This
Landsat-8 image courtesy of the U.S. Geological Survey. causes the model to produce blurry outputs to mitigate
the penalisation caused by spatially incorrect
precipitation features.</p>
      <p>The blurry predictions pose one of the biggest challenges for anyone trying to develop a nowcasting model based on machine learning, as such models have difficulties predicting extreme events due to the smoothing. Recently, this problem started to be addressed by training models using the Generative Adversarial Network (GAN) approach, the most prominent example being DGMR [6]. Its authors introduced a GAN framework [19] to solve the problem of blurry predictions present in other deep learning precipitation nowcasting models such as RainNet. The model is trained with a loss function comprising two discriminators, inspired by existing research in video generation, and a regularization term. The first, spatial discriminator discourages blurry predictions, while the second, temporal discriminator discourages jumpy predictions. The regularization term penalizes deviations between the observed radar sequences and the model prediction. The DGMR model can currently be considered the state-of-the-art in the precipitation nowcasting domain.</p>
      <sec id="sec-2-1">
        <title>2.1. Motivation for Volumetric Nowcasting</title>
        <p>The application of deep learning models for precipitation nowcasting is the focus of many research works. However, the vast majority of the models use 2-dimensional aggregate radar products and thus throw away any information which can be gained from processing the vertical structure of precipitation objects captured by the radar. By processing the data into a 2D aggregated map, we lose all information about the vertical structure of the observed precipitation. When reviewing the existing works in the precipitation nowcasting domain, we therefore identified a need to explore the effect of working with 3-dimensional volumetric radar data.</p>
        <p>An early model working with volumetric radar data was presented in [20], where a ConvLSTM model was used to predict future radar reflectivity. The model input shape is 18 × 18 × 20 (18 × 18 km at 1 km resolution, 10 km above the radar at 500 m resolution), provided at multiple time steps; each one is processed by a 3D-CNN first, then passed on to a ConvLSTM sequential network. The output is a classification for the central region of 6 × 6 km, predicting whether the reflectivity in the next 30 and 60 minutes will exceed a set threshold. The final result is a binary map with a resolution of 6 × 6 km. The problem with this approach is that the model cannot consider any fast moving precipitation particles, since it cannot see more than 6 km past its target region. Also, the target region size of 6 × 6 km can hardly be considered a high spatial resolution, which is one of the defining traits of nowcasting.</p>
        <p>One other work worth mentioning is a 3D-CNN+GAN hybrid model from [5]. This model is quite sophisticated: it uses the GAN-based approach to predict plausible data and a weighted MSE loss function to give more importance to high reflectivity values, resulting in a better ability to predict extreme precipitation events and reduced output blurring. However, the third data dimension is not the altitude above radar we want to consider, but time, i.e. the past observations as separate channels form the 3D volume. Nevertheless, the model drives the development of 3D-CNN models for precipitation nowcasting.</p>
      </sec>
    </sec>
      <sec id="sec-6-1">
        <title>3. Radar Reflectivity Dataset</title>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>To explore the efect of volumetric precipitation now</title>
      <p>casting, we collaborated with the Slovak Meteorological
Institute that provided us a dataset of roughly 3.5 years
of reflectivity data from Malý Javorník weather radar
station. The data is captured in 5 minute intervals. The
dataset consists of 355 761 separate observations in the
ODIM HDF5 format.</p>
      <p>The radar captures the precipitation particles in the air by measuring the radar wave power (echo) returned after hitting precipitation particles. This value is called reflectivity and is measured in logarithmic dimensionless units called decibels (dBZ). The data consists of reflectivity values at the so-called reflectivity gates, at multiple elevation angles distributed around the radar station, encoded in polar coordinates. See Figure 2 for a vertical slice of a single radar observation.</p>
      <p>Figure 2: Vertical slice of a single radar reflectivity observation at a set azimuth. The separate "rays" at different elevation angles are identifiable.</p>
      <p>Since convolutional neural network models cannot process the data in polar coordinates, we need to convert them into Cartesian maps. We processed the data using the Py-ART Python library [21]. The radar echo observations are typically aggregated into precipitation maps in two forms. The first one is the Constant Altitude Plan Position Indicator (CAPPI), which displays the reflectivity gate values at a certain altitude slice above the radar. The other is CMAX, which aggregates the vertical dimension and displays the maximum value in the vertical column for each data point. If a 3D volume is created from multiple CAPPI maps at different altitude levels, the product is called MCAPPI.</p>
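      <p>As an illustration of this conversion step, the following minimal sketch grids a single polar volume into an MCAPPI stack with Py-ART. The input file name is hypothetical, and the gridding parameters are our assumptions chosen to mirror the dataset dimensions described below, not an exact record of the preprocessing used.</p>
      <preformat>
# Minimal sketch: one polar ODIM HDF5 volume to a Cartesian MCAPPI stack.
import pyart

radar = pyart.aux_io.read_odim_h5("radar_volume.hdf")  # hypothetical file name

# Grid onto a 336 x 336 km Cartesian extent at 1 km horizontal resolution,
# with 8 vertical levels from 500 m to 4000 m above the radar.
grid = pyart.map.grid_from_radars(
    radar,
    grid_shape=(8, 336, 336),
    grid_limits=((500.0, 4000.0), (-168_000.0, 168_000.0), (-168_000.0, 168_000.0)),
    fields=["reflectivity"],  # assumed field name per Py-ART's default ODIM mapping
)

mcappi = grid.fields["reflectivity"]["data"]  # shape (8, 336, 336), in dBZ
cappi_2km = mcappi[3]  # the 2000 m level, used for the 2D dataset
</preformat>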
      <p>The reflectivity maps can be converted to rainfall rate maps using the Marshall-Palmer Z-R relationship [22]:</p>
      <p>Z = 200 R^1.6 (1)</p>
      <p>where Z is the reflectivity factor and R is the rainfall rate in mm/h.</p>
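      <p>Since the radar data is stored in dBZ, applying equation (1) first requires undoing the decibel scaling, Z = 10^(dBZ/10). A minimal sketch of the conversion in both directions:</p>
      <preformat>
import numpy as np

def dbz_to_rainfall_rate(dbz):
    """Invert the Marshall-Palmer relationship Z = 200 * R**1.6 (Eq. 1).

    The dBZ values are first converted to the linear reflectivity
    factor Z, then the relationship is solved for R in mm/h.
    """
    z = 10.0 ** (dbz / 10.0)
    return (z / 200.0) ** (1.0 / 1.6)

def rainfall_rate_to_dbz(rate):
    """Forward direction of Eq. 1, returning reflectivity in dBZ."""
    return 10.0 * np.log10(200.0 * rate ** 1.6)
</preformat>
      <p>Under this relationship, a rainfall rate of 0.6 mm/h corresponds to roughly 19.5 dBZ, which is why the 20 dBZ binarization threshold used for evaluation in Section 4.1 matches the slight-rain filtering threshold below.</p>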
      <sec id="sec-3-1">
        <title>3.1. Training data selection</title>
        <p>The dataset requires filtering before training, since the majority of the observations are of clear skies with nothing to learn from. Most of the observations in the dataset therefore have no value for training the model and could even negatively affect the training by biasing the outputs toward clear sky prediction, while we are mostly interested in non-trivial cases with high precipitation. We filtered the images as follows (a sketch of the resulting test follows the list):</p>
    <sec id="sec-8">
      <title>1. Create a CAPPI radar reflectivity map at 2 km</title>
      <p>altitude above radar at 1 × 1 km resolution and
select a center slice of size 336 × 336 km.
2. Convert reflectivity to rainfall rate according to</p>
      <p>Marshall-Palmer Z-R relationship (1).
3. Compute the ratio of rainy to clear pixels
(threshold 0.05 mm/5 min or 0.6 mm/h - corresponds to
slight rain).</p>
      <p>Set
Full Dataset
Target Observations
Target + Lead Obs.</p>
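        <p>A minimal sketch of the per-observation test in steps 2 to 4, reusing the conversion function from above; the check for the availability of the 11 preceding observations is assumed to happen separately when the sequences are assembled.</p>
        <preformat>
import numpy as np

RAIN_THRESHOLD_MM_H = 0.6   # slight rain: 0.05 mm per 5 min interval
MIN_RAINY_FRACTION = 0.20   # keep maps with at least 20% rainy pixels

def is_target_observation(cappi_2km_dbz):
    """Decide whether a single 336 x 336 CAPPI map is a target observation."""
    rate = dbz_to_rainfall_rate(cappi_2km_dbz)          # Eq. 1
    rainy_fraction = np.mean(rate >= RAIN_THRESHOLD_MM_H)
    return rainy_fraction >= MIN_RAINY_FRACTION
</preformat>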
        <p>Each selected target observation was included in the training dataset, along with a set number of previous observations to serve as inputs and non-target intermediary outputs. For our models, we decided to use 6 observations as input and 6 as output, effectively predicting the precipitation half an hour in advance based on the last half hour of data. This means that for each target observation, we also needed to include 11 leading observations in the dataset. This process returned 9 018 suitable target images, which together with the necessary leading images represent 3.18% of the original dataset.</p>
        <p>It should be noted that the data converted to rainfall as described above was not used for training, only for filtering the target observations based on the ratio of rainy pixels. The actual training data used reflectivity directly, for both 2D images and 3D volumes. The 2D dataset was a collection of CAPPI radar reflectivity maps at 2 km altitude above radar. The 3D dataset was a collection of CAPPI radar reflectivity maps at 8 altitude levels above radar, from 500 m.a.r. to 4000 m.a.r. The extent of the data was set to 336 × 336 km centered on the radar station with a spatial resolution of 1 × 1 km for both the 2D and 3D data, resulting in images of size 336 × 336 pixels and volumes of 8 × 336 × 336 voxels respectively for a single observation.</p>
        <p>To train and evaluate the models, the dataset was split into training, validation and test subsets in chronological order. The last 15% of target observations were selected for the test set, the rest was chosen for training. Out of these, the last 15% of target observations were again selected for validation and the rest was used as training samples. See Table 1 for the exact number of observations in each set.</p>
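        <p>A sketch of this chronological split; applied to the 9 018 target observations with rounding, these proportions reproduce the subset sizes listed in Table 1.</p>
        <preformat>
def chronological_split(targets):
    """Split chronologically ordered targets into train/validation/test.

    The last 15% of all targets form the test set; the last 15% of the
    remainder form the validation set.
    """
    n_test = round(0.15 * len(targets))
    train_val, test = targets[:-n_test], targets[-n_test:]
    n_val = round(0.15 * len(train_val))
    train, val = train_val[:-n_val], train_val[-n_val:]
    return train, val, test

# 9018 targets -> 6515 train, 1150 validation, 1353 test (Table 1)
</preformat>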
        <p>Table 1: Number of observations in each subset.
Full Dataset: 355 761
Target Observations: 9 018
Target + Lead Obs.: 11 310
Training Set Targets: 6 515
Validation Set Targets: 1 150
Test Set Targets: 1 353</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Model Architectures</title>
      <p>To compare the impact of adding a vertical dimension as fairly as possible, we chose a basic U-Net architecture inspired by the models developed in [3, 4] as a reference model. As U-Net is a fully convolutional neural network, converting it to process volumetric data is a trivial task - mostly just a matter of replacing 2D convolutional layers with 3D convolutions. Besides this, the model only required replacing the 2D max-pooling layers in the encoder with 3D max-pooling and the bilinear upsampling in the decoder with trilinear upsampling. See Figure 3 for the specific number of channels and kernel sizes at each layer of the model. Both models were implemented using the PyTorch library [23].</p>
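      <p>To illustrate the kind of change involved, the sketch below defines a U-Net "double convolution" block that can be switched between the 2D and 3D variants. It is a generic building block under common U-Net conventions, not the exact layer configuration of our models, which is given in Figure 3.</p>
      <preformat>
import torch.nn as nn

def double_conv(in_channels, out_channels, volumetric=False):
    """One U-Net double-convolution block in either 2D or 3D.

    Converting the network to volumetric data only swaps the layer
    classes; kernel sizes and channel counts stay identical, so each
    kernel grows from 9 (3 x 3) to 27 (3 x 3 x 3) weights.
    """
    Conv = nn.Conv3d if volumetric else nn.Conv2d
    return nn.Sequential(
        Conv(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        Conv(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

# Pooling and upsampling change the same way:
pool = {False: nn.MaxPool2d(2), True: nn.MaxPool3d(2)}
upsample = {
    False: nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True),
    True: nn.Upsample(scale_factor=2, mode="trilinear", align_corners=True),
}
</preformat>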
      <p>The conversion of the model from 2D to 3D convolutions was mostly straightforward and resulted in a 3-fold increase in the number of trainable parameters, from roughly 17 to 52 million. The three-fold increase follows from the fact that the model uses convolution kernels of size 3 at every convolutional layer: each kernel has 27 (3 × 3 × 3) instead of 9 (3 × 3) weights (disregarding bias and multiple channels). Other architectural parameters of the model, such as the number of kernels at each layer, were kept the same so that the comparison between the models is fair and depends solely on the provided data.</p>
      <p>Figure 3: The U-Net architecture used, showing the double convolution blocks, skip connections, single convolutions, max pooling and upsampling layers.</p>
      <sec id="sec-4-1">
        <title>4.1. Training and Evaluation</title>
        <p>The Adam optimizer was used for training the models. To find the optimal training hyperparameters - the starting learning rate, the learning rate scheduler parameters and the gradient clipping threshold - we utilized the Bayesian sweep search provided by Weights &amp; Biases [24]. We trained 20 models with the 2D CNN architecture and 5 with the 3D CNN architecture. The best performing model of each architecture variant was selected for performance evaluation. See Table 2 for all the possible hyperparameter values and the best performing ones for both the 2D and 3D models. Early stopping after 15 non-improving epochs was utilized.</p>
        <p>Choosing the right metric to evaluate the performance of precipitation nowcasting models is not simple. The correct method depends on a model's use-case, and no single composite measure is currently able to objectively evaluate the performance of precipitation nowcasting models [1]. While we outlined the shortcomings of using MSE to evaluate precipitation nowcasting models in Section 2, we use MSE as the loss function and the primary evaluation metric despite the double penalization effect, since it is still the most commonly used metric in this domain. Additionally, to provide more insight into model performance, we also compute mean model accuracy, precision, recall and F1 scores on binarized precipitation maps, using a threshold value of 20 dBZ (corresponding to light rain) to differentiate between rain and no-rain areas. This way, we can evaluate only the shape of the precipitation features and disregard the intensity, which serves as another valuable metric. Our experiments have shown that higher threshold values corresponding to extreme precipitation events show larger differences between model metrics during evaluation; however, the informative value would be lower due to such events occurring only in a small minority of the test set observations.</p>
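        <p>A sketch of this area-based evaluation for a single prediction-target pair, with both maps binarized at the 20 dBZ threshold:</p>
        <preformat>
import numpy as np

def binarized_scores(pred_dbz, target_dbz, threshold_dbz=20.0):
    """Accuracy, precision, recall and F1 on rain / no-rain masks."""
    pred = pred_dbz >= threshold_dbz
    target = target_dbz >= threshold_dbz
    tp = np.sum(np.logical_and(pred, target))
    fp = np.sum(np.logical_and(pred, np.logical_not(target)))
    fn = np.sum(np.logical_and(np.logical_not(pred), target))
    tn = np.sum(np.logical_and(np.logical_not(pred), np.logical_not(target)))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
</preformat>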
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. 2D vs. 3D: A Comparison</title>
      <p>The impact of providing a vertical dimension to the model was evaluated by comparing the error rate when predicting a single reflectivity map at a constant altitude above radar. We trained the 2D model to output the next CAPPI radar reflectivity maps at 2 km above radar 30 minutes into the future, based on past radar reflectivity maps at the same altitude. Subsequently, we trained a 3D model to predict equivalent 3D reflectivity maps at 8 altitude levels based on recent volumetric observation data. To evaluate which model is better at precipitation nowcasting, we evaluate the prediction error on a single CAPPI map at 2 km above radar from the target observation (a nowcast 30 minutes into the future). This can be done because one slice of the output volume of the 3D model matches the altitude level the 2D model was trained on (2000 m.a.r.).</p>
      <p>A simple Eulerian persistence model was used as a benchmark. This benchmark method simply copies the last input observation as the prediction output. Despite the method being trivial, the precipitation data is highly dependent on previous observations, so it provides a good performance benchmark. Using this benchmark, we can also evaluate the rate of change in the data and therefore see how "difficult" it is to make an accurate prediction for each sample.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The results in Table 3 show that the best 3D-CNN U-Net model slightly outperformed the best 2D-CNN counterpart. On average, the 3D model achieved a lower prediction error on the test set in both the MSE and MAE metrics. The improvement is small, but statistically significant: a paired t-test at the 0.99 confidence level on the test set MSE scores rejected the null hypothesis that the means of the 2D and 3D model error scores are the same, with a p-value very close to zero. The area-based metrics also show small improvements, with accuracy and F1 scores being slightly higher. Based on the considerably higher recall and lower precision, we can assume the 3D model predicts larger precipitation bodies on average. See Figure 4 for a visual comparison of the model outputs.</p>
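      <p>A sketch of the significance test, assuming per-sample test-set MSE scores were collected for both models (the file names are illustrative). A paired test is appropriate because both models are scored on exactly the same test samples:</p>
      <preformat>
import numpy as np
from scipy import stats

mse_2d = np.load("mse_2d_test.npy")  # one MSE score per test target
mse_3d = np.load("mse_3d_test.npy")

t_statistic, p_value = stats.ttest_rel(mse_2d, mse_3d)
# The null hypothesis of equal mean errors is rejected at the
# 0.99 confidence level when the p-value stays below 0.01.
print(t_statistic, p_value)
</preformat>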
        <sec id="sec-8-1-1">
          <title>Acknowledgments</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>This research was partially supported by TAILOR, a</title>
      <p>project funded by EU Horizon 2020 research and
innovation programme under GA No 952215; by The Ministry
of Education, Science, Research and Sport of the Slovak
Republic under the Contract No. 0827/2021; and by Life
Defender - Protector of Life, ITMS code: 313010ASQ6,
coifnanced by the European Regional Development Fund
(Operational Programme Integrated Infrastructure).
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref-23">
        <mixed-citation>[23] A. Paszke, et al., PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.</mixed-citation>
      </ref>
      <ref id="ref-24">
        <mixed-citation>[24] L. Biewald, Experiment tracking with Weights and Biases, 2020. URL: https://www.wandb.com/. Software available from wandb.com.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>