UDC 004.032.26

Investigation of Parameters of Meteorological Models Based on Patterns

Andrey K. Gorshenin

Federal Research Center “Computer Science and Control” of RAS
44-2 Vavilov str., Moscow, 119333, Russian Federation
Email: agorshenin@frccsc.ru

The probabilistic characteristics of precipitation and precipitation forecasts are briefly discussed on the basis of a special transformation of the initial data that makes it possible to reveal patterns in the observations. Patterns in data analysis can be used to improve the accuracy and speed of forecasting. Moreover, the pattern-based methodology is a convenient approach to solving various climatological problems. Testing the Markov property of the data and probabilistic and neural network forecasting are investigated for statistical observations without involving any additional information about meteorological conditions. The initial data are the volumes of daily precipitation observed over 60 years. The best accuracy for neural networks trained on patterns based on sequences of «D» (dry days, i.e. without precipitation) and «W» (wet days, i.e. with any nonzero volume) is 97% for one-day and 89% for two-day forecasts. Several directions for further investigation are suggested. The paper continues the author’s research on the creation of mathematical models and data mining algorithms for meteorological observations.

Key words and phrases: precipitation, patterns, forecast, neural networks, deep learning, probabilistic forecasting.

Copyright © 2018 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
In: K. E. Samouylov, L. A. Sevastianov, D. S. Kulyabov (eds.): Selected Papers of the VIII Conference “Information and Telecommunication Technologies and Mathematical Modeling of High-Tech Systems”, Moscow, Russia, 20-Apr-2018, published at http://ceur-ws.org

1.
Introduction

Precipitation is an important parameter of meteorological models (see, for example, papers [1–3]), so the development of adequate mathematical models (including probabilistic and statistical ones) and the creation of software tools that process the large volume of accumulated observations with modern methods are in demand. Probabilistic approaches can be used to solve forecasting problems (see, for example, paper [4]), as can neural networks, which are very effective in a wide range of application areas (see, for example, papers [5–7]). Moreover, research on various precipitation processes in the context of global warming and climate change is currently quite popular (see, for example, [8–11]).

In this paper, the probabilistic characteristics of precipitation and precipitation forecasts are briefly discussed on the basis of a special transformation of the initial data that makes it possible to reveal patterns in the observations. Patterns in data analysis can be used to improve the accuracy and speed of forecasting. Moreover, the pattern-based methodology is a fairly common tool for solving various climatological problems. This paper continues the author’s previous research on the creation of mathematical models and data mining algorithms for meteorological observations [12–15].

2. Investigation of the probabilistic characteristics of precipitation based on patterns

The initial data are the volumes of daily precipitation observed in Potsdam over 60 years. Consider the transformation of the non-negative data $V_{\mathrm{daily}}$ according to the following rule: if any positive value was observed on the $i$-th day, it is replaced by one ($\widetilde{V}^{(i)}_{\mathrm{daily}} = 1$); otherwise, the value of $\widetilde{V}^{(i)}_{\mathrm{daily}}$ equals zero. Thus, the initial series of continuous values becomes discrete, taking two possible values $\{0, 1\}$. This simplification makes it possible to analyze the presence or absence of precipitation irrespective of its volume.
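The transformation described above can be sketched in a few lines of Python. This is a minimal illustration and not the author’s implementation: the function names `binarize` and `to_dw_chain` and the sample volumes are assumptions made for the example.

```python
from typing import Iterable, List


def binarize(volumes: Iterable[float]) -> List[int]:
    """Map each daily precipitation volume to 1 (any positive value) or 0."""
    result = []
    for v in volumes:
        if v < 0:
            raise ValueError("precipitation volumes must be non-negative")
        result.append(1 if v > 0 else 0)
    return result


def to_dw_chain(binary: Iterable[int]) -> str:
    """Write a 0-1 series as a string of D (dry) and W (wet) symbols."""
    return "".join("W" if b else "D" for b in binary)


# Hypothetical daily volumes, for illustration only.
sample_volumes = [0.0, 2.4, 0.1, 0.0, 0.0, 5.7]
sample_binary = binarize(sample_volumes)
sample_chain = to_dw_chain(sample_binary)
print(sample_binary)  # [0, 1, 1, 0, 0, 1]
print(sample_chain)   # DWWDDW
```

Storing the series as a string of «D» and «W» symbols is only one convenient choice; it makes fixed-length patterns easy to extract by slicing.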
Thus, any sequence of dry (without precipitation) and wet (with any nonzero volume) days can be represented as a «0–1» (or «D–W») chain (the pattern). For each pattern of fixed length $N$, the frequency of appearance in the historical data can be determined as the ratio of the number of its occurrences to the total number of chains of length $N$ in the data (there are $2^N$ possible patterns). In fact, these are the probabilities according to the classical definition. Within the framework of the research, 60 years of observations for Potsdam were analyzed for values of the parameter $N$ from 1 to 14. For each set, the frequencies (probabilities) were obtained and the pattern with the maximum value was determined [12]. Fig. 1 shows an example of frequencies for patterns of length $N = 5$. The numerical values of the corresponding probabilities are given in Table 1.

In most papers devoted to the statistical analysis of meteorological data, it is assumed that the duration of a period of precipitation, measured in days (that is, the number of successive wet days), has a geometric distribution. Perhaps this assumption is based on the classical interpretation of the geometric distribution in terms of Bernoulli trials, as the distribution of the number of successive wet days («success») before the first day without precipitation («failure»). With the use of patterns, it was demonstrated [12] that the sequence of wet and dry days is not even Markovian, so using the Bernoulli scheme, which is based on the independence of the data, is incorrect. Alternative probabilistic models are proposed in the papers [14, 15].

3.
Forecasts for precipitation based on patterns

Using the data on frequencies (probabilities) for patterns, it is possible to calculate the conditional probabilities of certain combinations occurring in the future, that is, to obtain probabilistic forecasts for certain events. For example, if the current observations are «Wet-Wet-Dry-Dry» (that is, there was precipitation on two days in a row, and on the next two days the volumes were equal to zero), the following statement can be formulated: «The probability of precipitation in Potsdam in 2 days, given the current observations, is 0.3961, and the probability of no precipitation in 2 days is 0.6039». In standard data analysis practice, the predicted window should not exceed the size of the input observations; for historical values, this rule can be violated.

Figure 1. Probabilities/frequencies for patterns of length 5 (axes: pattern, from DDDDD to WWWWW; probability, from 0 to 0.22)

Table 1
Probabilities/frequencies for patterns of length 5

Pattern  Probability  Pattern  Probability
DDDDD    0.21         WDDDD    0.05
DDDDW    0.05         WDDDW    0.02
DDDWD    0.02         WDDWD    0.01
DDDWW    0.04         WDDWW    0.02
DDWDD    0.02         WDWDD    0.01
DDWDW    0.01         WDWDW    0.01
DDWWD    0.02         WDWWD    0.01
DDWWW    0.04         WDWWW    0.02
DWDDD    0.03         WWDDD    0.04
DWDDW    0.01         WWDDW    0.02
DWDWD    0.01         WWDWD    0.01
DWDWW    0.01         WWDWW    0.02
DWWDD    0.02         WWWDD    0.04
DWWDW    0.01         WWWDW    0.02
DWWWD    0.02         WWWWD    0.04
DWWWW    0.04         WWWWW    0.08

As an alternative forecasting tool, feed-forward neural networks with several hidden layers and various activation functions were used [12]. The patterns are used as the training sets.
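One way such training sets might be assembled is sketched below, under assumptions that the paper does not state: a sliding window of length `n` over the D-W chain forms the input pattern, and the state observed `horizon` days after the window is the target. The same pairs also yield conditional frequencies of the kind used for the probabilistic forecasts. The names `make_pairs` and `conditional_wet_frequency` and the toy chain are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def make_pairs(chain: str, n: int, horizon: int = 1) -> List[Tuple[str, str]]:
    """Slide a window of length n over a D-W chain.

    Each pair holds the pattern and the state observed `horizon` days
    after the last day of the pattern.
    """
    if n < 1 or horizon < 1:
        raise ValueError("n and horizon must be positive")
    pairs = []
    for start in range(len(chain) - n - horizon + 1):
        pattern = chain[start:start + n]
        target = chain[start + n + horizon - 1]
        pairs.append((pattern, target))
    return pairs


def conditional_wet_frequency(
    pairs: List[Tuple[str, str]], pattern: str
) -> Optional[float]:
    """Share of occurrences of `pattern` whose target day was wet.

    Returns None when the pattern never occurs in the data.
    """
    counts = Counter(target for p, target in pairs if p == pattern)
    total = sum(counts.values())
    return counts["W"] / total if total else None


# Toy chain for illustration only; real input would be the full record.
toy_chain = "DDWWDDWDWWDDDW"
toy_pairs = make_pairs(toy_chain, n=4, horizon=2)
print(len(toy_pairs))                                # 9
print(conditional_wet_frequency(toy_pairs, "WWDD"))  # 0.5
```

For a neural network, each pattern would additionally be encoded as a 0-1 vector; the encoding is not specified in the paper, so that step is left out of the sketch.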
However, the frequency of each of the sets is not used explicitly; the corresponding procedures are implemented in the hidden layers of the neural network. As a result, a forecast is obtained for the following 1–2 days. The best prediction accuracy obtained for a neural network with a sigmoid activation function and two hidden layers was 82% for a one-day and 74% for a two-day forecast (with a PHP implementation). Adding a hidden layer, changing the activation function to the rectifier, increasing the size of the input sample, and using the deep learning library Keras (with a Python implementation) raised the forecast accuracy for the same data to 97% for one-day and 89% for two-day forecasts. For the chosen architecture of the neural network, there is no overfitting: the error value is the same for the training part and for the test part, which does not participate directly in building the neural network. Thus, the model can be expected to work correctly not only on the training part but also on real data. Fig. 2 shows an example of the accuracy of next-day precipitation prediction by month. The numerical values of the corresponding forecast errors are given in Table 2.

Figure 2. Accuracy of the monthly 1-day precipitation forecast (axes: month, from January to December; accuracy, from 0% to 100%)

Table 2
Monthly 1-day precipitation forecast errors

Month      Error (1-day)
January    1.1%
February   1.5%
March      2%
April      1.2%
May        2%
June       2.2%
July       1.9%
August     4.2%
September  3.5%
October    3.6%
November   5.6%
December   5.7%

4. Conclusions

For the analysis of the probabilistic behavior of the precipitation process and for forecasting, it is suggested to use the chains of events (patterns) extracted from the data.
High accuracy of neural network forecasting is demonstrated, where the analysis is based solely on basic statistical data without any additional information about meteorological conditions. Working with patterns is of interest for the verification of ensemble forecasts. This methodology can also be used to predict the behavior of the moment characteristics of finite normal mixtures of probability distributions [16] in order to determine the trend direction when modeling physical experiments [17]. Such data differ from the observations considered in this work (for example, there is no seasonal factor), but in general the task seems quite similar, although the architecture of the neural network would have to be modified.

As one direction for further research, a transition from the binary discretization of events to a base-$k$ numeral system can be proposed. It could make it possible to solve more complex forecasting tasks, for example, to predict the amount of precipitation in terms of falling into pre-selected ranges of values. That is, if any positive value was observed on the $i$-th day, it is replaced by $j$ from the range $1, \ldots, k-1$ ($\widetilde{V}^{(i)}_{\mathrm{daily}} = j$), corresponding to the partition of the precipitation volumes by $k-2$ intervals; otherwise, $\widetilde{V}^{(i)}_{\mathrm{daily}}$ is assigned a zero value.

To maximize the automation of the research process, the developed forecasting methods will be integrated into the service of stochastic data analysis [18–20] as a special tool for data mining.

Acknowledgments

The research was supported by the Russian Foundation for Basic Research (project No 17-07-00851) and the Ministry of Education and Science of the Russian Federation (No 538.2018.5). The author is grateful to Corresponding Member of the Russian Academy of Sciences Prof. S. K. Gulev for the provided data, to Prof. V. Yu. Korolev for useful discussions within the framework of joint studies of meteorological phenomena, and to V.
Kuzmin for training of neural networks.

References

1. L. V. Alexander, X. Zhang, T. C. Peterson et al., Global observed changes in daily climate extremes of temperature and precipitation, Journal of Geophysical Research-Atmospheres, 111 (D5) (2006) D05109. doi:10.1029/2005JD006290.
2. J. H. Christensen, F. Boberg, O. B. Christensen, P. Lucas-Picher, On the need for bias correction of regional climate change projections of temperature and precipitation, Geophysical Research Letters, 35 (20) (2008) L20709. doi:10.1029/2008GL035694.
3. R. W. Portmann, S. Solomon, G. C. Hegerl, Spatial and seasonal patterns in climate change, temperatures, and precipitation across the United States, Proceedings of the National Academy of Sciences of the United States of America, 106 (18) (2009) 7324–7329. doi:10.1073/pnas.0808533106.
4. R. Krzysztofowicz, The case for probabilistic forecasting in hydrology, Journal of Hydrology, 249 (1–4) (2001) 2–9. doi:10.1016/S0022-1694(01)00420-6.
5. N. Q. Hung, M. S. Babel, S. Weesakul, N. K. Tripathi, An artificial neural network model for rainfall forecasting in Bangkok, Thailand, Hydrology and Earth System Sciences, 13 (8) (2009) 1413–1425. doi:10.5194/hess-13-1413-2009.
6. T. Partal, H. K. Cigizoglu, Prediction of daily precipitation using wavelet-neural networks, Hydrological Sciences Journal, 54 (2) (2009) 234–246. doi:10.1623/hysj.54.2.234.
7. K. P. Moustris, I. K. Larissi, P. T. Nastos, A. G. Paliatsos, Precipitation Forecast Using Artificial Neural Networks in Specific Regions of Greece, Water Resources Management, 25 (8) (2011) 1979–1993. doi:10.1007/s11269-011-9790-5.
8. P. A. O’Gorman, T. Schneider, The physical basis for increases in precipitation extremes in simulations of 21st-century climate change, Proceedings of the National Academy of Sciences of the United States of America, 106 (35) (2009) 14773–14777. doi:10.1073/pnas.0907610106.
9. K. E.
Trenberth, Changes in precipitation with climate change, Climate Research, 47 (1–2) (2011) 123–138. doi:10.3354/cr00953.
10. K. E. Kunkel, T. R. Karl, D. R. Easterling et al., Probable maximum precipitation and climate change, Geophysical Research Letters, 40 (7) (2013) 1402–1408. doi:10.1002/grl.50334.
11. N. Ban, J. Schmidli, C. Schar, Heavy precipitation in a changing climate: Does short-term summer precipitation increase faster?, Geophysical Research Letters, 42 (4) (2015) 1165–1172. doi:10.1002/2014GL062588.
12. A. K. Gorshenin, Pattern-based analysis of probabilistic and statistical characteristics of precipitations, Informatics and Applications, 11 (4) (2017) 38–46. doi:10.14357/19922264170405.
13. A. K. Gorshenin, On some mathematical and programming methods for construction of structural models of information flows, Informatics and Applications, 11 (1) (2017) 58–68. doi:10.14357/19922264170105.
14. V. Yu. Korolev, A. K. Gorshenin, S. K. Gulev, K. P. Belyaev, A. A. Grusho, Statistical Analysis of Precipitation Events, AIP Conference Proceedings 1863 (2017) 090011. doi:10.1063/1.4992276.
15. V. Yu. Korolev, A. K. Gorshenin, The probability distribution of extreme precipitation, Doklady Earth Sciences, 477 (2) (2017) 1461–1466. doi:10.1134/S1028334X17120145.
16. A. K. Gorshenin, Concept of online service for stochastic modeling of real processes, Informatics and Applications, 10 (1) (2016) 72–81. doi:10.14357/19922264160107.
17. G. M. Batanov, A. K. Gorshenin, V. Yu. Korolev, D. V. Malakhov, N. N. Skvortsova, The Evolution of Probability Characteristics of Low-Frequency Plasma Turbulence, Mathematical Models and Computer Simulations, 4 (1) (2011) 10–25. doi:10.1134/S2070048212010048.
18. A. Gorshenin, V. Kuzmin, Online system for the construction of structural models of information flows, in: Proceedings of the 7th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops, 2015, 216–219.
19. A.
Gorshenin, V. Kuzmin, On an interface of the online system for a stochastic analysis of the varied information flows, AIP Conference Proceedings 1738 (2016) 220009. doi:10.1063/1.4952008.
20. A. K. Gorshenin, V. Yu. Kuzmin, Research support system for stochastic data processing, Pattern Recognition and Image Analysis, 27 (3) (2017) 518–524. doi:10.1134/S1054661817030117.