Analysis of Nonstationary Extreme Events

Norbert A. Agana; Mohammad Gorji Sefidmazgi; Abdollah Homaifar
Department of Electrical Engineering, North Carolina A&T State University, Greensboro, NC, USA.
{naagana, mgorjise}@aggies.ncat.edu, homaifar@ncat.edu

Abstract

Extreme events are by definition rare, but their impacts on both physical and socioeconomic resources are enormous. Extreme climate events such as heavy precipitation, drought, tropical cyclones, hurricanes and heat waves are known to have tremendous impact on society. Over the last few decades, our understanding of the mean behavior of the climate and its normal variability has improved to a large extent, but the same cannot be said of climate extremes. Climate extremes represent nonlinear systems that are very hard to study and even harder to predict. The objective of this paper is to assess how these extreme events relate to modes of climatic variability such as the El Niño–Southern Oscillation, the Pacific Decadal Oscillation and the North Atlantic Oscillation by utilizing the familiar distributions that arise out of extreme value theory, such as the generalized extreme value distribution and the generalized Pareto distribution. Nonstationarity is introduced by expressing the parameters of the distribution as functions of the covariates.

Keywords: Extreme Events; Covariates; Maximum likelihood; Bayesian Information Criterion.

Introduction

Extreme events are rare events that occur infrequently but have enormous impact on both physical and socioeconomic resources. Extreme climate events such as heavy precipitation, extreme temperature, drought, tropical cyclones, hurricanes and heat waves have had tremendous impact on society, costing lives and property. It is therefore no surprise that considerable attention has been given to climate studies in the past decades. However, in analyzing precipitation events, the majority of the existing methods are based on the assumption that precipitation time series are stationary, implying that the distribution of precipitation events is not significantly affected by climatic trends, long-term cycles or modes of climate variability (Gorji Sefidmazgi, Sayemuzzaman, and Homaifar 2014; Gorji Sefidmazgi, Sayemuzzaman, et al. 2014; Agana, Sefidmazgi, and Homaifar 2014; Vogel, Yaindl, and Walter 2011; AghaKouchak et al. 2013).

Although the stationarity assumption simplifies modeling, the results can be misleading when it does not hold. Hence, there is a need to modify the assumption of a series of independently and identically distributed data with constant properties through time (stationarity) to reflect the effect of long-term climate change on the variable of interest. For instance, the series of maxima of climatic variables such as temperature and precipitation could show trends over time (Panagoulia, Economou, and Caroni 2014). Also, due to natural climate variability or anthropogenic climate change, there is evidence that hydroclimatic extreme series are not stationary (Jain and Lall 2001; Milly et al. 2008). Large-scale modes of climate variability such as the El Niño–Southern Oscillation (ENSO), the Pacific Decadal Oscillation (PDO), and the North Atlantic Oscillation (NAO) are known to have profound impacts on precipitation regimes, especially during the winter season over North America. A number of researchers have studied the impact of modes of climate variability on climate extremes and have shown that these variables have great influence on extreme precipitation and temperature (Zhang et al. 2010; Griffis and Stedinger 2007).

ENSO events in particular influence the occurrence of precipitation events. There is a well-established connection between the two phases of ENSO and North American precipitation (Cayan, Redmond, and Riddle 1999; Gershunov and Barnett 1998; Ropelewski and Halpert 1986; Shabbar, Bonsal, and Khandekar 1997). El Niño events influence the frequency of occurrence of different daily precipitation magnitudes in Western U.S. winters and tend to be associated with an increase in the frequency of high daily precipitation over the Southwest but a decrease in the Northwest (Cayan, Redmond, and Riddle 1999). To model nonstationary extreme events within the framework of the GEV distribution, the distribution must be extended with covariate-dependent changes in at least one of its parameters (Coles 2001).

The objective of this research is to assess how these extreme events relate to modes of climatic variability such as ENSO, NAO and PDO. Similar work has been carried out on non-stationary extreme events in which only trend was considered in the analysis (AghaKouchak et al. 2013; Katz, Parlange, and Naveau 2002; Feng, Nadarajah, and Hu 2007). Instead of analyzing only trend, we have also analyzed the effect of ENSO on extreme precipitation and sea level. We achieved this by utilizing the familiar generalized extreme value (GEV) distribution that arises out of extreme value theory (EVT) (Coles 2001). Non-stationarity is introduced by expressing the parameters of the GEV distribution as functions of time and ENSO. The maximum likelihood estimation (MLE) method is employed to estimate the distribution parameters (Coles 2001; Katz, Parlange, and Naveau 2002; Vogel, Yaindl, and Walter 2011). We applied the model to precipitation data in Pasquotank, North Carolina and also to sea level data at Pensacola, Florida. Furthermore, we have compared the different fitted models and selected the best model based on the Bayesian Information Criterion (BIC). This paper demonstrates that covariate-dependent models are necessary for analyzing extreme events, especially precipitation extremes. Also, based on the results obtained from this work, it is observed that a linear parameter-covariate dependence might not capture the dependence of the parameters on the covariates well, and therefore nonlinear models might be appropriate.

Methodology

The foundation of extreme value theory (EVT) is the generalized extreme value (GEV) distribution (AghaKouchak et al. 2013; Coles 2001). The GEV distribution classically models block maxima (or minima) of data over a certain period of time, such as daily, monthly or annual maxima. For annual maxima, each block is one year, so the number of block maxima equals the number of years from which the maxima are taken. The justification of the GEV arises from an asymptotic argument: as the sample size increases, the distribution of the suitably normalized sample maximum converges to a Fréchet, Weibull or Gumbel distribution. EVT characterizes rare events by describing the tail behavior of the underlying distribution. Let the time series {X1, X2, ..., Xn} be independent random variables with a common distribution function. Let Mn = max{X1, X2, ..., Xn}, and suppose there exist normalizing constants an > 0 and bn such that

$$\Pr\{(M_n - b_n)/a_n \le x\} \to G(x) \quad \text{as } n \to \infty .$$

Then the cumulative distribution function G of the GEV distribution is defined as shown in Equation (1) (Coles 2001; Katz, Parlange, and Naveau 2002). If n is the number of observations in a year, then Mn is the annual maximum.

$$G(x;\mu,\sigma,\xi) = \exp\left\{-\left[1+\xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, \qquad 1+\xi\left(\frac{x-\mu}{\sigma}\right) > 0,\ \sigma > 0 \tag{1}$$

where µ, σ > 0 and ξ are the location, scale and shape parameters respectively. The expression in Equation (1) can be made non-stationary by expressing the parameters of the distribution as linear functions of covariates that influence the occurrence of extreme events. In our case, we only expressed the location parameter as a function of time and the El Niño Southern Oscillation index (Nino 3.4), as shown in Equations (3) to (5); Equation (2) is the stationary model.

$$\text{Model 1:}\quad \mu = \mu_0 \tag{2}$$
$$\text{Model 2:}\quad \mu(t) = \mu_0 + \mu_1 t \tag{3}$$
$$\text{Model 3:}\quad \mu(y) = \mu_0 + \mu_1 y \tag{4}$$
$$\text{Model 4:}\quad \mu(t,y) = \mu_0 + \mu_1 t + \mu_2 y \tag{5}$$

The combined effect of both time and the El Niño Southern Oscillation is shown in Equation (5), where t (time) is the year in which the maximum is taken and y is the covariate representing Nino 3.4. The above GEV distribution models are fitted to both the annual maxima of monthly precipitation data at Pasquotank, North Carolina and the mean sea level data at Pensacola, Florida.

Parameter Estimation

All the model parameters are obtained using the maximum likelihood estimation (MLE) procedure. Although other methods such as the Method of Moments (MOM) and Probability Weighted Moments (PWM) can be used, we exclusively used MLE because of its easy adaptability to non-stationary conditions (Katz, Parlange, and Naveau 2002; El Adlouni et al. 2007). A further advantage of maximum-likelihood estimators is that they can employ censored information without difficulty (Martins and Stedinger 2000).

If g(x(t); µ(t), σ(t), ξ(t)) is the probability density function of a random variable x with µ(t), σ(t) and ξ(t) as parameters, the likelihood of the GEV distribution is given by (Coles 2001):

$$L(\mu,\sigma,\xi) = \prod_{t=1}^{n} g\big(x(t);\mu(t),\sigma(t),\xi(t)\big) \tag{6}$$

Both the stationary and nonstationary models of the GEV distribution can be fitted to the time series of the random variable x by maximizing the log-likelihood function (Coles 2001):

$$l(\mu,\sigma,\xi) = -n\log\sigma(t) - \left(1+\frac{1}{\xi(t)}\right)\sum_{i=1}^{n}\log\left[1+\xi(t)\left(\frac{x_i-\mu(t)}{\sigma(t)}\right)\right] - \sum_{i=1}^{n}\left[1+\xi(t)\left(\frac{x_i-\mu(t)}{\sigma(t)}\right)\right]^{-1/\xi(t)} \tag{7}$$

which is obtained by taking the log of Equation (6). The maximum likelihood estimates are the values of the parameters µ, σ and ξ that maximize the log-likelihood function in (7). Time is used as an explanatory variable (covariate). The parameters µ, σ and ξ can also be expressed as functions of other explanatory variables, and a similar procedure is followed to estimate them. Instead of maximizing the log-likelihood, we can equivalently minimize the negative log-likelihood of Equation (7). Numerical methods such as the Newton-Raphson iteration algorithm can be used to solve Equation (7) (Martins and Stedinger 2000; Hosking 1985).

Model Selection

Model choice is necessary when there is more than one candidate model. For instance, to compare two nested models (usually a simpler model and a more complex one), we can apply the Likelihood Ratio Test (LRT) by computing the test statistic and determining whether it is significant. However, the likelihood ratio test becomes cumbersome when there are more than two models to choose from. Model selection techniques such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) can be used to select the best model among a collection of nested models. The BIC is applied here (AghaKouchak et al. 2013; Gorji Sefidmazgi, Moradi Kordmahalleh, et al. 2014). The BIC selects the model that minimizes the quantity:

$$\mathrm{BIC}(k) = -2\,l(k) + k\ln n \tag{8}$$

where l is the maximized log-likelihood from Equation (7), and k and n are the number of parameters and the number of block maxima respectively (the number of years in this case).

Data Sets

Monthly precipitation data in North Carolina for the time period 1950-2008 were obtained from the National Climatic Data Center (NCDC). We selected the station near the Atlantic Ocean. Figure 1 shows a time series plot of the annual maxima of monthly precipitation data at Pasquotank as well as a scatter plot showing how the precipitation varies with the El Niño Southern Oscillation Index (Nino3.4).

Also, we have analyzed the annual mean sea level at Pensacola, Florida during the same time period of 1950-2008. The sea level data were obtained from the University of Hawaii Sea Level Center. Figure 2 shows a time series plot of the mean sea level data as well as a scatter plot of the sea level versus the El Niño Southern Oscillation Index (Nino3.4). Most climate indices such as ENSO, NAO and PDO do not extend back beyond 1950. Hence, in order to analyze the effects of these indices on climate extremes, we chose the time period 1950-2008 so that all series have the same length.

Figure 1. (a) Time series plot of annual maxima of precipitation and (b) scatter plot of precipitation and El Niño 3.4 at Pasquotank, North Carolina. Trend is indicated by the solid line.

Figure 2. (a) Time series plot of mean sea level and (b) scatter plot of sea level and Nino 3.4 at Pensacola, Florida. Trend is indicated by the solid line.

Simulation Results

We investigated the use of the GEV distribution to model both extreme precipitation and sea level in North Carolina and Florida respectively. We modeled these events using both stationary and non-stationary models for the time period 1950-2008. The effects of both time and the El Niño Southern Oscillation index (Nino 3.4) were taken into account. The results are summarized in Tables 1 and 2. The values in parentheses are the standard errors of the parameter estimates. The minimized negative log-likelihood and the BIC values are also shown in the tables.

Table 1 shows that when ENSO was used as a covariate, the negative log-likelihood decreased, as observed in Model 3. However, the BIC value increased relative to the stationary model. This increase implies that, although introducing ENSO has an effect, the improvement in fit is not large enough to justify the added model complexity. In Table 2, Model 2 for the Florida sea level has the lowest BIC value, although Model 4 attains a slightly lower negative log-likelihood, and hence Model 2 is selected by the BIC as the best model. This means that the model with the linear trend in time is the most appropriate model for the sea level data at Pensacola.

Table 1. Parameter estimates and standard errors for the GEV distribution fitted to the annual maxima of precipitation (inches) at Pasquotank, North Carolina (values in brackets are standard errors).

| Parameter | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| σ | 2.7326 (0.310) | 2.7200 (0.315) | 2.7013 (0.3065) | 2.7272 (0.318) |
| ξ | 0.0743 (0.112) | 0.0817 (0.1195) | 0.0636 (0.1128) | 0.0769 (0.121) |
| µ0 | 10.8150 (0.409) | 10.9124 (0.6479) | -9.7803 (14.2051) | 9.7730 (14.092) |
| µ1 | - | -0.0036 (0.0190) | 0.7654 (0.5281) | -0.0048 (0.019) |
| µ2 | - | - | - | 0.5232 |
| -log l | 155.062 | 155.043 | 153.9863 | 153.956 |
| BIC | 322.3567 | 326.397 | 324.2828 | 328.300 |

Table 2. Parameter estimates and model selection for the GEV distribution fitted to the time series of mean sea level (mm) at Pensacola, Florida (values in brackets are standard errors).

| Parameter | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| σ | 52.3875 (6.576) | 50.8995 (5.631) | 52.799 (6.552) | 49.2521 (5.394) |
| ξ | 0.0658 (0.152) | -0.1265 (0.116) | 0.0474 (0.148) | -0.1251 (0.114) |
| µ0 | 2840.55 (8.389) | 2793.74 (15.217) | 2840.57 (8.324) | 2787.8965 (14.806) |
| µ1 | - | 1.8946 (0.506) | 4.3535 (7.181) | 2.0330 (0.475) |
| µ2 | - | - | - | 13.3157 |
| -log l | 329.1631 | 320.8547 | 328.9922 | 319.0587 |
| BIC | 670.5588 | 658.0195 | 674.6331 | 661.5552 |

Conclusion

The effects of both time and ENSO have been analyzed in this research, unlike previous work where only a linear trend in time was analyzed. The log-likelihood values indicate that the sea level at Pensacola is affected by both time and ENSO, as seen in Model 4. Model 4, which combines time and ENSO, has the lowest negative log-likelihood and improves clearly on the stationary Model 1. Comparing Models 2 and 4, the combined effect of time and ENSO is greater than that of time alone, but because the BIC penalizes the additional parameter, it favors the time-dependent Model 2. Similar observations can be made from the precipitation data at Pasquotank. However, for this station, the impact of ENSO seems to be greater than that of time, as can be seen from both the negative log-likelihood and BIC values. This is also evident from the time series plots shown in Fig. 1. Again, because of the penalty on model complexity, the BIC favors the stationary model. This suggests that the covariate effect is not large enough to justify the added complexity. Hence, the stationary model may be suitable for the precipitation data according to the BIC values. The work presented here considered only simple forms of nonstationarity, relying on linear models of the covariates. The restriction to linear models may have contributed to the weak significance of the covariate effects. As future work, we will therefore consider nonlinear forms of nonstationarity such as vector generalized additive models (VGAM) or generalized additive models for location, scale and shape (GAMLSS) for the parameters of the distribution.

Acknowledgments

This work is partially supported by the Expeditions in Computing by the National Science Foundation under Award CCF-1029731.

References

Agana, Norbert, Mohammad Gorji Sefidmazgi, and Abdollah Homaifar. 2014. 'Analysis of Extreme Precipitation Events.' In Fourth International Workshop on Climate Informatics.

AghaKouchak, Amir, David Easterling, Kuolin Hsu, Siegfried Schubert, and Soroosh Sorooshian. 2013. Extremes in a Changing Climate: Detection, Analysis and Uncertainty (Springer).

Cayan, Daniel R., Kelly T. Redmond, and Laurence G. Riddle. 1999. 'ENSO and Hydrologic Extremes in the Western United States', Journal of Climate, 12: 2881-93.

Coles, S. 2001. An Introduction to Statistical Modeling of Extreme Values (Springer).

El Adlouni, S., T. B. M. J. Ouarda, X. Zhang, R. Roy, and B. Bobée. 2007. 'Generalized maximum likelihood estimators for the nonstationary generalized extreme value model', Water Resources Research, 43: W03410.

Feng, Song, Saralees Nadarajah, and Qi Hu. 2007. 'Modeling Annual Extreme Precipitation in China Using the Generalized Extreme Value Distribution', Journal of the Meteorological Society of Japan. Ser. II, 85: 599-613.

Gershunov, Alexander, and Tim P. Barnett. 1998. 'ENSO Influence on Intraseasonal Extreme Rainfall and Temperature Frequencies in the Contiguous United States: Observations and Model Results', Journal of Climate, 11: 1575-86.

Gorji Sefidmazgi, Mohammad, Mina Moradi Kordmahalleh, Abdollah Homaifar, and Stefan Liess. 2014. 'Change Detection in Linear Trend of Temperature over US 1900-2012.' In Fourth International Workshop on Climate Informatics.

Gorji Sefidmazgi, Mohammad, Mohammad Sayemuzzaman, and Abdollah Homaifar. 2014. 'Non-stationary Time Series Clustering with Application to Climate Systems.' In Mo Jamshidi, Vladik Kreinovich and Janusz Kacprzyk (eds.), Advance Trends in Soft Computing (Springer International Publishing).

Gorji Sefidmazgi, Mohammad, Mohammad Sayemuzzaman, Abdollah Homaifar, MK Jha, and Stefan Liess. 2014. 'Trend analysis using non-stationary time series clustering based on the finite element method', Nonlinear Processes in Geophysics, 21: 605-15.

Griffis, Veronica W., and Jery R. Stedinger. 2007. 'Incorporating Climate Change and Variability into Bulletin 17B LP3 Model.' In World Environmental and Water Resources Congress 2007.

Hosking, J. R. M. 1985. 'Algorithm AS 215: Maximum-Likelihood Estimation of the Parameters of the Generalized Extreme-Value Distribution', Journal of the Royal Statistical Society. Series C (Applied Statistics), 34: 301-10.

Jain, Shaleen, and Upmanu Lall. 2001. 'Floods in a changing climate: Does the past represent the future?', Water Resources Research, 37: 3193-205.

Katz, Richard W., Marc B. Parlange, and Philippe Naveau. 2002. 'Statistics of extremes in hydrology', Advances in Water Resources, 25: 1287-304.

Martins, Eduardo S., and Jery R. Stedinger. 2000. 'Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data', Water Resources Research, 36: 737-44.

Milly, P. C. D., Julio Betancourt, Malin Falkenmark, Robert M. Hirsch, Zbigniew W. Kundzewicz, Dennis P. Lettenmaier, and Ronald J. Stouffer. 2008. 'Stationarity Is Dead: Whither Water Management?', Science, 319: 573-74.

Panagoulia, D., P. Economou, and C. Caroni. 2014. 'Stationary and nonstationary generalized extreme value modelling of extreme precipitation over a mountainous area under climate change', Environmetrics, 25: 29-43.

Ropelewski, C. F., and M. S. Halpert. 1986. 'North American Precipitation and Temperature Patterns Associated with the El Niño/Southern Oscillation (ENSO)', Monthly Weather Review, 114: 2352-62.

Shabbar, Amir, Barrie Bonsal, and Madhav Khandekar. 1997. 'Canadian Precipitation Patterns Associated with the Southern Oscillation', Journal of Climate, 10: 3016-27.

Vogel, Richard M., Chad Yaindl, and Meghan Walter. 2011. 'Nonstationarity: Flood Magnification and Recurrence Reduction Factors in the United States', JAWRA Journal of the American Water Resources Association, 47: 464-74.

Zhang, Xuebin, Jiafeng Wang, Francis W Zwiers, and Pavel Ya Groisman. 2010. 'The influence of large-scale climate variability on winter maximum daily precipitation over North America', Journal of Climate, 23: 2902-15.
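As a companion to the Parameter Estimation section, the following is a minimal sketch of how the negative of the log-likelihood in Equation (7) might be evaluated for the four location models in Equations (2) to (5). It assumes constant σ and ξ (only the location depends on covariates, as in the paper) and ξ ≠ 0. The function names `location` and `gev_nll` and the sample values are hypothetical and introduced only for illustration; this is not the authors' implementation, and the numerical minimization step is left to whichever optimizer is preferred.

```python
import math


def location(model, mu, t, y):
    """Location parameter for Models 1-4 (Equations 2-5).

    mu is a sequence of coefficients (mu0, mu1, mu2 as needed),
    t is the year and y is the Nino 3.4 covariate.
    """
    if model == 1:
        return mu[0]
    if model == 2:
        return mu[0] + mu[1] * t
    if model == 3:
        return mu[0] + mu[1] * y
    if model == 4:
        return mu[0] + mu[1] * t + mu[2] * y
    raise ValueError("model must be 1, 2, 3 or 4")


def gev_nll(model, mu, sigma, xi, x, t, y):
    """Negative log-likelihood (negative of Equation 7).

    Assumes constant sigma and xi, and xi != 0 (the Gumbel limit is
    not handled in this sketch). Returns math.inf when a parameter
    set places an observation outside the GEV support.
    """
    if xi == 0:
        raise ValueError("xi = 0 (Gumbel limit) is not handled here")
    if sigma <= 0:
        return math.inf
    total = 0.0
    for x_i, t_i, y_i in zip(x, t, y):
        z = 1.0 + xi * (x_i - location(model, mu, t_i, y_i)) / sigma
        if z <= 0:
            return math.inf  # support constraint of Equation (1) violated
        total += math.log(sigma) + (1.0 + 1.0 / xi) * math.log(z) + z ** (-1.0 / xi)
    return total


# Hypothetical toy inputs, not data from the paper.
years = [1950, 1951, 1952]
nino34 = [-0.5, 0.3, 1.1]
maxima = [10.2, 11.5, 9.8]
nll_stationary = gev_nll(1, [10.8], 2.7, 0.07, maxima, years, nino34)
nll_enso = gev_nll(3, [10.8, 0.5], 2.7, 0.07, maxima, years, nino34)
```

Passing `gev_nll` to a general-purpose minimizer over (µ0, µ1, µ2, σ, ξ) would then give the maximum likelihood estimates; returning infinity outside the support is one simple way to keep such a search inside the admissible region.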
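The model selection step can be illustrated with the negative log-likelihoods reported in Table 2. The sketch below applies Equation (8) with n = 59 (annual values for 1950-2008 inclusive, an assumption inferred from the stated period) and k equal to the number of fitted parameters per model. It also applies a likelihood ratio test for nested models that differ by one parameter, using the chi-square distribution with one degree of freedom. The names `bic` and `lrt_p_value` are hypothetical. With these inputs the computed BIC for Models 1 and 2 agrees with the tabulated values; for Models 3 and 4 the same arithmetic gives somewhat different values from those tabulated, so the sketch should be read as an illustration of the procedure only.

```python
import math

N_YEARS = 59  # assumption: one value per year, 1950-2008 inclusive

# Negative log-likelihoods from Table 2 (Pensacola mean sea level).
NLL = {1: 329.1631, 2: 320.8547, 3: 328.9922, 4: 319.0587}
# Number of fitted parameters: sigma, xi, plus location coefficients.
K = {1: 3, 2: 4, 3: 4, 4: 5}


def bic(nll, k, n):
    """Equation (8) written in terms of the negative log-likelihood."""
    return 2.0 * nll + k * math.log(n)


def lrt_p_value(nll_simple, nll_complex):
    """Likelihood ratio test p-value for one extra parameter.

    For a chi-square variable with 1 degree of freedom,
    P(X > d) = erfc(sqrt(d / 2)).
    """
    deviance = 2.0 * (nll_simple - nll_complex)
    return math.erfc(math.sqrt(max(deviance, 0.0) / 2.0))


bic_values = {m: bic(NLL[m], K[m], N_YEARS) for m in NLL}
best_model = min(bic_values, key=bic_values.get)

# Nested comparisons: stationary vs. trend, and trend vs. trend + ENSO.
p_trend = lrt_p_value(NLL[1], NLL[2])
p_enso_given_trend = lrt_p_value(NLL[2], NLL[4])
```

Here `best_model` identifies the model with the smallest BIC, while the two p-values show how the likelihood ratio test would treat the same nested pairs one at a time, which is why a single criterion is more convenient when four models are compared.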