Likelihood Asymptotics for Changepoint Problem

K. O. Obisesan
Department of Statistics, University of Ibadan, Nigeria
email: ko.obisesan@ui.edu.ng; obidairo@gmail.com

CoRI'16, Sept 7–9, 2016, Ibadan, Nigeria.

ABSTRACT
Changepoint problems are often encountered when a series undergoes abrupt changes or discontinuities. Detecting changepoints can signal useful actions towards sustainable development. However, the presence of changepoints is known to cause the failure of some of the regular assumptions of likelihood theory. Little theoretical work has been done on which assumptions fail and on the extent to which this affects the score functions of likelihood asymptotics. In this work we simulate the likelihood function in R to establish the failure of the regular assumptions due to the presence of a changepoint. The failure is demonstrated using various score functions coded in R, making it possible to understand the statistical theory and the consequences of the failure of assumptions caused by changepoints.

CCS Concepts
• Computing methodologies ➝ Simulation ➝ Simulation evaluation

Keywords
changepoint, likelihood asymptotics, regular assumptions, simulation.

1. INTRODUCTION
Changepoints are discontinuities that can lead to non-linearity even in complex functions (Chen and Gupta, 2000). The causes of changepoints include changes in the locations of observations, equipment, measurement methods, environmental effects, regulations, standards and so on. Generally we need to investigate the potential presence of changes in a data set, since they may indicate data-quality problems that should be resolved prior to any subsequent analysis. Detecting them can signal the need for timely protective action, and knowing this can be highly advantageous in planning for the future. However, Yang et al. (2006) noted that changes occur even in the best regulated systems. They indicated that discrepancies in records, occasional disagreement between documentation and data, abnormal data entry, changed units of measurement and other problems require adequate attention. Most often we need to detect the number of changepoints, or jumps, and their locations. Manly (2001) notes that this is much easier when the point of change is known; that case is referred to as intervention analysis. In contrast, when the point is unknown the problem acquires various complexities and non-linearities.

Many applications of changepoint analysis exist. Relevant literature can be found in many fields, including biology, physics, chemistry, environmental sciences and climate change, engineering, econometrics, medicine, behavioural sciences, political science, finance, image analysis and security. The earliest works appear to be those of Page (1954, 1955, 1957), where the cumulative sum (CUSUM) approach was used. Subsequently, Jandhyala and MacNeil (1986) and Jandhyala et al. (1999) provided detailed reviews of many approaches to changepoint modelling. It is important to note that this large body of literature exists because the standard theory breaks down when the time of change is unknown; little has been done, however, to show how this breakdown manifests as the failure of the regular assumptions. More details on the standard theory of changepoints are available in Easterling and Peterson (1995), Chen and Gupta (2000), Lu et al. (2005), Hanesiak and Wang (2005) and Wang (2006).

Obisesan et al. (2013) analysed the physico-chemical properties of water samples obtained from two reservoirs in Oyo State, Nigeria. The data were seen to contain some abrupt changes in behaviour. In that work various charts and diagrams were used to show the positions and locations of changepoints, and the likelihood function was written down for single changepoint detection. However, the changepoint theory linking the failure of the assumptions was not shown; the present work therefore extends the likelihood theory to show the implications of the failure of the regular assumptions caused by the presence of a changepoint.

2. STANDARD TECHNIQUES OF LIKELIHOOD ASYMPTOTICS
To study inference for changepoint problems, and in particular to understand their non-standard nature, it is important to review some properties of likelihood functions. The likelihood function for a scalar parameter $\theta$, based on data $x = (x_1, \ldots, x_n)$ regarded as a collection of independent observations, is defined to be

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i; \theta), \qquad (1)$$

which is simply the joint density of the data, regarded as a function of the parameter (Rice, 2007). For convenience we study the log-likelihood function $\ell(\theta) = \log L(\theta \mid x)$, which for an independent and identically distributed sample of size $n$ can be written

$$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta).$$

The maximum likelihood estimate $\hat{\theta}$ is a value of $\theta$ that maximizes the log-likelihood function. If the log-likelihood is a differentiable function of $\theta$, then $\hat{\theta}$ is a root of $\ell'(\theta) = 0$; moreover, for a local maximum we need $\ell''(\hat{\theta}) < 0$. The main assumptions here can be stated simply as follows.

Assumption 1: The log-likelihood $\ell(\theta)$ is twice differentiable in $\theta$.

Assumption 2: The second derivative satisfies $\ell''(\hat{\theta}) < 0$ at $\hat{\theta}$.

In what follows we examine the characteristics of the score function when the data are assumed to be generated from $f(x; \theta_0)$, where $\theta_0$ is the assumed true value of the parameter $\theta$ to be estimated. A careful illustration of the behaviour of the score function is given in Figure 1.
3. THE SCORE FUNCTION: SIMULATION
Under Assumption 1, the first derivative of the log-likelihood is usually called the score function:

$$u(\theta) = \ell'(\theta) = \frac{\partial}{\partial\theta}\left[\sum_{i=1}^{n}\log f(x_i;\theta)\right],$$

regarded as a function of $\theta$ for fixed $x$. This function plays a central role in maximum likelihood theory. We can also define the observed information as

$$J(\theta) = -\ell''(\theta) = -\sum_{i=1}^{n}\frac{\partial^2}{\partial\theta^2}\log f(x_i;\theta),$$

and the Fisher (expected) information as

$$I(\theta) = E\left\{-\frac{\partial^2 \ell(\theta)}{\partial\theta^2}\right\} = n\,i(\theta),$$

where $i(\theta)$ refers to the single-observation information; the score itself is a sum of $n$ independent components.

Figure 1 shows the sampling variation of the score function for different models (normal, Poisson, binomial and Cauchy) for samples of size $n = 10$. Figure 1(a) shows 25 score functions, each based on an independent and identically distributed sample of size $n = 10$ from $N(4, 1)$. Each function is exactly linear, and at the true parameter $\theta_0 = 4$ the score varies around 0. Figure 1(b) shows the score functions for 25 independent samples of size 10 from a Poisson distribution with mean 4; each function looks approximately linear, and at the true parameter the score again varies around 0. Figure 1(c) shows the score functions of 25 independent samples of size $n = 10$ from a binomial$(10, 0.4)$ distribution. In Figure 1(d), the score functions for Cauchy samples (also based on 25 independent samples of size 10) are rather irregular and fail to behave like the previous models: although the score still varies around 0 at the true parameter, there is the potential for multiple roots of the score equation. This case indicates the problems posed by a complicated likelihood.

Figure 1: Sampling variation of score functions for different distributions ((a) normal, (b) Poisson, (c) binomial, (d) Cauchy).

In all the examples in Figure 1, the score varies around zero at the true parameter value. We now show that this is generally the case.
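The behaviour seen in Figure 1, namely the score varying around zero at the true parameter even in the irregular Cauchy case, can be checked without plotting. A Python sketch under the same models (numpy assumed; the helper names `score_normal` and `score_cauchy` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t0, reps = 10, 4.0, 2000

def score_normal(x, theta):
    # u(theta) = n * (xbar - theta) for the N(theta, 1) model
    return x.size * (x.mean() - theta)

def score_cauchy(x, theta):
    # u(theta) = sum 2(x_i - theta) / (1 + (x_i - theta)^2) for Cauchy location
    d = x - theta
    return np.sum(2 * d / (1 + d ** 2))

u_norm = [score_normal(rng.normal(t0, 1, n), t0) for _ in range(reps)]
u_cauchy = [score_cauchy(t0 + rng.standard_cauchy(n), t0) for _ in range(reps)]
# Both Monte Carlo averages sit near zero, although the Cauchy score
# function itself is non-monotone and may have multiple roots.
```

The non-monotonicity of the Cauchy score is what opens the door to multiple roots of the score equation, the problem Figure 1(d) illustrates.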
At the true value $\theta = \theta_0$, the expected score is

$$E\{u(\theta_0)\} = \int \left.\frac{\partial \log f(x \mid \theta)}{\partial\theta}\right|_{\theta_0} f(x \mid \theta_0)\,dx = \int \left.\frac{\partial f(x \mid \theta)}{\partial\theta}\right|_{\theta_0} dx = 0, \qquad (2)$$

where the middle step uses $\partial \log f/\partial\theta = (\partial f/\partial\theta)/f$, and the last step follows because $\int (\partial f/\partial\theta)\,dx = (d/d\theta)\int f(x \mid \theta)\,dx = (d/d\theta)(1) = 0$. The major assumption needed here is to justify interchanging the order of differentiation and integration, and can be stated as:

Assumption 3: The range of integration (the support of $f$) does not depend on $\theta$.

Therefore, using the stated assumptions, $E\{u(\theta_0)\} = 0$, as required. For the variance of the score, differentiate Equation 2 with respect to $\theta$ once more:

$$0 = \frac{d}{d\theta}\int \frac{\partial \log f}{\partial\theta}\, f\,dx = \int \frac{\partial^2 \log f}{\partial\theta^2}\, f\,dx + \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dx,$$

again using $\partial f/\partial\theta = (\partial \log f/\partial\theta) f$. Applied to a single observation this shows $E\{(\partial \log f/\partial\theta)^2\} = -E\{\partial^2 \log f/\partial\theta^2\} = i(\theta_0)$, and since the score is a sum of $n$ independent such terms with zero mean,

$$\operatorname{Var}\{u(\theta_0)\} = E\{u(\theta_0)^2\} - [E\{u(\theta_0)\}]^2 = n\,i(\theta_0) = I(\theta_0). \qquad (3)$$

Next we see how $u(\theta)$ behaves for large $n$ by studying $u(\theta)/n$. At $\theta_0$ we have

$$E\left\{\frac{u(\theta_0)}{n}\right\} = 0, \qquad \operatorname{Var}\left\{\frac{u(\theta_0)}{n}\right\} = \frac{n\,i(\theta_0)}{n^2} = \frac{i(\theta_0)}{n} \to 0$$

(assuming $i(\theta_0)$ is finite), and hence $u(\theta_0)/n \to 0$ in probability as $n \to \infty$.

The discussion so far has dealt with the behaviour of the score function at the true parameter value. We now consider its behaviour at other values of $\theta$; in general (and we need this to investigate cases that indicate the existence of a changepoint), for $\theta \neq \theta_0$ another assumption is required:

Assumption 4: For $\theta \neq \theta_0$, the density $f(x;\theta)$ differs from $f(x;\theta_0)$ on a set of non-zero measure.

By the law of large numbers, for an arbitrary fixed value of $\theta$,

$$\frac{u(\theta)}{n} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(x_i;\theta)}{\partial\theta} \;\longrightarrow\; E_{\theta_0}\left\{\frac{\partial \log f(X;\theta)}{\partial\theta}\right\} = \int \frac{\partial \log f(x;\theta)}{\partial\theta}\, f(x;\theta_0)\,dx, \qquad (4)$$

provided this expectation is finite for all $\theta$. Therefore as $n \to \infty$ the function $u(\theta)/n$ tends to a deterministic function of $\theta$ with a root at $\theta = \theta_0$. Note that the limit is non-zero at $\theta \neq \theta_0$ unless $f(x;\theta) = f(x;\theta_0)$ for all $x$, which would itself contradict Assumption 4.

4. SIMULATION CODE WITH R
In this section the R code used to simulate the score functions for the normal and Poisson distributions is stated as run from the prompt; the binomial and Cauchy distributions follow in a similar way. After simulating from each distribution, the score functions are plotted to show their sampling variation. It is clear from the resulting plots that the score function varies around 0 at the true parameter value.

set.seed(3)
n <- 10
## ---- Normal score functions ----
t0 <- 4
x <- rnorm(n, t0)
theta <- seq(t0/2, t0*2, len = 40)
stheta <- n * (mean(x) - theta)
par(mfrow = c(1, 2))
plot(theta, stheta, type = 'n',
     xlab = expression(theta), ylab = 'Score', cex = .6)
lines(theta, stheta, lwd = .4)
title(expression('(a) Normal n=10'))
text(6.5, 5.5, expression(paste('true ', theta, '=4')))
abline(v = t0, h = 0)
for (i in 1:20) {
  x <- rnorm(n, t0)
  stheta <- n * (mean(x) - theta)
  lines(theta, stheta, lwd = .1)
}
## ---- Poisson score functions ----
t0 <- 4
x <- rpois(n, t0)
theta <- seq(t0/2, t0*2, len = 40)
stheta <- -n + sum(x)/theta
plot(theta, stheta, type = 'n', xlab = expression(theta),
     ylab = 'Score', ylim = c(-5, 15), cex = .6)
for (i in 1:20) {
  x <- rpois(n, t0)
  stheta <- -n + sum(x)/theta
  lines(theta, stheta, lwd = .1)
}
abline(v = t0, h = 0)
title(expression('(b) Poisson n=10'))
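The regular-theory claims behind the listing above, $E\{u(\theta_0)\} = 0$ and $\operatorname{Var}\{u(\theta_0)\} = n\,i(\theta_0)$, can also be checked by Monte Carlo. For the Poisson model $i(\theta) = 1/\theta$, so with $n = 10$ and $\theta_0 = 4$ the score variance should be $n/\theta_0 = 2.5$. A Python sketch of this check (numpy assumed; the replicate count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t0, reps = 10, 4.0, 20000

# Poisson(theta) score at the true value: u(t0) = -n + sum(x)/t0,
# mirroring the stheta line in the R listing above.
u = np.array([-n + rng.poisson(t0, n).sum() / t0 for _ in range(reps)])

# Regular theory: E u(t0) = 0 and Var u(t0) = I(t0) = n * i(t0) = n / t0 = 2.5
mean_u, var_u = u.mean(), u.var()
```

When a changepoint is present the data are no longer generated by a single $f(x;\theta_0)$, and it is exactly these two identities that break down.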
5. Consistency of Maximum Likelihood Estimators
We now consider whether $\hat\theta$ is a consistent estimator of $\theta_0$. Using a Taylor expansion of $u(\theta)$ around $\theta_0$, we have

$$u(\theta) = u(\theta_0) + (\theta - \theta_0)\,u'(\theta^{*}) \qquad (5)$$

for some $\theta^{*}$ between $\theta$ and $\theta_0$. In particular, when $\theta = \hat\theta$ we have (noting that $u(\hat\theta) = 0$)

$$\hat\theta - \theta_0 = \frac{u(\theta_0)}{-u'(\theta^{*})} = \frac{u(\theta_0)/n}{-u'(\theta^{*})/n}. \qquad (6)$$

Note that the numerator of Equation 6 approaches 0 in probability as $n \to \infty$. Also, if $u'(\theta)/n$ is continuous in $\theta$, then $-u'(\theta^{*})/n \to i(\theta_0)$ as $n \to \infty$, since $\theta^{*}$ lies between $\hat\theta$ and $\theta_0$. If the denominator is guaranteed non-zero, Equation 6 therefore implies that $\hat\theta \to \theta_0$, so $\hat\theta$ is consistent. This requires the following assumption, which can be seen as a strengthened version of Assumption 2.

Assumption 5: $i(\theta)$ is non-zero in an interval containing $\theta_0$.

6. Limiting Distribution of $\hat\theta$
As well as demonstrating the consistency of the maximum likelihood estimator $\hat\theta$, Equation 5 allows us to establish its distribution when $n$ is large. Recall again that $u(\theta_0)$ is a sum of independent and identically distributed contributions, each with mean 0 and variance $i(\theta_0)$. Hence from the central limit theorem we have, asymptotically,

$$\frac{u(\theta_0)}{\sqrt{n}} \sim N\big(0,\, i(\theta_0)\big).$$

From Equation 6 we can write

$$\sqrt{n}\,(\hat\theta - \theta_0) = \frac{u(\theta_0)/\sqrt{n}}{-u'(\theta^{*})/n}. \qquad (7)$$

As $n \to \infty$ the final term (the denominator) in Equation 7 tends to $i(\theta_0)$, and we have, asymptotically,

$$\sqrt{n}\,(\hat\theta - \theta_0) \sim N\big(0,\, i(\theta_0)^{-1}\big);$$

equivalently, $(\hat\theta - \theta_0)\sqrt{n\,i(\theta_0)}$ has a standard normal distribution asymptotically.

7. Limiting Chi-Square Distributions: Likelihood Ratio Statistic
We now discuss the basic test statistic used for testing hypotheses using the principles of likelihood functions. Suppose that $\ell(\cdot)$ is the log-likelihood established from the probability density $f$. The consistency of $\hat\theta$ implies that we can write

$$\ell(\theta_0) = \ell(\hat\theta) + (\theta_0 - \hat\theta)\,\ell'(\hat\theta) + \tfrac{1}{2}(\theta_0 - \hat\theta)^2\,\ell''(\theta^{**}),$$

where $\theta^{**}$ is between $\theta_0$ and $\hat\theta$. Since $\ell'(\hat\theta) = 0$ by definition, representing the likelihood ratio statistic by $W = 2\{\ell(\hat\theta) - \ell(\theta_0)\}$ gives

$$W = (\hat\theta - \theta_0)^2\,\{-\ell''(\theta^{**})\} = \big\{(\hat\theta - \theta_0)\sqrt{n\,i(\theta_0)}\big\}^2 \cdot \frac{-\ell''(\theta^{**})}{J(\hat\theta)} \cdot \frac{J(\hat\theta)}{n\,i(\theta_0)}. \qquad (8)$$

It is clear that the first factor in Equation 8 is asymptotically the square of a standard normal random variable and is therefore a $\chi^2_1$ variable; in addition, the last two ratios, $-\ell''(\theta^{**})/J(\hat\theta)$ and $J(\hat\theta)/\{n\,i(\theta_0)\}$, tend to 1 using similar arguments to those applied in the previous subsection.
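The $\chi^2_1$ limit of $W$ can itself be seen by simulation. For the $N(\theta, 1)$ model, $\ell(\theta) = \text{const} - \tfrac{1}{2}\sum(x_i - \theta)^2$ and $\hat\theta = \bar x$, so $W = 2\{\ell(\hat\theta) - \ell(\theta_0)\}$ reduces algebraically to $n(\bar x - \theta_0)^2$, which should have mean 1 and variance 2 like a $\chi^2_1$ variable. A Python sketch of this check (numpy assumed; seed and replicate count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, t0, reps = 10, 4.0, 5000

# For N(theta, 1): l(theta) = const - 0.5 * sum((x - theta)^2), theta_hat = xbar,
# so W = 2 * (l(theta_hat) - l(t0)) simplifies to n * (xbar - t0)^2.
w = np.array([n * (rng.normal(t0, 1, n).mean() - t0) ** 2 for _ in range(reps)])

# W should behave like chi-square on 1 df: mean 1, variance 2
mean_w, var_w = w.mean(), w.var()
```

In this particular model the reduction is exact, since $\sqrt{n}(\bar x - \theta_0)$ is exactly standard normal; for other regular models the $\chi^2_1$ behaviour holds only asymptotically.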
In the same direction we can obtain (without proof) the limiting distribution of $W$ for the case when $\theta$ is a $p$-dimensional vector: as above we can write

$$W = 2\{\ell(\hat\theta) - \ell(\theta_0)\} \approx (\hat\theta - \theta_0)^{T}\, I(\theta_0)\, (\hat\theta - \theta_0),$$

and $W$ has an approximate chi-square distribution on $p$ degrees of freedom in repeated sampling of data from the model.

7.1 The two-mean model
In Obisesan et al. (2013) the development of changepoint detection was based on the work of Hinkley (1970), who considered a sequence of random variables and discussed the point at which the probability distribution changes, using a normal distribution with changing mean. The asymptotic distribution of the maximum likelihood estimate discussed in this paper is particularly relevant to the change-point. The author indicated the simplest model over the whole range of data as $x_i = \mu(i) + \epsilon_i$, where $\mu(i)$ is a mean function and the $\epsilon_i$ refer to error terms. Hinkley (1970) computed the asymptotic distribution in the normal case when $\theta_0$ and $\theta_1$ are unknown; the asymptotic distribution is found to be the same when the mean levels are known. The two-mean model to be considered supposes that there exist a mean $\theta_0$ for $i = 1, \ldots, \tau$ and a mean $\theta_1$ for $i = \tau + 1, \ldots, n$. He also computed the asymptotic distribution of the likelihood estimate of the change-point $\hat\tau$ (where $\theta_0$ and $\theta_1$ are known and $\tau$ is unknown), obtained from a sample $x_1, \ldots, x_n$ by simply maximizing a likelihood function of the form

$$L(\tau) = \prod_{i=1}^{\tau} f(x_i; \theta_0) \prod_{i=\tau+1}^{n} f(x_i; \theta_1),$$

which can be written in the form of a log-likelihood as

$$\ell(\tau) = \sum_{i=1}^{\tau} \log f(x_i; \theta_0) + \sum_{i=\tau+1}^{n} \log f(x_i; \theta_1). \qquad (9)$$

Moreover, many cases arise in which the mean levels are not known. For the normal model the log-likelihood of the observed sequence $(x_1, \ldots, x_n)$ is

$$\ell(\tau, \theta_0, \theta_1, \sigma^2 \mid x) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left\{\sum_{i=1}^{\tau}(x_i - \theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \theta_1)^2\right\}. \qquad (10)$$

If we assume that $\tau$ is known, the maximum likelihood estimators of the mean levels are

$$\hat\theta_0 = \frac{1}{\tau}\sum_{i=1}^{\tau} x_i, \qquad \hat\theta_1 = \frac{1}{n-\tau}\sum_{i=\tau+1}^{n} x_i,$$

and $\hat\sigma^2 = \frac{1}{n}\left\{\sum_{i=1}^{\tau}(x_i - \hat\theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \hat\theta_1)^2\right\}$. For convenience, Hinkley (1970) substituted $\sigma^2 = 1$ as known, so that Equation 10 becomes

$$\ell(\tau, \theta_0, \theta_1 \mid x) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\left\{\sum_{i=1}^{\tau}(x_i - \theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \theta_1)^2\right\}. \qquad (11)$$

Assuming that $\tau$ is unknown, putting the maximum likelihood estimates of $\theta_0$ and $\theta_1$ back into the log-likelihood in Equation 11 and re-arranging the emerging sums of squares conditional on $\tau$, Equation 11 was used to estimate the changepoint of water pollution in the Eleyele and Asejire reservoirs in Nigeria. This confirms the application of the likelihood theory of changepoints. More on the applications is discussed in Obisesan (2011, 2015).

8. RESULTS
In this work it has been shown that a changepoint arises as a result of the failure of some regular assumptions; specifically, in this case Assumptions 1 and 4 may fail. Simulation of the likelihood function and its score functions has been used to justify the theory, showing the change in parameter that allows a changepoint to occur. The work also justifies the application of changepoint detection as used in Obisesan et al. (2013). The use of R has therefore made it possible to show the failure of the regular assumptions.

9. CONCLUSION
Single changepoint detection has been discussed in the framework of the failure of regular assumptions, a failure that has not been commonly noticed. The likelihood function was used to merge the two mean levels, and various score functions were simulated using the statistical computing language R. The theoretical implications of the failure of the regular assumptions were discussed and the failed assumptions identified using R. This work has therefore provided a basis for using computational statistics methods in solving a mathematical problem.

10. REFERENCES
[1] Chen, J. and Gupta, A.K. (2000). Parametric Statistical Change Point Analysis. Birkhäuser, Boston.
[2] Easterling, D.R. and Peterson, T.C. (1995). A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol., 15:369-377.
[3] Hanesiak, J.M. and Wang, X.L. (2005). Adverse weather trends in the Canadian Arctic. J. Climate, 18(2):3140-3156.
[4] Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57(1):1-17.
[5] Jandhyala, V.K., Fotopoulos, S.B., and Evagelopoulos, N. (1999). Change-point methods for Weibull models with applications to detection of trends in extreme temperatures. Environmetrics, 10:547-564.
[6] Jandhyala, V.K. and MacNeil, I.B. (1986). The changepoint problem: a review of applications. In Statistical Aspects of Water Quality Monitoring (eds. A.H. ElShaarawi and R.E. Kwiatkowski), Elsevier, pages 381-387.
[7] Lu, Q., Lund, R., and Seymour, L. (2005). An update of U.S. temperature trends. J. Climate, 18:4906-4919.
[8] Manly, B.F.J. (2001). Statistics for Environmental Science and Management. Chapman & Hall/CRC, Boca Raton.
[9] Obisesan, K.O., Lawal, M., Bamiduro, T.A., and Adelakun, A.A. (2013). Data visualization and changepoints detection in environmental data: the case of water pollution in Oyo State, Nigeria. Journal of Science Research, 12:181-190.
[10] Obisesan, K.O. (2015). Modelling Multiple Changepoints Detection. Ph.D. thesis, Department of Statistics, University of Ibadan.
[11] Obisesan, K.O. (2011). Changepoint Detection in Time Series with Hydrological Applications. M.Phil. thesis, Department of Statistical Science, University College London.
[12] Page, E.S. (1954). Continuous inspection schemes. Biometrika, 41:100-114.
[13] Page, E.S. (1955). A test for a change in a parameter occurring at an unknown time point. Biometrika, 42:523-526.
[14] Page, E.S. (1957). On problems in which a change in parameter occurs at an unknown point. Biometrika, 44:248-252.
[15] Rice, J.A. (2007). Mathematical Statistics and Data Analysis. Duxbury.
[16] Wang, X.L. (2006). Climatology and trends in some adverse and fair weather conditions in Canada, 1953-2004. J. Geophys. Res., 111.
[17] Yang, C., Chandler, R.E., Isham, V.S., and Wheater, H.S. (2006). Quality control for daily observational rainfall series in the UK. Water and Environment Journal, 20:185-193.