Likelihood Asymptotics for Changepoint Problem

K. O. Obisesan
Department of Statistics, University of Ibadan, Nigeria
email: ko.obisesan@ui.edu.ng; obidairo@gmail.com

CoRI'16, Sept 7–9, 2016, Ibadan, Nigeria.

ABSTRACT
Changepoint problems are often encountered when a series undergoes abrupt changes or discontinuities. Detecting changepoints can signal useful actions towards sustainable development. However, the presence of changepoints is known to cause the failure of some of the regular assumptions of likelihood theory. Little theoretical work has been done on which assumptions fail and on the extent to which this affects the score functions of likelihood asymptotics. In this work we simulate the likelihood function in R to establish the failure of the regular assumptions due to the presence of a changepoint. The failure is demonstrated using various score functions coded in R, making it possible to understand the statistical theory and the consequences of the failure of assumptions caused by changepoints.

CCS Concepts
• Computing methodologies ➝ Simulation ➝ Simulation evaluation

Keywords
changepoint, likelihood asymptotics, regular assumptions, simulation.

1. INTRODUCTION
Changepoints are discontinuities that can lead to non-linearity even in complex functions (Chen and Gupta, 2000). The causes of changepoints include changes in the locations of observations, equipment, measurement methods, environmental effects, regulations, standards and so on. Generally we need to investigate the potential presence of changes in a data set, since they may indicate data-quality problems that should be resolved prior to any subsequent analysis. Detecting them can signal the need for timely protective action, and knowing this can be highly advantageous in planning for the future. However, Yang et al. (2006) noted that changes occur even in the best regulated systems. They indicated that discrepancies in records, occasional disagreement between documentation and data, abnormal data entry, changed units of measurement and other problems require adequate attention. Most often we need to detect the number of changepoints, or jumps, and their locations. Manly (2001) notes that this is much easier when the point of change is known; that case is referred to as intervention analysis. In contrast, when the point is unknown the problem acquires various complexities and non-linearities.

Many applications of changepoint analysis exist. Relevant literature can be found in many fields, including biology, physics, chemistry, environmental sciences and climate change, engineering, econometrics, medicine, behavioural sciences, political science, finance, image analysis and security. The earliest works appear to be those of Page (1954, 1955, 1957), where the cumulative sum (CUSUM) approach was used. Subsequently, Jandhyala and MacNeil (1986) and Jandhyala et al. (1999) provided detailed reviews of many approaches to changepoint modelling. It is important to note that this large body of literature exists because the standard theory breaks down when the time of change is unknown; little has been done, however, to show how this breakdown manifests as the failure of the regular assumptions. More details on the standard theory of changepoints are available in Easterling and Peterson (1995), Chen and Gupta (2000), Lu et al. (2005), Hanesiak and Wang (2005) and Wang (2006).

Obisesan et al. (2013) analysed the physico-chemical properties of water samples obtained from two reservoirs in Oyo State, Nigeria. The data were seen to contain some abrupt changes in behaviour. In that work various charts and diagrams were used to show the positions and locations of changepoints, and the likelihood function was written down for single changepoint detection. However, the changepoint theory linking the failure of the assumptions was not shown; the present work therefore extends the likelihood theory to show the implications of the failure of the regular assumptions caused by the presence of a changepoint.

2. STANDARD TECHNIQUES OF LIKELIHOOD ASYMPTOTICS
To study inference for changepoint problems, and in particular to understand their non-standard nature, it is important to review some properties of likelihood functions. The likelihood function for a scalar parameter $\theta$, based on data $x = (x_1, \ldots, x_n)$ regarded as a collection of independent observations, is defined to be

$$L(\theta \mid x) = \prod_{i=1}^{n} f(x_i; \theta), \qquad (1)$$

which is simply the joint density of the data, regarded as a function of the parameter (Rice, 2007). For convenience we study the log-likelihood function $\ell(\theta) = \log L(\theta \mid x)$, which for an independent and identically distributed sample of size $n$ can be written

$$\ell(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta).$$

The maximum likelihood estimate $\hat{\theta}$ is a value of $\theta$ that maximizes the log-likelihood function. If the log-likelihood is a differentiable function of $\theta$, then $\hat{\theta}$ is a root of $\ell'(\theta) = 0$; moreover, for a local maximum we need $\ell''(\hat{\theta}) < 0$. The main assumptions here can be stated simply as follows.

Assumption 1: The log-likelihood $\ell(\theta)$ is twice differentiable in $\theta$.

Assumption 2: The second derivative satisfies $\ell''(\hat{\theta}) < 0$ at $\hat{\theta}$.

In what follows we examine the characteristics of the score function when the data are assumed to be generated from $f(x; \theta_0)$, where $\theta_0$ is the assumed true value of the parameter $\theta$ to be estimated. A careful illustration of the behaviour of the score function is given in Figure 1.
3. THE SCORE FUNCTION: SIMULATION
Under Assumption 1, the first derivative of the log-likelihood is usually called the score function:

$$u(\theta) = \ell'(\theta) = \frac{\partial}{\partial\theta}\left[\sum_{i=1}^{n}\log f(x_i;\theta)\right],$$

regarded as a function of $\theta$ for fixed $x$. This function plays a central role in maximum likelihood theory. We can also define the observed information as

$$J(\theta) = -\ell''(\theta) = -\sum_{i=1}^{n}\frac{\partial^2}{\partial\theta^2}\log f(x_i;\theta),$$

and the Fisher (expected) information as

$$I(\theta) = E\left\{-\frac{\partial^2 \ell(\theta)}{\partial\theta^2}\right\} = n\,i(\theta),$$

where $i(\theta)$ refers to the single-observation information; the score itself is a sum of $n$ independent components.

Figure 1 shows the sampling variation of the score function for different models (normal, Poisson, binomial and Cauchy) for samples of size $n = 10$. Figure 1(a) shows 25 score functions, each based on an independent and identically distributed sample of size $n = 10$ from $N(4, 1)$. Each function is exactly linear, and at the true parameter $\theta_0 = 4$ the score varies around 0. Figure 1(b) shows the score functions for 25 independent samples of size 10 from a Poisson distribution with mean 4; each function looks approximately linear, and at the true parameter the score again varies around 0. Figure 1(c) shows the score functions of 25 independent samples of size $n = 10$ from a binomial$(10, 0.4)$ distribution. In Figure 1(d), the score functions for Cauchy samples (also based on 25 independent samples of size 10) are rather irregular and fail to behave like the previous models: although the score still varies around 0 at the true parameter, there is the potential for multiple roots of the score equation. This case indicates the problems posed by a complicated likelihood.

Figure 1: Sampling variation of score functions for different distributions ((a) normal, (b) Poisson, (c) binomial, (d) Cauchy).

In all the examples in Figure 1, the score varies around zero at the true parameter value. We now show that this is generally the case.
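The behaviour seen in Figure 1, namely the score varying around zero at the true parameter even in the irregular Cauchy case, can be checked without plotting. A Python sketch under the same models (numpy assumed; the helper names `score_normal` and `score_cauchy` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t0, reps = 10, 4.0, 2000

def score_normal(x, theta):
    # u(theta) = n * (xbar - theta) for the N(theta, 1) model
    return x.size * (x.mean() - theta)

def score_cauchy(x, theta):
    # u(theta) = sum 2(x_i - theta) / (1 + (x_i - theta)^2) for Cauchy location
    d = x - theta
    return np.sum(2 * d / (1 + d ** 2))

u_norm = [score_normal(rng.normal(t0, 1, n), t0) for _ in range(reps)]
u_cauchy = [score_cauchy(t0 + rng.standard_cauchy(n), t0) for _ in range(reps)]
# Both Monte Carlo averages sit near zero, although the Cauchy score
# function itself is non-monotone and may have multiple roots.
```

The non-monotonicity of the Cauchy score is what opens the door to multiple roots of the score equation, the problem Figure 1(d) illustrates.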
At the true value $\theta = \theta_0$, the expected score is

$$E\{u(\theta_0)\} = \int \left.\frac{\partial \log f(x \mid \theta)}{\partial\theta}\right|_{\theta_0} f(x \mid \theta_0)\,dx = \int \left.\frac{\partial f(x \mid \theta)}{\partial\theta}\right|_{\theta_0} dx = 0, \qquad (2)$$

where the middle step uses $\partial \log f/\partial\theta = (\partial f/\partial\theta)/f$, and the last step follows because $\int (\partial f/\partial\theta)\,dx = (d/d\theta)\int f(x \mid \theta)\,dx = (d/d\theta)(1) = 0$. The major assumption needed here is to justify interchanging the order of differentiation and integration, and can be stated as:

Assumption 3: The range of integration (the support of $f$) does not depend on $\theta$.

Therefore, using the stated assumptions, $E\{u(\theta_0)\} = 0$, as required. For the variance of the score, differentiate Equation 2 with respect to $\theta$ once more:

$$0 = \frac{d}{d\theta}\int \frac{\partial \log f}{\partial\theta}\, f\,dx = \int \frac{\partial^2 \log f}{\partial\theta^2}\, f\,dx + \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dx,$$

again using $\partial f/\partial\theta = (\partial \log f/\partial\theta) f$. Applied to a single observation this shows $E\{(\partial \log f/\partial\theta)^2\} = -E\{\partial^2 \log f/\partial\theta^2\} = i(\theta_0)$, and since the score is a sum of $n$ independent such terms with zero mean,

$$\operatorname{Var}\{u(\theta_0)\} = E\{u(\theta_0)^2\} - [E\{u(\theta_0)\}]^2 = n\,i(\theta_0) = I(\theta_0). \qquad (3)$$

Next we see how $u(\theta)$ behaves for large $n$ by studying $u(\theta)/n$. At $\theta_0$ we have

$$E\left\{\frac{u(\theta_0)}{n}\right\} = 0, \qquad \operatorname{Var}\left\{\frac{u(\theta_0)}{n}\right\} = \frac{n\,i(\theta_0)}{n^2} = \frac{i(\theta_0)}{n} \to 0$$

(assuming $i(\theta_0)$ is finite), and hence $u(\theta_0)/n \to 0$ in probability as $n \to \infty$.

The discussion so far has dealt with the behaviour of the score function at the true parameter value. We now consider its behaviour at other values of $\theta$; in general (and we need this to investigate cases that indicate the existence of a changepoint), for $\theta \neq \theta_0$ another assumption is required:

Assumption 4: For $\theta \neq \theta_0$, the density $f(x;\theta)$ differs from $f(x;\theta_0)$ on a set of non-zero measure.

By the law of large numbers, for an arbitrary fixed value of $\theta$,

$$\frac{u(\theta)}{n} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(x_i;\theta)}{\partial\theta} \;\longrightarrow\; E_{\theta_0}\left\{\frac{\partial \log f(X;\theta)}{\partial\theta}\right\} = \int \frac{\partial \log f(x;\theta)}{\partial\theta}\, f(x;\theta_0)\,dx, \qquad (4)$$

provided this expectation is finite for all $\theta$. Therefore as $n \to \infty$ the function $u(\theta)/n$ tends to a deterministic function of $\theta$ with a root at $\theta = \theta_0$. Note that the limit is non-zero at $\theta \neq \theta_0$ unless $f(x;\theta) = f(x;\theta_0)$ for all $x$, which would itself contradict Assumption 4.

4. SIMULATION CODE WITH R
In this section the R code used to simulate the score functions for the normal and Poisson distributions is stated as run from the prompt; the binomial and Cauchy distributions follow in a similar way. After simulating from each distribution, the score functions are plotted to show their sampling variation. It is clear from the resulting plots that the score function varies around 0 at the true parameter value.

set.seed(3)
n <- 10
## ---- Normal score functions ----
t0 <- 4
x <- rnorm(n, t0)
theta <- seq(t0/2, t0*2, len = 40)
stheta <- n * (mean(x) - theta)
par(mfrow = c(1, 2))
plot(theta, stheta, type = 'n',
     xlab = expression(theta), ylab = 'Score', cex = .6)
lines(theta, stheta, lwd = .4)
title(expression('(a) Normal n=10'))
text(6.5, 5.5, expression(paste('true ', theta, '=4')))
abline(v = t0, h = 0)
for (i in 1:20) {
  x <- rnorm(n, t0)
  stheta <- n * (mean(x) - theta)
  lines(theta, stheta, lwd = .1)
}
## ---- Poisson score functions ----
t0 <- 4
x <- rpois(n, t0)
theta <- seq(t0/2, t0*2, len = 40)
stheta <- -n + sum(x)/theta
plot(theta, stheta, type = 'n', xlab = expression(theta),
     ylab = 'Score', ylim = c(-5, 15), cex = .6)
for (i in 1:20) {
  x <- rpois(n, t0)
  stheta <- -n + sum(x)/theta
  lines(theta, stheta, lwd = .1)
}
abline(v = t0, h = 0)
title(expression('(b) Poisson n=10'))
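The regular-theory claims behind the listing above, $E\{u(\theta_0)\} = 0$ and $\operatorname{Var}\{u(\theta_0)\} = n\,i(\theta_0)$, can also be checked by Monte Carlo. For the Poisson model $i(\theta) = 1/\theta$, so with $n = 10$ and $\theta_0 = 4$ the score variance should be $n/\theta_0 = 2.5$. A Python sketch of this check (numpy assumed; the replicate count is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n, t0, reps = 10, 4.0, 20000

# Poisson(theta) score at the true value: u(t0) = -n + sum(x)/t0,
# mirroring the stheta line in the R listing above.
u = np.array([-n + rng.poisson(t0, n).sum() / t0 for _ in range(reps)])

# Regular theory: E u(t0) = 0 and Var u(t0) = I(t0) = n * i(t0) = n / t0 = 2.5
mean_u, var_u = u.mean(), u.var()
```

When a changepoint is present the data are no longer generated by a single $f(x;\theta_0)$, and it is exactly these two identities that break down.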
5. Consistency of Maximum Likelihood Estimators
We now consider whether $\hat\theta$ is a consistent estimator of $\theta_0$. Using a Taylor expansion of $u(\theta)$ around $\theta_0$, we have

$$u(\theta) = u(\theta_0) + (\theta - \theta_0)\,u'(\theta^{*}) \qquad (5)$$

for some $\theta^{*}$ between $\theta$ and $\theta_0$. In particular, when $\theta = \hat\theta$ we have (noting that $u(\hat\theta) = 0$)

$$\hat\theta - \theta_0 = \frac{u(\theta_0)}{-u'(\theta^{*})} = \frac{u(\theta_0)/n}{-u'(\theta^{*})/n}. \qquad (6)$$

Note that the numerator of Equation 6 approaches 0 in probability as $n \to \infty$. Also, if $u'(\theta)/n$ is continuous in $\theta$, then $-u'(\theta^{*})/n \to i(\theta_0)$ as $n \to \infty$, since $\theta^{*}$ lies between $\hat\theta$ and $\theta_0$. If the denominator is guaranteed non-zero, Equation 6 therefore implies that $\hat\theta \to \theta_0$, so $\hat\theta$ is consistent. This requires the following assumption, which can be seen as a strengthened version of Assumption 2.

Assumption 5: $i(\theta)$ is non-zero in an interval containing $\theta_0$.

6. Limiting Distribution of $\hat\theta$
As well as demonstrating the consistency of the maximum likelihood estimator $\hat\theta$, Equation 5 allows us to establish its distribution when $n$ is large. Recall again that $u(\theta_0)$ is a sum of independent and identically distributed contributions, each with mean 0 and variance $i(\theta_0)$. Hence from the central limit theorem we have, asymptotically,

$$\frac{u(\theta_0)}{\sqrt{n}} \sim N\big(0,\, i(\theta_0)\big).$$

From Equation 6 we can write

$$\sqrt{n}\,(\hat\theta - \theta_0) = \frac{u(\theta_0)/\sqrt{n}}{-u'(\theta^{*})/n}. \qquad (7)$$

As $n \to \infty$ the final term (the denominator) in Equation 7 tends to $i(\theta_0)$, and we have, asymptotically,

$$\sqrt{n}\,(\hat\theta - \theta_0) \sim N\big(0,\, i(\theta_0)^{-1}\big);$$

equivalently, $(\hat\theta - \theta_0)\sqrt{n\,i(\theta_0)}$ has a standard normal distribution asymptotically.

7. Limiting Chi-Square Distributions: Likelihood Ratio Statistic
We now discuss the basic test statistic used for testing hypotheses using the principles of likelihood functions. Suppose that $\ell(\cdot)$ is the log-likelihood established from the probability density $f$. The consistency of $\hat\theta$ implies that we can write

$$\ell(\theta_0) = \ell(\hat\theta) + (\theta_0 - \hat\theta)\,\ell'(\hat\theta) + \tfrac{1}{2}(\theta_0 - \hat\theta)^2\,\ell''(\theta^{**}),$$

where $\theta^{**}$ is between $\theta_0$ and $\hat\theta$. Since $\ell'(\hat\theta) = 0$ by definition, representing the likelihood ratio statistic by $W = 2\{\ell(\hat\theta) - \ell(\theta_0)\}$ gives

$$W = (\hat\theta - \theta_0)^2\,\{-\ell''(\theta^{**})\} = \big\{(\hat\theta - \theta_0)\sqrt{n\,i(\theta_0)}\big\}^2 \cdot \frac{-\ell''(\theta^{**})}{J(\hat\theta)} \cdot \frac{J(\hat\theta)}{n\,i(\theta_0)}. \qquad (8)$$

It is clear that the first factor in Equation 8 is asymptotically the square of a standard normal random variable and is therefore a $\chi^2_1$ variable; in addition, the last two ratios, $-\ell''(\theta^{**})/J(\hat\theta)$ and $J(\hat\theta)/\{n\,i(\theta_0)\}$, tend to 1 using similar arguments to those applied in the previous subsection.
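The $\chi^2_1$ limit of $W$ can itself be seen by simulation. For the $N(\theta, 1)$ model, $\ell(\theta) = \text{const} - \tfrac{1}{2}\sum(x_i - \theta)^2$ and $\hat\theta = \bar x$, so $W = 2\{\ell(\hat\theta) - \ell(\theta_0)\}$ reduces algebraically to $n(\bar x - \theta_0)^2$, which should have mean 1 and variance 2 like a $\chi^2_1$ variable. A Python sketch of this check (numpy assumed; seed and replicate count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, t0, reps = 10, 4.0, 5000

# For N(theta, 1): l(theta) = const - 0.5 * sum((x - theta)^2), theta_hat = xbar,
# so W = 2 * (l(theta_hat) - l(t0)) simplifies to n * (xbar - t0)^2.
w = np.array([n * (rng.normal(t0, 1, n).mean() - t0) ** 2 for _ in range(reps)])

# W should behave like chi-square on 1 df: mean 1, variance 2
mean_w, var_w = w.mean(), w.var()
```

In this particular model the reduction is exact, since $\sqrt{n}(\bar x - \theta_0)$ is exactly standard normal; for other regular models the $\chi^2_1$ behaviour holds only asymptotically.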
In the same direction we can obtain (without proof) the limiting distribution of $W$ for the case when $\theta$ is a $p$-dimensional vector: as above we can write

$$W = 2\{\ell(\hat\theta) - \ell(\theta_0)\} \approx (\hat\theta - \theta_0)^{T}\, I(\theta_0)\, (\hat\theta - \theta_0),$$

and $W$ has an approximate chi-square distribution on $p$ degrees of freedom in repeated sampling of data from the model.

7.1 The two-mean model
In Obisesan et al. (2013) the development of changepoint detection was based on the work of Hinkley (1970), who considered a sequence of random variables and discussed the point at which the probability distribution changes, using a normal distribution with changing mean. The asymptotic distribution of the maximum likelihood estimate discussed in this paper is particularly relevant to the change-point. The author indicated the simplest model over the whole range of data as $x_i = \mu(i) + \epsilon_i$, where $\mu(i)$ is a mean function and the $\epsilon_i$ refer to error terms. Hinkley (1970) computed the asymptotic distribution in the normal case when $\theta_0$ and $\theta_1$ are unknown; the asymptotic distribution is found to be the same when the mean levels are known. The two-mean model to be considered supposes that there exist a mean $\theta_0$ for $i = 1, \ldots, \tau$ and a mean $\theta_1$ for $i = \tau + 1, \ldots, n$. He also computed the asymptotic distribution of the likelihood estimate of the change-point $\hat\tau$ (where $\theta_0$ and $\theta_1$ are known and $\tau$ is unknown), obtained from a sample $x_1, \ldots, x_n$ by simply maximizing a likelihood function of the form

$$L(\tau) = \prod_{i=1}^{\tau} f(x_i; \theta_0) \prod_{i=\tau+1}^{n} f(x_i; \theta_1),$$

which can be written in the form of a log-likelihood as

$$\ell(\tau) = \sum_{i=1}^{\tau} \log f(x_i; \theta_0) + \sum_{i=\tau+1}^{n} \log f(x_i; \theta_1). \qquad (9)$$

Moreover, many cases arise in which the mean levels are not known. For the normal model the log-likelihood of the observed sequence $(x_1, \ldots, x_n)$ is

$$\ell(\tau, \theta_0, \theta_1, \sigma^2 \mid x) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left\{\sum_{i=1}^{\tau}(x_i - \theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \theta_1)^2\right\}. \qquad (10)$$

If we assume that $\tau$ is known, the maximum likelihood estimators of the mean levels are

$$\hat\theta_0 = \frac{1}{\tau}\sum_{i=1}^{\tau} x_i, \qquad \hat\theta_1 = \frac{1}{n-\tau}\sum_{i=\tau+1}^{n} x_i,$$

and $\hat\sigma^2 = \frac{1}{n}\left\{\sum_{i=1}^{\tau}(x_i - \hat\theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \hat\theta_1)^2\right\}$. For convenience, Hinkley (1970) substituted $\sigma^2 = 1$ as known, so that Equation 10 becomes

$$\ell(\tau, \theta_0, \theta_1 \mid x) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\left\{\sum_{i=1}^{\tau}(x_i - \theta_0)^2 + \sum_{i=\tau+1}^{n}(x_i - \theta_1)^2\right\}. \qquad (11)$$

Assuming that $\tau$ is unknown, putting the maximum likelihood estimates of $\theta_0$ and $\theta_1$ back into the log-likelihood in Equation 11 and re-arranging the emerging sums of squares conditional on $\tau$, Equation 11 was used to estimate the changepoint of water pollution in the Eleyele and Asejire reservoirs in Nigeria. This confirms the application of the likelihood theory of changepoints. More on the applications is discussed in Obisesan (2011, 2015).

8. RESULTS
In this work it has been shown that a changepoint arises as a result of the failure of some regular assumptions; specifically, in this case Assumptions 1 and 4 may fail. Simulation of the likelihood function and its score functions has been used to justify the theory, showing the change in parameter that allows a changepoint to occur. The work also justifies the application of changepoint detection as used in Obisesan et al. (2013). The use of R has therefore made it possible to show the failure of the regular assumptions.

9. CONCLUSION
Single changepoint detection has been discussed in the framework of the failure of regular assumptions, a failure that has not been commonly noticed. The likelihood function was used to merge the two mean levels, and various score functions were simulated using the statistical computing language R. The theoretical implications of the failure of the regular assumptions were discussed and the failed assumptions identified using R. This work has therefore provided a basis for using computational statistics methods in solving a mathematical problem.

10. REFERENCES
[1] Chen, J. and Gupta, A.K. (2000). Parametric Statistical Change Point Analysis. Birkhäuser, Boston.
[2] Easterling, D.R. and Peterson, T.C. (1995). A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol., 15:369-377.
[3] Hanesiak, J.M. and Wang, X.L. (2005). Adverse weather trends in the Canadian Arctic. J. Climate, 18(2):3140-3156.
[4] Hinkley, D.V. (1970). Inference about the change-point in a sequence of random variables. Biometrika, 57(1):1-17.
[5] Jandhyala, V.K., Fotopoulos, S.B., and Evagelopoulos, N. (1999). Change-point methods for Weibull models with applications to detection of trends in extreme temperatures. Environmetrics, 10:547-564.
[6] Jandhyala, V.K. and MacNeil, I.B. (1986). The changepoint problem: a review of applications. In Statistical Aspects of Water Quality Monitoring (eds. A.H. ElShaarawi and R.E. Kwiatkowski), Elsevier, pages 381-387.
[7] Lu, Q., Lund, R., and Seymour, L. (2005). An update of U.S. temperature trends. J. Climate, 18:4906-4919.
[8] Manly, B.F.J. (2001). Statistics for Environmental Science and Management. Chapman & Hall/CRC, Boca Raton.
[9] Obisesan, K.O., Lawal, M., Bamiduro, T.A., and Adelakun, A.A. (2013). Data visualization and changepoints detection in environmental data: the case of water pollution in Oyo State, Nigeria. Journal of Science Research, 12:181-190.
[10] Obisesan, K.O. (2015). Modelling Multiple Changepoints Detection. Ph.D. thesis, Department of Statistics, University of Ibadan.
[11] Obisesan, K.O. (2011). Changepoint Detection in Time Series with Hydrological Applications. M.Phil. thesis, Department of Statistical Science, University College London.
[12] Page, E.S. (1954). Continuous inspection schemes. Biometrika, 41:100-114.
[13] Page, E.S. (1955). A test for a change in a parameter occurring at an unknown time point. Biometrika, 42:523-526.
[14] Page, E.S. (1957). On problems in which a change in parameter occurs at an unknown point. Biometrika, 44:248-252.
[15] Rice, J.A. (2007). Mathematical Statistics and Data Analysis. Duxbury.
[16] Wang, X.L. (2006). Climatology and trends in some adverse and fair weather conditions in Canada, 1953-2004. J. Geophys. Res., 111.
[17] Yang, C., Chandler, R.E., Isham, V.S., and Wheater, H.S. (2006). Quality control for daily observational rainfall series in the UK. Water and Environment Journal, 20:185-193.