=Paper=
{{Paper
|id=None
|storemode=property
|title=Predicting Ramp Events with a Stream-based HMM framework
|pdfUrl=https://ceur-ws.org/Vol-960/paper6.pdf
|volume=Vol-960
}}
==Predicting Ramp Events with a Stream-based HMM framework==
<pdf width="1500px">https://ceur-ws.org/Vol-960/paper6.pdf</pdf>
<pre>
      Predicting Ramp Events with a Stream-based HMM
                        framework
 Carlos A. Ferreira1 and João Gama2 and Vítor S. Costa3 and Vladimiro Miranda4 and Audun Botterud5


Abstract. The motivation for this work is the study and prediction             parameters are estimated from historical data, the state transitions
of wind ramp events occurring in a large-scale wind farm located in            probabilities are estimated from wind power measurements and the
the US Midwest. In this paper we introduce the SHREA framework, a              emission probabilities, at each state, are estimated from wind speed
stream-based model that continuously learns a discrete HMM model               observations. To estimate the state probability transitions, first, we
from wind power and wind speed measurements. We use a super-                   combine a ramp filter, a derivative alike filter, and a user-defined
vised learning algorithm to learn HMM parameters from discretized              threshold to translate the real-valued wind power time series into
data, where ramp events are HMM states and discretized wind speed              a labeled time-series, coding three different types of ramp events:
data are HMM observations. The discretization of the historical data           ramp-up, no-ramp and ramp-down. Then, the transitions occurring
is obtained by running the SAX algorithm over the first order varia-           in this labeled time series are used to estimate the transitions of the
tions in the original signal. SHREA updates the HMM using the most             Markov process hidden in the HMM, i.e., to model the transitions be-
recent historical data and includes a forgetting mechanism to model            tween the three states associated with the three types of ramp events.
natural time dependence in wind patterns. To forecast ramp events              To learn the HMM emission probabilities, first we combine a ramp
we use recent wind speed forecasts and the Viterbi algorithm, that             filter and the SAX algorithm [9] to translate the wind speed measure-
incrementally finds the most probable ramp event to occur.                     ments signal into a string. Next we use both the wind power labeled
   We compare the SHREA framework against the Persistence baseline in          time series and the wind speed string to estimate the emission proba-
predicting ramp events occurring in very short-time horizons.                  bilities at each state. The estimate is obtained by counting the string
                                                                               symbols, coding wind speed variations, associated with a given state/
1 Introduction                                                                 ramp event.
                                                                                   When we analyze wind power historical data we observe both sea-
Ramping is one notable characteristic in a time series associated with         sonal weather regimes and short-time ahead dependence of the recent
a drastic change in value in a set of consecutive time steps. Two prop-        past wind power/speed measurements. Thus, to accommodate these
erties of a ramping event i.e. slope and phase error, are important            issues, in SHREA we included a strategy that forgets old weather
from the point of view of the System Operator (SO), with impor-                regimes and continuously updates the HMM with the most recent
tant implications in the decisions associated with unit commitment or          measurements, both wind power measurements and wind speed mea-
generation scheduling. Unit commitment decisions must prepare the              surements.
generation schedule in order to smoothly accommodate forecasted                    To generate ramp event predictions occurring in short-time ahead
drastic changes in wind power availability [2]. In this paper we               window we use the wind speed forecast, obtained from a major NWP
present SHREA a novel stream-based framework that predicts ramp-               provider, and the current HMM. First, we run a filter over the wind
ing events in short term wind power forecasting.                               speed forecast signal to obtain a signal of wind speed variations.
   The development of the SHREA framework responds to the                      Next, we run the SAX algorithm to translate the resulting real-valued
three main open issues in ramp event forecasting. How can we                   time series into a string. Then, we run the Viterbi algorithm [12] to
describe and get insights on the wind power, and wind speed, time-             obtain the most likely sequence of ramp events. We could use the
dependent dynamic and use this description to predict short-time               Forward-Backward algorithm [12] usually used to estimate the pos-
ahead ramp events? How can we combine real valued historical wind              terior probability but we would be using long time ahead, thus unre-
power and speed measurements and Numerical Weather Predictions                 liable, wind speed forecasts to predict current ramp events.
(NWP), especially wind speed predictions, to output reliable real-time             It is important to observe that wind speed measurements and
predictions? How can we continuously adapt SHREA to accommo-                   forecasts, mainly short time horizon predictions, are approximately
date different natural weather regimes yet producing reliable predic-          equally distributed over time. Moreover, the wind power output of
tions?                                                                         each turbine is related to wind speed measurements.
   To answer these questions we designed a stream-based framework                  In this work we run the SHREA framework to describe and predict
that continuously learns a discrete Hidden Markov Model (HMM)                  very short-time ahead ramp events occurring in a large-scale wind
and uses it to generate predictions. To learn and update the HMM the           farm located in the US Midwest. We present a comparison against
SHREA framework uses a supervised strategy whereas the HMM                     the Persistence model that is known to be hard to beat in short-time
1 LIAAD-INESC TEC and ISEP - Polytechnic Institute of Porto, Portugal          forecasts [10].
2 LIAAD-INESC TEC and FEP - University of Porto, Portugal                          Despite the difficulty of the ramp forecasting problem, in this work
3 CRACS-INESC TEC and FC - University of Porto, Portugal                       we make the following contributions: Develop a stream-based frame-
4 INESC TEC and FE - University of Porto, Portugal
                                                                               work that predicts ramp events and generates both descriptive and
5 Argonne National Laboratory, Argonne, IL, USA


                                                                                     results are presented that relate this parameter to the type and mag-
                                                                                     nitude of identified ramps. The Pref parameter is usually defined
                                                                                     according to the specific features of the wind farm site and, usually,
                                                                                     is defined as a percentage of the nominal wind power capacity or a
                                                                                     specified amount of megawatts.
                                                                                        A comprehensive analysis of ramp modeling and prediction may
                                                                                     be found in [2].
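
Definition 1 can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name label_ramps, the
integer lag (the number of samples standing in for ∆t), the treatment of the
first points as no-ramp, and the example values are all hypothetical and are
not taken from SHREA.

```python
def label_ramps(power, lag, p_ref):
    """Label time point t from the sign and size of P(t) - P(t - lag)."""
    if lag < 1:
        raise ValueError("lag must be a positive number of samples")
    labels = []
    for t, value in enumerate(power):
        if t < lag:
            # No reference point yet; treated as no-ramp (assumption).
            labels.append("no-ramp")
            continue
        change = value - power[t - lag]
        if change > p_ref:
            labels.append("ramp-up")
        elif change < -p_ref:
            labels.append("ramp-down")
        else:
            labels.append("no-ramp")
    return labels


# Hypothetical power series (MW), a lag of 2 samples and a 50 MW threshold.
power = [100, 110, 170, 180, 120, 100, 105]
labels = label_ramps(power, lag=2, p_ref=50)
```

The comparison is strict, as in the definition, so a change of exactly Pref
is labeled no-ramp.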

  Algorithm 1: SHREA: a stream-based ramp predictor
    input : Three time series: P_T, wind power measurements; O_T, wind speed measurements; and J_T, wind
            speed forecasts; a, the forecast horizon; Pref, threshold to identify ramp events; ∆t, the ramp
            definition parameter; W, the PAA parameter that specifies the amount of signal aggregation; σ, a
            forgetting factor
    output: A sequence of predictions Q^d_r . . . Q^d_{r+a} for each period/window d = 1, . . .
    countTimePeriods ← 0; flag ← 0; A_count ← 0; B_count ← 0;
    for each period/window d do
        countTimePeriods++
    1   Preprocessing
        P^d_s ← fitSpline(P^d), O^d_s ← fitSpline(O^d), J^d_s ← fitSpline(J^d)
        P^d_f ← rampDef(P^d_s, ∆t), O^d_f ← rampDef(O^d_s, ∆t), J^d_f ← rampDef(J^d_s, ∆t)
        L^d ← label(P^d_f, Pref); // Label Data
        O^d_n ← znorm(O^d_f), J^d_n ← znorm(J^d_f)
        O^d_str ← SAX(PAA(O^d_n)), OF^d_str ← SAX(PAA(J^d_n))
    2   Learn Supervised HMM
        π ← (δ(L^d(r) = rampDown), δ(L^d(r) = noramp), δ(L^d(r) = rampUp))
        λ^d(A, B, π) ← LearnHMM(O^d_str(1, . . . , r), L^d(1, . . . , r), A_count, B_count)
    3   Predict Ramp Events using the learned HMM
        Q^d_r . . . Q^d_{r+a} ← Viterbi(λ, OF^d_str(r + 1, . . . , r + a))
        λ^d(A, B, π) ← updateHMM(O^d_str(r + 1, . . .), L^d(r + 1, . . .))
    4   Forgetting mechanism
        if (countTimePeriods == σ) then
            A^aux_count ← A_count; B^aux_count ← B_count; flag ← 1
        if (countTimePeriods mod σ == 0 & flag == 1) then
            A_count ← A_count − A^aux_count; B_count ← B_count − B^aux_count

Figure 1: Illustration of ramp events, defined as a change of at least 50% in
power in an interval of 4 hours

cost-effective models; Introduce a forgetting mechanism so that we
can learn a HMM using only the most recent weather regimes; Use
wind speed forecasts as observations of a discrete HMM to predict
short-time ahead ramp events.
   In the next Section we introduce the ramp event forecast problem.
In Section 3 we present a detailed description of our framework.
In Section 4 we present and discuss the obtained results. Last, we
present some conclusions and present future research directions.

2 Ramp Event Definition and Related Work

One of the main problems in ramp forecasting is how to define a
ramp. In fact, there is no standard definition [7, 3, 8] and almost
all existing literature report different definitions, depending, for in-
stance, on the location or on the farm’s size.
   The authors in [5] and [11] define several relevant characteris-
tics for ramp definition, characterization and identification: to define             3 Methodology developed to Forecast Ramps
a ramp event, we have to determine values for its three key char-
acteristics: direction, duration and magnitude (see Figure 1). With                  In this section we present SHREA framework, a stream-based frame-
respect to direction there are two basic types of ramps: the upward                  work that uses a supervised learning strategy to obtain a HMM.
ones (or ramp-ups), and the downward ones (or ramp-downs). The                       SHREA continuously learns a discrete HMM on a fixed size non-
former, characterized by an increase of wind power, result from a                    overlapping moving window and, at each time period, uses the up-
rapid raise of wind speeds, which might (not necessarily) be due to                  dated HMM to predict ramp events. We introduce a forgetting mech-
low-pressure systems, low-level jets, thunderstorms, wind gusts, or                  anism to forget old wind regimes and to accommodate weather global
other similar weather phenomena. Downward ramps are due to a de-                     changes. The SHREA architecture has three main steps (see algo-
crease in wind power, which may occur because of a sudden deple-                     rithm pseudo-code in Algorithm 1): preprocessing phase, where a
tion of the pressure gradient, or due to very high wind speeds, that                 ramp filter and the SAX algorithm are used to translate real valued
lead wind turbines to reach cut-out limits (typically 22-25m/s) and                  signals into events/strings; learning phase, where a supervised strat-
shut down, in order to prevent the wind turbine from damage [4]. In                  egy is used to learn a HMM; and prediction phase, where the Viterbi
order to consider a ramp event, the minimum duration is assumed to                   algorithm is used to forecast ramp events. In the following lines we
be 1 hour in [11], although in [7] these events lie in intervals of 5 to             describe each one of these phases.
60 minutes. The magnitude of a ramp is typically represented by the
percentage of the wind farm’s nominal power - nameplate.                               3.1 Preprocessing In the preprocessing phase we translate the
   In [7] the authors studied the sensitivity of two ramp definitions                real-valued points occurring in a given time period d, i.e. occurring
to each one of the two parameters introduced above: ramp amplitude                   inside a non-overlapping fixed size window, into a discrete time-
ranging from 150 to 600MW and ramp duration values varying be-                       series suitable to be used at HMM learning and prediction time. First,
tween 5 and 60 minutes. The definition that we present and use in                    we fit a spline to both the wind power and wind speed measurements
this work is similar to the one described in [7]. It is more appropri-               time series obtaining, respectively, two new signals, Pds and Ods. We
ate to use in real operations since it does not consider a time-ahead                run the same procedure over the J time series, a wind speed forecast, and
point to identify a ramp event.                                                      obtain Jds . We fit splines to the original data to remove high frequen-
                                                                                     cies that can be considered noisy data. Second, we run ramp defini-
Definition 1 A ramp event is considered to occur at time point t, the                tion one, presented above in Section 2, to filter the three smoothed
end of an interval, if the magnitude of the increase or decrease in the              signals and obtain three new signals: Pdf , Odf and Jdf . These signals
power signal is greater than the threshold value, the Pref:                          are wind power and speed variations, derivative-like signals, suitable
                                                                                     to identify ramp events. Third, we use a user-defined power varia-
                      |P (t) − P (t − ∆t)| > Pref                                    tion threshold, the input parameter Pref value, to translate the wind
                                                                                     power signal Pdf into a labeled time series Ld (1, . . . , r + a), where 1
   The parameter ∆t is related to the ramp duration and defines the                  is the first point of the time window, r is the forecast launch time and
size of the time interval considered to identify a ramp. In [11] some                a is the time horizon. We map each wind power variation into one of


three labels/ramp events: ramp-up, ramp-down and no-ramp. These                      come less sensitive to new weather regimes. Thus we introduce a for-
three labels will be the three states of our HMM and the transitions                 getting strategy to update the HMM using only the most recent mea-
will be estimated using the points of the Ld time series.                            surements and forgetting the old data. This strategy relies on a thresh-
   At this point we already have the data needed to estimate the tran-               old that specifies the number of time periods to include in the HMM
sitions of the Markov process hidden in the HMM process. Now we                      estimation. This forgetting parameter, σ, is a user-defined value that
need to transform wind speed data into a format suitable to estimate                 can be set by experienced wind power technicians. Considering that
emission probabilities of the discrete HMM that we are learning. We                  at time period d we have read σ time periods and that we backup the
combine Piecewise Aggregate Approximation (PAA) and SAX algo-                        current counts into A^aux_count and B^aux_count temporary matrices. After
rithms [9] to translate the wind speed variations into symbolic time                 reading 2σ time periods we will use the following forgetting mecha-
series, more precisely. Thus, we normalize the two wind speed sig-                   nism: A^2σ_count = A^2σ_count − A^aux_count and B^2σ_count = B^2σ_count − B^aux_count.
nals and obtain Odn and Jdn signals. Odn will be used to estimate the                Then, we reset A^aux_count and B^aux_count equal to the updated A^2σ_count and
HMM emission probabilities and the Jdn will be used as the ahead ob-                 B^2σ_count matrices, respectively. Next, to predict ramp events occurring
servations that will be used to predict ramp events. Next, we run the                in the time periods following 2σ, we will update and use the HMM
PAA algorithm in each one of these signals to reduce complexity and,                 parameters obtained from the A^2σ_count and B^2σ_count to forecast ramp
again, obtain smoothed signals. The degree of signal compression is                  events. Every time we read a number of time periods that equals a
the W PAA parameter that is a user-defined parameter of SHREA.                       multiple of σ we apply this forgetting mechanism using the updated
This parameter is related with time point aggregation. Next, we run                  auxiliary matrices.
the SAX algorithm to map each PAA signal into string symbols. This
way we obtain two discrete signals Odstr and Jdstr . After the prepro-
                                                                                      3.3 Predict Ramp Events using the learned HMM In this step
cessing phase we have two discrete time series, Ld and Odstr that will
                                                                                     we use the HMM learned in time period d, the λd , and the string
be used to learn the HMM state transitions and emissions probabili-
                                                                                     Jdstr , obtained from wind speed forecasts, to predict ramp events for
ties, respectively.
                                                                                     the time points ranging from r to r + a. Remember that r is the
                                                                                     prediction launch time and a is the forecast horizon.
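
The normalization, PAA and SAX steps of the preprocessing phase can be
sketched as follows. This is a minimal illustration using only the Python
standard library, not the SHREA implementation. It assumes that the W
parameter is the number of consecutive points averaged into one PAA segment,
that the SAX alphabet has six letters (a to f), and that the breakpoints are
the equiprobable quantiles of a standard normal distribution; the function
names and the example values are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sd = mean(series), pstdev(series)
    if sd == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sd for x in series]


def paa(series, w):
    """Average each block of w consecutive points (a short last block is kept)."""
    return [mean(series[i:i + w]) for i in range(0, len(series), w)]


def sax(series, alphabet="abcdef"):
    """Map each value to a letter using equiprobable standard-normal breakpoints."""
    k = len(alphabet)
    cuts = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(cuts, x)] for x in series)


# Hypothetical wind speed variations for one window, aggregated with W = 2.
variations = [-3.0, -2.5, -0.2, 0.1, 0.0, 0.3, 2.4, 3.1]
word = sax(paa(znorm(variations), w=2))
```

Each letter of the resulting string summarizes W consecutive wind speed
variations, so large negative variations map to the first letters of the
alphabet and large positive ones to the last.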
 3.2 Learn a Discrete HMM Here we explain how we learn the                              To obtain the ramp event predictions we run the Viterbi algo-
HMM in the time period d, and then how we update it in time.                         rithm [12]. We feed this algorithm with Jdstr and λd and get the state
   In the HMM that we learn, compactly written λ(A, B, π), the                       predictions (the ramp events) Qdr+1 , . . . , Qdr+a for the time points
state transitions, the A parameter, are associated with wind power                   r + 1, . . . , r + a of time period d. In other words, we obtain
measurements and the emissions probabilities, the B parameter, are                   predictions for the points occurring in a non overlapping time win-
associated with wind speed measurements. In Figure 2 we show a                       dow starting at r and with length equal to a. We will obtain the most
HMM learned by SHREA at the end of the 2010 winter. To estimate                      likely sequence of states that best explains the observations, i.e., we
these two parameters we use the ramp labels, Ld (1, . . . , r), and the              will obtain a sequence of states Qdr+1 , . . . , Qdr+a that maximizes the
wind speed measurement signals, Odstr (1, . . . , r), and run the well-              probability P(Q^d_{r+1}, . . . , Q^d_{r+a} | J^d_{r+1}, . . . , J^d_{r+a}, λ^d).
known and straightforward supervised learning algorithm described                       Regarding the π parameter, we introduce a non-classical approach
in [12]. To estimate the transition probabilities between states, the                to estimate this parameter. We defined this strategy after observing
three-way matrix A, we count the transitions between symbols ob-                     that it is almost impossible to beat a ramp event forecaster that pre-
served in Ld (1, . . . , r) and compute the marginals to estimate the                dicts the ramp event occurring one step ahead to be the current ob-
probabilities. To estimate the emission probabilities for each state,                served ramp event. Thus, we set π to be a distribution having zero
the matrix B, we count, for each state, the observed frequency of                    probability for all events except the event observed at launch time,
each symbol and then use state marginals to compute the probabili-                   the r time point. In the pseudo code we write π ← (δ(Ld (r) ==
ties. This way, we obtain the maximum likelihood estimate of both                    rampDown), δ(Ld (r) == noramp), δ(Ld (r) == rampUp))
the transitions and the emission probability matrices.                               where δ is a Dirac delta function defined by δ(x) = 1, if x is TRUE
   We now explain how to update the model over time. We de-                          and δ(x) = 0, if x is FALSE.
sign our framework to improve over time with the arrival of
new data. At each time period d SHREA is fed with new data and
the HMM parameters are updated to include the most recent histor-                    4 Experimental Evaluation
ical data. At each time period d we update the HMM parameters by
counting the state transitions and state emissions coded in the cur-                 In this section we describe the configurations, the metrics
rent vectors O^d_str(1, . . . , r) and L^d(1, . . . , r), obtaining the number       and the results that we obtain in our experimental evaluation.
of state transitions and emissions at each HMM state, the A_count and
B_count. Then, we compute the marginal probabilities of each matrix                  Table 1: Misclassification Costs
and obtain the updated HMM, the model λ^d(A^d, B^d, π^d) that will be                                  Observed
used to predict ramp events. The learned HMM, λ^d, will be used to                   Predicted    down    no    up
predict ramp events occurring between r and r + a. In the next time                  down            0    10    80
period (i.e. the next fixed sized time window) we will update the λ^d                no             20     0    10
HMM, using this same strategy but including also the transitions and                 up            100    30     0
emissions of the time period d that were not used to estimate λ^d, i.e.,
we update A_count and B_count with the wind measurements of the                      [Figure 2: Winter HMM. State diagram with states ramp up, no ramp
time period d occurring after d’s launch time and before d + 1 period                and ramp down, and emission symbols a to f.]
launch time, the r point. By using this strategy we continuously up-
date the HMM to include both the most recent data and all old data.
By using this strategy, and with the course of time, the HMM can be-


Table 2: KSS, SS and Expected Cost (ECost) mean and standard deviation (in parentheses) for the last 100 days of the evaluation period

                    SHREA, ∆t=1                                  SHREA, ∆t=2                                  SHREA, ∆t=3                                  Persistence
Time ahead  Metric  phE=0          phE=1          phE=2          phE=0          phE=1          phE=2          phE=0          phE=1          phE=2          ∆t=1           ∆t=2           ∆t=3
30 min      KSS     0.144(0.002)   –              –              0.332(0.001)   –              –              0.446(0.002)   –              –              0.144(0.002)   0.332(0.001)   0.446(0.002)
30 min      SS      0(0)           –              –              0(0)           –              –              0(0)           –              –              –              –              –
30 min      ECost   3.129(0.016)   –              –              4.176(0.027)   –              –              4.04(0.019)    –              –              3.129(0.02)    4.176(0.03)    4.041(0.02)
60 min      KSS     0.152(0.001)   0.202(0.002)   –              0.278(0.001)   0.314(0.204)   –              0.369(0.001)   0.417(0.001)   –              0.127(0.009)   0.203(0.001)   0.343(0.001)
60 min      SS      0.028(0.001)   0.085(0.002)   –              0.094(0.00)    0.139(0.001)   –              0.038(0.001)   0.113(0.001)   –              –              –              –
60 min      ECost   2.312(0.18)    2.107(0.014)   –              3.860(0.39)    3.719(0.39)    –              4.374(0.61)    4.108(0.61)    –              8.731(0.99)    14.687(1.50)   16.104(1.63)
90 min      KSS     0.123(0.000)   0.185(0.001)   0.231(0.002)   0.193(0.001)   0.240(0.001)   0.296(0.002)   0.271(0.001)   0.316(0.001)   0.345(0.001)   0.101(0.001)   0.163(0.002)   0.258(0.002)
90 min      SS      0.0244(0.002)  0.093(0.001)   0.145(0.001)   0.035(0.001)   0.091(0.002)   0.159(0.001)   0.018(0.001)   0.079(0.001)   0.118(0.001)   –              –              –
90 min      ECost   2.089(0.013)   1.938(0.012)   1.807(0.010)   4.252(0.03)    4.028(0.025)   3.728(0.024)   5.165(0.025)   4.893(0.023)   4.677(0.025)   3.204(0.030)   6.112(0.042)   6.783(0.050)
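
As an illustration of how metrics of the kind reported in Table 2 can be computed, the following is a minimal sketch and not the authors' code. It assumes the usual multi-category form of the Hanssen & Kuipers score and assumes that the expected cost is the average cost per forecast; the exact formula of [13] may differ. The cost values are those of the paper's cost matrix, while the function names and the confusion counts are hypothetical.

```python
# Cost matrix from the paper: rows = predicted, columns = observed,
# class order (down, no, up).
COST = [[0, 10, 80],
        [20, 0, 10],
        [100, 30, 0]]


def kss(confusion):
    """Multi-category Hanssen & Kuipers score.

    confusion[i][j] counts the cases predicted as class i and observed as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    hit = sum(confusion[i][i] for i in range(k)) / total
    forecast = [sum(confusion[i]) / total for i in range(k)]
    observed = [sum(confusion[i][j] for i in range(k)) / total for j in range(k)]
    chance = sum(f * o for f, o in zip(forecast, observed))
    return (hit - chance) / (1.0 - sum(o * o for o in observed))


def expected_cost(confusion, cost=COST):
    """Average misclassification cost per forecast (an assumed definition)."""
    k = len(cost)
    total = sum(sum(row) for row in confusion)
    weighted = sum(confusion[i][j] * cost[i][j] for i in range(k) for j in range(k))
    return weighted / total


# Hypothetical counts, for illustration only (not taken from the paper).
confusion = [[5, 3, 1],
             [4, 80, 3],
             [1, 2, 6]]
print(kss(confusion), expected_cost(confusion))
```

With this definition a forecast that is always right has KSS equal to 1 and expected cost equal to 0, and predicting a ramp-up when a ramp-down is observed is charged five times more than predicting no-ramp in the same situation.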


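The count-based update of Acount and Bcount described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SHREA implementation: the function names are hypothetical, transitions across window boundaries are ignored, and rows without any observation fall back to a uniform distribution, which is an assumption of this sketch. The three states and the six symbols a to f follow the paper.

```python
STATES = ["down", "no", "up"]   # ramp-down, no-ramp, ramp-up
SYMBOLS = "abcdef"              # discretized wind speed bins


def new_counts():
    """Create empty transition (Acount) and emission (Bcount) tables."""
    a_count = {s: {t: 0 for t in STATES} for s in STATES}
    b_count = {s: {o: 0 for o in SYMBOLS} for s in STATES}
    return a_count, b_count


def update_counts(a_count, b_count, states, symbols):
    """Add the transitions and emissions of one labeled window to the counts."""
    for prev, nxt in zip(states, states[1:]):
        a_count[prev][nxt] += 1
    for state, symbol in zip(states, symbols):
        b_count[state][symbol] += 1


def normalize(counts):
    """Turn each row of counts into probabilities (uniform if the row is empty)."""
    probs = {}
    for key, row in counts.items():
        total = sum(row.values())
        probs[key] = {k: (v / total if total else 1.0 / len(row))
                      for k, v in row.items()}
    return probs


# Hypothetical labeled window: one state and one symbol per time point.
a_count, b_count = new_counts()
update_counts(a_count, b_count, ["no", "no", "up", "up", "no"], "ccefd")
A = normalize(a_count)   # state transition probabilities
B = normalize(b_count)   # emission probabilities
```

Because only counts are stored, each new window is folded in by calling update_counts again and re-normalizing, without revisiting older data.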
  4.1 Experimental Configuration Our goal is to predict ramp events in a large-scale wind farm located in the US Midwest.
To evaluate our system we collected historical data and, to make predictions, used wind speed power predictions (NWP) for the
time period between 3 June 2009 and 16 February 2010. Each turbine in the wind farm has a Supervisory Control and Data
Acquisition (SCADA) system that registers several parameters, including the wind power generated by the turbine and the wind
speed measured at the turbine; the latter are 10-minute spaced point measurements. In this work we consider a subset of turbines
and compute, for each time point, the subset mean wind power output and the subset mean wind speed, obtaining two time series
of measurements. The wind speed power prediction for the wind farm location was obtained from a major provider. Every day we
get a wind speed forecast with launch time at 6 am and a 24-hour horizon. The predictions are 10-minute spaced point forecasts.
In this work we run SHREA to forecast ramp events occurring 30, 60 and 90 minutes ahead (the a parameter). We start by learning
an HMM using five days of data and then use the learned, and updated, HMM to generate predictions for each fixed-size
non-overlapping time window. Moreover, we split the day into four periods and run SHREA to learn four independent HMM models:
dawn, from zero to six hours; morning, from six to twelve hours; afternoon, from twelve to eighteen hours; and night, from
eighteen hours to midnight. These four models were only used to give some insight into the ramp dynamics and were not used to
make predictions. We define a ramp event to be a change in wind power production higher than 20% of the nominal capacity, i.e.,
we set the Pref threshold equal to 20% of the nominal capacity. Moreover, we run a set of experiments by setting the ∆t parameter
equal to 1, 2 and 3 time points, i.e., equal to 30, 60 and 90 minutes. We run SHREA using thirty-minute signal aggregation, thus
each time point represents thirty minutes of data. In these experiments we also consider phase error corrections. Phase errors
are errors in forecasting ramp timing [5]. We identify events that occur at a timestamp t and were not predicted at that time,
but were predicted instead to occur in one, or two, time periods immediately before or after t.
   Furthermore, as SHREA is continuously updating the HMM, we set the forgetting parameter σ = 30, i.e., each time the system
reads a new period of 30 days of data, it forgets 30 days of old data. The amount of forgetting used in this work results from a
careful study of the wind patterns.
   For this configuration we compute and present the Hanssen & Kuipers Skill Score (KSS) and the Skill Score (SS) [1, 6].
Moreover, we compute the expected misclassification costs (EC) using the formula presented in [13]. The cost matrix presented
in Table 1 defines the misclassification costs. We compare SHREA against a Persistence baseline algorithm. Despite its
simplicity (its predictions are the same as the last observation), this model is known to be hard to beat in short-time-ahead
predictions [10].

  4.2 Results This work is twofold and here we present and analyze both the descriptive and predictive performance of the
SHREA framework.
   In Figure 2 we present an example of an HMM generated by SHREA in February. This model was learned when running SHREA to
predict events 90 minutes ahead with ∆t = 2. This HMM has three states, each associated with one ramp type, and each state
emits six symbols, each representing a discrete bin of the observed wind speed. The lowest level of wind speed is associated
with the character a and the highest level with the character f. The labels on the edges show the state emission and the state
transition probabilities.
   The HMM models that we obtained in our experiments uncover interesting ramp behaviors. Considering all the data used in
these experiments, when we set ∆t = 1 we detected 7% more ramp-up events than ramp-down events. When we set ∆t = 3 we get the
inverse behavior: 4% more ramp-downs than ramp-ups. This behavior is easily explained by the natural wind dynamics, which cause
steeper ramp-up events and smoother ramp-down events. If we analyze the four periods of the day independently, we find a small
number of ramp events, both ramp-ups and ramp-downs, in the afternoon. If we compute the mean number of ramps over all ∆t
parameters, we get approximately 30% (15%) more ramp-up (ramp-down) events at night than in the afternoon. Overall, we get more
ramp events at night and, in second place, in the dawn period. Moreover, in the summer we get, both for ramp-up and ramp-down
events, wind speed distributions with higher entropy: approximately 85% of the probability is concentrated in two observed
symbols. In contrast, in the winter we have less entropy in the wind speed distribution associated with both types of ramp
events: approximately 91% of the probability distribution is concentrated in one symbol. The emission probability distribution
of the ramp-down state is concentrated in symbol a and the emission probability distribution of the ramp-up state is
concentrated in symbol f. These two findings are consistent with our empirical visual analysis and other findings [4]: large
wind ramps tend to occur in the winter and usually there is a rapid wind speed increase followed by a more gradual wind speed
decrease. These findings are also related to the high average temperature in the summer and to the stable temperatures
registered during the afternoons. Considering the ∆t parameter, the number of ramps, both ramp-ups and ramp-downs, increases
with ∆t. In general, we observe large ramps only when we compare time points that are 20 to 30 minutes apart.
   As illustrated in Figure 2, we identified a large portion of self-loops, especially ramp-up to ramp-up transitions in the
winter nights. The percentage of self-loops ranges between 12%, when we run SHREA with ∆t = 1, and 55%, when we set ∆t = 3.
This self-loop transition shows that we have a high percentage of ramp events having a magnitude of at least 40% of the
nameplate, two times the Pref threshold. Furthermore, in the winter we get a higher proportion of ramp-up to ramp-down and
ramp-down to ramp-up transitions than in the summer. This is especially clear in the dawn and night periods. This phenomenon
can be related to the difference in the average temperatures registered in these time periods.
   Before presenting the forecast performance, it must be said that the quality of ramp forecasting depends a great deal on
the quality of meteorological forecasts. Moreover, as the HMMs represent probability distributions, it is expected that SHREA
will be biased to predict no-ramp events. Typically SHREA over-predicts no-ramp events but makes less severe errors. This
biased behavior of SHREA is an acceptable feature, since it is better to forecast a no-ramp event when we observe a ramp-down
(ramp-up) event than to predict a ramp-up (ramp-down) event. In real wind power operations (see Table 1) the cost of the latter
error is several times larger than that of the former.
   In Table 2 we present the mean (with the associated standard deviation in brackets) KSS, SS and Expected Cost metrics that
we obtained when running SHREA, and the reference model, to predict ramp events occurring in the last hundred days of the
evaluation period.
   Before presenting a detailed discussion of the obtained results, we must say that, for the same ∆t parameter, in all
experiments we obtained better, or equal, results than the baseline algorithm, the Persistence algorithm. Moreover, when we
generate predictions for the 30-minute horizon (one time point ahead, since we use 30-minute aggregation) we get the same
results as the Persistence model. This phenomenon is related to the strategy that we used to define the HMM initial state
distribution. Remember that we set the HMM π parameter equal to the last state observed.
   As expected, the KSS results worsen with the increase of the time horizon. It is well known that the forecast
reliability/fit worsens as the distance from the forecast launch time increases. Moreover, we obtained better KSS values for
the morning period than for the other three periods of the day. For lack of space we do not present a detailed description of
the results that we obtain when we run SHREA to predict ramp events occurring in each one of the four periods of the day. The
better morning results can be related to the wind speed forecast launch time: the wind speed forecast that we use in this work
is updated every day at 6 am.
   The analysis of the ∆t parameter shows that the mean KSS values increase with ∆t. Again, this can be explained by the wind
patterns: typically the wind speed increases smoothly for more than 30 minutes. In Table 2 we can see clearly that SHREA
performance improves with the increase in the ∆t parameter. We observe the same behavior in the results obtained by running
the Persistence algorithm. Concerning the SS, we obtain improvements over the Persistence forecast that range between 0% and
16%.
   Concerning the phase error technique, we get important improvements for the two phase error parameter values considered in
this study. The improvement obtained by considering the phase error can be valuable in real-time operations: the technicians
can prepare the wind farm to deal with a nearby ramp event. In Table 2 we present the results without considering the phase
error technique, phE = 0, and considering phase error corrections of one time point (30 minutes), phE = 1, and two time points
(60 minutes), phE = 2.
   We also introduce a misclassification cost analysis framework that can be used to quantify the management decisions. We
define a misclassification cost scenario (see Table 2) and show that SHREA produces valuable predictions. In this real
scenario, SHREA generates significantly lower operational costs and better operational performance than the baseline model.

5 Conclusions and Future Work

In this work we obtained some insights into the intricate mechanisms hidden in the ramp event dynamics and obtained valuable
forecasts for very short-time horizons. For instance, we can now say that steep and large wind ramps tend to occur more often
in the winter. Moreover, typically there is a rapid wind speed increase followed by a more gradual wind speed decrease.
Overall, with the obtained HMM models we both obtained insights into the wind ramp dynamics and generated accurate predictions
that prove to be cost-beneficial when compared against a Persistence forecast method.
   The performance of SHREA is heavily dependent on the quality of the wind speed forecasts. Thus, in the near future we hope
to get special-purpose NWP suitable for detecting ramp events and with more frequent daily updates. Moreover, we will study
multivariate HMM emissions to include other NWP parameters such as wind direction and temperature.

  Acknowledgments: This manuscript has been created by UChicago Argonne, LLC, Operator of Argonne National Laboratory
("Argonne"). Argonne, a U.S. Dep. of Energy Office of Science laboratory, is operated under Contract No. DE AC02-06CH11357.
The authors also acknowledge EDP Renewables, North America, LLC. This work is also funded by the ERDF through the COMPETE
programme and by National Funds through the FCT Project KDUS.

REFERENCES

 [1] K.T. Bradford, R.L. Carpenter, and B. Shaw, ‘Forecasting southern plains wind ramp events using the wrf model at 3-km’,
     in AMS Student Conference, (2010).
 [2] C. Ferreira, J. Gama, V. Miranda, and A. Botterud, ‘A survey on wind power ramp forecasting’, in Report ANL/DIS 10-13,
     Argonne National Laboratory, (2010).
 [3] U. Focken and M. Lange, ‘Wind power forecasting pilot project in alberta’, Oldenburg, Germany: energy & meteo systems
     GmbH, (2008).
 [4] J. Freedman, M. Markus, and R. Penc, ‘Analysis of west texas wind plant ramp-up and ramp-down events’, in AWS Truewind,
     LLC, Albany, NY, (2008).
 [5] B. Greaves, J. Collins, J. Parkes, and A. Tindal, ‘Temporal forecast uncertainty for ramp events’, Wind Engineering,
     33(11), 309–319, (2009).
 [6] A.W. Hanssen and W.J.A. Kuipers, ‘On the relationship between the frequency of rain and various meteorological
     parameters’, Mededelingen van de Verhandlungen, 81, (1965).
 [7] C. Kamath, ‘Understanding wind ramp events through analysis of historical data’, in IEEE PES Transmission and
     Distribution Conference and Expo, New Orleans, LA, United States, (2010).
 [8] A. Kusiak and H. Zheng, ‘Prediction of wind farm power ramp rates: A data-mining approach’, J. of Solar Energy
     Engineering, 131, (2009).
 [9] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, ‘A symbolic representation of time series, with implications for streaming
     algorithms’, in 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, (2003).
[10] C. Monteiro, R. Bessa, V. Miranda, A. Botterud, J. Wang, and G. Conzelmann, ‘Wind power forecasting: State-of-the-art
     2009’, in Report ANL/DIS 10-1, Argonne National Laboratory, (2009).
[11] C. W. Potter, E. Grimit, and B. Nijssen, ‘Potential benefits of a dedicated probabilistic rapid ramp event forecast
     tool’, IEEE, (2009).
[12] L.R. Rabiner, ‘A tutorial on hidden markov models and selected applications in speech recognition’, Proceedings of the
     IEEE, 77(2), (1989).
[13] A. Srinivasan, ‘Note on the location of optimal classifiers in n-dimensional roc space’, in Oxford University Technical
     Report PRG-TR-2-99, Oxford, England, (1999).



</pre>