=Paper=
{{Paper
|id=Vol-1663/bmaw2016_paper_6
|storemode=property
|title=The Efficacy of the POMDP-RTI Approach for Early Reading Intervention
|pdfUrl=https://ceur-ws.org/Vol-1663/bmaw2016_paper_6.pdf
|volume=Vol-1663
|authors=Umit Tokac,Russell G.Almond
|dblpUrl=https://dblp.org/rec/conf/uai/TokacA16
}}
==The Efficacy of the POMDP-RTI Approach for Early Reading Intervention==
<pdf width="1500px">https://ceur-ws.org/Vol-1663/bmaw2016_paper_6.pdf</pdf>
<pre>
        The Efficacy of the POMDP-RTI Approach for Early Reading
                               Intervention


                       Umit Tokac                                              Russell G. Almond
               Educational Psychology and                                  Educational Psychology and
                    Learning Systems                                            Learning Systems
                 Florida State University                                    Florida State University
                 Tallahassee, FL 32306                                       Tallahassee, FL 32306
                    ut08@my.fsu.edu                                             ralmond@fsu.edu


                        Abstract                              Mastropieri, Scruggs, and Graetz (2003) argued that
                                                              reading is the main problem for most students with
                                                              learning disabilities.
     A POMDP is a tool for planning: selecting a
     policy that will lead to an optimal outcome.             Torgesen (2004) asserts that reading consists of five
     Response to intervention (RTI) is an approach to         components: phonological awareness, phonological
     instruction, where teachers craft individual plans       decoding, fluency, vocabulary, and reading
     for students based on the results of progress            comprehension. According to the Simple View Theory of
     monitoring tests. Current practice assigns               Reading Development (Gough & Tunmer, 1986) for
     students into tiers of instruction at each time          children at young ages, mastery of the first two
     point based on cut scores on the most recent test.       components, phonological decoding and phonological
     This paper explores whether a tier assignment            awareness, generates the remaining three reading
     policy determined by a POMDP model in an RTI             components: fluency, vocabulary, and reading
     setting offers advantages over the current practice.     comprehension. A lack of either phonological decoding
     Simulated data sets were used to compare the
                                                              or phonological awareness affects the other components
     two approaches; the model had a single latent
                                                              and causes reading difficulties. Because the development
     reading construct and two observed reading
                                                              of reading skills is critical, instructors should identify
     measures: Phoneme Segmentation Fluency (PSF)
     for phonological awareness and Nonsense Word             children with reading difficulties and provide additional
     Fluency (NWF) for phonological decoding. The             instructional support (Catts, Hogan & Fey, 2003).
     two simulation studies compared how the
     students were placed into instructional groups           Response to intervention (RTI) is an educational
     using the two approaches, POMDP-RTI and                  framework designed to identify students with difficulties
     RTI. This paper explored the efficacy of using a         in reading and math, and intervene as early as possible by
     POMDP to select and apply appropriate                    providing more intensive instruction for students who
     instruction.                                             need it. The RTI approach divides instruction into Tiers;
                                                              each tier includes a different intervention or instruction.
                                                              The RTI process starts with screening tests which monitor
1.   INTRODUCTION                                             general knowledge and skills of all students in the class.
                                                              The screening tests are administered on multiple
Statistics gathered by local school districts reflect that    occasions during a school year. The screening test results
roughly 30% of their first-grade students read below          provide teachers with a rough estimate of each student’s
grade level standards (Matthews, 2015). Moreover,             proficiency that guides the assignment of students into
Landerl and Wimmer (2009) reported that 70% of                appropriate tiers of instruction. RTI has produced good
struggling readers in first grade continued to struggle in    results in both research and operational settings, and
eighth grade when no intervention was provided.               hence is considered to be one of the evidence-based


                                                   BMAW 2016 - Page 36 of 59
practices for improving reading and preventing learning        based on their observed score in the current-time only-
disabilities (Greenwood et al., 2011).                         RTI model. The initial values of the parameters were based
                                                               on a longitudinal Florida Center for Reading Research
Ideally, the placement into Tiers of students in an RTI        (FCRR) study of reading proficiency (Al Otaiba et al.,
program would be based on their unobservable true              2011), and data sets were simulated based on the Almond
proficiency. As this is unobservable, the placement            (2007) model in order to produce realistic data for
decision is instead made on the basis of the estimates of      answering the research question posed above. The
proficiency from screening tests. Often in current             parameters of the simulation were chosen so that the
practice this is implemented through a cut score on the        distributions of scores on the screening tests were similar to
most recent screening test. Naturally, a certain amount of     those of the Al Otaiba et al. study at both the initial and
measurement error causes some students to be placed            final measurement periods.
incorrectly. Taking the entire history (both students’
previous screening-test results and changes in instruction)    2.1 THE POMDP-RTI FRAMEWORK
into account should improve the proficiency
estimates. Almond (2007) suggested that this                   Almond (2007) describes a general mapping of a POMDP
could be done using a partially observed Markov decision       into an educational setting. It is assumed that the
process (POMDP) — partially observed, because the true         student’s proficiency is measured at a number of
student proficiency is latent; a decision process, because     occasions. The latent proficiencies of the students are the
the instructors decide what instruction or intervention to     hidden layer of the POMDP model. The actual test scores
use between measurement occasions.                             are the observable outcomes, and the instructional options
                                                               for the teacher between measurement occasions are the
A POMDP is a probabilistic and sequential model. A             action space. The utility is assumed to be an increasing
POMDP can be in one of a number of distinct states at          function of the latent proficiency variable at the last
any point in time, and its state changes over time in          measurement occasion; thus, it is a finite-time-horizon
response to events (Boutilier, Dean & Hanks, 1999). One        model.
noteworthy difference between an RTI approach and a
POMDP model is that most RTI approaches use only the           Figure 1 shows a realization of an RTI program in this
latest test results to identify students’ proficiencies and    framework. The nodes marked R represent the latent
assign them to the appropriate tier (Nese et al., 2010). We    student proficiency as it evolves over time. At each time
call this approach the current-time only-RTI model. On the     slice, there is generally some kind of measurement of
other hand, a POMDP-RTI model combines periodically            student progress represented by the observable outcomes,
applied screening tests and the RTI framework in a             Phoneme Segmentation Fluency (PSF) for phonological
POMDP model. Additionally, a POMDP considers the               awareness and Nonsense Word Fluency (NWF) for
students’ entire histories (both actions and test scores)      phonological decoding. Tiers are instructional tasks
when determining appropriate interventions, in order to        chosen by the instructor and applied during time slices.
identify their current abilities and forecast their future     Note that in an RTI implementation, Tier 1 refers to
abilities under competing policies. Therefore, a POMDP-        whole class instruction given to all students, while Tier 2
RTI model should perform better than the current-time only-    is small group supplemental instruction generally given
RTI model.                                                     only to the students most at risk. Students in Tier 2 are
                                                               given the Tier 1 instruction as well.
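
As a minimal sketch of this mapping (not the authors' code), the Python below
simulates one student through the framework: a latent proficiency R, two noisy
observables (PSF and NWF), and a tier chosen after each measurement occasion.
The numeric values are the ones reported in Sections 2.1.1 and 2.1.2; the zero
intercept, the single-tier cut-score policy, and all function names are
assumptions made for illustration.

```python
import math
import random

# Values reported in Sections 2.1.1 and 2.1.2 of this paper.
GROWTH_RATE = {1: 0.9, 2: 1.2}   # gamma for Tier 1 / Tier 2
GROWTH_SD = {1: 1.0, 2: 1.0}     # sigma for Tier 1 / Tier 2
DELTA_T = 1.0 / 3.0              # three occasions per school year
SLOPE, NOISE_SD = 0.98, 0.65     # b_i and omega_i, same for PSF and NWF
INTERCEPT = 0.0                  # a_i: ASSUMED 0 here (not given in the text)


def observe(r, rng):
    """Evidence model: two noisy measures (PSF, NWF) of latent proficiency r."""
    return tuple(INTERCEPT + SLOPE * r + rng.gauss(0.0, NOISE_SD) for _ in range(2))


def grow(r, tier, rng):
    """Growth model, equation (1): one time slice of instruction in a tier."""
    sd = GROWTH_SD[tier] * math.sqrt(DELTA_T)
    return r + GROWTH_RATE[tier] * DELTA_T + rng.gauss(0.0, sd)


def cut_score_policy(scores, cut=0.0):
    """HYPOTHETICAL current-time rule: Tier 2 if either score is below the cut."""
    return 2 if min(scores) < cut else 1


def simulate_student(policy, occasions=3, seed=0):
    """Return [(latent R, (PSF, NWF), tier chosen after the occasion), ...]."""
    rng = random.Random(seed)
    r = rng.gauss(0.0, 1.0)          # initial proficiency ~ N(0, 1)
    history = []
    for m in range(occasions):
        scores = observe(r, rng)     # only the scores are visible to the policy
        tier = policy(scores)
        history.append((r, scores, tier))
        if m < occasions - 1:
            r = grow(r, tier, rng)   # instruction between occasions
    return history
```

A policy here sees only the current scores, which is the current-time only-RTI
setting; a POMDP policy would instead receive the whole history of scores and
tiers.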
To test the last assertion, this paper compares the
POMDP-RTI model with the current-time only-RTI,
evaluating the predictive accuracy of each model, the
quality of the instructional plans produced and the reading
levels achieved at the end of the year. It does this through
simulation studies based on numbers obtained from fitting
the POMDP model to a group of kindergarten students in
an earlier RTI study (Al Otaiba, Connor, Folsom,
Greulich, Meadows, & Li, 2011).

2.   METHOD                                                                Figure 1: The POMDP-RTI model
Two simulated datasets were used in order to address how       Figure 1 was designed based on evidence-centered
properly students are assigned to each tier based on their     assessment design (ECD; Mislevy, Steinberg, & Almond,
latent reading score in the POMDP-RTI model compared           2003); we call this an evidence model. In general, both the


proficiency variables at Measurement Occasion m, Rm,              assumed to depend on the tier assignment. Thus, for
and the observable multivariate outcome variables are             measurement occasion m > 1,
PSFm and NWFm on that occasion. Extending the ECD
terminology, Almond (2007) calls the model for the Rm's,                  Rnm = Rn(m-1) + γa(n,m) ΔTm + ηnm,             (1)
the proficiency growth model. Following the normal logic
of POMDPs this is expressed with two parts: the first is                       where       ηnm ~ N(0, σa(n,m)√ΔTm),
the initial proficiency model, which gives the population
                                                                  and where ΔTm represents the elapsed time period
distribution for proficiency at the first measurement
                                                                  between measurement occasions m and m-1 for Tier 1 and
occasion. The second is an action, which gives a
                                                                  Tier 2. In this study, each school year was equal to 1, and
probability distribution for change in proficiency over
time that depends on the instructional activity chosen            ΔTm was fixed and equal to 1/M (e.g., M = 3, so ΔTm = 1/3).
between measurement occasions.                                    The parameter γa(n,m) is a tier-specific growth rate; it
                                                                  was fixed, with a different value for each
There are two notable differences between the POMDP               tier. We set γa(n,m) = 0.9 for Tier 1 and 1.2 for Tier 2.
models used in this application and those commonly seen           The residual standard deviation, σa(n,m)√ΔTm, depends on
in the literature. First, the models have a fixed and finite      both a tier-specific rate, σa(n,m), and the length of time,
time horizon, with the reward occurring only at the last          ΔTm, between measurements (thus, growth is occurring
time step (although the actions at each step have a cost          via a non-stationary Brownian motion process). The
which is subtracted from the reward). This removes the            standard deviation of the growth per unit time, σa(n,m), was
need for the usual discounting of future rewards. The             fixed to 1 for both tiers.
second is that the Markov process is non-stationary (it is
hoped that the student’s abilities will improve over time).       2.1.2    Evidence Model
This produces a potential identifiability issue, as growth is
                                                                  The evidence model involved two independent
difficult to distinguish from difficulty shifts in the
                                                                  regressions, one for each observed variable i. These two
measurement instruments (Almond, Tokac & Al Otaiba,
                                                                  observable variables were chosen because they are critical
2012). Assuming that the screening tests have all been
                                                                  reading components for later reading performance in the
equated, hence are on the same scale, takes care of the
                                                                  first two years of elementary school (Rock, 2007). Let
identification issue. An alternative approach would be to
                                                                  Ynmi be the observation for individual n at measurement
subtract the expected growth from the model, making the
                                                                  occasion m on observed variable i of the proficiency
latent proficiency variable represent deviations from the
                                                                  variables, then:
expected growth model (Almond, et al., 2014).
                                                                                            Rn0 ~ N(0,1)
2.1.1    Proficiency Growth Model                                                  Ynmi = ai + bi Rnm + ϵnmi,         (2)

The model from which the data was simulated was a                                        ϵnmi ~ N(0, ωi).
unidimensional model of reading with a single latent,             The reliability of the instruments can be used to determine
continuous variable: Rnm, the reading ability of individual       b and ω. The reliability of an observed variable i at any
n on measurement occasion m. In this case, N was 300              time point was represented as ri. In classical test theory,
students and M represented the three equally spaced time          the reliability is the squared correlation coefficient
points, t1, t2, t3. (RTI screening tests are typically given 3    between the true score and the observed score of the
times per year.)                                                  student. This definition translates into an equation as
                                                                                   ri = 1- (Varn(ϵnmi)/ Varn(Ynmi))
This study assumed that a teacher provided general
instruction to all the students until the first time point, t1,   where Varn(.) indicates that the variance comes from
and that the initial ability distribution was normal,             individuals (where measurement occasion and instrument
R0 ~ N(0,1). As this is a purely latent variable, the scale       are considered as constant). Then
and location is arbitrary. Fixing the initial population to
have a standard normal distribution establishes the scale.                            bi = (σYi/σRi) √(r²)   and

                                                                                         ωi = σYi √(1 − r²)
After analyzing the results of assessments administered
at t1, the teacher delivered additional and more intensive        In order to make ri = .45 at each time point, tm, for the
instruction to students who were assigned to Tier 2, but          measurement of each skill on observed variable i, bi = .98
delivered only general instruction to students in Tier 1.         and ωi = .65 were used at tm. These numbers are
The tier to which student n is assigned at time m is              comparable to reading measures commonly used with 1st
represented by a(n,m). The growth rate for the students is        grade students. At this point, the model is very close to


the model described in Almond, Tokac and Al Otaiba                represents the model cost of taking action or activity a in
(2012), except that the previous work assumed all                 state s, where k is a constant used to put the cost function
students were in the same Tier. Appropriate values for a          on the same scale as the utility function. In this study, the
and b depend on the scale of the instruments chosen. The          cost value was fixed at c(Tier 2) = 0.1 and c(Tier 1) = 0.
values used in the simulation were chosen so that the
                                                                  The utility function is
mean and standard deviation of the simulated data
matched the data set from Al Otaiba et al. (2011) at the                        u(RM) = logit⁻¹(α(RM − β)).      (4)
first and last time points.
                                                                  In this equation α and β are fixed parameters; β is a
2.1.3    Decision Rules                                           proficiency target, which is on the scale of the internal
                                                                  latent variable RM. Specifically β = 0.5 for Tier 1 and β =
The key research question compares the performance of             0.1 for Tier 2. Also, α is a slope parameter, and α = 0.8 for
the system under two different policies. The first is a           both Tier 1 and Tier 2. High values of α favor bringing
fixed decision rule implicit in the current-time RTI policy:      students near proficiency standards above the proficiency
Students who are below a cut-score on either of the two           target β, while low values of α give more weight to
screening tests are placed into Tier 2 instruction. The           enriching students at the high end of the scale and
second policy is the optimal policy found by solving the          providing remediation at the low end of the scale
POMDP. Implementing this policy requires an explicit              (Almond & Tokac, 2014). (Almond & Tokac
specification of the utility function and the cost function       alternatively recommend using a probit function in place
for the instructional options.                                    of a logit, so that α becomes effectively a standard
                                                                  deviation; however, as the shapes of the logit and
Many RTI implementations used the reference score                 probit curves are so similar, we expect the results using a
(general class median score or some other percentile rank)        probit curve would be similar as well.)
as a cut score for assigning each student to either the
Tier 1 or Tier 2 group. The simulated model used a                In this case, the total reward is u(RM) – c(a(s,2)) –
separate Tier 2 for each of the two screening tests (NWF          c(a(s,3)). The difference between the utility function and
and PSF) giving four possible Tier assignments. For               the cost function is the total reward for getting the student
instance, if a student’s score on the NWF test is lower           to proficiency level Tier 1 using instruction a(s,2) and
than the cut score for NWF but higher for PSF, the                a(s,3) between measurements 1 and 2, and 2 and 3. The
student was assigned to Tier 2 for NWF and Tier 1 for             reward is the basis for the assignment of each student to
PSF. (This differs slightly from the common practice              Tier 1 or Tier 2. The POMDP model forecasts the
which would put students who fail to meet the cut on              expected reward, and balances that with cost during each
either measure into a single Tier 2.)                             period.
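
The per-test cut-score rule and the reward built from equation (4) and the
instruction costs can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not the authors' implementation: the
cost values (0 and 0.1) and α = 0.8 come from the text, the proficiency target
β is passed in as an argument, and the function names and any cut scores
supplied by the caller are hypothetical.

```python
import math

ALPHA = 0.8                 # slope parameter reported in the text
COST = {1: 0.0, 2: 0.1}     # c(Tier 1) and c(Tier 2) reported in the text


def utility(r_final, beta, alpha=ALPHA):
    """Equation (4): inverse logit of alpha * (R_M - beta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (r_final - beta)))


def total_reward(r_final, tiers, beta):
    """Utility at the last occasion minus the cost of each instruction block."""
    return utility(r_final, beta) - sum(COST[a] for a in tiers)


def cut_score_tiers(psf, nwf, psf_cut, nwf_cut):
    """Current-time rule: a separate Tier 1/Tier 2 decision for each test."""
    return (2 if psf < psf_cut else 1, 2 if nwf < nwf_cut else 1)
```

For a student who ends exactly at the target (R_M = β), the utility is 0.5, so
two blocks of Tier 2 instruction leave a total reward of 0.5 - 0.1 - 0.1 = 0.3.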

The POMDP forecasts expected learning under each                  2.2 SIMULATION DESIGN
possible outcome and assigns students to tiers in a way           The initial distribution of the simulated students
that balances the expected learning gains with the cost of        at time 0 was based on the FCRR data set (Al Otaiba,
instruction. The utility function is the expected gain at the     2007). In the FCRR data, the correlation between NWF
last time point and the cost function is the sum of costs of      and PSF was .65. The simulation generated latent
applied instruction at each state. The benefit is always          proficiency variables for each simulee, and simulated
higher for Tier 2, as is the cost. However, the cost exceeds      reading scores on the NWF and PSF tests
the utility of the benefit for some regions of the                administered at t1, t2 and t3 in the model. At each time
distribution because the utility is nonlinear, while for          point, the correlation coefficient between NWF and PSF
other regions it does not.                                        was around 0.65 and the same growth and measurement
                                                                  error residuals were used for both the POMDP-RTI and
The contact hours with the instructor drive the cost of           current-time only-RTI models.
each block. Cost is high for more intensive instruction in
Tier 2, and, without loss of generality, it is zero for Tier 1,   The proficiency growth model and evidence model
as all students receive Tier 1 instruction. The cost              parameters were estimated from the simulated data
function consists of three components: the frequency with         through Markov Chain Monte Carlo (MCMC) simulation
which the group meets, fa, the duration of the meeting            using JAGS (Plummer, 2003). Four independent Markov
time, da, and size of the group, ga (Almond & Tokac,              chains with random starting positions were used with
2014). Then                                                       500000 iterations. This is consistent with standard
                                                                  practice (Gelman, Carlin, Stern & Rubin, 2004; Neal,
           c(a) = k fa da/ga ,                   (3)


2010). Tokac (2016) describes tests done for                                         Number of Non-Matching Students
convergence and parameter recovery with this model.                                              Time 3
                                                                         Tiers      POMDP - RTI     Current - Time RTI
3.    RESULTS                                                             1-1           49                  20
Data were simulated for students under two different                      1-2           38                  36
policies, (1) current-time only-RTI policy where students                 2-1           42                  40
are assigned to Tier 1 or Tier 2 based on cut scores on                   2-2           22                  55
the PSF and NWF tests at the most recent time point, and
(2) a POMDP-RTI policy where each student is assigned               Thus, there is a fair bit of difference in the placement, but
to the tier that maximizes the expected utility for that            which placement is better? As this is a simulation
student. This resulted in two different simulated series:           study, the true abilities are known, so it should be possible
Řnm was the true reading ability under the current-time             to determine an ideal placement based on the known
only cut score policy and R̂nm was the true reading ability          simulated abilities. However, the abilities, Řnm and R̂nm,
under the POMDP-RTI policy. Note that the two                       are different in the two branches of the assessment
simulations used the same residuals in equation (1)                 (because a different policy was actually employed).
(growth residual ηnm) and equation (2) (measurement error           Therefore, the ideal placements will be different under
ϵnmi). Thus, they differed only by the value of the growth          each policy.
rate parameter, γa(n,m), used in equation (1).
                                                                    In determining the ideal placement, the two mixed
                                                                    assignments, 1-2 and 2-1, were combined into a single
Table 3: Comparison of the number of PSF and NWF                    mixed tier. Cut scores on the latent ability variable were
scores between tiers categorized by cut scores or POMDP             calculated based on the utilities in equations (3) and (4)
estimates                                                           and a single growth step after the last measurement: the
                                                                    students with abilities higher than 0.1 should be placed
                                                                    into Tier 1, those lower than -0.4 into Tier 2 and students
Method      Tier         PSFt2    NWFt2      PSFt3    NWFt3         in between into the Mixed Tier. Both policies used the
POMDP       Tier 1       150      149        181      181           same cut points for determining the ideal placement, but
POMDP       Tier 2       150      151        119      119           because the abilities were different, the actual ideal
Cut Score   Tier 1       150      149        150      150           placement could be different for the same student under
Cut Score   Tier 2       150      151        150      150           the two policies at Time 3.
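
The ideal-placement rule described here (latent-ability cut points of 0.1 and
-0.4, with the 1-2 and 2-1 assignments merged into one mixed tier) can be
written as a short Python sketch. The function names are hypothetical, and
sending abilities that fall exactly on a cut point to the mixed tier is an
assumption, since the text only specifies "higher than" and "lower than".

```python
def ideal_placement(ability, tier1_cut=0.1, tier2_cut=-0.4):
    """Ideal tier from the known simulated ability (cut points from the text)."""
    if ability > tier1_cut:
        return "Tier 1"
    if ability < tier2_cut:
        return "Tier 2"
    return "Mix Tier"   # ASSUMPTION: boundary values count as mixed


def actual_placement(psf_tier, nwf_tier):
    """Collapse the four per-test assignments (1-1, 1-2, 2-1, 2-2) into three."""
    if psf_tier == nwf_tier:
        return "Tier 1" if psf_tier == 1 else "Tier 2"
    return "Mix Tier"
```

Comparing ideal_placement on each simulated ability with actual_placement on
the tiers a policy assigned gives the kind of cross-tabulation reported in
Tables 5 and 6.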
                                                                    Table 5 presents the number of students placed in each
                                                                    tier under the actual and ideal placements under both
Table 3 shows the pattern of Tier assignment under the
                                                                    policies. It also presents a measure of agreement which is
two models. At the second time point, the two policies
                                                                    the number of students assigned to that tier in the ideal
behave roughly the same, assigning the lowest-performing
                                                                    placement that were actually assigned to the Tier. The
50% of students to Tier 2. However, at the third time
                                                                    POMDP-RTI does well under that metric, with all of the
point, substantially fewer students are assigned to Tier 2
                                                                    students who should be placed into Tier 1 or 2 correctly
under the POMDP-RTI policy. This might be a result of
                                                                    placed in that tier. This policy only had problems with
better placement policies, or simply that the Tier 2
                                                                    the mixed tier, with 35% of the students being incorrectly
support is less needed in the latter part of the school year.
                                                                    placed in Tier 1 or Tier 2.
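
The agreement figures quoted in this section can be checked directly from the
counts in Tables 5 and 6. The Python sketch below does only that arithmetic;
the variable and function names are illustrative, and it does not compute the
lambda statistic.

```python
TIERS = ("Tier 1", "Mix Tier", "Tier 2")

# Rows: ideal placement; columns: actual placement (counts from Tables 5 and 6).
POMDP_RTI = [[118, 0, 0], [18, 90, 30], [0, 0, 44]]
CURRENT_TIME = [[72, 17, 0], [35, 58, 50], [0, 11, 57]]


def agreement_by_tier(table):
    """Share of students ideally in each tier who were actually placed there."""
    return {tier: table[i][i] / sum(table[i]) for i, tier in enumerate(TIERS)}


def raw_agreement(table):
    """Share of all students whose actual placement matches the ideal one."""
    correct = sum(table[i][i] for i in range(len(table)))
    return correct / sum(sum(row) for row in table)
```

With the Table 5 counts, mixed-tier agreement is 90/138 (about 65%), which
corresponds to the 35% misplacement reported for the POMDP-RTI policy, and
overall raw agreement is 252/300 versus 187/300 for the current-time policy.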

Table 4 breaks down the differences between the two                 The current-time only-RTI policy did not fare as well.
policies at time point 3. Recall that the students were             First, note that under the ideal placement for this policy
classified into Tiers independently based on the PSF and            fewer students would be in the high-performing Tier 1
NWF measures, resulting effectively in four different               group. This is likely due to incorrect assignment at
classifications: 1-1 (both in Tier 1), 1-2, 2-1 (mixed), and        Time 2. Next, note that agreement rates are lower. So the
2-2 (both Tier 2). Table 4 shows the number of students             POMDP-RTI model did better on two important metrics.
who were classified into one of the four groups who were            To summarize the agreement numbers, we used Goodman
classified into a different group by the other policy.              and Kruskal’s lambda (Almond, Mislevy, Steinberg,
Slightly over half (151) of the students were assigned different    Yan, and Williamson, 2015). Usually, this adjusts the
instruction under the different policies.                           raw agreement rate by subtracting out the agreement with
                                                                    a classifier which simply classifies everybody at the
Table 4: Comparison of POMDP-RTI and Current-Time                   modal category (which would be the mixed tier for both
only-RTI models                                                     policies). However, Tier 1 has a special meaning in the
                                                                    context of RTI; Tier 1 is the normal whole-class
                                                                    instruction that is given regardless of the test score.


Table 5. Agreement between ideal and actual placement                     may have been influenced by the use of the same utility
under POMDP-RTI.                                                          model used in the POMDP to define ideal placement.
                                                                          The cut-score approach currently in common use does
                                    POMDP-RTI Placement                   have one clear advantage over the POMDP model: it is
 Ideal Placement     Tier 1   Mix Tier   Tier 2   Total                   simpler to implement and explain. However, if the
 Tier 1                 118          0        0     118                   POMDP recommendations were integrated into an
 Mix Tier                18         90       30     138                   electronic gradebook, it might be better received by
 Tier 2                   0          0       44      44                   teachers. While teachers may not feel the need
 Total                  136         90       74     300                   for the POMDP software to address the Tier 1/Tier 2
                                                                          placement, there is another aspect of the RTI framework
                                                                          which was not addressed in this study. During Tier 2,
Table 6. Agreement between ideal and actual placement                     students receive regular progress monitoring assessments,
under current-time only RTI.                                              and the teacher is supposed to be making fine-grained
                                                                          adjustments if the student is not responding to the
                    Current-Time only-RTI Placement                       intervention (hence the name response-to-intervention).
 Ideal Placement    Tier 1  Mix Tier  Tier 2   Total                      In particular, the teachers can adjust the intensity of the
 Tier 1                 72        17       0      89                      intervention (equation 3), adding more time on task if
 Mix Tier               35        58      50     143                      needed, or using less support if the student appears to
 Tier 2                  0        11      57      68                      do well. This is a target of opportunity for the POMDP
 Total                 107        86     107     300                      model, as teachers have responded favorably to the idea
                                                                          of computer support to help them with tracking and
                                                                          intervention adjustment for Tier 2 students. 1 The present
Therefore, by using Tier 1 as the baseline in lambda, the                 work shows that POMDPs are a promising approach to
result is a statistic that describes how much better the RTI              this problem.
policy performs than undifferentiated whole-class
instruction. Let k_i be the number of students correctly                  Another limitation of the current work is that it assumes
classified into Tier i, and let k_{Tier1} be the number of                all students grow at the same rate under each of the
students who should ideally be assigned to Tier 1. Then                   instructional conditions (i.e., given the tier placement).
                                                                          In practice, many studies looking at RTI have found that
    \lambda = \frac{\sum_i k_i - k_{Tier1}}{N - k_{Tier1}}                students grow at different rates, with a low growth rate
where N is the total number of students.                                  often corresponding to low initial ability. 2 While this
                                                                          adds complexity to the model, we think that the POMDP
Like a correlation coefficient, the value of lambda ranges                framework will help educators make optimal policy
between -1 and 1, with 0 representing a classifier which                  decisions with this additional information.
does no better than simply assigning everybody to the
baseline category (Tier 1 here, in place of the modal
category). A value of 1 means that the policy did a                       Acknowledgements
perfect job of assigning students to the ideal tier. Using
the data in Tables 5 and 6, λ = 0.74 for POMDP-RTI and                    We would like to thank the Florida Center for Reading
λ = 0.51 for current-time only RTI. So RTI does better                    Research for allowing us access to the data used in this
than undifferentiated instruction, but the POMDP-RTI                      paper. The data were originally collected as part of a
policy also does better than the current-time only-RTI.
                                                                          larger National Institute of Child Health and Human
                                                                          Development Early Child Care Research Network study.
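
As a minimal sketch of the lambda calculation described above, the
following Python fragment computes the Tier 1 baseline agreement
statistic from a placement table whose rows are ideal tiers and whose
columns are assigned tiers. The function name tier1_lambda and the
list-of-lists layout are illustrative assumptions, not part of the
original analysis; the counts are copied from Tables 5 and 6.

```python
def tier1_lambda(table, baseline_row=0):
    """Agreement with ideal placement, relative to a fixed baseline tier.

    table[i][j] is the number of students whose ideal tier is i and
    whose assigned tier is j (hypothetical layout chosen for this sketch).
    """
    n_total = sum(sum(row) for row in table)
    # Students placed in their ideal tier: the diagonal of the table.
    n_correct = sum(table[i][i] for i in range(len(table)))
    # Students who should ideally be in the baseline tier (Tier 1).
    n_baseline = sum(table[baseline_row])
    return (n_correct - n_baseline) / (n_total - n_baseline)


# Rows: ideal Tier 1, Mix Tier, Tier 2. Columns: assigned tier.
table5 = [[118, 0, 0], [18, 90, 30], [0, 0, 44]]    # POMDP-RTI
table6 = [[72, 17, 0], [35, 58, 50], [0, 11, 57]]   # current-time only RTI

print(round(tier1_lambda(table5), 2))
print(round(tier1_lambda(table6), 2))
```

For the Table 5 counts this gives (252 - 118) / (300 - 118), or about
0.74, matching the reported value. Applied to the Table 6 counts as
printed here, the same formula gives about 0.46 rather than the
reported 0.51, so the published figure may rest on a detail of the
calculation that this sketch does not capture.
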
4.                  CONCLUSION
As expected, a policy produced by a POMDP (which is                       References
designed to produce optimal policies) performed better
than the current-time only cut-score policy currently used in             Almond, R. G. (2007). Cognitive modeling to represent
many RTI implementations. In particular, the POMDP-                              growth (learning) using Markov decision
RTI had a better agreement with the ideal placement (λ =                         processes. Technology, Instruction, Cognition
0.74) than the current-time only model did (λ = 0.51).                           and Learning (TICL), 5, 313-324. Retrieved
The likely reason for the better performance is that the
POMDP model is better able to use the entire student
record, both the history of assessments and instruction                   1 Joe Nese, U. Oregon, private communication. May 16, 2016.
and multiple tests taken at the same time, to build a more                2 Young-Suk Kim, Florida State University. Private communication.
accurate estimate of student proficiency, although some                     March 31, 2016.
of this advantage


       from http://www.oldcitypublishing.com/TICL/TICL.html
Almond, R. G. (2011). Estimating Parameters of Periodic Assessment Models
       (Report No. RM-11-06). Educational Testing Service. Retrieved from
       http://www.ets.org/research/policy_research_reports/rm-11-06.pdf
Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M.
       (2015). Bayesian Networks in Educational Assessment. Springer.
Almond, R., Goldin, I., Guo, Y., & Wang, N. (2014). Vertical and Stationary
       Scales for Progress Maps. In J. Stamper, Z. Pardos, M. Mavrikis, &
       B. M. McLaren (Eds.), Proceedings of the 7th International Conference
       on Educational Data Mining, London, England. Society for Educational
       Data Mining. 169–176. Retrieved from
       http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/169_EDM-2014-Full.pdf
Almond, R. G., Tokac, U., & Al Otaiba, S. (2012). Using POMDPs to Forecast
       Kindergarten Students' Reading Comprehension. In Agosta, J. M.,
       Nicholson, A., & Flores, M. J. (Eds.), The 9th Bayesian Modeling
       Application Workshop at UAI 2012. Catalina Island, CA. Retrieved from
       http://www.abnms.org/uai2012-apps-workshop/papers/AlmondEtal.pdf
Almond, R. G., & Tokac, U. (2014, November). Using Decision Theory to
       Allocate Educational Resources. Paper presented at Annual Meeting,
       Florida Educational Research Association, Cocoa Beach, FL.
Almond, R. G., Yan, D., & Hemat, L. A. (2008). Parameter Recovery Studies
       with a Diagnostic Bayesian Network Model. Behaviormetrika, 35(2),
       159–185.
Al Otaiba, S., Folsom, J. S., Schatschneider, C., Wanzek, J., Greulich, L.,
       Meadows, J., & Li, Z. (2011). Predicting first grade reading
       performance from kindergarten response to instruction. Exceptional
       Children, 77(4), 453–470.
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning:
       Structural assumptions and computational leverage. Journal of
       Artificial Intelligence Research, 11, 1–94. Available from
       citeseer.ist.psu.edu/boutilier99decisiontheoretic.html
Catts, H. W., Hogan, T. P., & Fey, M. E. (2003). Subgrouping poor readers on
       the basis of individual differences in reading-related abilities.
       Journal of Learning Disabilities, 36, 151–164.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian
       Data Analysis. Boca Raton, FL: Chapman and Hall.
Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading
       disability. Remedial and Special Education, 7, 6–10.
Greenwood, C. R., Bradfield, T., Kaminski, R., Linas, M., Carta, J. J., &
       Nylander, D. (2011). The Response to Intervention (RTI) Approach in
       Early Childhood. Focus on Exceptional Children, 43(9), 1–24.
Landerl, K., & Wimmer, H. (2008). Development of word reading fluency and
       spelling in a consistent orthography: An 8-year follow-up. Journal of
       Educational Psychology, 100(1), 150–161.
Mastropieri, M. A., Scruggs, T. E., & Graetz, J. E. (2003). Reading
       comprehension instruction for secondary students: Challenges for
       struggling students and teachers. Learning Disability Quarterly,
       26(4), 103–116.
Matthews, E. (2015). Analysis of an Early Intervention Reading Program for
       First Grade Students. Retrieved from
       http://scholarworks.waldenu.edu/cgi/viewcontent.cgi?article=1395&context=dissertations
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure
       of educational assessment (with discussion). Measurement:
       Interdisciplinary Research and Perspectives, 1(1), 3–62.
Neal, R. M. (2010). MCMC using Hamiltonian dynamics. In S. Brooks,
       A. Gelman, G. L. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain
       Monte Carlo (pp. 113–162). Chapman & Hall / CRC Press.
Nese, J. F. T., Lai, C., Anderson, D., Jamgochian, M. E., Kamata, A.,
       Saez, L., Park, J. B., Alonzo, J., & Tindal, G. (2010). Technical
       Adequacy of the easyCBM® Mathematics Measures: Grades 3-8, 2009-2010
       Version (Technical Report No. 1007). Eugene, OR: Behavioral Research
       and Teaching, University of Oregon.
Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical
       models using Gibbs sampling. Proceedings of the 3rd International
       Workshop on Distributed Statistical Computing, Vienna, Austria.
R Development Core Team. (2014). R: A language and environment for
       statistical computing. Vienna, Austria: R Foundation for Statistical
       Computing. Retrieved from http://www.R-project.org
Rafferty, A. N., Brunskill, E. B., Griffiths, T. L., & Shafto, P. (2011).
       Faster teaching by POMDP planning. Proceedings of the 15th
       International Conference on Artificial Intelligence in Education
       (AIED2011). Auckland, New Zealand.
Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence
       diagnostics and generic Metropolis algorithms. In W. R. Gilks,
       D. J. Spiegelhalter, & S. Richardson (Eds.), Practical Markov Chain
       Monte Carlo. London: Chapman and Hall.
Rock, D. A. (2007). Growth in reading performance during the first four
       years in school (Report No. RR-07-39). Princeton, NJ: Educational
       Testing Service.
Ross, S. M. (1983). Introduction to stochastic dynamic programming. London:
       Academic Press.
Ross, S. M. (2000). Introduction to Probability Models. London: Academic
       Press.
Tierney, L. (1994). Markov chains for exploring posterior distributions
       (with discussion). Annals of Statistics, 22, 1701–1762.
Tokac, U. (2016). Using partially observed Markov decision processes
       (POMDPs) to implement a response-to-intervention (RTI) framework for
       early reading. Doctoral dissertation. Florida State University.
Torgesen, J. K. (2004). Avoiding the devastating downward spiral: The
       evidence that early intervention prevents reading failure. American
       Educator, 28, 6–19. Reprinted in the 56th Annual Commemorative Booklet
       of the International Dyslexia Association, November, 2005.



</pre>