
The Efficacy of the POMDP-RTI Approach for Early Reading Intervention

Umit Tokac

ut08@my.fsu.edu

Russell G. Almond

ralmond@fsu.edu

Educational Psychology and Learning Systems, Florida State University, Tallahassee, FL 32306, USA

2016

A partially observed Markov decision process (POMDP) is a tool for planning: selecting a policy that will lead to an optimal outcome. Response to intervention (RTI) is an approach to instruction in which teachers craft individual plans for students based on the results of progress monitoring tests. Current practice assigns students to tiers of instruction at each time point based on cut scores on the most recent test. This paper explores whether a tier assignment policy determined by a POMDP model in an RTI setting offers advantages over the current practice. Simulated data sets were used to compare the two approaches; the model had a single latent reading construct and two observed reading measures: Phoneme Segmentation Fluency (PSF) for phonological awareness and Nonsense Word Fluency (NWF) for phonological decoding. The two simulation studies compared how the students were placed into instructional groups using the two approaches, POMDP-RTI and RTI. This paper explored the efficacy of using a POMDP to select and apply appropriate instruction.

INTRODUCTION

Statistics gathered by local school districts indicate that roughly 30% of their first-grade students read below grade-level standards (Matthews, 2015). Moreover, Landerl and Wimmer (2008) reported that 70% of struggling readers in first grade continued to struggle in eighth grade when no intervention was provided. Mastropieri, Scruggs, and Graetz (2003) argued that reading is the main problem for most students with learning disabilities.

Torgesen (2004) asserts that reading consists of five components: phonological awareness, phonological decoding, fluency, vocabulary, and reading comprehension. According to the Simple View Theory of Reading Development (Gough & Tunmer, 1986), for young children, mastery of the first two components, phonological decoding and phonological awareness, generates the remaining three reading components: fluency, vocabulary, and reading comprehension. A lack of either phonological decoding or phonological awareness affects the other components and causes reading difficulties. Because the development of reading skills is critical, instructors should identify children with reading difficulties and provide additional instructional support (Catts, Hogan & Fey, 2003).

Response to intervention (RTI) is an educational framework designed to identify students with difficulties in reading and math, and to intervene as early as possible by providing more intensive instruction for students who need it. The RTI approach divides instruction into Tiers; each tier includes a different intervention or form of instruction. The RTI process starts with screening tests, which monitor the general knowledge and skills of all students in the class. The screening tests are administered on multiple occasions during a school year. The screening test results provide teachers with a rough estimate of each student’s proficiency, which guides the assignment of students to appropriate tiers of instruction. RTI has produced good results in both research and operational settings, and hence is considered to be one of the evidence-based practices for improving reading and preventing learning disabilities (Greenwood et al., 2011).

Ideally, the placement of students into Tiers in an RTI program would be based on their true proficiency. As this is unobservable, the placement decision is instead made on the basis of the estimates of proficiency from screening tests. In current practice this is often implemented through a cut score on the most recent screening test. Naturally, a certain amount of measurement error causes some students to be placed incorrectly. Taking the entire history into account (both the students’ previous screening test results and changes in instruction) should improve the accuracy of the proficiency estimates. Almond (2007) suggested that this could be done using a partially observed Markov decision process (POMDP): partially observed, because the true student proficiency is latent; a decision process, because the instructors decide what instruction or intervention to use between measurement occasions.

A POMDP is a probabilistic and sequential model. A POMDP can be in one of a number of distinct states at any point in time, and its state changes over time in response to events (Boutilier, Dean & Hanks, 1999). One noteworthy difference between an RTI approach and a POMDP model is that most RTI approaches use only the latest test results to identify students’ proficiencies and assign them to the appropriate tier (Nese et al., 2010). We call this approach the current-time only-RTI model. A POMDP-RTI model, on the other hand, embeds the periodically applied screening tests and the RTI tier decisions in a POMDP model. Additionally, a POMDP considers the students’ entire histories (both actions and test scores) when determining appropriate interventions, in order to identify their current abilities and forecast their future abilities under competing policies. Therefore, a POMDP-RTI model should perform better than the current-time only-RTI model.

To test the last assertion, this paper compares the POMDP-RTI model with the current-time only-RTI, evaluating the predictive accuracy of each model, the quality of the instructional plans produced and the reading levels achieved at the end of the year. It does this through simulation studies based on numbers obtained from fitting the POMDP model to a group of kindergarten students in an earlier RTI study (Al Otaiba, Connor, Folsom, Greulich, Meadows, & Li, 2011) .

METHOD

Two simulated datasets were used to address how accurately students are assigned to each tier based on their latent reading score in the POMDP-RTI model, compared with assignment based on their observed scores in the current-time only-RTI model. The initial values of the parameters were based on a longitudinal Florida Center for Reading Research (FCRR) study of reading proficiency (Al Otaiba et al., 2011), and data sets were simulated based on the Almond (2007) model in order to produce realistic data for answering the research question posed above. The parameters of the simulation were chosen so that the distributions of scores on the screening tests were similar to those of the Al Otaiba et al. study at both the initial and final measurement periods.

2.1 THE POMDP-RTI FRAMEWORK

Almond (2007) describes a general mapping of a POMDP into an educational setting. It is assumed that the student’s proficiency is measured on a number of occasions. The latent proficiencies of the students are the hidden layer of the POMDP model. The actual test scores are the observable outcomes, and the instructional options for the teacher between measurement occasions are the action space. The utility is assumed to be an increasing function of the latent proficiency variable at the last measurement occasion; thus, it is a finite time horizon model.

The latent proficiency variable at Measurement Occasion m is Rm, and the observable multivariate outcome variables are PSFm and NWFm on that occasion. Extending the evidence-centered design (ECD) terminology, Almond (2007) calls the model for the Rm's the proficiency growth model. Following the normal logic of POMDPs, this is expressed in two parts. The first is the initial proficiency model, which gives the population distribution for proficiency at the first measurement occasion. The second is an action model, which gives a probability distribution for change in proficiency over time that depends on the instructional activity chosen between measurement occasions.

There are two notable differences between the POMDP models used in this application and those commonly seen in the literature. First, the models have a fixed and finite time horizon, with the reward occurring only at the last time step (although the actions at each step have a cost which is subtracted from the reward). This removes the need for the usual discounting of future rewards. The second is that the Markov process is non-stationary (it is hoped that the student’s abilities will improve over time). This produces a potential identifiability issue, as growth is difficult to distinguish from difficulty shifts in the measurement instruments (Almond, Tokac & Al Otaiba, 2012). Assuming that the screening tests have all been equated, and hence are on the same scale, takes care of the identification issue. An alternative approach would be to subtract the expected growth from the model, making the latent proficiency variable represent deviations from the expected growth model (Almond et al., 2014).

2.1.1 Proficiency Growth Model

The model from which the data were simulated was a unidimensional model of reading with a single latent, continuous variable: Rnm, the reading ability of individual n on measurement occasion m. In this case, N was 300 students and M represented the three equally spaced time points, t1, t2, t3. (RTI screening tests are typically given 3 times per year.) This study assumed that a teacher provided general instruction to all the students until the first time point, t1, and that the initial ability distribution was normal, R0 ~ N(0,1). As this is a purely latent variable, the scale and location are arbitrary. Fixing the initial population to have a standard normal distribution establishes the scale. After analyzing the results of assessments administered at t1, the teacher delivered additional and more intensive instruction to students who were assigned to Tier 2, but delivered only general instruction to students in Tier 1. The tier to which student n is assigned at time m is represented by a(n,m). The growth rate for the students is assumed to depend on the tier assignment. Thus, for measurement occasion m > 1,

$$R_{nm} = R_{n(m-1)} + \gamma_{a(n,m)} \Delta T_m + \eta_{nm}, \qquad (1)$$

where $\eta_{nm} \sim N(0, \sigma_{a(n,m)} \sqrt{\Delta T_m})$, and where $\Delta T_m$ represents the elapsed time between measurement occasions m and m-1 for Tier 1 and Tier 2. In this study, each school year was equal to 1, and $\Delta T_m$ was fixed and equal to 1/M (e.g., M = 3, so $\Delta T_m = 1/3$). The parameter $\gamma_{a(n,m)}$ is a tier-specific growth rate; it was fixed, with a different value for each tier: $\gamma_{a(n,m)} = 0.9$ for Tier 1 and $\gamma_{a(n,m)} = 1.2$ for Tier 2. The residual standard deviation, $\sigma_{a(n,m)} \sqrt{\Delta T_m}$, depends on both a tier-specific rate, $\sigma_{a(n,m)}$, and the length of time, $\Delta T_m$, between measurements (thus, growth is occurring via a non-stationary Brownian motion process). The standard deviation of the growth per unit time, $\sigma_{a(n,m)}$, was fixed to 1 for both tiers.

2.1.2 Evidence Model

The evidence model involved two independent regressions, one for each observed variable i. These two observable variables were chosen because they are critical reading components for later reading performance in the first two years of elementary school (Rock, 2007) . Let Ynmi be the observation for individual n at measurement occasion m on observed variable i of the proficiency variables, then:

$$R_{n0} \sim N(0,1), \qquad Y_{nmi} = a_i + b_i R_{nm} + \epsilon_{nmi}, \qquad \epsilon_{nmi} \sim N(0, \omega_i). \qquad (2)$$

The reliability of the instruments can be used to determine b and ω. The reliability of an observed variable i at any time point was represented as $r_i$. In classical test theory, the reliability is the squared correlation coefficient between the true score and the observed score of the student. This definition translates into the equation $r_i = 1 - \mathrm{Var}_n(\epsilon_{nmi}) / \mathrm{Var}_n(Y_{nmi})$, where $\mathrm{Var}_n(\cdot)$ indicates that the variance is taken over individuals (with measurement occasion and instrument held constant). Solving this definition for the two parameters gives

$$b_i = \sqrt{r_i \, \frac{\mathrm{Var}_n(Y_{nmi})}{\mathrm{Var}_n(R_{nm})}} \qquad \text{and} \qquad \omega_i = \sqrt{(1 - r_i) \, \mathrm{Var}_n(Y_{nmi})}.$$

In order to make $r_i = .45$ at each time point, tm, for the measurement of each skill on observed variable i, $b_i = .98$ and $\omega_i = .65$ were used at tm. These numbers are comparable to reading measures commonly used with 1st grade students. At this point, the model is very close to the model described in Almond, Tokac and Al Otaiba (2012), except that the previous work assumed all students were in the same Tier. Appropriate values for a and b depend on the scale of the instruments chosen. The values used in the simulation were chosen so that the mean and standard deviation of the simulated data matched the data set from Al Otaiba et al. (2011) at the first and last time points.

2.1.3 Decision Rules

The key research question compares the performance of the system under two different policies. The first is a fixed decision rule implicit in the current-time RTI policy: Students who are below a cut-score on either of the two screening tests are placed into Tier 2 instruction. The second policy is the optimal policy found by solving the POMDP. Implementing this policy requires an explicit specification of the utility function and the cost function for the instructional options.

Many RTI implementations use a reference score (the general class median score or some other percentile rank) as a cut score for assigning each student to either the Tier 1 or the Tier 2 group. The simulated model used a separate Tier 2 for each of the two screening tests (NWF and PSF), giving four possible Tier assignments. For instance, if a student’s score on the NWF test is lower than the cut score for NWF but the student’s PSF score is higher than the cut score for PSF, the student is assigned to Tier 2 for NWF and Tier 1 for PSF. (This differs slightly from the common practice, which would put students who fail to meet the cut on either measure into a single Tier 2.)

The POMDP forecasts expected learning under each possible outcome and assigns students to tiers in a way that balances the expected learning gains with the cost of instruction. The utility function is the expected gain at the last time point and the cost function is the sum of costs of applied instruction at each state. The benefit is always higher for Tier 2, as is the cost. Because the utility is nonlinear, however, the cost exceeds the utility of the benefit in some regions of the distribution but not in others.
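The cut-score rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the default of using the class median as the cut score, and the example scores are hypothetical and are not taken from the study.

```python
from statistics import median


def assign_tiers_by_cut_score(psf_scores, nwf_scores, psf_cut=None, nwf_cut=None):
    """Current-time only rule: compare each latest score with its cut score.

    Returns one (psf_tier, nwf_tier) pair per student, where 2 means the
    score fell below the cut score and 1 means it did not.
    """
    # Assumption: when no cut score is supplied, use the class median.
    if psf_cut is None:
        psf_cut = median(psf_scores)
    if nwf_cut is None:
        nwf_cut = median(nwf_scores)
    return [
        (2 if psf < psf_cut else 1, 2 if nwf < nwf_cut else 1)
        for psf, nwf in zip(psf_scores, nwf_scores)
    ]


# Hypothetical scores for four students (not data from the study).
psf = [30.0, 42.0, 18.0, 55.0]
nwf = [25.0, 12.0, 10.0, 40.0]
tiers = assign_tiers_by_cut_score(psf, nwf)
```

Each pair corresponds to one of the four possible assignments (1-1, 1-2, 2-1, 2-2); only the most recent scores enter the decision.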

The contact hours with the instructor drive the cost of each block. Cost is high for more intensive instruction in Tier 2, and, without loss of generality, it is zero for Tier 1, as all students receive Tier 1 instruction. The cost function consists of three components: the frequency with which the group meets, fa, the duration of the meeting time, da, and the size of the group, ga (Almond & Tokac, 2014). Then

$$c(a) = k \frac{f_a d_a}{g_a} \qquad (3)$$

represents the model cost of taking action or activity a in state s, where k is a constant used to put the cost function on the same scale as the utility function. In this study, the cost value was fixed at c(Tier 2) = 0.1 and c(Tier 1) = 0. The utility function is

$$u(R_M) = \mathrm{logit}^{-1}(\alpha (R_M - \beta)). \qquad (4)$$

In this equation α and β are fixed parameters; β is a proficiency target, which is on the scale of the internal latent variable RM. Specifically, β = 0.5 for Tier 1 and β = 0.1 for Tier 2. Also, α is a slope parameter, and α = 0.8 for both Tier 1 and Tier 2. High values of α favor bringing students who are near the proficiency standard above the proficiency target β, while low values of α give more weight to enriching students at the high end of the scale and providing remediation at the low end of the scale (Almond & Tokac, 2014). (Almond & Tokac alternatively recommend using a probit function in place of a logit, so that α becomes effectively a standard deviation; however, as the shapes of the logit and probit curves are so similar, we expect the results using a probit curve would be similar as well.)

In this case, the total reward is $u(R_M) - c(a(s,2)) - c(a(s,3))$. The difference between the utility function and the cost function is the total reward for getting the student to proficiency level Tier 1 using instruction a(s,2) and a(s,3) between measurements 1 and 2, and 2 and 3. The reward is the basis for the assignment of each student to Tier 1 or Tier 2. The POMDP model forecasts the expected reward, and balances that with cost during each period.
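As a minimal sketch, the cost in equation (3), the utility in equation (4), and the total reward can be written as plain Python functions. The function names and the example arguments are hypothetical; the values of α, β, and the per-period tier costs are the ones stated above. Because the text gives a different β for each tier, the sketch leaves β as an explicit argument rather than guessing which value applies to a given student.

```python
import math

ALPHA = 0.8                  # slope parameter alpha (from the text)
TIER_COST = {1: 0.0, 2: 0.1}  # fixed cost per period by tier (from the text)


def instruction_cost(k, frequency, duration, group_size):
    """Equation (3): c(a) = k * f_a * d_a / g_a."""
    return k * frequency * duration / group_size


def utility(r_final, beta, alpha=ALPHA):
    """Equation (4): inverse logit of alpha * (R_M - beta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (r_final - beta)))


def total_reward(r_final, tiers, beta):
    """u(R_M) minus the cost of the tier used in each instructional period.

    `tiers` holds the tier used between occasions 1 and 2, and 2 and 3.
    """
    return utility(r_final, beta) - sum(TIER_COST[t] for t in tiers)


# Hypothetical example: final ability 0.5, Tier 2 then Tier 1, beta = 0.5.
example_reward = total_reward(0.5, (2, 1), beta=0.5)
```

When the final ability equals the target β the utility is exactly one half, so the example reward is the utility of 0.5 minus the single Tier 2 charge of 0.1.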

2.2 SIMULATION DESIGN

The initial distribution of simulated students at time 0 was based on the FCRR data set (Al Otaiba, 2007). In the FCRR data, the correlation between NWF and PSF was .65. The simulation generated latent proficiency variables for each simulee and simulated scores on the NWF and PSF tests administered at t1, t2 and t3 in the model. At each time point, the correlation coefficient between NWF and PSF was around 0.65, and the same growth and measurement error residuals were used for both the POMDP-RTI and current-time only-RTI models.
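A minimal sketch of such a simulation, following the growth model in equation (1) and the evidence model in equation (2), is shown below. It uses only the Python standard library. The growth rates, σ, b, and ω are the values stated in the text; the intercept of 0, the seed, the function names, and the fixed tier sequence are assumptions made for illustration, and this is not the code used in the study.

```python
import math
import random

GROWTH_RATE = {1: 0.9, 2: 1.2}  # gamma by tier (from the text)
SIGMA = 1.0                      # growth SD per unit time, both tiers
DELTA_T = 1.0 / 3.0              # three measurement occasions per school year
SLOPE_B = 0.98                   # b_i (from the text)
RESID_SD = 0.65                  # omega_i (from the text)
INTERCEPT_A = 0.0                # assumption: a_i is not reported in the text


def grow(r_prev, tier, rng):
    """Equation (1): one growth step between measurement occasions."""
    eta = rng.gauss(0.0, SIGMA * math.sqrt(DELTA_T))
    return r_prev + GROWTH_RATE[tier] * DELTA_T + eta


def observe(r, rng):
    """Equation (2): one noisy screening score for latent ability r."""
    return INTERCEPT_A + SLOPE_B * r + rng.gauss(0.0, RESID_SD)


def simulate_student(tiers, rng):
    """Simulate latent abilities and (PSF, NWF) scores at t1, t2, t3.

    `tiers` gives the tier used before occasions 2 and 3, e.g. (1, 2).
    """
    r = rng.gauss(0.0, 1.0)  # initial ability drawn from N(0, 1)
    abilities = [r]
    scores = [(observe(r, rng), observe(r, rng))]
    for tier in tiers:
        r = grow(r, tier, rng)
        abilities.append(r)
        scores.append((observe(r, rng), observe(r, rng)))
    return abilities, scores


rng = random.Random(2016)  # arbitrary seed for reproducibility
cohort = [simulate_student((1, 1), rng) for _ in range(300)]
```

In a full comparison the tier sequence would come from the policy being evaluated instead of being fixed, and the same residual draws would be reused under both policies, as described above.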

The proficiency growth model and evidence model parameters were estimated from the simulated data through Markov Chain Monte Carlo (MCMC) simulation using JAGS (Plummer, 2003). Four independent Markov chains with random starting positions were used with 500,000 iterations. This is consistent with standard practice (Gelman, Carlin, Stern & Rubin, 2004; Neal, 2010). Tokac (2016) describes tests done for convergence and parameter recovery with this model.
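Once the growth and evidence parameters are fixed, a belief about one student's latent proficiency can be carried through the whole history of tier assignments and screening scores. Because equations (1) and (2) are linear with Gaussian noise, one simple way to illustrate this is a scalar Kalman filter. This is a swapped-in illustration, not the JAGS/MCMC procedure used in the study; the intercept of 0, the function names, and the example scores are assumptions.

```python
GROWTH_RATE = {1: 0.9, 2: 1.2}  # gamma by tier, from equation (1)
SIGMA = 1.0                      # growth SD per unit time
DELTA_T = 1.0 / 3.0              # time between measurement occasions
SLOPE_B = 0.98                   # b_i from the evidence model
RESID_SD = 0.65                  # omega_i from the evidence model
INTERCEPT_A = 0.0                # assumption: a_i is not reported in the text


def predict(mean, var, tier):
    """Push the belief through one growth step of equation (1)."""
    return mean + GROWTH_RATE[tier] * DELTA_T, var + SIGMA ** 2 * DELTA_T


def update(mean, var, score):
    """Condition the belief on one screening score from equation (2)."""
    gain = var * SLOPE_B / (SLOPE_B ** 2 * var + RESID_SD ** 2)
    new_mean = mean + gain * (score - INTERCEPT_A - SLOPE_B * mean)
    new_var = (1.0 - gain * SLOPE_B) * var
    return new_mean, new_var


def filter_history(history, prior_mean=0.0, prior_var=1.0):
    """Track the belief over a list of (tier, psf, nwf) records.

    `tier` is the instruction received before that occasion, or None for
    the first occasion, which has no preceding growth step.
    """
    mean, var = prior_mean, prior_var
    for tier, psf, nwf in history:
        if tier is not None:
            mean, var = predict(mean, var, tier)
        for score in (psf, nwf):
            mean, var = update(mean, var, score)
    return mean, var


# Hypothetical record for one student (scores on the latent scale).
history = [(None, -0.5, -0.2), (2, 0.1, 0.3), (2, 0.6, 0.4)]
final_mean, final_var = filter_history(history)
```

A current-time only rule would look at the last pair of scores alone; the filtered belief also reflects the earlier scores and the instruction received in between.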

RESULTS

Data were simulated for students under two different policies: (1) a current-time only-RTI policy, where students are assigned to Tier 1 or Tier 2 based on cut scores on the PSF and NWF tests at the most recent time point, and (2) a POMDP-RTI policy, where each student is assigned to the tier that maximizes the expected utility for that student. This resulted in two different simulated series: $\check{R}$ was the true reading ability under the current-time only cut score policy and $\hat{R}$ was the true reading ability under the POMDP-RTI policy. Note that the two simulations used the same residuals in equation (1) (growth residual ηnm) and equation (2) (measurement error ϵnmi). Thus, they differed only by the value of the growth rate parameter, γa(n,m), used in equation (1).

Table 3 shows the pattern of Tier assignment under the two models. At the second time point, the two policies behave roughly the same, assigning the lowest performing 50% of students to Tier 2. However, at the third time point, substantially fewer students are assigned to Tier 2 under the POMDP-RTI policy. This might be a result of better placement policies, or simply that the Tier 2 support is less needed in the latter part of the school year.

Table 4 breaks down the differences between the two policies at time point 3. Recall that the students were classified into Tiers independently based on the PSF and NWF measures, resulting effectively in four different classifications: 1-1 (both in Tier 1), 1-2, 2-1 (mixed), and 2-2 (both Tier 2). Table 4 shows, for each of the four groups, the number of students who were classified into a different group by the other policy. Slightly over half (151) of the students were assigned different instruction under the different policies. Thus, there is a fair bit of difference in the placement, but which placement is better?

As this is a simulation study, the true abilities are known, so it should be possible to determine an ideal placement based on the known simulated abilities. However, the abilities, $\check{R}$ and $\hat{R}$, are different in the two branches of the assessment (because a different policy was actually employed). Therefore, the ideal placements will be different under each policy.

In determining the ideal placement, the two mixed assignments, 1-2 and 2-1, were combined into a single mixed tier. Cut scores on the latent ability variable were calculated based on the cost and utility in equations (3) and (4) and a single growth step after the last measurement: students with abilities higher than 0.1 should be placed into Tier 1, those with abilities lower than -0.4 into Tier 2, and students in between into the Mixed Tier. Both policies used the same cut points for determining the ideal placement, but because the abilities were different, the ideal placement could be different for the same student under the two policies at Time 3.
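The ideal-placement rule above amounts to two thresholds on the known latent ability. A small Python sketch follows; the function name and the example abilities are hypothetical, and placing abilities that fall exactly on a cut point in the Mixed Tier is an assumption, since the text does not say how ties are handled.

```python
from collections import Counter


def ideal_tier(ability, upper_cut=0.1, lower_cut=-0.4):
    """Map a true latent ability to its ideal tier using the stated cuts."""
    if ability > upper_cut:
        return "Tier 1"
    if ability < lower_cut:
        return "Tier 2"
    return "Mixed"  # assumption: boundary values fall in the Mixed Tier


# Hypothetical latent abilities at Time 3 (not values from the study).
abilities = [0.8, -0.9, 0.0, 0.3, -0.4]
placement_counts = Counter(ideal_tier(r) for r in abilities)
```

Applying the same function to the abilities generated under each policy gives the two sets of ideal placements that the actual placements are compared against.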

Table 5 presents the number of students placed in each tier under the actual and ideal placements under both policies. It also presents a measure of agreement which is the number of students assigned to that tier in the ideal placement that were actually assigned to the Tier. The POMDP-RTI does well under that metric, with all of the students who should be placed into Tier 1 or 2 correctly placed in that tier. This policy only had problems with the mixed tier, with 35% of the students being incorrectly placed in Tier 1 or Tier 2.

The current-time only-RTI policy did not fare as well. First, note that under the ideal placement for this policy fewer students would be in the high-performing Tier 1 group. This is likely due to incorrect assignment at Time 2. Next, note that agreement rates are lower. So the POMDP-RTI model did better on two important metrics.

To summarize the agreement numbers, we used Goodman and Kruskal’s lambda (Almond, Mislevy, Steinberg, Yan, & Williamson, 2015). Usually, this adjusts the raw agreement rate by subtracting out the agreement with a classifier which simply classifies everybody at the modal category (which would be the mixed tier for both policies). However, Tier 1 has a special meaning in the context of RTI; Tier 1 is the normal whole-class instruction that is given regardless of the test score. Therefore, by using Tier 1 as the baseline in lambda, the result is a statistic that describes how much better the RTI is performing than undifferentiated whole-class instruction. Let ki be the number of students correctly classified into Tier i, let kTier1 be the number of students who should ideally be assigned to Tier 1, and let N be the total number of students. Then

$$\lambda = \frac{\sum_i k_i - k_{\mathrm{Tier1}}}{N - k_{\mathrm{Tier1}}}.$$

Like a correlation coefficient, the value of lambda ranges between -1 and 1, with 0 representing a classifier which does no better than simply assigning everybody to the baseline category (here, Tier 1). If it is 1, it means that the policy did a perfect job of assigning students to the ideal tier. Using the data in Table 5, λ = 0.74 for POMDP-RTI and λ = 0.51 for current-time only-RTI. So RTI does better than undifferentiated instruction, but the POMDP-RTI policy also does better than the current-time only-RTI. These results may have been influenced by the use of the same utility model used in the POMDP to define ideal placement.
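The Tier 1-baseline version of lambda described above is the number of correct placements beyond what an "everyone in Tier 1" rule would achieve, divided by the number of students that rule would misplace. A minimal Python sketch is given below; the function name and the counts are hypothetical, since Table 5 is not reproduced here.

```python
def tier1_baseline_lambda(correct_by_tier, ideal_tier1_count, n_students):
    """Lambda with Tier 1 as the baseline category.

    correct_by_tier: number of students correctly placed in each tier (k_i).
    ideal_tier1_count: number of students whose ideal placement is Tier 1.
    n_students: total number of students (N).
    """
    total_correct = sum(correct_by_tier.values())
    return (total_correct - ideal_tier1_count) / (n_students - ideal_tier1_count)


# Hypothetical counts for 300 students (not the values in Table 5).
correct = {"Tier 1": 80, "Mixed": 90, "Tier 2": 70}
lam = tier1_baseline_lambda(correct, ideal_tier1_count=100, n_students=300)
```

With these made-up counts, 240 students are placed correctly, the Tier 1-only baseline would get 100 right, and lambda is 140 / 200 = 0.7. A policy that places everyone in Tier 1 scores 0, and a perfect policy scores 1.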

CONCLUSION

As expected, a policy produced by a POMDP (which is designed to produce optimal policies) performed better than the current-time only cut-score policy currently used in many RTI implementations. In particular, the POMDP-RTI had better agreement with the ideal placement (λ = 0.74) than the current-time only model did (λ = 0.51). The likely reason for the better performance is that the POMDP model is better able to use the entire student record, both the history of assessments and instruction and multiple tests taken at the same time, to build a more accurate estimate of student proficiency, although some of the difference may have been influenced by the use of the same utility model used in the POMDP to define ideal placement.

The cut-score approach currently in common use does have one clear advantage over the POMDP model: it is simpler to implement and explain. However, if the POMDP recommendations were integrated into an electronic gradebook, they might be better received by teachers. Moreover, while teachers may not feel the need for POMDP software to address the Tier 1/Tier 2 placement, there is another aspect of the RTI framework which was not addressed in this study. During Tier 2, students receive regular progress monitoring assessments, and the teacher is supposed to be making fine-grained adjustments if the student is not responding to the intervention (hence the name response-to-intervention). In particular, the teachers can adjust the intensity of the intervention (equation 3), adding more time on task if needed, or using less support if the student appears to be doing well. This is a target of opportunity for the POMDP model, as teachers have responded favorably to the idea of computer support to help them with tracking and intervention adjustment for Tier 2 students.1 The present work shows that POMDPs are a promising approach to this problem.

Another limitation of the current work is that it assumes all students grow at the same rate under each of the instructional conditions (e.g., given the tier placement). In practice, many studies looking at RTI have found that students grow at different rates, with a low growth rate often corresponding to low initial ability.2 While this adds complexity to the model, we think that the POMDP framework will help educators make optimal policy decisions with this additional information.

Acknowledgements

We would like to thank the Florida Center for Reading Research for allowing us access to the data used in this paper. The data were originally collected as part of a larger National Institute of Child Health and Human Development Early Child Care Research Network study.

1 Joe Nese, U. Oregon, private communication. May 16, 2016.

2 Young-Suk Kim, Florida State University. Private communication. March 31, 2016.

References

Almond, R. G. (2007). Cognitive modeling to represent growth (learning) using Markov decision processes. Technology, Instruction, Cognition and Learning (TICL), 5, 313-324. Retrieved

Almond, R. G. (2011). Estimating Parameters of Periodic Assessment Models (Report No. RM-11-06). Educational Testing Service. Retrieved from http://www.ets.org/research/policy_research_reports/rm-11-06.pdf

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment. Springer.

Almond, R., Goldin, I., Guo, Y., & Wang, N. (2014). Vertical and Stationary Scales for Progress Maps. In J. Stamper, Z. Pardos, M. Mavrikis, & B. M. McLaren (Eds.), Proceedings of the 7th International Conference on Educational Data Mining, London, England. Society for Educational Data Mining. 169-176. Retrieved from http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/169_EDM-2014-Full.pdf

Almond, R. G., Tokac, U., & Al Otaiba, S. (2012). Using POMDPs to Forecast Kindergarten Students' Reading Comprehension. In J. M. Agosta, A. Nicholson, & M. J. Flores (Eds.), The 9th Bayesian Modeling Application Workshop at UAI 2012. Catalina Island, CA. Retrieved from http://www.abnms.org/uai2012-appsworkshop/papers/AlmondEtal.pdf

Almond, R. G., & Tokac, U. (2014, November). Using Decision Theory to Allocate Educational Resources. Paper presented at the Annual Meeting, Florida Educational Research Association, Cocoa Beach, FL.

Almond, R. G., Yan, D., & Hemat, L. A. (2008). Parameter Recovery Studies with a Diagnostic Bayesian Network Model. Behaviormetrika, 35(2), 159-185.

Al Otaiba, S., Folsom, J. S., Schatschneider, C., Wanzek, J., Greulich, L., Meadows, J., & Li, Z. (2011). Predicting first grade reading performance from kindergarten response to instruction. Exceptional Children, 77(4), 453-470.

Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94. Available from citeseer.ist.psu.edu/boutilier99decisiontheoretic.html

Catts, H. W., Hogan, T. P. E., & Fey, M. (2003). Subgrouping poor readers on the basis of individual differences in reading-related abilities. Journal of Learning Disabilities, 36, 151-164.

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman and Hall.

Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7, 6-10.

Greenwood, C. R., Bradfield, T., Kaminski, R., Linas, M., Carta, J. J., & Nylander, D. (2011). The Response to Intervention (RTI) Approach in Early Childhood. Focus on Exceptional Children, 43(9), 1-24.

Landerl, K., & Wimmer, H. (2008). Development of word reading fluency and spelling in a consistent orthography: An 8-year follow-up. Journal of Educational Psychology, 100(1), 150-161.

Mastropieri, M. A., Scruggs, T. E., & Graetz, J. E. (2003). Reading comprehension instruction for secondary students: Challenges for struggling students and teachers. Learning Disability Quarterly, 26(4), 103-116.

Matthews, E. (2015). Analysis of an Early Intervention Reading Program for First Grade Students. Retrieved from http://scholarworks.waldenu.edu/cgi/viewcontent.cgi?article=1395&context=dissertations

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessment (with discussion). Measurement: Interdisciplinary Research and Perspective, 1(1), 3-62.

Neal, R. M. (2010). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo (pp. 113-162). Chapman & Hall / CRC Press.

Nese, T. F. J., Lai, C., Anderson, D., Jamgochian, M. E., Kamata, A., Saez, L., Park, J. B., Alonzo, J., & Tindal, G. (2010). Technical Adequacy of the easyCBM® Mathematics Measures: Grades 3-8, 2009-2010 Version (Technical Report No. 1007). Eugene, OR: Behavioral Research and Teaching, University of Oregon.

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria.

R Development Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org

Rafferty, A. N., Brunskill, E. B., Griffiths, T. L., & Shafto, P. (2011). Faster teaching by POMDP planning. Proceedings of the 15th International Conference on Artificial Intelligence in Education (AIED2011). Auckland, New Zealand.

Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In W. R. Gilks, D. J. Spiegelhalter, & S. Richardson (Eds.), Practical Markov Chain Monte Carlo. London: Chapman and Hall.

Rock, D. A. (2007). Growth in reading performance during the first four years in school (Report No. RR-07-39). Princeton, NJ: Educational Testing Service.

Ross, M. S. (1983). Introduction to stochastic dynamic programming. London: Academic Press.

Ross, M. S. (2000). Introduction to Probability Models. London: Academic Press.

Tierney, L. (1994). Markov Chain for exploring posterior distributions (with discussion). Annals of Statistics, 22, 1701-1762.

Tokac, U. (2016). Using partially observed Markov decision processes (POMDPs) to implement a response-to-intervention (RTI) framework for early reading. Doctoral dissertation, Florida State University.

Torgesen, J. K. (2004). Avoiding the devastating downward spiral: The evidence that early intervention prevents reading failure. American Educator, 28, 6-19. Reprinted in the 56th Annual Commemorative Booklet of the International Dyslexia Association, November, 2005.