=Paper=
{{Paper
|id=Vol-1663/bmaw2016_paper_6
|storemode=property
|title=The Efficacy of the POMDP-RTI Approach for Early Reading Intervention
|pdfUrl=https://ceur-ws.org/Vol-1663/bmaw2016_paper_6.pdf
|volume=Vol-1663
|authors=Umit Tokac,Russell G.Almond
|dblpUrl=https://dblp.org/rec/conf/uai/TokacA16
}}
==The Efficacy of the POMDP-RTI Approach for Early Reading Intervention==
Umit Tokac (ut08@my.fsu.edu) and Russell G. Almond (ralmond@fsu.edu)

Educational Psychology and Learning Systems, Florida State University, Tallahassee, FL 32306

BMAW 2016 - Page 36 of 59

===Abstract===
A POMDP is a tool for planning: selecting a policy that will lead to an optimal outcome. Response to intervention (RTI) is an approach to instruction in which teachers craft individual plans for students based on the results of progress monitoring tests. Current practice assigns students to tiers of instruction at each time point based on cut scores on the most recent test. This paper explores whether a tier assignment policy determined by a POMDP model in an RTI setting offers advantages over current practice. Simulated data sets were used to compare the two approaches. The model had a single latent reading construct and two observed reading measures: Phoneme Segmentation Fluency (PSF) for phonological awareness and Nonsense Word Fluency (NWF) for phonological decoding. The two simulation studies compared how students were placed into instructional groups under the two approaches, POMDP-RTI and RTI. This paper explores the efficacy of using a POMDP to select and apply appropriate instruction.

===1. INTRODUCTION===
Statistics gathered by local school districts show that roughly 30% of their first-grade students read below grade-level standards (Matthews, 2015). Moreover, Landerl and Wimmer (2008) reported that 70% of struggling readers in first grade continued to struggle in eighth grade when no intervention was provided.

Mastropieri, Scruggs, and Graetz (2003) argued that reading is the main problem for most students with learning disabilities. Torgesen (2004) asserts that reading consists of five components: phonological awareness, phonological decoding, fluency, vocabulary, and reading comprehension. According to the Simple View Theory of Reading Development (Gough & Tunmer, 1986), mastery of phonological decoding and phonological awareness in young children generates the remaining three components: fluency, vocabulary, and reading comprehension. A deficit in either phonological decoding or phonological awareness affects the other components and causes reading difficulties. Because the development of reading skills is critical, instructors should identify children with reading difficulties and provide additional instructional support (Catts, Hogan & Fey, 2003).

Response to intervention (RTI) is an educational framework that identifies students with difficulties in reading and math. It aims to intervene as early as possible by providing more intensive instruction to students who need it. The RTI approach divides instruction into tiers, and each tier provides a different intervention or type of instruction. The RTI process starts with screening tests that monitor the general knowledge and skills of all students in the class. These tests are administered on multiple occasions during a school year. The results give teachers a rough estimate of each student's proficiency, which guides the assignment of students to appropriate tiers of instruction. RTI has produced good results in both research and operational settings. It is therefore considered one of the evidence-based practices for improving reading and preventing learning disabilities (Greenwood et al., 2011).

Ideally, students in an RTI program would be placed into tiers based on their true proficiency. Because true proficiency is unobservable, the placement decision is instead based on proficiency estimates from screening tests. In current practice this is often implemented through a cut score on the most recent screening test. Naturally, measurement error causes some students to be placed incorrectly. Taking each student's entire history into account, including both previous screening test results and changes in instruction, should improve the proficiency estimates. Almond (2007) suggested that this could be done using a partially observed Markov decision process (POMDP). The process is partially observed because the true student proficiency is latent. It is a decision process because the instructors decide what instruction or intervention to use between measurement occasions.

A POMDP is a probabilistic and sequential model. At any point in time a POMDP is in one of a number of distinct states, and its state changes over time in response to events (Boutilier, Dean & Hanks, 1999). One noteworthy difference between an RTI approach and a POMDP model concerns the evidence used. Most RTI approaches use only the latest test results to identify students' proficiencies and assign them to an appropriate tier (Nese et al., 2010). We call this approach the current-time only-RTI model. A POMDP-RTI model, on the other hand, combines periodically administered screening tests and the RTI framework into a single POMDP model. A POMDP also considers each student's entire history (both actions and test scores) when determining appropriate interventions. This history is used to identify the student's current ability and forecast future ability under competing policies. A POMDP-RTI model should therefore perform better than the current-time only-RTI model.

To test this assertion, this paper compares the POMDP-RTI model with the current-time only-RTI model. It evaluates the predictive accuracy of each model, the quality of the instructional plans produced, and the reading levels achieved at the end of the year. The comparison uses simulation studies based on parameter values obtained from fitting the POMDP model to a group of kindergarten students in an earlier RTI study (Al Otaiba et al., 2011).

===2. METHOD===
Two simulated data sets were used to compare tier assignment under the two models. The POMDP-RTI model assigns students based on their latent reading score, while the current-time only-RTI model assigns them based on their observed scores. The initial parameter values were based on a longitudinal Florida Center for Reading Research (FCRR) study of reading proficiency (Al Otaiba et al., 2011). The data sets were simulated from the Almond (2007) model to produce realistic data for answering the research question posed above. The simulation parameters were chosen so that the distributions of scores on the screening tests were similar to those of the Al Otaiba et al. study at both the initial and final measurement periods.

====2.1 THE POMDP-RTI FRAMEWORK====
Almond (2007) describes a general mapping of a POMDP onto an educational setting. The student's proficiency is assumed to be measured on a number of occasions. The latent proficiencies of the students form the hidden layer of the POMDP model. The actual test scores are the observable outcomes, and the teacher's instructional options between measurement occasions form the action space. The utility is assumed to be an increasing function of the latent proficiency variable at the last measurement occasion, so the model has a finite time horizon.

Figure 1 shows a realization of an RTI program in this framework. The nodes marked R represent the latent student proficiency as it evolves over time. At each time slice, student progress is generally measured in some way, represented by the observable outcomes. These are Phoneme Segmentation Fluency (PSF) for phonological awareness and Nonsense Word Fluency (NWF) for phonological decoding. Tiers are instructional tasks chosen by the instructor and applied during time slices. Note that in an RTI implementation, Tier 1 refers to whole-class instruction given to all students. Tier 2 is small-group supplemental instruction, generally given only to the students most at risk. Students in Tier 2 also receive the Tier 1 instruction.

Figure 1: The POMDP-RTI model

Figure 1 is based on evidence-centered assessment design (ECD; Mislevy, Steinberg, & Almond, 2003), and in ECD terms we call this an evidence model. In general, the proficiency variable at measurement occasion m is <math>R_m</math>, and the observable outcome variables on that occasion are <math>PSF_m</math> and <math>NWF_m</math>. Extending the ECD terminology, Almond (2007) calls the model for the <math>R_m</math>'s the proficiency growth model. Following the normal logic of POMDPs, this model has two parts. The first is the initial proficiency model, which gives the population distribution of proficiency at the first measurement occasion. The second is an action model, which gives a probability distribution for the change in proficiency over time. This distribution depends on the instructional activity chosen between measurement occasions.

The POMDP models used in this application differ from those commonly seen in the literature in two notable ways. First, the models have a fixed and finite time horizon, and the reward occurs only at the last time step. The actions at each step do have a cost, which is subtracted from the reward. This removes the need for the usual discounting of future rewards. Second, the Markov process is non-stationary, since students' abilities are expected to improve over time. This produces a potential identifiability issue, because growth is difficult to distinguish from difficulty shifts in the measurement instruments (Almond, Tokac & Al Otaiba, 2012). Assuming that the screening tests have all been equated, and are therefore on the same scale, resolves the identification issue. An alternative approach would be to subtract the expected growth from the model, so that the latent proficiency variable represents deviations from the expected growth model (Almond et al., 2014).

=====2.1.1 Proficiency Growth Model=====
The data were simulated from a unidimensional model of reading with a single latent, continuous variable. This variable, <math>R_{nm}</math>, is the reading ability of individual n on measurement occasion m. In this study N = 300 students, and M = 3 equally spaced time points, t1, t2, t3, were used. (RTI screening tests are typically given three times per year.)

This study assumed that a teacher provided general instruction to all students until the first time point, t1. It also assumed that the initial ability distribution was normal, <math>R_0 \sim N(0,1)</math>. Because this is a purely latent variable, its scale and location are arbitrary, and fixing the initial population to a standard normal distribution establishes the scale.

After analyzing the results of the assessments administered at t1, the teacher delivered additional, more intensive instruction to students assigned to Tier 2. Students in Tier 1 received only general instruction. The tier to which student n is assigned at time m is denoted a(n,m). The students' growth rate is assumed to depend on the tier assignment. Thus, for measurement occasion m > 1,

:<math>R_{nm} = R_{n(m-1)} + \gamma_{a(n,m)} \Delta T_m + \eta_{nm}, \qquad \eta_{nm} \sim N\left(0, \sigma_{a(n,m)} \sqrt{\Delta T_m}\right) \qquad (1)</math>

Here <math>\Delta T_m</math> represents the elapsed time between measurement occasions m-1 and m, for both Tier 1 and Tier 2. In this study, each school year was equal to 1, and <math>\Delta T_m</math> was fixed at 1/M (with M = 3, <math>\Delta T_m = 1/3</math>).

The parameter <math>\gamma_{a(n,m)}</math> is a tier-specific growth rate. It was fixed, with γ = 0.9 for Tier 1 and γ = 1.2 for Tier 2. The residual standard deviation, <math>\sigma_{a(n,m)}\sqrt{\Delta T_m}</math>, depends on both a tier-specific rate, <math>\sigma_{a(n,m)}</math>, and the length of time between measurements, <math>\Delta T_m</math>. Growth therefore occurs via a non-stationary Brownian motion process. The standard deviation of the growth per unit time, <math>\sigma_{a(n,m)}</math>, was fixed at 1 for both tiers.

=====2.1.2 Evidence Model=====
The evidence model consists of two independent regressions, one for each observed variable i. These two observable variables were chosen because they are critical reading components for later reading performance in the first two years of elementary school (Rock, 2007). Let <math>Y_{nmi}</math> be the observation for individual n at measurement occasion m on observed variable i. Then

:<math>R_{n0} \sim N(0,1), \qquad Y_{nmi} = a_i + b_i R_{nm} + \varepsilon_{nmi}, \qquad \varepsilon_{nmi} \sim N(0, \omega_i) \qquad (2)</math>

The reliability of the instruments can be used to determine b and ω. The reliability of observed variable i at any time point is denoted <math>r_i</math>. In classical test theory, reliability is the squared correlation between the true score and the observed score. This definition translates into the equation

:<math>r_i = 1 - \frac{\mathrm{Var}_n(\varepsilon_{nmi})}{\mathrm{Var}_n(Y_{nmi})}</math>

Here <math>\mathrm{Var}_n(\cdot)</math> indicates that the variance is taken over individuals, with the measurement occasion and instrument held constant. Let <math>\sigma_{Y_i}</math> and <math>\sigma_R</math> be the standard deviations of the observed score and of the latent proficiency. Then

:<math>b_i = \frac{\sigma_{Y_i}}{\sigma_R}\sqrt{r_i}, \qquad \omega_i = \sigma_{Y_i}\sqrt{1 - r_i}</math>

To make <math>r_i = .45</math> for each observed variable i at each time point <math>t_m</math>, the values <math>b_i = .98</math> and <math>\omega_i = .65</math> were used. These numbers are comparable to reading measures commonly used with first-grade students. At this point, the model is very close to the one described in Almond, Tokac and Al Otaiba (2012). The difference is that the previous work assumed all students were in the same tier. Appropriate values for a and b depend on the scale of the instruments chosen. The values used in the simulation were chosen so that the mean and standard deviation of the simulated data matched the data set from Al Otaiba et al. (2011) at the first and last time points.

=====2.1.3 Decision Rules=====
The key research question compares the performance of the system under two policies. The first is the fixed decision rule implicit in the current-time RTI policy: students who score below a cut score on either of the two screening tests are placed into Tier 2 instruction. The second is the optimal policy found by solving the POMDP. Implementing this policy requires an explicit specification of the utility function and of the cost function for the instructional options.

Many RTI implementations use a reference score, such as the class median or some other percentile rank, as the cut score for assigning each student to Tier 1 or Tier 2. The simulated model used a separate cut score for each of the two screening tests (NWF and PSF), giving four possible tier assignments. For instance, suppose a student scored below the NWF cut score but above the PSF cut score. That student was assigned to Tier 2 for NWF and Tier 1 for PSF. (This differs slightly from common practice, which would put students who fail to meet the cut on either measure into a single Tier 2.)

The POMDP forecasts expected learning under each possible outcome. It then assigns students to tiers in a way that balances the expected learning gains against the cost of instruction. The utility function is the expected gain at the last time point, and the cost function is the sum of the costs of the instruction applied at each state. Tier 2 always yields a higher benefit, but it also has a higher cost. Because the utility is nonlinear, the cost exceeds the benefit in some regions of the proficiency distribution but not in others.

The contact hours with the instructor drive the cost of each block. Cost is high for the more intensive Tier 2 instruction. Without loss of generality, the cost of Tier 1 is zero, since all students receive Tier 1 instruction. The cost function has three components: the frequency with which the group meets, <math>f_a</math>, the duration of each meeting, <math>d_a</math>, and the size of the group, <math>g_a</math> (Almond & Tokac, 2014). Then

:<math>c(a) = \frac{k\, f_a\, d_a}{g_a} \qquad (3)</math>

gives the cost of taking action (activity) a in state s. Here k is a constant that puts the cost function on the same scale as the utility function. In this study, the costs were fixed at c(Tier 2) = 0.1 and c(Tier 1) = 0.

The utility function is

:<math>u(R_M) = \mathrm{logit}^{-1}\left(\alpha (R_M - \beta)\right) \qquad (4)</math>

In this equation, α and β are fixed parameters. β is a proficiency target on the scale of the internal latent variable <math>R_M</math>: β = 0.5 for Tier 1 and β = 0.1 for Tier 2. α is a slope parameter, with α = 0.8 for both tiers. High values of α favor bringing students who are near the proficiency standard above the target β. Low values of α give more weight to enriching students at the high end of the scale and providing remediation at the low end (Almond & Tokac, 2014). (Almond & Tokac alternatively recommend using a probit function in place of a logit, so that α effectively becomes a standard deviation. Because the logit and probit curves have very similar shapes, we expect the results using a probit curve to be similar.)

In this case, the total reward is

:<math>u(R_M) - c(a(s,2)) - c(a(s,3))</math>

This is the utility of the student's final proficiency level minus the cost of instruction a(s,2), applied between measurements 1 and 2, and a(s,3), applied between measurements 2 and 3. The reward is the basis for assigning each student to Tier 1 or Tier 2. During each period, the POMDP model forecasts the expected reward and balances it against the cost.

====2.2 SIMULATION DESIGN====
The initial distribution of simulated student abilities at time 0 was based on the FCRR data set (Al Otaiba, 2007). In the FCRR data, the correlation between NWF and PSF was .65. The simulation generated latent proficiency variables for each simulee, together with simulated NWF and PSF scores for tests administered at t1, t2 and t3. At each time point the correlation between NWF and PSF was around .65. The same growth and measurement error residuals were used for both the POMDP-RTI and current-time only-RTI models.

The parameters of the proficiency growth model and the evidence model were estimated from the simulated data by Markov chain Monte Carlo (MCMC) simulation using JAGS (Plummer, 2003). Four independent Markov chains with random starting positions were run for 500,000 iterations. This is consistent with standard practice (Gelman, Carlin, Stern & Rubin, 2004; Neal, 2010). Tokac (2016) describes the convergence and parameter recovery tests done with this model.

===3. RESULTS===
Data were simulated for students under two policies:
(1) a current-time only-RTI policy, in which students are assigned to Tier 1 or Tier 2 based on cut scores on the PSF and NWF tests at the most recent time point;
(2) a POMDP-RTI policy, in which each student is assigned to the tier that maximizes that student's expected utility.
This produced two simulated series. <math>\check{R}_{nm}</math> is the true reading ability under the current-time only cut-score policy, and <math>\hat{R}_{nm}</math> is the true reading ability under the POMDP-RTI policy. The two simulations used the same residuals in equation (1) (growth residual <math>\eta_{nm}</math>) and equation (2) (measurement error <math>\varepsilon_{nmi}</math>). They therefore differed only in the value of the growth rate parameter, <math>\gamma_{a(n,m)}</math>, used in equation (1).

{| class="wikitable"
|+ Table 3: Number of students in each tier based on PSF and NWF scores, categorized by cut scores or POMDP estimates
! Method !! Tier !! PSF t2 !! NWF t2 !! PSF t3 !! NWF t3
|-
| POMDP || Tier 1 || 150 || 149 || 181 || 181
|-
| POMDP || Tier 2 || 150 || 151 || 119 || 119
|-
| Cut Score || Tier 1 || 150 || 149 || 150 || 150
|-
| Cut Score || Tier 2 || 150 || 151 || 150 || 150
|}

Table 3 shows the pattern of tier assignment under the two models. At the second time point, the two policies behave roughly the same: both assign the lowest-performing 50% of students to Tier 2. At the third time point, however, substantially fewer students are assigned to Tier 2 under the POMDP-RTI policy (119 versus 150). This may reflect a better placement policy, or it may simply mean that Tier 2 support is less needed late in the school year.

Table 4 breaks down the differences between the two policies at time point 3. Recall that students were classified into tiers independently on the PSF and NWF measures. This effectively produced four classifications: 1-1 (Tier 1 on both), 1-2 and 2-1 (mixed), and 2-2 (Tier 2 on both). For each of the four groups, Table 4 shows how many students placed in that group by one policy were placed in a different group by the other policy. Slightly over half of the students (151 of 300) were assigned different instruction under the two policies.

{| class="wikitable"
|+ Table 4: Comparison of POMDP-RTI and current-time only-RTI models: number of non-matching students at Time 3
! Time 3 tiers !! POMDP-RTI !! Current-time RTI
|-
| 1-1 || 49 || 20
|-
| 1-2 || 38 || 36
|-
| 2-1 || 42 || 40
|-
| 2-2 || 22 || 55
|}

The placements clearly differ, so which one is better? Because this is a simulation study, the true abilities are known, and an ideal placement can be determined from them. However, the abilities <math>\check{R}_{nm}</math> and <math>\hat{R}_{nm}</math> differ between the two branches of the simulation, because a different policy was actually employed in each. The ideal placements will therefore also differ between the two policies.

To determine the ideal placement, the two mixed assignments, 1-2 and 2-1, were combined into a single mixed tier. Cut scores on the latent ability variable were calculated from the cost and utility in equations (3) and (4), together with a single growth step after the last measurement. Students with abilities higher than 0.1 should be placed into Tier 1, those lower than -0.4 into Tier 2, and those in between into the mixed tier. Both policies used the same cut points to determine the ideal placement. Because the abilities differed, however, the same student's ideal placement at Time 3 could differ between the two policies.

Tables 5 and 6 cross-tabulate the ideal and actual placements for the two policies. The diagonal cells measure agreement: they count the students ideally assigned to a tier who were actually assigned to that tier. The POMDP-RTI policy does well on this metric, since every student who should have been placed into Tier 1 or Tier 2 was placed there. Its only problems were with the mixed tier, where 35% of the students (48 of 138) were incorrectly placed in Tier 1 or Tier 2.

{| class="wikitable"
|+ Table 5: Agreement between ideal and actual placement under POMDP-RTI (rows: ideal placement; columns: POMDP-RTI placement)
! Ideal placement !! Tier 1 !! Mixed tier !! Tier 2 !! Total
|-
| Tier 1 || 118 || 0 || 0 || 118
|-
| Mixed tier || 18 || 90 || 30 || 138
|-
| Tier 2 || 0 || 0 || 44 || 44
|-
| Total || 136 || 90 || 74 || 300
|}

{| class="wikitable"
|+ Table 6: Agreement between ideal and actual placement under current-time only-RTI (rows: ideal placement; columns: current-time only-RTI placement)
! Ideal placement !! Tier 1 !! Mixed tier !! Tier 2 !! Total
|-
| Tier 1 || 72 || 17 || 0 || 89
|-
| Mixed tier || 35 || 58 || 50 || 143
|-
| Tier 2 || 0 || 11 || 57 || 68
|-
| Total || 107 || 86 || 107 || 300
|}

The current-time only-RTI policy did not fare as well. First, under the ideal placement for this policy, fewer students would be in the high-performing Tier 1 group (89 versus 118). This is likely due to incorrect assignment at Time 2. Second, its agreement rates are lower. The POMDP-RTI model therefore did better on two important metrics.

To summarize the agreement numbers, we used Goodman and Kruskal's lambda (Almond, Mislevy, Steinberg, Yan, and Williamson, 2015). The usual lambda adjusts the raw agreement rate by subtracting the agreement of a classifier that simply assigns everybody to the modal category. Here the modal category would be the mixed tier for both policies. However, Tier 1 has a special meaning in RTI: it is the normal whole-class instruction given regardless of test score. Using Tier 1 as the baseline therefore yields a statistic that describes how much better RTI performs than undifferentiated whole-class instruction. Let <math>k_i</math> be the number of students correctly classified into Tier i, and let <math>k_{\mathrm{Tier\,1}}</math> be the number of students who should ideally be assigned to Tier 1. Then

:<math>\lambda = \frac{\sum_i k_i - k_{\mathrm{Tier\,1}}}{N - k_{\mathrm{Tier\,1}}}</math>

Like a correlation coefficient, lambda ranges between -1 and 1. A value of 0 represents a classifier that does no better than assigning everybody to the baseline category. A value of 1 means that the policy assigned every student to the ideal tier. Using the data in Tables 5 and 6, λ = 0.74 for POMDP-RTI and λ = 0.51 for current-time only-RTI. Both RTI policies therefore do better than undifferentiated instruction, and the POMDP-RTI policy also does better than the current-time only-RTI policy.

===4. CONCLUSION===
As expected, the policy produced by a POMDP, which is designed to produce optimal policies, performed better than the current-time only cut-score policy used in many RTI implementations. In particular, the POMDP-RTI policy agreed better with the ideal placement (λ = 0.74) than the current-time only model did (λ = 0.51). The likely reason is that the POMDP model makes better use of the entire student record. It combines the history of assessments and instruction with multiple tests taken at the same time to build a more accurate estimate of student proficiency. Some of this advantage, however, may stem from using the same utility model both in the POMDP and in the definition of ideal placement.

The cut-score approach currently in common use has one clear advantage over the POMDP model: it is simpler to implement and explain. If the POMDP recommendations were integrated into an electronic gradebook, however, they might be better received by teachers. Moreover, teachers may not feel the need for POMDP software to handle Tier 1/Tier 2 placement, but this study did not address another aspect of the RTI framework. During Tier 2, students receive regular progress monitoring assessments. The teacher is supposed to make fine-grained adjustments if the student is not responding to the intervention, hence the name response to intervention. In particular, teachers can adjust the intensity of the intervention (equation 3). They can add more time on task if needed, or reduce support if the student appears to be doing well. This is a target of opportunity for the POMDP model, because teachers have responded favorably to the idea of computer support for tracking Tier 2 students and adjusting their interventions.<sup>1</sup> The present work suggests that POMDPs are a promising approach to this problem.

Another limitation of the current work is that it assumes all students grow at the same rate under each instructional condition (i.e., given the tier placement). In practice, many studies of RTI have found that students grow at different rates, and a low growth rate often corresponds to low initial ability.<sup>2</sup> Modeling this would add complexity, but we think the POMDP framework will help educators make optimal policy decisions with this additional information.

<sup>1</sup> Joe Nese, U. Oregon, private communication, May 16, 2016.

<sup>2</sup> Young-Suk Kim, Florida State University, private communication, March 31, 2016.

===Acknowledgements===
We would like to thank the Florida Center for Reading Research for allowing us access to the data used in this paper. The data were originally collected as part of a larger National Institute of Child Health and Human Development Early Child Care Research Network study.

===References===
Almond, R. G. (2007). Cognitive modeling to represent growth (learning) using Markov decision processes. Technology, Instruction, Cognition and Learning (TICL), 5, 313-324. Retrieved from http://www.oldcitypublishing.com/TICL/TICL.html

Almond, R. G. (2011). Estimating Parameters of Periodic Assessment Models (Report No. RM-11-06). Educational Testing Service. Retrieved from http://www.ets.org/research/policy_research_reports/rm-11-06.pdf

Almond, R. G., Mislevy, R. J., Steinberg, L. S., Yan, D., & Williamson, D. M. (2015). Bayesian Networks in Educational Assessment. Springer.

Almond, R., Goldin, I., Guo, Y., & Wang, N. (2014). Vertical and Stationary Scales for Progress Maps. In J. Stamper, Z. Pardos, M. Mavrikis, & B. M. McLaren (Eds.), Proceedings of the 7th International Conference on Educational Data Mining (pp. 169-176). London, England: Society for Educational Data Mining. Retrieved from http://educationaldatamining.org/EDM2014/uploads/procs2014/long%20papers/169_EDM-2014-Full.pdf

Almond, R. G., Tokac, U., & Al Otaiba, S. (2012). Using POMDPs to Forecast Kindergarten Students' Reading Comprehension. In J. M. Agosta, A. Nicholson, & M. J. Flores (Eds.), The 9th Bayesian Modeling Application Workshop at UAI 2012. Catalina Island, CA. Retrieved from http://www.abnms.org/uai2012-apps-workshop/papers/AlmondEtal.pdf

Almond, R. G., & Tokac, U. (2014, November). Using Decision Theory to Allocate Educational Resources. Paper presented at the Annual Meeting of the Florida Educational Research Association, Cocoa Beach, FL.

Almond, R. G., Yan, D., & Hemat, L. A. (2008). Parameter Recovery Studies with a Diagnostic Bayesian Network Model. Behaviormetrika, 35(2), 159-185.

Al Otaiba, S., Folsom, J. S., Schatschneider, C., Wanzek, J., Greulich, L., Meadows, J., & Li, Z. (2011). Predicting first grade reading performance from kindergarten response to instruction. Exceptional Children, 77(4), 453-470.

Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94. Available from citeseer.ist.psu.edu/boutilier99decisiontheoretic.html

Catts, H. W., Hogan, T. P., & Fey, M. E. (2003). Subgrouping poor readers on the basis of individual differences in reading-related abilities. Journal of Learning Disabilities, 36, 151-164.

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman and Hall.

Gough, P. B., & Tunmer, W. E. (1986). Decoding, reading, and reading disability. Remedial and Special Education, 7, 6-10.

Greenwood, C. R., Bradfield, T., Kaminski, R., Linas, M., Carta, J. J., & Nylander, D. (2011). The Response to Intervention (RTI) Approach in Early Childhood. Focus on Exceptional Children, 43(9), 1-24.

Landerl, K., & Wimmer, H. (2008). Development of word reading fluency and spelling in a consistent orthography: An 8-year follow-up. Journal of Educational Psychology, 100(1), 150-161.

Mastropieri, M. A., Scruggs, T. E., & Graetz, J. E. (2003). Reading comprehension instruction for secondary students: Challenges for struggling students and teachers. Learning Disability Quarterly, 26(4), 103-116.

Matthews, E. (2015). Analysis of an Early Intervention Reading Program for First Grade Students. Retrieved from http://scholarworks.waldenu.edu/cgi/viewcontent.cgi?article=1395&context=dissertations

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessment (with discussion). Measurement: Interdisciplinary Research and Perspectives, 1(1), 3-62.

Neal, R. M. (2010). MCMC using Hamiltonian dynamics. In S. Brooks, A. Gelman, G. L. Jones, & X.-L. Meng (Eds.), Handbook of Markov Chain Monte Carlo (pp. 113-162). Chapman & Hall/CRC Press.

Nese, J. F. T., Lai, C., Anderson, D., Jamgochian, M. E., Kamata, A., Saez, L., Park, J. B., Alonzo, J., & Tindal, G. (2010). Technical Adequacy of the easyCBM® Mathematics Measures: Grades 3-8, 2009-2010 Version (Technical Report No. 1007). Eugene, OR: Behavioral Research and Teaching, University of Oregon.

Plummer, M. (2003). JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, Vienna, Austria.

R Development Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org

Rafferty, A. N., Brunskill, E. B., Griffiths, T. L., & Shafto, P. (2011). Faster teaching by POMDP planning. Proceedings of the 15th International Conference on Artificial Intelligence in Education (AIED2011). Auckland, New Zealand.

Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In W. R. Gilks, D. J. Spiegelhalter, & S. Richardson (Eds.), Practical Markov Chain Monte Carlo. London: Chapman and Hall.

Rock, D. A. (2007). Growth in reading performance during the first four years in school (Report No. RR-07-39). Princeton, NJ: Educational Testing Service.

Ross, S. M. (1983). Introduction to stochastic dynamic programming. London: Academic Press.

Ross, S. M. (2000). Introduction to Probability Models. London: Academic Press.

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Annals of Statistics, 22, 1701-1762.

Tokac, U. (2016). Using partially observed Markov decision processes (POMDPs) to implement a response-to-intervention (RTI) framework for early reading. Doctoral dissertation, Florida State University.

Torgesen, J. K. (2004). Avoiding the devastating downward spiral: The evidence that early intervention prevents reading failure. American Educator, 28, 6-19. Reprinted in the 56th Annual Commemorative Booklet of the International Dyslexia Association, November 2005.
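The following sketch illustrates the tier-dependent growth model of equation (1) in Section 2.1.1. It is a minimal Python sketch, not the authors' code (they worked with JAGS). It assumes the reported values γ = 0.9 for Tier 1, γ = 1.2 for Tier 2, σ = 1 for both tiers, and ΔT = 1/3, and it reads the second argument of N(·,·) as a standard deviation. The names `growth_step` and `simulate_path` are hypothetical.

```python
import math
import random

# Parameter values reported in Section 2.1.1; the dict layout is illustrative.
GAMMA = {1: 0.9, 2: 1.2}   # tier-specific growth rate per school year
SIGMA = {1: 1.0, 2: 1.0}   # SD of growth per unit time, per tier


def growth_step(r_prev, tier, delta_t, rng):
    """One application of equation (1):
    R_m = R_{m-1} + gamma_a * dT + eta,  eta ~ N(0, sigma_a * sqrt(dT))."""
    eta = rng.gauss(0.0, SIGMA[tier] * math.sqrt(delta_t))
    return r_prev + GAMMA[tier] * delta_t + eta


def simulate_path(tiers, n_occasions=3, seed=None):
    """Draw R_0 ~ N(0, 1), then apply one growth step per entry of `tiers`.

    `tiers` gives the tier used in each interval, e.g. (1, 2, 1) means
    general instruction before t1, Tier 2 from t1 to t2, Tier 1 after t2.
    """
    rng = random.Random(seed)
    delta_t = 1.0 / n_occasions
    path = [rng.gauss(0.0, 1.0)]
    for tier in tiers:
        path.append(growth_step(path[-1], tier, delta_t, rng))
    return path
```

For example, `simulate_path((1, 2, 2), seed=42)` draws one student who receives general instruction before t1 and Tier 2 support afterwards. The noise scales with √ΔT, so halving the interval halves the expected drift but shrinks the noise SD only by a factor of √2. This is the Brownian-motion behavior the text describes.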
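The slope and error SD of the evidence model in Section 2.1.2 follow from two facts: the reliability definition r_i = 1 - Var(ε)/Var(Y), and Var(Y) = b²Var(R) + ω² under equation (2). The sketch below assumes ω_i is a standard deviation and σ_R is the SD of the latent proficiency on the occasion in question. The helper names are hypothetical. The inputs in the checks are arbitrary; the paper does not give the score scales behind its b = .98 and ω = .65, so those values are not reproduced here.

```python
import math
import random


def loading_and_error_sd(reliability, sd_y, sd_r):
    """Solve r = 1 - omega^2 / Var(Y) with Var(Y) = b^2 Var(R) + omega^2.

    Returns (b, omega) with b = (sd_y / sd_r) * sqrt(r) and
    omega = sd_y * sqrt(1 - r).
    """
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    b = (sd_y / sd_r) * math.sqrt(reliability)
    omega = sd_y * math.sqrt(1.0 - reliability)
    return b, omega


def implied_reliability(b, omega, sd_r):
    """Share of observed-score variance that comes from the latent proficiency."""
    true_var = (b * sd_r) ** 2
    return true_var / (true_var + omega ** 2)


def observe(r, a, b, omega, rng):
    """Equation (2): Y = a + b * R + eps, with eps ~ N(0, omega) as an SD."""
    return a + b * r + rng.gauss(0.0, omega)
```

Round-tripping through `implied_reliability` recovers the target reliability for any positive σ_Y and σ_R. This makes a convenient sanity check when choosing scale constants for a simulation.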
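The cost function (3), the utility function (4) and the total reward from Section 2.1.3 translate directly into code. This sketch assumes the reported values α = 0.8, c(Tier 2) = 0.1 and c(Tier 1) = 0. β is left as an argument because the text reports β = 0.5 for Tier 1 and β = 0.1 for Tier 2. The constant k and the units of f_a, d_a and g_a are placeholders, since the paper does not state them. All function names are hypothetical.

```python
import math

# Tier costs reported in the text: Tier 1 is free, Tier 2 costs 0.1 utility units.
TIER_COST = {1: 0.0, 2: 0.1}


def group_cost(k, frequency, duration, group_size):
    """Equation (3): c(a) = k * f_a * d_a / g_a (units are placeholders)."""
    return k * frequency * duration / group_size


def utility(r_final, alpha=0.8, beta=0.5):
    """Equation (4): inverse logit of alpha * (R_M - beta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (r_final - beta)))


def total_reward(r_final, tier_at_2, tier_at_3, alpha=0.8, beta=0.5):
    """u(R_M) - c(a(s,2)) - c(a(s,3)) for a single student."""
    return (utility(r_final, alpha, beta)
            - TIER_COST[tier_at_2] - TIER_COST[tier_at_3])
```

The slope of the inverse logit is largest near β and flattens in both tails. As a result, the fixed Tier 2 cost is more likely to outweigh the extra expected growth for students far from the target than for students near it.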
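The current-time only decision rule in Section 2.1.3 classifies each screening test separately. This is a minimal sketch that assumes the class median of each test serves as its cut score, which is one of the reference scores the text mentions. The names `cut_score_tiers` and `assign_class` are hypothetical.

```python
from statistics import median


def cut_score_tiers(psf, nwf, psf_cut, nwf_cut):
    """Current-time only rule: each test is judged against its own cut score.

    Returns (PSF tier, NWF tier); a score below the cut means Tier 2
    for that skill, otherwise Tier 1.
    """
    return (2 if psf < psf_cut else 1, 2 if nwf < nwf_cut else 1)


def assign_class(psf_scores, nwf_scores):
    """Use each test's class median as its cut score (an assumed choice)."""
    psf_cut = median(psf_scores)
    nwf_cut = median(nwf_scores)
    return [cut_score_tiers(p, n, psf_cut, nwf_cut)
            for p, n in zip(psf_scores, nwf_scores)]
```

Taking `max` of the returned pair would collapse it into the single Tier 2 of common practice, which the text contrasts with this four-way scheme.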
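The Results section defines an ideal-placement rule and the cross-tabulation behind Tables 5 and 6. Under that rule, students go to Tier 1 above 0.1, to Tier 2 below -0.4, and to the mixed tier in between, with assignments 1-2 and 2-1 merged. The sketch below implements this rule. The function names are hypothetical. Abilities exactly at 0.1 or -0.4 are placed in the mixed tier, which is an assumption because the text does not say how ties are handled.

```python
from collections import Counter


def ideal_tier(ability, upper=0.1, lower=-0.4):
    """Ideal placement from the true latent ability at Time 3."""
    if ability > upper:
        return "Tier 1"
    if ability < lower:
        return "Tier 2"
    return "Mixed"


def collapse(psf_tier, nwf_tier):
    """Merge the two mixed assignments (1-2 and 2-1) into a single mixed tier."""
    if psf_tier == nwf_tier:
        return f"Tier {psf_tier}"
    return "Mixed"


def agreement_table(abilities, assignments):
    """Count (ideal, actual) pairs, i.e. the cells of a table like Table 5."""
    return Counter((ideal_tier(r), collapse(*a))
                   for r, a in zip(abilities, assignments))
```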
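The Tier 1 baseline version of Goodman and Kruskal's lambda from the Results section can be computed from a table of counts. This sketch follows the text's definitions: k_i are the diagonal counts, and k_Tier1 is the row total for ideal Tier 1. The nested-dictionary layout and the function name are illustrative choices.

```python
def tier1_baseline_lambda(table, baseline="Tier 1"):
    """Lambda with Tier 1 (rather than the modal tier) as the baseline.

    `table[ideal][actual]` holds counts. The numerator is the diagonal sum
    minus the number of students ideally in the baseline tier.
    """
    n = sum(sum(row.values()) for row in table.values())
    correct = sum(row.get(tier, 0) for tier, row in table.items())
    k_base = sum(table[baseline].values())
    return (correct - k_base) / (n - k_base)


# Table 5 (POMDP-RTI): rows are ideal placement, columns actual placement.
TABLE_5 = {
    "Tier 1": {"Tier 1": 118, "Mixed": 0, "Tier 2": 0},
    "Mixed": {"Tier 1": 18, "Mixed": 90, "Tier 2": 30},
    "Tier 2": {"Tier 1": 0, "Mixed": 0, "Tier 2": 44},
}

if __name__ == "__main__":
    print(round(tier1_baseline_lambda(TABLE_5), 2))  # 0.74
```

In Table 5, 252 of the 300 students lie on the diagonal and 118 belong ideally in Tier 1. This gives λ = 134/182 ≈ 0.74, the value reported for POMDP-RTI.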