The Relationship Between Course Scheduling and Student Performance

Seth Poulsen (sethp3@illinois.edu), Carolyn J. Anderson (cja@illinois.edu), Matthew West (mwest@illinois.edu)
University of Illinois at Urbana-Champaign

ABSTRACT
Using 10 years of grade data from a university computer science department, we fit a multi-level proportional odds model and find that students earn a higher grade in an afternoon class at 1.15 times the odds for a morning class, even when controlling for GPA. This finding has implications both for student learning and for experimental studies that compare classes without considering the time of day at which they are taught. We find that there are no significant trends in student performance based on term when looking at the department as a whole, though there are such trends for certain courses in particular.

Keywords
course scheduling, GPA, research methods, multi-level models

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION
When evaluating the effectiveness of a new instructional technique or educational intervention, researchers would ideally test the intervention on many sections of the same course to increase confidence that the intervention is working as intended. Data would then be analyzed using multi-level regression to properly treat the variance that naturally occurs between sections of the same course [24].

Multi-level models have been used in computer science education, for example [21], but they are few and far between. Even though there have been many multi-institution, multi-national studies in computer science [2, 6, 8, 9, 10, 18, 23], even they often don't include enough clusters of data to be able to use a multi-level model.

As researchers and educators we understand the reasons that larger studies are not undertaken more often: even planning an educational intervention experiment with one experimental and one control section can be very resource intensive! In many situations, especially when first piloting new educational techniques, it is completely impractical to expect that researchers will be able to experiment on more than a single section of a course.

Unfortunately, experimenting with one or only a few sections of a course requires the researcher to assume that essentially all things are equal about the students taking the courses and the courses themselves, apart from the intervention.

Despite controlling for as many factors as possible, such as instructor, course assignments, tests, and more, there are still often factors that lie outside the researcher's control, such as the day of the week, time of day, term, and location that their course is scheduled for. Furthermore, students self-select into the section of the course that they want to take! These variations between sections may be introducing a selection bias that threatens the validity of these educational experiments.

In this year's SIGCSE Technical Symposium alone, there were 7 studies which tested new educational practices by experimenting with either one or just a few experimental and control sections of the same course, operating either explicitly or implicitly under the "all things equal" assumption [7, 12, 13, 15, 19, 20, 25]. While six of these seven studies clearly stated the year and term of the sections from which they collected data, only one of them stated the time of day and days of the week on which the sections were held. This method of comparing one or only a few sections of a course when assessing instructional practice is also used in other areas of discipline-based education research, including chemistry [26], physics [14], and materials science [16], to name a few.

The desire to check the validity of the "all things equal" assumption for experiments with multiple sections of the same course, and discussion with colleagues, led us to the following research questions:

1. Is student performance in a course related to the time of day the course is scheduled for?
2. Is student performance in a course related to the term?
3. Is student performance in a course related to the days of the week the course is held on?
4. Is student performance in a course related to the building in which the course is held?

2. LITERATURE REVIEW
It has been shown that adolescents struggle to perform to their fullest potential early in the morning, causing many school districts to push back school start times [5, 11]. However, there has not been enough work done to verify that this effect also holds true for college students [17].

Marbouti et al. analyzed data from 15 different sections of an introductory university engineering course and found that due to lower attendance in morning sections, the early morning sections of the course significantly under-performed other sections [17]. To our knowledge, no one so far has examined a data set including more than one course to see if this trend holds generally.

[Figure 1: All student grades in the data set. (Histogram: grade points 0-4 on the x-axis, frequency on the y-axis.)]

Most literature agrees that courses offered in condensed terms (such as most universities' summer terms) lead to students learning the material equally well or even better than courses that are taught over a full-length term [1].

When it comes to day-of-week scheduling, there is quite a division in the literature, with some studies finding that spacing lessons out over the week helps students learn more, while others find that students perform just as well when the course material is presented only one day a week [4]. Some
studies even suggest that the outcome depends on whether the material requires deep comprehension and analysis, or simply recall [4].

3. DATA
Our grade and course scheduling data was acquired from the registrar at the University of Illinois at Urbana-Champaign. Because our primary focus is the relationship between course scheduling and time of day, we removed topics and reading courses that were only taught once, as well as courses that are not scheduled, such as independent study and senior thesis courses. Summer courses were also removed from the data set, to avoid comparing versions of the same course which were taught on an entirely different time scale, and sometimes even with a different set of instructor expectations.

Drops and withdrawals were also removed from the data set. After cleaning the data, we were left with 72,739 student grades from 24,705 students across 1,938 sections of 101 courses. The grade data consists of letter grades, which we converted to grade points for the purposes of fitting the model (A → 4.0, A- → 3.67, B+ → 3.33, etc.). The overall mean grade in the data set is 3.100, and the median is 3.33 (B+).

At the University of Illinois, the Fall term starts in late August and ends in mid-December, and the Spring term starts in mid-January and ends in mid-May. We chose from the beginning to treat time of day as a categorical variable, where courses beginning before 10:00 a.m. were considered "Morning," courses starting between 10:00 a.m. and 2:00 p.m. were considered "Midday," courses starting between 2:00 p.m. and 5:00 p.m. were considered "Afternoon," and courses which started after 5:00 p.m. were considered "Evening" courses.

A look at the data set shows that of the 101 courses offered in the computer science department, 54 of them were always taught on the same days of the week every time they were taught, and another 30 were only taught in 2 different day configurations (e.g. a class was taught either Monday and Wednesday or Tuesday and Thursday, but not in any other day configurations). Because of this, we chose to leave day-of-week considerations out of our analysis entirely.

Figure 1 shows the distribution of student grades from the entire data set. As hinted at by Figure 2 and Figure 3, and revealed by deeper data exploration, there was sufficient variance in the performance of students between courses as well as between sections of the same course to justify grouping the data by course and by section for the fitting of the model, leading to a three-level model.

4. METHODS
We fit the data using a three-level model of the following form, where students are indexed by i, sections are indexed by j, courses are indexed by k, and y represents some grade (e.g. A, A-, B, etc.):

• Level 1 (student):
    ln[ P(grade_ijk < y) / P(grade_ijk ≥ y) ] = β_0jk + β_1jk · GPA_ijk

• Level 2 (section):
    β_0jk = γ_0k + γ_1k · Midday_jk + γ_2k · Afternoon_jk + γ_3k · Evening_jk + U_jk

• Level 3 (course):
    γ_0k = δ_0 + W_k

Midday_jk, Afternoon_jk, and Evening_jk are dummy codes denoting the time of day a course was held, and all of them being 0 represents a Morning class. Our model assumes that the error term at the section level, U_jk, and the error term at the course level, W_k, are multivariate normal distributions which are independent of one another. After substituting and gathering the error terms, we obtain the mixed model:

    ln[ P(grade_ijk < y) / P(grade_ijk ≥ y) ] = δ_0 + β_1jk · GPA_ijk + γ_1k · Midday_jk + γ_2k · Afternoon_jk + γ_3k · Evening_jk + W_k + U_jk

We used the R (version 3.6.3) package brms [3, 22] to fit the model using Bayesian estimation. We used brms with default priors, 3 chains, 1500 warm-ups, and 3000 iterations. After examining the R̂ values and trace plots, we concluded that the model converged.

We also fit a version of the model including a dummy code for term (Fall vs.
Spring) and found that there was no significant general trend for the relationship between term and course performance. However, fitting similar models for some individual courses revealed that some courses do have significant differences in performance between semesters, with some courses having better performance in the Fall and some having better performance in the Spring.

[Figure 2: Distribution of section averages. (Histogram: section average grade points 0-4 on the x-axis, frequency on the y-axis.)]

Finally, we fit a version of the model including a dummy code for whether or not the section was held in the computer science department building, with the hypothesis that sections held in the computer science department building would be more desirable and would thus fill up with more responsible students who registered on time. We found no significant relationship between student performance and the building in which the course was held.

5. RESULTS
The estimated parameters of the final model are listed in Table 1. This model allows us to estimate the relative probability that a student will receive each letter grade, given their cumulative GPA, the course and section of the course, and the time of day that the course was scheduled. The probability of a higher grade increases for higher values of GPA, and it also increases the later in the day the course is held. According to the model, the odds that a student receives a higher grade in an afternoon class (based on model fit information from Table 1) are e^0.14 = 1.15 times the odds for a student with the same GPA taking the same class in the morning. Additionally, holding all other variables constant, the odds of a higher grade in an evening class are e^0.17 = 1.19 times the odds in a morning class.
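The odds-ratio arithmetic above can be sketched concretely. The analysis itself was done with brms in R; the following Python sketch (not the authors' code) shows how cumulative-logit parameters like those reported in Table 1 translate into letter-grade probabilities and into the e^0.14 ≈ 1.15 odds ratio. The sign convention assumed here (logit P(grade < y) = τ_y − η, so a larger linear predictor η means higher grades) and the zeroing of the section/course random effects are simplifying assumptions for illustration.

```python
import math

# Illustrative sketch only (the paper's model was fit with brms in R).
# Assumed convention: logit P(grade < y) = tau_y - eta, so a larger linear
# predictor eta shifts probability mass toward higher grades. The random
# effects U_jk and W_k are set to zero (a typical section of a typical course).

TAUS = [  # cut-point intercepts from Table 1
    ("D-", 4.68), ("D", 5.09), ("D+", 5.94), ("C-", 6.27), ("C", 6.84),
    ("C+", 7.83), ("B-", 8.35), ("B", 8.98), ("B+", 10.07),
    ("A-", 10.81), ("A/A+", 11.70),
]
COEF = {"GPA": 3.26, "Midday": 0.06, "Afternoon": 0.14, "Evening": 0.17}
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A/A+"]

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grade_probs(gpa, time_of_day="Morning"):
    """Return P(grade = g) for each letter grade g."""
    eta = COEF["GPA"] * gpa + COEF.get(time_of_day, 0.0)  # Morning = baseline
    cum = [logistic(tau - eta) for _, tau in TAUS]  # P(grade < y) at each cut
    upper = cum + [1.0]
    lower = [0.0] + cum
    return {g: u - l for g, u, l in zip(GRADES, upper, lower)}

# A coefficient of 0.14 on Afternoon is an odds ratio of e^0.14:
print(round(math.exp(COEF["Afternoon"]), 2))  # 1.15

# Shifting a 3.0-GPA student from Morning to Afternoon moves probability
# mass toward higher grades:
morning = grade_probs(3.0, "Morning")
afternoon = grade_probs(3.0, "Afternoon")
print(afternoon["A/A+"] > morning["A/A+"])  # True
```

Because the proportional odds assumption applies the same shift at every cut point, the 1.15 odds ratio holds simultaneously for "B or better," "A- or better," and every other grade threshold.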
[Figure 3: Distribution of course averages. (Histogram: course average grade points 0-4 on the x-axis, frequency on the y-axis.)]

To allow an interpretation of the effect size in grade units rather than only as probabilities, we used the model to simulate what the average grade over all data points would be if all courses in the department were offered at the same time of day. The results, shown in Table 2, show that students perform 0.04 and 0.05 grade points better in afternoon and evening classes, respectively, than they do in morning classes.

Table 1: Estimated Parameters of the Proportional Odds Model.

Population-Level Effects:
                        Estimate  Est. Error  Lower-95% CrI  Upper-95% CrI  Significance
Intercept (D-)          4.68      0.15        4.40           4.97           ***
Intercept (D)           5.09      0.15        4.81           5.38           ***
Intercept (D+)          5.94      0.15        5.66           6.23           ***
Intercept (C-)          6.27      0.15        5.98           6.56           ***
Intercept (C)           6.84      0.15        6.56           7.14           ***
Intercept (C+)          7.83      0.15        7.54           8.12           ***
Intercept (B-)          8.35      0.15        8.07           8.65           ***
Intercept (B)           8.98      0.15        8.69           9.28           ***
Intercept (B+)          10.07     0.15        9.78           10.37          ***
Intercept (A-)          10.81     0.15        10.53          11.12          ***
Intercept (A/A+)        11.70     0.15        11.41          12.01          ***
Cumulative GPA          3.26      0.02        3.23           3.30           ***
Afternoon               0.14      0.06        0.02           0.26           *
Evening                 0.17      0.11        -0.05          0.39
Midday                  0.06      0.06        -0.06          0.17

Group-Level Effects:
sd(Course Intercept)    1.15      0.11        0.95           1.37
sd(Section Intercept)   0.61      0.01        0.58           0.64

Significance codes: 0 < *** < 0.001 < ** < 0.01 < * < 0.05

Table 2: Simulated average grade using the model, if all classes were offered at the same time of day.

Time       Average Grade   Difference from Morning
Morning    3.075           -
Midday     3.092           0.018
Afternoon  3.118           0.043
Evening    3.129           0.054

Figure 4 helps us to visualize that student performance is actually monotonically increasing throughout the day, with the worst performance in morning classes, and the best performance in evening classes.
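The conversion from predicted letter-grade distributions to the grade-point averages in Table 2 is just an expectation under the 4.0 scale described in Section 3. A minimal, self-contained sketch (the actual simulation averages this quantity over all 72,739 records with every section's time-of-day dummies forced to a single category):

```python
# Minimal sketch of the grade-unit conversion behind Table 2: given a
# model-predicted letter-grade distribution for one record, the simulated
# grade is its expectation on the 4.0 scale. Table 2 then averages this
# over all records under each forced time-of-day scenario.

GRADE_POINTS = {
    "A/A+": 4.0, "A-": 3.67, "B+": 3.33, "B": 3.0, "B-": 2.67, "C+": 2.33,
    "C": 2.0, "C-": 1.67, "D+": 1.33, "D": 1.0, "D-": 0.67, "F": 0.0,
}

def expected_grade(probs):
    """Expected grade points for one record's predicted grade distribution."""
    return sum(GRADE_POINTS[g] * p for g, p in probs.items())

# e.g. a record predicted to earn B+ or A- with equal probability:
print(round(expected_grade({"B+": 0.5, "A-": 0.5}), 2))  # 3.5

# The Table 2 effect sizes are simple differences of scenario averages:
print(round(3.118 - 3.075, 3))  # 0.043 (Afternoon minus Morning)
```

Note that this grade-unit framing is why a seemingly large odds ratio of 1.15 corresponds to an average shift of only about 0.04 grade points: the odds ratio acts on every grade threshold, but most of each student's probability mass does not cross a threshold.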
[Figure 4: Simulated average grade if all courses were offered at the same time of day. (x-axis: Morning, Midday, Afternoon, Evening; y-axis: grade.) Note that the y-axis does not start at 0. The 95% confidence intervals shown are basic bootstrap confidence intervals calculated using the distribution of student grades predicted by the model.]

It is also important to note that there is a large variance in grades between sections and courses, so in addition to the general trends, there is often large variation between any two sections of a given course. In Figure 5, we visualize the relative amount of uncertainty at the section (U_jk) and course (W_k) levels. It appears that much more of the variance in course performance comes from the course level rather than the section level, but there is still a significant amount of unexplained variance between sections in our model.

[Figure 5: Standard deviation of the section-level (U_jk) and course-level (W_k) error terms from the model, with the 95% credible intervals provided by the Bayesian estimation. (x-axis: source, Section vs. Course; y-axis: standard deviation.)]

6. DISCUSSION
6.1 Implications
When planning experiments on multiple sections of the same course, researchers should be aware of the differences between sections that may influence students' grades independent of the instructional techniques used, and plan accordingly. If they must, for some reason or another, conduct an educational experiment between sections that are taught at different times of day, or under other differing circumstances, they should be aware of the typical variance in grades that can be brought on by such circumstances, and ensure that the effect size of the intervention they are trying to study is significantly larger. Alternatively, if one section is expected to have a higher grade due to documented reasons (i.e. being in the afternoon vs. in the morning), they could use the expected-to-be better performing section as the control, and the expected-to-be worse section as the experimental group, counting on the intervention to have a large enough effect to overcome the small negative impact of scheduling.

Despite what we do know about trends in student performance based on scheduling, it is critical to remember that all the above statistics only show general trends, and cannot tell us about the relationship between any particular two course or section instances. Researchers should do all they can to ensure "all things equal" between their experimental and control groups, and should document all the information that they can about their course sections in the interest of good science, i.e. interpretability and reproducibility of their work. They should also be aware of and document the performance trends of the course they are experimenting with in particular, as some courses have much larger differences term-to-term or based on time of day than others do.

6.2 Limitations
As we have discussed, our study was limited by the data we were able to receive from the registrar at the University of Illinois at Urbana-Champaign, and by the way that the computer science department decided to schedule the courses, making it impossible for us to draw any conclusions about day-of-week effects. We also did not have access to attendance data, making it impossible to verify whether morning classes performed more poorly for the same reason as in [17], namely, that students miss morning classes more often than they miss afternoon and evening classes.

Another limitation is that our data come from a single department at a single university. Replications of our study using data from other universities will be useful to corroborate our findings and give education researchers more confidence in the way they plan their experiments.

Additionally, our results should not be interpreted to mean that a particular student will earn higher grades if they register for afternoon classes instead of morning classes, because we are using observational data where students have self-selected into courses, leading to selection bias. Our study is unable to make any statement about why student performance varies by time of day, but a great area of future work would be to investigate why these performance differences exist, and what types of interventions may be able to help mitigate them.

7. CONCLUSION
We find that in the computer science department studied, the odds of a student receiving a higher grade in an afternoon class are 1.15 times the odds of a student with the same GPA in a morning class earning a higher grade. According to simulations run using our model, this difference amounts to an average grade difference of 0.04 grade points between morning and afternoon classes. There is also a large unexplained variance in grades between sections of the same course. Based on these findings and prior work in this area, we assert that course scheduling information is an important piece of data which should be included in studies that make comparisons between treatments on different course sections. Based on our data set, we were not able to investigate trends in student performance based on which days of the week courses were scheduled for, and we found no overall trends for the term a course was offered in, or for the classroom building in which it was offered. Replication of our work, as well as work to answer the research questions which we were unable to answer given our data set, would be great future contributions to the literature.

8. ACKNOWLEDGMENTS
We would like to thank the computer science education research group at the University of Illinois at Urbana-Champaign for useful feedback and suggested references for this work.

9. REFERENCES
[1] A. M. Austin and L. Gustafson. Impact of course length on student learning. Journal of Economics and Finance Education, 5(1):26-37, 2006.
[2] K. Blaha, A. Monge, D. Sanders, B. Simon, and T. VanDeGrift. Do students recognize ambiguity in software design? A multi-national, multi-institutional report. In Proceedings of the 27th International Conference on Software Engineering (ICSE 2005), pages 615-616. IEEE, 2005.
[3] P.-C. Bürkner. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1):1-28, 2017.
[4] L. G. Carrington. The impact of course scheduling on student success in intermediate accounting. American Journal of Business Education (AJBE), 3(4):51-60, 2010.
[5] M. A. Carskadon, C. Vieira, and C. Acebo. Association between puberty and delayed phase preference. Sleep, 16(3):258-262, 1993.
[6] T. Clear, J. Whalley, P. Robbins, A. Philpott, A. Eckerdal, and M.-J. Laakso. Report on the final BRACElet workshop: Auckland University of Technology, September 2010. 2011.
[7] J. Corley, A. Stanescu, L. Baumstark, and M. C. Orsega. Paper or IDE? The impact of exam format on student performance in a CS1 course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 706-712, New York, NY, USA, 2020. Association for Computing Machinery.
[8] S. Fincher, B. Baker, I. Box, Q. Cutts, M. de Raadt, P. Haden, J. Hamer, M. Hamilton, R. Lister, and M. Petre. Programmed to succeed?: A multi-national, multi-institutional study of introductory programming courses. 2005.
[9] S. Fincher, R. Lister, T. Clear, A. Robins, J. Tenenberg, and M. Petre. Multi-institutional, multi-national studies in CSEd research: some design considerations and trade-offs. In Proceedings of the First International Workshop on Computing Education Research, pages 111-121, 2005.
[10] S. Fincher, M. Petre, J. Tenenberg, K. Blaha, D. Bouvier, T.-Y. Chen, D. Chinn, S. Cooper, A. Eckerdal, and H. Johnson. A multi-national, multi-institutional study of student-generated software. Kolin Kolistelut-Koli Calling Proceedings, pages 20-28, 2004.
[11] M. H. Hagenauer, J. I. Perryman, T. M. Lee, and M. A. Carskadon. Adolescent changes in the homeostatic and circadian regulation of sleep. Developmental Neuroscience, 31(4):276-284, 2009.
[12] G. L. Herman and S. Azad. A comparison of peer instruction and collaborative problem solving in a computer architecture course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 461-467, New York, NY, USA, 2020. Association for Computing Machinery.
[13] S. Krause-Levy, L. Porter, B. Simon, and C. Alvarado. Investigating the impact of employing multiple interventions in a CS1 course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 1082-1088, New York, NY, USA, 2020. Association for Computing Machinery.
[14] E. Kuo, M. M. Hull, A. Elby, and A. Gupta. Mathematical sensemaking as seeking coherence between calculations and concepts: Instruction and assessments for introductory physics. arXiv preprint arXiv:1903.05596, 2019.
[15] A. Lionelle, J. Grinslad, and J. R. Beveridge. CS 0: Culture and coding. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 227-233, New York, NY, USA, 2020. Association for Computing Machinery.
[16] R. Mansbach, A. Ferguson, K. Kilian, J. Krogstad, C. Leal, A. Schleife, D. Trinkle, M. West, and G. Herman. Reforming an undergraduate materials science curriculum with computational modules. Journal of Materials Education, 38(3-4):161-174, 2016.
[17] F. Marbouti, A. Shafaat, J. Ulas, and H. A. Diefes-Dux. Relationship between time of class and student grades in an active learning course. Journal of Engineering Education, 107(3):468-490, 2018.
[18] M. McCracken, V. Almstrum, D. Diaz, M. Guzdial, D. Hagan, Y. B.-D. Kolikant, C. Laxer, L. Thomas, I. Utting, and T. Wilusz. A multi-national, multi-institutional study of assessment of programming skills of first-year CS students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education, pages 125-180. 2001.
[19] A. Naeem Syeda, R. Engineer, and B. Simion. Analyzing the effects of active learning classrooms in CS2. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 93-99, New York, NY, USA, 2020. Association for Computing Machinery.
[20] J. Parham-Mocello and M. Erwig. Does story programming prepare for coding? In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 100-106, New York, NY, USA, 2020. Association for Computing Machinery.
[21] M. S. Peteranetz and L.-K. Soh. A multi-level analysis of the relationship between instructional practices and retention in computer science. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 37-43, New York, NY, USA, 2020. Association for Computing Machinery.
[22] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.
[23] K. Sanders, S. Fincher, D. Bouvier, G. Lewandowski, B. Morrison, L. Murphy, M. Petre, B. Richards, J. Tenenberg, and L. Thomas. A multi-institutional, multinational study of programming concepts using card sort data. Expert Systems, 22(3):121-128, 2005.
[24] T. A. Snijders and R. J. Bosker. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage, 2011.
[25] G. Sprint and E. Fox. Improving student study choices in CS1 with gamification and flipped classrooms. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 773-779, New York, NY, USA, 2020. Association for Computing Machinery.
[26] M. Stieff, B. L. Dixon, M. Ryu, B. C. Kumi, and M. Hegarty. Strategy training eliminates sex differences in spatial problem solving in a STEM domain. Journal of Educational Psychology, 106(2):390, 2014.