The Relationship Between Course Scheduling and Student Performance

Seth Poulsen (sethp3@illinois.edu), Carolyn J. Anderson (cja@illinois.edu), Matthew West (mwest@illinois.edu)
University of Illinois at Urbana-Champaign

ABSTRACT
Using 10 years of grade data from a university computer science department, we fit a multi-level proportional odds model and find that students earn a higher grade in an afternoon class at 1.15 times the odds for a morning class, even when controlling for GPA. This finding has implications both for student learning and for experimental studies that compare classes without considering the time of day at which they are taught. We find that there are no significant trends in student performance based on term when looking at the department as a whole, though there are such trends for certain courses in particular.

Keywords
course scheduling, GPA, research methods, multi-level models

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION
When evaluating the effectiveness of a new instructional technique or educational intervention, researchers would ideally test the intervention on many sections of the same course to increase confidence that the intervention is working as intended. Data would then be analyzed using multi-level regression to properly treat the variance that naturally occurs between sections of the same course [24].

Multi-level models have been used in computer science education, for example [21], but they are few and far between. Even though there have been many multi-institution, multi-national studies in computer science [2, 6, 8, 9, 10, 18, 23], even they often don't include enough clusters of data to be able to use a multi-level model.

As researchers and educators we understand the reasons that larger studies are not undertaken more often: even planning an educational intervention experiment with one experimental and one control section can be very resource intensive! In many situations, especially when first piloting new educational techniques, it is completely impractical to expect that researchers will be able to experiment on more than a single section of a course.

Unfortunately, experimenting with one or only a few sections of a course requires the researcher to assume that essentially all things are equal about the students taking the courses and the courses themselves, apart from the intervention.

Despite controlling for as many factors as possible, such as instructor, course assignments, tests, and more, there are still often factors that lie outside the researcher's control, such as the day of the week, time of day, term, and location that their course is scheduled for. Furthermore, students self-select into the section of the course that they want to take! These variations between sections may be introducing a selection bias that threatens the validity of these educational experiments.

In this year's SIGCSE Technical Symposium alone, there were 7 studies which tested new educational practices by experimenting with either one or just a few experimental and control sections of the same course, operating either explicitly or implicitly under the "all things equal" assumption [7, 12, 13, 15, 19, 20, 25]. While six of these seven studies clearly stated the year and term of the sections from which they collected data, only one of them stated the time of day and days of the week on which the sections were held. This method of comparing one or only a few sections of a course when assessing instructional practice is also used in other areas of discipline-based education research, including chemistry [26], physics [14], and materials science [16], to name a few.

The desire to check the validity of the "all things equal" assumption for experiments with multiple sections of the same course, and discussion with colleagues, led us to the following research questions:

1. Is student performance in a course related to the time of day the course is scheduled for?
2. Is student performance in a course related to the term?
3. Is student performance in a course related to the days of the week the course is held on?
4. Is student performance in a course related to the building in which the course is held?

2. LITERATURE REVIEW
It has been shown that adolescents struggle to perform to their fullest potential early in the morning, causing many school districts to push back school start times [5, 11]. However, there has not been enough work done to verify that this effect also holds true for college students [17].

Marbouti et al. analyzed data from 15 different sections of an introductory university engineering course and found that due to lower attendance in morning sections, the early morning sections of the course significantly under-performed other sections [17]. To our knowledge, no one so far has examined a data set including more than one course to see if this trend holds generally.

[Figure 1: All student grades in the data set. (Histogram: grade points 0-4 on the x-axis, frequency on the y-axis.)]

Most literature agrees that courses offered in condensed terms (such as most universities' summer terms) lead to students learning the material equally well or even better than courses that are taught over a full-length term [1].

When it comes to day-of-week scheduling, there is quite a division in the literature, with some studies finding that spacing lessons out over the week helps students learn more, while others find that students perform just as well when the course material is presented only one day a week [4]. Some
studies even suggest that the outcome depends on whether the material requires deep comprehension and analysis, or simply recall [4].

3. DATA
Our grade and course scheduling data was acquired from the registrar at the University of Illinois at Urbana-Champaign. Because our primary focus is the relationship between course scheduling and time of day, we removed topics and reading courses that were only taught once, as well as courses that are not scheduled, such as independent study and senior thesis courses. Summer courses were also removed from the data set, to avoid comparing versions of the same course which were taught on an entirely different time scale, and sometimes even with a different set of instructor expectations.

Drops and withdrawals were also removed from the data set. After cleaning the data, we were left with 72,739 student grades from 24,705 students across 1,938 sections of 101 courses. The grade data consists of letter grades, which we converted to grade points for the purposes of fitting the model (A → 4.0, A- → 3.67, B+ → 3.33, etc.). The overall mean grade in the data set is 3.100, and the median is 3.33 (B+).

At the University of Illinois, the Fall term starts in late August and ends in mid-December, and the Spring term starts in mid-January and ends in mid-May. We chose from the beginning to treat time of day as a categorical variable, where courses beginning before 10:00 a.m. were considered "Morning," courses starting between 10:00 a.m. and 2:00 p.m. were considered "Midday," courses starting between 2:00 p.m. and 5:00 p.m. were considered "Afternoon," and courses which started after 5:00 p.m. were considered "Evening" courses.

A look at the data set shows that of the 101 courses offered in the computer science department, 54 of them were always taught on the same days of the week every time they were taught, and another 30 were only taught in 2 different day configurations (e.g. a class was taught either Monday and Wednesday or Tuesday and Thursday, but not in any other day configurations). Because of this, we chose to leave day-of-week considerations out of our analysis entirely.

Figure 1 shows the distribution of student grades from the entire data set. As hinted at by Figure 2 and Figure 3, and revealed by deeper data exploration, there was sufficient variance in the performance of students between courses as well as between sections of the same course to justify grouping the data by course and by section for the fitting of the model, leading to a three-level model.

4. METHODS
We fit the data using a three-level model of the following form, where students are indexed by i, sections are indexed by j, courses are indexed by k, and y represents some grade (e.g. A, A-, B, etc.):

• Level 1 (student):
    ln[ P(grade_ijk < y) / P(grade_ijk ≥ y) ] = β_0jk + β_1jk · GPA_ijk

• Level 2 (section):
    β_0jk = γ_0k + γ_1k · Midday_jk + γ_2k · Afternoon_jk + γ_3k · Evening_jk + U_jk

• Level 3 (course):
    γ_0k = δ_0 + W_k

Midday_jk, Afternoon_jk, and Evening_jk are dummy codes denoting the time of day a course was held, and all of them being 0 represents a Morning class. Our model assumes that the error term at the section level, U_jk, and the error term at the course level, W_k, are multivariate normal distributions which are independent of one another. After substituting and gathering the error terms, we obtain the mixed model:

    ln[ P(grade_ijk < y) / P(grade_ijk ≥ y) ] = δ_0 + β_1jk · GPA_ijk + γ_1k · Midday_jk + γ_2k · Afternoon_jk + γ_3k · Evening_jk + W_k + U_jk

We used the R (version 3.6.3) package brms [3, 22] to fit the model using Bayesian estimation. We used brms with default priors, 3 chains, 1500 warm-ups, and 3000 iterations. After examining the R̂ values and trace plots, we concluded that the model converged.

We also fit a version of the model including a dummy code for term (Fall vs.
Spring) and found that there was no significant general trend for the relationship between term and course performance. However, fitting similar models for some individual courses revealed that some courses do have significant differences in performance between semesters, with some courses having better performance in the Fall and some having better performance in the Spring.

[Figure 2: Distribution of section averages. (Histogram: section average grade points 0-4 on the x-axis, frequency on the y-axis.)]

Finally, we fit a version of the model including a dummy code for whether or not the section was held in the computer science department building, with the hypothesis that sections held in the computer science department building would be more desirable and would thus fill up with more responsible students who registered on time. We found no significant relationship between student performance and the building in which the course was held.

5. RESULTS
The estimated parameters of the final model are listed in Table 1. This model allows us to estimate the relative probability that a student will receive each letter grade, given their cumulative GPA, the course and section of the course, and the time of day that the course was scheduled. The probability of a higher grade increases for higher values of GPA, and it also increases the later in the day the course is held. According to the model, the odds that a student receives a higher grade in an afternoon class (based on model fit information from Table 1) are e^0.14 = 1.15 times the odds for a student with the same GPA taking the same class in the morning. Additionally, holding all other variables constant, the odds of a higher grade in an evening class are e^0.17 = 1.19 times the odds in a morning class.
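The odds-ratio arithmetic above can be sketched concretely. The analysis itself was done with brms in R; the following Python sketch (not the authors' code) shows how cumulative-logit parameters like those reported in Table 1 translate into letter-grade probabilities and into the e^0.14 ≈ 1.15 odds ratio. The sign convention assumed here (logit P(grade < y) = τ_y − η, so a larger linear predictor η means higher grades) and the zeroing of the section/course random effects are simplifying assumptions for illustration.

```python
import math

# Illustrative sketch only (the paper's model was fit with brms in R).
# Assumed convention: logit P(grade < y) = tau_y - eta, so a larger linear
# predictor eta shifts probability mass toward higher grades. The random
# effects U_jk and W_k are set to zero (a typical section of a typical course).

TAUS = [  # cut-point intercepts from Table 1
    ("D-", 4.68), ("D", 5.09), ("D+", 5.94), ("C-", 6.27), ("C", 6.84),
    ("C+", 7.83), ("B-", 8.35), ("B", 8.98), ("B+", 10.07),
    ("A-", 10.81), ("A/A+", 11.70),
]
COEF = {"GPA": 3.26, "Midday": 0.06, "Afternoon": 0.14, "Evening": 0.17}
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A/A+"]

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def grade_probs(gpa, time_of_day="Morning"):
    """Return P(grade = g) for each letter grade g."""
    eta = COEF["GPA"] * gpa + COEF.get(time_of_day, 0.0)  # Morning = baseline
    cum = [logistic(tau - eta) for _, tau in TAUS]  # P(grade < y) at each cut
    upper = cum + [1.0]
    lower = [0.0] + cum
    return {g: u - l for g, u, l in zip(GRADES, upper, lower)}

# A coefficient of 0.14 on Afternoon is an odds ratio of e^0.14:
print(round(math.exp(COEF["Afternoon"]), 2))  # 1.15

# Shifting a 3.0-GPA student from Morning to Afternoon moves probability
# mass toward higher grades:
morning = grade_probs(3.0, "Morning")
afternoon = grade_probs(3.0, "Afternoon")
print(afternoon["A/A+"] > morning["A/A+"])  # True
```

Because the proportional odds assumption applies the same shift at every cut point, the 1.15 odds ratio holds simultaneously for "B or better," "A- or better," and every other grade threshold.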
[Figure 3: Distribution of course averages. (Histogram: course average grade points 0-4 on the x-axis, frequency on the y-axis.)]

To allow an interpretation of the effect size in grade units rather than only as probabilities, we used the model to simulate what the average grade over all data points would be if all courses in the department were offered at the same time of day. The results, shown in Table 2, show that students perform 0.04 and 0.05 grade points better in afternoon and evening classes, respectively, than they do in morning classes.

Table 1: Estimated Parameters of the Proportional Odds Model.

Population-Level Effects:
                        Estimate  Est. Error  Lower-95% CrI  Upper-95% CrI  Significance
Intercept (D-)          4.68      0.15        4.40           4.97           ***
Intercept (D)           5.09      0.15        4.81           5.38           ***
Intercept (D+)          5.94      0.15        5.66           6.23           ***
Intercept (C-)          6.27      0.15        5.98           6.56           ***
Intercept (C)           6.84      0.15        6.56           7.14           ***
Intercept (C+)          7.83      0.15        7.54           8.12           ***
Intercept (B-)          8.35      0.15        8.07           8.65           ***
Intercept (B)           8.98      0.15        8.69           9.28           ***
Intercept (B+)          10.07     0.15        9.78           10.37          ***
Intercept (A-)          10.81     0.15        10.53          11.12          ***
Intercept (A/A+)        11.70     0.15        11.41          12.01          ***
Cumulative GPA          3.26      0.02        3.23           3.30           ***
Afternoon               0.14      0.06        0.02           0.26           *
Evening                 0.17      0.11        -0.05          0.39
Midday                  0.06      0.06        -0.06          0.17

Group-Level Effects:
sd(Course Intercept)    1.15      0.11        0.95           1.37
sd(Section Intercept)   0.61      0.01        0.58           0.64

Significance codes: 0 < *** < 0.001 < ** < 0.01 < * < 0.05

Table 2: Simulated average grade using the model, if all classes were offered at the same time of day.

Time       Average Grade   Difference from Morning
Morning    3.075           -
Midday     3.092           0.018
Afternoon  3.118           0.043
Evening    3.129           0.054

Figure 4 helps us to visualize that student performance is actually monotonically increasing throughout the day, with the worst performance in morning classes, and the best performance in evening classes.
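The conversion from predicted letter-grade distributions to the grade-point averages in Table 2 is just an expectation under the 4.0 scale described in Section 3. A minimal, self-contained sketch (the actual simulation averages this quantity over all 72,739 records with every section's time-of-day dummies forced to a single category):

```python
# Minimal sketch of the grade-unit conversion behind Table 2: given a
# model-predicted letter-grade distribution for one record, the simulated
# grade is its expectation on the 4.0 scale. Table 2 then averages this
# over all records under each forced time-of-day scenario.

GRADE_POINTS = {
    "A/A+": 4.0, "A-": 3.67, "B+": 3.33, "B": 3.0, "B-": 2.67, "C+": 2.33,
    "C": 2.0, "C-": 1.67, "D+": 1.33, "D": 1.0, "D-": 0.67, "F": 0.0,
}

def expected_grade(probs):
    """Expected grade points for one record's predicted grade distribution."""
    return sum(GRADE_POINTS[g] * p for g, p in probs.items())

# e.g. a record predicted to earn B+ or A- with equal probability:
print(round(expected_grade({"B+": 0.5, "A-": 0.5}), 2))  # 3.5

# The Table 2 effect sizes are simple differences of scenario averages:
print(round(3.118 - 3.075, 3))  # 0.043 (Afternoon minus Morning)
```

Note that this grade-unit framing is why a seemingly large odds ratio of 1.15 corresponds to an average shift of only about 0.04 grade points: the odds ratio acts on every grade threshold, but most of each student's probability mass does not cross a threshold.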
[Figure 4: Simulated average grade if all courses were offered at the same time of day. (x-axis: Morning, Midday, Afternoon, Evening; y-axis: grade.) Note that the y-axis does not start at 0. The 95% confidence intervals shown are basic bootstrap confidence intervals calculated using the distribution of student grades predicted by the model.]

It is also important to note that there is a large variance in grades between sections and courses, so in addition to the general trends, there is often large variation between any two sections of a given course. In Figure 5, we visualize the relative amount of uncertainty at the section (U_jk) and course (W_k) levels. It appears that much more of the variance in course performance comes from the course level rather than the section level, but there is still a significant amount of unexplained variance between sections in our model.

[Figure 5: Standard deviation of the section-level (U_jk) and course-level (W_k) error terms from the model, with the 95% credible intervals provided by the Bayesian estimation. (x-axis: source, Section vs. Course; y-axis: standard deviation.)]

6. DISCUSSION
6.1 Implications
When planning experiments on multiple sections of the same course, researchers should be aware of the differences between sections that may influence students' grades independent of the instructional techniques used, and plan accordingly. If they must, for some reason or another, conduct an educational experiment between sections that are taught at different times of day, or under other differing circumstances, they should be aware of the typical variance in grades that can be brought on by such circumstances, and ensure that the effect size of the intervention they are trying to study is significantly larger. Alternatively, if one section is expected to have a higher grade due to documented reasons (i.e. being in the afternoon vs. in the morning), they could use the expected-to-be better performing section as the control, and the expected-to-be worse section as the experimental group, counting on the intervention to have a large enough effect to overcome the small negative impact of scheduling.

Despite what we do know about trends in student performance based on scheduling, it is critical to remember that all the above statistics only show general trends, and cannot tell us about the relationship between any particular two course or section instances. Researchers should do all they can to ensure "all things equal" between their experimental and control groups, and should document all the information that they can about their course sections in the interest of good science, i.e. interpretability and reproducibility of their work. They should also be aware of and document the performance trends of the course they are experimenting with in particular, as some courses have much larger differences term-to-term or based on time of day than others do.

6.2 Limitations
As we have discussed, our study was limited by the data we were able to receive from the registrar at the University of Illinois at Urbana-Champaign, and by the way that the computer science department decided to schedule the courses, making it impossible for us to draw any conclusions about day-of-week effects. We also did not have access to attendance data, making it impossible to verify whether morning classes performed more poorly for the same reason as in [17], namely, that students miss morning classes more often than they miss afternoon and evening classes.

Another limitation is that our data come from a single department at a single university. Replications of our study using data from other universities will be useful to corroborate our findings and give education researchers more confidence in the way they plan their experiments.

Additionally, our results should not be interpreted to mean that a particular student will earn higher grades if they register for afternoon classes instead of morning classes, because we are using observational data where students have self-selected into courses, leading to selection bias. Our study is unable to make any statement about why student performance varies by time of day, but a great area of future work would be to investigate why these performance differences exist, and what types of interventions may be able to help mitigate them.

7. CONCLUSION
We find that in the computer science department studied, the odds of a student receiving a higher grade in an afternoon class are 1.15 times the odds of a student with the same GPA in a morning class earning a higher grade. According to simulations run using our model, this difference amounts to an average grade difference of 0.04 grade points between morning and afternoon classes. There is also a large unexplained variance in grades between sections of the same course. Based on these findings and prior work in this area, we assert that course scheduling information is an important piece of data which should be included in studies that make comparisons between treatments on different course sections. Based on our data set, we were not able to investigate trends in student performance based on which days of the week courses were scheduled for, and we found no overall trends for the term a course was offered in, or for the classroom building in which it was offered. Replication of our work, as well as work to answer the research questions which we were unable to answer given our data set, would be great future contributions to the literature.

8. ACKNOWLEDGMENTS
We would like to thank the computer science education research group at the University of Illinois at Urbana-Champaign for useful feedback and suggested references for this work.

9. REFERENCES
[1] A. M. Austin and L. Gustafson. Impact of course length on student learning. Journal of Economics and Finance Education, 5(1):26-37, 2006.
[2] K. Blaha, A. Monge, D. Sanders, B. Simon, and T. VanDeGrift. Do students recognize ambiguity in software design? A multi-national, multi-institutional report. In Proceedings of the 27th International Conference on Software Engineering (ICSE 2005), pages 615-616. IEEE, 2005.
[3] P.-C. Bürkner. brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1):1-28, 2017.
[4] L. G. Carrington. The impact of course scheduling on student success in intermediate accounting. American Journal of Business Education (AJBE), 3(4):51-60, 2010.
[5] M. A. Carskadon, C. Vieira, and C. Acebo. Association between puberty and delayed phase preference. Sleep, 16(3):258-262, 1993.
[6] T. Clear, J. Whalley, P. Robbins, A. Philpott, A. Eckerdal, and M.-J. Laakso. Report on the final BRACElet workshop: Auckland University of Technology, September 2010. 2011.
[7] J. Corley, A. Stanescu, L. Baumstark, and M. C. Orsega. Paper or IDE? The impact of exam format on student performance in a CS1 course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 706-712, New York, NY, USA, 2020. Association for Computing Machinery.
[8] S. Fincher, B. Baker, I. Box, Q. Cutts, M. de Raadt, P. Haden, J. Hamer, M. Hamilton, R. Lister, and M. Petre. Programmed to succeed?: A multi-national, multi-institutional study of introductory programming courses. 2005.
[9] S. Fincher, R. Lister, T. Clear, A. Robins, J. Tenenberg, and M. Petre. Multi-institutional, multi-national studies in CSEd research: some design considerations and trade-offs. In Proceedings of the First International Workshop on Computing Education Research, pages 111-121, 2005.
[10] S. Fincher, M. Petre, J. Tenenberg, K. Blaha, D. Bouvier, T.-Y. Chen, D. Chinn, S. Cooper, A. Eckerdal, and H. Johnson. A multi-national, multi-institutional study of student-generated software. Kolin Kolistelut-Koli Calling Proceedings, pages 20-28, 2004.
[11] M. H. Hagenauer, J. I. Perryman, T. M. Lee, and M. A. Carskadon. Adolescent changes in the homeostatic and circadian regulation of sleep. Developmental Neuroscience, 31(4):276-284, 2009.
[12] G. L. Herman and S. Azad. A comparison of peer instruction and collaborative problem solving in a computer architecture course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 461-467, New York, NY, USA, 2020. Association for Computing Machinery.
[13] S. Krause-Levy, L. Porter, B. Simon, and C. Alvarado. Investigating the impact of employing multiple interventions in a CS1 course. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 1082-1088, New York, NY, USA, 2020. Association for Computing Machinery.
[14] E. Kuo, M. M. Hull, A. Elby, and A. Gupta. Mathematical sensemaking as seeking coherence between calculations and concepts: Instruction and assessments for introductory physics. arXiv preprint arXiv:1903.05596, 2019.
[15] A. Lionelle, J. Grinslad, and J. R. Beveridge. CS 0: Culture and coding. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 227-233, New York, NY, USA, 2020. Association for Computing Machinery.
[16] R. Mansbach, A. Ferguson, K. Kilian, J. Krogstad, C. Leal, A. Schleife, D. Trinkle, M. West, and G. Herman. Reforming an undergraduate materials science curriculum with computational modules. Journal of Materials Education, 38(3-4):161-174, 2016.
[17] F. Marbouti, A. Shafaat, J. Ulas, and H. A. Diefes-Dux. Relationship between time of class and student grades in an active learning course. Journal of Engineering Education, 107(3):468-490, 2018.
[18] M. McCracken, V. Almstrum, D. Diaz, M. Guzdial, D. Hagan, Y. B.-D. Kolikant, C. Laxer, L. Thomas, I. Utting, and T. Wilusz. A multi-national, multi-institutional study of assessment of programming skills of first-year CS students. In Working Group Reports from ITiCSE on Innovation and Technology in Computer Science Education, pages 125-180. 2001.
[19] A. Naeem Syeda, R. Engineer, and B. Simion. Analyzing the effects of active learning classrooms in CS2. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 93-99, New York, NY, USA, 2020. Association for Computing Machinery.
[20] J. Parham-Mocello and M. Erwig. Does story programming prepare for coding? In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 100-106, New York, NY, USA, 2020. Association for Computing Machinery.
[21] M. S. Peteranetz and L.-K. Soh. A multi-level analysis of the relationship between instructional practices and retention in computer science. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 37-43, New York, NY, USA, 2020. Association for Computing Machinery.
[22] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2020.
[23] K. Sanders, S. Fincher, D. Bouvier, G. Lewandowski, B. Morrison, L. Murphy, M. Petre, B. Richards, J. Tenenberg, and L. Thomas. A multi-institutional, multinational study of programming concepts using card sort data. Expert Systems, 22(3):121-128, 2005.
[24] T. A. Snijders and R. J. Bosker. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage, 2011.
[25] G. Sprint and E. Fox. Improving student study choices in CS1 with gamification and flipped classrooms. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, SIGCSE '20, pages 773-779, New York, NY, USA, 2020. Association for Computing Machinery.
[26] M. Stieff, B. L. Dixon, M. Ryu, B. C. Kumi, and M. Hegarty. Strategy training eliminates sex differences in spatial problem solving in a STEM domain. Journal of Educational Psychology, 106(2):390, 2014.