=Paper= {{Paper |id=Vol-2328/session3_paper2 |storemode=property |title=Emotional Experience of Students Interacting with a System for Learning Programming |pdfUrl=https://ceur-ws.org/Vol-2328/3_1_paper_14.pdf |volume=Vol-2328 |authors=Thomas James Tiam-Lee,Kaoru Sumi |dblpUrl=https://dblp.org/rec/conf/aaai/Tiam-LeeS19 }} ==Emotional Experience of Students Interacting with a System for Learning Programming== https://ceur-ws.org/Vol-2328/3_1_paper_14.pdf
                          Emotional Experience of Students Interacting with
                                a System for Learning Programming

                                        Thomas James Tiam-Lee and Kaoru Sumi
                                                       Future University Hakodate
                                                        116-2 Kamedanakanocho
                                                      Hakodate, Hokkaido 041-8655


                              Abstract

This paper discusses the emotional experience of students while interacting with a system for solving programming exercises. We collected data from 73 university students who each used the system for 45 minutes. They were also asked to provide self-report affect judgments which describe the emotions that they experienced at different moments in the session. Using this data, we performed an analysis of the emotional experience of students while interacting with the system content, focusing particularly on the transitions across different emotions. We also analyzed the facial expressions, pose, and logs in relation to the various emotional states. We believe this study can contribute to the recognition of student affect in programming activity, and can potentially be used in a variety of applications such as intelligent programming tutors.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

                 Introduction and Related Studies

Recently, there has been great interest in modelling the affective experience of students while engaging in learning tasks. Previous studies have found that positive emotions such as enjoyment are positively correlated with student achievement, whereas negative emotions such as boredom are negatively correlated with student achievement (Daniels et al. 2009). Emotions have also been shown to be associated with motivation and self-regulated learning (Mega, Ronconi, and De Beni 2014), as well as the quality of the learning experience (Cho and Heron 2015). These studies provide support for efforts in automatically recognizing the affective state of students while learning.

Good affective models of learning-specific emotions can open up opportunities for intelligent tutoring systems (ITS) by allowing them to respond not only to the cognitive state but also to the affective state of the student. For example, if confusion is detected, the ITS may provide an intervention in the form of a hint. This is referred to as affective tutoring. Previous ITS such as AutoTutor (D'Mello and Graesser 2012a), MetaTutor (Jaques et al. 2014), and FERMAT (Zatarain-Cabada et al. 2014) have shown the potential of affective tutoring in improving students' learning across various domains.

However, understanding affect in complex learning tasks still proves to be challenging. One such task is computer programming. In learning programming, students spend a lot of time writing, testing, and debugging code. Interactions with the tutor agent typically occur less frequently, and displays of affect through facial expressions tend to be more subtle as well. Despite this, previous studies have shown that students have a rich affective experience of learning-specific emotions while learning programming (Bosch, D'Mello, and Mills 2013; Bosch and D'Mello 2013), information that could hold much potential for improving programming instruction.

A study by Bosch, Chen, and D'Mello showed that it is difficult to automatically recognize fixed-point affect judgments in programming sessions by using face features alone. One way to address this is to combine face data with an understanding of the affective experience of students while doing the task to improve the recognition of affective states. In programming activity, there is a rich set of data from the interaction between the student and the system that could be used to help model affect.

In this paper, we discuss a statistical analysis of the affective experience of students interacting with a system for programming practice. We focus our discussion on the distribution of the emotions experienced by the students, how these emotions transition from one type to another, and which features are useful for recognizing these affective states.

               System Design and Experimental Setup

In this section we discuss the system design and the experimental setup for this study. We recruited 38 students from Future University Hakodate in Japan and 35 students from De La Salle University in the Philippines to participate in this study. We chose to recruit participants from two different countries not only to increase the amount of data that could be collected but also to investigate if there are similarities or differences between the two groups. All students who participated in the study were enrolled in a freshman introductory programming course in their respective universities at the time of the experiment.

Each student interacted with a system in which they had to solve a series of coding exercises. A screenshot of the system is shown in Figure 1.

Figure 1: Screenshot of the system used in the data collection

In each exercise, they must write the body of a function according to a given specification. For example, in one of the exercises, the function took in an integer value representing the length of the side of a square, and should return the area of the square. The students were not allowed to skip exercises until a correct solution had been provided. The order of the exercises was fixed for all of the subjects, and the exercises were arranged in increasing difficulty. Table 1 shows the exercises that were given to the students.

Table 1: Exercise List
No.  Problem Description
1    Return the area of a square given the length of the side.
2    Return the change given the price of the item, the number of items bought, and the amount paid by the customer.
3    Return the larger value between two integers.
4    Return the name of the winner of a rock paper scissors game given what each player played.
5    Return the age of the middle child given the ages of three brothers.
6    Return the total of all integers in a given array.
7    Given an array of integers, return the number of elements that are divisible by 3.
8    Return the sum of all the factors of a given integer.
9    Given an array of integers, return the number of times that the most frequently occurring element appears.

The students performed three types of interaction with the system throughout the session.

First, the student could edit the code. A code edit may be classified as an insertion (adding characters) or a deletion (removing characters). The system provided an interface for editing code similar to an integrated development environment (IDE).

Second, the student could test the code. This was done by providing sample values for each input parameter of the function, and then clicking the "Run Code" button to execute the code. The system responded to this command by displaying the result of the execution on the screen. The result may either be a successful compilation, a compilation error, or a runtime error. In the case of a successful compilation, the return value of the function was displayed. In the other two cases, the Java error message was displayed instead.

Third, the student could submit the code. The system then automatically checked the code by running it on a set of predefined test cases and comparing the results against the expected values. If the code passed all test cases, the system responded with a "correct" message and displayed the next problem. If the code failed at least one test case, the system displayed a "wrong" message. The student was not informed of the failing test cases nor the type of error that occurred, if any.

Each student used the system for 45 minutes, or until all the problems were solved correctly. Throughout the session, the system automatically logged information which comprised (1) a video recording of the student's face, (2) all code changes, and (3) all compilations and submissions.

At the end of the coding phase, the session was automatically split into intervals. The boundaries of these intervals corresponded to key moments in the session, which included: program compilation (testing the program), program submission, and the beginning and ending of each typing sequence. A typing sequence refers to a sequence of code changes (insertions and deletions) with a maximum interval of 5 seconds in between. We chose this limit because students sometimes pause briefly to think while they are typing. Intervals that were less than 5 seconds in length were merged with the succeeding interval until they were at least 5 seconds in length.
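The segmentation procedure described above can be sketched as follows. This is a minimal illustration under our reading of the procedure (the function names and the list-of-timestamps representation are our own, not the authors' implementation):

```python
def typing_sequences(edit_times, max_gap=5.0):
    """Group code-edit timestamps (in seconds) into typing sequences,
    allowing at most `max_gap` seconds between consecutive edits."""
    seqs, current = [], [edit_times[0]]
    for t in edit_times[1:]:
        if t - current[-1] <= max_gap:
            current.append(t)
        else:
            seqs.append(current)
            current = [t]
    seqs.append(current)
    return seqs


def merge_short_intervals(boundaries, min_len=5.0):
    """Drop interval boundaries so that every interval is at least
    `min_len` seconds long, merging short intervals forward into the
    succeeding interval."""
    kept = [boundaries[0]]
    for t in boundaries[1:]:
        if t - kept[-1] >= min_len:
            kept.append(t)
    return kept
```

For example, `merge_short_intervals([0, 2, 4, 10, 12, 18])` keeps only the boundaries `[0, 10, 18]`, so every resulting interval lasts at least 5 seconds.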
To collect affective data on the programming session, each student was asked to provide self-report affect and action judgments on each interval. A maximum limit of 150 intervals was set for each student to keep the annotation task manageable. If the session contained more than 150 intervals, we randomly chose 150 intervals for annotation. For each interval, the student was asked to select an emotion label describing the affective state that best described his or her experience during that interval, and an action label describing the type of action he or she was doing during that interval. To minimize subjectivity in self-reports, we provided a clear definition for each label. We chose the emotions of engagement, confusion, frustration, boredom, and neutral for the affective state labels based on a previous study which showed that these were the common emotions experienced by novice programmers (Bosch, D'Mello, and Mills 2013). Tables 2 and 3 show the emotion labels and action labels, respectively, along with their definitions.

Table 2: List of Affective State Labels
Engaged     You are immersed in the activity and enjoying it.
Confused    You have feelings of uncertainty on how to proceed.
Frustrated  You have strong feelings of anger or disappointment.
Bored       You feel a lack of interest in continuing with the activity.
Neutral     There is no apparent feeling.

Table 3: List of Action State Labels
Reading     You are reading the problem.
Thinking    You are thinking about the next step you will do.
Writing     You are translating your ideas by writing them into code.
Finding     You are trying to determine what the error is or thinking about how to fix it.
Fixing      You are trying to change something in the code to fix the error.
Unfocused   You are not focused on the task and your mind is thinking about other things.
Other       The above labels do not apply.

Data collection was performed from July to August 2018 at Future University Hakodate in Japan and De La Salle University in the Philippines. A total of 38 Japanese and 35 Filipino students, all taking freshman programming courses at the time, participated. We collected a total of 49 hours, 25 minutes, and 17 seconds of session data, comprising 9,702 annotated intervals. The average number of annotated intervals collected per student is 132.9, and the average length of an interval is 17.24 seconds, resulting in fairly fine-grained affect information.

                     Results of the Analysis

In this section, we present the results of the analysis done on the data. Our analysis aimed to explore the following questions:
• What is the distribution of the affective states experienced by the students?
• What are the common transitions between affective states?
• Which events trigger transitions between affective states?
• Which features could be used to recognize the presence of affective states?

Affective State Occurrences

In this section we present results regarding the occurrence of the different affective states as reported by the students. Table 4 shows the distribution of the different affective states reported by the Japanese and Filipino students. This distribution is based not on the number of intervals but on the total duration of the intervals.

Table 4: Distribution of Different Affective States
Emotion       Japanese   Filipino
Engaged        34.89%     36.09%
Confused       18.05%     19.51%
Frustration    16.20%     22.91%
Bored           7.94%      6.07%
Neutral        22.92%     15.42%

A high-level look at the distribution reveals similarities between the two groups. Engagement was the emotion reported the most, and comprised approximately a third of the total duration of the session. Meanwhile, confusion and frustration each comprised around a fifth of the total duration. In this experiment, Japanese students tended to report the neutral emotion more often than the Filipino students.

We also investigated whether the students' performance had a relationship with the distribution of the reported affective states. To do this, we divided the students into 5 groups based on the number of problems they were able to solve in the session. Students who were able to solve more problems were considered to have a better performance than those who solved fewer problems. We then computed the distribution of the affective states in each group. Figure 2 shows the results for both groups.

Noticeable trends in the distribution of affect reports could be observed based on the number of problems solved. Boredom and frustration decreased as the number of problems solved increased, while engagement increased along with the number of problems solved. Interestingly, a drop in the amount of engagement reports was observed in both groups in the 8-9 problems solved category. A probable cause of this is the increase in confusion and frustration due to the difficulty of the last two exercises offered by the system.
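The duration-weighted distributions reported in Table 4 can be computed with a simple aggregation. A minimal sketch (the `(label, duration)` pair representation is our assumption about the data layout):

```python
from collections import defaultdict


def affect_distribution(intervals):
    """Share of total session time spent in each affective state.
    `intervals` is a list of (affect_label, duration_seconds) pairs,
    so the result is weighted by duration rather than interval count."""
    totals = defaultdict(float)
    for label, duration in intervals:
        totals[label] += duration
    session_time = sum(totals.values())
    return {label: t / session_time for label, t in totals.items()}
```

Weighting by duration rather than by the number of intervals avoids over-counting many short intervals of the same state.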
These observations support previous literature that showed correlations between different types of emotions and student performance, and highlight the importance of managing student affect in learning systems.

Figure 2: Distribution of Affective State Reports Grouped by Performance

Affective State Transitions

In this section we present results on the transitions between different affective states. Table 5 shows the frequency of each transition between pairs of affect reports. We only considered intervals that are immediately consecutive. The data shows that certain transitions occurred more often than others. For example, transitions from engagement to confusion and vice versa occurred with large frequency in both groups.

Table 5: Frequency of Transitions Between Affective States (the row is the previous state and the column is the next state)
          Japanese                      Filipino
      En   Co   Fr   Bo            En   Co   Fr   Bo
En     -   70   17    9      En     -   54   11    8
Co    68    -   47   13      Co    68    -   30   17
Fr    21   35    -   10      Fr    15   26    -   17
Bo     8   10   11    -      Bo     9    8   13    -

To verify which transitions were significant, we applied a scoring metric for transition likelihoods between affective states proposed by D'Mello (2012). We computed the likelihood score of a state following another as the conditional probability of the next state occurring after the current state, normalized over the overall probability of the next state occurring. This likelihood score has a range of (−∞, 1], and a value > 0 means that the transition occurred above chance. We did not consider transitions to the same affective state. Likelihood scores were computed for every student and a two-tailed one-sample t-test was performed. Significant transitions (p ≤ 0.05) and their corresponding mean likelihood scores are shown in Figure 3.

Figure 3: Significant transition likelihood scores between affective states. Edge values are the mean likelihood values and the values in parentheses are the p values.

Similar observations were found for both the Japanese and Filipino groups. These results are consistent with the theoretical model of affect dynamics for complex learning proposed by D'Mello and Graesser (2012b). In this model, a student in the state of engagement may transition to a state of confusion when a hurdle is encountered. Depending on whether the hurdle is resolved or not, the student may transition back to engagement in the case of the former, or transition to a state of frustration in the case of the latter. In the model, frustration may transition to boredom, but this was not observed at a significant level in our data. This may be because several of the students did not report boredom at all, making it difficult to establish statistical significance.

Triggers of Affective Transitions

In this section, we present results on the events that trigger affective state transitions. We identified boundaries of the intervals that were associated with compilation and submission events. A compilation event refers to a point in the session where the student tested the code by providing sample inputs. This event could result in a compilation with no errors or a compilation with errors (syntax or runtime). On the other hand, a submission event could result in either a submission that passed or one that failed.
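The likelihood score used in these transition analyses follows D'Mello (2012). A minimal sketch, assuming the standard formulation (P(next|prev) − P(next)) / (1 − P(next)) with self-transitions excluded; the function and variable names are ours:

```python
def transition_likelihood(states, prev, nxt):
    """Likelihood that `nxt` follows `prev` above chance, in (-inf, 1].
    `states` is one student's sequence of consecutive affect reports."""
    # Consecutive pairs, ignoring transitions to the same state.
    pairs = [(a, b) for a, b in zip(states, states[1:]) if a != b]
    # Base rate of the next state across all transitions.
    p_next = sum(1 for _, b in pairs if b == nxt) / len(pairs)
    # Conditional probability of the next state given the previous one.
    after_prev = [b for a, b in pairs if a == prev]
    p_cond = after_prev.count(nxt) / len(after_prev)
    return (p_cond - p_next) / (1 - p_next)
```

A positive score for a student indicates the transition occurred above the base rate of the next state; per-student scores are then aggregated with a one-sample t-test as described above.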
We computed the likelihood of each affective state to follow each compilation or submission event. We did this for each student and performed a two-tailed one-sample t-test to determine which transition likelihoods were significant. We did not consider submission-passed events because there was a low number of occurrences of this event that were followed by an interval. We applied a Bonferroni correction, resulting in α = 0.004.

The results are shown in Figure 4. For the Japanese group, we found that a compilation error was likely to be followed by frustration at levels above chance. On the other hand, for the Filipino group, we found that compilation errors were likely to be followed by both confusion and frustration. In both groups, a compilation without any errors was likely to be followed by engagement.

Figure 4: Significant transition likelihoods from compilation and submission events to affective states. Edge values are the mean likelihood values and the values in parentheses are the p values.

We also performed a frequency analysis of n-grams on the session data to determine common sequences of actions that lead to an affective state. We considered n-grams of length 4 to 6. Table 6 shows the common sequences. We considered a sequence to be common if it accounts for at least 2% of all sequences leading to the target affective state with the same n-gram length.

Table 6: Common Action Sequences That Lead to an Affective State (frequency is occurrence over all n-grams of the same emotion and length)
Freq.    Sequence
12.92%   Writing → Thinking → Engaged
11.84%   Writing → Compile No Error → Engaged
 9.83%   Thinking → Writing → Engaged
 9.15%   Thinking → Compile No Error → Engaged
 8.36%   Writing → Thinking → Writing → Engaged
 8.75%   Writing → Thinking → Writing → Thinking → Engaged
14.9%    Thinking → Compile No Error → Confused
 8.16%   Finding Bug → Compile No Error → Confused
13.86%   Compile No Error → Thinking → Compile No Error → Confused
10.18%   Thinking → Compile No Error → Thinking → Compile No Error → Confused
 9.17%   Compile No Error → Thinking → Compile No Error → Thinking → Compile No Error → Confused
 8.74%   Finding Bug → Compile Error → Frustrated
 8.33%   Fixing Bug → Compile Error → Frustrated

It can be seen that transitions in the affective state are often observed after a compilation or submission. This is expected because these actions are the types of interactions that the system responds or gives feedback to, and the feedback likely triggers the change in affective state. Furthermore, it can be seen that engagement and confusion are associated with writing and thinking, while frustration is associated with finding and fixing bugs.

Predictors of Affect

In this section, we present results on features that are useful for predicting affect. We investigated log-based features (code compilations, typing, etc.) and face-based features. For face-based features, we used OpenFace, a computer vision toolkit capable of head pose estimation, eye gaze estimation, and action unit recognition from videos (Baltrusaitis et al. 2018).

Action units are a taxonomy of fundamental actions of facial muscles used in previous studies for emotion recognition (Ekman and Friesen 1975). An example of an action unit is raising the inner brow or raising the cheek. OpenFace has shown good inter-rater agreement with baselines set by human annotators across multiple datasets in AU detection (Baltrušaitis, Mahmoud, and Robinson 2015). Table 7 shows a list of the features that we considered.

In this analysis, we treated each interval as a separate instance, with the affect report as the class label. To determine which features are useful for recognizing affect, we used RELIEF-F feature ranking to rank features based on how discriminative they were against the closest neighboring instance with a different class. We identified the features that scored high in this ranking, and then performed further statistical analyses on these features. The following subsections discuss these.

Log-Based Features

We performed paired Wilcoxon signed-rank tests to determine whether action states significantly co-occur with various affective states. We found that engagement co-occurs significantly more with writing (µ = 0.49) than with thinking (µ = 0.25, p = 0.0000009), and co-occurs with thinking significantly more than with the other actions.
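These comparisons rest on the paired Wilcoxon signed-rank statistic. A minimal pure-Python sketch of the statistic, treating each student's pair of co-occurrence proportions as one paired sample (p-values would come from the statistic's null distribution, which is omitted here; the names are ours):

```python
def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank statistic W+ (the sum of the ranks
    of positive differences). Zero differences are discarded and tied
    absolute differences receive their average rank."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return sum(r for d, r in zip(diffs, ranks) if d > 0)
```

For instance, with per-student proportions `x = [0.5, 0.4, 0.6]` and `y = [0.3, 0.45, 0.2]`, the positive differences carry ranks 2 and 3, giving W+ = 5.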
Table 7: List of Features
Feature        Description
Log-based Features
insert         No. of insertions in the code
remove         No. of deletions in the code
type           No. of insertions and deletions in the code
compile err    No. of compilations with syntax error
compile        No. of compilations without syntax error
Pose-based Features (from OpenFace)
pose Tx        Location of the head in x axis in mm
pose Ty        Location of the head in y axis in mm
pose Tz        Location of the head in z axis in mm
pose Rx        Rotation of the head in x axis in radians
pose Ry        Rotation of the head in y axis in radians
pose Rz        Rotation of the head in z axis in radians
gaze angle x   Gaze angle x in world coord. in radians
gaze angle y   Gaze angle y in world coord. in radians
Face-based Features (from OpenFace)
AUs            Intensity of different action units

Confusion co-occurs significantly more with thinking (µ = 0.4) than with writing (µ = 0.16, p = 0.0000035), finding (µ = 0.21, p = 0.0023), and fixing (µ = 0.18, p = 0.0002). Meanwhile, frustration co-occurred more with finding (µ = 0.2) and fixing bugs (µ = 0.2) than with writing (µ = 0.17), but the difference is not significant.

Document insertions occurred significantly more when students were engaged (µ = 0.65) than when they were confused (µ = 0.34, p = 0.000000011), frustrated (µ = 0.35, p = 0.000046), or bored (µ = 0.25, p = 0.00078). Document deletions occurred significantly more when students were engaged (µ = 0.13) than when they were confused (µ = 0.09, p = 0.0063) or bored (µ = 0.07, p = 0.0033). Overall, document changes occurred significantly more when students were engaged (µ = 0.77) than when they were confused (µ = 0.44, p = 0.000000047), frustrated (µ = 0.55, p = 0.0055), or bored (µ = 0.38, p = 0.0015). These findings suggest that document changes were indicative of engagement, and support our previous study in which confusion was classified using hidden Markov models with log-based features (Tiam-Lee and Sumi 2018).

AU04 - Brow Lowerer

AU04 is an action unit referred to as the "Brow Lowerer". As the name implies, it refers to the lowering of the eyebrow. Figure 5 shows some examples of this action unit.

Figure 5: Displays of AU04 (Brow Lowerer). The left image is a Japanese student with a slight AU04 display, while the right image is a Filipino student with a stronger AU04 display.

AU04 was ranked highly in RELIEF-F feature ranking for the classification tasks for engagement, confusion, and frustration. Students exhibited this action unit in the data when furrowing the brow, and also when repeatedly looking down at the keyboard while typing. In our data, there was an increased observation of AU04 in Japanese students because they tended to look down at the keyboard more than the Filipino students.

Upon further analysis, it can be seen that the mean intensity of AU04 increases in moments of confusion and frustration for both typing and non-typing intervals (see Table 8). This finding supports previous studies that have associated AU04 with confusion and frustration (Bosch, Chen, and D'Mello 2014; Grafsgaard, Boyer, and Lester 2011).

Table 8: Mean Intensity Display of AU04 in Typing and Non-typing Intervals
                         Japanese   Filipino
Engaged, not typing          0.64       0.21
Confused, not typing         0.66       0.22
Frustrated, not typing       0.98       0.31
Engaged, typing              0.89       0.21
Confused, typing             0.65       0.29
Frustrated, typing           1.35       0.35

We also performed a Wilcoxon signed-rank test to determine whether there was a difference in the mean intensity of AU04 in intervals of frustration compared to other affective states, and found that there was indeed a significant difference. The mean intensity display of AU04 in intervals of frustration across all groups is 1.24, while the mean intensity of the display of AU04 in all other intervals is 0.99, with a p value of 0.0065, indicating a significant difference.

Head Rotation Standard Deviation

Another feature that was ranked highly is the standard deviation of the head location with respect to the camera. We found that the mean standard deviation of the head location tends to be higher across both groups in intervals of boredom. This suggests that there is a bigger range of head movement when students are bored. The mean standard deviation values of the head location features are shown in Table 9.

                            Discussion

In this study, we looked into the affective experience of students while using a system for programming practice. This system did not provide any learning interventions such as learning prompts or hints to help the students. It only provided a basic interface to facilitate solving the programming exercises. Thus, it could be said that the environment used in our experiment was similar to that of students doing programming practice on their own without any guidance. This
                                                                  adding support to the idea that these observations on the af-
Table 9: Head Location Average Standard Deviation Across          fective experience of students are consistent across different
Different Emotions                                                environments.
              Engaged Confused Frustrated Bored                      That being said, there are some differences between the
 Japanese Group                                                   two groups observed in our data. For example, in our ex-
 location X     12.10           15.75         15.00    20.87      periment the Japanese students reported the ”neutral” (no
 location Y     10.16           10.55         11.58    15.97      apparent feeling) emotion more than the Filipino students,
 location Z     13.91           15.67         16.67    23.27      despite the same definition of the affective state labels be-
 Filipino Group                                                   ing provided to the two groups. It is difficult to say whether
 location X     14.14           14.06         14.34    17.25      this was because the Japanese students really felt less emo-
 location Y      9.80            9.02          9.01    10.68      tions or because they tend to be more reluctant to report their
 location Z     10.75           11.73         11.96    15.74      emotions.
                                                                     Another noticeable difference that can have implications
                                                                  in the implementation of ITS is that the Japanese students
                                                                  tend to have higher intensities of AU04 (brow lowerer) com-
was different from previous similar studies in which learning     pared to the Filipino students. Upon closer inspection, this
interventions and prompts were given to elicit emotions.          was because the Japanese students tend to look down at key-
   Despite this, we found that students still experienced         board more while typing, causing their eyebrows to move
learning-specific emotions all throughout the sessions, and       downwards, which was being detected as AU04. Considera-
that they transition between these emotions. We were able to      tions like this have to be made when designing systems for
confirm transitions in the theoretical model of affect dynam-     practical use.
ics for complex learning tasks, which show that engagement
generally transitions to confusion when hurdles are encoun-
tered, and confusion can either transition back to engage-
                                                                                         Conclusion
ment if the confusion is resolved, or transition to frustration   In this paper, we presented an analysis of the affective ex-
if not resolved.                                                  perience of students while interacting with a system for pro-
   On average, we found that frustration accounted for            gramming practice. We believe that our findings can provide
around a fifth of all the emotions experienced. This could        insights in the development and implementation of affect-
have been potentially addressed if confusion could be re-         aware intelligent tutoring systems for programming.
solved through tutor intervention. This is supported by our
findings that students who performed better (i.e. solved more                        Acknowledgments
problems) experienced less negative affective states like
                                                                  The authors would like to thank Mr. Fritz Kevin Flores, Mr.
frustration and boredom, and at the same time experienced
                                                                  Manuel Carl Toleran, and Mr. Kayle Anjelo Tiu for facilitat-
more engagement. This shows that there is potential for in-
                                                                  ing the data collection sessions in De La Salle University in
telligent programming tutors to improve the learning expe-
                                                                  the Philippines.
rience of students.
   Transitions between affective states were often observed
during code compilations and submissions. These are also                                  References
the points in the session where the system gives feedback         Baltrusaitis, T.; Zadeh, A.; Lim, Y. C.; and Morency, L.-P.
(i.e., displays if the output of the code or displays if the      2018. Openface 2.0: Facial behavior analysis toolkit. In Au-
submission passed or failed). This implies that changes in        tomatic Face & Gesture Recognition (FG 2018), 2018 13th
the affective state could be more easily triggered by sys-        IEEE International Conference on, 59–66. IEEE.
tem feedback. And if appropriate interventions could be           Baltrušaitis, T.; Mahmoud, M.; and Robinson, P. 2015.
displayed, intelligent programming tutors could potentially       Cross-dataset learning and person-specific normalisation for
control transitions of negative affective states to more posi-    automatic action unit detection. In Automatic Face and Ges-
tive ones.                                                        ture Recognition (FG), 2015 11th IEEE International Con-
   We also looked into the features that could be useful for      ference and Workshops on, volume 6, 1–6. IEEE.
predicting affect, and found statistical evidence that asso-
                                                                  Bosch, N., and D’Mello, S. 2013. Sequential patterns of af-
ciated certain log-based and face-based features in recog-
                                                                  fective states of novice programmers. In The First Workshop
nizing certain emotions. We found that document changes
                                                                  on AI-supported Education for Computer Science (AIEDCS
(typing), compilations, AU04 (lowering of the brow), and
                                                                  2013), 1–10.
head location standard deviation could be useful features for
predicting affect. Face-based features alone are difficult to     Bosch, N.; Chen, Y.; and D’Mello, S. 2014. Its written
use in affect recognition, as was shown in previous studies,      on your face: detecting affective states from facial expres-
but combining them with log-based features and a mdoel of         sions while learning computer programming. In Interna-
affect occurrence and transition could potentially improve        tional Conference on Intelligent Tutoring Systems, 39–44.
performance.                                                      Springer.
   We conducted the same experiment on two different uni-         Bosch, N.; D’Mello, S.; and Mills, C. 2013. What emo-
versities, and we were able to achieve very similar reuslts,      tions do novices experience during their first computer pro-
gramming learning session? In International Conference on
Artificial Intelligence in Education, 11–20. Springer.
Cho, M.-H., and Heron, M. L. 2015. Self-regulated learning: the role of motivation, emotion, and use of learning strategies in students’ learning experiences in a self-paced online mathematics course. Distance Education 36(1):80–99.
Daniels, L. M.; Stupnisky, R. H.; Pekrun, R.; Haynes, T. L.;
Perry, R. P.; and Newall, N. E. 2009. A longitudinal analysis
of achievement goals: From affective antecedents to emo-
tional effects and achievement outcomes. Journal of Educa-
tional Psychology 101(4):948.
D’Mello, S., and Graesser, A. 2012a. AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems (TiiS) 2(4):23.
D’Mello, S., and Graesser, A. 2012b. Dynamics of affective
states during complex learning. Learning and Instruction
22(2):145–157.
D’Mello, S. 2012. Monitoring affective trajectories during
complex learning. In Encyclopedia of the Sciences of Learn-
ing. Springer. 2325–2328.
Ekman, P., and Friesen, W. V. 1975. Unmasking the face: A
guide to recognizing emotions from facial cues.
Grafsgaard, J. F.; Boyer, K. E.; and Lester, J. C. 2011. Predicting facial indicators of confusion with hidden Markov models. In International Conference on Affective Computing and Intelligent Interaction, 97–106. Springer.
Jaques, N.; Conati, C.; Harley, J. M.; and Azevedo, R. 2014.
Predicting affect from gaze data during interaction with an
intelligent tutoring system. In International Conference on
Intelligent Tutoring Systems, 29–38. Springer.
Mega, C.; Ronconi, L.; and De Beni, R. 2014. What makes a good student? How emotions, self-regulated learning, and motivation contribute to academic achievement. Journal of Educational Psychology 106(1):121.
Tiam-Lee, T. J., and Sumi, K. 2018. Adaptive feedback
based on student emotion in a system for programming prac-
tice. In International Conference on Intelligent Tutoring
Systems, 243–255. Springer.
Zatarain-Cabada, R.; Barrón-Estrada, M. L.; Camacho, J. L. O.; and Reyes-García, C. A. 2014. Affective tutoring system for Android mobiles. In International Conference on Intelligent Computing, 1–10. Springer.