=Paper=
{{Paper
|id=Vol-2328/session3_paper2
|storemode=property
|title=Emotional Experience of Students Interacting with a System for Learning Programming
|pdfUrl=https://ceur-ws.org/Vol-2328/3_1_paper_14.pdf
|volume=Vol-2328
|authors=Thomas James Tiam-Lee,Kaoru Sumi
|dblpUrl=https://dblp.org/rec/conf/aaai/Tiam-LeeS19
}}
==Emotional Experience of Students Interacting with a System for Learning Programming==
Thomas James Tiam-Lee and Kaoru Sumi, Future University Hakodate, 116-2 Kamedanakanocho, Hakodate, Hokkaido 041-8655

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

===Abstract===

This paper discusses the emotional experience of students while interacting with a system for solving programming exercises. We collected data from 73 university students who each used the system for 45 minutes. They were also asked to provide self-report affect judgments which describe the emotions that they experienced at different moments in the session. Using this data, we performed an analysis of the emotional experience of students while interacting with the system, focusing particularly on the transitions across different emotions. We also analyzed the facial expressions, pose, and logs in relation to the various emotional states. We believe this study can contribute to the recognition of student affect in programming activity and can potentially be used in a variety of applications such as intelligent programming tutors.

===Introduction and Related Studies===

Recently, there has been great interest in modelling the affective experience of students while engaging in learning tasks. Previous studies have found that positive emotions such as enjoyment are positively correlated with student achievement, whereas negative emotions such as boredom are negatively correlated with student achievement (Daniels et al. 2009). Emotions have also been shown to be associated with motivation and self-regulated learning (Mega, Ronconi, and De Beni 2014), as well as with the quality of the learning experience (Cho and Heron 2015). These studies provide support for efforts to automatically recognize the affective state of students while learning.

Good affective models of learning-specific emotions can open up opportunities for intelligent tutoring systems (ITS) by allowing them to respond not only to the cognitive state but also to the affective state of the student. For example, if confusion is detected, the ITS may provide an intervention in the form of a hint. This is referred to as affective tutoring. Previous ITS such as AutoTutor (D'Mello and Graesser 2012a), MetaTutor (Jaques et al. 2014), and FERMAT (Zatarain-Cabada et al. 2014) have shown the potential of affective tutoring in improving students' learning across various domains.

However, understanding affect in complex learning tasks still proves to be challenging. One such task is computer programming. In learning programming, students spend a lot of time writing, testing, and debugging code. Interactions with the tutor agent typically occur less frequently, and displays of affect through facial expressions tend to be more subtle as well. Despite this, previous studies have shown that students have a rich affective experience of learning-specific emotions while learning programming (Bosch, D'Mello, and Mills 2013; Bosch and D'Mello 2013), information that could hold much potential for improving programming instruction.

A study by Bosch, Chen, and D'Mello showed that it is difficult to automatically recognize fixed-point affect judgments in programming sessions by using face features alone. One way to address this is to combine face data with an understanding of the affective experience of students while doing the task to improve the recognition of affective states. In programming activity, there is a rich set of data from the interaction between the student and the system that could be used to help model affect.

In this paper, we discuss a statistical analysis of the affective experience of students interacting with a system for programming practice. We focus our discussion on the distribution of the emotions experienced by the students, how these emotions transition from one type to another, and which features are useful for recognizing these affective states.

===System Design and Experimental Setup===

In this section we discuss the system design and the experimental setup for this study. We recruited 38 students from Future University Hakodate in Japan and 35 students from De La Salle University in the Philippines to participate in this study. We chose to recruit participants from two different countries not only to increase the amount of data that could be collected but also to investigate if there are similarities or differences between the two groups. All students who participated in the study were enrolled in a freshman introductory programming course in their respective universities at the time of the experiment.

Each student interacted with a system in which they have to solve a series of coding exercises. A screenshot of the system is shown in Figure 1.

Figure 1: Screenshot of the system used in the data collection.

In each exercise, they must write the body of a function according to a given specification. For example, in one of the exercises, the function took in an integer value representing the length of the side of a square, and should return the area of the square. The students were not allowed to skip exercises until a correct solution had been provided. The order of the exercises was fixed for all of the subjects, and the exercises were arranged in increasing difficulty. Table 1 shows the exercises that were given to the students.

{| class="wikitable"
|+ Table 1: Exercise List
! No. !! Problem Description
|-
| 1 || Return the area of a square given the length of the side.
|-
| 2 || Return the change given the price of the item, the number of items bought, and the amount paid by the customer.
|-
| 3 || Return the larger value between two integers.
|-
| 4 || Return the name of the winner of a rock paper scissors game given what each player played.
|-
| 5 || Return the age of the middle child given the ages of three brothers.
|-
| 6 || Return the total of all integers in a given array.
|-
| 7 || Given an array of integers, return the number of elements that are divisible by 3.
|-
| 8 || Return the sum of all the factors of a given integer.
|-
| 9 || Given an array of integers, return the number of times that the most frequently occurring element appears.
|}

The students performed three types of interaction with the system throughout the session.

First, the student could edit the code. A code edit may be classified as an insertion (adding characters) or a deletion (removing characters). The system provides an interface for editing code similar to an integrated development environment (IDE).

Second, the student could test the code. This was done by providing sample values for each input parameter of the function and then clicking the "Run Code" button to execute the code. The system responded to this command by displaying the result of the execution on the screen. The result may either be a successful compilation, a compilation error, or a runtime error. In the case of a successful compilation, the return value of the function was displayed. In the other two cases, the Java error message was displayed instead.

Third, the student could submit the code. The system then automatically checked the code by running it on a set of predefined test cases and comparing the results against the expected values. If the code passed all test cases, the system responded with a "correct" message and displayed the next problem. If the code failed at least one test case, the system displayed a "wrong" message. The student was not informed of the failing test cases nor of the type of error that occurred, if any.

Each student used the system for 45 minutes, or until all the problems were solved correctly. Throughout the session, the system automatically logged information which comprised (1) a video recording of the student's face, (2) all code changes, and (3) all compilations and submissions.

At the end of the coding phase, the session was automatically split into intervals. The boundaries of these intervals corresponded to key moments in the session, which included program compilation (testing the program), program submission, and the beginning and ending of each typing sequence. A typing sequence refers to a sequence of code changes (insertions and deletions) with a maximum interval of 5 seconds in between. We chose this limit because sometimes students pause for brief moments of thinking while they are typing. Intervals that were less than 5 seconds in length were merged with the succeeding interval until they were at least 5 seconds in length.

To collect affective data on the programming session, each student was asked to provide self-report affect and action judgments on each interval. A maximum limit of 150 intervals was set for each student to keep the annotation task manageable. If the session contained more than 150 intervals, we randomly chose 150 intervals for annotation. For each interval, the student was asked to select an emotion label describing the affective state that best described his or her experience during that interval, and an action label describing the type of action he or she was doing during that interval. To minimize subjectivity in self-reports, we provided a clear definition for each label. We chose the emotions of engagement, confusion, frustration, boredom, and neutral for the affective state labels based on a previous study which showed that these were the common emotions experienced by novice programmers (Bosch, D'Mello, and Mills 2013). Tables 2 and 3 show the emotion labels and action labels respectively, along with their definitions.

{| class="wikitable"
|+ Table 2: List of Affective State Labels
! Label !! Definition
|-
| Engaged || You are immersed in the activity and enjoying it.
|-
| Confused || You have feelings of uncertainty on how to proceed.
|-
| Frustrated || You have strong feelings of anger or disappointment.
|-
| Bored || You feel a lack of interest in continuing with the activity.
|-
| Neutral || There is no apparent feeling.
|}

{| class="wikitable"
|+ Table 3: List of Action State Labels
! Label !! Definition
|-
| Reading || You are reading the problem.
|-
| Thinking || You are thinking about the next step you will do.
|-
| Writing || You are translating your ideas by writing them into code.
|-
| Finding || You are trying to determine what the error is or thinking about how to fix it.
|-
| Fixing || You are trying to change something in the code to fix the error.
|-
| Unfocused || You are not focused on the task and your mind is thinking about other things.
|-
| Other || The above labels do not apply.
|}

Data collection was performed from July to August 2018 in Future University Hakodate in Japan and De La Salle University in the Philippines. A total of 38 Japanese students and 35 Filipino students who were at the time taking freshman programming courses participated. We were able to collect a total of 49 hours, 25 minutes, and 17 seconds of session data. This comprised 9,702 annotated intervals. The average number of annotated intervals collected per student is 132.9. The average length of an interval is 17.24 seconds, resulting in fairly fine-grained affect information.

===Results of the Analysis===

In this section, we present the results of the analysis done on the data. Our analysis aimed to explore the following questions:

* What is the distribution of the affective states experienced by the students?
* What are the common transitions between affective states?
* Which events trigger transitions between affective states?
* Which features could be used to recognize the presence of affective states?

====Affective State Occurrences====

In this section we present results regarding the occurrence of the different affective states as reported by the students. Table 4 shows the distribution of the different affective states reported by the Japanese and Filipino students. This is based not on the number of intervals but on the total duration of the intervals.

{| class="wikitable"
|+ Table 4: Distribution of Different Affective States
! Emotion !! Japanese !! Filipino
|-
| Engaged || 34.89% || 36.09%
|-
| Confused || 18.05% || 19.51%
|-
| Frustrated || 16.20% || 22.91%
|-
| Bored || 7.94% || 6.07%
|-
| Neutral || 22.92% || 15.42%
|}

A high-level look at the distribution reveals similarities between the two groups. Engagement was the emotion that was reported the most, and it comprised approximately a third of the total duration of the session. Meanwhile, confusion and frustration each comprised around a fifth of the total duration. In this experiment, Japanese students tended to report the neutral emotion more often than the Filipino students.

We also investigated if the students' performance had a relationship with the distribution of the reported affective states. To do this, we divided the students into 5 groups based on the number of problems they were able to solve in the session. Students who were able to solve more problems were considered to have a better performance than those who solved fewer problems. We then computed the distribution of the affective states in each group. Figure 2 shows the results for both groups.

Noticeable trends in the distribution of affect reports could be observed based on the number of problems solved. Boredom and frustration decreased as the number of problems solved increased, while engagement increased along with the number of problems solved. Interestingly, a drop in the amount of engagement reports was observed in both groups in the 8-9 problems solved category. A probable cause of this is the increase in confusion and frustration due to the difficulty of the last two exercises offered by the system.
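The duration-weighted distribution behind Table 4 (based on total interval duration rather than interval counts, as noted above) can be sketched as follows. This is an illustrative sketch only; the (start, end, label) interval layout is an assumption made for the example, not the authors' actual data format.

```python
# Sketch of a duration-weighted affect distribution (cf. Table 4).
# The interval layout (start_sec, end_sec, emotion_label) is an
# assumption for this example, not the authors' implementation.

def affect_distribution(intervals):
    """Compute each emotion's share of the total annotated duration.

    intervals: list of (start_sec, end_sec, emotion_label) tuples.
    Returns {emotion_label: fraction of total duration}.
    """
    totals = {}
    for start, end, label in intervals:
        totals[label] = totals.get(label, 0.0) + (end - start)
    grand = sum(totals.values())
    return {label: dur / grand for label, dur in totals.items()}

# One long engaged interval outweighs two short confused ones,
# which a plain per-interval count would miss.
sample = [(0, 30, "Engaged"), (30, 35, "Confused"), (35, 40, "Confused")]
print(affect_distribution(sample))  # {'Engaged': 0.75, 'Confused': 0.25}
```

Weighting by duration rather than by interval count avoids over-representing the many short intervals produced by the 5-second segmentation rule.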
These observations support previous literature that showed correlations between different types of emotions and student performance, and they highlight the importance of managing student affect in learning systems.

Figure 2: Distribution of Affective State Reports Grouped by Performance.

====Affective State Transitions====

In this section we present results on the transitions between different affective states. Table 5 shows the frequency of each transition between pairs of affect reports. We only considered intervals that are immediately consecutive. The data shows that certain transitions occurred more often than others. For example, transitions from engagement to confusion and vice versa occurred in large frequency for both of the groups.

{| class="wikitable"
|+ Table 5: Frequency of Transitions Between Affective States (the row is the previous state and the column is the next state)
! !! En !! Co !! Fr !! Bo
|-
! colspan="5" | Japanese
|-
! En
| – || 70 || 17 || 9
|-
! Co
| 68 || – || 47 || 13
|-
! Fr
| 21 || 35 || – || 10
|-
! Bo
| 8 || 10 || 11 || –
|-
! colspan="5" | Filipino
|-
! En
| – || 54 || 11 || 8
|-
! Co
| 68 || – || 30 || 17
|-
! Fr
| 15 || 26 || – || 17
|-
! Bo
| 9 || 8 || 13 || –
|}

To verify which transitions were significant, we applied a scoring metric for transition likelihoods between affective states proposed by D'Mello (2012). We computed the likelihood score of a state following another as the conditional probability of the next state occurring after the current state, normalized by the overall probability of the next state occurring; that is, L(A → B) = (P(B|A) − P(B)) / (1 − P(B)). This likelihood score has a range of (−∞, 1], and a likelihood value > 0 means that the transition occurred above chance. We did not consider transitions to the same affective state. Likelihood scores were computed for every student and a two-tailed one-sample t-test was performed. Significant transitions (p ≤ 0.05) and their corresponding mean likelihood scores are shown in Figure 3.

Figure 3: Significant transition likelihood scores between affective states. Edge values are the mean likelihood values and the values in parentheses are the p values.

Similar observations were found for both the Japanese and Filipino groups. These results are consistent with the theoretical model of affect dynamics for complex learning proposed by D'Mello and Graesser (2012b). In this model, a student in the state of engagement may transition to a state of confusion when a hurdle is encountered. Depending on whether the hurdle is resolved or not, the student may transition back to engagement in the case of the former, or transition to a state of frustration in the case of the latter. In the model, frustration may transition to boredom, but this was not observed at a significant level in our data. This may be because several of the students did not report boredom at all, making it difficult to establish statistical significance.

====Triggers of Affective Transitions====

In this section, we present results on the events that trigger affective state transitions. We identified boundaries of the intervals that were associated with compilation and submission events. A compilation event refers to a point in the session where the student tested the code by providing sample inputs. This event could result in a compilation with no errors or a compilation with errors (syntax or runtime). On the other hand, a submission event could result in either a submission that passed or one that failed.

We computed the likelihood of each affective state to follow each compilation or submission event. We did this for each student and performed a two-tailed one-sample t-test to determine which transition likelihoods were significant. We did not consider submission passed events because there was a low number of occurrences of this event that were followed by an interval. We applied a Bonferroni correction resulting in α = 0.004.

The results are shown in Figure 4. For the Japanese group, we found that a compilation error was followed by frustration at levels above chance. For the Filipino group, we found that compilation errors were likely to be followed by both confusion and frustration. In both groups, a compilation without any errors was likely to be followed by engagement.

Figure 4: Significant transition likelihoods from compilation and submission events to affective states. Edge values are the mean likelihood values and the values in parentheses are the p values.

We also performed frequency analysis of n-grams on the session data to determine common sequences of actions that lead to an affective state. We considered n-grams of length 4 to 6. We considered a sequence to be common if it accounts for at least 2% of all sequences leading to the target affective state with the same n-gram length. Table 6 shows the common sequences.

{| class="wikitable"
|+ Table 6: Common Action Sequences That Lead to an Affective State (frequency is occurrence over all n-grams of the same emotion and length)
! Freq. !! Sequence
|-
| 12.92% || Writing → Thinking → Engaged
|-
| 11.84% || Writing → Compile No Error → Engaged
|-
| 9.83% || Thinking → Writing → Engaged
|-
| 9.15% || Thinking → Compile No Error → Engaged
|-
| 8.36% || Writing → Thinking → Writing → Engaged
|-
| 8.75% || Writing → Thinking → Writing → Thinking → Engaged
|-
| 14.9% || Thinking → Compile No Error → Confused
|-
| 8.16% || Finding Bug → Compile No Error → Confused
|-
| 13.86% || Compile No Error → Thinking → Compile No Error → Confused
|-
| 10.18% || Thinking → Compile No Error → Thinking → Compile No Error → Confused
|-
| 9.17% || Compile No Error → Thinking → Compile No Error → Thinking → Compile No Error → Confused
|-
| 8.74% || Finding Bug → Compile Error → Frustrated
|-
| 8.33% || Fixing Bug → Compile Error → Frustrated
|}

It can be seen that transitions in the affective state are often observed after a compilation or submission. This is expected because these actions are the types of interactions that the system responds or gives feedback to. The feedback likely triggers the change in affective state. Furthermore, it can be seen that engagement and confusion are associated with writing and thinking, while frustration is associated with finding and fixing bugs.

====Predictors of Affect====

In this section, we present results on features that are useful for predicting affect. We investigated log-based features (code compilations, typing, etc.) and face-based features. For face-based features, we used OpenFace, a computer vision toolkit capable of head pose estimation, eye gaze estimation, and action unit recognition from videos (Baltrusaitis et al. 2018). Action units are a taxonomy of fundamental actions of facial muscles used in previous studies for emotion recognition (Ekman and Friesen 1975). An example of an action unit is raising the inner brow or raising the cheek. OpenFace has shown good inter-rater agreement with baselines set by human annotators across multiple datasets in AU detection (Baltrušaitis, Mahmoud, and Robinson 2015). Table 7 shows a list of the features that we considered.

{| class="wikitable"
|+ Table 7: List of Features
! Feature !! Description
|-
! colspan="2" | Log-based Features
|-
| insert || No. of insertions in the code
|-
| remove || No. of deletions in the code
|-
| type || No. of insertions and deletions in the code
|-
| compile err || No. of compilations with syntax error
|-
| compile || No. of compilations without syntax error
|-
! colspan="2" | Pose-based Features (from OpenFace)
|-
| pose Tx || Location of the head in x axis in mm
|-
| pose Ty || Location of the head in y axis in mm
|-
| pose Tz || Location of the head in z axis in mm
|-
| pose Rx || Rotation of the head in x axis in radians
|-
| pose Ry || Rotation of the head in y axis in radians
|-
| pose Rz || Rotation of the head in z axis in radians
|-
| gaze angle x || Gaze angle x in world coord. in radians
|-
| gaze angle y || Gaze angle y in world coord. in radians
|-
! colspan="2" | Face-based Features (from OpenFace)
|-
| AUs || Intensity of different action units
|}

In this analysis, we treated each interval as a separate instance, with the affect report as the class label. To determine which features are useful for recognizing affect, we used RELIEF-F feature ranking to rank features based on how discriminative they were against the closest neighboring instance with a different class. We identified the features that scored high in this ranking and then performed further statistical analyses on these features. The following subsections discuss these.

'''Log-Based Features.''' We performed paired Wilcoxon signed-rank tests to determine if action states significantly co-occur with various affective states. We found that engagement co-occurs significantly more with writing (µ = 0.49) than with thinking (µ = 0.25, p = 0.0000009), and co-occurs with thinking significantly more than with the other actions. Confusion co-occurs significantly more with thinking (µ = 0.4) than with writing (µ = 0.16, p = 0.0000035), finding (µ = 0.21, p = 0.0023), and fixing (µ = 0.18, p = 0.0002). Meanwhile, frustration co-occurred with finding (µ = 0.2) and fixing bugs (µ = 0.2) more than it co-occurred with writing (µ = 0.17), but the difference is not significant.

Document insertions occurred significantly more when students were engaged (µ = 0.65) than when they were confused (µ = 0.34, p = 0.000000011), frustrated (µ = 0.35, p = 0.000046), or bored (µ = 0.25, p = 0.00078). Document deletions occurred significantly more when students were engaged (µ = 0.13) than when they were confused (µ = 0.09, p = 0.0063) or bored (µ = 0.07, p = 0.0033). Overall, document changes occurred significantly more when students were engaged (µ = 0.77) than when they were confused (µ = 0.44, p = 0.000000047), frustrated (µ = 0.55, p = 0.0055), or bored (µ = 0.38, p = 0.0015). These findings suggest that document changes were indicative of engagement, and they support our previous study in which confusion was classified using hidden Markov models with log-based features (Tiam-Lee and Sumi 2018).

'''AU04 (Brow Lowerer).''' AU04 is an action unit referred to as the "Brow Lowerer". As the name implies, it refers to the lowering of the eyebrow. Figure 5 shows some examples of this action unit.

Figure 5: Displays of AU04 (Brow Lowerer). The left image is a Japanese student with a slight AU04 display, while the right image is a Filipino student with a stronger AU04 display.

AU04 was ranked highly in RELIEF-F feature ranking for the classification tasks for engagement, confusion, and frustration. Students exhibited this action unit in the data when furrowing the brow, and also when looking down at the keyboard repeatedly while typing. In our data, there was an increased observation of AU04 in Japanese students because they tend to look down at the keyboard more than the Filipino students.

Upon further analysis, it can be seen that the mean intensity of AU04 increases in moments of confusion and frustration for both typing and non-typing intervals (see Table 8). This finding supports previous studies that have associated AU04 with confusion and frustration (Bosch, Chen, and D'Mello 2014; Grafsgaard, Boyer, and Lester 2011).

{| class="wikitable"
|+ Table 8: Mean Intensity Display of AU04 in Typing and Non-typing Intervals
! !! Japanese !! Filipino
|-
| Engaged, not typing || 0.64 || 0.21
|-
| Confused, not typing || 0.66 || 0.22
|-
| Frustrated, not typing || 0.98 || 0.31
|-
| Engaged, typing || 0.89 || 0.21
|-
| Confused, typing || 0.65 || 0.29
|-
| Frustrated, typing || 1.35 || 0.35
|}

We also performed a Wilcoxon signed-rank test to determine if there was a difference in the mean intensity of AU04 in intervals of frustration compared to other affective states, and found that there was indeed a significant difference. The mean intensity display of AU04 in intervals of frustration across all groups is 1.24, while the mean intensity of the display of AU04 in all other intervals is 0.99, with a p value of 0.0065, indicating a significant difference.

'''Head Location Standard Deviation.''' Another feature that was ranked highly is the standard deviation of the head location with respect to the camera. We found that the mean standard deviation of the head location tends to be higher across both groups in intervals of boredom. This suggests that there is a bigger range of head movement when students are bored. The mean standard deviation values of the head location features are shown in Table 9.

{| class="wikitable"
|+ Table 9: Head Location Average Standard Deviation Across Different Emotions
! !! Engaged !! Confused !! Frustrated !! Bored
|-
! colspan="5" | Japanese Group
|-
| location X || 12.10 || 15.75 || 15.00 || 20.87
|-
| location Y || 10.16 || 10.55 || 11.58 || 15.97
|-
| location Z || 13.91 || 15.67 || 16.67 || 23.27
|-
! colspan="5" | Filipino Group
|-
| location X || 14.14 || 14.06 || 14.34 || 17.25
|-
| location Y || 9.80 || 9.02 || 9.01 || 10.68
|-
| location Z || 10.75 || 11.73 || 11.96 || 15.74
|}

===Discussion===

In this study, we looked into the affective experience of students while using a system for programming practice. This system did not provide any learning interventions such as learning prompts or hints to help the students. It only provided a basic interface to facilitate solving the programming exercises. Thus, it could be said that the environment used in our experiment was similar to that of students doing programming practice on their own without any guidance. This was different from previous similar studies in which learning interventions and prompts were given to elicit emotions.

Despite this, we found that students still experienced learning-specific emotions all throughout the sessions, and that they transition between these emotions. We were able to confirm transitions in the theoretical model of affect dynamics for complex learning tasks, which shows that engagement generally transitions to confusion when hurdles are encountered, and confusion can either transition back to engagement if the confusion is resolved, or transition to frustration if it is not resolved.

On average, we found that frustration accounted for around a fifth of all the emotions experienced. This could have been potentially addressed if confusion could be resolved through tutor intervention. This is supported by our findings that students who performed better (i.e., solved more problems) experienced fewer negative affective states like frustration and boredom, and at the same time experienced more engagement. This shows that there is potential for intelligent programming tutors to improve the learning experience of students.

Transitions between affective states were often observed during code compilations and submissions. These are also the points in the session where the system gives feedback (i.e., displays the output of the code or displays whether the submission passed or failed). This implies that changes in the affective state could be more easily triggered by system feedback. If appropriate interventions could be displayed, intelligent programming tutors could potentially steer transitions from negative affective states toward more positive ones.

We also looked into the features that could be useful for predicting affect, and found statistical evidence associating certain log-based and face-based features with certain emotions. We found that document changes (typing), compilations, AU04 (lowering of the brow), and head location standard deviation could be useful features for predicting affect. Face-based features alone are difficult to use in affect recognition, as was shown in previous studies, but combining them with log-based features and a model of affect occurrence and transition could potentially improve performance.

We conducted the same experiment at two different universities, and we were able to achieve very similar results, adding support to the idea that these observations on the affective experience of students are consistent across different environments.

That being said, there are some differences between the two groups observed in our data. For example, in our experiment the Japanese students reported the "neutral" (no apparent feeling) emotion more than the Filipino students, despite the same definition of the affective state labels being provided to the two groups. It is difficult to say whether this was because the Japanese students really felt fewer emotions or because they tend to be more reluctant to report their emotions.

Another noticeable difference that can have implications for the implementation of ITS is that the Japanese students tend to have higher intensities of AU04 (brow lowerer) compared to the Filipino students. Upon closer inspection, this was because the Japanese students tend to look down at the keyboard more while typing, causing their eyebrows to move downwards, which was being detected as AU04. Considerations like this have to be made when designing systems for practical use.

===Conclusion===

In this paper, we presented an analysis of the affective experience of students while interacting with a system for programming practice. We believe that our findings can provide insights for the development and implementation of affect-aware intelligent tutoring systems for programming.

===Acknowledgments===

The authors would like to thank Mr. Fritz Kevin Flores, Mr. Manuel Carl Toleran, and Mr. Kayle Anjelo Tiu for facilitating the data collection sessions in De La Salle University in the Philippines.

===References===

* Baltrusaitis, T.; Zadeh, A.; Lim, Y. C.; and Morency, L.-P. 2018. OpenFace 2.0: Facial behavior analysis toolkit. In Automatic Face & Gesture Recognition (FG 2018), 2018 13th IEEE International Conference on, 59–66. IEEE.
* Baltrušaitis, T.; Mahmoud, M.; and Robinson, P. 2015. Cross-dataset learning and person-specific normalisation for automatic action unit detection. In Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on, volume 6, 1–6. IEEE.
* Bosch, N., and D'Mello, S. 2013. Sequential patterns of affective states of novice programmers. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), 1–10.
* Bosch, N.; Chen, Y.; and D'Mello, S. 2014. It's written on your face: Detecting affective states from facial expressions while learning computer programming. In International Conference on Intelligent Tutoring Systems, 39–44. Springer.
* Bosch, N.; D'Mello, S.; and Mills, C. 2013. What emotions do novices experience during their first computer programming learning session? In International Conference on Artificial Intelligence in Education, 11–20. Springer.
* Cho, M.-H., and Heron, M. L. 2015. Self-regulated learning: The role of motivation, emotion, and use of learning strategies in students' learning experiences in a self-paced online mathematics course. Distance Education 36(1):80–99.
* Daniels, L. M.; Stupnisky, R. H.; Pekrun, R.; Haynes, T. L.; Perry, R. P.; and Newall, N. E. 2009. A longitudinal analysis of achievement goals: From affective antecedents to emotional effects and achievement outcomes. Journal of Educational Psychology 101(4):948.
* D'Mello, S., and Graesser, A. 2012a. AutoTutor and affective AutoTutor: Learning by talking with cognitively and emotionally intelligent computers that talk back. ACM Transactions on Interactive Intelligent Systems (TiiS) 2(4):23.
* D'Mello, S., and Graesser, A. 2012b. Dynamics of affective states during complex learning. Learning and Instruction 22(2):145–157.
* D'Mello, S. 2012. Monitoring affective trajectories during complex learning. In Encyclopedia of the Sciences of Learning. Springer. 2325–2328.
* Ekman, P., and Friesen, W. V. 1975. Unmasking the Face: A Guide to Recognizing Emotions from Facial Cues.
* Grafsgaard, J. F.; Boyer, K. E.; and Lester, J. C. 2011. Predicting facial indicators of confusion with hidden Markov models. In International Conference on Affective Computing and Intelligent Interaction, 97–106. Springer.
* Jaques, N.; Conati, C.; Harley, J. M.; and Azevedo, R. 2014. Predicting affect from gaze data during interaction with an intelligent tutoring system. In International Conference on Intelligent Tutoring Systems, 29–38. Springer.
* Mega, C.; Ronconi, L.; and De Beni, R. 2014. What makes a good student? How emotions, self-regulated learning, and motivation contribute to academic achievement. Journal of Educational Psychology 106(1):121.
* Tiam-Lee, T. J., and Sumi, K. 2018. Adaptive feedback based on student emotion in a system for programming practice. In International Conference on Intelligent Tutoring Systems, 243–255. Springer.
* Zatarain-Cabada, R.; Barrón-Estrada, M. L.; Camacho, J. L. O.; and Reyes-García, C. A. 2014. Affective tutoring system for Android mobiles. In International Conference on Intelligent Computing, 1–10. Springer.