Detecting emotions in a learning environment: a multimodal exploration

Sharanya Lal, Tessa H.S. Eysink, Hannie A.H. Gijlers, Willem B. Verwey and Bernard P. Veldkamp

University of Twente, Drienerlolaan 5, Enschede, 7522 NB, The Netherlands

Abstract
Learner emotions are intrinsically linked with learning experiences and academic outcomes. Therefore, intelligent learning environments need to be emotion-aware to bring learners to their zone of proximal development. In this paper, we describe the first steps towards such a system. In this study, we manipulated task difficulty with the aim of detecting the physiological indicators of accompanying emotions, namely boredom/anger (during an easy task), enjoyment (during a moderately challenging task) and frustration/boredom (during a difficult task). Twenty-one adults (13 females and 8 males, mean age = 24.1 years) participated in a repeated-measures quasi-experimental set-up. Data were collected via Empatica E4 wristbands and self-reports. Results indicate that varying task difficulty may be associated with changes in skin temperature, phasic and tonic skin conductance, and heart rate. The findings encourage further exploration, and considerations for study design are discussed.

Keywords¹
psychophysiology, wearables in education, affective computing, emotion detection

¹ Proceedings of the Doctoral Consortium of the Sixteenth European Conference on Technology Enhanced Learning, September 20–21, 2021, Bolzano, Italy (online).
EMAIL: s.lal@utwente.nl (A. 1); t.h.s.eysink@utwente.nl (A. 2); a.h.gijlers@utwente.nl (A. 3); w.b.verwey@utwente.nl (A. 4); b.p.veldkamp@utwente.nl (A. 5)
ORCID: 0000-0002-6228-8850 (A. 1); 0000-0001-5820-4469 (A. 2); 0000-0003-2406-6541 (A. 3); 0000-0003-3994-8854 (A. 4); 0000-0003-3543-2164 (A. 5)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction and background

1.1. Emotions in learning
Emotions play a significant role in learning, as evidenced by the growing body of work on the interaction of learner emotions, well-being, and learning outcomes [1], [2], [3], [4], [5]. For example, [5] found that the induction of positive emotions in learners resulted in higher learning transfer, greater mental engagement and lower levels of reported task difficulty. In another study, [6] found that positive emotions (namely enjoyment and pride) predicted high learning achievements, while the opposite was true for negative emotions (namely anger, anxiety, shame, boredom and hopelessness). Therefore, to optimise learning experiences and outcomes, it is essential to take learner emotions into account. In today's era of digital learning, this calls for intelligent learning systems that can detect learners' emotions to provide optimally adjusted support.

1.2. Theoretical perspectives
In their meta-analysis, which showed strong correlations between emotional, cognitive and learning processes in e-learning environments, [7] suggest fostering optimal levels of subjective control (i.e., a learner's appraisal of how much control they have over a task) and value (i.e., the value a learner places on a task). Their results align with, and their suggestions rely heavily on, Pekrun's [4] control-value theory, which states that the subjective appraisals of control and value are central to emotions related to learning. For example, if learners see positive value in a task and have high control of their actions, they experience enjoyment. On the other hand, if they see no value in the task, they feel bored irrespective of whether they have high or low control. Similarly, if learners find themselves unable to control an activity, they experience frustration irrespective of the value they place on it. Pekrun's [4] account of activity-related emotions draws on Csikszentmihalyi's [1] seminal work on 'flow', a state of extreme concentration in which someone is so engaged in the task at hand that they forget the passage of time. Flow theory suggests that learners in 'flow' experience enjoyment and happiness, and that this is achieved when one not only has a clear goal, a sense of purpose and immediate feedback, but also a balance of challenge and skill (with challenge and skill level being just above the average for the person) [1], [8]. This in turn shares similarities with one of the most significant concepts in learner-centric education, the Zone of Proximal Development (ZPD) [9], which posits that learning is optimal when a task is just out of the learner's reach and the assistance of a more skilled or knowledgeable person is available. Taking a cue from this, in this study we look at emotions in light of learners' perceptions of task difficulty, challenge to skill balance, absorption in a task, and control-value appraisals.

1.3. Detecting emotions
Emotion detection has traditionally relied on learner-reported data [10]. Such an approach has several limitations, including the subjective nature of self-reports and the likely temporal mismatch between when an emotional state occurs and when data are collected [11]. The latter could result in data being collected for a different moment in time, or even in inaccuracies when recalling past experiences. Consequently, there is much interest in alternative approaches to emotion detection that can provide objective, time-specific and reliable data. One approach that is notably gaining traction is the use of physiological measures to understand underlying psychological processes. For example, [12] found that emotional valence (i.e., the extent to which an emotion is negative or positive) was positively related to blood volume pulse (i.e., a measure of the changes in blood volume flowing through one's arteries and capillaries). Skin conductance (i.e., the skin's property of conducting electricity) has been found to reflect stress during a task [13] and emotional arousal [14]. In recent educational research specifically, [15] studied adolescent girls learning in maker-spaces and found that skin conductance was positively related to engagement. In another study, [16] measured average student heart rates (i.e., the number of heart beats per minute) during medical school lectures and found a steady decline from the start to the end of a lecture. They also found that heart rate significantly increased during periods of student interaction such as group-based problem solving. More recently, [17], in a study involving 67 students solving statistical exercises of varying difficulty, found that heart rate and skin temperature were significantly related to self-reported cognitive load, and skin temperature specifically to task performance. Studies like these suggest that these measures are useful indicators of challenge to skill balance, perceived task difficulty and task absorption, and can therefore offer a glimpse into learner emotions. Physiological signals that can now be assessed with portable devices give us access to vast amounts of uninterrupted, time-specific and objective data points, bringing us closer to understanding a learner's emotional state in real time. However, this research is still at a nascent stage, and there is value in advancing the body of literature on the topic (e.g., [18], [19], [20]).

2. Research aims of present study
The present study is the first step in our research project, which is geared towards developing an intelligent learning system that adapts to a learner's emotions so as to bring them to their ZPD. Therefore, this paper focuses on emotion detection. To this end, a repeated-measures quasi-experimental design was adopted wherein physiological data in combination with self-reported measures were used to detect emotional states. The physiological signals investigated in the study were skin conductance, skin temperature, blood volume pulse and heart rate. Emotional states were elicited primarily through the manipulation of task difficulty in a digital learning environment designed to teach programming skills. This manipulation (see Methods) was done with the expectation that it would lead to differences in learners' perceptions of challenge to skill balance and task absorption, and therefore in their emotions. Drawing on the ideas of Csikszentmihalyi [1] and Pekrun [4] and on past studies of psychophysiological measures, several conjectures were made:
1. For the task that was too easy, learners would perceive a mismatch between challenge and skills and have low absorption in the task. Based on their appraisal of control over and value of the task, they would experience either boredom (no value, high control) or anger (negative value, high control). Boredom, being a deactivating emotion (i.e., one that is associated with low arousal), would be associated with low skin conductance and heart rate. Anger, on the other hand, being an activating emotion (i.e., one that is associated with high arousal), would be associated with high skin conductance and heart rate.
2. For the task that was too difficult, the expectation was that learners would perceive a mismatch between challenge and skills and have low absorption in the task. Based on their control and value appraisals of the task, they would experience either frustration (positive/negative value, low control) or boredom (no value, low control). Unlike boredom, frustration, being an activating emotion, would be associated with high skin conductance and heart rate.
3. For the task that was neither too difficult nor too easy, it was expected that learners would perceive a balance between the challenge and their skills and have high task absorption. An appraisal of high control and high value of the task would be associated with a positive emotional state (i.e., enjoyment). Enjoyment, being an activating emotion, would be associated with high skin conductance and heart rate.
We also expected blood volume pulse to be an indicator of emotional valence (see [12]) and skin temperature to be high during the difficult task [17].

Emotional states were also elicited through a sample of images taken from the Open Affective Standardised Image Set (OASIS) [21] (described in Methods). The hypothesis was that the valence and arousal associated with the different images would be reflected in the physiological signals, so that these could act as reference points when interpreting emotions during the programming tasks. Thus, this study aimed to detect psychophysiological indicators (if any) of learner emotions associated with tasks of varying difficulty.

3. Methods

3.1. Participants
Participants were 21 university students and working professionals based in the Netherlands (13 females and 8 males, 19-32 years old, mean age = 24.14 years). The sample consisted of persons of 6 nationalities and different educational levels (11 bachelor's students, 1 bachelor's degree holder, 8 master's degree holders and 1 PhD student). All participants had at least a working knowledge of English and basic computer skills. Participation was voluntary, and active consent was obtained from all participants before the start of the experiment.

3.2. Materials

3.2.1. Primary stimuli set – programming tasks
In the learning environment [22], participants programmed instructions by joining blocks of code to control a red 'robot' (see Figure 1). The goal was to make the robot reach the end of its path by coding its trajectory.
Paths could be 5-, 10- or 15-step, each requiring a longer or more sophisticated piece of code than the previous. The environment also had a free-play 'Sandbox' mode, in which participants were free to explore the environment in any way they wanted; there was no specific aim to this activity.

Figure 1: In the digital learning environment, participants selected blocks of code (left pane), edited and joined them to form a piece of code (center pane) that would move the red robot to the end of its path (right pane).

Three tasks of varying difficulty were designed within the learning environment. The moderately challenging task was to complete the 5-, 10- and 15-step paths (see Figure 2) within 10 minutes. The easy task was to do the 5-step path over and over again for 10 minutes. The difficult task was to 'decipher the aim and rules of the Sandbox' and 'complete it successfully' in 10 minutes. This was considered 'difficult' because the Sandbox mode does not actually have a tangible goal or rules, thus making the task a wild goose chase (however, participants were not aware of this fact). User responses during pilot testing of the environment and tasks concurred with these expectations.

Figure 2: In the learning environment, one could either code to make the red robot reach the end of its 5- (Top-Left), 10- (Top-Right) or 15-step (Bottom-Left) path, or explore freely in the Sandbox mode (Bottom-Right).

3.2.2. Baseline-measurement stimulus
A video with the instruction "Sit still and relax" was displayed for 5 minutes. At the 4 min 50 s mark, an audio signal indicated the end of the rest period. At this point, the phrase "I feel: " followed by a smiley meter (described in a subsequent sub-section) appeared on the screen for 10 seconds.

3.2.3. Secondary stimuli set – images
A set of 35 images (500 × 400 pixels) was sampled from OASIS [21]: 13 positive (for example, a puppy in a teacup), 10 negative (for example, garbage) and 12 neutral (for example, a tiled roof). The classification of the images as positive, negative or neutral was based on participant-reported valence in the original study. While sampling, graphic and sexually explicit images were excluded. The images were presented one after the other with intermittent 5 s pauses, during which a blank screen was shown. Each image was displayed for 10 seconds. At the 6th second, a smiley meter (described in a subsequent paragraph) along with the phrase "This photo makes me feel…" appeared below the image and stayed visible until the end of the 10th second.

3.2.4. Hardware and software set-up
Physiological data were collected using the E4 biosensing wristband. The E4 makes use of an electrodermal activity sensor that measures sympathetic nervous system arousal via stainless steel electrodes placed on the ventral wrist. This arousal is quantified in terms of skin conductance, which is measured in microSiemens (µS) and sampled at 4 Hz (i.e., 4 readings per second). Skin temperature was collected in degrees Celsius (°C) via the E4's infrared thermopile sensor at a sampling frequency of 4 Hz. Blood volume pulse was collected from the E4's photoplethysmography (PPG) sensor placed on the dorsal wrist and was sampled at 64 Hz. Heart rate (calculated per 10 s) was derived from blood volume pulse. In addition, acceleration data (indicating movement) from the E4's accelerometer were collected at 32 Hz. All data were streamed to Empatica's cloud-based repository via an Android application on a mobile phone, which in turn was connected via Bluetooth to the E4. The internal clock of the E4 was synchronised with that of the computer on which the stimuli were loaded. A screen recorder was set up on the computer to capture timestamps of the different stimuli and digital behaviour during the programming tasks. A handheld timer was used to facilitate and keep track of the different activities in the study.

3.2.5. Self-reports
Self-reported data were collected using several tools:
Smiley meter: A five-point smiley meter [23] was used to collect participants' perceptions of the different stimuli during the study. Participants were expected to reflect on how the stimulus (a programming task, a baseline activity or an image) made them feel and point to the smiley that best represented their emotional state. The scale was used unmarked to avoid putting specific affect-related words into participants' heads.
Short flow scale (SFS) and task difficulty scale: A 20-item short flow scale [24] was used as a self-report of experiences during the three programming tasks. The SFS has 2 sub-scales, 'Challenge to skill balance' (Chal2Skill) (11 items) and 'Task Absorption' (Task_Absorption) (9 items) [24]. Since two statements in the scale, "It was boring for me" and "My attention was not engrossed at all by the activity", were negatively framed, they were reverse-coded. Testing for reliability, we found Cronbach's α = .92, α = .79 and α = .91 for the SFS for the moderately challenging, easy and difficult task, respectively. Reliability tests were also performed for each sub-scale. The sub-scales Chal2Skill and Task_Absorption had a) Cronbach's α = .95 and α = .74, respectively, for the moderately challenging task, b) α = .91 and α = .93, respectively, for the easy task, and c) α = .88 and α = .90, respectively, for the difficult task. Consequently, new variables were computed as the mean of each sub-scale and used for further analyses. It is important to note that low and high Chal2Skill ratings denote an imbalance of challenge and skill (i.e., a task is too difficult or too easy, respectively), while a moderate Chal2Skill rating denotes a balance of challenge and skill.
Another self-report measure used after the programming tasks was a one-item scale on perceived task difficulty (henceforth referred to as the Task_Difficulty scale). The scale consisted of the following item: 'Was this task 1) Too easy 2) Easy 3) Just right 4) Difficult 5) Very difficult?'
Interview: An audio-recorded, face-to-face, semi-structured interview was conducted at the end of the study to glean participants' experiences during the experiment. Participants were asked how they were feeling at the start and end of the study, whether they could describe their experiences during the different programming tasks and baselines, and their rationale for selecting a particular smiley on the corresponding smiley meters.

3.3. Procedure
This study took place during the Covid-19 pandemic. Consequently, participants received hygiene and safety guidelines by e-mail, and the experimental space and all equipment were sanitized before each use. On the day of the study, participants were individually seated in a closed lab space set up to minimize external distractions. Demographic data of participants, namely age, sex, nationality, handedness, prior knowledge of programming and educational level, were collected. Participants then received a general outline of the experimental set-up, procedure, tools and expected code of conduct. Once ready, they were fitted with the Empatica E4 on their non-dominant hand to mitigate the effects of hand movements, making sure that the wristband's sensors made complete skin contact and that the electrodes for skin conductance detection were in line with the gap between the middle and ring fingers. The E4 was then switched on, and readings were checked to confirm that a stable connection had been established.
Participants then faced a computer screen with their non-dominant hand either on their lap or on the table. They first watched an instructional video outlining the components of the learning environment and how to navigate it. They were then guided by the baseline video, during which they sat still and could either look at the computer screen or the white wall behind it, or keep their eyes closed. Participants then did the three programming tasks one after the other. The completion of the tasks was followed by another baseline reading, then a viewing of the images, and a third and final baseline reading. After each baseline, programming task and image, participants indicated their emotional state on the smiley meter. Thus, for each participant, a total of 41 smiley meter ratings were collected. Meanwhile, the researcher kept time, took notes and checked that the wristband was collecting a continuous stream of data. Participants then filled in three copies of the SFS and Task_Difficulty scale, once for each programming task, were interviewed and were finally debriefed about the purpose of the study. Figure 3 shows the experimental procedure.

Figure 3: Study procedure

3.4. Data pre-processing
Blood volume pulse, heart rate, skin conductance and skin temperature readings were obtained as separate files. These were combined using a Python program that took the earliest and latest timestamps and interpolated all readings between the two. This involved bringing all data onto a common 0.25 second temporal resolution (in keeping with the 4 Hz sampling rate of the electrodermal activity sensor). Timestamps for various user actions and events (i.e., the start and end of a stimulus) were obtained from screen recordings and added to these data. These were used to determine the time windows to be analysed. Each baseline window ran from the start of the baseline video to the reading just before the appearance of the smiley meter. The window for an image stimulus ran from the moment the image was displayed to the moment just before the appearance of the smiley meter. Task duration was 10 minutes unless a participant took less time to complete a task. All continuous physiological readings falling within a time window were averaged. These averages were then standardised by subtracting from them the average of all the baseline readings. Further analyses were performed using these standardised values.
Skin conductance was pre-processed using the MATLAB (The MathWorks, Inc., Natick, MA, U.S.A.) software package 'Ledalab' (version 3.4.9, http://www.ledalab.de). Signal pre-processing included decomposition into its two components, phasic skin conductance (rapidly changing signal) and tonic skin conductance level (slow-moving signal), using the continuous decomposition analysis method [25], followed by feature extraction. Feature extraction was done using a threshold of 0.01 µS. The phasic signal features extracted were the onset and amplitude of non-specific significant skin conductance responses (nSCRs). These were used to compute nSCR frequency (nSCR/min) for each programming task. Baseline nSCR frequency was computed as the average over all three baselines. Taking a cue from Pijeira-Díaz et al. (2018), phasic skin conductance was computed as a categorical variable with 3 values: 0 (low nSCR frequency – 0 to 3 nSCR/min), 1 (medium nSCR frequency – 4 to 20 nSCR/min) and 2 (high nSCR frequency – 21 and above nSCR/min). Tonic skin conductance data were extracted as a continuous variable.
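The resampling, time-window averaging, baseline correction and nSCR binning described in this section can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' program: the function names, the use of linear interpolation, the half-open window bounds and the handling of rates that fall between the published integer bin edges are all assumptions. The Ledalab decomposition and SCR detection (done in MATLAB) are not reproduced; the sketch starts from an already-counted number of nSCRs.

```python
import numpy as np

STEP_S = 0.25  # common 0.25 s grid, matching the 4 Hz electrodermal activity sensor


def resample_to_grid(signals):
    """Interpolate every signal onto one shared 0.25 s time grid.

    `signals` maps a signal name to (timestamps_s, values): 1-D arrays with
    strictly increasing timestamps. The grid runs from the earliest to the
    latest timestamp over all signals. Linear interpolation is an ASSUMPTION;
    np.interp repeats the edge value outside a signal's own time range.
    """
    start = min(t[0] for t, _ in signals.values())
    end = max(t[-1] for t, _ in signals.values())
    grid = np.arange(start, end + STEP_S / 2, STEP_S)
    return grid, {name: np.interp(grid, t, v) for name, (t, v) in signals.items()}


def window_mean(grid, values, t_start, t_end):
    """Average all resampled readings with t_start <= t < t_end (bounds assumed)."""
    mask = (grid >= t_start) & (grid < t_end)
    return float(values[mask].mean())


def baseline_correct(window_means, baseline_readings):
    """Subtract the mean of all pooled baseline readings from each window mean."""
    ref = float(np.mean(np.concatenate(baseline_readings)))
    return {label: m - ref for label, m in window_means.items()}


def nscr_category(n_nscr, duration_s):
    """Bin nSCR frequency: 0 = low (0-3/min), 1 = medium (4-20/min), 2 = high (21+/min).

    ASSUMPTION: rates between the published integer bounds (e.g. 3.5/min)
    are assigned to the lower bin.
    """
    per_min = n_nscr / (duration_s / 60.0)
    if per_min < 4:
        return 0
    if per_min < 21:
        return 1
    return 2


if __name__ == "__main__":
    # Toy data only: a 1 Hz "heart rate" stream and a sparser "temperature" stream.
    grid, rs = resample_to_grid({
        "hr": (np.array([0.0, 1.0, 2.0, 3.0]), np.array([70.0, 72.0, 74.0, 76.0])),
        "temp": (np.array([0.0, 3.0]), np.array([33.0, 33.3])),
    })
    print(window_mean(grid, rs["hr"], 0.0, 1.0))  # 70.75
    print(nscr_category(n_nscr=100, duration_s=600))  # 10 nSCR/min -> 1
```

Pooling all baseline readings before averaging follows the wording "the average of all the baseline readings"; averaging the three baseline window means instead would give a slightly different reference value whenever the baseline windows differ in length.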
4. Results
To answer the exploratory question of whether we could detect psychophysiological indicators (if any) of learner emotions associated with tasks of varying difficulty, we made comparisons across the three tasks and deviations from the baseline. We used linear mixed models while controlling for acceleration and demographic data. Pairwise comparisons were computed with Bonferroni correction applied. Across the four conditions (baseline and the three tasks), we found a significant variation in skin conductance [F(3, 60) = 15.09, p < 0.01], heart rate [F(3, 60) = 9.61, p < 0.01] and temperature [F(3, 60) = 3.13, p = 0.03]. Please refer to Figures 4, 5 and 6 for more details.

[Figure 4 plot: skin conductance, mean standardised value by condition (B, E, M, D)]
Figure 4: Skin conductance (SC) at baseline (B) was significantly higher than during the easy (E) [mean difference = 0.62, p < 0.01], moderately challenging (M) [mean difference = 0.38, p = 0.02] and difficult (D) tasks [mean difference = 0.76, p < 0.01]. SC during the moderately challenging task (M) was significantly higher than during the difficult (D) task [mean difference = 0.38, p = 0.02].

[Figure 5 plot: heart rate, mean standardised value by condition (B, E, M, D)]
Figure 5: Heart rate during the moderately challenging task (M) was significantly greater than during the baseline (B) [mean difference = 4.22, p < 0.01], easy (E) [mean difference = 4.02, p < 0.01] and difficult (D) tasks [mean difference = 5.33, p < 0.01].

[Figure 6 plot: temperature, mean standardised value by condition (B, E, M, D)]
Figure 6: Temperature during the easy (E) task was significantly higher than during the baseline (B) [mean difference = 0.33, p = 0.03]. No significant changes were observed during the moderately challenging (M) and difficult (D) tasks.

Results indicated no significant changes in blood volume pulse, F(3, 60) = 1.20, p = 0.32, or tonic skin conductance, F(3, 66) = 1.46, p = 0.23.

Relationships between physiological data and appraisals of challenge to skill balance, task difficulty and task absorption were also explored. To do this, demographic data were included as fixed factors and participant as a random factor in the linear mixed model. We found no effect of Chal2Skill (F(1, 38.04) = 0.79, p = 0.38), Task_Absorption (F(1, 37.61) = 0.06, p = 0.81) or Task_Difficulty (F(4, 36.06) = 0.10, p = 0.98) on phasic skin conductance. We also found no effect of Chal2Skill (F(1, 33.69) = 1.97, p = 0.17), Task_Absorption (F(1, 33.58) = 0.46, p = 0.50) or Task_Difficulty (F(4, 33.35) = 0.96, p = 0.44) on heart rate. No significant effect of Chal2Skill (F(1, 34.88) = 1.45, p = 0.24), Task_Absorption (F(1, 34.61) = 1.45, p = 0.24) or Task_Difficulty (F(4, 33.96) = 0.93, p = 0.46) was found on blood volume pulse. Chal2Skill (F(1, 38.60) = 1.29, p = 0.26), Task_Absorption (F(1, 38.22) = 0.80, p = 0.38) and Task_Difficulty (F(4, 36.37) = 2.09, p = 0.10) had no significant effects on temperature. Chal2Skill was found to have a positive effect on tonic skin conductance (β = 0.43, t(36.96) = 2.93, p < 0.01, 95% CI [0.13, 0.73]) and Task_Absorption a negative effect (β = -0.37, t(37.38) = -3.56, p < 0.01, 95% CI [-0.59, -0.16]). There are some indications that Task_Difficulty ratings negatively affect tonic skin conductance: for Task_Difficulty = 1, β = -2.27, t(34.67) = -4.33, p < 0.01, 95% CI [-3.34, -1.21]; for Task_Difficulty = 2, β = -1.20, t(33.77) = -2.48, p = 0.02, 95% CI [-2.19, -0.22]; for Task_Difficulty = 3, β = -0.88, t(34.70) = -2.09, p = 0.04, 95% CI [-1.74, -0.03]; and for Task_Difficulty = 4, β = 0.036, t(34.91) = 0.12, p = 0.90, 95% CI [-0.56, 0.63].

Next, to examine whether the valence of (OASIS image-induced) emotions would be reflected in physiological data, relationships between physiological data and smiley meter ratings for images were analysed. We found no significant relation between smiley meter ratings and tonic skin conductance levels [F(4, 677.72) = 1.63, p = 0.17], blood volume pulse [F(4, 654.02) = 0.97, p = 0.42], heart rate [F(4, 683.29) = 1.66, p = 0.16] or skin temperature [F(4, 676.20) = 1.37, p = 0.24]. Feature extraction from phasic skin conductance data corresponding to the image stimuli resulted in no significant SCRs for practically the whole dataset (except 1 to 2 images of a few participants).

Finally, we also evaluated the stimuli, i.e., whether participants perceived the programming tasks as they were intended to be (namely, task 1: moderately challenging and positive-emotion inducing; task 2: too easy and negative-emotion inducing; task 3: too difficult and negative-emotion inducing). We used linear mixed models while controlling for demographic data. Results indicated significant differences in Chal2Skill ratings [F(2, 40) = 43.59, p < 0.01]. The average Chal2Skill rating for the moderately challenging task exceeded that of the difficult task (mean difference = 1.43, p < 0.01), while that of the easy task was greater than those of the moderately challenging task (mean difference = 0.76, p = 0.01) and the difficult task (mean difference = 2.20, p < 0.01). We found significant differences in Task_Difficulty ratings [F(2, 39) = 40.97, p < 0.01]. As expected, Task_Difficulty ratings for the difficult task were greater than those for the moderately challenging task (mean difference = 1.86, p < 0.01) and the easy task (mean difference = 2.60, p < 0.01), while ratings for the moderately challenging task were higher than those for the easy task (mean difference = 0.75, p = 0.05). No significant differences in Task_Absorption ratings were found [F(2, 40) = 2.15, p = 0.13]. We also found no significant differences in smiley meter ratings for the different tasks, F(2, 46) = 1.14, p = 0.33. The interviews help explain this: several participants exhibited recall bias when responding to the smiley meters. For example, one participant provided a low smiley meter rating despite having enjoyed the task, simply because they felt disappointed at not being able to complete it on time. In another case, a participant displayed agitation through most of the task period but gave a high rating because they managed to understand the task towards the end. Consequently, smiley meter ratings for the tasks were not included in any other analyses.

During the interviews, some words used to describe experiences during the moderately challenging task were "confused", "challenging", "enjoyable" and "fun". Some participants (n = 5) described feeling slightly stressed or frustrated when they could not find a solution at the beginning, but feeling better afterwards. Some (n = 4) expressed disappointment at not being able to complete the task. About the easy task, most participants (n = 13) mentioned its repetitive nature or described being bored at some point during the task. While describing their experience during the difficult task, most participants (n = 11) mentioned frustration, annoyance, a sense of hopelessness or incompetence.

5. Discussion
In this study, we attempted to detect physiological indicators of learning-related emotions by using multimodal data from a biosensing wristband and self-reports. To this end, we presented participants with an easy, a moderately challenging and a difficult task, with the expectation that these would be associated with different emotions. It was expected that during the easy and difficult tasks, participants would experience negative emotions (boredom/frustration/anger). This negative emotional state would be associated with a combination of low blood volume pulse and either low skin conductance and heart rate, or high skin conductance and low heart rate. We also expected that during the moderately challenging task, participants would experience a positive emotional state (i.e., enjoyment), which in turn would be associated with high blood volume pulse, skin conductance and heart rate. Results show that participants in general had lower phasic skin conductance and heart rate during the difficult task than during the moderately challenging task. In fact, heart rate during the moderately challenging task was also higher than that during baseline and the easy task. On the other hand, no significant differences in blood volume pulse were found. The findings of high heart rate and phasic skin conductance during the moderately challenging task align with our expected indicators of enjoyment. Similarly, low phasic skin conductance, tonic skin conductance and heart rate during the difficult task could indicate boredom. While we did not see high skin temperatures during the difficult task as expected, indications of high skin temperature and tonic skin conductance levels during the easy task could indicate anger [26], [27]. These indications of enjoyment, boredom and anger also align with our expectations based on the control-value theory [6]. However, a comparison with self-reports and certain limitations of the study (discussed below) suggest that more evidence is required to ascertain whether all these physiological changes are indeed due to the emotional stimuli.

The biggest limitations of this study are the fixed order of the programming tasks and a lack of sufficient evidence to ascertain clear relationships between all the physiological signals and self-reports. Therefore, we cannot rule out order effects, and there is a strong likelihood that the changes in physiological signals are simply due to the passage of time. There is also the issue of obtaining clear self-reports on emotions. In this study, data from the smiley meters did not add value to the analysis. The decision to use a smiley meter was made to ensure that we did not put words into participants' heads. However, this resulted in not having direct measures of learner emotions and having to make inferences based only on learner appraisals of task difficulty, challenge to skill balance and task absorption. We also gathered that the 10-minute intervals between smiley meter ratings on the programming tasks were likely too long, as several participants displayed recall bias. Since these limitations warrant further research, in our next study we will adjust our design to increase the reliability of our findings. Firstly, we plan to randomise the order of tasks for each participant. Secondly, we will collect regular, intermittent reports during the task (for example, every 3 to 4 minutes) on a more sophisticated scale such as the Affect Grid [28] or the Self-Assessment Manikin [29].

The use of physiological measures for emotion detection has important theoretical and practical implications. As mentioned earlier, the vast majority of studies on learner emotion have utilized self-reported data [10], including those underpinning significant educational theories such as [6]. An approach utilizing multimodal data including physiological data (such as the one taken in this study) opens up the possibility of testing such theories in a more robust manner and advancing our knowledge base on learner psychology. Additionally, such studies take us closer towards realizing intelligent systems that can detect and therefore cater to the emotions of learners. The results of the present study thus contribute to the field of emotions in learning.

6. Conclusion
In the present study, we found indications that certain learner emotions related to different task difficulties may be characterised by a combination of phasic and tonic skin conductance, heart rate, and skin temperature. Such a psychophysiological approach to emotion detection can open the door to real-time adaptive support that can bring learners to their zone of proximal development and consequently greatly improve learning outcomes. Therefore, though the results of the present study are far from definitive, we see value in advancing research in this area. Our next steps include a) furthering our exploration of signals collected from the E4 after including design changes derived from this study, b) exploring other non-intrusive measures of learner engagement such as camera-based eye tracking and screen activity, c) developing a multimodal system of emotion detection, and d) prototyping an adaptive system based on affective feedback.

7. Acknowledgements
This study was funded by the BMS Signature PhD grant at the University of Twente and made possible with the technical assistance of the university's BMSLab. We would like to thank André Bester and Lucia M. Rabago Mayer for the Python code, setting up the Unity platform and other technical guidance.

2001, pp. 43–46. doi: 10.1109/ICALT.2001.943837.
[3] R. E. Mayer, Searching for the role of emotions in e-learning, Learn. Instr. 70 (2019). doi: 10.1016/j.learninstruc.2019.05.010.
[4] R. Pekrun, The control-value theory of achievement emotions: assumptions, corollaries, and implications for educational research and practice, Educ. Psychol. Rev. 18 (2006), pp. 315–341. doi: 10.1007/s10648-006-9029-9.
[5] E. R. Um, J. L. Plass, E. O. Hayward, B. D. Homer, Emotional design in multimedia learning, J. Educ. Psychol. 104 (2012), pp. 485–498. doi: 10.1037/a0026609.
[6] R. Pekrun, S. Lichtenfeld, H. W. Marsh, K. Murayama, T. Goetz, Achievement Emotions and Academic Performance: Longitudinal Models of Reciprocal Effects, Child Dev. 88 (2017), pp. 1653–1670. doi: 10.1111/cdev.12704.
[7] K. Loderer, R. Pekrun, J. C. Lester, Beyond cold technology: A systematic review and meta-analysis on emotions in technology-based learning environments, Learn. Instr. 70 (2020). doi: 10.1016/j.learninstruc.2018.08.002.
[8] K. S. Beard, Theoretically Speaking: An Interview with Mihaly Csikszentmihalyi on Flow Theory
We would also like to thank Development and Its Usefulness in Johannes Steinrücke for his guidance on Addressing Contemporary Challenges statistical methods, especially linear mixed in Education, Educ. Psychol. Rev. 27 models. (2015). doi: 10.1007/s10648- 014- 9291-1. 8. References [9] L.S. Vygotsky, Mind in society: The development of higher psychological [1] B. Mihaly Csikszentmihalyi, Flow: processes, Harvard University Press, The Psychology of Optimal Massachusetts, 1978. Experience, Harper & Row, New [10] C. H. Wu, Y. M. Huang, J. P. York, 1990. Hwang, Review of affective [2] Kort, Barry, Rob Reilly and computing in education/learning: Rosalind W. Picard, An affective Trends and challenges, Br. J. Educ. model of interplay between Technol. 47 (2016). doi: emotions and learning: engineering 10.1111/bjet.12324. educational pedagogy-building a [11] E. Yadegaridehkordi, N. F. B. M. learning companion, in: Proceedings Noor, M. N. Bin Ayub, H. B. Affal, IEEE International Conference on N. B. Hussin, Affective computing in Advanced Learning Technologies, education: A systematic review and future research, Comput. Educ. 142 for mathematics that addresses (2019). doi: cognition, metacognition and affect, 10.1016/j.compedu.2019.103649. Int. J. Artif. Intell. Educ. 24 (2014) [12] E. H. Jang, M. S. Park, B. J. Park, S. 387–426. doi: 10.1007/s40593- 014- H. Kim, M. A. Chung, J. H. Sohn, 0023-y. Relationship between affective [19] M. Dindar, J. Malmberg, S. Järvelä, dimensions and physiological E. Haataja, P. A. Kirschner, responses induced by emotional Matching self-reports with stimuli: Base on affective electrodermal activity data: dimensions: Arousal, valence, Investigating temporal changes in intensity and approach, in: self- regulated learning, Educ. Inf. Proceedings of PhyCS 2014 - Proc. Technol. 25 (2020) 1785–1802. doi: Int. Conf. Physiol. Comput. Syst., 10.1007/s10639-019-10059-5. 2014, pp. 254–259. doi: [20] H. J. Pijeira-Díaz, H. Drachsler, P. A. 
10.5220/0004728302540259. Kirschner, S. Järvelä, Profiling [13] Brouwer, A., Van Beurden, M., sympathetic arousal in a physics Nijboer, L., Derikx, L., Binsch, O., course: How active are students?, Gjaltema, C., Noordzij, M., A Journal of Computer Assisted comparison of different Learning 34 (2018) 397–408. doi: Electrodermal variables in response 10.1111/jcal.12271. to an acute social stressor, Symbiotic [21] B. Kurdi, S. Lozano, M. R. Banaji, Interaction 7 (2018). doi: Introducing the Open Affective https://doi.org/10.1007/978-3-319- Standardized Image Set (OASIS), 91593-7_2. Behav. Res. Methods 49 (2017) 457– [14] M. Jindrová, M. Kocourek, P. 470. doi: 10.3758/s13428-016-0715- Telenský, Skin conductance rise time 3. and amplitude discern between [22] D.C. Menezes, 2019. Play Mode different degrees of emotional arousal Blocks Engine. URL: induced by affective pictures https://assetstore.unity.com/packages/ presented on a computer screen, te mplates/systems/play-mode-blocks- bioRxiv (2020). doi: engine-158224. 10.1101/2020.05.12.090829. [23] Read JC, MacFarlane SJ, Casey C, [15] V. R. Lee, L. Fischback, R. Cain, A Endurability, engagement and wearables-based approach to detect expectations: measuring children’s and identify momentary engagement fun, in : Proceedings of the in afterschool Makerspace programs, International Workshop Interaction Contemp. Educ. Psychol. 59 (2019). Design and Children, Shaker doi: Publishing, Eindhoven, 2002. 10.1016/j.cedpsych.2019.101789. [24] T. Magyaródi, H. Nagy, P. Soltész, T. [16] D. K. Darnell, P. A. Krieg, Student Mózes, A. Oláh, Psychometric engagement, assessed using heart properties of a newly established rate, shows no reset following active flow state questionnaire, 2013. learning sessions in lectures, PLoS [25] Benedek, M., & Kaernbach, C., One 14 (2019). doi: Decomposition of skin conductance 10.1371/journal.pone.0225709. data by means of nonnegative [17] C. Larmuseau, J. Cornelis, L. deconvolution. 
Psychophysiology 47 Lancieri, P. Desmet, F. Depaepe, (2010) 647 – 658. doi: Multimodal learning analytics to 10.1111/j.1469-8986.2009.00972.x. investigate cognitive load during [26] V. Jha, N. Prakash, S. Sagar, online problem solving, Br. J. Educ. Wearable anger-monitoring system, Technol. (2020). doi: ICT Express 4 (2018) 194–198. doi: 10.1111/bjet.12958. 10.1016/J.ICTE.2017.07.002. [18] I. Arroyo, B. P. Woolf, W. Burelson, [27] G. Stemmler, Physiological processes K. Muldner, D. Rai, M. Tai, A during emotion, in: P. Philippot, R. S. multimedia adaptive tutoring system Feldman (Eds.), The regulation of emotion, 1st ed., Lawrence Erlbaum Associates Publishers, 2004, pp. 33– 70. doi: 10.4324/9781410610898 [28] J. A. Russell, A. Weiss, G. A. Mendelsohn, Affect Grid: A single- item scale of pleasure and arousal., J. Pers. Soc. Psychol. 57 (1989) 493– 502. doi: 10.1037/0022- 3514.57.3.493. [29] M. M. Bradley, P. J. Lang, Measuring emotion: The self- assessment manikin and the semantic differential, J. Behav. Ther. Exp. Psychiatry 25 (1994) 49–59. doi: 10.1016/0005-7916(94)90063-9.
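The Results report that feature extraction from the phasic skin conductance data around the image stimuli yielded almost no significant SCRs. The Python sketch below illustrates what windowed SCR scoring can look like; it is not the pipeline used in this study. It makes four assumptions. First, the phasic component has already been separated from the tonic level, for example with the decomposition of [25], which is not reimplemented here. Second, the signal is sampled at 4 Hz, the nominal EDA rate of the Empatica E4. Third, responses are scored in a window of 1 to 4 s after stimulus onset. Fourth, an SCR must have a minimum amplitude of 0.01 µS. These values are common conventions chosen for illustration, not parameters reported in this paper. The function names `detect_scrs` and `max_rise` are hypothetical.

```python
from typing import List, Sequence, Tuple

FS_HZ = 4.0                                 # assumed EDA sampling rate (Empatica E4 EDA: 4 Hz)
WINDOW_S: Tuple[float, float] = (1.0, 4.0)  # assumed post-onset latency window, in seconds
MIN_AMP_US = 0.01                           # assumed minimum SCR amplitude, in microsiemens


def max_rise(segment: Sequence[float]) -> float:
    """Largest increase from a trough to any later sample in the segment (0.0 if none)."""
    best = 0.0
    trough = float("inf")
    for value in segment:
        trough = min(trough, value)       # lowest value seen so far
        best = max(best, value - trough)  # rise from that trough to the current sample
    return best


def detect_scrs(phasic: Sequence[float], onsets_s: Sequence[float],
                fs: float = FS_HZ, window: Tuple[float, float] = WINDOW_S,
                min_amp: float = MIN_AMP_US) -> List[bool]:
    """For each stimulus onset, flag whether the phasic signal rises by at least
    min_amp within the latency window (a simplified, hypothetical SCR criterion)."""
    flags = []
    for onset in onsets_s:
        start = max(int(round((onset + window[0]) * fs)), 0)
        stop = min(int(round((onset + window[1]) * fs)) + 1, len(phasic))  # inclusive window end
        segment = phasic[start:stop]
        flags.append(len(segment) > 1 and max_rise(segment) >= min_amp)
    return flags


# Usage: 10 s of flat signal with one 0.05 uS rise peaking 2.5 s after the first onset.
phasic = [0.0] * 40
phasic[9], phasic[10], phasic[11] = 0.02, 0.05, 0.03
print(detect_scrs(phasic, [0.0, 6.0]))  # [True, False]
```

The trough-to-peak rule keeps the sketch short. Dedicated tools such as the decomposition in [25] estimate SCR onsets and amplitudes more carefully, so a count of zero from this rule should not be read as equivalent to the study's result.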
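The Discussion states the expected physiological signatures in qualitative terms. Negative states were expected to show low blood volume pulse with either low skin conductance and low heart rate, or high skin conductance and low heart rate. Enjoyment was expected to show high values on all three signals. The Python sketch below makes that combinatorial logic explicit as a rule over per-signal directions relative to baseline. It is not how the study analysed its data. The z-score comparison against baseline samples, the cut-off of 1.0 and the names `direction` and `hypothesised_state` are assumptions made only for illustration.

```python
from statistics import mean, stdev
from typing import Mapping, Sequence

Z_CUTOFF = 1.0  # assumed: a change of >= 1 baseline SD counts as "high" or "low"


def direction(task: Sequence[float], baseline: Sequence[float], cutoff: float = Z_CUTOFF) -> int:
    """Return +1 (high), -1 (low) or 0 (no clear change) for a task segment vs. baseline."""
    sd = stdev(baseline)  # needs at least two baseline samples
    if sd == 0:
        raise ValueError("baseline has zero variance")
    z = (mean(task) - mean(baseline)) / sd
    if z >= cutoff:
        return 1
    if z <= -cutoff:
        return -1
    return 0


def hypothesised_state(dirs: Mapping[str, int]) -> str:
    """Map directions for blood volume pulse, skin conductance and heart rate to the
    hypothesised emotional valence described in the Discussion."""
    bvp, sc, hr = dirs["bvp"], dirs["sc"], dirs["hr"]
    if bvp == 1 and sc == 1 and hr == 1:
        return "positive"  # e.g., enjoyment
    if bvp == -1 and hr == -1 and sc != 0:
        return "negative"  # e.g., boredom/frustration/anger; SC may be low or high
    return "unclassified"


baseline_hr = [70.0, 72.0, 68.0, 70.0]
dirs = {"bvp": -1, "sc": 1, "hr": direction([65.0, 66.0], baseline_hr)}
print(hypothesised_state(dirs))  # negative
```

Writing the hypotheses as an explicit rule shows how many signal combinations fall outside both predictions, for example the flat blood volume pulse reported in the Results. Such combinations are returned as "unclassified" rather than forced into a category.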
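For the next study, the Discussion plans to randomise the order of tasks for each participant to address order effects. One way to do this while keeping every order equally represented is full counterbalancing, sketched below in Python. All 3! = 6 orders of the three tasks are shuffled and then dealt out in turn to participants taken in random sequence. This is a swapped-in design choice, not a procedure the authors describe. The task labels, participant IDs, seed and the name `assign_orders` are hypothetical.

```python
import itertools
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

TASKS = ("easy", "moderate", "difficult")  # hypothetical task labels


def assign_orders(participants: Sequence[str], tasks: Sequence[str] = TASKS,
                  seed: int = 2021) -> Dict[str, Tuple[str, ...]]:
    """Assign each participant one task order, cycling through all permutations so
    that each order is used equally often (to within one participant)."""
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    orders: List[Tuple[str, ...]] = list(itertools.permutations(tasks))
    rng.shuffle(orders)        # random sequence of the 6 possible orders
    ids = list(participants)
    rng.shuffle(ids)           # random allocation of participants to slots
    return {pid: orders[i % len(orders)] for i, pid in enumerate(ids)}


allocation = assign_orders([f"P{i:02d}" for i in range(1, 22)])
print(sorted(Counter(allocation.values()).values()))  # [3, 3, 3, 4, 4, 4]
```

With 21 participants, as in the present sample, three orders are used four times and three are used three times. Simple per-participant shuffling would also randomise order but can leave some orders over- or under-represented in a small sample.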