Sense the classroom: AI-supported synchronous online education for a resilient new normal

Krist Shingjergji (a), Deniz Iren (a), Corrie Urlings (a) and Roland Klemke (a, b)

(a) Open University of the Netherlands, 6419 AT Heerlen, The Netherlands
(b) Faculty of Cultural Sciences, TH Köln, Cologne, Germany

Abstract
Following the COVID-19 pandemic, as the user-base of online synchronous communication systems skyrocketed, the shortcomings of synchronous online learning systems became more visible. Any attempt to overcome these shortcomings should be considered worthwhile due to the magnitude of potential impact. Improving the quality and addressing the shortcomings of online education is more important than ever. The goal of this multidisciplinary study, which lies at the intersection of the fields of Education Science and Computer Science, is to address a number of challenges of online education by incorporating AI. This study focuses on developing methods and means to ethically collect and use non-verbal cues of participants of online classrooms to assist teachers, students, and course coordinators by providing real-time and after-the-fact feedback on the students' learning-centered affective states.

Keywords
Technology enhanced learning, learning-centered affective states, affective computing, synchronized learning, artificial intelligence

Proceedings of the Doctoral Consortium of the Sixteenth European Conference on Technology Enhanced Learning, September 20–21, 2021, Bolzano, Italy (online).
EMAIL: krist.shingjergji@ou.nl (Krist Shingjergji)
ORCID: 0000-0002-8239-9478 (Krist Shingjergji)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction and motivation

Online learning provides a means of education to students with physical limitations or inconvenience to participate in physical, face-to-face classroom education. During the COVID-19 pandemic, this limitation became relevant for all students. Approximately 1.2 billion learners were affected by the closure of schools at the time of the pandemic [1] and educational institutions worldwide made a mandatory transition to online/hybrid learning [2]. As the utilization of online learning reached unprecedented levels, the already-known challenges of online education became painfully visible for both students (e.g., feeling of isolation [3]) and teachers (e.g., lack of face-to-face interaction with the students [4]).

Students' learning experience and performance are highly related to their psychological, physiological, and emotional states [5]. In a physical classroom, teachers can notice when students are distracted, confused, tired, etc., and have the opportunity to adjust their teaching approach accordingly [6], and choose appropriate interventions to keep the learning experience of the class optimal. However, many teachers who gave an online lecture have experienced a severe lack of understanding of the learning-centered affective states of students in the classroom, thus missing opportunities to improve the overall learning experience. This also directly impacts individual students. In online lectures, students are more prone to distractions and use the Internet for purposes unrelated to the educational activity [7]. The lecturers cannot give timely feedback to guide the attention since they do not observe the students physically. As a result, the students are left alone to manage their learning experience, stay motivated, and struggle not to fall behind during the educational activity. The underlying reason that leads to these challenges is the communication modality limitations of video conferencing technologies.

In this study, we build on Media Naturalness Theory to examine the limitations of video conferencing as a medium of communication for online, synchronized education. Our objective is to develop artificial intelligence (AI) models to detect a multitude of components of learning-centered affective states (e.g., gestures, micro-expressions, and macro-expressions) of the learners, to present the aggregated information to the teacher and the course coordinator in a privacy-protecting manner, and to provide the individual information to the students themselves.

The remainder of this paper is structured as follows. Section 2 sheds light on the background and related work. Section 3 lays out the details of the overall research methodology. Section 4 highlights important discussion points such as the theoretical and practical implications, and ethical considerations.

2. Background and related work

In this section we describe the background and related work of this study.

2.1. Communication modalities and Media Naturalness Theory

Human communication occurs in multiple modalities such as voice, speech, facial expressions, and body language [8]. One important type of human communication is non-verbal communication, which is the way of conveying information without the use of words via non-verbal cues, i.e., facial expressions and body language [9]. Video conferencing platforms fall short in conveying non-verbal cues among participants. The shortcomings of video conferencing as a communication medium can be analyzed and improved based on the Media Naturalness Theory (MNT). MNT describes the criteria to assess the degree of naturalness of a communication medium that partly relates to the capability of transmitting body language, facial expressions, and natural speech [10]. According to this theory, a reduction in a medium's naturalness may lead to a decrease in learning effectiveness, and a potential increase in ambiguity of the conveyed message [11].

2.2. Learning centered affective states

Many studies that aim to detect the relationship between online learning and emotions by applying emotion recognition techniques use the basic emotions, namely, happiness, sadness, fear, disgust, anger, and surprise [12]. A plethora of studies that report an accurate mapping between facial expressions and emotions exist in the literature [13], [14], [15]. However, D'Mello in [5] states that the basic emotions are quite infrequent in the context of learning with educational software, which raised the need of focusing on the learning-centered affective states, such as engagement, concentration, boredom, anxiety, confusion, frustration, and happiness. In contrast to emotion recognition, the mapping between facial expressions and learning-centered affective states has been severely understudied [16].

The observable non-verbal cues consist of gestures and body postures (e.g., head-tilt, nod, shake), micro-expressions (e.g., movement of inner eyebrows and lips), other expressions (e.g., smile, frown, confusion), and other activities (e.g., note-taking, active-listening, looking-away) [17]. The state-of-the-art facial-expression recognition (FER) and gesture recognition (GR) models use Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) hybrid networks [18]. These models perform discrete/momentary measurement (i.e., in short intervals), generally on a single modality, and they are trained on datasets in which the non-verbal cues are mimicked by actors (i.e., not naturally occurring). As previously mentioned, the detection of constructs other than the six universal emotions (sadness, happiness, fear, anger, surprise, and disgust), such as learning-centered affective states, does not have a rich literature. However, the collection of high-quality data for the recognition of learning-centered affective states has been the subject of several studies that have certain important limitations, for instance, focusing only on game-based interfaces [19], being specific to certain ethnic groups [20], and having a limited target affective set such as the level of engagement on a scale [21], and the lack of interest and boredom [22].

In this study, we will bridge this gap by improving the CNN-RNN hybrids architecturally by introducing attention layers, formulating fitting objective functions, fusing data from multiple modalities, and applying transfer learning to train models with the multi-modal data collected from synchronous online education settings.

3. Methodology

In this section we explain the overall methodology that is proposed for this research.

3.1. Research model

Active and engaged learning is an important model in online higher education. Therefore, this study aims at addressing the motivational and emotional side of online education, by providing information that can assist educators to refine the educational activities that they have devised. Our aspiration is to utilize theories of learning, motivation, and emotion in combination to (1) define the relationship between learning-centered affective states of students and observable non-verbal cues, (2) develop specialized multi-modal AI algorithms for the recognition of learning-centered affective states, and (3) design tools to present this information in an actionable way for teachers and students to improve the learning process while respecting the privacy of all participants (Figure 1).

Thus, the research questions of our studies are as follows:
1. Which are the specific non-verbal feedback needs (e.g., facial expressions) of teachers and students in online lectures?
2. How can we automatically detect non-verbal cues and translate them to learning-centered affective states of multiple participants in online, synchronous, educational activities?
3. How can we present this information to teachers in real-time so that they can take actions to positively influence the learning-centered affective states of the students?
4. How can we provide students with this information so that learning-centered affective states are positively influenced?
5. How can we design a system that is ethically sound, that respects privacy concerns and keeps all collected data secure?

Figure 1: The research model

3.2. Approach

In this study, we will employ Design-Based Research (DBR) and experiments throughout multiple iterations (Figure 2). Teachers and students will be involved in focus groups and co-designing of prototypes [23]. The DBR iterations consist of literature study, requirements elicitation, participatory design, and evaluation of the interventions (i.e., integrated AI models) in pilot studies of online learning. The AI models will be developed through experimentation cycles which comprise data collection, annotation, algorithm development, and model training and evaluation. We will collect data from public-domain video repositories and online lecture sessions recorded by us with the informed consent of participants. The data will be annotated by multiple experts in terms of observed non-verbal cues. Subsequently, we will develop algorithms, and train and test the FER-GR and the learning-centered affective states recognition AI models on multiple datasets to ensure generalizability. We will rely on metrics that are commonly used in machine learning, i.e., precision, recall, and F-1 measure, to evaluate the accuracy of our models. Data management will be conducted in line with the FAIR data principles [24].

Figure 2: The research approach

We envision the system to provide certain information in real-time and after-the-fact to various stakeholders for different purposes (Table 1). The information flow is targeted at specific educational purposes for each party involved, with short-term and long-term educational benefits.

Table 1
Information flow to the actors of the system

| When | Who | What | Why |
|---|---|---|---|
| Real-time | Teacher | Aggregated information regarding the overall learning-centered affective states of students in the classroom. | Teacher may (a) alter the teaching style and/or (b) initiate interventions (e.g., breakout rooms). |
| Real-time | Student | A semiotic indicator that shows their own learning-centered affective states, as well as suitable nudges (e.g., pop questions). | Keep the student engaged and active, positive influence on the learning process and self-regulation. |
| After-the-fact | Teacher | A report that shows the detected learning-centered affective states of the overall classroom matched to different parts of the educational activity (e.g., slides, activities, interventions) as a timeline. | Improve design of educational activity and delivery style. |
| After-the-fact | Course Coordinator | Learning-centered affective states trend of a course throughout multiple online educational activities. | Evidence-based course design and ensuring educational quality. |

4. Discussion

In this section, we discuss several important aspects of this study including the theoretical and practical implications, the privacy concerns as well as the limitations.

4.1. Theoretical and practical implications

The outcomes of our experiments will potentially allow us to gain a deeper understanding of how learning-centered affective states are indicated by observable non-verbal cues, and how these states can be related to an effective learning experience.
Our results will also contribute to the Media Naturalness Theory by extending it to cover widely used video conferencing platforms and tailoring it for education scenarios.

The practical outcome of this study will be an analytical platform that is integrated into the video conferencing clients of the students and the teachers. The platform will be able to provide feedback both in real-time and after-the-fact for all the involved actors in the course: students, teachers, and course coordinators (Table 1). In real-time, the platform will provide the teachers with aggregated information regarding the learning-centered affective states of the students. This information will give the teachers the opportunity to respond in different ways such as changing the teaching style and/or intervening in the course content flow. Students are going to receive information regarding their own learning-centered states, which they can use to self-regulate and be active and engaged. Regarding the real-time feedback, the system will be designed in a way that optimizes its use while taking part in the educational activity. On a longer-term aspect, this aggregated information will be useful for the teachers and the course coordinators as it would play the role of an evidence-based course evaluation which can be used for future improvement of delivery style and course design from the side of the teacher and the course coordinator respectively.

4.2. Privacy

We acknowledge the privacy-sensitive nature of this study. To protect the privacy of students, and to prevent a possible misuse of the technology, e.g., using the obtained information to evaluate students, we design core privacy-preserving measures to shape our research around them. Firstly, all data collection and experimentation will be voluntary and with the informed consent of the participants. The ethical board will be consulted prior to all data collection phases. The training data will be collected anonymously with no possibility to link to individuals. Secondly, the designed system will keep sensitive individual data (e.g., video) on individuals' computers. We will use a virtual webcam that implements AI models and analyzes video data on client computers. This feature will also allow students to keep their camera off (use avatars or nothing at all) while still benefiting from the system. Only the processed and anonymized data (i.e., numerical representations of non-verbal cues) will be transferred, and the teacher will only be provided with information that is aggregated at classroom-level. Finally, this study solely aims at developing a method for improving the quality of online education, and not as a way of individual assessment of the students or the teachers. We are confident that this privacy-preserving design will not allow any misuse of the system.

4.3. Limitations

The source of the data in this study will be the participants' cameras, which results in two important limitations. First, we cannot observe the entire environment of a student, thus, it is not possible to differentiate whether the observed non-verbal cues of an individual student are the result of an event in the classroom or an off-task activity. Second, in the online classrooms, students are in control of their cameras and may refuse to turn them on even when the proposed privacy-preserving methods are in place. In that case, the proposed method is not applicable.

5. References

[1] Education: From disruption to recovery, 2021. URL: https://en.unesco.org/covid19/educationresponse.
[2] S. Dhawan, Online Learning: A Panacea in the Time of COVID-19 Crisis, Journal of Educational Technology Systems 49 (2020) 5–22. doi: 10.1177/0047239520934018.
[3] M. Alawamleh, L. M. Al-Twait, G. R. Al-Saht, The effect of online learning on communication between instructors and students during Covid-19 pandemic, Asian Education and Development Studies (2020). doi: 10.1108/AEDS-06-2020-0131.
[4] S. Gurung, Challenges faced by teachers in online teaching during Covid-19 pandemic, The online journal of distance education and e-Learning 9 (2021).
[5] S. D'Mello, A selective meta-analysis on the relative incidence of discrete affective states during learning with technology, Journal of Educational Psychology 105 (2013) 1082–1099. doi: 10.1037/a0032674.
[6] K. Bahreini, R. Nadolski, W. Westera, Towards multimodal emotion recognition in e-learning environments, Interactive Learning Environments (2016) 590–605. doi: 10.1080/10494820.2014.908927.
[7] A. Lepp, J. E. Barkley, A. C. Karpinski, S. Singh, College Students' Multitasking Behavior in Online Versus Face-to-Face Courses, SAGE Open 9 (2019). doi: 10.1177/2158244018824505.
[8] C. Jewitt, J. Bezemer, K. O'Halloran, Introducing multimodality, 1st. ed., Routledge, London, 2016. doi: 10.4324/9781315638027.
[9] APA Dictionary of Psychology, 2020. URL: https://dictionary.apa.org/nonverbal-communication.
[10] N. Kock, Media naturalness theory: human evolution and behaviour towards electronic communication technologies, in: S. Craig Roberts, Applied evolutionary psychology, Oxford University Press Inc., New York, 2012, pp. 381–398. doi: 10.1093/acprof:oso/9780199586073.001.0001.
[11] O. Weiser, I. Blau, Y. Eshet-Alkalai, How do medium naturalness, teaching-learning interactions and Students' personality traits affect participation in synchronous E-learning?, Internet and Higher Education 37 (2018) 40–51. doi: 10.1016/j.iheduc.2018.01.001.
[12] P. Ekman, An Argument for Basic Emotions, Cognition and Emotion 6 (1992) 169–200. doi: 10.1080/02699939208411068.
[13] R. Reisenzein, M. Studtmann, G. Horstmann, Coherence between emotion and facial expression: Evidence from laboratory experiments, Emotion Review 5 (2013) 16–23. doi: 10.1177/1754073912457228.
[14] M. Wegrzyn, M. Vogt, B. Kireclioglu, J. Schneider, J. Kissler, Mapping the emotional face. How individual face parts contribute to successful emotion recognition, PLoS ONE 12 (2017). doi: 10.1371/journal.pone.0177239.
[15] P. Tarnowski, M. Kołodziej, A. Majkowski, R. J. Rak, Emotion recognition using facial expressions, Procedia Computer Science 108 (2017) 1175–1184. doi: 10.1016/j.procs.2017.05.025.
[16] M. A. A. Dewan, M. Murshed, F. Lin, Engagement detection in online learning: a review, Smart Learning Environments 6 (2019). doi: 10.1186/s40561-018-0080-z.
[17] D. Umnia Soraya, K. Candra Kirana, S. Wibawanto, H. Wahyu Herwanto, C. Wijaya Kristanto, Non-Verbal Communication Behavior of Learners on Online-based Learning, in: Proceedings of the 2nd International Conference on Vocational Education and Training (ICOVET 2018), 2019, pp. 4–6. doi: 10.2991/icovet-18.2019.2.
[18] M. Sharma, D. Ahmetovic, L. A. Jeni, K. M. Kitani, Recognizing Visual Signatures of Spontaneous Head Gestures, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 400–408. doi: 10.1109/WACV.2018.00050.
[19] N. Bosch, S. D'Mello, R. Baker, J. Ocumpaugh, V. Shute, M. Ventura, L. Wang, W. Zhao, Automatic detection of learning-centered affective states in the wild, in: Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI '15), Association for Computing Machinery, New York, 2015, pp. 379–388. doi: 10.1145/2678025.2701397.
[20] T. S. Ashwin, R. M. R. Guddeti, Affective database for e-learning and classroom environments using Indian students' faces, hand gestures and body postures, Future Generation Computer Systems 108 (2020) 334–348. doi: 10.1016/j.future.2020.02.075.
[21] J. Whitehill, Z. Serpell, Y. C. Lin, A. Foster, J. R. Movellan, The faces of engagement: Automatic recognition of student engagement from facial expressions, IEEE Transactions on Affective Computing 5 (2014) 86–98. doi: 10.1109/TAFFC.2014.2316163.
[22] L. B. Krithika, G. G. Lakshmi Priya, Student Emotion Recognition System (SERS) for e-learning Improvement Based on Learner Concentration Metric, Procedia Computer Science 85 (2016) 767–776. doi: 10.1016/j.procs.2016.05.264.
[23] K. Holstein, B. M. McLaren, V. Aleven, Co-designing a real-time classroom orchestration tool to support teacher–ai complementarity, Journal of Learning Analytics 6 (2019) 27–52. doi: 10.18608/jla.2019.62.3.
[24] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. W. Boiten, L. O. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. C. Edmunds, C. T. A. Evelo, R. Finkers, A. N. González-Beltrán, A. J. G. Gray, P. Groth, C. A. Goble, J. S. Grethe, J. Heringa, P. A. C. 't Hoen, R. W. W. Hooft, T. Kuhn, R. G. Kok, J. N. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. Laerte Packer, B. Persson, P. Rocca-Serra, M. Roos, R. C. van Schaik, S. Sansone, E. A. Schultes, T. Sengstag, T. Slater, G. O. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. M. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016). doi: 10.1038/sdata.2016.18.
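
Section 3.2 names precision, recall, and the F-1 measure as the metrics for evaluating the recognition models. As an illustration only, the following minimal sketch shows how these could be computed per learning-centered affective state from expert annotations and model predictions. The function names (`per_class_prf`, `macro_f1`) and the example labels are hypothetical and not part of the study; macro-averaging is one possible way to summarize the per-state scores, not a choice stated in the paper.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs.

    y_true: labels assigned by annotators (hypothetical example data).
    y_pred: labels predicted by a model, aligned with y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F-1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Illustrative labels only; the actual label set is defined by the study.
    annotated = ["engaged", "engaged", "confused", "bored", "confused", "engaged"]
    predicted = ["engaged", "confused", "confused", "bored", "engaged", "engaged"]
    results = per_class_prf(annotated, predicted)
    for state, (p, r, f) in results.items():
        print(f"{state}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print(f"macro F-1: {macro_f1(results):.2f}")
```

Reporting the scores per state rather than as a single accuracy figure makes it visible when a model performs well on frequent states but poorly on rare ones.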
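
Section 4.2 states that only processed and anonymized data leave the client and that the teacher only sees information aggregated at classroom-level. The sketch below illustrates one way such an aggregation step might look. Everything in it is an assumption made for illustration: the `StateReport` message, the `aggregate_classroom` function, the confidence filter, and in particular the minimum group size, which the paper does not specify and which is included here only to show how an aggregate describing very few students could be withheld.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class StateReport:
    """Hypothetical anonymized message from one client: no video, no identity."""
    state: str         # e.g. "engaged" or "confused" (illustrative labels)
    confidence: float  # model confidence in [0, 1]


def aggregate_classroom(reports, min_group_size=5, min_confidence=0.5):
    """Return the share of each state across the class, or None.

    None is returned when fewer than `min_group_size` confident reports are
    available, so that the teacher never sees an aggregate that effectively
    describes one or two students (an assumption of this sketch).
    """
    confident = [r for r in reports if r.confidence >= min_confidence]
    if len(confident) < min_group_size:
        return None
    counts = Counter(r.state for r in confident)
    total = len(confident)
    return {state: count / total for state, count in counts.items()}


if __name__ == "__main__":
    reports = (
        [StateReport("engaged", 0.9)] * 3
        + [StateReport("confused", 0.8)] * 2
        + [StateReport("bored", 0.2)]  # dropped: below the confidence filter
    )
    print(aggregate_classroom(reports))
    print(aggregate_classroom(reports[:2]))  # too few reports to display
```

Because the function receives only state labels and confidences, it cannot link a value back to a student, which matches the intent of keeping individual data on the client.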