Sense the classroom: AI-supported synchronous online education for a resilient new normal
Krist Shingjergji (a), Deniz Iren (a), Corrie Urlings (a) and Roland Klemke (a, b)
(a) Open University of the Netherlands, 6419 AT Heerlen, The Netherlands
(b) Faculty of Cultural Sciences, TH Köln, Cologne, Germany


Abstract
Following the COVID-19 pandemic, as the user base of online synchronous communication systems skyrocketed, the shortcomings of synchronous online learning systems became more visible. Given the magnitude of the potential impact, any attempt to overcome these shortcomings is worthwhile, and improving the quality of online education is more important than ever. The goal of this multidisciplinary study, which lies at the intersection of Education Science and Computer Science, is to address a number of challenges of online education by incorporating AI. This study focuses on developing methods and means to ethically collect and use the non-verbal cues of participants in online classrooms, in order to assist teachers, students, and course coordinators by providing real-time and after-the-fact feedback on the students' learning-centered affective states.

Keywords
Technology enhanced learning, learning-centered affective states, affective computing, synchronized learning, artificial intelligence


Proceedings of the Doctoral Consortium of the Sixteenth European Conference on Technology Enhanced Learning, September 20–21, 2021, Bolzano, Italy (online).
EMAIL: krist.shingjergji@ou.nl (Krist Shingjergji)
ORCID: 0000-0002-8239-9478 (Krist Shingjergji)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction and motivation
    Online learning makes education accessible to students who have physical limitations or for whom participating in physical, face-to-face classroom education is otherwise inconvenient. During the COVID-19 pandemic, this barrier became relevant for all students. Approximately 1.2 billion learners were affected by school closures during the pandemic [1], and educational institutions worldwide made a mandatory transition to online or hybrid learning [2]. As the use of online learning reached unprecedented levels, the already-known challenges of online education became painfully visible for both students (e.g., feelings of isolation [3]) and teachers (e.g., lack of face-to-face interaction with the students [4]).
    Students' learning experience and performance are closely related to their psychological, physiological, and emotional states [5]. In a physical classroom, teachers can notice when students are distracted, confused, tired, etc., adjust their teaching approach accordingly [6], and choose appropriate interventions to keep the learning experience of the class optimal. However, many teachers who have given an online lecture have experienced a severe lack of insight into the learning-centered affective states of the students in the classroom and have thus missed opportunities to improve the overall learning experience. This also directly affects individual students. In online lectures, students are more prone to distraction and to using the Internet for purposes unrelated to the educational activity [7]. The lecturers cannot give timely feedback to guide
the students' attention, since they cannot observe the students physically. As a result, students are left alone to manage their learning experience, stay motivated, and struggle not to fall behind during the educational activity. The underlying reason for these challenges is the limited set of communication modalities supported by video conferencing technologies.
    In this study, we build on Media Naturalness Theory to examine the limitations of video conferencing as a medium of communication for online, synchronous education. Our objective is to develop artificial intelligence (AI) models that detect a multitude of non-verbal indicators of the learners' learning-centered affective states (e.g., gestures, micro-expressions, and macro-expressions), to present the aggregated information to the teacher and the course coordinator in a privacy-protecting manner, and to provide the individual information to the students themselves.
    The remainder of this paper is structured as follows. Section 2 presents the background and related work. Section 3 lays out the overall research methodology. Section 4 discusses the theoretical and practical implications, the privacy considerations, and the limitations of the study.

2. Background and related work
    In this section, we present the background and related work on which this study builds.

2.1. Communication modalities and Media Naturalness Theory
    Human communication occurs in multiple modalities, such as voice, speech, facial expressions, and body language [8]. One important type of human communication is non-verbal communication, which conveys information without the use of words, through non-verbal cues such as facial expressions and body language [9]. Video conferencing platforms fall short in conveying non-verbal cues among participants. The shortcomings of video conferencing as a communication medium can be analyzed and addressed based on Media Naturalness Theory (MNT). MNT describes criteria to assess the degree of naturalness of a communication medium, which partly depends on the medium's capability of transmitting body language, facial expressions, and natural speech [10]. According to this theory, a reduction in a medium's naturalness may lead to a decrease in learning effectiveness and a potential increase in the ambiguity of the conveyed message [11].

2.2. Learning-centered affective states
    Many studies that investigate the relationship between online learning and emotions by applying emotion recognition techniques use the basic emotions, namely happiness, sadness, fear, disgust, anger, and surprise [12]. A plethora of studies in the literature report an accurate mapping between facial expressions and emotions [13], [14], [15]. However, D'Mello [5] reports that the basic emotions are quite infrequent in the context of learning with educational software, which raises the need to focus on learning-centered affective states, such as engagement, concentration, boredom, anxiety, confusion, frustration, and happiness. In contrast to emotion recognition, the mapping between facial expressions and learning-centered affective states has been severely understudied [16].
    The observable non-verbal cues consist of gestures and body postures (e.g., head tilt, nod, shake), micro-expressions (e.g., movement of the inner eyebrows and lips), other expressions (e.g., smile, frown, confusion), and other activities (e.g., note-taking, active listening, looking away) [17]. State-of-the-art facial expression recognition (FER) and gesture recognition (GR) models use hybrid Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architectures [18]. These models perform discrete, momentary measurements (i.e., over short intervals), generally on a single modality, and they are trained on datasets in which the non-verbal cues are mimicked by actors (i.e., not naturally occurring). As previously mentioned, the literature on detecting constructs other than the six universal emotions (sadness, happiness, fear, anger, surprise, and disgust), such as learning-centered affective states, is not rich. Moreover, the collection of high-quality data for the recognition of learning-centered affective states has been the subject of several studies
that have certain important limitations, for instance, focusing only on game-based interfaces [19], being specific to certain ethnic groups [20], and having a limited set of target affective states, such as the level of engagement on a scale [21] or the lack of interest and boredom [22].
    In this study, we will bridge this gap by improving the CNN-RNN hybrids architecturally: introducing attention layers, formulating fitting objective functions, fusing data from multiple modalities, and applying transfer learning to train models on multimodal data collected from synchronous online education settings.

3. Methodology
    In this section, we explain the overall methodology proposed for this research.

3.1. Research model
    Active and engaged learning is an important model in online higher education. Therefore, this study aims to address the motivational and emotional side of online education by providing information that can assist educators in refining the educational activities they have devised. Our aspiration is to combine theories of learning, motivation, and emotion to (1) define the relationship between the learning-centered affective states of students and observable non-verbal cues, (2) develop specialized multimodal AI algorithms for the recognition of learning-centered affective states, and (3) design tools that present this information in an actionable way to teachers and students, in order to improve the learning process while respecting the privacy of all participants (Figure 1).
    Thus, the research questions of our study are as follows:
1. What are the specific non-verbal feedback needs (e.g., facial expressions) of teachers and students in online lectures?
2. How can we automatically detect non-verbal cues and translate them into the learning-centered affective states of multiple participants in online, synchronous educational activities?
3. How can we present this information to teachers in real time so that they can take actions to positively influence the learning-centered affective states of the students?
4. How can we provide students with this information so that their learning-centered affective states are positively influenced?
5. How can we design a system that is ethically sound, respects privacy concerns, and keeps all collected data secure?

Figure 1: The research model

3.2. Approach
    In this study, we will employ Design-Based Research (DBR) and experiments throughout multiple iterations (Figure 2). Teachers and students will be involved in focus groups and in the co-design of prototypes [23]. The DBR iterations consist of literature study, requirements elicitation, participatory design, and evaluation of the interventions (i.e., integrated AI models) in pilot studies of online learning. The AI models will be developed through experimentation cycles that comprise data collection, annotation, algorithm development, and model training and evaluation. We will collect data from public-domain video repositories and from online lecture sessions that we record with the informed consent of the participants. The data will be annotated by multiple experts in terms of the observed non-verbal cues. Subsequently, we will develop algorithms, and train and test the FER-GR and the learning-centered affective state recognition AI models on multiple datasets to ensure generalizability. We will rely on metrics commonly used in machine learning, i.e., precision, recall, and F1-measure, to evaluate the performance of our models. Data management will be conducted in line with the FAIR data principles [24].
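Section 2.2 proposes extending CNN-RNN hybrids with attention layers. As a rough, framework-free illustration of what such a layer does (not the model this study will build), the sketch below applies scaled dot-product attention pooling to a sequence of per-frame feature vectors: each frame is scored against a query vector, the scores are normalised with a softmax, and the clip-level representation is the weighted sum of the frames. The function names, the frame features, and the fixed query are hypothetical; in a trained network the features would come from the CNN-RNN encoder and the query would be a learned parameter.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse T frame vectors (each of length D) into one vector of length D.

    frames: per-frame features, e.g. hypothetical outputs of a CNN-RNN encoder.
    query:  a length-D vector; learned in a real model, fixed here for illustration.
    Returns the pooled vector and the attention weight of every frame.
    """
    if not frames:
        raise ValueError("attention_pool needs at least one frame")
    dim = len(query)
    if any(len(f) != dim for f in frames):
        raise ValueError("every frame must have the same length as the query")
    # Scaled dot-product score of each frame against the query.
    scores = [sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) for f in frames]
    weights = softmax(scores)
    # Weighted sum of the frames, one output dimension at a time.
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Usage: three frames with two (hypothetical) features each.
frames = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uniform_pooled, uniform_weights = attention_pool(frames, [0.0, 0.0])
focused_pooled, focused_weights = attention_pool(frames, [10.0, 0.0])
```

With an all-zero query every frame receives the same weight, so pooling reduces to plain averaging; a query aligned with the first feature places almost all weight on the first frame. In a trained model, this weighting is what would allow the network to emphasise the moments of a clip in which a relevant non-verbal cue occurs.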
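Section 3.2 names precision, recall, and F1 as evaluation metrics and states that multiple experts will annotate the data. The sketch below shows one common way to combine the two, under assumptions the paper does not fix: the experts' labels for each clip are merged by majority vote (ties go to the label that appears first), each clip carries exactly one learning-centered affective state, and the label set `STATES` is a hypothetical subset chosen for illustration. Precision, recall, and F1 are then computed per state, and the macro-average weights every state equally.

```python
from collections import Counter

# Hypothetical label set used only for this illustration.
STATES = ["engagement", "confusion", "boredom", "frustration"]


def majority_label(annotations):
    """Merge one clip's expert labels by majority vote.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the label that appears first in `annotations`.
    """
    return Counter(annotations).most_common(1)[0][0]


def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted a state that the reference does not have
            fn[truth] += 1  # missed the reference state
    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-state F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Usage: four clips, each labelled by three (hypothetical) annotators.
annotations = [
    ["engagement", "engagement", "boredom"],
    ["confusion", "confusion", "confusion"],
    ["boredom", "engagement", "boredom"],
    ["engagement", "frustration", "engagement"],
]
y_true = [majority_label(a) for a in annotations]
y_pred = ["engagement", "boredom", "boredom", "engagement"]  # model output
scores = per_class_scores(y_true, y_pred, STATES)
```

In this toy example, the confusion clip is misclassified as boredom, which lowers the precision for boredom (0.5) and the recall for confusion (0.0). Because frustration never occurs, its scores are 0.0 and it still pulls the macro-average down; whether absent states should be excluded is a decision the evaluation protocol would need to settle.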
Figure 2: The research approach

    We envision a system that provides certain information in real time and after the fact to various stakeholders for different purposes (Table 1). The information flow targets specific educational purposes for each party involved, with short-term and long-term educational benefits.

Table 1
Information flow to the actors of the system

When | Who | What | Why
Real-time | Teacher | Aggregated information regarding the overall learning-centered affective states of the students in the classroom. | The teacher may (a) alter the teaching style and/or (b) initiate interventions (e.g., breakout rooms).
Real-time | Student | A semiotic indicator that shows their own learning-centered affective states, as well as suitable nudges (e.g., pop questions). | Keep the student engaged and active; positively influence the learning process and self-regulation.
After-the-fact | Teacher | A report that shows the detected learning-centered affective states of the overall classroom, matched to different parts of the educational activity (e.g., slides, activities, interventions) as a timeline. | Improve the design of the educational activity and the delivery style.
After-the-fact | Course coordinator | The trend of learning-centered affective states in a course throughout multiple online educational activities. | Evidence-based course design and ensuring educational quality.

4. Discussion
    In this section, we discuss several important aspects of this study, including the theoretical and practical implications, the privacy concerns, and the limitations.

4.1. Theoretical and practical implications
    The outcomes of our experiments will potentially allow us to gain a deeper understanding of how learning-centered affective states are indicated by observable non-verbal cues, and of how these states relate to an effective learning experience. Our results will also contribute to Media Naturalness Theory by extending it to cover widely used video conferencing platforms and by tailoring it to educational scenarios.
    The practical outcome of this study will be an analytical platform that is integrated into the video conferencing clients of the students and the teachers. The platform will be able to provide feedback both in real time and after the fact to all actors involved in the course: students, teachers, and course coordinators (Table 1). In real time, the platform will provide teachers with aggregated information regarding the learning-centered affective states of the students. This information will give teachers the opportunity to respond in different ways, such as changing the teaching style and/or intervening in the flow of the course content. Students will receive information regarding their own learning-centered affective states, which they can use to self-regulate and to remain active and engaged. The real-time feedback will be designed so that it can be used while teachers and students are taking part in the educational activity. In the longer term, this aggregated information will be useful to teachers and course coordinators as an evidence-based course evaluation, which can be used to improve the delivery style (teacher) and the course design (course coordinator), respectively.

4.2. Privacy
    We acknowledge the privacy-sensitive nature of this study. To protect the privacy of students and to prevent possible misuse of the technology (e.g., using the obtained information to evaluate students), we design core privacy-preserving measures and shape our research around them. First, all data collection and experimentation will be voluntary and based on the informed consent of the participants. The ethics board will be consulted prior to all data collection phases. The training data will be collected anonymously, with no possibility of linking it to individuals. Second, the designed system will keep sensitive individual data (e.g., video) on the individuals' own computers. We will use a virtual webcam that runs the AI models and analyzes the video data on the client computers. This feature will also allow students to keep their camera off (showing an avatar or nothing at all) while still benefiting from the system. Only processed and anonymized data (i.e., numerical representations of non-verbal cues) will be transferred, and the teacher will only be provided with information aggregated at classroom level. Finally, this study aims solely at developing a method for improving the quality of online education, not at individually assessing students or teachers. We are confident that this privacy-preserving design will prevent misuse of the system.

4.3. Limitations
    The source of the data in this study will be the participants' cameras, which results in two important limitations. First, we cannot observe a student's entire environment; thus, it is not possible to determine whether the observed non-verbal cues of an individual student result from an event in the classroom or from an off-task activity. Second, in online classrooms, students are in control of their cameras and may refuse to turn them on, even when the proposed privacy-preserving methods are in place. In that case, the proposed method is not applicable.

5. References
[1] Education: From disruption to recovery, 2021. URL: https://en.unesco.org/covid19/educationresponse.
[2] S. Dhawan, Online Learning: A Panacea in the Time of COVID-19 Crisis, Journal of Educational Technology Systems 49 (2020) 5–22. doi: 10.1177/0047239520934018.
[3] M. Alawamleh, L. M. Al-Twait, G. R. Al-Saht, The effect of online learning on communication between instructors and students during Covid-19 pandemic, Asian Education and Development Studies (2020). doi: 10.1108/AEDS-06-2020-0131.
[4] S. Gurung, Challenges faced by teachers in online teaching during Covid-19 pandemic, The Online Journal of Distance Education and e-Learning 9 (2021).
[5] S. D'Mello, A selective meta-analysis on the relative incidence of discrete affective states during learning with technology, Journal of Educational Psychology 105 (2013) 1082–1099. doi: 10.1037/a0032674.
[6] K. Bahreini, R. Nadolski, W. Westera, Towards multimodal emotion recognition in e-learning environments, Interactive Learning Environments (2016) 590–605. doi: 10.1080/10494820.2014.908927.
[7] A. Lepp, J. E. Barkley, A. C. Karpinski, S. Singh, College Students' Multitasking Behavior in Online Versus Face-to-Face Courses, SAGE Open 9 (2019). doi: 10.1177/2158244018824505.
[8] C. Jewitt, J. Bezemer, K. O'Halloran, Introducing multimodality, 1st ed., Routledge, London, 2016. doi: 10.4324/9781315638027.
[9] APA Dictionary of Psychology, 2020. URL: https://dictionary.apa.org/nonverbal-communication.
[10] N. Kock, Media naturalness theory: human evolution and behaviour towards electronic communication technologies, in: S. Craig Roberts (Ed.), Applied evolutionary psychology, Oxford University Press Inc., New York, 2012, pp. 381–398. doi: 10.1093/acprof:oso/9780199586073.001.0001.
[11] O. Weiser, I. Blau, Y. Eshet-Alkalai, How do medium naturalness, teaching-learning interactions and students' personality traits affect participation in synchronous E-learning?, Internet and Higher Education 37 (2018) 40–51. doi: 10.1016/j.iheduc.2018.01.001.
[12] P. Ekman, An Argument for Basic Emotions, Cognition and Emotion 6 (1992) 169–200. doi: 10.1080/02699939208411068.
[13] R. Reisenzein, M. Studtmann, G. Horstmann, Coherence between emotion and facial expression: Evidence from laboratory experiments, Emotion Review 5 (2013) 16–23. doi: 10.1177/1754073912457228.
[14] M. Wegrzyn, M. Vogt, B. Kireclioglu, J. Schneider, J. Kissler, Mapping the emotional face. How individual face parts contribute to successful emotion recognition, PLoS ONE 12 (2017). doi: 10.1371/journal.pone.0177239.
[15] P. Tarnowski, M. Kołodziej, A. Majkowski, R. J. Rak, Emotion recognition using facial expressions, Procedia Computer Science 108 (2017) 1175–1184. doi: 10.1016/j.procs.2017.05.025.
[16] M. A. A. Dewan, M. Murshed, F. Lin, Engagement detection in online learning: a review, Smart Learning Environments 6 (2019). doi: 10.1186/s40561-018-0080-z.
[17] D. Umnia Soraya, K. Candra Kirana, S. Wibawanto, H. Wahyu Herwanto, C. Wijaya Kristanto, Non-Verbal Communication Behavior of Learners on Online-based Learning, in: Proceedings of the 2nd International Conference on Vocational Education and Training (ICOVET 2018), 2019, pp. 4–6. doi: 10.2991/icovet-18.2019.2.
[18] M. Sharma, D. Ahmetovic, L. A. Jeni, K. M. Kitani, Recognizing Visual Signatures of Spontaneous Head Gestures, in: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 400–408. doi: 10.1109/WACV.2018.00050.
[19] N. Bosch, S. D'Mello, R. Baker, J. Ocumpaugh, V. Shute, M. Ventura, L. Wang, W. Zhao, Automatic detection of learning-centered affective states in the wild, in: Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI '15), Association for Computing Machinery, New York, 2015, pp. 379–388. doi: 10.1145/2678025.2701397.
[20] T. S. Ashwin, R. M. R. Guddeti, Affective database for e-learning and classroom environments using Indian students' faces, hand gestures and body postures, Future Generation Computer Systems 108 (2020) 334–348. doi: 10.1016/j.future.2020.02.075.
[21] J. Whitehill, Z. Serpell, Y. C. Lin, A. Foster, J. R. Movellan, The faces of engagement: Automatic recognition of student engagement from facial expressions, IEEE Transactions on Affective Computing 5 (2014) 86–98. doi: 10.1109/TAFFC.2014.2316163.
[22] L. B. Krithika, G. G. Lakshmi Priya, Student Emotion Recognition System (SERS) for e-learning Improvement Based on Learner Concentration Metric, Procedia Computer Science 85 (2016) 767–776. doi: 10.1016/j.procs.2016.05.264.
[23] K. Holstein, B. M. McLaren, V. Aleven, Co-designing a real-time classroom orchestration tool to support teacher–AI complementarity, Journal of Learning Analytics 6 (2019) 27–52. doi: 10.18608/jla.2019.62.3.
[24] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J. W. Boiten, L. O. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes, T. Clark, M. Crosas, I. Dillo, O. Dumon, S. C. Edmunds, C. T. A. Evelo, R. Finkers, A. N. González-Beltrán, A. J. G. Gray, P. Groth, C. A. Goble, J. S. Grethe, J. Heringa, P. A. C. ’t Hoen, R. W. W. Hooft, T. Kuhn, R. G. Kok, J. N. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. Laerte Packer, B. Persson, P. Rocca-Serra, M. Roos, R. C. van Schaik, S. Sansone, E. A. Schultes, T. Sengstag, T. Slater, G. O. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. M. van Mulligen, J. Velterop, A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3 (2016). doi: 10.1038/sdata.2016.18.