1. Introduction

E. Navarrete); anett.hoppe@tib.eu (A. Hoppe); ralph.ewerth@tib.eu (R. Ewerth)

A Review on Recent Advances in Video-based Learning Research: Video Features, Interaction, Tools, and Technologies

Evelyn Navarrete

Anett Hoppe

0 1

Ralph Ewerth

0 1

0 L3S Research Center, Leibniz University Hannover, Germany

1 TIB - Leibniz Information Centre for Science and Technology, Hannover, Germany

2021


Human learning is shifting more strongly than ever towards online settings, and especially towards video platforms. There is an abundance of tutorials and lectures covering diverse topics, from fixing a bike to particle physics. While it is advantageous that learning resources are freely available on the Web, the quality of the resources varies considerably. Given the number of available videos, users need algorithmic support in finding helpful and entertaining learning resources. In this paper, we present a review of the recent research literature (2020-2021) on video-based learning. We focus on publications that examine the characteristics of video content, analyze frequently used features and technologies, and, finally, derive conclusions on trends and possible future research directions.

Keywords: Video-based learning, web-based learning, literature review, video features

1. Introduction

The Web has fundamentally changed the way we learn. Especially video platforms, such as YouTube, play an increasing role here – of the about 30 million video views a day, about 50% relate to some kind of learning content [1]. Another YouTube-related statistic states that as of May 2019, about 500 hours of new content were uploaded to the platform every minute [2], implying that users are reliant on effective search and recommendation algorithms to find content that is relevant to their learning needs.

These algorithms need to consider multiple factors to provide suitable rankings and recommendations. Especially in learning contexts, an open question is: When is a video the best way to learn, depending on the individual user, the learning objective, and context factors? This question has been investigated in several studies in recent years, examining characteristics of the videos and the surrounding platforms to determine when video-based learning (VBL) processes are especially successful. The resulting insights are of interest for all the involved stakeholders: (a) platform providers who wish to provide their users with the best possible content from their database, but also (b) content producers who aim to develop efficient and entertaining learning resources, (c) teachers who seek to enhance their teaching by integrating adapted multimedia resources, and finally, (d) students who pursue a more effective learning process and experience.

In this paper, we provide a review of recent research on the analysis of video-based learning, with a focus on studies that examine video characteristics. For this purpose, we performed a systematic literature search. Here, we present a preliminary summary of our findings from the most recent publications, covering the years 2020 and 2021. We systematize the contents of 41 reviewed papers and provide an overview on (a) the chosen research approach (e.g., empirical study, tool prototype, production guidelines), (b) considered video characteristics, and (c) tasks and technologies used to develop VBL tools and frameworks. From these dimensions, we derive recent research trends and identify gaps that provide directions for future research.

The rest of this paper is structured as follows: Section 2 presents the related work; Section 3 describes our methodology; Section 4 provides an overview of the video-based learning field, its benefits, potentials, and the current challenges, and explains why identifying relevant features in videos is important. Section 5 presents the results of the review. Finally, we summarize our findings, point out the limitations, and conclude with indications for future research in Section 6.

Ref. [3] (2013), [4] (2014), [5] (2018), [6] (2020); 2008-2019

This literature review covers studies extracted from three academic databases: (a) the Digital Bibliography and Library Project (DBLP), (b) the Association for Computing Machinery (ACM), and (c) SpringerLink.

These are specialized in the computer science field and, thus, provide suitable domain specificity not only for our review of technological systems surrounding video-based learning but also to identify the state of the art in this specific research area.

We iteratively refined our set of query terms, to include new vocabulary from the research papers. Whenever we encountered a new term to describe research around VBL in an article, we went back to the databases and re-ran

4. Video-based Learning

learner can be supported in her video-based learning trajectory. It is widely recognized that learners have diverse needs [8], depending on their current learning objective, context, and general preferences. How to analyze and index video resources to satisfy these individual requirements is still an open research question. As we will show below, exploring possible features for the automatic determination of video content and quality is one active area of research. These explorations are indispensable in the quest to allow diverse learners a satisfying video-based learning experience.

Lastly, MOOCs (Massive Open Online Courses) and similar educational courses are a central research area, mainly because they have enabled data analysis at scale. But it is precisely the large-scale nature of MOOCs that can cause issues in handling and processing such a massive amount of information. As stated by Guo et al., “The scale of data from MOOC interaction logs – hundreds of thousands of students from around the world and millions of video watching sessions – is four orders of magnitude larger than those available in prior studies” [8, p. 42]. Moreover, criticism directed at the research community has been raised as well. For example, a challenge that needs to be confronted in these large-scale scenarios is finding how to incorporate controlled experiments [4] and methods or instruments from other fields, such as the field of psychology, instead of relying only on data analysis [5].

Importance of features: Research on VBL poses a number of relevant questions: How can efficient learning be facilitated and improved? How to enable long-term learning success? How to ensure the learner’s engagement and satisfaction? The answer to these questions strongly relies on one base research question: What characterizes an effective learning video? [8, 4, 5] The search for relevant video features has been widely studied from the perspective of different research fields (e.g., [9]), and in an exploration of what is possible with current technologies (e.g., [10]). In consequence, there is a multitude of variables that have been extracted, studied, and manipulated. Video is a complex medium combining audio, text, and visual information, from all of which features can be extracted and combined. These can range from low-level features such as audio quality features (e.g., [11]) to high-level features that try to capture semantics (e.g., [12]). Features need not come directly from the video; they can also result from the interaction with a video, e.g., views and likes (e.g., [13]). In addition to the characteristics of the video, it is vital to study its environment. The success of a VBL process is further influenced by functionalities of the surrounding platform, the learner’s context, and the instructor captured in the video.

In our review, we focus on those features directly related to the learning video, with the goal to outline current insights and research trends. The discussion of learner-related features, such as previous knowledge and skills, is out of the scope of this review (see, for instance, [14] for a recent review).

5. Literature Review

This section summarizes our findings from the analysis of 41 publications on video-based learning in the period 2020-2021. We analyze (a) the principal research approach (Section 5.1); (b) the target task and the used technologies (Section 5.2); (c) the video features explored in the studies (Section 5.3); and, finally, (d) verify former surveys’ finding that there is a specific focus on STEM subjects in VBL studies (Section 5.4).

5.1. Research Approaches

Four main research approaches have been identified in the reviewed literature: (1) Controlled experiments, which in this study are defined as experiments where one or a few variables are manipulated (according to the learning context, video characteristics, or content) and where the impact of this manipulation is measured by some target metric. This group includes experiments run in laboratory settings, but also those performed in realistic academic environments such as university courses. This category accounts for 22 instances (54%) in the reviewed papers. (2) Novel tools, architectures, or analysis pipelines surrounding video-based learning scenarios are covered in 16 (39%) articles. (3) Design principles and guidelines for the creation of educational videos are presented in two publications (5%). (4) Results of the analysis of data generated by users in a video-based online learning platform are presented in one paper (2%).

Table 4 lists all the publications in each category. The high number of empirical studies matches Poquet et al.’s [5] statement that, especially after 2016, empirical works are on the rise. However, an extension of the reviewed time period is necessary to see if our findings concord with the authors’ temporal placement of the change. Within this category (controlled experiments), 21 out of 22 studies reported the number of participants in the experiment (sample size). The smallest study reported 12 participants [15] and the biggest acquired a group of 229 learners [16]. The average number of participants was 85 learners (SD=56). Studies that focused on the development of tools (group (2)) or data analysis of learning platforms such as MOOCs (group (4)) often do not report a sample size.

Some of the subsequent sections focus on a subset of the groups since not all dimensions can be meaningfully extracted from all research objectives. The following review of used technologies, for instance, only makes sense in the context of works that have a technology component, and will, consequently, mostly cover works of group (2).

5.2. Target Task and Used Technologies

This section focuses on the 16 studies which present a technology component. Of these, roughly a third (N=5, 31%) present recommender systems that aim to support learners in finding suitable learning material. One of the systems recommends not only video resources but also learning materials in other modalities (e.g., text articles such as Wikipedia pages [39]). Another two systems provide the user with a video sequence (playlist) instead of single learning contents [41, 42].

A popular approach to recommend the material is based on the extraction of topical keywords from the video’s speech transcript [39, 40, 42], especially by using the tf-idf algorithm. Those are then used to compute the similarity of the currently viewed video to the other ones in the database [39, 40]. Tavakoli et al. [13] suggest using not only the current video for this but all resources previously viewed by the user. They combine this information with a profile of the learner’s skill set and predict relevant videos using a Random Forest classifier. Another technique is to detect prerequisite relations between the videos to generate sequences organized by difficulty and complexity [41, 42]. Lastly, Tang et al. [42] additionally analyze viewer comments with sentiment analysis to generate a playlist, assuming that positive-sounding comments point to engaging learning material.

Other common use cases are systems for the forecasting of learning success (N=3, 19%). “Success” can be operationalized in different ways: In two studies, the objective is to predict the final score of the learner as “pass” or “fail”. The first case is in the context of passing a university MOOC course [36], and the second case is about passing video quizzes presented by videos in an e-learning platform [38]. In a third study, the target is to forecast the probability of a student dropping out of a Coursera course (a MOOC provider) [37]. All these recent examples make use of deep learning technologies, specifically, neural networks with gated recurrent units [37], recurrent neural networks (RNN) [38], or long short-term memory neural networks (LSTM) [36]. These classifiers have been mainly fed with real-time features [36, 37], especially play, pause, and speed rate; however, aggregated data [37] and results of quizzes have also been explored [38].

Two of the 16 reviewed papers in this section (13%) focus on video segmentation. Video as a learning medium has the downside that all information is presented sequentially. Unlike text, videos cannot be easily skimmed to find relevant information. This can be alleviated by automatic video segmentation. In our sample, one paper aims to automatically determine important segments in the educational content using a bidirectional LSTM network classifier built on pre-trained models to extract text and visual features [11]. Das & Das [12] improve on topical segmentation of lecture videos by identifying concepts in the speech transcripts. To achieve this, speech-to-text technologies are used, pre-trained neural network models are used to analyze the resulting text, and the cluster centroid algorithm is used to group the resulting concepts.

The remaining six studies (38%) focus on diverse tasks such as video indexing [46], video ranking [43], video customization according to the type of learner [47], prediction of instructors’ enthusiasm [48], matching videos with practical exercises [44], and helping educators with the creation of teaching materials [45].

5.3. Features

The core focus of this review is the collection of video characteristics that have been studied in the literature (2020 and 2021). The objective is to provide an account of how video features have been investigated, extracted, and varied in the context of studies on video-based learning in the computer science field. We developed a feature taxonomy that groups the video features into categories, which allows for a structured presentation of our findings.

We identified eight high-level categories: (a) audio features, (b) visual features, (c) textual features, (d) features related to the instructor’s behavior, (e) features resulting from the interaction of the learner with the video (e.g., likes), (f) interactive features, (g) production style (e.g., Khan style), and (h) features related to instructional design principles. All of these will be defined and their appearances reviewed in separate paragraphs.

Table 5 (excerpt): example features listed in the taxonomy

Titles, keywords, tags, video length
Visual text (text extracted from frames)
Textual cues (guiding attention to a key location in a slide)
Beat gestures (e.g., hand strokes, natural handwaving, and pointing gestures that aim at highlighting the speech), facial emotions
Volume (expansion of the teacher’s body), pose
Play, pause, rate-change (speed), seek forward, seek backward
Percentage of the video that was watched
Average proportion of videos watched per week
Standard deviation of the proportion of videos watched per week
Number of views, user ratings, relevancy score (rank in search results)
Quizzes, annotation, feedback, exchange (actions to interact with other learners)
Tutorial, lecture, Khan style, talking head, dialogue and monologue, voice over slides, only-text, animations included
Coherence, signalling, spatial contiguity, segmentation (information in segments, rather than a long stream), pre-training (present first the basic information), modality (graphics and spoken words), multimedia (words together with pictures), personalization (informal conversational style), voice (human rather than a computer voice), image (animations)

Audio features are all features that relate to the actual video’s audio stream and also the features that can be extracted from it. This includes audio records (e.g., [39]), and quality-related features, such as energy, entropy, and spectral features which allow making predictions on how well the auditory information can be perceived, e.g., [9, 11, 45]. A second big cluster is person-related audio features, which groups characteristics directly related to the instructor’s expression [24, 45, 48]. This includes, for instance, metrics for the teacher’s voice [45, 48], speech rate [50], used vocabulary [50], and emphasis on specific words that guide the listener’s attention [24].

The category visual features contains analyses or adaptations related to the video’s image information. This includes the analysis of information in video frames as images [11, 12, 44]. Particular to video, in contrast to text and still image, is that movement can guide the learner’s attention. These types of features are captured in the category visual representation features where we can find, for example, highlighting and visual cues (e.g., animated and appearing elements) [24, 25, 26, 27, 19, 45]. There is also work investigating visual quality features [9]. Finally, some works explore enhanced visual representations by including 360-degree scene representations [20, 10, 21], augmented [23], and virtual reality features [22].

In the category text features, we group all text-based information which is available about the video or can be extracted from it. It is common practice to generate a speech transcript from the spoken content in a learning video. This pre-processing step transforms hard-to-analyze speech into written text, for which sophisticated and efficient analysis methods are available. Indeed, in our sample, speech transcripts are widely used [11, 12, 13, 39, 40, 41, 42, 43, 44, 46], and are referenced in all the works on video segmentation [11, 12], and on recommending subsequent videos [36, 37, 38].

Many video platforms provide metadata about the hosted videos, which are widely used as an easily available data source. These include structured information about the video’s title, attributed keywords, tags, and similar [13, 40, 42]. Video length has been used, for instance, by Mubarak et al. [36] as a factor to predict the learning outcome, and by Tavakoli et al. [13] to suggest similar videos. Finally, text, especially in learning videos, also appears to support and visualize the speakers’ message in the form of superimposed scene text, often in the form of presentation slides (visual text) [24, 44].

In every learning process, the teacher is a central facilitator; there is thus a number of features related to instructor behavior [16, 25, 45, 48]. This group captures gesturing [16, 25] and facial emotions [48], features related to the body such as volume and pose [48], and also information about the speaker’s personality [45], which have been used as base techniques to determine higher-level information.

Several studies use learner interactions with the video as a feature. This includes real-time features of a user interacting with a certain instructional video [36, 37, 33, 26, 34, 35, 47], e.g., playing and pausing a video, skipping video content or interrupting; and aggregated measures, such as the number and ratio of videos watched in a time period, or the aggregated behavior of several users on a certain video (e.g., [47]). Besides, user interactions are used as indicators of video popularity [13, 9], in the form of the number of views and ratings.

Interactive functionalities surrounding the video: A critical challenge in videos is that information is transient [51] and consumed in a rather passive way [52, 53]. This invites superficial processing and leads to challenges in knowledge acquisition and integration. In consequence, several studies investigate functionalities to actively involve the user [38, 33, 30, 28, 26, 32, 29, 31] by adding functionalities such as interactive quizzes, collaborative elements, or note-taking areas.

The production style category classifies characteristic types of video design [54]. It includes rough sub-categories of how the learning setting is composed – e.g., whether people are visible (talking-head video, lecture setting with or without an audience) or not (voice-over-slides, Khan-style lecture, animations, and films of real-world phenomena), whether there is a single speaker (monologue) or several (dialogue, interview), and from which perspective the content is presented (upfront, or over-the-shoulder view in tutorials). Production style was mentioned several times as a feature in our sample [18, 32, 15, 17, 19].

There is a number of psychological and educational theories on how to efficiently support learners with multimedia resources; references to those are collected in the category instructional design principles. The only theoretical element referenced in our sample is the set of Multimedia Learning Principles introduced by Mayer [55], which was used by Eradze et al. [9]. They collected manual annotations that state whether a video follows those design principles (see Table 5 for examples). The aim was to find a correlation between the principles and students’ perceptions regarding the quality of the video.

As shown by the taxonomy in Table 5, from all the studies reviewed in this research work (2020-2021), the most explored video features are text-related features, especially transcripts. The next most investigated are interactive features, which study additional functionalities (quizzes, note-taking) that impact the learning success. In third place, we find real-time features. Other frequently explored features are visual features, especially features that direct attention (e.g., animation), enhanced visuals (e.g., 360-degree features), and production style (e.g., talking head). These findings are similar to the results of Poquet et al. [5], who show that text, animation, audio, and production style are the most explored features.

Lastly, we found that the impact of the above-mentioned features has been measured mainly through the learning outcome using metrics such as recall and transfer [16, 33, 30, 23, 24, 25, 28, 26, 27, 10, 32, 15, 29, 22, 34, 17, 35, 19], which, again, conforms to the results of Poquet et al. [5]. Other frequently used metrics are: cognitive load [16, 25, 32, 22, 34, 17, 31, 19, 33, 23, 26, 18], motivation [30, 23, 20, 10, 22, 17], self-efficacy [28, 22, 17], mental effort [16, 32, 19], eye-gaze direction [25, 15, 17], engagement [20, 26], comprehension (understanding) [21, 29], and social presence [16, 17]. Of these, motivation, cognitive load, mental effort, and the results of eye-tracking were also included in the findings of Poquet et al. [5] as often used metrics. On the other hand, the less frequently used metrics encompass: enjoyment [10], agent-persona (credibility, engagement, and learning facilitating) [16], affective rating [16], para-social interaction [16], self-regulated learning [33], sense of presence (feeling of being present in the environment) [10], perceived benefits [21], confidence [19], creative thinking [22], application [21], analysis [21], synthesis [21], evaluation [21], attention span [32], views about the teaching technique [27], facial temperature [19], and challenge [19].

5.4. Covered Subject Domain

Previous literature reviews stated that STEM (Science, Technology, Engineering & Mathematics) areas are overrepresented in the investigation of VBL [3, 5]. This is confirmed in our sample of research papers from 2020-2021. In our first category of controlled experiments, 14 of 22 studies (64%) use videos on some STEM domain. Specifically, computer science [28, 20, 17, 35] and physics [30, 18] are often targeted. In the remaining eight papers, teaching English as a foreign language is the most common discipline [23, 29], followed by other social and humanities domains.

In the research categories “Data analysis” and “Design principles & guidelines” the main focus is also on STEM video content. However, the small sample size in these categories does not allow for further interpretation. Papers in the “Tools” research category do not usually focus on specific disciplines but aim at offering general solutions for VBL. This is why no specific analysis regarding the subject domain is provided.

6. Conclusions

In this paper, we have presented a survey on 41 papers in the field of video-based learning (2020-2021) to provide a structured account of current tendencies in the field. Specifically, we identified (a) common research approaches, (b) target applications and the used technologies, (c) investigated video features and developed a taxonomy which allows structuring the related work, and, finally, (d) collected information on the covered subject domains of the learning environments.

Research approaches: The results show that more than half of the studies are dedicated to controlled experiments (54% of the studies). A significant proportion focused on tool development (39% of the studies), while a very small proportion of studies dedicated effort to analyzing data from learning platforms such as MOOCs (2% of the studies) or proposed design principles and guidelines (5% of the studies).

Tools and technologies: The main applications in our sample are three: (1) recommender systems (31% of the studies in the “Tools” category), (2) predictors of learning outcome (19%), and (3) video segmentation techniques (13%). Technology-wise, keyword-based analyses (e.g., using tf-idf to detect pertinent keywords) are the most commonly adopted. Approaches that aim to provide video playlists (as opposed to singular video recommendations) often refer to techniques from the area of prerequisite detection. Recent approaches to forecasting learning success mainly build upon deep learning techniques (e.g., multilayer perceptrons, gated recurrent units, RNNs, and LSTMs). A similar tendency towards deep learning approaches can be seen in methods aiming to improve video segmentation for educational videos; the approaches widely rely on pre-trained models for the extraction of textual and visual features.

Video features: In general, the most explored video features are text-related, especially metadata and speech transcripts. The second most studied features are interactive (such as interactive quizzes and other additional functionalities), followed by click-stream features, visual features, and production style. Specifically, in the controlled experiments category, the main features manipulated are related to interactive features (32% of the studies), principally quizzes and annotations. Other often explored features in this category are enhanced visual features, mainly 360-degree features, production style features (18% of the studies) with a special emphasis on dialogue and monologue, and features that direct the attention of the learner (18% of the studies), e.g., animations.

Subject domain: Based on our sample, we found that STEM domains are prevalent in the targeted subject areas.

Tendencies and directions: In comparison to previous reviews, the tendencies that could be reaffirmed in this study are mainly related to the type of research that was performed, the type of features that have been manipulated, and the subject domain on which the educational videos have focused. Specifically, the trends discovered by Poquet et al. [5] could be confirmed by our review: (1) It was observed that the research community directed its effort principally towards controlled experiments (54% of the reviewed papers). (2) Text, animation, audio, and production style are situated among the most explored features. (3) Recall and transfer are the most-used metrics to measure learning outcome and effectiveness. (4) Cognitive load, mental effort, motivation, and the results of eye-tracking are considered among the most popular metrics for measuring learning experience. (5) Target disciplines of the videos were primarily addressing STEM topics.

Research on video, text, and image analysis has been dominated by deep learning approaches in recent years. Increasingly, these also find application in the analysis of educational data, exemplified by their dominance in learning outcome prediction, educational video segmentation, and feature extraction. In most cases, the developed systems make use of general-purpose pre-trained models, even though first examples of models specifically trained on educational data exist, e.g., EduBERT [56], a word embedding trained on educational materials. It will be exciting to see the potential of deep learning techniques properly adapted to educational media, and how they will enhance educational applications.

There are various studies that investigate learning on the web, mainly focusing on features of user behavior [57] or textual materials. Only recent publications such as [58] explore the role of videos in such web-based learning processes. However, as shown here, the analysis of video as a learning resource is a highly active research area.
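To make the keyword-based analyses discussed in Section 5.2 more concrete, the following minimal sketch ranks videos by the tf-idf similarity of their speech transcripts. It is an illustration under stated assumptions, not the method of any reviewed system: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_similar`), the simple tokenizer, the weighting variant (length-normalized term frequency times log(N/df)), and the toy transcripts are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (deliberately simple)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tfidf_vectors(transcripts):
    """Return one sparse {term: weight} vector per transcript.

    Assumed weighting: (term count / transcript length) * log(N / df),
    where df is the number of transcripts containing the term.
    """
    docs = [Counter(tokenize(t)) for t in transcripts]
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in doc.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_similar(current_index, transcripts):
    """Rank all other videos by similarity to the currently viewed one."""
    vectors = tfidf_vectors(transcripts)
    scores = [
        (i, cosine(vectors[current_index], vec))
        for i, vec in enumerate(vectors)
        if i != current_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy transcripts, invented for illustration only.
    transcripts = [
        "gradient descent minimizes a loss function",
        "stochastic gradient descent and the loss landscape",
        "how to fix a flat bike tire",
    ]
    for index, score in rank_similar(0, transcripts):
        print(index, round(score, 3))
```

In practice, a system of this kind would likely add stop-word removal and a smoothed idf, and would operate on automatically generated transcripts rather than hand-written strings; those choices are left open here.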

The related work outlined here can serve as a guide for future work that aims to integrate video-based resources into individualized online-learning trajectories, and will provide interested scientists with a starting point for their research.

The reviewed papers investigated a wide range of video features, including information from the visual content, be it textual or image, and audio information. However, the modalities were mainly considered separately, without regard to their interactions. Research on VBL efficiency could be brought forward by a truly multi-modal analysis of educational videos, which explores how spoken language, shown text and image, and speaker behavior convey a combined message as, for instance, explored by Shi et al. [59]. This is in line with psychological research on instructional design, and might bring new insights for educational research in quantifying formal relationships between image, text, and speech in their impact on learning success.

Limitations and future work: This review, with a focus on publications from the years 2020 and 2021, only considers a short time period. It is planned to extend the study in the future, to provide a more thorough analysis of tendencies and directions in VBL. Moreover, although the keyword set used in the paper retrieval process is extensive, we plan to broaden this set to ensure a more comprehensive literature review. As pointed out by the reviewers of this paper, there is a surprisingly low number of MOOC-related studies included in our dataset. This will be considered in the extension of our keyword set.

Acknowledgments

This work has been partly supported by the Ministry of Science and Education of Lower Saxony, Germany, through the Graduate training network “LernMINT: Data-assisted classroom teaching in the MINT subjects”. Also, this study has been partly supported by the Leibniz Association, Germany (Leibniz Competition 2018, funding line “Collaborative Excellence”, project SALIENT [K68/2017]). Finally, we would like to thank the reviewers for their valuable feedback.

References

compedu.2020.104096. doi:10.1016/j.compedu.2020.104096.

[11] J. A. Ghauri, S. Hakimov, R. Ewerth, Classification of important segments in educational videos using multimodal features, in: S. Conrad, I. Tiddi (Eds.), Proceedings of the CIKM 2020 Workshops co-located with 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), Galway, Ireland, October 19-23, 2020, volume 2699 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2699/paper15.pdf.

[12] A. Das, P. P. Das, Incorporating domain knowledge to improve topic segmentation of long MOOC lecture videos, CoRR abs/2012.07589 (2020). URL: https://arxiv.org/abs/2012.07589. arXiv:2012.07589.

[13] M. Tavakoli, S. Hakimov, R. Ewerth, G. Kismihók, A recommender system for open educational videos based on skill requirements, in: 20th IEEE International Conference on Advanced Learning Technologies, ICALT 2020, Tartu, Estonia, July 6-9, 2020, IEEE, 2020, pp. 1–5. URL: https://doi.org/10.1109/ICALT49669.2020.00008. doi:10.1109/ICALT49669.2020.00008.

[14] A. Abyaa, M. Khalidi Idrissi, S. Bennani, Learner modelling: systematic review of the literature from the last 5 years, Educational Technology Research and Development 67 (2019) 1105–1143. doi:10.1007/s11423-018-09644-1.

[15] A. Nugraha, I. A. Wahono, J. Zhanghe, T. Harada, T. Inoue, Creating dialogue between a tutee agent and a tutor in a lecture video improves students’ attention, in: A. Nolte, C. Alvarez, R. Hishiyama, I. Chounta, M. J. Rodríguez-Triana, T. Inoue (Eds.), Collaboration Technologies and Social Computing - 26th International Conference, CollabTech 2020, Tartu, Estonia, September 8-11, 2020, Proceedings, volume 12324 of Lecture Notes in Computer Science, Springer, 2020, pp. 96–111. URL: https://doi.org/10.1007/978-3-030-58157-2_7. doi:10.1007/978-3-030-58157-2_7.

[16] M. Beege, M. Ninaus, S. Schneider, S. Nebel, J. Schlemmel, J. Weidenmüller, K. Moeller, G. D. Rey, Investigating the effects of beat and deictic gestures of a lecturer in educational videos, Comput. Educ. 156 (2020) 103955. URL: https://doi.org/10.1016/j.compedu.2020.103955. doi:10.1016/j.compedu.2020.103955.

[17] B. Lee, K. Muldner, Instructional video design: Investigating the impact of monologue- and dialogue-style presentations, in: R. Bernhaupt, F. F. Mueller, D. [...] on Human Factors in Computing Systems, Honolulu, HI, USA, April 25-30, 2020, ACM, 2020, pp. 1–12. URL: https://doi.org/10.1145/3313831.3376845. doi:10.1145/3313831.3376845.

[18] A. Pérez-Navarro, V. Garcia, J. Conesa, Students perception of videos in introductory physics courses of engineering in face-to-face and online environments, Multim. Tools Appl. 80 (2021) 1009–1028. URL: https://doi.org/10.1007/s11042-020-09665-0. doi:10.1007/s11042-020-09665-0.

[19] N. Srivastava, S. Nawaz, J. M. Lodge, E. Velloso, S. M. Erfani, J. Bailey, Exploring the usage of thermal imaging for understanding video lecture designs and students’ experiences, in: C. Rensing, H. Drachsler (Eds.), LAK ’20: 10th International Conference on Learning Analytics and Knowledge, Frankfurt, Germany, March 23-27, 2020, ACM, 2020, pp. 250–259. URL: https://doi.org/10.1145/3375462.3375514. doi:10.1145/3375462.3375514.

[20] J. C. Muñoz-Carpio, M. A. Cowling, J. R. Birt, Doctoral colloquium - exploring the benefits of using 360° video immersion to enhance motivation and engagement in system modelling education, in: D. Economou, A. Klippel, H. Dodds, A. Peña-Ríos, M. J. W. Lee, D. Beck, J. Pirker, A. Dengel, T. M. Peres, J. Richter (Eds.), 6th International Conference of the Immersive Learning Research Network, iLRN 2020, San Luis Obispo, CA, USA, June 21-25, 2020, IEEE, 2020, pp. 403–406. URL: https://doi.org/10.23919/iLRN47897.2020.9155100. doi:10.23919/iLRN47897.2020.9155100.

[21] W. Daher, H. Sleem, Middle school students’ learning of social studies in the video and 360-degree videos contexts, IEEE Access 9 (2021) 78774–78783. URL: https://doi.org/10.1109/ACCESS.2021.3083924. doi:10.1109/ACCESS.2021.3083924.

[22] H. Huang, G. Hwang, C. Chang, Learning to be a writer: A spherical video-based virtual reality approach to supporting descriptive article writing in high school Chinese courses, Br. J. Educ. Technol. 51 (2020) 1386–1405. URL: https://doi.org/10.1111/bjet.12893. doi:10.1111/bjet.12893.

[23] C.-H. Chen, AR videos as scaffolding to foster students’ learning achievements and motivation in EFL learning, British Journal of Educational Technology 51 (2020) 657–672.

[24] X. Wang, L. Lin, M. Han, J. M. Spector, Impacts of cues on learning: Using eye-tracking technologies to examine the functions and designs of added cues in short instructional videos, Comput. Hum. Behav. 107 (2020) 106279. URL: https://doi.org/10.1016/j.chb.2020.106279. doi:10.1016/
Verweij, J. Andres, J. McGrenere, A. Cockburn, j.chb.2020.106279.

I. Avellino, A. Goguey, P. Bjøn, S. Zhao, B. P. Sam- [25] J. Moon, J. Ryu, The efects of social and cognitive son, R. Kocielnik (Eds.), CHI ’20: CHI Conference cues on learning comprehension, eye-gaze pattern, 158 (2020) 104000. URL: https://doi.org/10.1016/j. and cognitive load in video instruction, J. Comput. compedu.2020.104000. doi:10.1016/j.compedu. High. Educ. 33 (2021) 39–63. URL: https://doi.org/10. 2020.104000. 1007/s12528-020-09255-x. doi:10.1007/s12528- [34] N. Garrett, Segmentation’s failure to improve soft020-09255-x. ware video tutorials, Br. J. Educ. Technol. 52 (2021) [26] A. Cookson, D. Kim, T. Hartsell, Enhancing stu- 318–336. URL: https://doi.org/10.1111/bjet.13000. dent achievement, engagement, and satisfaction doi:10.1111/bjet.13000. using animated instructional videos, Int. J. Inf. [35] D. Lang, G. Chen, K. Mirzaei, A. Paepcke, Is faster Commun. Technol. Educ. 16 (2020) 113–125. URL: better?: a study of video playback speed, in: C. Renshttps://doi.org/10.4018/IJICTE.2020070108. doi:10. ing, H. Drachsler (Eds.), LAK ’20: 10th International 4018/IJICTE.2020070108. Conference on Learning Analytics and Knowledge, [27] M. A. Al-Khateeb, A. M. Alduwairi, Efect of Frankfurt, Germany, March 23-27, 2020, ACM, 2020, teaching geometry by slow-motion videos on the pp. 260–269. URL: https://doi.org/10.1145/3375462. 8th graders’ achievement, Int. J. Interact. Mob. 3375466. doi:10.1145/3375462.3375466. Technol. 14 (2020) 57–67. URL: https://www.online- [36] A. A. Mubarak, H. Cao, S. A. M. Ahmed, Predictive journals.org/index.php/i-jim/article/view/12985. learning analytics using deep learning model in [28] M. C. Sözeri, S. B. Kert, Inefectiveness of online in- moocs’ courses videos, Educ. Inf. Technol. 26 (2021) teractive video content developed for programming 371–392. URL: https://doi.org/10.1007/s10639-020education, Int. J. Comput. Sci. Educ. Sch. 4 (2021) 10273-6. doi:10.1007/s10639-020-10273-6. 49–69. URL: https://doi.org/10.21585/ijcses.v4i3.99. 
[37] B. Jeon, N. Park, Dropout prediction over weeks doi:10.21585/ijcses.v4i3.99. in moocs by learning representations of clicks and [29] A. A. Kuhail, M. S. Aqel, Interactive digital videos videos, CoRR abs/2002.01955 (2020). URL: https: and their impact on sixth graders’ english read- //arxiv.org/abs/2002.01955. arXiv:2002.01955. ing and vocabulary skills and retention, Int. J. [38] H. E. Aouifi, Y. Es-Saady, M. E. Hajji, M. Mimis, Inf. Commun. Technol. Educ. 16 (2020) 42–56. URL: H. Douzi, Toward student classification in educahttps://doi.org/10.4018/IJICTE.2020070104. doi:10. tional video courses using knowledge tracing, in: 4018/IJICTE.2020070104. M. Fakir, M. Baslam, R. E. Ayachi (Eds.), Business [30] D. Leisner, C. G. Zahn, A. Ruf, A. A. P. Cattaneo, Dif- Intelligence - 6th International Conference, CBI ferent ways of interacting with videos during learn- 2021, Beni Mellal, Morocco, May 27-29, 2021, Proing in secondary physics lessons, in: C. Stephanidis, ceedings, volume 416 of Lecture Notes in Business M. Antona (Eds.), HCI International 2020 - Posters Information Processing, Springer, 2021, pp. 73–82. - 22nd International Conference, HCII 2020, Copen- URL: https://doi.org/10.1007/978-3-030-76508-8_6. hagen, Denmark, July 19-24, 2020, Proceedings, Part doi:10.1007/978-3-030-76508-8\_6. II, volume 1225 of Communications in Computer [39] C. Schulten, S. Manske, A. Langner-Thiele, H. U. and Information Science, Springer, 2020, pp. 284– Hoppe, Digital value-adding chains in vocational 291. URL: https://doi.org/10.1007/978-3-030-50729- education: Automatic keyword extraction from 9_40. doi:10.1007/978-3-030-50729-9\_40. learning videos to provide learning resource recom[31] S. Chen, D. Wang, Y. Huang, Exploring the comple- mendations, in: C. Alario-Hoyos, M. J. Rodríguezmentary features of audio and text notes for video- Triana, M. Schefel, I. A. Sánchez, S. 
Dennerlein based learning in mobile settings, in: Extended (Eds.), Addressing Global Challenges and Quality Abstracts of the 2021 CHI Conference on Human Education - 15th European Conference on TechFactors in Computing Systems, 2021, pp. 1–7. nology Enhanced Learning, EC-TEL 2020, Heidel[32] X. Lu, Q. Li, X. Wang, Research on the impacts berg, Germany, September 14-18, 2020, Proceedings, of feedback in instructional videos on college stu- volume 12315 of Lecture Notes in Computer Science, dents’ attention and learning efects, in: W. Shen, Springer, 2020, pp. 15–29. URL: https://doi.org/10. J. A. Barthès, J. Luo, Y. Shi, J. Zhang (Eds.), 24th 1007/978-3-030-57717-9_2. doi:10.1007/978-3IEEE International Conference on Computer Sup- 030-57717-9\_2. ported Cooperative Work in Design, CSCWD 2021, [40] J. Jordán, S. Valero, C. Turró, V. J. Botti, RecomDalian, China, May 5-7, 2021, IEEE, 2021, pp. 513– mending learning videos for moocs and flipped 516. URL: https://doi.org/10.1109/CSCWD49262. classrooms, in: Y. Demazeau, T. Holvoet, J. M. 2021.9437774. doi:10.1109/CSCWD49262.2021. Corchado, S. Costantini (Eds.), Advances in Practi9437774. cal Applications of Agents, Multi-Agent Systems, [33] D. C. D. van Alten, C. Phielix, J. Janssen, L. Kester, and Trustworthiness. The PAAMS Collection - 18th Self-regulated learning support in flipped learning International Conference, PAAMS 2020, L’Aquila, videos enhances learning outcomes, Comput. Educ. Italy, October 7-9, 2020, Proceedings, volume 12092 of Lecture Notes in Computer Science, Springer, cational videos hierarchical indexing with ebooks, 2020, pp. 146–157. URL: https://doi.org/10.1007/ in: H. Mitsuhara, Y. Goda, Y. Ohashi, M. M. T. Ro978-3-030-49778-1_12. doi:10.1007/978-3-030- drigo, J. Shen, N. Venkatarayalu, G. Wong, M. Ya49778-1\_12. mada, C. Lei (Eds.), IEEE International Conference [41] M. C. Aytekin, S. Räbiger, Y. 
Saygin, Discov- on Teaching, Assessment, and Learning for Engiering the prerequisite relationships among in- neering, TALE 2020, Takamatsu, Japan, December structional videos from subtitles, in: A. N. 8-11, 2020, IEEE, 2020, pp. 482–489. URL: https: Raferty, J. Whitehill, C. Romero, V. Cavalli- //doi.org/10.1109/TALE48869.2020.9368461. doi:10. Sforza (Eds.), Proceedings of the 13th Interna- 1109/TALE48869.2020.9368461. tional Conference on Educational Data Mining, [47] S. Lallé, C. Conati, A data-driven student model EDM 2020, Fully virtual conference, July 10-13, to provide adaptive support during video watching 2020, International Educational Data Mining Soci- across moocs, in: I. I. Bittencourt, M. Cukurova, ety, 2020. URL: https://educationaldatamining.org/ K. Muldner, R. Luckin, E. Millán (Eds.), Artificial ifles/conferences/EDM2020/papers/paper_99.pdf. Intelligence in Education - 21st International Con[42] C. Tang, J. Liao, H. Wang, C. Sung, W. Lin, Con- ference, AIED 2020, Ifrane, Morocco, July 6-10, 2020, ceptguide: Supporting online video learning with Proceedings, Part I, volume 12163 of Lecture Notes concept map-based recommendation of learning in Computer Science, Springer, 2020, pp. 282–295. path, in: J. Leskovec, M. Grobelnik, M. Najork, URL: https://doi.org/10.1007/978-3-030-52237-7_23. J. Tang, L. Zia (Eds.), WWW ’21: The Web Con- doi:10.1007/978-3-030-52237-7\_23. ference 2021, Virtual Event / Ljubljana, Slovenia, [48] Y. Chen, C. Wang, Z. Jian, Research on evaluation April 19-23, 2021, ACM / IW3C2, 2021, pp. 2757– algorithm of teacher’s teaching enthusiasm based 2768. URL: https://doi.org/10.1145/3442381.3449808. on video, in: ICRAI 2020: 6th International Conferdoi:10.1145/3442381.3449808. ence on Robotics and Artificial Intelligence, Singa[43] D. I. Bleoanca, S. Heras, J. Palanca, V. Julián, M. C. pore, November 20-22, 2020, ACM, 2020, pp. 184– Mihaescu, LSI based mechanism for educational 191. URL: https://doi.org/10.1145/3449301.3449333. 
videos retrieval by transcripts processing, in: doi:10.1145/3449301.3449333. C. Analide, P. Novais, D. Camacho, H. Yin (Eds.), [49] T. Weinert, M. T. de Gafenco, M. S. Billert, Intelligent Data Engineering and Automated Learn- N. Boerner, Fostering interaction in higher ing - IDEAL 2020 - 21st International Conference, education with deliberate design of interactive Guimaraes, Portugal, November 4-6, 2020, Pro- learning videos, in: J. F. George, S. Paul, ceedings, Part I, volume 12489 of Lecture Notes R. De’, E. Karahanna, S. Sarker, G. Oestreicherin Computer Science, Springer, 2020, pp. 88–100. Singer (Eds.), Proceedings of the 41st InternaURL: https://doi.org/10.1007/978-3-030-62362-3_9. tional Conference on Information Systems, ICIS doi:10.1007/978-3-030-62362-3\_9. 2020, Making Digital Inclusive: Blending the Lo[44] X. Wang, W. Huang, Q. Liu, Y. Yin, Z. Huang, cak and the Global, Hyderabad, India, December L. Wu, J. Ma, X. Wang, Fine-grained similarity 13-16, 2020, Association for Information Systems, measurement between educational videos and ex- 2020. URL: https://aisel.aisnet.org/icis2020/digital_ ercises, in: C. W. Chen, R. Cucchiara, X. Hua, learning_env/digital_learning_env/12. G. Qi, E. Ricci, Z. Zhang, R. Zimmermann (Eds.), [50] J. Ge, X. Li, Design strategies of EFL learning MM ’20: The 28th ACM International Confer- videos: Exampled by a china MOOC, in: ICEIT 2020, ence on Multimedia, Virtual Event / Seattle, WA, Proceedings of the 9th International Conference USA, October 12-16, 2020, ACM, 2020, pp. 331– on Educational and Information Technology, Ox339. URL: https://doi.org/10.1145/3394171.3413783. ford, UK, February11-13, 2020, ACM, 2020, pp. 68– doi:10.1145/3394171.3413783. 71. URL: https://doi.org/10.1145/3383923.3383927. [45] A. Hartholt, A. Reilly, E. Fast, S. Mozgai, Introduc- doi:10.1145/3383923.3383927. ing canvas: Combining nonverbal behavior gener- [51] R. E. Mayer, C. 
Pilegard, Principles for managing ation with user-generated content to rapidly cre- essential processing in multimedia learning: Segate educational videos, in: S. Marsella, R. Jack, menting, pretraining, and modality principles, The H. H. Vilhjálmsson, P. Sequeira, E. S. Cross (Eds.), Cambridge handbook of multimedia learning (2005) IVA ’20: ACM International Conference on In- 169–182. telligent Virtual Agents, Virtual Event, Scotland, [52] R. Ramachandran, E. M. Sparck, M. Levis-Fitzgerald, UK, October 20-22, 2020, ACM, 2020, pp. 25:1– Investigating the efectiveness of using application25:3. URL: https://doi.org/10.1145/3383652.3423880. based science education videos in a general chemdoi:10.1145/3383652.3423880. istry lecture course, J. Chem. Educ. 96 (2019) [46] S. Horovitz, Y. Ohayon, Boocture: Automatic edu- 479–485. URL: https://doi.org/10.1021/acs.jchemed.

8b00777. doi:10.1021/acs.jchemed.8b00777. [53] G. Salomon, Television is "easy" and print is "tough": The diferential investment of mental efort in learning as a function of perceptions and attributions., Journal of Educational Psychology 76 (1984) 647–658. doi:https://doi.org/10.

1037/0022-0663.76.4.647. [54] K. Chorianopoulos, A taxonomy of asynchronous instructional video styles, The International Review of Research in Open and Distributed Learning 19 (2018) 294–311. [55] R. Mayer, R. E. Mayer, The Cambridge handbook of multimedia learning, Cambridge university press, 2005. [56] B. Clavié, K. Gal, Edubert: Pretrained deep language models for learning analytics, CoRR abs/1912.00690 (2019). URL: http://arxiv.org/abs/ 1912.00690. arXiv:1912.00690. [57] U. Gadiraju, R. Yu, S. Dietze, P. Holtz, Analyzing knowledge gain of users in informational search sessions on the web, in: C. Shah, N. J. Belkin, K. Byström, J. Huang, F. Scholer (Eds.), Proceedings of the 2018 Conference on Human Information Interaction and Retrieval, CHIIR 2018, New Brunswick, NJ, USA, March 11-15, 2018, ACM, 2018, pp. 2–11. URL: https://doi.org/10.1145/3176349.

3176381. doi:10.1145/3176349.3176381. [58] C. Otto, R. Yu, G. Pardi, J. von Hoyer, M. Rokicki,

A. Hoppe, P. Holtz, Y. Kammerer, S. Dietze, R. Ewerth, Predicting knowledge gain during web search based on multimedia resource consumption, in: I. Roll, D. S. McNamara, S. A. Sosnovsky, R. Luckin, V. Dimitrova (Eds.), Artificial Intelligence in Education - 22nd International Conference, AIED 2021, Utrecht, The Netherlands, June 14-18, 2021, Proceedings, Part I, volume 12748 of Lecture Notes in Computer Science, Springer, 2021, pp. 318–330.

URL: https://doi.org/10.1007/978-3-030-78292-4_26.

doi:10.1007/978-3-030-78292-4\_26. [59] J. Shi, C. Otto, A. Hoppe, P. Holtz, R. Ewerth, Investigating correlations of automatically extracted multimodal features and lecture video quality, in: Proceedings of the 1st International Workshop on Search as Learning with Multimedia Information, SALMM ’19, Association for Computing Machinery, New York, NY, USA, 2019, p. 11–19. URL: https: //doi.org/10.1145/3347451.3356731. doi:10.1145/ 3347451.3356731.