=Paper=
{{Paper
|id=Vol-3052/ipaper7
|storemode=property
|title=A Review on Recent Advances in Video-based Learning Research: Video Features, Interaction, Tools, and Technologies
|pdfUrl=https://ceur-ws.org/Vol-3052/paper7.pdf
|volume=Vol-3052
|authors=Evelyn Navarrete,,Anett Hoppe,,Ralph Ewerth
|dblpUrl=https://dblp.org/rec/conf/cikm/NavarreteHE21
}}
==A Review on Recent Advances in Video-based Learning Research: Video Features, Interaction, Tools, and Technologies==
<pdf width="1500px">https://ceur-ws.org/Vol-3052/paper7.pdf</pdf>
<pre>
A Review on Recent Advances in Video-based Learning
Research: Video Features, Interaction, Tools, and
Technologies
Evelyn Navarrete (1), Anett Hoppe (1,2) and Ralph Ewerth (1,2)
(1) L3S Research Center, Leibniz University Hannover, Germany
(2) TIB – Leibniz Information Centre for Science and Technology, Hannover, Germany


Abstract
Human learning is shifting more strongly than ever towards online settings, and especially towards
video platforms. There is an abundance of tutorials and lectures covering diverse topics, from
fixing a bike to particle physics. While it is advantageous that learning resources are freely
available on the Web, the quality of the resources varies considerably. Given the number of
available videos, users need algorithmic support in finding helpful and entertaining learning
resources.
   In this paper, we present a review of the recent research literature (2020-2021) on video-based
learning. We focus on publications that examine the characteristics of video content, analyze
frequently used features and technologies, and, finally, derive conclusions on trends and possible
future research directions.

Keywords
Video-based learning, web-based learning, literature review, video features


Proceedings of the CIKM 2021 Workshops, November 1–5, Gold Coast, Queensland, Australia
Contact: navarrete@l3s.de (E. Navarrete); anett.hoppe@tib.eu (A. Hoppe); ralph.ewerth@tib.eu (R. Ewerth)
ORCID: 0000-0002-5610-9908 (E. Navarrete); 0000-0002-1452-9509 (A. Hoppe); 0000-0003-0918-6297 (R. Ewerth)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

The Web has fundamentally changed the way we learn. Video platforms such as YouTube play an
increasingly important role here: of the roughly 30 million video views a day, about 50% relate to
some kind of learning content [1]. Another YouTube-related statistic states that as of May 2019,
about 500 hours of new content were uploaded to the platform every minute [2], implying that users
are reliant on effective search and recommendation algorithms to find content that is relevant to
their learning needs.
   These algorithms need to consider multiple factors to provide suitable rankings and
recommendations. Especially in learning contexts, an open question is: When is a video the best way
to learn, depending on the individual user, the learning objective, and context factors? This
question has been investigated in several studies in recent years, examining characteristics of the
videos and the surrounding platforms to determine when video-based learning (VBL) processes are
especially successful. The resulting insights are of interest for all the involved stakeholders:
(a) platform providers who wish to provide their users with the best possible content from their
database, but also (b) content producers who aim to develop efficient and entertaining learning
resources, (c) teachers who seek to enhance their teaching by integrating adapted multimedia
resources, and finally, (d) students who pursue a more effective learning process and experience.
   In this paper, we provide a review of recent research on the analysis of video-based learning,
with a focus on studies that examine video characteristics. For this purpose, we performed a
systematic literature search. Here, we present a preliminary summary of our findings from the most
recent publications, covering the years 2020 and 2021. We systematize the contents of 41 reviewed
papers and provide an overview of (a) the chosen research approach (e.g., empirical study, tool
prototype, production guidelines), (b) considered video characteristics, and (c) tasks and
technologies used to develop VBL tools and frameworks. From these dimensions, we derive recent
research trends and identify gaps that provide directions for future research.
   The rest of this paper is structured as follows: Section 2 presents the related work; Section 3
describes our methodology; Section 4 provides an overview of the video-based learning field, its
benefits, potentials, and the current challenges, and explains why identifying relevant features in
videos is important. Section 5 presents the results of the review. Finally, we summarize our
findings, point out the limitations, and conclude with indications for future research in
Section 6.

2. Related Work

We found four survey articles that examine video-based learning. Their core characteristics are
summed up in Table 1. All of them reference the significant development in research on VBL [4] or
the increase in publications [5, 3] as a motivation for their work. Giannakos [3] and Poquet et
al. [5] find that there is an increase specifically in empirical studies.

Table 1
Core information of related survey articles: reference and publication year, temporal scope of the
survey, number of reviewed papers, and reviewed dimensions

 Ref.         Review period   No. of papers   Review dimensions
 [3] (2013)   2000-2012       166             1) Type of research; 2) Sample; 3) Subject area;
                                              4) Technology type; 5) Style of use
 [4] (2014)   2003-2014       76              1) Learning effectiveness; 2) Teaching methods;
                                              3) Design; 4) Reflection
 [5] (2018)   2007-2017       178             1) Video type; 2) Population; 3) Control of prior
                                              knowledge; 4) Sample size; 5) Video features
 [6] (2020)   2008-2019       39              Teacher perception and reflection

   All four surveys focus on different key characteristics (see Table 1, column “Review
dimensions”) and derive common themes and research trends based on these. For instance,
Giannakos [3] centers on the development of the field and shows a shift in the used technologies.
Yousef et al. [4] concentrate on the effectiveness of VBL and provide a review of teaching methods
using video, design features, and how video-based contents and tools are integrated. Poquet et
al. [5] spotlight the video characteristics that have been analyzed in research with specific
regard to their influence on learning effectiveness. As stated by the authors, the most often used
metrics to qualify the effectiveness are recall (remembering information) and transfer (applying
what was learned to different scenarios), followed by motivation, cognitive load, mental effort,
attention, and affect. The most analyzed features are text, audio, animations, and the video
production style. Table 2 summarizes the features derived in their review. It is also found that
when data sources are available, the most common sources are eye-tracking and click-stream data. A
very recent work by Sablic et al. [6] reviews approaches to support teachers’ self-reflection and
professional development with video recordings of teaching sessions.
   Giannakos [3] discovers a shift in the focus of research from social science disciplines to more
technological domains. In accordance with these findings, Poquet et al. [5] state that more often
than other subjects, the reviewed studies focus on STEM topics. The authors further find that these
are followed by psychology, humanities, and social domains. The identified research trends include
collaborative video-based learning [4], the effect of annotation and authoring features [4], and
the use of videos as an instrument for reflective teacher education [6].
   The present review follows a similar format to the one presented by Poquet et al. [5], such that
our results are focused around the video characteristics that are analyzed or experimented with,
and collect additional context information such as study sample sizes and covered subject areas.
Further, this review is more comprehensive in that it includes not only papers that studied the
effects on learning effectiveness but any paper that examines features of video-based learning
materials regardless of the ultimate goal. Examples are papers whose aim is to propose tools,
analyze data from e-learning platforms, or suggest features that should be considered in the design
of educational videos. Also, in contrast to Giannakos [3] and Yousef et al. [4], this review looks
more closely at the technological trends of VBL by detailing not only the developed tools but also
the technological tasks carried out to develop such tools. Moreover, our review complements
previous studies by focusing on the most recent publications (2020-2021).
   We inductively derive a taxonomy of researched video features. This is used to structure our
account of research in the domain and will guide the successive extension of our review towards
earlier works.

3. Methodology

This literature review covers studies extracted from three academic databases: (a) the Digital
Bibliography and Library Project (DBLP), (b) the Association for Computing Machinery (ACM), and
(c) SpringerLink.
   These are specialized in the computer science field and, thus, provide suitable domain
specificity not only for our review of technological systems surrounding video-based learning but
also to identify the state of the art in this specific research area.
   We iteratively refined our set of query terms to include new vocabulary from the research
papers. Whenever we encountered a new term to describe research around VBL in an article, we went
back to the databases and re-ran
the search with the new terminology. The final set of query combinations can be found in Table 3.
   The resulting set of papers was filtered based on the following criteria:

     1. Videos are used in academic learning scenarios (this excludes tutorials for everyday tasks,
        such as cooking instructions).
     2. The article considers videos as learning resources (this excludes the use of video
        recordings in reflective scenarios, such as in teacher education).
     3. The article examines specific characteristics and features of the video-based learning
        materials.

   Here, we report on a subset of 41 articles, focusing on very recent publications from the years
2020 and 2021. It is planned to extend the review in the future to cover a longer period of time.

Table 2
Literature review findings of Poquet et al. [5]

  Category                Features
  Presentation features   Video modality (e.g., text, audio, animation, and voice over slides)
                          Text customization (e.g., personal pronouns)
                          Signaling (e.g., highlight relevant information)
  Others                  Video usage (e.g., videos instead of face-to-face lectures)
                          Content (e.g., metaphors in contrast to descriptive language)
                          Task (e.g., annotation features)
                          Quizzes
                          Learner characteristics
                          Learner control (e.g., pause, play)
                          Distraction
  Dependent variables     Recall (remembering information)
                          Transfer (applying what was learned to different scenarios)

Table 3
Search queries: base terms in the first column are joined (using logical AND) with extensions in
the second column

  Base          Extension
  Video-based   Learning
  Video         Education     Educational   Instructional
                Learn         Learning      Instruction
                Teach         Teaching      Explaining
                Knowing       Knowledge     Explanatory
                Tutorial      Student       Classroom
                Pupil         School        University
                Skills

4. Video-based Learning

This section gives a short introduction to video-based learning, including a discussion of
potentials and challenges as outlined in the reviewed papers, and motivates our specific interest
in research works on the analysis of video characteristics. It lays the foundation for the
following discussion of the literature review with a focus on video features.
   The task of learning by using video sources has adopted the name of video-based learning (VBL).
This terminology is used in several studies (e.g., [3, 5, 4, 6]). In the context of this review,
the term “video-based learning” is employed in the same way, that is, as gaining knowledge and
skills by using video material. This process is manifold and involves several actors beyond the
learner and the video, such as learning platforms. Consequently, we consider VBL in its wider
context by including features of video platforms.
   Potentials: It has been argued that VBL can be a powerful tool to enhance teaching and learning.
Yousef et al. [4], for instance, state that educational videos are effective, able to increase
motivation, engage the learner, and support diverse learning styles. The authors further underline
the videos’ potential to convey information that is hard to capture in text, e.g., to visualize
procedural information. They cite several studies that present evidence about videos improving not
only the learning outcome but also the satisfaction, interaction, and communication among learners.
   Videos are already an integral part of students’ learning habits [1] and common teaching
practices [5]. Indeed, some even state that in online education, videos might become the main
medium [7]. One reason for this development is certainly availability: Video production has become
much easier [5], and dedicated platforms provide simple, scalable ways for dissemination [3, 8].
This allows laypeople and professional teachers to make their materials available to the world.
   Challenges: While offering many proven benefits, there are challenges involved in the production
and dissemination of efficient video-based learning resources. Guo et al. [8] found in interviews
that the video editing “was not done with any specific pedagogical “design patterns” in mind”
[8, p. 45]. The authors also conclude that the production of videos is based on “decisions on
anecdotes, folk wisdom, and best practices distilled from studies with at most dozens of subjects
and hundreds of video watching sessions.” [8, p. 42], which evidences the lack of understanding of
how to produce effective videos. Furthermore, it is still unclear if techniques from classical
classroom scenarios can be directly transferred to VBL environments, or if and to what degree VBL
scenarios need customized didactic approaches and concepts.
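
The query construction summarized in Table 3 (Section 3) can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the term lists are copied from Table 3, each
base term is assumed to pair only with the extensions in its own row, and the function name
build_queries as well as the quoted "base AND extension" string syntax are hypothetical, since the
exact query syntax used for each database is not stated here.

```python
# Hypothetical sketch: terms come from Table 3; the query string syntax is an assumption.
BASE_TO_EXTENSIONS = {
    "Video-based": ["Learning"],
    "Video": [
        "Education", "Educational", "Instructional",
        "Learn", "Learning", "Instruction",
        "Teach", "Teaching", "Explaining",
        "Knowing", "Knowledge", "Explanatory",
        "Tutorial", "Student", "Classroom",
        "Pupil", "School", "University",
        "Skills",
    ],
}


def build_queries(base_to_extensions):
    """Join every base term with each of its extensions using a logical AND."""
    return [
        f'"{base}" AND "{extension}"'
        for base, extensions in base_to_extensions.items()
        for extension in extensions
    ]


queries = build_queries(BASE_TO_EXTENSIONS)
print(len(queries))  # 20 combinations: 1 for "Video-based" plus 19 for "Video"
print(queries[0])    # "Video-based" AND "Learning"
```

Keeping the terms in a mapping makes the iterative refinement described in Section 3 a one-line
change: a newly encountered term is appended to the relevant list and the queries are regenerated.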
   Another challenge arises from the question of how a learner can be supported in her video-based
learning trajectory. It is widely recognized that learners have diverse needs [8], depending on
their current learning objective, context, and general preferences. How to analyze and index video
resources to satisfy these individual requirements is still an open research question. As we will
show below, exploring possible features for the automatic determination of video content and
quality is one active area of research. These explorations are indispensable in the quest to allow
diverse learners a satisfying video-based learning experience.
   Lastly, MOOCs (Massive Open Online Courses) and similar educational courses are a central
research area, mainly because they have enabled data analysis at scale. But it is precisely the
large-scale nature of MOOCs that can cause issues in handling and processing such a massive amount
of information. As stated by Guo et al., “The scale of data from MOOC interaction logs hundreds of
thousands of students from around the world and millions of video watching sessions is four orders
of magnitude larger than those available in prior studies” [8, p. 42]. Moreover, criticism directed
at the research community has been raised as well. For example, a challenge that needs to be
confronted in these large-scale scenarios is finding how to incorporate controlled experiments [4]
and methods or instruments from other fields, such as the field of psychology, instead of relying
only on data analysis [5].
   Importance of features: Research on VBL poses a number of relevant questions: How can efficient
learning be facilitated and improved? How to enable long-term learning success? How to ensure the
learner’s engagement and satisfaction? The answer to these questions strongly relies on one base
research question: What characterizes an effective learning video? [8, 4, 5] The search for
relevant video features has been widely studied from the perspective of different research fields
(e.g., [9]), and in an exploration of what is possible with current technologies (e.g., [10]). In
consequence, there is a multitude of variables that have been extracted, studied, and manipulated.
Video is a complex medium combining audio, text, and visual information, from all of which features
can be extracted and combined. These can range from low-level features such as audio quality
features (e.g., [11]) to high-level features that try to capture semantics (e.g., [12]). Features
do not necessarily come directly from the video but can also result from the interaction with a
video, e.g., views and likes (e.g., [13]). In addition to the characteristics of the video, it is
vital to study its environment. The success of a VBL process is further influenced by
functionalities of the surrounding platform, the learner’s context, and the instructor captured in
the video.
   In our review, we focus on those features directly related to the learning video, with the goal
to outline current insights and research trends. The discussion of learner-related features, such
as previous knowledge and skills, is out of the scope of this review (see, for instance, [14] for a
recent review).

5. Literature Review

This section summarizes our findings from the analysis of 41 publications on video-based learning
in the period 2020-2021. We analyze (a) the principal research approach (Section 5.1); (b) the
target task and the used technologies (Section 5.2); (c) the video features explored in the studies
(Section 5.3); and, finally, (d) verify former surveys’ finding that there is a specific focus on
STEM subjects in VBL studies (Section 5.4).

5.1. Research Approaches

Four main research approaches have been identified in the reviewed literature: (1) Controlled
experiments, which in this study are defined as experiments where one or a few variables are
manipulated (according to the learning context, video characteristics, or content) and where the
impact of this manipulation is measured by some target metric. This group includes experiments run
in laboratory settings, but also those performed in realistic academic environments such as
university courses. This category accounts for 22 instances (54%) in the reviewed papers. (2) Novel
tools, architectures, or analysis pipelines surrounding video-based learning scenarios are covered
in 16 (39%) articles. (3) Design principles and guidelines for the creation of educational videos
are presented in two publications (5%). (4) Results of the analysis of data generated by users in a
video-based online learning platform are presented in one paper (2%).
   Table 4 lists all the publications in each category. The high number of empirical studies
matches Poquet et al.’s [5] statement that, especially after 2016, empirical works are on the rise.
However, an extension of the review period is necessary to see if our findings agree with the
authors’ temporal placement of the change. Within this category (controlled experiments), 21 out of
22 studies reported the number of participants in the experiment (sample size). The smallest study
reported 12 participants [15] and the biggest acquired a group of 229 learners [16]. The average
number of participants was 85 learners (SD=56). Studies that focused on the development of tools
(group (2)) or data analysis of learning platforms such as MOOCs (group (4)) often do not report a
sample size.
   Some of the subsequent sections focus on a subset of the groups since not all dimensions can be
meaningfully extracted from all research objectives. The following
review of used technologies, for instance, only makes sense in the context of works that have a
technology component, and will, consequently, mostly cover works of the group (2).

Table 4
Papers according to the research approach

  Research Approach              Publications
  Controlled experiments         [16, 15, 17, 18, 19, 20, 10, 21, 22, 23, 24, 25, 26, 27, 28, 29,
                                 30, 31, 32, 33, 34, 35]
  Tool, architecture, pipeline   [11, 12, 36, 37, 38, 13, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
  Design principles              [49, 50]
  Data analysis                  [9]

5.2. Target Task and Used Technologies

This section focuses on the 16 studies which present a technology component. Of these, roughly a
third (N=5, 31%) present recommender systems that aim to support learners in finding suitable
learning material. One of the systems recommends not only video resources but also learning
materials in other modalities (e.g., text articles such as Wikipedia pages [39]). Another two
systems provide the user with a video sequence (playlist) instead of single learning contents
[41, 42].
   A popular approach to recommend the material is based on the extraction of topical keywords from
the video’s speech transcript [39, 40, 42], especially by using the tf-idf algorithm. Those are
then used to compute the similarity of the currently viewed video to the other ones in the database
[39, 40]. Tavakoli et al. [13] suggest using not only the current video for this but all resources
previously viewed by the user. They combine this information with a profile of the learner’s skill
set and predict relevant videos using a Random Forest classifier. Another technique is to detect
prerequisite relations between the videos to generate sequences organized by difficulty and
complexity [41, 42]. Lastly, Tang et al. [42] additionally analyze viewer comments with sentiment
analysis to generate a playlist, assuming that positive-sounding comments point to engaging
learning material.
   Other common use cases are systems for the forecasting of learning success (N=3, 19%). “Success”
can be operationalized in different ways: In two studies, the objective is to predict the final
score of the learner as “pass” or “fail”. The first case is in the context of passing a university
MOOC course [36], and the second case is about passing video quizzes presented by videos in an
e-learning platform [38]. In a third study, the target is to forecast the probability of a student
dropping out of a Coursera course (a MOOC provider) [37]. All these recent examples make use of
deep learning technologies, specifically, neural networks with gated recurrent units [37],
recurrent neural networks (RNN) [38], or long short-term memory neural networks (LSTM) [36]. These
classifiers have been mainly fed with real-time features [36, 37], especially play, pause, and
speed rate; however, aggregated data [37] and results of quizzes have also been explored [38].
   Two of the 16 reviewed papers in this section (13%) focus on video segmentation. Video as a
learning medium has the downside that all information is presented sequentially. Unlike text,
videos cannot be easily skimmed to find relevant information. This can be alleviated by automatic
video segmentation. In our sample, one paper aims to automatically determine important segments in
the educational content using a bidirectional LSTM network classifier built on pre-trained models
to extract text and visual features [11]. Das & Das [12] improve on topical segmentation of lecture
videos by identifying concepts in the speech transcripts. To achieve this, speech-to-text
technologies are used, pre-trained neural network models are used to analyze the resulting text,
and the cluster centroid algorithm is used to group the resulting concepts.
   The remaining six studies (38%) focus on diverse tasks such as video indexing [46], video
ranking [43], video customization according to the type of learner [47], prediction of instructors’
enthusiasm [48], matching videos with practical exercises [44], and helping educators with the
creation of teaching materials [45].

5.3. Features

The core focus of this review is the collection of video characteristics that have been studied in
the literature (2020 and 2021). The objective is to provide an account of how video features have
been investigated, extracted, and varied in the context of studies on video-based learning in the
computer science field. We developed a feature taxonomy that groups the video features into
categories, which allows for a structured presentation of our findings.
   We identified eight high-level categories: (a) audio features, (b) visual features, (c) textual
features, (d) features related to the instructor’s behavior, (e) features resulting from the
interaction of the learner with the video (e.g., likes), (f) interactive features, (g) production
style (e.g., Khan style), and (h) features related to instructional design principles. All of these
will be defined and their appearances reviewed in separate paragraphs.
   Audio features are all features that relate to the actual video’s audio stream and also the
features that can be extracted from it. This includes audio records (e.g., [39]), and
quality-related features, such as energy, en-
Table 5
Features taxonomy
   Category            Sub-category                                           Examples
   Audio features      Audio records [39]
                       Quality-related features [11, 9, 45, 50]               Energy, entropy, spectral features
                       Person-related audio features [24, 45, 48, 50]         Clarity of instructor’s voice, speech rate, vocabu-
                                                                              lary, attention guiding emphasis
   Visual features     Frames [11, 12, 44]
                       Representation features (directing attention)          Animations, realistic visuals, visual cues (e.g.,
                       [24, 25, 26, 27, 19, 45]                               color-coding, highlighting, arrows)
                       Enhanced visuals [23, 20, 10, 21, 22]                  Augmented/virtual reality, 360-degree features
                       Quality-related features [9]
   Text                Transcripts [11, 12, 13, 39, 40, 41, 42, 43, 44, 46]
                       Metadata [13, 40, 42, 36, 9]                           Titles, keywords, tags, video length
                       Visual [24, 44]                                        Visual text (text extracted from frames)
                                                                              Textual cues (guiding attention to a key location
                                                                              in a slide)
   Instructor’s        Gestures [16, 25, 48]                                  Beat gestures (e.g., hand strokes, natural hand-
   behavior                                                                   waving, and pointing gestures that aim at high-
                                                                              light the speech), facial emotions
                       Body-related [48]                                      Volume (expansion of the teacher’s body), pose
                       Personality [45]
   Interaction         Real-time features [36, 37, 33, 26, 34, 35, 47]        Play, pause, rate-change (speed), seek forward,
   between learners                                                           seek backward
   and the video                                                              Percentage of the video that was watched
                                                                              Average proportion of videos watched per week
                                                                              Standard deviation of the proportion of videos
                                                                              watched per week
                       Popularity [13, 9]                                     Number of views, user ratings, relevancy score
                                                                              (rank in search results)
   Interactive         [38, 33, 30, 28, 26, 32, 29, 31]                       Quizzes, annotation, feedback, exchange (ac-
   features                                                                   tions to interact with other learners)
   Production style    [18, 32, 15, 17, 19]                                   Tutorial, lecture, Khan style, talking head, Dia-
                                                                              logue and monologue, voice over slides, only-text,
                                                                              animations included
   Instructional de-   Principles of multimedia learning [9]                  Coherence, signalling, spatial contiguity, seg-
   sign principles                                                            mentation (information in segments, rather than
                                                                              a long stream), pre-training (present first the
                                                                              basic information), modality (graphics and spo-
                                                                              ken words), multimedia (words together with pic-
                                                                              tures), personalization (informal conversational
                                                                              style), voice (human rather than a computer
                                                                              voice), image (animations)


tropy, and spectral features which allow making pre-                to text and still image, is that movement can guide the
dictions on how well the auditory information can be                learner’s attention. These types of features are captured
perceived, e.g., [9, 11, 45]. A second big cluster is person-       in the category visual representation features where we
related audio features which groups characteristics di-             can find, for example, highlighting and visual cues (e.g.,
rectly related to the instructor’s expression [24, 45, 48].         animated and appearing elements) [24, 25, 26, 27, 19,
This includes, for instance, metrics for the teacher’s voice        45]. There is also work investigating visual quality fea-
[45, 48], speech rate [50], used vocabulary [50], emphasis          tures [9]. Finally, some works explore enhanced visual
on specific words that guide the listener’s attention [24].         representations by including 360-degree scene represen-
   The category visual features contains analyses or                tations [20, 10, 21], augmented [23], and virtual reality
adaptations related to the video’s image information.               features [22].
This includes the analysis of information in video frames              In the category text features, we group every text-
as images [11, 12, 44]. Particular to video, in contrast            based information which is available about the video
or can be extracted from it. It is common practice to generate a speech transcript from the spoken
content in a learning video. This pre-processing step transforms hard-to-analyze speech into written
text, for which sophisticated and efficient analysis methods are available. Indeed, in our sample,
speech transcripts are widely used [11, 12, 13, 39, 40, 41, 42, 43, 44, 46], and are referenced in
all the works on video segmentation [11, 12], and on recommending subsequent videos [36, 37, 38].
   Many video platforms provide metadata about the hosted videos, which are widely used as an easily
available data source. These include structured information about the video’s title, attributed
keywords, tags, and similar [13, 40, 42]. Video length has been used, for instance, by Mubarak et al.
[36] as a factor to predict the learning outcome, and by Tavakoli et al. [13] to suggest similar
videos. Finally, text, especially in learning videos, also appears to support and visualize the
speakers’ message in the form of superimposed scene text, often in the form of presentation slides
(visual text) [24, 44].
   In every learning process, the teacher is a central facilitator; there are thus a number of
features related to instructor behavior [16, 25, 45, 48]. This group captures gesturing [16, 25]
and facial emotions [48], features related to the body such as volume and pose [48], and also
information about the speaker’s personality [45], which have been used as base techniques to
determine higher-level information.
   Several studies use learner interactions with the video as a feature. This includes real-time
features of a user interacting with a certain instructional video [36, 37, 33, 26, 34, 35, 47],
e.g., playing and pausing a video, skipping video content or interrupting; and aggregated measures,
such as the number and ratio of videos watched in a time period, or the aggregated behavior of
several users on a certain video (e.g., [47]). Besides, user interactions are used as indicators of
video popularity [13, 9], in the form of the number of views and ratings.
   Interactive functionalities surrounding the video: A critical challenge in videos is that
information is transient [51] and consumed in a rather passive way [52, 53]. This invites
superficial processing and leads to challenges in knowledge acquisition and integration. In
consequence, several studies investigate functionalities to actively involve the user
[38, 33, 30, 28, 26, 32, 29, 31] by adding functionalities such as interactive quizzes,
collaborative elements, or note-taking areas.
   The production style category classifies characteristic types of video design [54]. It includes
rough sub-categories describing how the learning setting is composed: whether people are visible
(talking-head video, lecture setting with or without an audience) or not (voice-over-slides,
Khan-style lecture, animations, and films of real-world phenomena), whether there is a single
speaker (monologue) or several (dialogue, interview), and from which perspective the content is
presented (upfront, or over-the-shoulder view in tutorials). Production style was mentioned several
times as a feature in our sample [18, 32, 15, 17, 19].
   There are a number of psychological and educational theories on how to efficiently support
learners with multimedia resources; references to those are collected in the category instructional
design principles. The only theoretical element referenced in our sample is the set of Multimedia
Learning Principles introduced by Mayer [55], which was used by Eradze et al. [9]. They collected
manual annotations that state whether a video follows those design principles (see Table 5 for
examples). The aim was to find a correlation between the principles and students’ perceptions
regarding the quality of the video.
   As shown by the taxonomy in Table 5, from all the studies reviewed in this research work
(2020-2021), the most explored video features are text-related features, especially transcripts.
The next most investigated are interactive features, which study additional functionalities
(quizzes, note-taking) that impact the learning success. In third place, we find real-time features.
Other commonly explored features are visual features, especially features that direct attention
(e.g., animation), enhanced visuals (e.g., 360-degree features), and production style (e.g., talking
head). These findings are similar to the results of Poquet et al. [5] who show that text, animation,
audio, and production style are the most explored features.
   Lastly, we found that the impact of the above-mentioned features has been measured mainly through
the learning outcome using metrics such as recall and transfer
[16, 33, 30, 23, 24, 25, 28, 26, 27, 10, 32, 15, 29, 22, 34, 17, 35, 19], which, again, conforms to
the results of Poquet et al. [5]. Other frequently used metrics are: cognitive load
[16, 25, 32, 22, 34, 17, 31, 19, 33, 23, 26, 18], motivation [30, 23, 20, 10, 22, 17], self-efficacy
[28, 22, 17], mental effort [16, 32, 19], eye-gaze direction [25, 15, 17], engagement [20, 26],
comprehension (understanding) [21, 29], and social presence [16, 17]. Of these, motivation,
cognitive load, mental effort, and the results of eye-tracking were also included in the findings
of Poquet et al. [5] as often used metrics. On the other hand, the less frequently used metrics
encompass: enjoyment [10], agent-persona (credibility, engagement, and learning facilitating) [16],
affective rating [16], para-social interaction [16], self-regulated learning [33], sense of presence
(feeling of being present in the environment) [10], perceived benefits [21], confidence [19],
creative thinking [22], application [21], analysis [21], synthesis [21], evaluation [21], attention
span [32], views about the teaching technique [27], facial temperature [19], and challenge [19].
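   As an illustration of the aggregated learner-interaction features listed in Table 5 (percentage
of a video that was watched, average and standard deviation of the proportion of videos watched per
week), the following Python sketch shows one way such values could be computed. It is not taken from
any of the reviewed papers: the function names, the input format (watched intervals in seconds,
weekly video counts), and the choice of the population standard deviation are assumptions made only
for this example.

```python
from statistics import mean, pstdev


def watched_fraction(intervals, video_length):
    """Fraction of a video covered by (possibly overlapping) watched intervals.

    `intervals` is a list of (start, end) pairs in seconds; this input
    format is an assumption of this sketch, not a format from the paper.
    """
    covered = 0.0
    current_end = 0.0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip parts that were already counted
        end = min(end, video_length)     # ignore anything past the end of the video
        if end > start:
            covered += end - start
            current_end = end
    return covered / video_length


def weekly_watch_stats(videos_watched_per_week, videos_offered_per_week):
    """Mean and (population) standard deviation of the weekly proportion of videos watched."""
    proportions = [
        watched / offered
        for watched, offered in zip(videos_watched_per_week, videos_offered_per_week)
        if offered > 0  # weeks without any offered video are skipped
    ]
    return mean(proportions), pstdev(proportions)


if __name__ == "__main__":
    # Hypothetical learner: three viewing sessions on a 120-second video.
    print(watched_fraction([(0, 30), (20, 60), (90, 100)], 120))
    # Hypothetical course: four videos offered in each of three weeks.
    print(weekly_watch_stats([2, 4, 0], [4, 4, 4]))
```

   Overlapping intervals are merged before summing so that re-watched passages are not counted
twice; whether repeated viewing should instead increase the feature value is a design choice that
depends on the study.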
5.4. Covered Subject Domain

Previous literature reviews stated that STEM (Science, Technology, Engineering & Mathematics) areas
are overrepresented in the investigation of VBL [3, 5]. This is confirmed in our sample of research
papers from 2020-2021. In our first category of controlled experiments, 14 of 22 studies (64%) use
videos on some STEM domain. Specifically, computer science [28, 20, 17, 35] and physics [30, 18] are
often targeted. In the remaining eight papers, teaching English as a foreign language is the most
common discipline [23, 29], followed by other social and humanities domains.
   In the research categories “Data analysis” and “Design principles & guidelines” the main focus
is also on STEM video content. However, the small sample size in these categories does not allow for
further interpretation. Papers in the “Tools” research category do not usually focus on specific
disciplines but aim at offering general solutions for VBL. This is why no specific analysis
regarding the subject domain is provided.

6. Conclusions

In this paper, we have presented a survey on 41 papers in the field of video-based learning
(2020-2021) to provide a structured account of current tendencies in the field. Specifically, we
(a) identified common research approaches, (b) identified target applications and the used
technologies, (c) investigated video features and developed a taxonomy which allows structuring the
related work, and, finally, (d) collected information on the covered subject domains of the learning
environments.
   Research approaches: The results show that more than half of the studies are dedicated to
controlled experiments (54% of the studies). A significant proportion focused on tool development
(39% of the studies), while a very small proportion of studies dedicated effort to analyzing data
from learning platforms such as MOOCs (2% of the studies) and proposed design principles and
guidelines (5% of the studies).
   Tools and technologies: There are three main applications in our sample: (1) recommender systems
(35% of the studies in the “Tools” category), (2) predictors of learning outcome (19%), and
(3) video segmentation techniques (13%). Technology-wise, keyword-based analyses (e.g., using tf-idf
to detect pertinent keywords) are the most commonly adopted. Approaches that aim to provide video
playlists (as opposed to singular video recommendations) often refer to techniques from the area of
prerequisite detection. Recent approaches to forecasting learning success mainly build upon deep
learning techniques (e.g., multilayer perceptrons, gated recurrent units, RNNs, and LSTMs). A
similar tendency towards deep learning approaches can be seen in methods aiming to improve video
segmentation for educational videos; the approaches widely rely on pre-trained models for the
extraction of textual and visual features.
   Video features: In general, the most explored video features are text-related, especially
metadata and speech transcripts. The second most studied features are interactive (such as
interactive quizzes and other additional functionalities), followed by click-stream features, visual
features, and production style.
   Specifically, in the controlled experiments category, the main features manipulated are related
to interactive features (32% of the studies), principally quizzes and annotations. Other often
explored features in this category are enhanced visual features, mainly 360-degree features,
production style features (18% of the studies) with a special emphasis on dialogue and monologue,
and features that direct the attention of the learner (18% of the studies) (e.g., animations).
   Subject domain: Based on our sample, we found that STEM domains are prevalent in the targeted
subject areas.
   Tendencies and directions: In comparison to previous reviews, the tendencies that could be
reaffirmed in this study are mainly related to the type of research that was performed, the type of
features that have been manipulated, and the subject domain on which the educational videos have
focused. Specifically, the trends discovered by Poquet et al. [5] could be confirmed by our review:
(1) It was observed that the research community directed effort principally towards controlled
experiments (54% of the reviewed papers). (2) Text, animation, audio, and production style are
situated among the most explored features. (3) Recall and transfer are the most-used metrics to
measure learning outcome and effectiveness. (4) Cognitive load, mental effort, motivation, and the
results of eye-tracking are considered among the most popular metrics for measuring learning
experience. (5) Target disciplines of the videos were primarily STEM topics.
   Research on video, text, and image analysis has been dominated by deep learning approaches in
recent years. Increasingly, these also find application in the analysis of educational data,
exemplified by their dominance in learning outcome prediction, educational video segmentation, and
feature extraction. In most cases, the developed systems make use of general-purpose pre-trained
models, even though first examples of models specifically trained on educational data exist, e.g.,
EduBERT [56], a word embedding trained on educational materials. It will be exciting to see the
potential of deep learning techniques properly adapted to educational media, and how they will
enhance educational applications.
   There are various studies that investigate learning on the web, mainly focusing on features of
user behavior [57] or textual materials. Only recent publications such as [58] explore the role of
videos in such web-based learning
processes. However, as shown here, the analysis of video as a learning resource is a highly active
research area. The related work outlined here can be a guide for future works which aim to integrate
video-based resources into individualized online-learning trajectories, and will provide interested
scientists with a starting point for their research.
   The reviewed papers investigated a wide range of video features, including information from the
visual content, be it textual or image, and audio information. However, the modalities were mainly
considered separately, without regard to their interactions. Research on VBL efficiency could be
brought forward by a truly multi-modal analysis of educational videos, which explores how spoken
language, shown text and image, and speaker behavior convey a combined message as, for instance,
explored by Shi et al. [59]. This is in line with psychological research on instructional design,
and might bring new insights for educational research in quantifying formal relationships between
image, text and speech in their impact on learning success.
   Limitations and future work: This review, with a focus on publications from the years 2020 and
2021, only considers a short period of time. It is planned to extend the study in the future, to
provide a more thorough analysis of tendencies and directions in VBL. Moreover, although the keyword
set used in the paper retrieval process is extensive, we plan to broaden this set to ensure a more
comprehensive literature review. As pointed out by the reviewers of this paper, there is a
surprisingly low number of MOOC-related studies included in our dataset. This will be considered in
the extension of our keyword set.


Acknowledgments

This work has been partly supported by the Ministry of Science and Education of Lower Saxony,
Germany, through the Graduate training network “LernMINT: Data-assisted classroom teaching in the
MINT subjects”. Also, this study has been partly supported by the Leibniz Association, Germany
(Leibniz Competition 2018, funding line “Collaborative Excellence”, project SALIENT [K68/2017]).
Finally, we would like to thank the reviewers for their valuable feedback.


References

 [1] A. Smith, S. Toor, P. Van Kessel, Many turn to youtube for children’s content, news, how-to
     lessons, Pew Research Centre 7 (2018). URL:
     https://www.pewresearch.org/internet/2018/11/07/many-turn-to-youtube-for-childrens-content-news-how-to-lessons/.
 [2] J. Clement, Hours of video uploaded to youtube every minute, Statista.com (2019). URL:
     https://www.statista.com/statistics/259477/hours-of-video-uploaded-to-youtube-every-minute/.
 [3] M. N. Giannakos, Exploring the video-based learning research: A review of the literature,
     Br. J. Educ. Technol. 44 (2013) 191. URL: https://doi.org/10.1111/bjet.12070.
     doi:10.1111/bjet.12070.
 [4] A. M. F. Yousef, M. A. Chatti, U. Schroeder, The state of video-based learning: A review and
     future perspectives, Int. J. Adv. Life Sci 6 (2014) 122–135.
 [5] O. Poquet, L. Lim, N. Mirriahi, S. Dawson, Video and learning: a systematic review
     (2007-2017), in: A. Pardo, K. Bartimote-Aufflick, G. Lynch, S. B. Shum, R. Ferguson,
     A. Merceron, X. Ochoa (Eds.), Proceedings of the 8th International Conference on Learning
     Analytics and Knowledge, LAK 2018, Sydney, NSW, Australia, March 07-09, 2018, ACM, 2018,
     pp. 151–160. URL: https://doi.org/10.1145/3170358.3170376. doi:10.1145/3170358.3170376.
 [6] M. Sablić, A. Mirosavljević, A. Škugor, Video-based learning (vbl)—past, present and future:
     An overview of the research published from 2008 to 2019, Technology, Knowledge and Learning
     (2020) 1–17.
 [7] A. Hansch, L. Hillers, K. McConachie, C. Newman, T. Schildhauer, J. P. Schmidt, Video and
     online learning: Critical reflections and findings from the field (2015).
 [8] P. J. Guo, J. Kim, R. Rubin, How video production affects student engagement: an empirical
     study of MOOC videos, in: M. Sahami, A. Fox, M. A. Hearst, M. T. H. Chi (Eds.), First (2014)
     ACM Conference on Learning @ Scale, L@S 2014, Atlanta, GA, USA, March 4-5, 2014, ACM, 2014,
     pp. 41–50. URL: https://doi.org/10.1145/2556325.2566239. doi:10.1145/2556325.2566239.
 [9] M. Eradze, A. Dipace, B. Fazlagic, A. D. Pietro, Semi-automated student feedback and
     theory-driven video-analytics: An exploratory study on educational value of videos, in:
     L. S. Agrati, D. Burgos, P. Ducange, P. Limone, L. Perla, P. Picerno, P. Raviolo,
     C. M. Stracke (Eds.), Bridges and Mediation in Higher Distance Education - Second
     International Workshop, HELMeTO 2020, Bari, BA, Italy, September 17-18, 2020, Revised
     Selected Papers, volume 1344 of Communications in Computer and Information Science, Springer,
     2020, pp. 28–39. URL: https://doi.org/10.1007/978-3-030-67435-9_3.
     doi:10.1007/978-3-030-67435-9_3.
[10] P. Araiza-Alba, T. Keane, B. Matthews, K. Simpson, G. Strugnell, W. S. Chen, J. Kaufman, The
     potential of 360-degree virtual reality videos to teach water-safety skills to children,
     Comput. Educ. 163 (2021) 104096. URL: https://doi.org/10.1016/j.
     compedu.2020.104096. doi:10.1016/j.compedu.                   on Human Factors in Computing Systems, Hon-
     2020.104096.                                                  olulu, HI, USA, April 25-30, 2020, ACM, 2020, pp.
[11] J. A. Ghauri, S. Hakimov, R. Ewerth, Classifica-              1–12. URL: https://doi.org/10.1145/3313831.3376845.
     tion of important segments in educational videos              doi:10.1145/3313831.3376845.
     using multimodal features, in: S. Conrad, I. Tiddi       [18] A. Pérez-Navarro, V. Garcia, J. Conesa, Students per-
     (Eds.), Proceedings of the CIKM 2020 Workshops                ception of videos in introductory physics courses
     co-located with 29th ACM International Confer-                of engineering in face-to-face and online environ-
     ence on Information and Knowledge Management                  ments, Multim. Tools Appl. 80 (2021) 1009–1028.
     (CIKM 2020), Galway, Ireland, October 19-23, 2020,            URL: https://doi.org/10.1007/s11042-020-09665-0.
     volume 2699 of CEUR Workshop Proceedings, CEUR-               doi:10.1007/s11042-020-09665-0.
     WS.org, 2020. URL: http://ceur-ws.org/Vol-2699/          [19] N. Srivastava, S. Nawaz, J. M. Lodge, E. Velloso, S. M.
     paper15.pdf.                                                  Erfani, J. Bailey, Exploring the usage of thermal
[12] A. Das, P. P. Das, Incorporating domain knowl-                imaging for understanding video lecture designs
     edge to improve topic segmentation of long                    and students’ experiences, in: C. Rensing, H. Drach-
     MOOC lecture videos,          CoRR abs/2012.07589             sler (Eds.), LAK ’20: 10th International Conference
     (2020). URL: https://arxiv.org/abs/2012.07589.                on Learning Analytics and Knowledge, Frankfurt,
     arXiv:2012.07589.                                             Germany, March 23-27, 2020, ACM, 2020, pp. 250–
[13] M. Tavakoli, S. Hakimov, R. Ewerth, G. Kismi-                 259. URL: https://doi.org/10.1145/3375462.3375514.
     hók, A recommender system for open educa-                     doi:10.1145/3375462.3375514.
     tional videos based on skill requirements, in:           [20] J. C. Muñoz-Carpio, M. A. Cowling, J. R. Birt, Doc-
     20th IEEE International Conference on Advanced                toral colloquium - exploring the benefits of
     Learning Technologies, ICALT 2020, Tartu, Esto-               using 360° video immersion to enhance motivation
     nia, July 6-9, 2020, IEEE, 2020, pp. 1–5. URL: https:         and engagement in system modelling education,
     //doi.org/10.1109/ICALT49669.2020.00008. doi:10.              in: D. Economou, A. Klippel, H. Dodds, A. Peña-
     1109/ICALT49669.2020.00008.                                   Ríos, M. J. W. Lee, D. Beck, J. Pirker, A. Dengel,
[14] A. Abyaa, M. Khalidi Idrissi, S. Bennani, Learner             T. M. Peres, J. Richter (Eds.), 6th International Con-
     modelling: systematic review of the literature                ference of the Immersive Learning Research Net-
     from the last 5 years, Educational Technology                 work, iLRN 2020, San Luis Obispo, CA, USA, June
     Research and Development 67 (2019) 1105–1143.                 21-25, 2020, IEEE, 2020, pp. 403–406. URL: https:
     doi:10.1007/s11423-018-09644-1.                               //doi.org/10.23919/iLRN47897.2020.9155100. doi:10.
[15] A. Nugraha, I. A. Wahono, J. Zhanghe, T. Harada,              23919/iLRN47897.2020.9155100.
     T. Inoue, Creating dialogue between a tutee agent        [21] W. Daher, H. Sleem, Middle school students’ learn-
     and a tutor in a lecture video improves students’             ing of social studies in the video and 360-degree
     attention, in: A. Nolte, C. Alvarez, R. Hishiyama,            videos contexts, IEEE Access 9 (2021) 78774–78783.
     I. Chounta, M. J. Rodríguez-Triana, T. Inoue (Eds.),          URL: https://doi.org/10.1109/ACCESS.2021.3083924.
     Collaboration Technologies and Social Computing               doi:10.1109/ACCESS.2021.3083924.
     - 26th International Conference, CollabTech 2020,        [22] H. Huang, G. Hwang, C. Chang, Learning to be
     Tartu, Estonia, September 8-11, 2020, Proceedings,            a writer: A spherical video-based virtual reality
     volume 12324 of Lecture Notes in Computer Science,            approach to supporting descriptive article writing
     Springer, 2020, pp. 96–111. URL: https://doi.org/10.          in high school Chinese courses, Br. J. Educ. Technol.
     1007/978-3-030-58157-2_7. doi:10.1007/978-3-                  51 (2020) 1386–1405. URL: https://doi.org/10.1111/
     030-58157-2_7.                                                bjet.12893. doi:10.1111/bjet.12893.
[16] M. Beege, M. Ninaus, S. Schneider, S. Nebel,             [23] C.-H. Chen, AR videos as scaffolding to foster stu-
     J. Schlemmel, J. Weidenmüller, K. Moeller, G. D. Rey,         dents’ learning achievements and motivation in EFL
     Investigating the effects of beat and deictic gestures        learning, British Journal of Educational Technology
     of a lecturer in educational videos, Comput. Educ.            51 (2020) 657–672.
     156 (2020) 103955. URL: https://doi.org/10.1016/j.       [24] X. Wang, L. Lin, M. Han, J. M. Spector, Im-
     compedu.2020.103955. doi:10.1016/j.compedu.                   pacts of cues on learning: Using eye-tracking tech-
     2020.103955.                                                  nologies to examine the functions and designs of
[17] B. Lee, K. Muldner, Instructional video design: In-           added cues in short instructional videos, Com-
     vestigating the impact of monologue- and dialogue-            put. Hum. Behav. 107 (2020) 106279. URL: https:
     style presentations, in: R. Bernhaupt, F. F. Mueller,         //doi.org/10.1016/j.chb.2020.106279. doi:10.1016/
     D. Verweij, J. Andres, J. McGrenere, A. Cockburn,             j.chb.2020.106279.
     I. Avellino, A. Goguey, P. Bjørn, S. Zhao, B. P. Sam-    [25] J. Moon, J. Ryu, The effects of social and cognitive
     son, R. Kocielnik (Eds.), CHI ’20: CHI Conference
      cues on learning comprehension, eye-gaze pattern,             158 (2020) 104000. URL: https://doi.org/10.1016/j.
      and cognitive load in video instruction, J. Comput.           compedu.2020.104000. doi:10.1016/j.compedu.
      High. Educ. 33 (2021) 39–63. URL: https://doi.org/10.         2020.104000.
     1007/s12528-020-09255-x. doi:10.1007/s12528-              [34] N. Garrett, Segmentation’s failure to improve soft-
      020-09255-x.                                                  ware video tutorials, Br. J. Educ. Technol. 52 (2021)
[26] A. Cookson, D. Kim, T. Hartsell, Enhancing stu-                318–336. URL: https://doi.org/10.1111/bjet.13000.
      dent achievement, engagement, and satisfaction                doi:10.1111/bjet.13000.
      using animated instructional videos, Int. J. Inf.        [35] D. Lang, G. Chen, K. Mirzaei, A. Paepcke, Is faster
      Commun. Technol. Educ. 16 (2020) 113–125. URL:                better?: a study of video playback speed, in: C. Rens-
      https://doi.org/10.4018/IJICTE.2020070108. doi:10.            ing, H. Drachsler (Eds.), LAK ’20: 10th International
     4018/IJICTE.2020070108.                                        Conference on Learning Analytics and Knowledge,
[27] M. A. Al-Khateeb, A. M. Alduwairi, Effect of                   Frankfurt, Germany, March 23-27, 2020, ACM, 2020,
      teaching geometry by slow-motion videos on the                pp. 260–269. URL: https://doi.org/10.1145/3375462.
      8th graders’ achievement, Int. J. Interact. Mob.              3375466. doi:10.1145/3375462.3375466.
     Technol. 14 (2020) 57–67. URL: https://www.online-        [36] A. A. Mubarak, H. Cao, S. A. M. Ahmed, Predictive
      journals.org/index.php/i-jim/article/view/12985.              learning analytics using deep learning model in
[28] M. C. Sözeri, S. B. Kert, Ineffectiveness of online in-        MOOCs’ courses videos, Educ. Inf. Technol. 26 (2021)
      teractive video content developed for programming             371–392. URL: https://doi.org/10.1007/s10639-020-
      education, Int. J. Comput. Sci. Educ. Sch. 4 (2021)           10273-6. doi:10.1007/s10639-020-10273-6.
     49–69. URL: https://doi.org/10.21585/ijcses.v4i3.99.      [37] B. Jeon, N. Park, Dropout prediction over weeks
      doi:10.21585/ijcses.v4i3.99.                                  in MOOCs by learning representations of clicks and
[29] A. A. Kuhail, M. S. Aqel, Interactive digital videos           videos, CoRR abs/2002.01955 (2020). URL: https:
      and their impact on sixth graders’ English read-              //arxiv.org/abs/2002.01955. arXiv:2002.01955.
      ing and vocabulary skills and retention, Int. J.         [38] H. E. Aouifi, Y. Es-Saady, M. E. Hajji, M. Mimis,
      Inf. Commun. Technol. Educ. 16 (2020) 42–56. URL:             H. Douzi, Toward student classification in educa-
      https://doi.org/10.4018/IJICTE.2020070104. doi:10.            tional video courses using knowledge tracing, in:
     4018/IJICTE.2020070104.                                        M. Fakir, M. Baslam, R. E. Ayachi (Eds.), Business
[30] D. Leisner, C. G. Zahn, A. Ruf, A. A. P. Cattaneo, Dif-        Intelligence - 6th International Conference, CBI
      ferent ways of interacting with videos during learn-          2021, Beni Mellal, Morocco, May 27-29, 2021, Pro-
      ing in secondary physics lessons, in: C. Stephanidis,         ceedings, volume 416 of Lecture Notes in Business
      M. Antona (Eds.), HCI International 2020 - Posters            Information Processing, Springer, 2021, pp. 73–82.
     - 22nd International Conference, HCII 2020, Copen-             URL: https://doi.org/10.1007/978-3-030-76508-8_6.
      hagen, Denmark, July 19-24, 2020, Proceedings, Part           doi:10.1007/978-3-030-76508-8_6.
      II, volume 1225 of Communications in Computer            [39] C. Schulten, S. Manske, A. Langner-Thiele, H. U.
      and Information Science, Springer, 2020, pp. 284–             Hoppe, Digital value-adding chains in vocational
      291. URL: https://doi.org/10.1007/978-3-030-50729-            education: Automatic keyword extraction from
      9_40. doi:10.1007/978-3-030-50729-9_40.                       learning videos to provide learning resource recom-
[31] S. Chen, D. Wang, Y. Huang, Exploring the comple-              mendations, in: C. Alario-Hoyos, M. J. Rodríguez-
      mentary features of audio and text notes for video-           Triana, M. Scheffel, I. A. Sánchez, S. Dennerlein
      based learning in mobile settings, in: Extended               (Eds.), Addressing Global Challenges and Quality
     Abstracts of the 2021 CHI Conference on Human                  Education - 15th European Conference on Tech-
      Factors in Computing Systems, 2021, pp. 1–7.                  nology Enhanced Learning, EC-TEL 2020, Heidel-
[32] X. Lu, Q. Li, X. Wang, Research on the impacts                 berg, Germany, September 14-18, 2020, Proceedings,
      of feedback in instructional videos on college stu-           volume 12315 of Lecture Notes in Computer Science,
      dents’ attention and learning effects, in: W. Shen,           Springer, 2020, pp. 15–29. URL: https://doi.org/10.
     J. A. Barthès, J. Luo, Y. Shi, J. Zhang (Eds.), 24th           1007/978-3-030-57717-9_2. doi:10.1007/978-3-
      IEEE International Conference on Computer Sup-                030-57717-9_2.
      ported Cooperative Work in Design, CSCWD 2021,           [40] J. Jordán, S. Valero, C. Turró, V. J. Botti, Recom-
      Dalian, China, May 5-7, 2021, IEEE, 2021, pp. 513–            mending learning videos for MOOCs and flipped
      516. URL: https://doi.org/10.1109/CSCWD49262.                 classrooms, in: Y. Demazeau, T. Holvoet, J. M.
      2021.9437774. doi:10.1109/CSCWD49262.2021.                    Corchado, S. Costantini (Eds.), Advances in Practi-
      9437774.                                                      cal Applications of Agents, Multi-Agent Systems,
[33] D. C. D. van Alten, C. Phielix, J. Janssen, L. Kester,         and Trustworthiness. The PAAMS Collection - 18th
      Self-regulated learning support in flipped learning           International Conference, PAAMS 2020, L’Aquila,
     videos enhances learning outcomes, Comput. Educ.               Italy, October 7-9, 2020, Proceedings, volume 12092
     of Lecture Notes in Computer Science, Springer,               cational videos hierarchical indexing with ebooks,
     2020, pp. 146–157. URL: https://doi.org/10.1007/              in: H. Mitsuhara, Y. Goda, Y. Ohashi, M. M. T. Ro-
     978-3-030-49778-1_12. doi:10.1007/978-3-030-                  drigo, J. Shen, N. Venkatarayalu, G. Wong, M. Ya-
     49778-1_12.                                                   mada, C. Lei (Eds.), IEEE International Conference
[41] M. C. Aytekin, S. Räbiger, Y. Saygin, Discov-                 on Teaching, Assessment, and Learning for Engi-
     ering the prerequisite relationships among in-                neering, TALE 2020, Takamatsu, Japan, December
     structional videos from subtitles, in: A. N.                  8-11, 2020, IEEE, 2020, pp. 482–489. URL: https:
     Rafferty, J. Whitehill, C. Romero, V. Cavalli-                //doi.org/10.1109/TALE48869.2020.9368461. doi:10.
     Sforza (Eds.), Proceedings of the 13th Interna-               1109/TALE48869.2020.9368461.
     tional Conference on Educational Data Mining,            [47] S. Lallé, C. Conati, A data-driven student model
     EDM 2020, Fully virtual conference, July 10-13,               to provide adaptive support during video watching
     2020, International Educational Data Mining Soci-             across MOOCs, in: I. I. Bittencourt, M. Cukurova,
     ety, 2020. URL: https://educationaldatamining.org/            K. Muldner, R. Luckin, E. Millán (Eds.), Artificial
     files/conferences/EDM2020/papers/paper_99.pdf.                Intelligence in Education - 21st International Con-
[42] C. Tang, J. Liao, H. Wang, C. Sung, W. Lin, Con-              ference, AIED 2020, Ifrane, Morocco, July 6-10, 2020,
     ceptguide: Supporting online video learning with              Proceedings, Part I, volume 12163 of Lecture Notes
     concept map-based recommendation of learning                  in Computer Science, Springer, 2020, pp. 282–295.
     path, in: J. Leskovec, M. Grobelnik, M. Najork,               URL: https://doi.org/10.1007/978-3-030-52237-7_23.
     J. Tang, L. Zia (Eds.), WWW ’21: The Web Con-                 doi:10.1007/978-3-030-52237-7_23.
     ference 2021, Virtual Event / Ljubljana, Slovenia,       [48] Y. Chen, C. Wang, Z. Jian, Research on evaluation
     April 19-23, 2021, ACM / IW3C2, 2021, pp. 2757–               algorithm of teacher’s teaching enthusiasm based
     2768. URL: https://doi.org/10.1145/3442381.3449808.           on video, in: ICRAI 2020: 6th International Confer-
     doi:10.1145/3442381.3449808.                                  ence on Robotics and Artificial Intelligence, Singa-
[43] D. I. Bleoanca, S. Heras, J. Palanca, V. Julián, M. C.        pore, November 20-22, 2020, ACM, 2020, pp. 184–
     Mihaescu, LSI based mechanism for educational                 191. URL: https://doi.org/10.1145/3449301.3449333.
     videos retrieval by transcripts processing, in:               doi:10.1145/3449301.3449333.
     C. Analide, P. Novais, D. Camacho, H. Yin (Eds.),        [49] T. Weinert, M. T. de Gafenco, M. S. Billert,
     Intelligent Data Engineering and Automated Learn-             N. Boerner, Fostering interaction in higher
     ing - IDEAL 2020 - 21st International Conference,             education with deliberate design of interactive
     Guimaraes, Portugal, November 4-6, 2020, Pro-                 learning videos, in: J. F. George, S. Paul,
     ceedings, Part I, volume 12489 of Lecture Notes               R. De’, E. Karahanna, S. Sarker, G. Oestreicher-
     in Computer Science, Springer, 2020, pp. 88–100.              Singer (Eds.), Proceedings of the 41st Interna-
     URL: https://doi.org/10.1007/978-3-030-62362-3_9.             tional Conference on Information Systems, ICIS
     doi:10.1007/978-3-030-62362-3_9.                              2020, Making Digital Inclusive: Blending the Lo-
[44] X. Wang, W. Huang, Q. Liu, Y. Yin, Z. Huang,                  cal and the Global, Hyderabad, India, December
     L. Wu, J. Ma, X. Wang, Fine-grained similarity                13-16, 2020, Association for Information Systems,
     measurement between educational videos and ex-                2020. URL: https://aisel.aisnet.org/icis2020/digital_
     ercises, in: C. W. Chen, R. Cucchiara, X. Hua,                learning_env/digital_learning_env/12.
     G. Qi, E. Ricci, Z. Zhang, R. Zimmermann (Eds.),         [50] J. Ge, X. Li, Design strategies of EFL learning
     MM ’20: The 28th ACM International Confer-                    videos: Exampled by a china MOOC, in: ICEIT 2020,
     ence on Multimedia, Virtual Event / Seattle, WA,              videos: Exampled by a China MOOC, in: ICEIT 2020,
     USA, October 12-16, 2020, ACM, 2020, pp. 331–                 on Educational and Information Technology, Ox-
     339. URL: https://doi.org/10.1145/3394171.3413783.            ford, UK, February11-13, 2020, ACM, 2020, pp. 68–
     doi:10.1145/3394171.3413783.                                  71. URL: https://doi.org/10.1145/3383923.3383927.
[45] A. Hartholt, A. Reilly, E. Fast, S. Mozgai, Introduc-         doi:10.1145/3383923.3383927.
     ing canvas: Combining nonverbal behavior gener-          [51] R. E. Mayer, C. Pilegard, Principles for managing
     ation with user-generated content to rapidly cre-             essential processing in multimedia learning: Seg-
     ate educational videos, in: S. Marsella, R. Jack,             menting, pretraining, and modality principles, The
     H. H. Vilhjálmsson, P. Sequeira, E. S. Cross (Eds.),          Cambridge handbook of multimedia learning (2005)
     IVA ’20: ACM International Conference on In-                  169–182.
     telligent Virtual Agents, Virtual Event, Scotland,       [52] R. Ramachandran, E. M. Sparck, M. Levis-Fitzgerald,
     UK, October 20-22, 2020, ACM, 2020, pp. 25:1–                 Investigating the effectiveness of using application-
     25:3. URL: https://doi.org/10.1145/3383652.3423880.           based science education videos in a general chem-
     doi:10.1145/3383652.3423880.                                  istry lecture course, J. Chem. Educ. 96 (2019)
[46] S. Horovitz, Y. Ohayon, Boocture: Automatic edu-              479–485. URL: https://doi.org/10.1021/acs.jchemed.
     8b00777. doi:10.1021/acs.jchemed.8b00777.
[53] G. Salomon, Television is "easy" and print is
     "tough": The differential investment of mental
     effort in learning as a function of perceptions
     and attributions, Journal of Educational Psychol-
     ogy 76 (1984) 647–658. doi:10.1037/0022-0663.76.4.647.
[54] K. Chorianopoulos, A taxonomy of asynchronous
     instructional video styles, The International Review
     of Research in Open and Distributed Learning 19
     (2018) 294–311.
[55] R. E. Mayer (Ed.), The Cambridge handbook of
     multimedia learning, Cambridge University Press,
     2005.
[56] B. Clavié, K. Gal, EduBERT: Pretrained deep
     language models for learning analytics, CoRR
     abs/1912.00690 (2019). URL: http://arxiv.org/abs/
     1912.00690. arXiv:1912.00690.
[57] U. Gadiraju, R. Yu, S. Dietze, P. Holtz, Analyzing
     knowledge gain of users in informational search
     sessions on the web, in: C. Shah, N. J. Belkin,
     K. Byström, J. Huang, F. Scholer (Eds.), Proceed-
     ings of the 2018 Conference on Human Informa-
     tion Interaction and Retrieval, CHIIR 2018, New
     Brunswick, NJ, USA, March 11-15, 2018, ACM, 2018,
     pp. 2–11. URL: https://doi.org/10.1145/3176349.
     3176381. doi:10.1145/3176349.3176381.
[58] C. Otto, R. Yu, G. Pardi, J. von Hoyer, M. Rokicki,
     A. Hoppe, P. Holtz, Y. Kammerer, S. Dietze, R. Ew-
     erth, Predicting knowledge gain during web search
     based on multimedia resource consumption, in:
     I. Roll, D. S. McNamara, S. A. Sosnovsky, R. Luckin,
     V. Dimitrova (Eds.), Artificial Intelligence in Educa-
     tion - 22nd International Conference, AIED 2021,
     Utrecht, The Netherlands, June 14-18, 2021, Pro-
     ceedings, Part I, volume 12748 of Lecture Notes
     in Computer Science, Springer, 2021, pp. 318–330.
     URL: https://doi.org/10.1007/978-3-030-78292-4_26.
     doi:10.1007/978-3-030-78292-4_26.
[59] J. Shi, C. Otto, A. Hoppe, P. Holtz, R. Ewerth, In-
     vestigating correlations of automatically extracted
     multimodal features and lecture video quality, in:
     Proceedings of the 1st International Workshop on
     Search as Learning with Multimedia Information,
     SALMM ’19, Association for Computing Machin-
     ery, New York, NY, USA, 2019, pp. 11–19. URL: https:
     //doi.org/10.1145/3347451.3356731. doi:10.1145/
     3347451.3356731.

</pre>