Videolectures Ingredients that can make Analytics Effective

Marco Ronchetti
Dipartimento di Ingegneria e Scienza dell'Informazione
Università degli Studi di Trento
Trento, IT-38050, Italy
marco.ronchetti@unitn.it

Abstract
Videolectures over the Internet started at the turn of the century and became increasingly popular, until they recently gained wide resonance in the form of Massive Open On-Line Courses (MOOCs). Although videolecture usage data have always been important, in the case of MOOCs they are vital for the success of the initiative. In the present paper, we suggest that some (already available) tools for extracting semantic information from video should be used, as they may vastly improve the meaningfulness of the information extracted by videolecture analytics.

Author Keywords
Videolectures, effective analytics, semantics, MOOCs

ACM Classification Keywords
K.3.1 [Computers and Education]: Computer Uses in Education - Computer-managed instruction (CMI).

Copyright © 2013 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.
WAVe 2013 workshop at LAK'13, April 8, 2013, Leuven, Belgium.

Introduction
The idea of massively using videos of recorded lectures for teaching goes back to the attempts to use TV as an educational medium. TV introduced some educational programs (and later channels), but they were successful only on rare occasions. One such case was the Italian TV show "Non è mai troppo tardi" (It's never too late), which from 1960 to 1968 helped nearly 1.5 million illiterate adults obtain a primary school diploma. It was probably one of the most successful examples of TV-based distance education ever, and a sort of early MOOC (Massive Open On-line Course), even though the "line" was not the Internet. Even before that, there were instructional movies, used for instance to demonstrate scientific experiments that were too complex or too lengthy to be performed in a school laboratory. There are also educational TV channels today, like Teachers TV, a digital channel for everyone who works in schools. Teachers TV's programmes cover every subject in the curriculum, all key stages and every professional teaching role. It can be accessed on digital cable and satellite (and, more recently, also via the Internet).

In the seventies, VHS cassettes made it possible, for the first time, to attempt to turn videos into an "on demand" resource for educational needs, but again the effort had only a marginal impact on mainstream education.

At the end of the 1980s, a system implementing a rather mechanical process of individualized instruction was patented [1]. Part of the system consisted of ad-hoc hardware able to play movies. Only in the nineties did PCs have enough power and memory to be considered tools for reproducing videos and multimedia in general. At the turn of the millennium, increased network bandwidth and the power of mobile devices (laptops first, then tablets and smartphones) made it possible to distribute videos over the Internet, ultimately delivering today's ability to use video instruction anywhere and at any time. Since the early experiments [3, 13], a great deal of research has been done in the field of Internet-delivered videolectures (for a review, see [9, 10]).

It then took about 15 years for videolectures to move from the work of the pioneers to the pages of the New York Times [8]. Their diffusion grew progressively, with a first boost given (around 2005) by Apple's iTunes U initiative, which also allowed some usage data to be extracted from the logs (see e.g. [2]). Along the way, for a few years (again starting from 2005), the podcasting variant was a fashionable approach. Only recently did MOOCs finally make it into common reference works: the MOOC entry in the English Wikipedia dates from July 2011. A history of MOOCs in 2012, the year of the boom, is reported in a post by Audrey Watters.

MOOC Numbers
Figures such as 1.7 million students for Coursera, or a student-to-professor ratio of 150,000:1 at Udacity [8], are certainly impressive; however, in spite of MOOCs' popularity, there is little data on them. Stories of success and failure are often anecdotal. Some statistics are available from MOOC platforms such as Coursera, Udacity and MITx, and they are puzzling.

The first MIT MOOC (MITx 6.002x: Circuits and Electronics) drew 154,763 registrants. However, only 45% of them (69,221 people) looked at the first problem set, and of these only 26,349 earned at least one point (17% of the enrolled): we can consider these to be the people who showed real interest, rather than mere curiosity.

The number halved by the midterm assignment: 13,569 people looked at it while it was still open, and 9,318 people (6% of the enrolled) got a passing score on the midterm.

In the end, after 14 weeks of study, 7,157 people earned the first certificate (4.6% of the enrolled, i.e. 27% of those who had shown real interest). In spite of the huge drop, having more than seven thousand students pass a course is a massive achievement indeed.

The numbers for Coursera's Social Network Analysis class are less encouraging. Out of the 61,285 registered students, 1,303 (2%) earned a certificate, and only 107 (0.17%) earned "the programming (i.e. with distinction) version of the certificate".

MOOC questions and challenges
The statistics raise several questions. The most compelling one is probably "why are so many students not finishing the course?". This question may be difficult to answer, but answers to other questions can be obtained by monitoring users' behaviour and gathering statistics and analytics. Examples of such questions include the following:
• Where do the students come from?
• Which videos are most popular, and which ones attract little interest?
• Are students actually watching the videos on the assigned dates?
• Are viewers watching all the way through?
• At what point in the lecture, if any, do viewers stop watching?
• Are there any portions of the videos that are being watched repeatedly?
• Are students watching the videos by the assigned deadlines?
• Do the videos generate active user engagement?
• Do students edit, share or download the material?

Interpreting these statistics may not be easy, however. Knowing that the segment of lecture N between times t1 and t2 is often reviewed is not, by itself, a meaningful cue: what does that segment contain? To find out, we need to watch the fragment ourselves. When there are many potentially interesting sections or points, this can be a very time-consuming task. The problem arises from the lack of semantic information.

Some help may come from a fine-grained structure of the material. For instance, if "lectures" are broken into small pieces (20 minutes), as in the case of Khan Academy, or even smaller ones (10-minute fragments, as in certain Coursera courses), each unit is likely to have well-defined semantics. If, instead, a lecture is recorded in class, and hence follows time constraints dictated by logistics rather than by content, things are much more difficult.

In these cases, substantial help may come from certain ingredients that we claim are important for videolectures:
• multiple (parallel) cognitive channels,
• semantic marking,
• transcripts,
• annotations.
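The replay question in the list above ("Are there any portions of the videos that are being watched repeatedly?") can be answered from playback logs alone, and sketching how makes the limitation just described concrete: without semantic ingredients, the result is only a time interval. The minimal Python sketch below assumes a hypothetical log format in which each record is a (student_id, start_second, end_second) playback span for a single lecture; the 30-second bucket size and the threshold of distinct replaying students are illustrative choices, not values taken from any MOOC platform.

```python
from collections import Counter, defaultdict

# Hypothetical log format (an assumption, not any platform's real schema):
# one (student_id, start_s, end_s) tuple per playback span of a single lecture.
ViewSpan = tuple[str, int, int]


def replay_counts(spans: list[ViewSpan], bucket_s: int = 30) -> Counter:
    """For each time bucket, count how many distinct students watched it more than once."""
    per_student: dict[str, Counter] = defaultdict(Counter)
    for student, start, end in spans:
        # Mark every bucket this span touches; end is exclusive.
        for b in range(start // bucket_s, (end - 1) // bucket_s + 1):
            per_student[student][b] += 1
    replays: Counter = Counter()
    for buckets in per_student.values():
        for b, views in buckets.items():
            if views > 1:
                replays[b] += 1
    return replays


def hot_intervals(replays: Counter, min_students: int, bucket_s: int = 30) -> list[tuple[int, int]]:
    """Merge consecutive buckets replayed by at least min_students into (t1, t2) intervals."""
    hot = sorted(b for b, n in replays.items() if n >= min_students)
    intervals: list[tuple[int, int]] = []
    for b in hot:
        t1, t2 = b * bucket_s, (b + 1) * bucket_s
        if intervals and intervals[-1][1] == t1:
            intervals[-1] = (intervals[-1][0], t2)  # extend the current run
        else:
            intervals.append((t1, t2))
    return intervals


def fmt(seconds: int) -> str:
    """Format seconds as mm:ss."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


if __name__ == "__main__":
    logs = [
        ("s1", 0, 600), ("s1", 300, 390),
        ("s2", 0, 600), ("s2", 310, 400),
        ("s3", 0, 200),
    ]
    for t1, t2 in hot_intervals(replay_counts(logs), min_students=2):
        print(f"replayed segment: {fmt(t1)}-{fmt(t2)}")
```

On the toy log it prints `replayed segment: 05:00-06:30`, which is precisely the "lecture N between t1 and t2" kind of answer discussed above: someone still has to open the video to learn what that segment is about. Counting distinct students, rather than raw views, keeps a single user looping over a fragment from dominating the result.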
Videolecture enhancements that may (also) help analytics
The ingredients we mentioned are not really new: some people have been using them for years in the context of videolectures, as tools for improving the user experience. For instance, semantic annotation has been used to facilitate lecture navigation (see e.g. [11]), and transcripts have helped users search videolectures (see below). In the light of analytics, however, they take on a new dimension. Let us briefly examine them.

The first ingredient we mentioned is multiple cognitive channels. Online lectures in MOOCs typically use exactly two channels: video + audio, slides + audio (the so-called webcasts), or computer screen + audio (as e.g. in the case of Khan Academy). There are even lectures based on audio alone (podcasts), although these were mostly used before the term MOOC became popular.

In contrast, even the much-criticized frontal lectures in class are based on a richer paradigm. The teacher uses the blackboard and PowerPoint slides, may project his/her computer screen, and at the same time students see gestures and facial expressions. It is quite possible to reproduce such an environment in online lectures as well. A variety of authoring systems allow using (at least) two visual channels in parallel (e.g. slides + video), making the online lecture richer. While Moreno and Mayer [7] suggested that the presence of multiple cognitive channels brings a negative "split attention" effect, Glowalla [6], a German instructional psychologist, reported that lectures showing both a video and slides favour learners, who show better concentration, while the audio + slides version is perceived as more boring. Data obtained by other investigators [4] confirm the better efficacy brought by the presence of video as an additional cognitive channel. We believe MOOCs should adopt such a rich communication paradigm, rather than rely on the poorer paradigm based on a single video channel (+ audio).

This choice would also ease the introduction of the second ingredient: semantic marking. Having slide transitions, for example, makes it very easy to associate metadata with specific portions of a video. When a teacher presents a slide, what is s/he talking about? Most likely, we find the answer in the slide title. If slide transition timing and slide content are captured while the video is recorded, it becomes extremely easy to tag the video with semantic annotations. Questions like the ones mentioned above, e.g. "Are there any portions of the videos that are being watched repeatedly?", may now have a significantly more interesting answer than "at time nn:nn": the answer might instead be something like "the fragment discussing Kepler's third law". The power of analytics is suddenly vastly increased, precisely because semantic metadata are available. And the important point is that such metadata, a resource that is notoriously difficult and costly to obtain, are generated automatically!

Along the same lines, the availability of (synchronized) audio transcripts allows meaningful information to be associated with the timeline. A few years ago, we [5] successfully experimented with Automatic Speech Recognition tools to enrich videos with synchronized transcripts that allowed students to perform searches in online videolectures. This technique would of course also allow mapping any data coming from analytics onto the content, without the need to visually inspect video fragments. Natural language processing (NLP) tools could be used to extract additional semantic information from a specific video fragment.
Finally, we mention in passing that giving students the possibility to annotate videolectures would provide yet another precious source of information. Again, this would be a case of a feature that was originally designed to achieve a particular goal (e.g. to grow a sense of community around a set of videolectures) and that would acquire additional value in the context of the usage analysis typical of analytics tools. This would be true for the extra information that NLP tools could mine from the notes, but in addition, data about the annotations themselves would be an extra source that could be mined (e.g. to find correlations with the difficulty or interest of a particular video portion).

Conclusion
MOOCs may be just an ephemeral fashion, or they might revolutionize the future landscape of higher education: only time will tell. In this short paper we advocated the need for them to embrace a richer cognitive paradigm and to be enriched by metadata associated with video fragments. The availability of such metadata, which should be extracted automatically, provides important hints that make the information extracted by videolecture analytics much more significant.

References
[1] A. Louis Abrahamson, Frederick F. Hantline, Milton G. Fabert, Michael J. Robson, Robert J. Knapp. (1989). "Electronic classroom system enabling interactive self-paced learning", US Patent 5002491.
[2] A. Defranceschi, M. Ronchetti. (2011). "Video-lectures in a traditional mathematics course on iTunes U: usage analysis", in Proceedings of EDMEDIA 2011 - World Conference on Educational Multimedia, Hypermedia and Telecommunications 2011, Chesapeake, VA: AACE, 2011, pp. 720-727.
[3] M.H. Hayes. (1998). "Some approaches to Internet distance learning with streaming media", in Second IEEE Workshop on Multimedia Signal Processing, Redondo Beach, CA, USA, 1998.
[4] A. Fey. (2002). "Audio vs. Video: Hilft Sehen beim Lernen? Vergleich zwischen einer audio-visuellen und auditiven virtuellen Vorlesungen", Lernforschung, vol. 30, no. 4, pp. 331-338 (in German).
[5] A. Fogarolli, G. Riccardi, M. Ronchetti. (2007). "NEEDLE: Searching information in a collection of video-lectures", in Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications ED-MEDIA 2007, Norfolk, VA: AACE, 2007, pp. 1450-1459.
[6] U. Glowalla. (2004). "Utility and Usability von E-Learning am Beispiel von Lecture-on-demand Anwendungen", in Entwerfen und Gestalten, 2004 (in German).
[7] R. Moreno, R. E. Mayer. (2000). "A Learner-Centred Approach to Multimedia Explanations: Deriving Instructional Design Principles from Cognitive Theory", Interactive Multimedia Electronic Journal of Computer-Enhanced Learning (2).
[8] L. Pappano. (2012). "The year of the MOOC", The New York Times, Nov 2, 2012, http://www.nytimes.com/2012/11/04/education/edlife/massive-open-online-courses-are-multiplying-at-a-rapid-pace.html?pagewanted=all&_r=0
[9] M. Ronchetti. (2011). "Perspectives of the Application of Video Streaming to Education", in Ce Zhu, Yuenan Li, Xiamu Niu (Eds.), Streaming Media Architectures, Techniques, and Applications: Recent Advances, Hershey, PA, USA: Information Science Reference, IGI Global, 2011, pp. 411-428.
[10] M. Ronchetti. (2011). "Video-Lectures over Internet: The Impact on Education", in G. Magoulas (Ed.), E-Infrastructures and Technologies for Lifelong Learning: Next Generation Environments, New York: IGI Global, 2011, pp. 253-270.
[11] M. Ronchetti. (2012). "LODE: Interactive demonstration of an open source system for authoring video-lectures", in 2012 15th International Conference on Interactive Collaborative Learning (ICL), Los Alamitos, USA: IEEE Computer Society Press, 2012, pp. 1-5.
[12] D. McKinney, J.L. Dyck, E.S. Luber. (2009). "iTunes University and the classroom: Can podcasts replace Professors?", Computers & Education, vol. 52, pp. 617-623.
[13] F. Tobagi. (1995). "Distance learning with digital video", IEEE MultiMedia, vol. 2, no. 1, pp. 90-93.