<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring the effectiveness of video viewing in an introductory x-MOOC of algebra</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Joan Triay</string-name>
          <email>juanfrancisco.triay01@estudiant.upf.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julià Minguillón</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Sancho-Vinuesa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vanesa Daza</string-name>
          <email>vanesa.daza@upf.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universitat Oberta de Catalunya</institution>
          ,
          <addr-line>Rambla del Poblenou, 156. 08018 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Tànger, 122-140. 08018 Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>63</fpage>
      <lpage>70</lpage>
      <abstract>
        <p>The huge amount of data gathered in a MOOC makes it possible to provide professors and course managers with insightful information about real course usage and consumption. The main aim of this work is to explore how effective video viewing is for completing and passing the first course offered by the UCATx.cat platform, “Decoding Algebra”, in order to improve its design and resources. The statistical method used is principal component analysis, based on the polychoric correlation matrix between the binary variables involved in each group. The main result suggests that the participants' behavior is polarized between two extremes: they either view all videos and pass the course or, on the contrary, they do not watch any of them and do not pass the test either. This information can be used by course managers to provide learners with better strategies for achieving their learning goals.</p>
      </abstract>
      <kwd-group>
        <kwd>video</kwd>
        <kwd>learning analytics</kwd>
        <kwd>course design</kwd>
        <kwd>MOOCs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        MOOCs (Massive Open Online Courses) have just started shaking higher
education on a global scale. It is now feasible to access courses from top universities
worldwide in a free and open way, which threatens both the traditional and online higher
education systems. These courses are supported by web-based learning management
systems that keep track of all the navigation and interaction between course participants
and the course elements (resources, activities, etc.). As thousands of participants take
part in these courses, the large amount of gathered data makes it very interesting to
analyze such courses from a participant perspective, providing teachers and course
managers with insightful information about real course usage and consumption. Several
recent works have tackled effectiveness in MOOCs through the analysis of
these data ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>
        In this sense, and quoting George Siemens, “Learning Analytics is the use of
intelligent data, learner-produced data, and analysis models to discover information and
social connections for predicting and advising people's learning” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Learning
Analytics can be used to better understand how participants in an online course learn and
to help them achieve their learning goals, while improving the course with each
edition by detecting bottlenecks in the teaching plan or in the interaction among
participants, and even misplaced or unused course elements.
      </p>
      <p>
        In Europe, the main stakeholders in higher education have slowly started
adapting the initial MOOC phenomenon in order to meet educational
needs in a more diverse, flexible, open and transversal way. Initiatives such as
FutureLearn in the UK, Iversity in Germany, FUN in France, and MiriadaX and UCATx in Spain
show how massive online education is evolving, both by targeting complementary
markets and by strengthening the internal higher education systems through joint strategies
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Within the Catalan programme UCATx, the platform UCATx.cat<sup>1</sup>, based on Open
edX, has been developed. The first MOOC on this platform was named
“Descodificando Algebra” (in English, “Decoding Algebra”). From the very beginning, the main
aim of the course [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] was to take advantage of this new educational format to fill the
gap between high school and university regarding basic notions of algebra. At the
same time, the course had to remain appealing to students who do not fit this profile
(in transition between school and university). Decoding Algebra was designed in such
a way that, despite its global outreach, it also allows prospective students of
engineering or science to tackle first-year Linear Algebra competently. To capture the
students’ interest, concepts from cryptography and coding theory were introduced.
      </p>
      <p>The main aim of this work is to explore how to analyze the data of “Decoding
Algebra” in order to improve several aspects of its design and resources. In particular,
we are interested in exploring how effective video consumption (i.e. viewing) is
for completing and passing the course.</p>
      <p>The rest of the paper is structured as follows: in Section 2 we describe the
main features of the course, Section 3 is devoted to how to analyze the course data,
while the results obtained are shown in Section 4. Finally, we conclude in Section 5,
pointing out some future lines of work.</p>
    </sec>
    <sec id="sec-2">
      <title>Description of the course</title>
      <p>In a nutshell, the course aims to introduce some basic algebraic concepts. Problems
related to communications (cryptography and coding theory) are used as a motivating
factor.</p>
      <p>The course is structured in 5 different modules/lessons spanning 5 weeks, with a
weekly average dedication of 3 to 5 hours. Each module is about a different topic.
Topics covered are: number sets (structure and properties), basics of modular
arithmetic, matrices and polynomials, introduction to vector spaces and finally, complex
numbers.</p>
      <p>With the exception of the first module, every module begins by
introducing what we call a challenge, which is basically a simple real problem
related to the theory of communication, stated in a challenge style. We refer to the
videos stating challenges as challenge videos. Each module then goes
through enough mathematical notions to understand the solution to the challenge at
the end of each week. Each challenge can be formulated in mathematical terms, so by
applying the concepts of each module, students should be able to understand the
proposed solution (we refer to these videos as challenge resolution videos) or even solve
it by themselves.</p>
      <p>1 UCATx.cat platform: www.ucatx.cat</p>
      <p>All modules share the same structure. All of them have an ordered set of videos
(we refer to them as conceptual videos) where the concepts of each module are
developed. These concepts are accompanied by numerous illustrative examples. The
duration of the videos ranges from 5 to 15 minutes, with 10 minutes being the average.
Each one covers a single idea/concept so that students can watch it as many times as
necessary to understand it before moving on to the next one.</p>
      <p>A final type of video must also be taken into account: those
that contain the resolution of the exercises proposed in the conceptual videos. We will
refer to them as exercise resolution videos.</p>
      <p>At the end of each module, students take a quiz consisting of 8 or 10 questions.
The main objective is self-evaluation, so that students can check whether they
understand the main concepts presented in the videos that make up the module.
Feedback is provided for each of the questions and, when a wrong answer is given, the
student is referred to the particular section/video of the course that he or she needs to
work on. To pass the course, students are expected to obtain a mark of at least 50% on each
module. Following this structure, the MOOC assumes individual participant
activity and minimal interaction with both the professor and other participants.</p>
      <p>Regarding the data gathered by the UCATx platform, during the six weeks between
25 August 2014 and 5 October 2014, around 400,000 events were generated
for a total of 194 course participants.</p>
    </sec>
    <sec id="sec-3">
      <title>Analysis methodology: learning analytics</title>
      <p>Learning analytics is the methodology used to answer questions that cannot be
answered in a fairly straightforward way. What is the efficacy of video for passing a
MOOC? What is the weekly connection pattern of students? Are the students who participate
more in the forum the ones who pass the MOOC? These are some of the
questions we might try to answer with the aid of learning analytics. In this paper we focus
on analyzing the relationship between video consumption and evaluation, using data
derived from the Decoding Algebra MOOC on the UCATx platform.</p>
      <p>
        Several years of research [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have shown that using video in education can impact
on teaching and learning and provide some benefits: increasing motivation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
serving as a necessary tool in a flipped classroom model (see
http://www.uq.edu.au/tediteach/flipped-classroom/index.html).
With respect to data, the Learning Analytics methodology can be divided into four
different phases: data collection, data pre-processing, data analysis and data
visualization. In this work we describe the first three stages, as follows.
      </p>
      <sec id="sec-3-1">
        <title>Data collection</title>
        <p>The UCATx platform automatically collects all data generated by the students when
interacting with the course and produces a log file in JSON format that registers all
participants’ activities, ranging from the enrollment action to the last action
performed in the MOOC. JSON is a text format for structured information based on
key-value pairs. This file may contain from several hundred thousand lines of
information up to several million. Each line describes an event. Each event has
different fields of information, such as username, time, IP address, session and
event type, among others. There is a special field named context that is very
important, as it contains specific information about the event and, depending on the
event type, it may take different values. For instance, when a participant presses the pause
button while watching a video, “context” is used to store the exact time when this
occurs. Having access to data structured in this way simplifies further processing and
analysis.</p>
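        <p>As a minimal sketch of how such a log can be read, the following Python fragment parses a log with one JSON object per line. The field names (username, event_type, context, current_time, video_id) and the two sample lines are illustrative assumptions, not the exact schema of the UCATx log.</p>
        <preformat>
```python
import json

# Two made-up log lines in the one-JSON-object-per-line layout described above.
# All field names and values are illustrative assumptions.
SAMPLE_LOG = """\
{"username": "student_a", "time": "2014-08-25T10:00:00", "ip": "10.0.0.1", "session": "s1", "event_type": "play_video", "context": {"video_id": "m1_v01"}}
{"username": "student_a", "time": "2014-08-25T10:03:10", "ip": "10.0.0.1", "session": "s1", "event_type": "pause_video", "context": {"video_id": "m1_v01", "current_time": 190.0}}
"""


def parse_events(lines):
    """Yield one dict per non-empty log line, skipping lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupted lines


events = list(parse_events(SAMPLE_LOG.splitlines()))
pause = events[1]
print(pause["event_type"], pause["context"]["current_time"])
```
        </preformat>
        <p>Reading the file line by line, rather than loading it whole, keeps memory use bounded when the log reaches millions of lines.</p>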
      </sec>
      <sec id="sec-3-2">
        <title>Data pre-processing</title>
        <p>In this second phase, we developed some scripts in Python. Python was
chosen because of the functionality it offers for working with JSON files and
for extracting data from the log file. Our main goal is to obtain a “plain” structured
file that describes the activity of each student of the course by aggregating
and summarizing all the interaction available for each one of them. By plain we mean
that we have the same information (columns) for each course participant (rows), that
is, there are no missing fields and no rows of different length.</p>
        <p>In this paper, we focus on those lines of the log file related to the interaction with
the videos. These lines correspond to four events, namely: play_video, stop_video,
seek_video, and pause_video. We also extract those lines based on the grades of the
first course module, corresponding to the event called problem_check, in order to
establish the relationship with the previous ones.</p>
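        <p>The selection of events described above can be sketched as a simple filter over already-parsed events. The event names are the ones listed in the text; the sample records, the field name event_type and the extra page_view event are assumptions made for illustration.</p>
        <preformat>
```python
from collections import Counter

# Event names taken from the text above.
VIDEO_EVENTS = {"play_video", "stop_video", "seek_video", "pause_video"}
GRADE_EVENT = "problem_check"

# Hypothetical, already-parsed events (field names are assumptions).
events = [
    {"username": "student_a", "event_type": "play_video"},
    {"username": "student_a", "event_type": "pause_video"},
    {"username": "student_b", "event_type": "problem_check"},
    {"username": "student_b", "event_type": "page_view"},
]

# Keep only the video interactions and the grading events; drop everything else.
video_events = [e for e in events if e["event_type"] in VIDEO_EVENTS]
grade_events = [e for e in events if e["event_type"] == GRADE_EVENT]

# Quick sanity check of how many events of each video type were kept.
counts = Counter(e["event_type"] for e in video_events)
print(len(video_events), len(grade_events), dict(counts))
```
        </preformat>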
        <p>The plain structured file is built as follows. Running the
Python scripts, one for each event included in the analysis, produces a set of new files, each
of which corresponds to a variable whose values are the data that we want to extract from
the original log data for that event or group of events. These variables can be
vectors (containing a variable number of values, e.g. all the activity around a given
video) or simple indicators, mostly numeric or binary. Finally, we join all these files
into a single one, that is, a matrix where each row contains the data of a course
participant and each column, or set of columns, is a variable (corresponding to the
different events we want to analyze). Once this process is finished, we can proceed to
analyze this structured file with a statistical package. For instance, if there are M
videos in the course, not all the N course participants will watch all M videos;
this process creates an N x M matrix containing a binary variable describing whether
participant i (1…N) has seen video j (1…M) or not.</p>
        <p>Concretely, with all this information, we created a variable called Videos Module
One (VM1), a vector with one element for each video used in the first module (23 out of 98
course videos). Each element takes a binary value, either 1 or 0, according to whether the student has
seen the corresponding video or not. To keep the information about those students passing the first
module, we created a variable called Pass First Module (PFM). This variable is
another matrix with only one column, holding the maximum grade over the
three attempts available for the first module evaluation test. The minimum grade is 0
and the maximum grade is 8, because this module has only 8 questions. If the student
did not take the test, we indicate this with -1.</p>
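        <p>The construction of VM1 and PFM can be illustrated with a small sketch. The participant names, video identifiers and grades below are made up, and the helper names build_vm1 and best_grade are hypothetical; the pass threshold of 4 out of 8 corresponds to the 50% rule stated for the course.</p>
        <preformat>
```python
def build_vm1(participants, videos, views):
    """Return one row per participant with 1 if the video was seen, else 0."""
    seen = set(views)
    return [[1 if (p, v) in seen else 0 for v in videos] for p in participants]


def best_grade(attempts):
    """Maximum grade over the recorded attempts, or -1 if the test was not taken."""
    return max(attempts) if attempts else -1


# Hypothetical toy data: 3 participants, 3 videos, up to 3 test attempts each.
participants = ["student_a", "student_b", "student_c"]
videos = ["m1_v01", "m1_v02", "m1_v03"]
views = [("student_a", "m1_v01"), ("student_a", "m1_v02"),
         ("student_a", "m1_v03"), ("student_b", "m1_v01")]
attempts = {"student_a": [3, 6, 7], "student_b": [2], "student_c": []}

vm1 = build_vm1(participants, videos, views)         # [[1, 1, 1], [1, 0, 0], [0, 0, 0]]
pfm = [best_grade(attempts[p]) for p in participants]  # [7, 2, -1]
passed = [1 if g >= 4 else 0 for g in pfm]             # [1, 0, 0]
print(vm1, pfm, passed)
```
        </preformat>
        <p>Every participant gets a full row even when no video was watched, which is what makes the resulting file plain in the sense used above.</p>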
      </sec>
      <sec id="sec-3-3">
        <title>Data analysis</title>
        <p>In this phase we processed the plain file obtained in the previous phase with a
statistical package, namely R. As mentioned before, this file contains data from
194 course participants related to the consumption of the 23 videos used in the
first course module and the students’ final mark. According to the course syllabus,
students pass the test if their final grade is at least 4. To describe the result in the
first test we recode PFM as a binary variable (1 PASS, 0 FAIL). Notice that we
have 23 binary variables (VM1_1 … VM1_23, one for each video) and only 194
samples, which is not a good ratio for prediction purposes.</p>
        <p>Therefore, we need to explore how to reduce the number of variables according
to the characteristics of each video in order to reduce dimensionality, to be
able to compare categories instead of individual videos, and to analyze the
relative importance of each video within each category. To do so, we
classified the videos of module 1 in two different ways. First, according to topic, they
were classified into 4 different categories: natural numbers (4 videos), integer
numbers (13), rational numbers (3), and real-complex numbers (3). Second,
we classified them according to their activity type, as theory or conceptual
videos (12) and exercise videos (11). We
proceed as follows:</p>
        <p>a) Create indicators G1, G2, G3 and G4, one for each of the four topic groups.
As we are just exploring the nature of the gathered data, we use
principal component analysis (PCA) to summarize how course participants
consume the videos within a group.</p>
        <p>b) Create two more indicators, GT and GE, for theory and exercise videos,
respectively, also using PCA with the same goal.</p>
        <p>c) Build two different generalized linear models (logistic), M1 and M2, one
for each of the two groups of indicators mentioned above, trying to predict whether a
student passes the first module test or not with respect to such group of
videos.</p>
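        <p>Steps a) and b) can be sketched as follows. This is only an illustration under stated assumptions: it is written in plain Python rather than R, it uses the Pearson correlation between the binary columns as a stand-in for the polychoric correlation used in the analysis, and it finds the first principal component by power iteration. The toy viewing matrix and the function names are hypothetical.</p>
        <preformat>
```python
import math


def correlation_matrix(rows):
    """Pearson correlation between columns (a stand-in for the polychoric matrix).

    Assumes every column varies, so no standard deviation is zero.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / n for b in range(m)] for a in range(m)]
    return corr, z


def first_component(corr, iterations=200):
    """Leading eigenvector and eigenvalue of corr, found by power iteration."""
    m = len(corr)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(m)) for a in range(m))
    return v, eigenvalue


# Hypothetical viewing matrix for one group of three videos (rows are participants).
group = [
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

corr, z = correlation_matrix(group)
weights, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # share of variance captured by the first component
scores = [sum(w * x for w, x in zip(weights, row)) for row in z]  # one indicator value per participant
print(weights, round(explained, 3), scores)
```
        </preformat>
        <p>The resulting scores play the role of an indicator such as G1: participants who watched every video in the group get a positive value and those who watched none get a negative one. These indicators would then be the predictors of the logistic models in step c).</p>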
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>As mentioned in Section 3, we compute a component summarizing the
consumption of the videos for each group. We use principal component analysis
based on the polychoric correlation matrix between the binary variables involved in each
group. Table 1 shows the percentage of variance explained by the first component,
which is reasonable for all of them. Furthermore, all these components also show a
very interesting behavior: they have a large kurtosis and follow a quite asymmetrical
distribution, so most of the distribution mass is not centered on the mean.
Notice that the maximum value is larger than the minimum one (in
absolute value) except for G1. Table 2 shows the weights for each variable taking part in the
component.</p>
      <p>Using these components, we build two different generalized linear models, one for
explaining the importance of each topic (G1 … G4) and another one to explain the
importance of each kind of video (GT and GE), with respect to attempting (and
passing) the first test of the course. In order to obtain positive β coefficients for all
components, we force a Varimax rotation, so we can compare only magnitudes.</p>
      <p>Table 3 shows the computed logistic model that tries to predict whether a student
will attempt (and pass) the test according to the videos the student has viewed. This
model has a (pseudo) R<sup>2</sup> of 0.668, which is quite high. Notice that we are not trying to
generalize these results, so we are only interested in the magnitudes of the β coefficients. As
the intercept is negative (so students watching no videos, or only a few, are predicted
not to pass the test), one or more components must take large values in
order for the model to predict a pass.</p>
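      <p>For reference, a logistic model over the four topic indicators has the general form below. The symbols β<sub>0</sub> (intercept) and β<sub>k</sub> (coefficient of G<sub>k</sub>) are standard notation introduced here for illustration; the fitted values are those reported in Table 3.</p>
      <disp-formula>
        <tex-math>P(\mathrm{PFM}=1 \mid G_1,\dots,G_4) = \frac{1}{1+\exp\left(-\left(\beta_0 + \sum_{k=1}^{4}\beta_k G_k\right)\right)}</tex-math>
      </disp-formula>
      <p>With a negative intercept, the predicted probability stays below 0.5 when all components are close to zero, and it exceeds 0.5 only when the weighted sum of the components is greater than the absolute value of the intercept.</p>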
      <p>In the light of these results, and taking into account the exploratory nature of the
analysis, we can draw some interesting conclusions about how course participants are
consuming the videos.</p>
      <p>First, the computed components summarizing the consumption of videos for each
group show that most course participants watch all the videos within each group. The
distribution of each component, once normalized (MEAN = 0, SD = 1), shows that
the majority of students either do nothing or do everything, taking almost always
extreme values of the range in Table 2.</p>
      <p>In fact, for each group in Table 2, we can observe that the weights increase. This
means that the more videos they watch, the better results they obtain. Therefore, those
students that see all the videos accumulate more knowledge. This fact happens both
for groups by topic (except perhaps the artificial group real / complex) and for the
theory and exercises groups. It is also remarkable that within each topic, exercise
videos have larger weights than theory videos, in general.</p>
      <p>Table 3 shows that both G1 and G3 are irrelevant, since the β coefficient
multiplied by the maximum value of its range (the positive one) does not allow the model
to predict who will succeed in the test. However, G2 and G4 are indeed relevant. In
fact, the fact that G1 is negative may be because what is really important is
G2 (as natural numbers are only briefly presented compared to integers), so G1
consumption is subsumed by those students who see the videos in G2. This could be
stated as: if you "study" integers, you will also understand natural numbers. Moreover,
perhaps the first model in Table 3 shows only that the exam is biased towards a particular
type of exercise.</p>
      <p>Finally, Table 4 also shows that whenever both theory and exercise videos are
watched jointly, the chances of passing the test increase. It is important to remark that
both must be watched, since the weight of each block is similar. Given the
distribution of these components, it is necessary to do both things. Otherwise, the weights
cancel out and the model does not predict passing the test.</p>
      <p>In summary, even a preliminary exploratory analysis can be very helpful for
determining if course participants are using the proposed resources (i.e. videos) as
expected. Principal component analysis combined with logistic regression can be used
to determine how videos are watched, the relative importance of each video within a
group and the relative importance of each group of videos with respect to the
evaluation test. In fact, evaluation itself can be analyzed to detect whether course
participants are skipping parts of the course or not, as well as test biases towards some
topics rather than others.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This work was supported by research grants from the Generalitat of Catalonia (2014
SGR 1271) and the inter-university programme UCATx.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Muñoz-Merino</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruipérez-Valiente</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alario-Hoyos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pérez-Sanagustín</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delgado Kloos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Precise Effectiveness Strategy for analyzing the effectiveness of students with educational resources and activities in MOOCs</article-title>
          .
          <source>Computers in Human Behavior</source>
          , vol.
          <volume>47</volume>
          ,
          <fpage>108</fpage>
          --
          <lpage>118</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Milligan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Crowd-sourced learning in MOOCs: learning analytics meets measurement theory</article-title>
          .
          <source>In: Fifth International Conference on Learning Analytics And Knowledge LAK'15</source>
          , pp.
          <fpage>151</fpage>
          --
          <lpage>155</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Whitmer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiorring</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>James</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>How Students Engage with a Remedial English Writing MOOC: A Case Study in Learning Analytics with Big Data</article-title>
          .
          <source>Educause Learning Initiative</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Siemens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>What Are Learning Analytics?</article-title>
          <source>Elearnspace</source>
          , August 25,
          <year>2010</year>
          . http://www.elearnspace.org/blog/2010/08/25/what-are-learning-analytics/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Sancho-Vinuesa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliver</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gisbert Cervera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>MOOCs en Cataluña: un instrumento para la innovación en educación superior</article-title>
          .
          <source>Educación XX1: Revista de la Facultad de Educación</source>
          , vol.
          <volume>18</volume>
          ,
          <issue>2</issue>
          ,
          <fpage>125</fpage>
          --
          <lpage>146</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Daza</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rovira</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makriyannis</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>MOOC attack: closing the gap between pre-university and university mathematics</article-title>
          .
          <source>Open Learning: The Journal of Open, Distance and eLearning</source>
          , vol.
          <volume>28</volume>
          ,
          <issue>3</issue>
          ,
          <fpage>227</fpage>
          --
          <lpage>238</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Clearance Center:
          <article-title>Video Use and Higher Education: Options for the Future</article-title>
          .
          <source>Technical report</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bravo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amante</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enache</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Video as a new teaching tool to increase student motivation</article-title>
          .
          <source>In: Global Engineering Education Conference (EDUCON)</source>
          ,
          <year>2011</year>
          IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>