<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>What data and analytics can and do say about effective
learning. npj Science of Learning.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
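        <article-title>A Focus on Methodology in Learning Analytics: Building a Structurally Sound Bridge Discipline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yoav Bergner</string-name>
          <aff>New York University</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles Lang</string-name>
          <aff>Teachers College, Columbia University</aff>
        </contrib>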
        <aff id="aff0">
          <label>0</label>
          <institution>Geraldine Gray Institute of Technology</institution>
          ,
          <addr-line>Blanchardstown</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2013</year>
      </pub-date>
      <volume>2</volume>
      <issue>5</issue>
      <fpage>184</fpage>
      <lpage>193</lpage>
      <abstract>
        <p>This paper gives an overview of the inaugural Methodologies in Learning Analytics Workshop at the International Learning Analytics &amp; Knowledge Conference 2017 in Vancouver, Canada. The workshop covered many topics, but two key themes emerged: the middle space, that is, the space between learning and analytics in which methodologies reside; and multivocality, the challenge of finding shared analytic objectives and learning not to talk past one another. The paper summarizes these themes and their importance for generating robust methodological arguments within learning analytics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the third year of the International Learning Analytics &amp; Knowledge Conference
(LAK2013), the conference organizers sketched out a theme of dialectics in learning analytics
[15]. One of these dialectics was middle space—as in, a space in between learning and analytics.
Another was productive multivocality—that is, finding shared analytic objectives and learning
to not talk past each other. In preparing our workshop proposal for LAK2017, we found
ourselves wanting to push a bit harder on the conceptual infrastructure necessary for
sustaining this middle space and for effective communication, or what it means to build a
structurally sound bridge discipline.</p>
      <p>From its beginnings, learning analytics emphasized the importance of bridging computer
sciences and social sciences [13]. Learning analytics has also been described as helping to
bridge education, psychology, and neuroscience [10]. Interestingly, Boyack, Klavans, and
Börner’s [3] scientometric analysis of the “backbone of science,” a network analysis based on
journal inter-citations, found that both education and computer science lie toward the outside of the
network. Education and computer science are fairly insular by the authors’ analysis, while
artificial intelligence and psychology are more central. The shortest path connecting education
to computer science passes through psychology and statistics. Thus, one might speculate that if
Boyack et al. were to redo their analysis in the future, learning analytics and educational data
mining journals would cluster somewhere on this path.</p>
      <p>Mindful of Suthers and Verbert’s second dialectic, we see productive multivocality as
intimately connected to methodology and the uses of statistics as principled argument (cf.
[1]) to support claims about learning and educational improvement. Methodologists are the
quality assurance engineers in the bridge-building enterprise of learning analytics, stress testing
the roadways, trusses, and wire ropes that connect educational technologists, psychologists,
data scientists, learning scientists, substantive experts in various educational domains, and
measurement specialists. For all of the strength that comes from such diversity of expertise,
there are also challenges when it comes to establishing norms for methodological work.
Learning analytics has embraced an eclectic approach to methodology but may lack its own
coherent epistemology [5]. In an amicus brief of a paper to his own scholarly community, Peter
Kennedy [7] enumerated the ten commandments of applied econometrics, that is, the
unwritten rules that guide methodological choices. This short paper does not attempt to
establish the rules of learning analytics methodology, but only to make the case that some set
of rules is desirable.</p>
    </sec>
    <sec id="sec-2">
      <title>Papers as Arguments</title>
      <p>To elucidate the role of a methodological focus in learning analytics, consider the
general framework for reasoning or argument due to Toulmin [16]. A representation of the
Toulmin model is shown in Figure 1. The end of an argument is a claim, which is supported
along a central path by data/observations. However, the observations alone do not suffice to
model non-trivial arguments, which usually involve some warrant for justifying the claim from
the observations. The warrant itself is backed by some other observations. Finally, alternative
explanations and rebuttal evidence serve to qualify a claim or, possibly, strengthen it if
alternative explanations appear to be weak. The model is simple but remarkably general.</p>
      <p>An example of Toulmin’s model in the context of a simple school assessment might go
as follows: “Donald is very weak in mathematics” (claim). “He got almost all of the questions
wrong on the diagnostic test” (data). “These questions were a good gauge of grade 6 math
ability” (warrant): “they were drawn from a pool that our school developed with expert help
and refined over several years. Math teachers say the scores on the placement test indeed
predict which students struggle without extra help and which students succeed” (backing).
Or, “This math test did not accurately gauge Donald’s math ability” (alternative explanation):
“Donald suffers from health problems that kept him awake throughout the night, and he was
too exhausted to perform at his actual level” (rebuttal evidence).</p>
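      <p>One way to keep the parts of Toulmin’s model distinct is to write them down as a small data structure. The following Python sketch is illustrative only: the class name <monospace>ToulminArgument</monospace>, its field names, and the helper <monospace>is_contested</monospace> are hypothetical labels introduced here, not part of Toulmin’s formulation, and the values simply restate the Donald example above.</p>
      <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class ToulminArgument:
    """Hypothetical container for the components of one Toulmin-style argument."""

    claim: str
    data: str
    warrant: str
    backing: str
    alternative_explanations: list = field(default_factory=list)
    rebuttal_evidence: list = field(default_factory=list)

    def is_contested(self):
        """True when at least one alternative explanation has been recorded."""
        return bool(self.alternative_explanations)


# The school assessment example from the text, restated field by field.
argument = ToulminArgument(
    claim="Donald is very weak in mathematics",
    data="He got almost all of the questions wrong on the diagnostic test",
    warrant="These questions were a good gauge of grade 6 math ability",
    backing="The question pool was developed with expert help and refined over several years",
    alternative_explanations=["This math test did not accurately gauge Donald's math ability"],
    rebuttal_evidence=["Donald was too exhausted to perform at his actual level"],
)

print(argument.claim)
print("contested:", argument.is_contested())
```
      </preformat>
      <p>Keeping the alternative explanations as explicit fields makes it harder to present a claim without recording what might undermine it.</p>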
      <p>Of course, a paper is usually composed of a chain of arguments rather than a single one.
A simplified representation of an archetypal learning analytics paper might involve the chain
illustrated in Figure 2. It is not implied that the paper is written in this order, but rather that the
logical path from data/observations to a claim (about learners, technologies, pedagogies, etc.)
will typically involve warrants justifying data selection and preparation (possibly multiple such
steps), model selection and implementation, and evaluation. For each of these links in the chain
of argument, alternative explanations might exist that potentially undermine the claim. For
example, was the data selection or transformation justifiably warranted given the claim and the
data? Paper authors who aim to build a strong argument are likely to engage with these
alternative explanations. Methodologists in particular, though, make it their business to
understand the underside of this diagram. This is hardly to suggest that methodologists are a
finger-wagging bunch who delight in niggling their colleagues about violations of model
assumptions. Alternative explanations and warrants go hand in hand in building an argument.
The exhaustion of alternative explanations can strengthen the warrant.</p>
      <p>Chaining weak links undermines the structural integrity of an argument, in learning
analytics as anywhere. However, learning analytics may be particularly susceptible to this
weakness given the breadth of techniques practitioners use. It is challenging for readers and
reviewers to be fluent in all of them. We will cherry-pick an example from one of the
pioneering papers on modeling student engagement in massive open online courses (MOOCs) by Kizilcec,
Piech, and Schneider (KPS; [8]). We single this paper out not because it represents a particularly
egregious example, but only because it is one of the most cited references<sup>1</sup> in the learning
analytics literature supporting the discrete characterization of MOOC learners by
disengagement patterns.</p>
      <p>KPS arrive at the plausible claim that MOOC learners can be categorized as completing,
auditing, disengaging, or sampling. On their way to this claim, the following three steps are
involved. First, in each assessment period, learners are labeled as “on track” (did the
assessment on time), “behind” (turned in the assessment late), “auditing” (didn’t do the
assessment but engaged by watching a video or doing a quiz), or “out” (didn’t participate in the
course at all), leading to a vector of engagement observations, for example, [T, T, T, T, T, B, A, A,
A]. In the second step, the similarity between the engagement vectors of two students is
measured as follows: assign a numerical value to each label (on track = 3, behind = 2, auditing =
1, out = 0), and compute the L1 norm of the difference between the two resulting lists of numbers. Finally, in step 3, k-means
clustering is used (repeated 100 times from random start points). What is the problem with this
sequence of steps?</p>
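      <p>The first two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the KPS implementation: the single-letter label <monospace>"O"</monospace> for “out” and the second learner’s vector are assumptions introduced for the example, and the distance is taken to be the L1 norm of the difference between two coded vectors.</p>
      <preformat>
```python
# Numeric coding as described in the text:
# on track = 3, behind = 2, auditing = 1, out = 0.
# The letter "O" for "out" is an assumption; the text only shows T, B, and A.
LABEL_VALUES = {"T": 3, "B": 2, "A": 1, "O": 0}


def encode(labels):
    """Map a sequence of engagement labels to their numeric codes."""
    return [LABEL_VALUES[label] for label in labels]


def l1_distance(u, v):
    """Sum of absolute coordinate-wise differences between two coded vectors."""
    if len(u) != len(v):
        raise ValueError("engagement vectors must cover the same assessment periods")
    return sum(abs(a - b) for a, b in zip(u, v))


# The example vector from the text, plus a made-up second learner.
learner_a = encode(["T", "T", "T", "T", "T", "B", "A", "A", "A"])
learner_b = encode(["T", "T", "B", "A", "A", "O", "O", "O", "O"])

print(learner_a)
print(l1_distance(learner_a, learner_b))
```
      </preformat>
      <p>Every distance computed this way inherits the assumption that adjacent labels are exactly one unit apart, which is the choice questioned in the discussion that follows.</p>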
      <p>The first step in the above sequence involves (several instances of) dichotomizing a
continuous variable, a practice generally frowned upon for increasing the likelihood of type I
and type II errors [11]. Does it matter how late is late? Or whether a student watched one video
or ten? Perhaps not, but the authors do not make this case. In the second step, a categorical
label is transformed into an interval scale for the purpose of computing distances. Is this
justified? (The coding assumes that the “difference” between two learners is the same whether (a) one of them
watched 10 videos and the other none or (b) one of them completed an assignment on time
and the other completed it late.) Lastly, the use of k-means is suspect with a non-Euclidean
metric as is used in the KPS analysis; a k-medians modification is recommended for L1 norms
[14]. In summary, the KPS analysis involves a chaining of three analytical steps, each of which is
potentially suspect. Does this mean the ultimate claim is wrong? Of course not. However, we
note that not only has this paper been cited for its claims about learners, but its methods have
also been used in replication studies (e.g., [6]). Repeated use in itself becomes evidence for
validity and tacitly vindicates the lack of consideration of alternative explanations.</p>
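      <p>The concern about pairing k-means with an L1 metric can be illustrated with a toy calculation. The sketch below uses a made-up cluster of coded engagement vectors (not data from KPS) and compares two candidate cluster centers: the coordinate-wise mean, which a k-means update would use, and the coordinate-wise median, which a k-medians update would use. The function names are hypothetical.</p>
      <preformat>
```python
from statistics import mean, median


def total_l1_cost(points, center):
    """Sum of L1 distances from every point to a candidate cluster center."""
    return sum(
        sum(abs(x - c) for x, c in zip(point, center))
        for point in points
    )


def coordinatewise(points, reducer):
    """Apply `reducer` (for example mean or median) to each coordinate separately."""
    return [reducer(column) for column in zip(*points)]


# Made-up cluster of five learners over three assessment periods,
# coded as on track = 3, behind = 2, auditing = 1, out = 0.
cluster = [
    [3, 3, 0],
    [3, 3, 0],
    [3, 2, 0],
    [3, 0, 3],
    [0, 0, 3],
]

mean_center = coordinatewise(cluster, mean)
median_center = coordinatewise(cluster, median)

mean_cost = total_l1_cost(cluster, mean_center)
median_cost = total_l1_cost(cluster, median_center)

print("mean center:", mean_center, "total L1 cost:", round(mean_cost, 2))
print("median center:", median_center, "total L1 cost:", median_cost)
```
      </preformat>
      <p>In this toy cluster the coordinate-wise median yields a lower total L1 cost than the mean. This reflects a general property: the median, not the mean, minimizes summed absolute deviations, which is why an L1 metric is more naturally paired with k-medians.</p>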
    </sec>
    <sec id="sec-3">
      <title>Operationalization and Sensitivity Analyses</title>
      <p>What KPS perhaps lacked most was an analysis of the sensitivity of results to
operationalization of their engagement variables and distance metrics. We are reminded of
Kennedy’s [7] tenth commandment of applied econometrics: “Thou shalt confess in the
presence of sensitivity (Corollary: Thou shalt anticipate criticism)” (p. 583). In fact, as the field of
learning analytics has matured, a number of more recent papers have emphasized the
sensitivity of quantitative analyses to data collection and variable operationalization choices,
for example in the cases of selection bias [4], time-on-task analyses [9], studies of discussion
forum usage [2], and evaluation of student models [12]. Our aim is not to ring an alarm bell. At
the risk of stating the obvious, we emphasize only that the methods we use in learning analytics
are subject to random and systematic error. If we do not make explicit efforts to quantify
uncertainty of both kinds, we chain together weak links.</p>
      <p><sup>1</sup> At the time of writing, the KPS paper [8] was cited over 500 times according to Google Scholar.</p>
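      <p>A sensitivity analysis of the kind discussed in this section can be as simple as repeating one step of a pipeline under alternative operationalizations and checking whether the result changes. The Python sketch below is hypothetical throughout: the three learners, the two label codings, and the function names are invented for illustration and are not taken from KPS or any cited study.</p>
      <preformat>
```python
# Made-up engagement labels for three learners over four assessment periods
# (T = on track, B = behind, A = auditing, O = out).
LEARNERS = {
    "a": ["T", "T", "T", "T"],
    "b": ["B", "B", "B", "T"],
    "c": ["T", "T", "T", "A"],
}

# Two candidate numeric codings of the same labels. The first uses equal
# steps between labels; the second (an assumption for this example) treats
# lateness as a minor difference and auditing as a large one.
CODINGS = {
    "equal_steps": {"T": 3, "B": 2, "A": 1, "O": 0},
    "late_is_minor": {"T": 5, "B": 4, "A": 1, "O": 0},
}


def l1_distance(labels_u, labels_v, coding):
    """L1 distance between two label sequences under a given numeric coding."""
    return sum(abs(coding[x] - coding[y]) for x, y in zip(labels_u, labels_v))


def nearest_neighbor(target, coding):
    """Name of the learner closest to `target` under the given coding."""
    candidates = [name for name in LEARNERS if name != target]
    return min(
        candidates,
        key=lambda name: l1_distance(LEARNERS[target], LEARNERS[name], coding),
    )


# Repeat the same question under each coding and compare the answers.
results = {name: nearest_neighbor("a", coding) for name, coding in CODINGS.items()}
for name, neighbor in results.items():
    print(name, "nearest neighbor of learner a:", neighbor)
```
      </preformat>
      <p>Here the learner judged most similar to learner a changes with the coding, so any grouping built on these distances would depend on a choice that should be reported and justified.</p>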
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In conclusion, the Inaugural Workshop in Methodologies in Learning Analytics raised more
questions than it answered, but we are confident that this is a positive sign. It demonstrates the
thirst of the community to engage critically in methodological conversations and address the
challenges of building methodological bridges within the discipline. The form that this endeavor
will ultimately take within the community will largely depend on the two ideas discussed here,
middle space and multivocality. The challenge is to define the objectives of the field, align those
objectives with methodologies, and communicate those arguments across the many fields
involved in learning analytics and beyond.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>Yoav Bergner acknowledges research support by the National Science Foundation
(DRL1740371).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Abelson</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          <year>1995</year>
          .
          <article-title>Statistics as principled argument</article-title>
          . L. Erlbaum Associates.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bergner</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Methodological Challenges in the Analysis of MOOC Data for Exploring the Relationship between Discussion Forum Views and Learning Outcomes</article-title>
          .
          <source>Proceedings of the 8th International Conference on Educational Data Mining</source>
          . (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Boyack</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          et al.
          <year>2005</year>
          .
          <article-title>Mapping the backbone of science</article-title>
          .
          <source>Scientometrics</source>
          .
          <volume>64</volume>
          ,
          <issue>3</issue>
          (Aug.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          et al.
          <year>2015</year>
          .
          <article-title>Reducing selection bias in quasi-experimental educational studies</article-title>
          .
          <source>Proceedings of the Fifth International Conference on Learning Analytics And Knowledge</source>
          (
          <year>2015</year>
          ),
          <fpage>295</fpage>
          -
          <lpage>299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Clow</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>An overview of learning analytics</article-title>
          .
          <source>Teaching in Higher Education</source>
          .
          <volume>18</volume>
          ,
          <issue>6</issue>
          (
          <year>2013</year>
          ),
          <fpage>683</fpage>
          -
          <lpage>695</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Ferguson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Clow</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Examining engagement</article-title>
          .
          <source>Proceedings of the Fifth International Conference on Learning Analytics And Knowledge - LAK '15</source>
          . (
          <year>2015</year>
          ),
          <fpage>51</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>P.E.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>Sinning in the basement: what are the rules? The ten commandments of applied econometrics</article-title>
          .
          <source>Journal of Economic Surveys</source>
          .
          <volume>16</volume>
          ,
          <issue>4</issue>
          (
          <year>2002</year>
          ),
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>