<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MAGIC: Massive Automated Grading in the Cloud</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Armando Fox</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Patterson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuel Joseph</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul McCulloch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Apex Computing Inc. and AgileVentures</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Background: Autograding for a Software Engineering Course</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Hawai'i Pacific University</institution>
          ,
          <addr-line>MakersAcademy, and AgileVentures</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of California</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>39</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>We describe our experience developing and using a specific category of cloud-based autograder (automatic evaluator of student programming assignments) for software engineering. To establish our position in the landscape, our autograder is fully automatic rather than assisting the instructor in performing manual grading, and test-based, in that it exercises student code under controlled conditions rather than relying on static analysis or comparing only the output of student programs against reference output. We include a brief description of the course for which the autograders were built, Engineering Software as a Service, and the rationale for building them in the first place, since we had to surmount some new obstacles related to the scale and delivery mechanism of the course. In three years of using the autograders in conjunction with both a software engineering MOOC and the residential course on which the MOOC is based, they have reliably graded hundreds of thousands of student assignments, and are currently being refactored to make their code more easily extensible and maintainable. We have found cloud-based autograding to be scalable, sandboxable, and reliable, and students value the near-instant feedback and opportunities to resubmit homework assignments more than once. Our autograder architecture and implementation are open source, cloud-based, LMS-agnostic, and easily extensible with new types of grading engines. Our goal is not to make specific research claims on behalf of our system, but to extract from our experience engineering lessons for others interested in building or adapting similar systems.</p>
      </abstract>
      <kwd-group>
        <kwd>automatic grading</kwd>
        <kwd>programming</kwd>
        <kwd>software engineering</kwd>
        <kwd>on-line education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background: Autograding for a Software Engineering Course</title>
      <p>
        perform both dynamic tests and static analysis [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Autograders have also found use in
residential classrooms, with some instructors even finding that grades on autograded
programming assignments are a surprisingly good predictor of final course grades
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        From 2008 to 2010, authors Fox and Patterson refocused UC Berkeley’s
one-semester (14-week) undergraduate software engineering course [
        <xref ref-type="bibr" rid="ref6 ref8">6, 8</xref>
        ] on agile
development, emphasizing behavior-driven design (BDD, http://guide.agilealliance.org/guide/bdd.html) and automated testing. A key
goal of the redesign was to promote software engineering methodologies by giving
students access to best-of-breed tools to immediately practice those methodologies.
These tools would not only enable the students to learn immediately by doing, but
also provide quantitative feedback for instructors to check students’ work. We chose
Ruby on Rails as the teaching vehicle because its developer ecosystem has by far the
richest set of such tools, with a much stronger emphasis on high productivity,
refactoring, and beautiful code than any other ecosystem we’d seen. The choice of Rails in
turn influenced our decision to use Software as a Service (SaaS) as the learning
vehicle, rather than (for example) mobile or embedded apps. In just 14 weeks, third- and
fourth-year students learn Ruby and Rails (which most haven’t seen before), learn the
tools in Figure 1, complete five programming assignments, take three exams, and
form “two-pizza teams” of 4–6 to prototype a real SaaS application for a nonprofit,
NGO, or campus unit, over four two-week agile iterations.
      </p>
      <p>
        The new course was offered experimentally in 2009–2010 and was immediately
successful; growing enrollment demand (from 45 in the pilot to 240 in Spring 2015)
led us to write a book around the course [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and to start thinking about how to scale it
up. Coincidentally, in mid-2011 our colleagues Prof. Andrew Ng and Prof. Daphne
Koller at Stanford were experimenting with a MOOC platform which would
eventually become Coursera, and invited us to try adapting part of our course to the platform
as an experiment. With the help of some very strong teaching assistants, we not only
created Berkeley’s first MOOC, but also the initial versions of the autograder
technology described here. To date, we estimate over 1,500 engineer-hours have been
invested in the autograders, including contributions from MOOC alumni, from the
AgileVentures open development community (http://agileventures.org), and from
instructors using our MOOC materials as a SPOC [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
      <sec id="sec-2-1">
        <title>Cloud Grading Architecture With OpenEdX</title>
        <p>We adopt a narrow Unix-like view of an autograder: it is a stateless command-line
program that, given a student work submission and a rubric, computes a score and
some textual feedback. We treat separately the question of how to connect this
program to a Learning Management System (LMS). All other policy issues—whether
students can resubmit homeworks, how late penalties are computed, where the
gradebook is stored, and so on—are independent of the autograder (though due to an
implementation artifact of OpenEdX, the autograders currently do adjust their scores
to reflect late penalties, based on metadata about due dates provided with each
assignment submission), as is the question
of whether these autograders should replace or supplement manual grading by
instructors. While these issues are pedagogically important, for engineering purposes we
declare them strictly outside the scope of the autograder code itself.</p>
        <sec id="sec-2-1-1">
          <title>Why Another Autograder?</title>
          <p>
            Given that 17 autograding systems and over 60 papers about them were produced
from 2006–2010 alone [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], why did we choose to build our own? First, as the survey
authors point out, many existing systems’ code is not readily available or is tightly
integrated with a particular LMS. We needed to
integrate with Coursera and later OpenEdX, both of which were new and had not yet
embraced standards such as Learning Tools Interoperability (http://imsglobal.org/lti). Unlike most previous
systems, ours would need to work at “cloud scale” and respond to workload spikes:
the initial offering of our MOOC in February 2012 attracted over 50,000 learners, and
we expected that thousands of submissions would arrive bunched together close to the
submission deadline. For the same reason, our graders needed to be highly insulated
from the LMS, so that students whose code accidentally or deliberately damaged the
autograder could not compromise other information in the LMS. Similarly, the
autograders had to be trustworthy, in that the student assignments were authoritatively
graded on trusted servers rather than having students self-report grades computed by
their own computers (although of course we still have no guarantee that students are
doing their own work).
          </p>
        </sec>
      </sec>
    <sec id="sec-3">
      <title>2 http://agileventures.org</title>
      <p>3 Due to an implementation artifact of OpenEdX, the autograders currently do adjust their
scores to reflect late penalties, based on metadata about due dates provided with each
assignment submission.
4 http://imsglobal.org/lti
2.2</p>
      <sec id="sec-3-1">
        <title>Student Experience and Cloud Grading Architecture</title>
        <p>Our initial implementation of autograding was designed to work with Coursera and
later adapted to OpenEdX. Both the API and the student experience are similar
between the two. A logged-in student navigates to a page containing instructions and
handouts for a homework assignment; when ready, the learner submits a single file or
a tar or zip archive through a standard HTTP file-upload form. A short time later,
typically less than a minute, the student can refresh the page to see feedback on her
work from the autograder.</p>
        <p>
          As Figure 2 shows, the student’s submitted file, plus submission metadata
specified at course authoring time, go into a persistent queue in the OpenEdX server; each
course has its own queue. We use the metadata field to distinguish different
assignments so that the autograder knows which engine and rubric files must be used to
grade that assignment. The OpenEdX LMS defines an authenticated RESTful API (see
http://edx-partner-course-staff.readthedocs.org/en/latest/exercises_tools/external_graders.html) by
which external standalone autograders can retrieve student submissions from these
queues and later post back a numerical grade and textual feedback. The external
grader does not have access to the identity of the learner; instead, an obfuscated token
identifies the learner, with the mappings to the learners’ true identities maintained
only on OpenEdX. Hence no sensitive information connecting a work product to a
specific student is leaked if the autograder is compromised. Once a submission is
retrieved from the queue, the metadata identifies which grader engine and
instructor-supplied rubric files (described subsequently) should be used to grade the assignment.
The engine itself, rag (Ruby AutoGrader), is essentially a Unix command-line
program that consumes the submission filename and rubric filename(s) as command-line
arguments and produces a numerical score (normalized to 100 points) and freeform
text feedback. The XQueueAdapter in the figure is a wrapper around this program
that retrieves the submission from OpenEdX and posts the numerical score and
feedback (formatted as a JSON object) back to OpenEdX.</p>
        <p>This simple architecture keeps the grader process stateless, thereby simplifying the
implementation of cloud-based graders in three ways:</p>
        <list list-type="order">
          <list-item>
            <p>No data loss. If an assignment is retrieved but no grade is posted back before a
pre-set timeout, OpenEdX eventually returns the ungraded assignment to the
queue, where it will presumably be picked up again by another autograder
instance. Therefore, if an autograder crashes while grading an assignment, no
student work is lost.</p>
          </list-item>
          <list-item>
            <p>Scale-out. Since the autograders are stateless consumers contacting a single
producer (the queue), and grading is embarrassingly task-parallel, we can drain
the queues faster by simply deploying additional autograder instances. Since
we package the entire autograder and supporting libraries as a virtual machine
image deployed on Amazon’s cloud, deploying an additional grader is
essentially a one-click operation. (We have not yet had the time to automate scaling
and provisioning. OpenEdX also supports an alternative “push” protocol in which
each student submission event triggers a call to a RESTful autograder endpoint;
we do not use this alternative protocol because it thwarts this simple scaling
technique and because we would be unable to limit the rate at which submissions
were pushed to our autograders during peak times.) Even our most sophisticated
autograders take less than one machine-minute per assignment, so at less than
10 cents per machine-hour, MOOC-scale autograding is cost-effective and fast:
even with thousands of assignments being submitted in a 50,000-learner course,
students rarely waited more than a few minutes to get feedback, and we can grade
over 1,000 assignments for US $1.</p>
          </list-item>
          <list-item>
            <p>Crash-only design [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. If the external grader crashes (which it does
periodically), it can simply be restarted, which we in fact do in the body of a
while(true) shell script. If the entire VM becomes unresponsive, for
example if it becomes corrupted by misbehaving student code, it can be rebooted or
undeployed as needed, with no data loss.</p>
          </list-item>
        </list>
        <p>In short, the simple external grader architecture of OpenEdX provides a good
separation of concerns between the LMS and autograder authors.
</p>
      </sec>
      <sec id="sec-3-2">
        <title>CI Workflow for Autograders</title>
        <p>
          Since at any given time the autograders may be in use by the MOOC and several
campus SPOCs (Small Private Online Courses [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]), it is important to avoid
introducing breaking changes to rubric files, homework assignments, or the autograders
themselves. We set up continuous integration tasks using Travis-CI, which is integrated
with GitHub. When a pull request is made (GitHub’s term for the mechanism by which a
developer requests that a set of changes be merged into the production branch of a
codebase), the CI task instantiates a new virtual
machine, installs all the needed software to build an autograder image based on the
codebase as it would appear after the pull request, and tests the autograder with
known solutions versioned with each homework, as Figure 3 shows. Each homework
assignment repository also has a CI task that automates the installation of the
autograders and verifies their configuration.
        </p>
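        <p>The following Rakefile sketch suggests what such a CI task can look like; the
directory layout, file names, and expected scores are illustrative assumptions rather
than our exact repository structure.</p>
        <preformat>
# Hedged sketch of a CI regression task: grade known solutions versioned
# with each homework and fail the build if any score changes unexpectedly.
desc 'Run the autograder against reference solutions'
task :regression do
  Dir.glob('hw*/solutions/*.rb').each do |solution|
    rubric = File.join(File.dirname(solution), '..', 'rubric_spec.rb')
    # Full-credit solutions must score 100; known-broken ones must not.
    expected = File.basename(solution).start_with?('full_credit') ? 100 : 0
    score = `rag grade #{solution} #{rubric}`[/\d+/].to_i
    abort "#{solution}: scored #{score}, expected #{expected}" unless score == expected
  end
end
        </preformat>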
      </sec>
      <sec id="sec-4">
        <title>rag, a Ruby Autograder for ESaaS</title>
        <p>Having chosen Ruby and Rails for their excellent testing and code-grooming tools,
our approach was to repurpose those same tools into autograders that would give
finer-grained feedback than human graders using more detailed tests, and would be
easier to repurpose than those built for other languages.</p>
        <p>rag (http://github.com/saasbook/rag) is actually a collection of three different autograding “engines” based on
open-source testing tools, as Figure 4 shows. Each engine takes as input a
student-submitted work product and one or more rubric files whose content depends on the
grader engine, and grades the work according to the rubric. (Currently the rubric files
must be present in the local filesystem of the autograder VM, but refactoring is in
progress to allow these files to be securely loaded on-demand from a remote host, so
that currently-running autograder VMs do not have to be modified when an assignment
is added or changed.)</p>
        <p>The first of these (Figure 4, left) is RSpecGrader, based on RSpec, an XUnit-like
TDD framework that exploits Ruby’s flexible syntax to embed a highly readable
unit-testing DSL in Ruby. The instructor annotates specific tests within an assignment with
point values (out of 100 total); RSpecGrader computes the total points achieved and
concatenates and formats the error/failure messages from any failed tests, as Figure 6
shows. RSpecGrader wraps the student code in a standard preamble and postamble in
which large sections of the standard library such as file I/O and most system calls
have been stubbed out, allowing us to handle exceptions in RSpec itself as well as test
failures. RSpecGrader also runs in a separate timer-protected interpreter thread to
protect against infinite loops and pathologically slow student code.</p>
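        <p>A minimal sketch of a point-annotated rubric in this style follows; the points
metadata key, the sandboxing shown, and the assignment itself are illustrative
assumptions rather than RSpecGrader’s exact interface.</p>
        <preformat>
# Hedged sketch of an RSpecGrader-style rubric (annotations hypothetical).
require 'rspec'
require 'timeout'

# Preamble sketch: load student code under a time limit so infinite loops
# become failures; real graders stub file I/O and system calls as well.
Timeout.timeout(10) do
  eval(File.read('submission.rb'), TOPLEVEL_BINDING)
end

RSpec.describe 'Homework 1: sum_of_squares' do
  it 'returns 0 for the empty list', points: 20 do
    expect(sum_of_squares([])).to eq(0)
  end

  it 'sums the squares of several numbers', points: 40 do
    expect(sum_of_squares([1, 2, 3])).to eq(14)
  end

  it 'squares negative numbers correctly', points: 40 do
    expect(sum_of_squares([-2])).to eq(4)
  end
end
# RSpecGrader tallies the point values of passing examples (out of 100)
# and concatenates the failure messages of the rest as student feedback.
        </preformat>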
        <p>
          A variant of RSpecGrader is MechanizeGrader (Figure 4, center). Surveys of
recent autograders [
          <xref ref-type="bibr" rid="ref11 ref4">11, 4</xref>
          ] mentioned as a “future direction” a grader that can assess
full-stack GUI applications. MechanizeGrader does this using Capybara and
Mechanize. Capybara implements a Ruby-embedded DSL for interacting with Web-based
applications by providing primitives that trigger actions on a web page such as filling
in form fields or clicking a button, and examining the server’s delivered results pages
using XPath, as Figure 7 shows. Capybara is usually used as an in-process testing
tool, but Mechanize can trigger Capybara’s actions against a remote application,
allowing black-box testing. Students’ “submission” to MechanizeGrader is therefore the
URL to their application deployed on Heroku’s public cloud.
        </p>
        <p>
          Finally, one of our assignments requires students to write integration-level tests
using Cucumber, which allows such tests to be formulated in stylized plain text, as
Figure 8 shows. Our autograder for this style of assignment is inspired by mutation
testing, a technique described by Paul Ammann and Jeff Offutt [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in which a testing
tool pseudo-randomly mutates the program under test to ensure that some test fails as
a result of these introduced errors.
        </p>
        <p>Specifically, FeatureGrader (Figure 4, right) operates by working with a
reference application designed so that its behavior can be modified by manipulating
certain environment variables. Each student-created test is first applied to the reference
application to ensure it passes when run against a known-good subject. Next the
FeatureGrader starts to mutate the reference application according to a simple
specification (Figure 5), introducing specific bugs and checking that some student-created test
does in fact fail in the expected manner in the presence of the introduced bug.
        </p>
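        <p>The sketch below illustrates the idea; the environment-variable names and the
mutation list are illustrative assumptions rather than FeatureGrader’s actual spec
format (Figure 5).</p>
        <preformat>
# Hedged sketch of FeatureGrader's mutation loop (spec format hypothetical).
# Each entry names a bug the reference app injects when an environment
# variable is set, plus the student feature expected to catch it.
MUTATIONS = [
  { env: 'BUG_BROKEN_SORT',   feature: 'features/sort_movies.feature'   },
  { env: 'BUG_IGNORE_FILTER', feature: 'features/filter_movies.feature' }
]

# Step 1: every student-written test must pass against the unmodified
# reference application, a known-good subject.
abort 'Tests must first pass against the reference app' unless
  system('cucumber', 'features/')

# Step 2: inject each bug and award credit only if some student test
# now fails, i.e., the suite actually detects the introduced error.
score = 0
MUTATIONS.each do |m|
  detected = !system({ m[:env] => '1' }, 'cucumber', m[:feature])
  score += 100 / MUTATIONS.size if detected
end
puts "Score: #{score}"
        </preformat>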
        <sec id="sec-3-2-1">
          <title>Lessons and Future Work</title>
          <p>
            Both surveys of autograders ask why existing autograders aren’t reused more, at
least when the programming languages and types of assignments supported by the
autograder match those used in courses other than the one(s) for which the autograder
was designed. We believe one reason is the configuration required for teachers to
deploy autograders and students to submit work to them. Since we faced and
surmounted this problem in deploying our “autograders as a service” with OpenEdX, we
can make them easy for others to use. We already have several instructors running
SPOCs based on our materials [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] using OpenEdX, not only using our autograders but
creating new assignments that take advantage of them. We are completing a major
refactoring that should allow our autograders to be used entirely as a service by others
and a toolchain to create autogradable homeworks for use in conjunction with the
ESaaS course materials.
          </p>
          <p>We now discuss how we are addressing ongoing challenges resulting from lessons
learned in using these autograders for nearly three years.</p>
          <p>Tuning rubrics. When rubrics for new assignments are developed, it is easy to
overlook correct implementations that don’t match the rubric, and easy to forget
“preflight checks” whose absence may cause the grader process to give up (for example, checking
that a function is defined in the appropriate class namespace before calling it, to avoid
a “method not found” exception). Similarly, if tests are redundant—that is, if the same
single line of code or few lines of code in a student submission causes all tests in a
group to either pass or fail together— then student scoring is distorted. (This is the
more general problem of test suite quality in software engineering.) In general we try
to debug rubrics at classroom scale and then deploy the assignments to the MOOC,
relying on the CI workflow to ensure we haven’t broken the autograding of existing
assignments.</p>
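          <p>A typical preflight check, sketched below (names illustrative), verifies that the
expected method exists before any behavioral test calls it, so a missing definition yields
a clear message rather than an unhandled “method not found” exception partway through
grading.</p>
          <preformat>
# Hedged sketch of a zero-point "preflight check" guarding a rubric.
RSpec.describe 'preflight' do
  it 'defines Calculator#evaluate in the expected namespace', points: 0 do
    expect(defined?(Calculator)).to be_truthy
    expect(Calculator.instance_methods).to include(:evaluate)
  end
end
          </preformat>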
          <p>Avoiding “Autograder-Driven Development.” Because feedback from the autograder
is quick, students can get into the habit of relying on the autograder for
debugging. To some extent we have turned this into a benefit by showing students how
to use RSpec and Cucumber/Capybara on their own computers and run a subset of
the same tests the instructors use, which is much faster and also gives them access to
an interactive debugger. (These and all the other tools are preinstalled in the virtual
machine image in which we package all student-facing courseware, available for
download from saasbook.info.)</p>
        <p>Combining with manual grading. In a classroom setting (though usually not in a
MOOC), instructors may want to spot-check students’ code manually in addition to
having it autograded. The current workflow makes it a bit awkward to do this, though
we do save a copy of every graded assignment.</p>
        <p>
          Grading for style. As Douce et al. observe [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], one flaw of many autograders is
that “A program. . . may be correct in its operation yet pathological in its
construction.” We have observed this problem firsthand and are developing “crowd-aware”
autograders that take advantage of scale to give students feedback on style as well as
correctness. This work is based on two main ideas. The first is that a small number of
clusters may capture the majority of stylistic errors, and browsing these clusters can
help the instructor quickly grasp the main categories of stylistic problems students are
experiencing [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The second is that within a sufficiently large set of student
submissions, we can observe not only examples of excellent style and examples of very poor
style, but enough examples in between that we can usually identify a submission whose
style is slightly better than that of a given student’s [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. We can then use the
differences between two submissions as the basis for giving a hint to the student
whose submission is stylistically worse.
        </p>
        <p>
          Cheating. Woit and Mason [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] found that not only is cheating rampant (in their
own 5-year study and supported by earlier studies), as demonstrated dramatically by
students who got high marks on required programming assignments but failed the
exact same questions when they appeared on exams, but also that students don’t do
optional exercises. Notwithstanding these findings—and we’re sure plagiarism is
occurring in both our MOOC and campus class—plagiarism detection has been a
non-goal for us. We use these assignments as formative rather than summative
assessments, and we have the option of using MOSS (http://theory.stanford.edu/~aiken/moss). That said, we continue to work on
strengthening the autograders against common student attacks, such as trying to
generate output that mimics what the autograder generates when outputting a score, with
the goal of getting the synthetic output parsed as the definitive score.
        </p>
      </sec>
        <sec id="sec-3-3-1">
          <title>Conclusions</title>
          <p>The autograders used by ESaaS have been running for nearly three years and have
graded hundreds of thousands of student assignments in our EdX MOOC, our campus
Software Engineering course, and many other instructors’ campus courses (SPOCs)
that use some or all of our materials. The substantial investment in them has paid off
and we are continuing to improve and streamline them for future use. Instructors
interested in adopting them for their course, or in creating adapters to connect them to
other LMSs, are invited to email spoc@saasbook.info.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>Acknowledgements</title>
          <p>We thank Google for early support of this effort as well as support for the current
refactoring to further scale the courses that use this technology. We thank the
technical teams at Coursera and edX for their early support for this course by providing
the necessary external grader APIs. We thank the AgileVentures team for both
helping steward the MOOC and providing substantial development and maintenance
effort on the autograders, especially with their contribution of the CI workflows.</p>
          <p>Finally, thanks to the many, many UC Berkeley undergraduate and graduate
students who have contributed to the development of the autograders, including James
Eady, David Eliahu, Jonathan Ko, Robert Marks, Mike Smith, Omer Spillinger,
Richard Xia, and Aaron Zhang.</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>Appendix: Code examples</title>
          <p>Fig. 7. This excerpt of three test cases from a MechanizeGrader rubric runs against a
student's deployed full-stack application.</p>
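          <p>A hedged reconstruction in the style of Fig. 7 follows; the pages, form labels,
and point values are illustrative assumptions for a hypothetical movie-catalog app.</p>
          <preformat>
# Reconstruction sketch: three MechanizeGrader test cases driven against
# a student's deployed app (URL supplied via a hypothetical env variable).
require 'capybara'
require 'capybara/dsl'
require 'capybara/mechanize'   # registers the :mechanize remote driver
require 'rspec'

Capybara.default_driver = :mechanize
Capybara.run_server = false
Capybara.app_host = ENV['STUDENT_APP_URL']   # the student's Heroku URL

RSpec.describe 'deployed movie-catalog app' do
  include Capybara::DSL

  it 'lists all movies on the home page', points: 20 do
    visit '/movies'
    expect(page).to have_content('All Movies')
  end

  it 'can add a new movie', points: 40 do
    visit '/movies/new'
    fill_in 'Title', with: 'Alien'
    click_button 'Save Changes'
    expect(page).to have_content('Alien')
  end

  it 'renders the movie table for sorting', points: 40 do
    visit '/movies'
    click_link 'Movie Title'
    expect(page).to have_xpath('//table[@id="movies"]')
  end
end
          </preformat>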
          <p>Fig. 8. Cucumber accepts integration tests written in stylized prose (top) and uses regular expressions
to map each step to a step definition (bottom) that sets up preconditions, exercises the app, or checks
postconditions. Step definitions can stimulate a full-stack GUI-based web application in various
ways, including remote-controlling a real browser with Webdriver (formerly Selenium) or using the
Ruby Mechanize library to interact with a remote site. Our code blocks are in Ruby, but the
Cucumber framework itself is polyglot.</p>
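          <p>Again as a hedged reconstruction: the commented lines show a stylized plain-text
scenario of the kind Fig. 8 describes, and the Ruby code shows matching step definitions;
the step text and selectors are illustrative assumptions.</p>
          <preformat>
# Reconstruction sketch of Fig. 8's style. Top (as comments): a plain-text
# Cucumber scenario. Bottom: Ruby step definitions mapped by regex.
#
#   Feature: filter movie list by rating
#     Scenario: restrict listing to PG and R movies
#       When I check the "PG" and "R" rating checkboxes
#       Then I should see only movies rated PG or R

When(/^I check the "(.*)" and "(.*)" rating checkboxes$/) do |r1, r2|
  check("ratings_#{r1}")       # Capybara action: tick a checkbox
  check("ratings_#{r2}")
  click_button 'Refresh'       # exercise the app under test
end

Then(/^I should see only movies rated (.*) or (.*)$/) do |r1, r2|
  page.all('table#movies tbody tr').each do |row|
    expect([r1, r2]).to include(row.find('td.rating').text)
  end
end
          </preformat>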
        </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ammann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Offutt</surname>
          </string-name>
          , J.: Introduction to Software Testing. Cambridge University Press (
          <year>2008</year>
          ), http://www.amazon.com/Introduction-Software-Testing-PaulAmmann/dp/0521880386
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bloom</surname>
            ,
            <given-names>B.S.:</given-names>
          </string-name>
          <article-title>Mastery learning</article-title>
          .
          <source>Mastery learning: Theory and practice</source>
          pp.
          <fpage>47</fpage>
          -
          <lpage>63</lpage>
          (
          <year>1971</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Candea</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Crash-only software</article-title>
          .
          <source>In: Proc. 9th Workshop on Hot Topics in Operating Systems (HotOS IX). Lihue, Hawaii (May</source>
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Douce</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Livingstone</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orwell</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Automatic test-based assessment of programming: A review</article-title>
          .
          <source>J. Educ. Resour. Comput</source>
          .
          <volume>5</volume>
          (
          <issue>3</issue>
          ) (
          <year>Sep 2005</year>
          ), http://doi.acm.org/10.1145/1163405.1163409
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Viewpoint: From MOOCs to SPOCs: How MOOCs can strengthen academia</article-title>
          .
          <source>Communications of the ACM 56 (Dec</source>
          <year>2013</year>
          ), http://cacm.acm.org/magazines/2013/12/169931-from-moocs-to-spocs/abstract
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Crossing the software education chasm</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>55</volume>
          (
          <issue>5</issue>
          ),
          <fpage>25</fpage>
          -
          <lpage>30</lpage>
          (May
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Engineering Software as a Service: An Agile Approach Using Cloud Computing</article-title>
          . Strawberry Canyon LLC, San Francisco, CA, 1st edn. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>Is the new software engineering curriculum agile?</article-title>
          <source>IEEE Software</source>
          (September/October
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patterson</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilson</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joseph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walcott-Justice</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Software engineering curriculum technology transfer: Lessons learned from MOOCs and SPOCs</article-title>
          .
          <source>Tech. Rep. UCB/EECS-2014-17</source>
          , EECS Department, University of California, Berkeley (Mar
          <year>2014</year>
          ), http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-17.html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hollingsworth</surname>
          </string-name>
          , J.:
          <article-title>Automatic graders for programming classes</article-title>
          .
          <source>Commun. ACM</source>
          <volume>3</volume>
          (
          <issue>10</issue>
          ),
          <fpage>528</fpage>
          -
          <lpage>529</lpage>
          (
          <year>Oct 1960</year>
          ), http://doi.acm.org/10.1145/367415.367422
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ihantola</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahoniemi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karavirta</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , Seppälä, O.:
          <article-title>Review of recent systems for automatic assessment of programming assignments</article-title>
          .
          <source>In: Proceedings of the 10th Koli Calling International Conference on Computing Education Research</source>
          . pp.
          <fpage>86</fpage>
          -
          <lpage>93</lpage>
          . Koli Calling '10
          ,
          ACM
          , New York, NY, USA (
          <year>2010</year>
          ), http://doi.acm.org/10.1145/1930464.1930480
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Moghadam</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choudhury</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>AutoStyle: Toward coding style feedback at scale</article-title>
          .
          <source>In: 2nd ACM Conference on Learning at Scale</source>
          . Vancouver, Canada (March
          <year>2015</year>
          ), http://dx.doi.org/10.1145/2724660.2728672, short paper
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Navrat</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tvarozek</surname>
          </string-name>
          , J.:
          <article-title>Online programming exercises for summative assessment in university courses</article-title>
          .
          <source>In: Proceedings of the 15th International Conference on Computer Systems and Technologies</source>
          . pp.
          <fpage>341</fpage>
          -
          <lpage>348</lpage>
          . CompSysTech '14,
          ACM
          , New York, NY, USA (
          <year>2014</year>
          ), http://doi.acm.org/10.1145/2659532.2659628
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Woit</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mason</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Effectiveness of online assessment</article-title>
          .
          <source>In: Proceedings of the 34th SIGCSE Technical Symposium on Computer Science Education</source>
          . pp.
          <fpage>137</fpage>
          -
          <lpage>141</lpage>
          . SIGCSE '03,
          ACM
          , New York, NY, USA (
          <year>2003</year>
          ), http://doi.acm.org/10.1145/611892.611952
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Clustering student programming assignments to multiply instructor leverage</article-title>
          .
          <source>In: 2nd ACM Conference on Learning at Scale</source>
          . Vancouver, Canada (March
          <year>2015</year>
          ), http://dx.doi.org/10.1145/2724660.2728695, short paper
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>