<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining the Web to Leverage Collective Intelligence and Learn Student Preferences</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Moretti</string-name>
          <email>antonio.moretti@pearson.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>José P. González-Brenes</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katherine McKnight</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>1982</year>
      </pub-date>
      <issue>3</issue>
      <abstract>
        <p>University professors of conventional offline classes are often experts in their research fields, but have little training in the educational sciences. Current educational data mining techniques offer little support to them. In this paper we propose a novel algorithm, Analyzing CurrIculum Decisions (ACID), that leverages collective intelligence to model student opinions to help instructors of traditional classes. ACID mines publicly available educational websites, such as student ratings of professors and course information, and learns student opinions within a statistical framework. We demonstrate that ACID can discover patterns in learner feedback and factors that affect Computer Science instruction. Specifically, we investigate the choice of a programming language for introductory courses, the grading criteria and the posting of a publicly available online syllabus.</p>
      </abstract>
      <kwd-group>
        <kwd>offline teacher support</kwd>
        <kwd>collective intelligence</kwd>
        <kwd>web mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        There are thousands of undergraduates in computer science
programs throughout the US, roughly 24% of whom will
switch majors to non-computing fields [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. An essential
component of retaining students is the quality of
instruction that students receive in introductory courses [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. While
clear instruction and good pedagogy are widely
acknowledged as fundamental to retention, support for instructors
who want to improve their educational practice is often based on old
data; the languages used in computer science courses evolve
quickly, so old surveys lose their usefulness. In this paper, we
develop a data mining technique that provides insight
into learner feedback, which can be translated into changes
that affect course quality. In general, our approach is similar
to large-scale surveys that attempt to be representative of
student populations. The benefits of our approach are that
it is rapid and inexpensive because it uses publicly available
information on the Web.
      </p>
      <p>
        The field of educational data mining has been cultivating
a strong interest in creating technologies to mine data
collected from sophisticated online systems such as intelligent
tutoring systems, virtual learning environments, and recently
from Massive Open Online Courses (MOOC). The merits
of these complex online systems have been demonstrated
empirically [
        <xref ref-type="bibr" rid="ref2 ref8">2, 8</xref>
        ] with controlled studies. MOOCs are a
powerful resource that allows educators to study student
behavior and social learning in a controlled environment;
however, the scope of the impact of such technologies is
limited. For example, a recent survey of active MOOC users
in 200 countries and territories revealed that an
overwhelming majority of students in these courses belong to
the most educated elite of their respective countries [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It
is clear that improving basic education worldwide is
necessary before MOOCs can deliver on their promise. Moreover,
because most education still happens offline, it is
important to provide educational technologies that can use the
power of the internet to understand student behavior and to
deliver these technologies to traditional offline classes. It is not
clear how existing educational data mining technologies can
help bridge this divide.
      </p>
      <p>
        We discuss the Analyzing CurrIculum Decisions (ACID) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
methodology, which has been presented and applied briefly.
In this paper we elaborate on both our methodology and
statistical model and expand upon our results. ACID is an
algorithm that leverages collective intelligence within a
statistical framework. ACID supports the decisions of
instructors of traditional offline courses by extracting
syllabus data from the web and using crowd-sourcing to pair it
with students' course ratings, comments and sentiment
in order to analyze the relationship between the two.
      </p>
      <p>This paper reports a case study of using the ACID
methodology to explore three questions that instructors of
computer science courses face when designing their courses. In
addition we discuss ACID's heuristic value within a larger
educational framework. We address the following questions:
1. What course activities and grading rubric
correlate with clear instruction? The question of how
to design a grading rubric and weight course activities
determines what students focus on within a course. It
is important for instructors to optimize course
activities and grading criteria with respect to the student
experience.</p>
      <sec id="sec-1-1">
        <title>Algorithm 1 ACID pseudocode</title>
        <preformat>Input: n universities to analyze, z reviews to analyze
procedure ACID
    while |R| &lt; z do
        s ← sample of n universities
        s ← Remove non-English speaking universities(s)
        R ← Search The Web For Reviews(s)
        R ← ratings rated by more than a given number of students
    Q ← CrowdSource Questionnaire(R)
    Analyze Data(Q)</preformat>
      </sec>
      <sec id="sec-1-4">
        <p>
          2. For introductory classes, which programming
language(s) correlate with clear instruction?
Academics and industry professionals disagree as to the
programming language that is best suited for
beginners [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. For example, some argue that introductory
courses should use interpreted languages that allow for
a faster understanding of the applications of
programming rather than compiled languages that rely heavily
on language-specific syntax. Others believe that
developing skill with compiled languages is necessary for
future work in computer science. The choice of a first
programming language likely affects students' decision
to continue their education within the field of computer
science.
3. Are students more interested in courses with
publicly available online syllabi? The choice to
make a syllabus publicly available adds to information
available to prospective students on the Web. We
hypothesize that the posting of an online syllabus can be
used as a proxy for factors including instructor
organization and motivation, and that students will both
be more interested in and prefer these courses.
        </p>
        <p>The rest of this paper is organized as follows. § 2 explains
the ACID methodology; § 3 describes three case studies of
evaluating teaching decisions using ACID; § 4 relates our approach to prior
work; § 5 concludes.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. ANALYZING CURRICULUM DECISIONS</title>
      <p>Pseudocode for the ACID methodology is presented in
Algorithm 1. For a given number of reviews, we sample n
universities, remove the non-English speaking universities,
scrape and parse the relevant reviews from a ratings website
and retain ratings rated by more than a given number of
students. We then extract information from these courses
using crowd-sourcing, and analyze the data. We describe
the process in detail below.</p>
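      <p>As a rough illustration, the control flow described above can be sketched in Python. This is only a sketch under stated assumptions: every callable passed in (sample_universities, is_english_speaking, search_reviews, crowdsource, analyze) is a hypothetical stand-in supplied by the caller, the dictionary layout of a review is invented for the example, and the max_rounds guard and the bookkeeping of already-sampled universities are our additions rather than part of ACID.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def acid(n, z, min_ratings, sample_universities, is_english_speaking,
         search_reviews, crowdsource, analyze, max_rounds=100):
    """Sketch of the ACID loop; all callables are hypothetical, caller-supplied."""
    reviews = []   # plays the role of R
    seen = set()   # universities already sampled (assumption: no double counting)
    for _ in range(max_rounds):   # guard against an endless loop (assumption)
        if len(reviews) >= z:
            break
        drawn = [u for u in sample_universities(n) if u not in seen]
        seen.update(drawn)
        sample = [u for u in drawn if is_english_speaking(u)]
        # Group the scraped reviews by professor.
        by_professor = defaultdict(list)
        for university in sample:
            for review in search_reviews(university):
                by_professor[review["professor"]].append(review)
        # Retain only professors rated by more than min_ratings students.
        for ratings in by_professor.values():
            if len(ratings) > min_ratings:
                reviews.extend(ratings)
    questionnaire = crowdsource(reviews)   # plays the role of Q
    return analyze(questionnaire)
```
]]></preformat>
      <p>Passing the steps in as functions keeps the sketch independent of any particular ratings website or crowd-sourcing platform; the loop may still end with fewer than z reviews if the sampled universities run out.</p>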
      <p>To evaluate the relative impact of different course features,
we mine the web for data that reflect:</p>
      <p>Curriculum decisions. University professors often
upload information about their classes that
is targeted towards prospective or enrolled students.
This information includes syllabi with detailed
descriptions of course material such as textbooks, projects,</p>
      <p>
        Student perceptions of the course. We make use
of self-selected student evaluations collected from a
third-party website. The validity and usefulness of
self-selected online rating systems have been assessed in
the literature [
        <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
        ]. For example, evidence suggests
that online ratings do not lead to substantially more
biased ratings than those done in a traditional
classroom setting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and that online ratings are a proxy
to measure student learning [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: student learning can
often be modeled as a latent variable that causes
patterns of observed faculty ratings. Researchers
hypothesize a non-linear or concave relationship between
student learning and the perceived difficulty level of a
course [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]; students learn most when a course is neither
too difficult nor too easy. Our work relies on self-selected
ratings as a metric to study learner opinion.
      </p>
      <p>
        We use publicly available self-selected ratings of professors
from a third-party website, Rate My Professor (RMP, ratemyprofessor.com).
This site allows students to rate the professors of the courses
they have taken. The database contains over 13
million ratings for 1.5 million professors. The site collects
ratings on a 1–5 scale (1 being the lowest possible score and 5
the highest) under the categories of “easiness”, “helpfulness”
and “clarity.” Additionally, students may fill out an
“interest” field in which they indicate how appealing the class was
before enrolling, and a 350-character summary of their class
experience. We focus on perceived clarity because of the
direct link between clarity and quality of instruction.
For the purposes of this paper, we focus on Computer
Science courses due to our familiarity with the content. Since
we do not have access to the ratings database, we develop
a process to sample data from the website. For this, we
first select a random sample of 50 international universities
that teach Computer Science from the Academic Ranking of
World Universities<sup>2</sup> [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. From this sample we only consider
the 41 universities that are English speaking.
      </p>
      <p>We find, scrape and parse the reviews in the ratings data set
for all professors within the computer science departments of
the universities in our sample. We remove the ratings of
faculty who were rated by fewer than 30 students. More
than one professor can teach the same course. For our
analysis, we treat one course listing taught by two different
professors as two separate courses. Table 1 shows the mean,
standard deviation and median of the ratings in our sample.
Figure 1 shows two sample ratings for one professor from our
sample. The professor name and course names are removed
for privacy.</p>
      <p>We use Amazon Mechanical Turk, a crowdsourcing platform,
to find course features for each of the courses in our ratings
sample. We do this by asking respondents to fill out a
survey. The survey asks respondents to provide the URL of the online
syllabus that corresponds to the rated course and professor
and that is closest in date to the
student review online. Then, using the syllabus, respondents
are asked to provide the programming language(s) used,
the textbook(s) used, the percentage of the grade that
was determined by homework, projects, quizzes and exams, and
whether the course was taught online or in a blended format
(both face-to-face and online). However, when we reviewed
the responses to the blended format question, it appeared
that most syllabi did not provide enough information
to support an accurate response.</p>
      <p>From our original sample of 1,112 courses taught by a unique
professor, respondents find an online syllabus matching the
professor for 342 courses (about 31%). We hypothesize three
explanations for the missing syllabi: (i) the syllabi may be
accessible only with a password through a course
management system, such as Blackboard, (ii) the syllabi may not
be available online, or (iii) the respondents are not able to
find the syllabi.</p>
    </sec>
    <sec id="sec-3">
      <title>3. DATA ANALYSIS: WHAT MAKES A BETTER CLASS?</title>
      <p>We report our results of applying the ACID methodology to
evaluate teaching decisions. In § 3.1 we assess the quality of
the data collected by the crowd-sourcing platform. In § 3.2
we discuss the statistical model we use. In § 3.3 we report
the results of using ACID.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Data Quality</title>
      <p>We now report how we attempt to collect high-quality
data through the use of crowd-sourcing and how we assess
the quality of our data.</p>
      <p>Mechanical Turk provides a “master” qualification level to
respondents that are more reliable. Masters-level
respondents require higher compensation for crowd-sourcing tasks
than non-Masters level respondents, although their
“acceptance rate,” or proportion of approved tasks, is much higher.
We ran a preliminary experiment to decide whether
respondents with the master level qualification provide better quality
data for our purposes. We ask respondents to find the
syllabus corresponding to each course in a random sample of 30 courses and
to answer a set of questions. Table 2 shows the accuracy
and interrater agreement of Masters and non-Masters level
respondents.</p>
      <p><sup>2</sup> The Academic Ranking of World Universities is also known as the
Shanghai Ranking (shanghairanking.com).</p>
      <sec id="sec-5-1">
        <title>Masters non-Masters</title>
        <p>In the pretest we used a screening question to evaluate the
accuracy of respondents' data on each task. We asked
respondents to find the URL of the website of a faculty member
selected at random from a set of 8 at Carnegie Mellon University,
for whom we knew the answer. We compared
the URL they provided with the correct URL to assess
accuracy. Of the 13 responses from non-Masters workers that
did not provide an exact URL match, five left the
validation question blank. We found that respondents with
the master level qualification were significantly more accurate
(i.e. answered the validation item correctly) than the
non-Masters level respondents (p-value = 0.0002).</p>
        <p>Additionally, we tested interrater agreement by asking 3
respondents to carry out the same task, i.e. finding the
same URL (for a total of 3 × 30 = 90 tasks). We used a
dummy variable to code whether the three respondents
provided the same URL for the course syllabus. Our measure
of agreement is the proportion of
courses for which all three respondents provide the same
URL. Masters-level respondents agreed (i.e. all three
provided the same URL) 100% of the time, whereas the
non-Masters level respondents performed much worse: they
agreed only 6% of the time. As a result of these comparisons, we decided to hire
only Masters-level respondents to complete the
crowdsourcing experiment.</p>
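        <p>The agreement measure described above is simple enough to sketch in a few lines of Python. The function name and the example URLs are invented for illustration, and the comparison is an exact string match; whether any URL normalization was applied in the study is not stated, so none is assumed here.</p>
        <preformat><![CDATA[
```python
def all_agree_rate(responses_by_task):
    """Proportion of tasks for which every respondent gave the same answer."""
    if not responses_by_task:
        raise ValueError("no tasks to score")
    agreed = sum(1 for answers in responses_by_task if len(set(answers)) == 1)
    return agreed / len(responses_by_task)


# Hypothetical example: 4 courses, 3 respondents each (URLs are made up).
tasks = [
    ["http://example.edu/a", "http://example.edu/a", "http://example.edu/a"],
    ["http://example.edu/b", "http://example.edu/b", "http://example.edu/x"],
    ["http://example.edu/c", "http://example.edu/c", "http://example.edu/c"],
    ["http://example.edu/d", "http://example.edu/e", "http://example.edu/f"],
]
print(all_agree_rate(tasks))  # 0.5
```
]]></preformat>
        <p>Requiring all three answers to match is a strict criterion: a single divergent respondent marks the whole course as a disagreement.</p>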
        <p>After collecting the data using Masters level respondents, we
performed a post-hoc analysis by examining the responses
to the screening question. From the final group of 342
responses that provided a link to an online syllabus, 325
responses (95.03%) provided the correct URL for the faculty
website. It should be noted that 13 of the 17 responses that
did not provide an exact URL match provided the website
for a different faculty member from the set of 8, suggesting
that the respondents copied and pasted their previous response
without checking whether the prompt had changed for the
new response. Two of the 17 responses provided a link to
the directory website for the faculty member rather than the
faculty member's personal website. One response provided
the correct faculty member's website within the department
of Statistics rather than the department of Computer
Science (the faculty member is in both departments).</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2 Model</title>
      <p>
        We describe our general linear mixed model and provide
descriptive statistics and model selection criteria.
We explore the relationship between student reviews and
features collected from online syllabus data using general
linear mixed modeling. Student reviews are organized at
three levels: by university, professor and course. It is
important to note the non-independence of the student reviews
due to the hierarchical or clustered nature of the data. We
suspect that student ratings within each course, professor
and perhaps university are correlated. We begin by
estimating the amount of variance attributed to each of these
three levels. The simplest multilevel model does not yet
include explanatory variables:
<disp-formula><tex-math>y_{i,j} = \beta_0 + u_{0,j} + \epsilon_{i,j} \qquad (1)</tex-math></disp-formula>
The dependent variable <inline-formula><tex-math>y_{i,j}</tex-math></inline-formula> is the clarity rating that student
i gave to unit j of a given level. The term <inline-formula><tex-math>\beta_0</tex-math></inline-formula> represents the intercept or
mean student clarity rating across all observations. The
term <inline-formula><tex-math>u_{0,j}</tex-math></inline-formula> represents the effect of unit j, that is, how far the mean clarity rating for unit j departs from the overall mean. The
term <inline-formula><tex-math>\epsilon_{i,j}</tex-math></inline-formula> represents the error attributed to student rating i
in unit j. For comparison we fit a null or single-level model:
<disp-formula><tex-math>y_{i,j} = \beta_0 + \epsilon_{i,j} \qquad (2)</tex-math></disp-formula>
We calculate the percentage of variation in the data set that
is separately attributed to each of the three levels of the data.
Conventionally the variance partition coefficient (VPC) and
intraclass correlation coefficient (ICC) can be interpreted
similarly to an R-squared term and are reported in Table 3.
<disp-formula><tex-math>\rho = 1 - \frac{\sigma_e^2}{\sigma_e^2 + \sigma_u^2} \qquad (3)</tex-math></disp-formula>
The VPC and ICC are denoted by <inline-formula><tex-math>\rho</tex-math></inline-formula>, the residual variance
is denoted by <inline-formula><tex-math>\sigma_e^2</tex-math></inline-formula> and the variance of the effect is denoted
by <inline-formula><tex-math>\sigma_u^2</tex-math></inline-formula>. The ICC is a statistic that is similar to the VPC.
However, since the parameter values of the within- and
between-level variance are estimated using sample data, there
may be bias due to sampling variation, particularly when
there are fewer observations within a given level. The ICC
as described by Bartko [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] corrects for this bias by making
a small computational adjustment.<sup>3</sup> Observe that the ICC
term appears to give slightly less weight to the course effect.
It is clear from both statistics that the main effect is the
professor effect.
      </p>
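      <p>To make the variance partition concrete, the following Python sketch computes the share of total variance attributed to each random effect from a set of variance estimates. The function name and the numbers are made up for illustration and are not the estimates behind Table 3; with a single effect the result reduces to one minus the ratio of the residual variance to the total variance. The bias adjustment used for the ICC is not reproduced here.</p>
      <preformat><![CDATA[
```python
def variance_partition(components, residual):
    """Share of total variance attributed to each random effect.

    components: mapping of level name to the estimated variance of that effect
    residual:   estimated residual variance
    """
    total = residual + sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {level: var / total for level, var in components.items()}


# Made-up variance estimates, chosen only so the arithmetic is easy to follow.
shares = variance_partition({"professor": 0.5, "course": 0.25}, residual=1.25)
print(shares)  # {'professor': 0.25, 'course': 0.125}
```
]]></preformat>
      <p>In this toy example a quarter of the variation sits at the professor level and an eighth at the course level, with the remainder left as residual variation between individual ratings.</p>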
      <p>We examine the professor-level residuals and their
associated standard errors to look for variation in clarity ratings
across professors. The caterpillar plot displays the professor
residuals in rank order together with 95% confidence
intervals. Wider intervals occur for professors with fewer student
reviews. Observe that the majority of the intervals do not
overlap and thus there are significant differences between
professors. The blue circles on the far left represent
professors who are rated two standard deviations below the mean
clarity rating, whereas those on the far right are 1.5
standard deviations higher than the mean clarity rating. The
red horizontal line refers to the “average” professor.</p>
      <p><sup>3</sup> For a description of the computation of the ICC, see the
documentation and source code for the R library lme.</p>
      <p>We calculate a Chi-squared likelihood ratio statistic from
the difference between the log-likelihood values of two
successive models. We begin by comparing the null model and
the course-level model to assess the significance of
including the course effect. We continue by adding each of the
additional effects. We do not report the values of the test
statistic, although all additional levels of complexity are
statistically significant. We consider the Bayesian information
criterion (BIC) and Akaike information criterion (AIC) as
model selection tools to avoid over-fitting the data. The
BIC and AIC penalize the log-likelihood of a model for the
inclusion of extra parameters. The parameters are estimated
using restricted maximum likelihood estimation (REML).
We choose the model with the minimum BIC. A two-level
mixed model including the course effect and the professor effect
provides the optimal BIC value. Two-
and three-way interaction effects were considered, although
they did not decrease the AIC or BIC of any of the
models. While the log-likelihood value is maximized by including
the university effect, a simpler model is preferable because
it involves fewer parameter estimates and is more likely to
generalize. The model can be written in matrix form:
<disp-formula><tex-math>Y = X\beta + Zu + \epsilon \qquad (4)</tex-math></disp-formula>
Y denotes the response variable observations (student
ratings). <inline-formula><tex-math>\beta</tex-math></inline-formula> represents a vector of fixed-effects
parameters with a design matrix X. Z is a design
matrix of indicator variables denoting group membership across
random-effect levels and u is a vector containing
random-effect parameters. <inline-formula><tex-math>\epsilon</tex-math></inline-formula> is a vector of error terms.</p>
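      <p>The model comparison quantities mentioned above can be illustrated with a short Python sketch. It uses the textbook definitions (AIC = 2k − 2 log L, BIC = k ln n − 2 log L, and a likelihood ratio statistic equal to twice the difference in log-likelihood), which we assume match the ones used in the analysis; the function names and the log-likelihood values are invented for the example. The p-value helper covers only the case of one extra parameter.</p>
      <preformat><![CDATA[
```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_lik


def likelihood_ratio_test_1df(log_lik_small, log_lik_large):
    """LR statistic and chi-squared p-value for ONE extra parameter."""
    stat = 2.0 * (log_lik_large - log_lik_small)
    if stat < 0:
        raise ValueError("the larger model should not have a lower log-likelihood")
    # With 1 degree of freedom, the chi-squared survival function
    # at x equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Made-up log-likelihoods for two nested models.
stat, p_value = likelihood_ratio_test_1df(-4000.0, -3990.0)
print(stat)  # 20.0
```
]]></preformat>
      <p>Lower AIC or BIC values indicate the preferred model; BIC penalizes extra parameters more heavily than AIC once the sample has more than about seven observations, which is one reason the two criteria can favor different models.</p>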
    </sec>
    <sec id="sec-7">
      <title>3.3 Case Studies</title>
      <p>We show the results of using the ACID methodology to
answer three course design questions.</p>
      <p>3.3.1 For introductory classes, which programming
language do students associate with clear
instruction?</p>
      <p>Professors teaching introductory level courses in computer
science choose between a number of programming languages
and textbooks. We make use of the data collected to provide
insights into which programming languages beginning
students associate with clear instruction. We filter the data to
only include introductory level courses (those that do not
require any prerequisite coursework in computer science).
Our restricted sample includes 1,024 reviews; 34.58% of all
reviews with syllabus data are of introductory courses. We
explore the relationship between clarity ratings and
programming language with random professor and course
effects. Programming languages with fewer than 30 student
reviews are not reported<sup>4</sup>. Table 4 gives the estimates for
student ratings of clarity by programming language and their
associated p-values. An intercept is not modeled in order
to make the results easily interpretable. The mean clarity
rating for introductory courses is 3.599.</p>
      <p>We found that C and C++ had the lowest coefficients (i.e.
compiled languages had the lowest perceived clarity ratings).
Scheme and Scratch have the highest clarity ratings, followed
by Python and Java. We note that the standard errors are
largest for Scheme and Scratch and smallest for Java and
Python. This suggests that the results for Java and Python
are more reliable. Students in our sample associate clearer
instruction with interpreted languages rather than compiled
languages. Also, both Python and Java are associated with
clearer instruction than C or C++.</p>
      <p>3.3.2 What mix of course activities – exams, quizzes,
homework and projects – do students associate
with clear instruction?</p>
      <p>To assess students' course ratings of clarity based on the
percentage of the grade due to exams, quizzes, homework
and projects, we created a factor made up of four clusters
representing four ways of weighting homework, projects,
exams, quizzes and miscellaneous (such as extra credit) for
the students' grade. We begin by filtering the data to only
include observations in which the grading criteria
(percentage of the grade determined by homework, projects, exams,
quizzes and miscellaneous) are available and sum to 100. Of
the 2,935 observations with syllabus data, there are 2,225
observations with full grading criteria. The difference in these
numbers represents 710 ratings for which the respondents
were not able to find a complete grade breakdown from the
online syllabus.</p>
      <p><sup>4</sup> SQL is a special purpose programming language used only
for relational databases and is not reported.</p>
      <p>[Figures 3 and 4. Title: Optimizing the Number of Clusters. x-axis: Number of Clusters; y-axes: Information Criterion (series: Bayesian Information Criterion, Akaike Information Criterion) and Log-Likelihood.]</p>
      <p>We use k-means clustering to partition the 2,225
observations with complete grading criteria information based on
the five aforementioned variables. We optimize k, our
number of clusters, by examining how the BIC and AIC of the
mixture model change based on the number of clusters
selected. Figure 3 displays the information criteria and
Figure 4 displays the log-likelihood values for each number of
clusters. A solution involving two clusters
minimizes the BIC of the model, whereas a four-cluster solution
minimizes the AIC. The log-likelihood is optimized with the
four-cluster solution. We consider both the two- and four-cluster
models as optimal and we find that they lend themselves to
similar interpretation. The cluster means for the four-cluster
solution are presented in Table 5.</p>
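      <p>For readers unfamiliar with the clustering step, the Python sketch below filters grading rubrics to those that sum to 100 and then runs plain k-means (Lloyd's algorithm) on them. It is an illustration under assumptions, not the analysis code: the rubric tuples are invented, the starting centroids are chosen by hand rather than at random, and the selection of k by BIC and AIC is not reproduced.</p>
      <preformat><![CDATA[
```python
def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; `centroids` are caller-chosen starting points."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical rubrics: (homework, projects, exams, quizzes, miscellaneous) in percent.
rubrics = [
    (10, 0, 90, 0, 0),
    (20, 0, 80, 0, 0),
    (0, 50, 50, 0, 0),
    (0, 45, 55, 0, 0),
    (30, 30, 30, 0, 0),  # sums to 90, so it is dropped as incomplete
]
complete = [r for r in rubrics if sum(r) == 100]
labels, centroids = kmeans(complete, centroids=[complete[0], complete[2]])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [[15.0, 0.0, 85.0, 0.0, 0.0], [0.0, 47.5, 52.5, 0.0, 0.0]]
```
]]></preformat>
      <p>The two resulting centroids play the same role as the cluster means in a table of cluster profiles: one exam-heavy rubric and one that splits the grade between exams and projects.</p>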
      <p>The first cluster represents courses that are heavily weighted
towards exams with a smaller weight towards homework.
The second cluster represents a more even weighting of
exams, homework, projects and quizzes. The third cluster
represents an equal weighting towards exams and projects. The
fourth cluster represents courses that are heavily weighted
towards exams and homework. The cluster membership is
treated as a predictor variable and modeled using Equation
4. Table 6 displays the estimated clarity ratings within each
group for the four-cluster solution.</p>
      <p>The exams and projects cluster has the highest estimate of
clarity. We find that weighting projects equally with exams
is associated with a clearer course experience. The equal
mix cluster is also associated with higher clarity estimates.
The exam heavy cluster and the exam and homework heavy
cluster are associated with lower student clarity ratings. We
find that a rubric that weights exams and projects evenly has
higher perceived clarity ratings than a rubric that is weighted
heavily towards exams and homework. This result holds
for both the two- and four-cluster solutions.</p>
      <p>3.3.3 Does the posting of a syllabus online translate
into higher ratings?</p>
      <p>We hypothesize that the posting of the syllabus online is a proxy
for the organization, and perhaps the motivation or drive, of the
professor. We make use of all of the data collected to compare
student reviews of professors who have a publicly available
syllabus with those of professors who do not. Many professors may choose
to only post a syllabus through course management systems
that require a password. Potential students of these courses
are unable to access the syllabus to determine whether the
course would be a good fit. We treat the posting of an online
syllabus as a factor and test for differences in clarity ratings
between the two groups using our model.</p>
      <p>
        We find statistically significant differences in clarity,
helpfulness and interest ratings between the two groups and report the clarity
estimates in Table 7. We note that the
difference in easiness ratings is not statistically significant.
We find evidence that students are more interested in
professors and courses for which the syllabus is made publicly
available. We note that the parameter estimates for the two
groups are within one standard error of one another, which
suggests that the conclusions are modest.
      </p>
      <p>
        Research has recently focused on online faculty ratings with
mixed conclusions. Felton et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] found that online
instructor ratings were associated with perceived easiness, and that
a “halo effect” existed in which raters gave high scores to
instructors perhaps because their courses were easier. We find
that student ratings of clarity and easiness are correlated
(correlation of 0.45), although not as strongly as clarity and
helpfulness, which are highly correlated
(correlation of 0.84). We chose to
focus on clarity ratings as we assumed these were less
susceptible to a “halo effect” and other bias relative to the overall
ratings of a course or professor. Otto et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] found issues
related to bias in online ratings, stating that online ratings
are characterized by selection bias because anyone can enter
faculty ratings at any time. Carini et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Hardy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], McGhee
and Lowell [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] reported contrasting results, finding that an
online format did not lead to more biased ratings. Otto et
al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] hypothesized that instructor clarity and helpfulness
as captured by Rate My Professor are more positively
associated with student learning than easiness.
      </p>
      <p>
        Several approaches have been proposed to synthesize
responses from crowd-sourcing systems such as Amazon's
Mechanical Turk. Majority voting is perhaps the simplest
way to combine crowd responses, using equal weights
irrespective of respondent experience. The results of our
preliminary analysis assessing the accuracy of non-Masters
level respondents correspond to the steep drop in
respondent accuracy noted by Karger [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] when low-quality
respondents are present. Whitehill et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] proposed a
probabilistic model for combining crowd responses called
Generative model of Labels, Abilities and Difficulties (GLAD). The
GLAD methodology makes use of the EM algorithm to
calculate parameter estimates of unobserved variables,
including an approximation of the expertise of the rater. Khattak
and Salleb-Aouissi compared the accuracy and percentage
of bad responses using majority voting, probabilistic
models, and their novel approach entitled Expert Label Injected
Crowd Estimation (ELICE) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. ELICE makes use of a few
“ground truth” responses and incorporates expertise of the
labeler, difficulty of the instance and an aggregation of
labels. Khattak and Salleb-Aouissi found that their approach
was robust and outperformed GLAD and iterative methods
even when bad labelers were present. Our simple approach
was to use Masters level respondents from Mechanical Turk,
although GLAD and ELICE are alternative methods that could
reduce the number of expert level respondents required while
still obtaining high quality data.
      </p>
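      <p>As a point of reference for the aggregation schemes discussed above, equal-weight voting can be sketched in Python as follows. The function name and the example answers are ours, and the sketch returns the plurality answer (the most frequent one), declining to answer on a tie; it does not implement GLAD or ELICE.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer, or None when the top answers tie."""
    if not answers:
        return None
    ranked = Counter(answers).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # no strict plurality; the caller decides how to break ties
    return ranked[0][0]


# Hypothetical responses from three workers about one syllabus.
print(majority_vote(["Java", "Java", "Python"]))  # Java
```
]]></preformat>
      <p>Because every respondent carries the same weight, a few careless workers can flip the outcome, which is the weakness that expertise-aware models aim to address.</p>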
    </sec>
    <sec id="sec-8">
      <title>5. CONCLUSIONS, LIMITATIONS AND FUTURE WORK</title>
      <p>We demonstrate how the Analyzing CurrIculum Decisions
(ACID) methodology can be used to leverage collective
intelligence and learn student preferences. In introductory
computer science courses, we find that students who are
taught interpreted languages find their classes clearer. We
also find that students who are given an even weighting of
exams and projects find their classes clearer, and that
interest in a course corresponds to the availability of an
online syllabus. Our study does not necessarily suggest that
teachers should change their programming language.
Further research is needed before drawing causal inferences. We
argue that ACID is a beneficial tool to discover patterns in
student behavior. Syllabus data and course ratings data are
becoming increasingly available on the Web. These data are
used by millions of students and are worthy of further research.</p>
      <p>This study can be expanded in several ways. Student
evaluations often include free-form text where students can
describe their experience in the course. Sentiment analysis is
a probabilistic approach for categorizing student comments
as being either positive or negative. One extension is to
regress text sentiment on course features. There is arguably
a strong association between comment sentiment and
student preference. ACID could also be applied to
disciplines other than computer science, or used to discover
patterns in syllabi across disciplines that can provide insight
into learner experiences.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Carini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hayek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kuh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ouimet</surname>
          </string-name>
          .
          <article-title>College student responses to web and paper surveys: does mode matter?</article-title>
          <source>Research in Higher Education</source>
          ,
          <volume>44</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>19</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Corbett</surname>
          </string-name>
          .
          <article-title>Cognitive computer tutors: Solving the two-sigma problem</article-title>
          . In M. Bauer,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gmytrasiewicz</surname>
          </string-name>
          , and J. Vassileva, editors,
          <source>User Modeling 2001</source>
          , volume
          <volume>2109</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>137</fpage>
          –
          <lpage>147</lpage>
          . Springer Berlin Heidelberg,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Emanuel</surname>
          </string-name>
          .
          <article-title>Online education: MOOCs taken by educated few</article-title>
          .
          <source>Nature</source>
          ,
          <volume>503</volume>
          (
          <issue>7476</issue>
          ):
          <fpage>342</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Felton</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
          <article-title>Web based student evaluations of professors: the relations between perceived quality, easiness and sexiness</article-title>
          .
          <source>Assessment and Evaluation in Higher Education</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>91</fpage>
          –
          <lpage>108</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hardy</surname>
          </string-name>
          .
          <article-title>Online ratings: fact and fiction</article-title>
          .
          <source>New Directions for Teaching and Learning</source>
          , (
          <volume>96</volume>
          ):
          <fpage>31</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hardy</surname>
          </string-name>
          .
          <article-title>Psychometric properties of student ratings of instruction in online and on-campus courses</article-title>
          .
          <source>New Directions for Teaching and Learning</source>
          ,
          <year>2003</year>
          (
          <volume>96</volume>
          ):
          <fpage>39</fpage>
          –
          <lpage>48</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Haungs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clements</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Janzen</surname>
          </string-name>
          .
          <article-title>Improving first-year success and retention through internet-based CS0 courses</article-title>
          .
          <source>ACM SIGCSE</source>
          , pages
          <fpage>549</fpage>
          –
          <lpage>594</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jaggars</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Bailey</surname>
          </string-name>
          .
          <article-title>Effectiveness of fully online courses for college students: Response to a department of education meta-analysis</article-title>
          . Teachers College: Community College Research Center,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oh</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <article-title>Budget-optimal task allocation for reliable crowdsourcing systems</article-title>
          . CoRR, arXiv:1110.3564,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Khattak</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Salleb-Aouissi</surname>
          </string-name>
          .
          <article-title>Robust crowd labeling using little experience</article-title>
          .
          <source>Discovery Science</source>
          ,
          <volume>8140</volume>
          :
          <fpage>94</fpage>
          –
          <lpage>109</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalez-Brenes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>McKnight</surname>
          </string-name>
          .
          <article-title>Towards data-driven curriculum design: Mining the web to make better teaching decisions</article-title>
          .
          <source>EDM</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Sanford</surname>
            <suffix>Jr</suffix>
          </string-name>
          , and
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Ross</surname>
          </string-name>
          .
          <article-title>Does ratemyprofessor.com really rate my professor?</article-title>
          <source>Assessment &amp; Evaluation in Higher Education</source>
          ,
          <volume>33</volume>
          (
          <issue>4</issue>
          ):
          <fpage>355</fpage>
          –
          <lpage>368</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Sanford</surname>
            <suffix>Jr</suffix>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Analysis of online student ratings of university faculty</article-title>
          .
          <source>Journal of College Teaching &amp; Learning</source>
          ,
          <volume>2</volume>
          (
          <issue>7</issue>
          ):
          <fpage>25</fpage>
          –
          <lpage>30</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] Shanghai. Academic ranking of world universities. Retrieved from http://www.shanghairanking.com/. Accessed 2013-12-01.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.</given-names>
            <surname>Whitehill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ruvolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergsma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Movellan</surname>
          </string-name>
          .
          <article-title>Whose vote should count more: Optimal integration of labels from labelers of unknown expertise</article-title>
          .
          <source>Neural Information Processing Systems</source>
          , pages
          <fpage>2035</fpage>
          –
          <lpage>2043</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zelle</surname>
          </string-name>
          .
          <article-title>Python as a first language</article-title>
          . Retrieved from http://mcsp.wartburg.edu/zelle/python/pythonrst.html/. Accessed 2014-02-23.
        </mixed-citation>
      </ref>
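    </ref-list>
    <app-group>
      <app>
        <title>APPENDIX A. SAMPLE OF UNIVERSITIES SELECTED</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>University</th><th>Country</th><th>n Professors</th><th>n Courses</th><th>n Reviews</th></tr>
            </thead>
            <tbody>
              <tr><td>Colorado State</td><td>USA</td><td>1</td><td>9</td><td>32</td></tr>
              <tr><td>Carnegie Mellon University</td><td>USA</td><td>3</td><td>21</td><td>102</td></tr>
              <tr><td>North Carolina State</td><td>USA</td><td>2</td><td>10</td><td>63</td></tr>
              <tr><td>Pennsylvania State</td><td>USA</td><td>12</td><td>74</td><td>938</td></tr>
              <tr><td>Rensselaer Polytechnic Institute</td><td>USA</td><td>3</td><td>22</td><td>131</td></tr>
              <tr><td>Rutgers</td><td>USA</td><td>8</td><td>30</td><td>468</td></tr>
              <tr><td>Simon Fraser</td><td>Canada</td><td>27</td><td>98</td><td>1873</td></tr>
              <tr><td>SUNY Stony Brook</td><td>USA</td><td>8</td><td>55</td><td>505</td></tr>
              <tr><td>UC Davis</td><td>USA</td><td>10</td><td>44</td><td>589</td></tr>
              <tr><td>UNC Chapel Hill</td><td>USA</td><td>1</td><td>4</td><td>49</td></tr>
              <tr><td>University of Alberta</td><td>Canada</td><td>2</td><td>6</td><td>69</td></tr>
              <tr><td>University of Arizona</td><td>USA</td><td>3</td><td>13</td><td>158</td></tr>
              <tr><td>University of Delaware</td><td>USA</td><td>15</td><td>56</td><td>806</td></tr>
              <tr><td>University of Florida Gainesville</td><td>USA</td><td>5</td><td>36</td><td>321</td></tr>
              <tr><td>University of Illinois at Urbana</td><td>USA</td><td>5</td><td>14</td><td>339</td></tr>
              <tr><td>University of Massachusetts</td><td>USA</td><td>6</td><td>39</td><td>405</td></tr>
              <tr><td>University of Montreal</td><td>USA</td><td>1</td><td>6</td><td>59</td></tr>
              <tr><td>University of Toronto</td><td>Canada</td><td>14</td><td>66</td><td>775</td></tr>
              <tr><td>University of Utah</td><td>USA</td><td>2</td><td>17</td><td>66</td></tr>
              <tr><td>University of Virginia</td><td>USA</td><td>3</td><td>19</td><td>131</td></tr>
              <tr><td>University of Waterloo</td><td>Canada</td><td>46</td><td>125</td><td>2700</td></tr>
              <tr><td>Vanderbilt University</td><td>USA</td><td>2</td><td>10</td><td>76</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </app>
    </app-group>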
  </back>
</article>