=Paper= {{Paper |id=Vol-1446/GEDM_2015_Submission_2 |storemode=property |title=Communities of Performance & Communities of Preference |pdfUrl=https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_2.pdf |volume=Vol-1446 |dblpUrl=https://dblp.org/rec/conf/edm/BrownLWEABBBM15 }} ==Communities of Performance & Communities of Preference== https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_2.pdf

Communities of Performance
& Communities of Preference

Rebecca Brown, North Carolina State University, Raleigh, NC, rabrown7@ncsu.edu
Collin Lynch, North Carolina State University, Raleigh, NC, cflynch@ncsu.edu
Yuan Wang, Teachers College, Columbia University, New York, NY, elle.wang@columbia.edu
Michael Eagle, North Carolina State University, Raleigh, NC, mjeagle@ncsu.edu
Jennifer Albert, North Carolina State University, Raleigh, NC, jennifer_albert@ncsu.edu
Tiffany Barnes, North Carolina State University, Raleigh, NC, tmbarnes@ncsu.edu
Ryan Baker, Teachers College, Columbia University, New York, NY, ryanshaunbaker@gmail.com
Yoav Bergner, Educational Testing Service, Princeton, NJ, ybergner@gmail.com
Danielle McNamara, Arizona State University, Phoenix, AZ, dsmcnamara1@gmail.com

ABSTRACT
The current generation of Massive Open Online Courses (MOOCs) operates under the assumption that good students will help poor students, thus alleviating the burden on instructors and Teaching Assistants (TAs) of having thousands of students to teach. In practice, this may not be the case. In this paper, we examine social network graphs drawn from forum interactions in a MOOC to identify natural student communities and characterize them based on student performance and stated preferences. We examine the community structure of the entire course, students only, and students minus low performers and hubs. The presence of these communities and the fact that they are homogeneous with respect to grade but not motivations has important implications for planning in MOOCs.

Keywords
MOOC, social network, online forum, community detection

1. INTRODUCTION
The current generation of Massive Open Online Courses (MOOCs) is designed to leverage student interactions to augment instructor guidance. The activity in courses on sites such as Coursera and edX is centered around user forums that, while curated and updated by instructors and TAs, are primarily constructed by students. When planning and building these courses, it is hoped that students will help one another through the course and that interacting with stronger students will help to improve the performance of weaker ones. It has not yet been shown, however, that this type of support occurs in practice.

Prior research on social networks has shown that social groups, even those that gather face-to-face, can fragment into disjoint sub-communities [37]. This small-group separation, if it takes place in an online course, can be considered negative or positive, depending on one’s perspective. If poor students communicate only with similarly-floundering peers, then they run the risk of perpetuating misunderstandings and of missing insights discussed by better-performing peers and teaching staff. An instructor may wish to avoid this fragmentation to encourage poor students to connect with better ones.

These enduring subgroups may be beneficial, however, by helping students to form enduring supportive relationships. Research by Li et al. has shown that such enduring relationships can enhance students’ social commitment to a course [18]. We believe that this social commitment will in turn help to reduce feelings of isolation and alienation among students in a course. Eckles and Stradley [9] have shown that such isolation is a key predictor of student dropout.

We have previously shown that students can form stable communities and that those communities are homogeneous with respect to performance [3]. However, that work did not: show whether these results are consistent with prior work on immediate peer relationships; address the impact of hub students on these results; or discuss whether students’ varying goals and preferences motivate the community structure. Our goal in this paper is to build upon our prior work by addressing these issues. In the remainder of this paper we will survey prior educational literature on community formation in traditional and online classrooms. We will then build upon our prior work by examining the impact of hub users. Finally, we will look at the impact of user motivations on community formation.
2. RELATED WORK

2.1 MOOCs, Forums, & Student Performance
A survey of the literature on MOOCs shows the beginnings of a research base generating an abundance of data that has not yet been completely analyzed [19]. According to Seaton et al. [29], most of the time students spend on a MOOC is spent in discussion forums, making them a rich and important data source. Stahl et al. [30] illustrate how through this online interaction students collaborate to create knowledge. Thus students’ forum activity is good not only for the individual student posting content or receiving answers, but for the class as a whole. Huang et al. [14] investigated the behavior of the highest-volume posters in 44 MOOC-related forums. These “superposters” tended to enroll in more courses and do better in those courses than the average. Their activity also added to the overall volume of forum content and they left fewer questions unanswered in the forums. Huang et al. also found that these superposters did not suppress the activity of less-active users. Rienties et al. [25] examined the way in which user interaction in MOOCs is structured. They found that allowing students to self-select collaborators is more conducive to learning than randomly assigning partners. Further, Van Dijk et al. [31] found that simple peer instruction is significantly less effective in the absence of a group discussion step, pointing again to the importance of a class discussion forum.

More recently Rosé et al. [27] examined students’ evolving interactions in MOOCs using a Mixed-Membership Stochastic Block model which seeks to detect partially overlapping communities. They found that the likelihood that students would drop out of the course is strongly correlated with their community membership. Students who actively participated in forums early in the course were less likely to drop out later. Furthermore, they found one forum sub-community that was much more prone to dropout than the rest of the class, suggesting that MOOC communities are made up of students who behave in similar ways. This community can in turn reflect or impact a student’s level of motivation and their overall experience in a course much like the “emotional contagion” model used in the Facebook mood manipulation study by Kramer, Guillory, and Hancock [16].

Yang et al. [36] also noted that, unlike traditional courses, students can join MOOCs at different times, and observed that students who join a course early are more likely to be active and connected in the forums, and less likely to drop out, than those who join later. MOOCs also attract users with a range of individual motivations. In a standard classroom setting students are constrained by availability, convention, and goals. Few students enroll in a traditional course without seeking to complete it and to get formal credit for doing so. MOOCs by virtue of their openness and flexibility attract a wide range of students with unique personal motivations [10]. Some join the course with the intent of completing it. Others may seek only to brush up on existing knowledge, obtain specific skills, or just watch the videos. These distinct motivations in turn lend themselves to different in-class behaviors including assignment viewing and forum access. The impact of user motivations in online courses has been previously discussed by Wang et al. [32, 33]; we will build upon that work here. Thus it is an open question whether these motivations affect students’ community behaviors or not.

2.2 Communities, Hubs, & Peers
Kovanovic et al. [15] examined the relationship between social network position or centrality, and social capital formation in courses. Their work is specifically informed by the Community of Inquiry (COI) framework. The COI framework is focused on distance education and is particularly suited to online courses of the type that we study here. The model views course behavior through three presences which mediate performance: cognitive, teaching, and social.

This social presence considers the nature and persistence of student interactions and the extent to which they reinforce students’ behaviors. In their analysis, the authors sought to test whether network relationships, specifically students’ centrality in their social graph, are related to their social performance as measured by the nature and type of their interactions. To that end, they examined a set of course logs taken from a series of online courses offered within a public university. They found that students’ position within their social graph was positively correlated with the nature and type of their interactions, thus indicating that central players also engaged in more useful social interactions. They did not extend this work to groups, however, focusing solely on individual hub students.

Other authors have also examined the relationship between network centrality, neighbor relationships, network density, and student performance factors. Eckles and Stradley [9] applied network analysis to student attrition, finding that students with strong social relationships with other students who drop out are significantly more likely to drop out themselves. Rizzuto et al. [26] studied the impact of social network density on student performance. Network density is defined as the fraction of possible edges that are present in a given graph. Thus it is a measure of how “clique-like” the graph is. The authors examined self-reported social networks for students in a large traditional undergraduate psychology course. They found that denser social networks were significantly correlated with performance. However, a dominance analysis [1] showed that this factor was less predictive than pure academic ability. These results serve to motivate a focus on the role of social relationships in student behavior. Their analysis is complicated, however, by their reliance on self-report data which will skew the strength and recency of the reported relationships.

Fire et al. [11] studied student interaction in traditional classrooms, constructing a social network based on cooperation on class assignments. Students were linked based on partnership on group work as well as inferred cooperation based on assignment submission times and IP addresses. The authors found that a student’s grade was significantly correlated with the grade of the student with the strongest links to that student in the social network. We perform a similar analysis in this paper to examine whether the same correlation exists in MOOCs.

Online student interaction in blended courses has also been linked to course performance. Dawson [8] extracted student and instructor social networks from a blended course’s online discussion forums and found that students in the 90th grade percentile had larger social networks than those in the 10th percentile. The study also found that high-performing students
primarily associated with other high-performing students and
were more likely to be connected to the course instructor, while
low-performing students tended to associate with other low-
performers. In a blended course, this effect may be offset by the same material as a graduate-level course, Core Methods
face-to-face interaction not captured in the online social network, in Educational Data Mining, at Teachers College Columbia
but if the same separation happens in MOOC communities, low- University. The MOOC spanned from October 24, 2013 to
performing students are less likely to have other chances to learn December 26, 2013. The weekly course was composed of lecture
from high-performing ones. videos and 8 weekly assignments. Most of the videos contained
in-video quizzes (that did not count toward the final grade).
2.3 Community Detection
One of the primary activities students engage in on forums All of the weekly assignments were structured as numeric input
is question answering. Zhang et al. [38] conducted a social or multiple-choice questions. The assignments were graded au-
network analysis on an online question-and-answer forum about tomatically. In each assignment, students were asked to conduct
Java programming. Using vertex in-degree and out-degree, they analyses on a data set provided to them and answer questions
were able to identify a relatively small number of active users about it. In order to receive a grade, students had to com-
who answered many questions. This allowed the researchers to plete this assignment within two weeks of its release with up
develop various algorithms for calculating a user’s Java expertise. to three attempts for each assignment, and the best score out
Dedicated question-and-answer forums are more structured than of the three attempts was counted. The course had a total
MOOC forums, with question and answer posts identified, but a enrollment of over 48,000, but a much smaller number actively
similar approach might help identify which students in a MOOC participated. 13,314 students watched at least one video, 1,242
ask or answer the most questions. students watched all the videos, 1,380 students completed at
least one assignment, and 778 made a post or comment in the
Choo et al. [5] studied community detection in Amazon product- weekly discussion sections. Of those with posts, 426 completed
review forums. Based on which users replied to each other most at least one class assignment. 638 students completed the online
often, they found communities of book and movie reviewers who course and received a certificate (meaning that some students
had similar tastes in these products. As in MOOC forums, users could earn a certificate without participating in forums at all).
did not declare any explicit social relationships represented in the
system, but they could still be grouped by implicit connections. In addition to the weekly assignments the students were sent
a survey that was designed to assess their personal motivations
In the context of complex networks, a community structure is a for enrolling in the course. This survey consisted of 3 sets
subgraph which is more densely connected internally than it is to of questions: MOOC-specific motivational items; two PALS
the rest of the network. We chose to apply the Girvan-Newman (Patterns of Adaptive Learning Survey) sub-scales [21], Aca-
edge-betweenness algorithm (GN) [13]. This algorithm takes as demic Efficacy and Mastery-Goal Orientation; and an item
input a weighted graph and a target number of communities. focused on confidence in course completion. It was distributed
It then ranks the edges in the graph by their edge-betweenness to students through the course’s E-mail messaging system to
value and removes the highest ranking edge. To calculate Edge- students who enrolled in the course prior to the official start
betweenness we identify the shortest path p(a,b) between each date. Data on whether participants successfully completed the
pair of nodes a and b in the graph. The edge-betweenness course was downloaded from the same course system after the
of an arc is defined as the number of shortest paths that it course concluded. The survey received 2,792 responses; 38% of
participates in. This is one of the centrality measures explored the participants were female and 62% of the participants were
by Kovanovic et al. above [15]. The algorithm then recalcu- male. All of the respondents were over 18 years of age.
lates the edge-betweenness values and iterates until the desired
number of disjoint community subgraphs has been produced. The MOOC-specific items consisted of 10 questions drawn from
Thus the algorithm operates by iteratively finding and removing previous MOOC research studies (cf. [2, 22]) asking respondents
the highest-value communications channel between communities to rate their reasons for enrollment. These 10 items address
until the graph is fully segmented. For this analysis, we used traits of MOOCs as a novel online learning platform. Specifically,
the igraph library [7] implementation of GN within R [24]. and features of MOOCs as a new platform. Two PALS Survey
and features of MOOCs as a new platform. Two PALS Survey
The strength of a candidate community can be estimated by scales [21] measuring mastery-goal orientation and academic
modularity. The modularity score of a given subgraph is defined efficacy were used to study standard motivational constructs.
as a ratio of its intra-connectedness (edges within the subgraph) PALS scales have been widely used to investigate the relation
to the inter-connectedness with the rest of the graph minus the between a learning environment and a student’s motivation (cf.
fraction of such edges expected if they were distributed at ran- [6, 20, 28]). Altogether ten items with five under each scale
dom [13, 35]. A graph with a high modularity score represents were included. The participants were asked to select a number
a dense sub-community within the graph. from 1 to 5 with 1 meaning least relevant and 5 most relevant.
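
The Girvan-Newman procedure and the modularity score described in Section 2.3 can be sketched in a few lines. This is a minimal illustration only, not the igraph implementation used in the study. It assumes an unweighted, undirected graph (edge weights are ignored), it uses the common Newman formulation of modularity (for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it), and all function names are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Count, for every undirected edge, the shortest paths that use it."""
    score = defaultdict(float)
    for source in adj:
        # Breadth-first search records distances, shortest-path counts
        # (sigma), and predecessors on shortest paths from `source`.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit among edges.
        credit = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    # Every pair (a, b) was visited from both ends, so halve the totals.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Return the connected components of the graph as a list of sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(edges, target):
    """Remove highest-betweenness edges until `target` components exist."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    parts = components(adj)
    while len(parts) < target and any(adj.values()):
        scores = edge_betweenness(adj)  # recalculated after every removal
        a, b = max(scores, key=scores.get)
        adj[a].discard(b)
        adj[b].discard(a)
        parts = components(adj)
    return parts


def modularity(edges, parts):
    """Newman modularity of a partition, measured on the original edges."""
    m = len(edges)
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    q = 0.0
    for part in parts:
        inside = sum(1 for a, b in edges if a in part and b in part)
        ends = sum(degree[v] for v in part)
        q += inside / m - (ends / (2.0 * m)) ** 2
    return q


# Two triangles joined by one bridge: the bridge carries the most paths.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
toy_parts = girvan_newman(toy_edges, 2)
```

On the toy graph the bridge (3, 4) lies on all nine shortest paths between the two triangles, so it is removed first and the two triangles become the communities.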
Respondents were also asked to self-rate their confidence on a
3. DATA SET scale of 1 to 10 as to whether they could complete the course
according to the pace set by the course instructor. All three
This study used data collected from the “Big Data in Education”
groups of items were domain-general.
MOOC hosted on the Coursera platform as one of the inaugural
courses offered by Columbia University [32]. It was created in
response to the increasing interest in the learning sciences and 4. METHODS
educational technology communities in using EDM methods For our analysis, we extracted a social network from the online
with fine-grained log data. The overall goal of this course was forum associated with the course. We assigned a node to each
to enable students to apply each method to answer education student, instructor, or TA in the course who added to it. Nodes
research questions and to drive intervention and improvement in representing students were labeled with their final course grade
educational software and systems. The course covered roughly out of 100 points. The Coursera forums operate as standard
threaded forums. Course participants could start a new thread with an initial post, add a post to an existing thread, and add a comment or child element below an existing post. We added a directed edge from the author of each post or comment to the parent post and to all posts or comments that preceded it on the thread based upon their timestamp. We made a conscious decision to omit the textual content of the replies with the goal of isolating the impact of the structure alone.

We thus treat each reply or followup in the graph as an implicit social connection and thus a possible relationship. Such implicit social relationships have been explored in the context of recommender systems to detect strong communities of researchers [5]. This is, by design, a permissive definition that is based upon the assumption that individuals generally add to a thread after viewing the prior content within it and that individual threads can be treated as group conversations with each reply being a conscious statement for everyone who has already spoken. The resulting network forms a multigraph with each edge representing a single implicit social interaction. We removed self loops from this graph as they indicate general forum activity but not any meaningful interaction with another person. We also removed vertices with a degree of 0, and collapsed the parallel edges to form a simple weighted graph for analysis.

In the analyses below we will focus on isolating student performance and assessing the impact of the faculty and hub students. We will therefore consider four classes of graphs: ALL, the complete graph; Student, the graph with the instructor and TAs removed; NoHub, the graph with the instructor and hub users removed; and Survey, which includes only students who completed the motivation survey. We will also consider versions of the above graphs without students who obtained a score of 0, and without the isolated individuals who connect with at most one other person. As we will discuss below, a number of students received a zero grade in the course. Because this is an at-will course, however, we cannot readily determine why these scores were obtained. They may reflect a lack of engagement with the course, differential motivations for taking the course, a desire to see the course materials without assignments, or genuinely poor performance.

4.1 Best-Friend Regression & Assortativity
Fire et al. [11] applied a similar social network approach to traditional classrooms and found a correlation between the grade of a student’s most highly connected neighbor (“best friend”) and the student’s grade. The links in that graph included cooperation on assignments as well as partnership on group assignments. To examine whether the same correlation existed in a massive online course in which students were less likely to know each other beforehand and there were no group assignments, we calculated each student’s best friend in the same manner and performed a similar correlation.

The simple best friends analysis gives a straightforward mechanism for correlating individual students. However, it is also worthwhile to ask about students who are one step removed from their peers. Therefore we will also calculate the grade assortativity (r_G) of the graphs. Assortativity describes the correlation of values between vertices and their neighbors [23]. The assortativity metric r ranges between -1 and 1, and is essentially the Pearson correlation between vertices and their neighbors [23]. A network with r = 1 would have each vertex only sharing edges with vertices of the same score. Likewise, if r = −1 vertices in the network would only share edges with vertices of different scores. Thus grade assortativity allows us to measure whether individuals are not just connected directly to individuals with similar scores but whether they correlate with individuals who are one step removed.

Several commonly studied classes of networks tend to have patterns in their assortativity. Social networks tend to have high assortativity, while biological and technological networks tend to have negative values (disassortativity) [23]. In a homogeneous course or one where students only form stratified communities we would expect the assortativity to be very high, while in a heterogeneous class with no distinct communities we would expect it to be quite low.

4.2 Community Detection
The process of community detection we employed is briefly described here [3]. As noted there we elected to ignore the edge direction when making our graph. Our goal in doing so was to focus on communities of learners who shared the same threads, even when they were not directly replying to one another. We believe this to be a reasonable assumption given the role of class forums as a knowledge-building environment in which students exchange information with the group. Individuals who participate in a thread generally review prior posts before submitting their contribution and are likely to return to view the followups. Homogeneity in this context would mean that students gathered and communicated primarily with equally-performing peers, and thus that they did not consistently draw from better-performing classmates and help lower-performing ones, or that the at-will communities served to homogenize performance, with the students in a given cluster evening out over time.

While algorithms such as GN are useful for finding clusters they do not, in and of themselves, determine the right number of communities. Rather, when given a target number they will seek to identify the best possible set of communities. In some implementations the algorithm can be applied to iteratively select the maximum modularity value over a possible range. Determining the correct number of communities to detect, however, is a non-trivial task, especially in large and densely connected graphs where changes to smaller communities will have comparatively small effects on the global modularity score. As a consequence we cannot simply optimize for the best modularity score as we would risk missing small but important communities [12].

Therefore, rather than select the clusterings based solely on the highest modularity, we have opted to estimate the correct number of clusters visually. To that end we plotted a series of modularity curves over the set of graphs. For each graph G we applied the GN algorithm iteratively to produce all clusters in the range (2, |G_N|). For each clustering, we then calculated the global modularity score. We examined the resulting scores to identify a crest where the modularity gain leveled off or began to decrease, thus indicating that future subdivisions added no meaningful information or created schisms in existing high-quality communities. This is a necessarily heuristic process that is similar to the use of Scree plots in Exploratory Factor Analysis [4]. We define the number identified as the natural cluster number.

5. RESULTS AND DISCUSSION
Before removing self-loops and collapsing the edges, the network contained 754 nodes and 49,896 edges. The final social network
contained 754 nodes and 17,004 edges. 751 of the participants
were students, with 1 instructor and 2 TAs. One individual was
incorrectly labeled as a student when they were acting as the
Chief Community TA. Since this person’s posts clearly indicated
that he or she was acting in a TA capacity with regard to the
forums, we relabeled him/her as a TA. Of the 751 students 304
obtained a zero grade in the course leaving 447 nonzero students.
215 of the 751 students responded to the motivation survey.
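
The reduction from a multigraph to a simple weighted graph reported above follows from the construction in Section 4. A minimal sketch of that construction is given here under simplifying assumptions: each thread is a hypothetical list of (author, timestamp) pairs, every post is linked to the authors of all earlier posts in its thread (which covers the parent post), and edge direction is kept. The function name and input format are illustrative and are not taken from the Coursera data export.

```python
from collections import Counter


def build_reply_graph(threads):
    """Return (multigraph edge count, weighted simple graph) for forum threads.

    `threads` is a list of threads; each thread is a list of
    (author, timestamp) posts. This input format is hypothetical.
    """
    multi_edges = []
    for posts in threads:
        ordered = sorted(posts, key=lambda post: post[1])
        for i, (author, _) in enumerate(ordered):
            # One directed edge to the author of every earlier post.
            for earlier_author, _ in ordered[:i]:
                multi_edges.append((author, earlier_author))
    # Drop self-loops, then collapse parallel edges into integer weights.
    weights = Counter((a, b) for a, b in multi_edges if a != b)
    return len(multi_edges), dict(weights)


example_threads = [
    [("a", 1), ("b", 2), ("a", 3)],  # "a" replies in a thread "a" started
    [("b", 1), ("a", 2)],
]
edge_count, weighted = build_reply_graph(example_threads)
```

In the example, four multigraph edges (one of them a self-loop from "a" to "a") collapse to two weighted edges. Nodes that end up with no edges simply never appear in the result, which corresponds to removing vertices of degree 0.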

There were a total of 55,179 registered users, so the set of 754 forum participants is a small fraction of the entire course audience. However, forum users are not necessarily those who will make an effort or succeed in the course. Forum users did not all participate in the course, and some students who participated in the course did not use the forums: 1,381 students in the course got a grade greater than 0, and 934 of those did not post or comment on the forums, while 304 of the 751 students who did participate in the forums received a grade of 0. Clearly students who go to the trouble of posting forum content are in some respect making an effort in the course beyond those who don’t, but this does not necessarily correspond to course success.

Figure 1: Modularity for each number of clusters, including students with zeros.

5.1 Best-Friend Regression & Assortativity
We followed Fire et al.’s methodology for identifying Best Friends
in a weighted graph and calculated a simple linear regression
over the pairs. This correlation did not include the instructor or
TAs in the analysis. We calculated the correlation between the
students’ grades and their best friends’ grades in the set using
Spearman’s Rank Correlation Coefficient (ρ) [34]. The two vari-
ables were strongly correlated, ρ(748)=0.44, p<0.001. However,
the correlation was also affected by the dense clusters of students
with 0 grades. After removing the 0 grade students we found
an additional moderate correlation, ρ(444)=0.29, p<0.001.
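
The best-friend correlation above can be illustrated with a short sketch. It assumes an undirected weighted graph stored as a hypothetical dictionary from student pairs to interaction counts, takes the highest-weight neighbor as the best friend, and breaks ties by name, which is an arbitrary choice for this sketch and not a detail taken from Fire et al. Spearman's coefficient is computed as the Pearson correlation of average ranks.

```python
from math import sqrt


def best_friends(weights):
    """Map each node to its highest-weight neighbour in an undirected graph."""
    neighbours = {}
    for (a, b), w in weights.items():
        neighbours.setdefault(a, {})[b] = w
        neighbours.setdefault(b, {})[a] = w
    # Ties are broken by name; this is an assumption of the sketch.
    return {node: max(sorted(nbrs), key=nbrs.get)
            for node, nbrs in neighbours.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; undefined if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def best_friend_correlation(weights, grades):
    """Correlate each student's grade with the grade of their best friend."""
    friends = best_friends(weights)
    students = sorted(friends)
    own = [grades[s] for s in students]
    friend = [grades[friends[s]] for s in students]
    return spearman(own, friend)


# Hypothetical toy data: two strong pairs, (a, b) and (c, d).
toy_weights = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 2, ("c", "d"): 4}
toy_grades = {"a": 90, "b": 80, "c": 10, "d": 0}
rho = best_friend_correlation(toy_weights, toy_grades)
```

Excluding zero-grade students, as in the second correlation reported above, amounts to filtering the graph and the grade table before calling `best_friend_correlation`.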

Thus the significant correlation between best-friend grade and grade holds over the transition from the traditional classroom to a MOOC. This suggests that students in a MOOC, excluding the many who drop out or do not submit assignments, behave similarly to those in a traditional classroom in this respect. These results are also consistent with our calculations for assortativity. There we found a small assortative trend for the grades as shown in Table 1. These values reflect that a student was frequently communicating with students who in turn communicated with students at a similar performance level. This in turn supports our belief that homogeneous communities may be found. As Table 1 also illustrates, the zero-score students contribute substantially to the assortativity correlation as well, with the correlation dropping by as much as a third when they were removed.

Figure 2: Modularity for each number of clusters, excluding students with zeros.

5.2 Community Structure
The modularity curves for the graphs both with and without zero-score students are shown in Figures 1 and 2. We examined these plots to select the natural cluster numbers which are shown in Table 2. As the values illustrate, the instructor, TAs, and hub students have a disproportionate impact on the graph structure. The largest hub student in our graph connects to 444 out of 447 students in the network. The graph with all users had lower modularity and required more clusters than the graphs with only students or only non-hubs (see Table 2), with

Table 1: The grade assortativity for each network.
Users     Zeros  V    E      r_G
All       Yes    754  17004  0.29
All       No     447  5678   0.20
Students  Yes    751  15989  0.32
Students  No     447  5678   0.20
Non-Hub   Yes    716  9441   0.37
Non-Hub   No     422  3119   0.24

Table 2: Graph sizes and natural number of clusters for each graph.
Users     Zeros  V    E      Clusters
All       Yes    754  17004  212
All       No     447  5678   173
Students  Yes    751  15989  184
Students  No     447  5678   169
Non-Hub   Yes    716  9441   79
Non-Hub   No     422  3119   52
Survey    Yes    215  1679   58
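
The grade assortativity values in Table 1 are, in essence, Pearson correlations between the grades found at the two ends of each edge. The sketch below shows that calculation for an unweighted, undirected edge list. It is an illustration under those assumptions (edge weights are ignored and the function name is hypothetical), not the routine used to produce the table.

```python
def grade_assortativity(edges, grades):
    """Pearson correlation of grades at the two ends of each undirected edge."""
    xs, ys = [], []
    for a, b in edges:
        # Count each undirected edge in both directions so r is symmetric.
        xs += [grades[a], grades[b]]
        ys += [grades[b], grades[a]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var  # undefined when every grade in the graph is identical


# Hypothetical grades: two strong students and two zero-grade students.
toy_grades = {"a": 100, "b": 100, "c": 0, "d": 0}
r_same = grade_assortativity([("a", "b"), ("c", "d")], toy_grades)
r_mixed = grade_assortativity([("a", "c"), ("b", "d")], toy_grades)
```

When like connects only to like, `r_same` is 1; when every edge joins a strong student to a zero-grade student, `r_mixed` is -1. This matches the interpretation of r given in Section 4.1.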
Figure 3: View of the student communities with edges of frequency <2 removed. The Student network with (left)
and without (right) hub-students, with each vertex representing a student and grade represented as color.

the non-hub graph having the highest modularity. This suggests that non-hub students formed more isolated communities, while teaching staff and hubs communicated across these communities and connected them.

This is largely consistent with the intent of the forums and the active role played by the instructor and TAs in monitoring and replying to all relevant posts in the forums. It is particularly interesting how closely the curves for the ALL and Student graphs mirror one another. This may indicate that the hub students are also those that followed the instructor and TAs closely, thus giving them isomorphic relationships, or it may indicate that they are more connected than even the instructors and thus came to bind the forums together on their own. This impact is further illustrated by the cluster plots shown in Figure 3. Here the absence of the hub students results in a noticeable thinning of the graph, which in turn highlights the frequency of communication that can be attributed to this comparatively small group.

Table 3: Grade statistics by community, selected to show examples of more and less homogeneous communities.
Members  Average Grade  Standard Deviation
118      21.62          36.58
41       22.00          32.45
34       25.41          40.44
31       56.13          47.69
20       49.05          45.64
16       12.44          31.13
14       88.43          22.47
12       96.08          6.36
11       96.45          7.38
4        3.00           6.00
4        8.50           9.81
4        4.25           8.50
4        96.25          3.50
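
Per-community summaries of the kind listed in Table 3 can be computed as sketched below. The membership and grade dictionaries are hypothetical stand-ins for the clustering output and the course gradebook. The sketch uses the sample (n - 1) standard deviation; the text does not state which form was used for the table, so that choice is an assumption.

```python
from statistics import mean, stdev


def community_grade_stats(membership, grades):
    """Return (community, members, mean grade, sample SD), largest first."""
    groups = {}
    for student, community in membership.items():
        groups.setdefault(community, []).append(grades[student])
    rows = []
    for community, values in groups.items():
        # A single-member community has no spread to estimate.
        spread = stdev(values) if len(values) > 1 else 0.0
        rows.append((community, len(values), mean(values), spread))
    return sorted(rows, key=lambda row: -row[1])


# Hypothetical data: one mostly-zero community and one high-scoring pair.
toy_membership = {"s1": "A", "s2": "A", "s3": "A", "s4": "A",
                  "s5": "B", "s6": "B"}
toy_grades = {"s1": 0, "s2": 0, "s3": 0, "s4": 12, "s5": 100, "s6": 90}
rows = community_grade_stats(toy_membership, toy_grades)
```

For the hypothetical community "A" (grades 0, 0, 0, 12) this gives a mean of 3 and a standard deviation of 6, showing how a single non-zero grade dominates the spread of a small, otherwise zero-grade community.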

The difference between the full plots and those without zero-grade students
is also notable as the zero grade students were clearly a major
standard deviation for a small selection of the communities in
factor in community formation. A direct examination of the
the ALL reply network including zero-grades, hub students,
user graph showed that many of the zero students were only
and teaching staff. Several of the communities, particularly
connected to other zero students or were not connected at all.
the larger ones, do show a blend of good and poor students,
This is also highlighted in Figure 3. In both graphs the bulk of
with a high standard deviation. However many if not most of
the zero score students are clustered in a tight network of com-
the communities are more homogeneous with good and poor
munities on the left-hand side. That super-community consists
students sharing a community with similarly-performing peers.
primarily of zero score students communicating with other zero-
These clusters have markedly lower standard deviation.
score students, a structure we have nick-named the ‘deathball.’
An examination of the grade distribution for each of the clusters
5.3 Student Performance & Motivation showed that the scores within each cluster were non-normal.
As the color coding in Figure 3 illustrates, the students did Therefore we opted to apply the Kruskal-Wallis (KW) test to
cluster by performance. Table 3 shows the average grade and assess the correlation between cluster membership and perfor-
We also found that community membership was not a significant
Table 4: Kruskal-Wallis test of student grade by predictor of whether students would complete the motivation
community, for each graph. survey or of students’ motivations. We were surprised by the
Users Zeros Chi-Squared df p-value fact that even when we focused solely on individuals who had
All Yes 349.0273 211 < 0.005 completed the survey, the students did not connect by stated
All No 216.1534 172 < 0.02 goals. This suggests to us that the students are more likely
Students Yes 202.0814 78 < 0.005 coalescing around the pragmatic needs of the class or conceptual
Students No 80.93076 51 < 0.005 challenges rather than on the winding paths that brought them
Non-Hub Yes 309.8525 183 < 0.005 there. One limitation of this work is that by relying on the
Non-Hub No 218.9603 168 < 0.01 forum data we were focused solely on the comparatively small
Survey Yes 99.99840 577 < 0.005 proportion of enrolled students (6%) who actively participated
in the forums. This group is, by definition a smaller set of more
actively-involved participants.
mance. The KW test is a nonparametric rank-based analogue
In addition to addressing our primary questions this study also
to the common Analysis of Variance [17]. Here we tested grade
raised a number of open issues for further exploration. Firstly,
by community number with the community being treated as a
this work focused solely on the final course structure, grades, and
categorical variable. The results of this comparison are shown
motivations. We have not yet addressed whether these commu-
in Table 4. As that illustrates, cluster membership was a sig-
nities are stable over time or how they might change as students
nificant predictor of student performance for all of the graphs
drop in our out. Secondly, while we ruled out motivations as a
with the non-zero graphs having markedly lower p-values than
basis for the community this work we were not able to identify
those with zero students included. These results are consistent
what mechanisms do support the communities. And finally this
with our hypothesis that students would form clusters of equal-
study raises the question of generality and whether or not these
performers and we find that those results hold even when the
results can be applied to MOOCs offered on different topics or
highly-connected instructors, TAs and hub students are included.
whether the results apply to traditional and blended courses.
We performed a similar KW analysis for the questions on the
In subsequent studies we plan to examine both the evolution of
motivation survey and for a binary variable indicating whether
the networks over time as well as additional demographic data
or not the student completed the survey at all. For this analysis
with the goal of assessing both the stability of these networks
we evaluated the clusters on all of the graphs. We found no
and the role of other potential latent factors. We will also
significant relationship between the community structure on
examine other potential clustering mechanisms that control for
any of the graphs and the survey question results or the survey
other user features such as frequency of involvement and thread
completion variable. Thus while the clusters may be driven by
structure. We also plan to examine other similar datasets to
separate factors they are not reflected in the survey content.
determine if these features transition across classes and class
types. We believe that these results may change somewhat once
6. CONCLUSIONS AND FUTURE WORK students can coordinate face to face far more easily than online.
Our goal in this paper was to expand upon our prior community
detection work with the goal of aligning that work with prior
research on peer impacts, notably the work of Fire et al. [11]. 7. ACKNOWLEDGMENTS
We also sought to examine the impact of hub students and This work was supported by NSF grant #1418269: “Modeling
student motivations on our prior results. Social Interaction & Performance in STEM Learning” Yoav
Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes
To that end we performed a novel community clustering analysis Co-PIs.
of student performance data and forum communications taken
from a single well-structured MOOC. As part of this analysis we 8. REFERENCES
described a novel heuristic method for selecting natural numbers [1] R. Azen and D. Budescu. The dominance
of clusters, and replicated the results of prior studies of both analysis approach for comparing predictors in multiple
immediate neighbors and second-order assortativity. regression. Psychological Methods, 8(2):129–48, 2003.
[2] Y. Belanger and J. Thornton.
Consistent with prior work, we found that students’ grades Bioelectricity: A quantitative approach Duke University’s
were significantly correlated with their most closely associated first MOOC. Journal of Learning Analytics, 2013.
peers in the new networks. We also found that this correlation [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes,
extended out to their second-order neighborhood. This is consis- R. Baker, Y. Bergner, and D. McNamara. Good
tent with our prior work showing that students form stable user communities and bad communities: Does membership
communities that are homogeneous by performance. We found affect performance? In C. Romero and M. Pechenizkiy,
that those results were stable even if instructors, hub players, editors, Proceedings of the 8th International
students with 0 scores, and students who did not fill out the sur- Conference on Educational Data Mining, 2015. submitted.
vey were removed from consideration. This suggests that either
[4] R. B. Cattell. The scree test for the number of factors.
the students are forming communities that are homogeneous or
Multivariate Behavioral Research, 1(2):245–276, 1966.
that the effect of those individual and network features on the
communities and on performance is minimal. [5] E. Choo, T. Yu, M. Chi, and
Y. Sun. Revealing and incorporating implicit communities
to improve recommender systems. In M. Babaioff,
V. Conitzer, and D. Easley, editors, ACM Conference
on Economics and Computation, EC ’14, Stanford,
CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014.
[6] K. Clayton, F. Blumberg,
and D. P. Auld. The relationship between motivation
learning strategies and choice of environment whether
traditional or including an online component. British
Journal of Educational Technology, 41(3):349–364, 2010.
[7] G. Csardi and T. Nepusz.
The igraph software package for complex network
research. InterJournal, Complex Systems:1695, 2006.
[8] S. Dawson. ’Seeing’ the learning
community: An exploration of the development of a
resource for monitoring online student networking. British
Journal of Educational Technology, 41(5):736–752, 2010.
[9] J. Eckles and E. Stradley. A
social network analysis of student retention using archival
data. Social Psychology of Education, 15(2):165–180, 2012.
[10] A. Fini. The technological
dimension of a massive open online course: The
case of the CCK08 course tools. The International Review
Of Research In Open And Distance Learning, 10(5), 2009.
[11] M. Fire,
G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting
student exam’s scores by analyzing social network data. In
Active Media Technology, pages 584–595. Springer, 2012.
[12] S. Fortunato and M. Barthélemy.
Resolution limit in community detection. Proc.
of the National Academy of Sciences, 104(1):36–41, 2007.
[13] M. Girvan and M. E. J. Newman. Community structure
in social and biological networks. Proc. of the National
Academy of Sciences, 99(12):7821–7826, June 2002.
[14] J. Huang, A. Dasgupta, A. Ghosh,
J. Manning, and M. Sanders. Superposter behavior
in MOOC forums. In Proc. of the first ACM conference
on Learning@ scale conference, pages 117–126. ACM, 2014.
[15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala.
What is the source of social capital? the association
between social network position and social presence in
communities of inquiry. In S. G. Santos and O. C. Santos,
editors, Proc. of the Workshops held at Educational
Data Mining 2014, co-located with 7th International
Conference on Educational Data Mining (EDM
2014), London, United Kingdom, July 4-7, 2014., volume
1183 of CEUR Workshop Proc. CEUR-WS.org, 2014.
[16] A. D. I. Kramer, J. E. Guillory,
and J. T. Hancock. Experimental evidence of massive-scale
emotional contagion through social networks. Proc. of the
National Academy of Sciences, 111(24):8788–8790, 2014.
[17] W. H. Kruskal and W. A. Wallis. Use
of ranks in one-criterion variance analysis. Journal of the
American statistical Association, 47(260):583–621, 1952.
[18] N. Li, H. Verma, A. Skevi,
G. Zufferey, J. Blom, and P. Dillenbourg. Watching
MOOCs together: investigating co-located MOOC
study groups. Distance Education, 35(2):217–233, 2014.
[19] T. R. Liyanagunawardena, A. A. Adams, and S. A.
Williams. MOOCs: A systematic study of the published
literature 2008-2012. The International Review of Research
in Open and Distributed Learning, 14(3):202–227, 2013.
[20] J. L. Meece,
E. M. Anderman, and L. H. Anderman. Classroom goal
structure, student motivation, and academic achievement.
Annual Review of Psychology, 57:487–503, 2006.
[21] C. Midgley, M. L.
Maehr, L. Hruda, E. Anderman, L. Anderman, and K. E.
Freeman. Manual for the Patterns of Adaptive Learning
Scales (PALS). University of Michigan, Ann Arbor, 2000.
[22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh
2013 - report #1. Journal of Learning Analytics, 2013.
[23] M. E. Newman. Assortative Mixing in Networks.
Physical Review Letters, 89(20):208701, Oct. 2002.
[24] R Core Team. R: A Language
and Environment for Statistical Computing. R Foundation
for Statistical Computing, Vienna, Austria, 2012.
[25] B. Rienties, P. Alcott, and D. Jindal-Snape.
To let students self-select or not: That is the
question for teachers of culturally diverse groups. Journal
of Studies in International Education, 18(1):64–83, 2014.
[26] T. Rizzuto, J. LeDoux, and J. Hatala. It’s not just what you
know, it’s who you know: Testing a model of the relative
importance of social networks to academic performance.
Social Psychology of Education, 12(2):175–189, 2009.
[27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick,
P. Goldman, and J. Sherer. Social factors that contribute to
attrition in MOOCs. In Proc. of the first ACM conference
on Learning@ scale conference, pages 197–198. ACM, 2014.
[28] A. M. Ryan and H. Patrick. The classroom
social environment and changes in adolescents’ motivation
and engagement during middle school. American
Educational Research Journal, 38(2):437–460, 2001.
[29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and
D. Pritchard. Who does what in a massive open online
course? Communications of the ACM, 57(4):58–65, 2014.
[30] G. Stahl, T. Koschmann,
and D. Suthers. Computer-supported collaborative
learning: An historical perspective. Cambridge
handbook of the learning sciences, 2006:409–426, 2006.
[31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen.
Interactive lectures in engineering education. European
Journal of Engineering Education, 26(1):15–28, 2001.
[32] Y. Wang and R. Baker. Content or platform:
Why do students complete MOOCs? MERLOT Journal
of Online Learning and Teaching, 11(1):191–218, 2015.
[33] Y. Wang, L. Paquette, and R. Baker.
A longitudinal study on learner career advancement
in MOOCs. Journal of Learning Analytics, 1(3), 2014.
[34] Wikipedia. Spearman’s
rank correlation coefficient - Wikipedia, the free
encyclopedia, 2013. [Online; accessed 27-February-2013].
[35] Wikipedia. Modularity (networks) - Wikipedia, the free
encyclopedia, 2014. [Online; accessed 5-February-2015].
[36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on,
tune in, drop out: Anticipating student dropouts in massive
open online courses. In Proc. of the 2013 NIPS Data-Driven
Education Workshop, volume 10, page 13, 2013.
[37] W. W. Zachary. An information
flow model for conflict and fission in small groups.
Journal of Anthropological Research, 33:452–473, 1977.
[38] J. Zhang, M. S. Ackerman, and L. Adamic.
Expertise networks in online communities: structure and
algorithms. In Proc. of the 16th international conference
on World Wide Web, pages 221–230. ACM, 2007.