=Paper= {{Paper |id=Vol-1446/GEDM_2015_Submission_2 |storemode=property |title=Communities of Performance & Communities of Preference |pdfUrl=https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_2.pdf |volume=Vol-1446 |dblpUrl=https://dblp.org/rec/conf/edm/BrownLWEABBBM15 }} ==Communities of Performance & Communities of Preference== https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_2.pdf
                                 Communities of Performance
                                 & Communities of Preference

                Rebecca Brown                             Collin Lynch                          Yuan Wang
               North Carolina State                   North Carolina State             Teachers College, Columbia
                    University                             University                          University
                   Raleigh, NC                            Raleigh, NC                        New York, NY
             rabrown7@ncsu.edu                       cflynch@ncsu.edu                 elle.wang@columbia.edu
                Michael Eagle                           Jennifer Albert                       Tiffany Barnes
         North Carolina State University              North Carolina State                  North Carolina State
                 Raleigh, NC                               University                            University
              mjeagle@ncsu.edu                            Raleigh, NC                           Raleigh, NC
                                                jennifer_albert@ncsu.edu                 tmbarnes@ncsu.edu
                 Ryan Baker                             Yoav Bergner                       Danielle McNamara
         Teachers College, Columbia                 Educational Testing Service             Arizona State University
                 University                               Princeton, NJ                           Phoenix, AZ
               New York, NY                          ybergner@gmail.com                 dsmcnamara1@gmail.com
       ryanshaunbaker@gmail.com

ABSTRACT                                                            the performance of weaker ones. It has not yet been shown,
The current generation of Massive Open Online Courses (MOOCs)       however, that this type of support occurs in practice.
operates under the assumption that good students will help poor
students, thus alleviating the burden on instructors and Teaching   Prior research on social networks has shown that social groups,
Assistants (TAs) of having thousands of students to teach. In       even those that gather face-to-face, can fragment into disjoint
practice, this may not be the case. In this paper, we examine so-   sub-communities [37]. This small-group separation, if it takes
cial network graphs drawn from forum interactions in a MOOC         place in an online course, can be considered negative or positive,
to identify natural student communities and characterize them       depending on one’s perspective. If poor students communi-
based on student performance and stated preferences. We exam-       cate only with similarly-floundering peers, then they run the
ine the community structure of the entire course, students only,    risk of perpetuating misunderstandings and of missing insights
and students minus low performers and hubs. The presence of         discussed by better-performing peers and teaching staff. An
these communities and the fact that they are homogeneous with       instructor may wish to avoid this fragmentation to encourage
respect to grade but not motivations has important implications     poor students to connect with better ones.
for planning in MOOCs.
                                                                    These enduring subgroups may be beneficial, however, by help-
Keywords                                                            ing students to form enduring supportive relationships. Research
MOOC, social network, online forum, community detection             by Li et al. has shown that such enduring relationships can
                                                                    enhance students’ social commitment to a course [18]. We be-
                                                                    lieve that this social commitment will in turn help to reduce
1.   INTRODUCTION                                                   feelings of isolation and alienation among students in a course.
The current generation of Massive Open Online Courses (MOOCs)       Eckles and Stradley [9] have shown that such isolation is a key
is designed to leverage student interactions to augment instruc-    predictor of student dropout.
tor guidance. The activity in courses on sites such as Coursera
and edX is centered around user forums that, while curated          We have previously shown that students can form stable com-
and updated by instructors and TAs, are primarily constructed       munities and that those communities are homogeneous with
by students. When planning and building these courses, it is        respect to performance [3]. However that work did not: show
hoped that students will help one another through the course        whether these results are consistent with prior work on imme-
and that interacting with stronger students will help to improve    diate peer relationships; address the impact of hub students on
                                                                    these results; or discuss whether students’ varying goals and
                                                                    preferences motivate the community structure. Our goal in this
                                                                    paper is to build upon our prior work by addressing these issues.
                                                                    In the remainder of this paper we will survey prior educational
                                                                    literature on community formation in traditional and online
                                                                    classrooms. We will then build upon our prior work by exam-
                                                                    ining the impact of hub users. And we will look at the impact
                                                                    of user motivations on community formation.
2.    RELATED WORK                                                     2.2    Communities, Hubs, & Peers
                                                                       Kovanovic et al. [15] examined the relationship between social
2.1    MOOCs, Forums, & Student Performance                            network position or centrality, and social capital formation in
A survey of the literature on MOOCs shows the beginnings of a          courses. Their work is specifically informed by the Community
research base generating an abundance of data that has not yet         of Inquiry (COI) framework. The COI framework is focused on
been completely analyzed [19]. According to Seaton et al. [29],        distance education and is particularly suited to online courses of
most of the time students spend on a MOOC is spent in dis-             the type that we study here. The model views course behavior
cussion forums, making them a rich and important data source.          through three presences which mediate performance: cognitive,
Stahl et al. [30] illustrate how through this online interaction       teaching, and social.
students collaborate to create knowledge. Thus students’ forum
activity is good not only for the individual student posting con-      This social presence considers the nature and persistence of
tent or receiving answers, but for the class as a whole. Huang et      student interactions and the extent to which they reinforce stu-
al. [14] investigated the behavior of the highest-volume posters       dents’ behaviors. In their analysis, the authors sought to test
in 44 MOOC-related forums. These “superposters” tended to              whether network relationships, specifically students’ centrality
enroll in more courses and do better in those courses than the         in their social graph, is related to their social performance as
average. Their activity also added to the overall volume of forum      measured by the nature and type of their interactions. To that
content and they left fewer questions unanswered in the forums.        end, they examined a set of course logs taken from a series of
Huang et al. also found that these superposters did not suppress       online courses offered within a public university. They found
the activity of less-active users. Rienties et al. [25] examined the   that students’ position within their social graph was positively
way in which user interaction in MOOCs is structured. They             correlated with the nature and type of their interactions, thus
found that allowing students to self-select collaborators is more      indicating that central players also engaged in more useful social
conducive to learning than randomly assigning partners. Further,       interactions. They did not extend this work to groups, however,
Van Dijk et al. [31] found that simple peer instruction is signif-     focusing solely on individual hub students.
icantly less effective in the absence of a group discussion step,
pointing again to the importance of a class discussion forum.          Other authors have also examined the relationship between
                                                                       network centrality, neighbor relationships, network density, and
More recently Rosé et al. [27] examined students’ evolving inter-     student performance factors. Eckles and Stradley [9] applied
actions in MOOCs using a Mixed-Membership Stochastic Block             network analysis to student attrition, finding that students with
model which seeks to detect partially overlapping communities.         strong social relationships with other students who drop out
They found that the likelihood that students would drop out            are significantly more likely to drop out themselves. Rizzuto
of the course is strongly correlated with their community mem-         et al. [26] studied the impact of social network density on stu-
bership. Students who actively participated in forums early in         dent performance. Network density is defined as the fraction
the course were less likely to drop out later. Furthermore, they       of possible edges that are present in a given graph. Thus it
found one forum sub-community that was much more prone                 is a measure of how “clique-like” the graph is. The authors
to dropout than the rest of the class, suggesting that MOOC            examined self-reported social networks for students in a large
communities are made up of students who behave in similar              traditional undergraduate psychology course. They found that
ways. This community can in turn reflect or impact a student’s         denser social networks were significantly correlated with per-
level of motivation and their overall experience in a course much      formance. However, a dominance analysis [1] showed that this
like the “emotional contagion” model used in the Facebook mood         factor was less predictive than pure academic ability. These re-
manipulation study by Kramer, Guillory, and Hancock [16].              sults serve to motivate a focus on the role of social relationships
                                                                       in student behavior. Their analysis is complicated, however, by
Yang et al. [36] also note that unlike traditional courses stu-        their reliance on self-report data which will skew the strength
dents can join MOOCs at different times and observed that              and recency of the reported relationships.
students who join a course early are more likely to be active
and connected in the forums, and less likely to drop out, than         Fire et al. [11] studied student interaction in traditional class-
those who join later. MOOCs also attract users with a range of         rooms, constructing a social network based on cooperation on
individual motivations. In a standard classroom setting students       class assignments. Students were linked based on partnership on
are constrained by availability, convention, and goals. Few stu-       group work as well as inferred cooperation based on assignment
dents enroll in a traditional course without seeking to complete       submission times and IP addresses. The authors found that a
it and to get formal credit for doing so. MOOCs by virtue of           student’s grade was significantly correlated with the grade of
their openness and flexibility attract a wide range of students        the student with the strongest links to that student in the social
with unique personal motivations [10]. Some join the course            network. We perform similar analysis in this paper to examine
with the intent of completing it. Others may seek only to brush        whether the same correlation exists in MOOCs.
up on existing knowledge, obtain specific skills, or just watch
the videos. These distinct motivations in turn lend themselves         Online student interaction in blended courses has also been
to different in-class behaviors including assignment viewing and       linked to course performance. Dawson [8] extracted student
forum access. The impact of user motivations in online courses         and instructor social networks from a blended course’s online
has been previously discussed by Wang et al. [32, 33]; we will         discussion forums and found that students in the 90th grade
build upon that work here. It remains an open question whether         percentile had larger social networks than those in the 10th
these motivations affect students’ community behaviors or not.         percentile. The study also found that high-performing students
                                                                       primarily associated with other high-performing students and
                                                                       were more likely to be connected to the course instructor, while
                                                                       low-performing students tended to associate with other low-
performers. In a blended course, this effect may be offset by          the same material as a graduate-level course, Core Methods
face-to-face interaction not captured in the online social network,    in Educational Data Mining, at Teachers College Columbia
but if the same separation happens in MOOC communities, low-           University. The MOOC spanned from October 24, 2013 to
performing students are less likely to have other chances to learn     December 26, 2013. The weekly course was composed of lecture
from high-performing ones.                                             videos and 8 weekly assignments. Most of the videos contained
                                                                       in-video quizzes (that did not count toward the final grade).
2.3    Community Detection
One of the primary activities students engage in on forums             All of the weekly assignments were structured as numeric input
is question answering. Zhang et al. [38] conducted a social            or multiple-choice questions. The assignments were graded au-
network analysis on an online question-and-answer forum about          tomatically. In each assignment, students were asked to conduct
Java programming. Using vertex in-degree and out-degree, they          analyses on a data set provided to them and answer questions
were able to identify a relatively small number of active users        about it. In order to receive a grade, students had to com-
who answered many questions. This allowed the researchers to           plete this assignment within two weeks of its release with up
develop various algorithms for calculating a user’s Java expertise.    to three attempts for each assignment, and the best score out
Dedicated question-and-answer forums are more structured than          of the three attempts was counted. The course had a total
MOOC forums, with question and answer posts identified, but a          enrollment of over 48,000, but a much smaller number actively
similar approach might help identify which students in a MOOC          participated. 13,314 students watched at least one video, 1,242
ask or answer the most questions.                                      students watched all the videos, 1,380 students completed at
                                                                       least one assignment, and 778 made a post or comment in the
Choo et al. [5] studied community detection in Amazon product-         weekly discussion sections. Of those with posts, 426 completed
review forums. Based on which users replied to each other most         at least one class assignment. 638 students completed the online
often, they found communities of book and movie reviewers who          course and received a certificate (meaning that some students
had similar tastes in these products. As in MOOC forums, users         could earn a certificate without participating in forums at all).
did not declare any explicit social relationships represented in the
system, but they could still be grouped by implicit connections.       In addition to the weekly assignments the students were sent
                                                                       a survey that was designed to assess their personal motivations
In the context of complex networks, a community structure is a         for enrolling in the course. This survey consisted of 3 sets
subgraph which is more densely connected internally than it is to      of questions: MOOC-specific motivational items; two PALS
the rest of the network. We chose to apply the Girvan-Newman           (Patterns of Adaptive Learning Survey) sub-scales [21], Aca-
edge-betweenness algorithm (GN) [13]. This algorithm takes as          demic Efficacy and Mastery-Goal Orientation; and an item
input a weighted graph and a target number of communities.             focused on confidence in course completion. It was distributed
It then ranks the edges in the graph by their edge-betweenness         through the course’s E-mail messaging system to
value and removes the highest-ranking edge. To calculate edge-         students who enrolled in the course prior to the official start
betweenness we identify the shortest path p(a,b) between each          date. Data on whether participants successfully completed the
pair of nodes a and b in the graph. The edge-betweenness               course was downloaded from the same course system after the
of an arc is defined as the number of shortest paths that it           course concluded. The survey received 2,792 responses; 38% of
participates in. This is one of the centrality measures explored       the participants were female and 62% of the participants were
by Kovanovic et al. above [15]. The algorithm then recalcu-            male. All of the respondents were over 18 years of age.
lates the edge-betweenness values and iterates until the desired
number of disjoint community subgraphs has been produced.              The MOOC-specific items consisted of 10 questions drawn from
Thus the algorithm operates by iteratively finding and removing        previous MOOC research studies (cf. [2, 22]) asking respondents
the highest-value communications channel between communities           to rate their reasons for enrollment. These 10 items address
until the graph is fully segmented. For this analysis, we used         traits of MOOCs as a novel online learning platform. Specifically,
the igraph library [7] implementation of GN within R [24].             these 10 items included questions on both the learning content
                                                                       and features of MOOCs as a new platform. Two PALS Survey
The strength of a candidate community can be estimated by              scales [21] measuring mastery-goal orientation and academic
modularity. The modularity score of a given subgraph is defined        efficacy were used to study standard motivational constructs.
as a ratio of its intra-connectedness (edges within the subgraph)      PALS scales have been widely used to investigate the relation
to the inter-connectedness with the rest of the graph minus the        between a learning environment and a student’s motivation (cf.
fraction of such edges expected if they were distributed at ran-       [6, 20, 28]). Altogether ten items with five under each scale
dom [13, 35]. A subgraph with a high modularity score represents       were included. The participants were asked to select a number
a dense sub-community within the graph.                                from 1 to 5 with 1 meaning least relevant and 5 most relevant.
                                                                       Respondents were also asked to self-rate their confidence on a
3.    DATA SET                                                         scale of 1 to 10 as to whether they could complete the course
                                                                       according to the pace set by the course instructor. All three
This study used data collected from the “Big Data in Education”
                                                                       groups of items were domain-general.
MOOC hosted on the Coursera platform as one of the inaugural
courses offered by Columbia University [32]. It was created in
response to the increasing interest in the learning sciences and       4.   METHODS
educational technology communities in using EDM methods                For our analysis, we extracted a social network from the online
with fine-grained log data. The overall goal of this course was        forum associated with the course. We assigned a node to each
to enable students to apply each method to answer education            student, instructor, or TA in the course who added to it. Nodes
research questions and to drive intervention and improvement in        representing students were labeled with their final course grade
educational software and systems. The course covered roughly           out of 100 points. The Coursera forums operate as standard
threaded forums. Course participants could start a new thread         the network would only share edges with vertices of different
with an initial post, add a post to an existing thread, and add       scores. Thus grade assortativity allows us to measure whether
a comment or child element below an existing post. We added           individuals are not just connected directly to individuals with
a directed edge from the author of each post or comment to the        similar scores but whether they correlate with individuals who
parent post and to all posts or comments that preceded it on          are one step removed.
the thread based upon their timestamp. We made a conscious
decision to omit the textual content of the replies with the goal     Several commonly studied classes of networks tend to have pat-
of isolating the impact of the structure alone.                       terns in their assortativity. Social networks tend to have high
                                                                      assortativity, while biological and technological networks tend
We thus treat each reply or followup in the graph as an implicit      to have negative values (disassortativity) [23]. In a homogeneous
social connection and thus a possible relationship. Such implicit     course or one where students only form stratified communities
social relationships have been explored in the context of recom-      we would expect the assortativity to be very high while in a het-
mender systems to detect strong communities of researchers [5].       erogeneous class with no distinct communities we would expect
This is, by design, a permissive definition that is based upon        it to be quite low.
the assumption that individuals generally add to a thread after
viewing the prior content within it and that individual threads       4.2    Community Detection
can be treated as group conversations with each reply being a         The process of community detection we employed is briefly de-
conscious statement for everyone who has already spoken. The          scribed here [3]. As noted there we elected to ignore the edge
resulting network forms a multigraph with each edge represent-        direction when making our graph. Our goal in doing so was to
ing a single implicit social interaction. We removed self loops       focus on communities of learners who shared the same threads,
from this graph as they indicate general forum activity but           even when they were not directly replying to one-another. We
not any meaningful interaction with another person. We also           believe this to be a reasonable assumption given the role of class
removed vertices with a degree of 0, and collapsed the parallel       forums as a knowledge-building environment in which students
edges to form a simple weighted graph for analysis.                   exchange information with the group. Individuals who partic-
                                                                      ipate in a thread generally review prior posts before submitting
In the analyses below we will focus on isolating student perfor-      their contribution and are likely to return to view the followups.
mance and assessing the impact of the faculty and hub students.       Homogeneity in this context would mean that students gathered
We will therefore consider four classes of graphs: ALL, the com-      and communicated primarily with equally-performing peers and
plete graph; Student, the graph with the instructor and TAs           thus that they did not consistently draw from better-performing
removed; NoHub, the graph with the instructor and hub users re-       classmates and help lower-performing ones or that the at-will
moved; and Survey, which includes only students who completed         communities served to homogenize performance, with the stu-
the motivation survey. We will also consider versions of the above    dents in a given cluster evening out over time.
graphs without students who obtained a score of 0, and without
the isolated individuals who connect with at most one other           While algorithms such as GN are useful for finding clusters they
person. As we will discuss below, a number of students received       do not, in and of themselves, determine the right number of
a zero grade in the course. Because this is an at-will course, how-   communities. Rather, when given a target number they will seek
ever, we cannot readily determine why these scores were obtained.     to identify the best possible set of communities. In some imple-
They may reflect a lack of engagement with the course, differen-      mentations the algorithm can be applied to iteratively select the
tial motivations for taking the course, a desire to see the course    maximum modularity value over a possible range. Determining
materials without assignments, or genuinely poor performance.         the correct number of communities to detect, however, is a
                                                                      non-trivial task especially in large and densely connected graphs
4.1    Best-Friend Regression & Assortativity                         where changes to smaller communities will have comparatively
Fire et al. [11] applied a similar social network approach to         small effects on the global modularity score. As a consequence
traditional classrooms and found a correlation between a stu-         we cannot simply optimize for the best modularity score as we
dent’s most highly connected neighbor (“best friend”) and the         would risk missing small but important communities [12].
student’s grade. The links in that graph included cooperation
on assignments as well as partnership on group assignments.           Therefore, rather than select the clusterings based solely on
To examine whether the same correlation existed in a massive          the highest modularity, we have opted to estimate the correct
online course in which students were less likely to know each         number of clusters visually. To that end we plotted a series of
other beforehand and there were no group assignments, we              modularity curves over the set of graphs. For each graph G we
calculated each student’s best friend in the same manner and          applied the GN algorithm iteratively to produce all clusters in
performed a similar correlation.                                      the range (2,|G_N|). For each clustering, we then calculated the
                                                                      global modularity score. We examined the resulting scores to
The simple best friends analysis gives a straightforward mech-        identify a crest where the modularity gain leveled off or began to
anism for correlating individual students. However it is also         decrease thus indicating that future subdivisions added no mean-
worthwhile to ask about students who are one-step removed             ingful information or created schisms in existing high-quality
from their peers. Therefore we will also calculate the grade          communities. This is a necessarily heuristic process that is sim-
assortativity (r_G) of the graphs. Assortativity describes the cor-   ilar to the use of Scree plots in Exploratory Factor Analysis [4].
relation of values between vertices and their neighbors [23]. The     We define the number identified as the natural cluster number.
assortativity metric r ranges between -1 and 1, and is essentially
the Pearson correlation between each vertex and its neighbors [23].   5.    RESULTS AND DISCUSSION
A network with r = 1 would have each vertex only sharing edges        Before removing self-loops and collapsing the edges, the network
with vertices of the same score. Likewise, if r = −1, vertices in     contained 754 nodes and 49,896 edges. The final social network
contained 754 nodes and 17,004 edges. 751 of the participants
were students, with 1 instructor and 2 TAs. One individual was
incorrectly labeled as a student when they were acting as the
Chief Community TA. Since this person’s posts clearly indicated
that he or she was acting in a TA capacity with regard to the
forums, we relabeled him/her as a TA. Of the 751 students 304
obtained a zero grade in the course leaving 447 nonzero students.
215 of the 751 students responded to the motivation survey.
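
As a rough illustration of how a multigraph edge count shrinks once parallel edges are collapsed, the following plain-Python sketch applies the construction rules from Section 4 to time-ordered threads: each post links its author to the author of every earlier post in the thread, self-loops are dropped, and repeated links are folded into a single weighted edge. The function name, the toy threads, and the choice to keep edge direction are assumptions made for this example; this is not the R/igraph code used in the study.

```python
from collections import Counter


def build_forum_graph(threads):
    """Collapse thread replies into weighted directed edges.

    `threads` is an iterable of author-id lists, each ordered by post
    timestamp (a hypothetical input format). Returns a Counter that maps
    (author, earlier_author) pairs to the number of implicit interactions.
    """
    weights = Counter()
    for authors in threads:
        for i, author in enumerate(authors):
            # Link this post's author to the author of every earlier post.
            for earlier in authors[:i]:
                if earlier != author:  # drop self-loops
                    weights[(author, earlier)] += 1
    return weights


# Toy data: two threads, authors listed in posting order.
threads = [["ann", "bob", "ann", "cy"], ["bob", "cy"]]
graph = build_forum_graph(threads)

multigraph_edges = sum(graph.values())  # edges before collapsing
simple_edges = len(graph)               # weighted edges after collapsing
nodes = {person for edge in graph for person in edge}
```

Because only edges are stored, participants with no remaining connections never appear in `nodes`, which mirrors the removal of degree-0 vertices.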

There were a total of 55,179 registered users, so the set of 754
forum participants is a small fraction of the entire course audi-
ence. However, forum users are not necessarily those who will
make an effort or succeed in the course. Forum users did not all
participate in the course, and some students who participated in
the course did not use the forums: 1,381 students in the course
got a grade greater than 0, and 934 of those did not post or
comment on the forums, while 304 of the 751 students who did           Figure 1: Modularity for each number of clusters,
participate in the forums received a grade of 0. Clearly students      including students with zeros.
who go to the trouble of posting forum content are in some
respect making an effort in the course beyond those who don’t,
but this does not necessarily correspond to course success.

5.1    Best-Friend Regression & Assortativity
We followed Fire et al.’s methodology for identifying Best Friends
in a weighted graph and calculated a simple linear regression
over the pairs. This correlation did not include the instructor or
TAs in the analysis. We calculated the correlation between the
students’ grades and their best friends’ grades in the set using
Spearman’s Rank Correlation Coefficient (ρ) [34]. The two variables
were strongly correlated, ρ(748)=0.44, p<0.001. However,
the correlation was also affected by the dense clusters of students
with 0 grades. After removing the 0 grade students we still found
a moderate correlation, ρ(444)=0.29, p<0.001.
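
The best-friend comparison above can be illustrated with a minimal sketch. It is not the authors' implementation: the edge list and grades are toy values, the function names are hypothetical, and a student's "best friend" is assumed here to be the neighbor joined by the heaviest edge, with ties keeping the first neighbor seen. Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
# Illustrative sketch only. ASSUMPTION: a vertex's "best friend" is the
# neighbor on its heaviest incident edge (ties keep the first one seen).

def best_friends(weighted_edges):
    """Map each vertex to the neighbor on its heaviest incident edge."""
    best = {}  # vertex -> (neighbor, weight)
    for u, v, w in weighted_edges:
        for a, b in ((u, v), (v, u)):
            if a not in best or w > best[a][1]:
                best[a] = (b, w)
    return {a: b for a, (b, _) in best.items()}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def best_friend_correlation(weighted_edges, grades):
    """Correlate each student's grade with the grade of their best friend.

    Vertices without a grade (e.g. teaching staff) are skipped; this
    filtering rule is an assumption of the sketch.
    """
    friends = best_friends(weighted_edges)
    students = sorted(s for s in friends if s in grades and friends[s] in grades)
    own = [grades[s] for s in students]
    other = [grades[friends[s]] for s in students]
    return spearman(own, other)


# Hypothetical toy data: (student, student, number of exchanges) and grades.
edges = [("a", "b", 5), ("a", "c", 1), ("b", "c", 2), ("c", "d", 4), ("d", "e", 3)]
grades = {"a": 90, "b": 85, "c": 40, "d": 35, "e": 0}

print(best_friends(edges))
print(round(best_friend_correlation(edges, grades), 3))
```

Removing the zero-grade students before calling best_friend_correlation would mirror the second comparison reported above; significance testing is left out of the sketch.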

Thus the significant correlation between best-friend grade and
grade holds over the transition from the traditional classroom to
a MOOC. This suggests that students in a MOOC, excluding the
many who drop out or do not submit assignments, behave similarly
to those in a traditional classroom in this respect. These
results are also consistent with our calculations for assortativity.
There we found a small assortative trend for the grades as shown
in Table 1. These values reflect that a student was frequently
communicating with students who in turn communicated with
students at a similar performance level. This in turn supports our
belief that homogeneous communities may be found. As Table
1 also illustrates, the zero-score students contribute substantially
to the assortativity correlation as well, with the correlation
dropping by as much as a third when they were removed.

Table 1: The grade assortativity for each network.
  Users       Zeros    V      E        rG
  All         Yes      754    17004    0.29
  All         No       447     5678    0.20
  Students    Yes      751    15989    0.32
  Students    No       447     5678    0.20
  Non-Hub     Yes      716     9441    0.37
  Non-Hub     No       422     3119    0.24

5.2    Community Structure
The modularity curves for the graphs both with and without
zero-score students are shown in Figures 1 and 2. We examined
these plots to select the natural cluster numbers, which are
shown in Table 2. As the values illustrate, the instructor, TAs,
and hub students have a disproportionate impact on the graph
structure. The largest hub student in our graph connects to
444 out of 447 students in the network. The graph with all
users had lower modularity and required more clusters than the
graphs with only students or only non-hubs (see Table 2), with
the non-hub graph having the highest modularity. This suggests
that non-hub students formed more isolated communities, while
teaching staff and hubs communicated across these communities
and connected them.

Figure 2: Modularity for each number of clusters,
excluding students with zeros.

Table 2: Graph sizes and natural number of clusters
for each graph.
  Users       Zeros    V      E        Clusters
  All         Yes      754    17004    212
  All         No       447     5678    173
  Students    Yes      751    15989    184
  Students    No       447     5678    169
  Non-Hub     Yes      716     9441     79
  Non-Hub     No       422     3119     52
  Survey      Yes      215     1679     58

This is largely consistent with the intent of the forums and the
active role played by the instructor and TAs in monitoring and
replying to all relevant posts in the forums. It is particularly
interesting how closely the curves for the ALL and Student graphs
mirror one another. This may indicate that the hub students are
also those that followed the instructor and TAs closely, thus giving
them isomorphic relationships, or it may indicate that they
are more connected than even the instructors and thus came to
bind the forums together on their own. This impact is further
illustrated by the cluster plots shown in Figure 3. Here the absence
of the hub students results in a noticeable thinning of the
graph, which in turn highlights the frequency of communication
that can be attributed to this comparatively small group.

Figure 3: View of the student communities with edges of frequency <2 removed. The Student network with (left)
and without (right) hub-students, with each vertex representing a student and grade represented as color.

Table 3: Grade statistics by community, selected
to show examples of more and less homogeneous
communities.
  Members    Average Grade    Standard Deviation
  118        21.62            36.58
   41        22.00            32.45
   34        25.41            40.44
   31        56.13            47.69
   20        49.05            45.64
   16        12.44            31.13
   14        88.43            22.47
   12        96.08             6.36
   11        96.45             7.38
    4         3.00             6.00
    4         8.50             9.81
    4         4.25             8.50
    4        96.25             3.50
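
Per-community summaries of the kind listed in Table 3 (member count, average grade, standard deviation) can be sketched as follows. This is a hedged illustration rather than the authors' code: the membership and grade dictionaries are hypothetical toy data, the function name is invented for the example, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def community_grade_summary(membership, grades):
    """Return (community, members, mean grade, std dev) rows, largest first.

    membership: dict mapping student -> community label
    grades:     dict mapping student -> final grade
    ASSUMPTION: sample standard deviation (n - 1); a single-member
    community is reported with a standard deviation of 0.0.
    """
    groups = defaultdict(list)
    for student, community in membership.items():
        groups[community].append(grades[student])

    rows = []
    for community, values in groups.items():
        sd = stdev(values) if len(values) > 1 else 0.0
        rows.append((community, len(values), mean(values), sd))

    # Larger communities first, as a reader would scan such a table.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical toy data: one low-scoring and one high-scoring community.
membership = {"s1": 0, "s2": 0, "s3": 0, "s4": 0, "s5": 1, "s6": 1}
grades = {"s1": 0, "s2": 0, "s3": 0, "s4": 12, "s5": 95, "s6": 97}

for community, size, avg, sd in community_grade_summary(membership, grades):
    print(f"{community:>3} {size:>4} {avg:>8.2f} {sd:>8.2f}")
```

A low standard deviation relative to the grade range marks a homogeneous community, while a high one marks a blend of strong and weak performers.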

The difference between the full plots and those with zero-grade
students removed is also notable, as the zero grade students were
clearly a major factor in community formation. A direct examination
of the user graph showed that many of the zero students were only
connected to other zero students or were not connected at all.
This is also highlighted in Figure 3. In both graphs the bulk of
the zero score students are clustered in a tight network of communities
on the left-hand side. That super-community consists
primarily of zero score students communicating with other zero-score
students, a structure we have nicknamed the ‘deathball.’

5.3    Student Performance & Motivation
As the color coding in Figure 3 illustrates, the students did
cluster by performance. Table 3 shows the average grade and
standard deviation for a small selection of the communities in
the ALL reply network including zero-grades, hub students,
and teaching staff. Several of the communities, particularly
the larger ones, do show a blend of good and poor students,
with a high standard deviation. However, many if not most of
the communities are more homogeneous, with good and poor
students sharing a community with similarly-performing peers.
These clusters have markedly lower standard deviation.

An examination of the grade distribution for each of the clusters
showed that the scores within each cluster were non-normal.
Therefore we opted to apply the Kruskal-Wallis (KW) test to
assess the correlation between cluster membership and performance.
The KW test is a nonparametric rank-based analogue
to the common Analysis of Variance [17]. Here we tested grade
by community number with the community being treated as a
categorical variable. The results of this comparison are shown
in Table 4. As that illustrates, cluster membership was a significant
predictor of student performance for all of the graphs
with the non-zero graphs having markedly lower p-values than
those with zero students included. These results are consistent
with our hypothesis that students would form clusters of equal-performers
and we find that those results hold even when the
highly-connected instructors, TAs and hub students are included.

Table 4: Kruskal-Wallis test of student grade by
community, for each graph.
  Users       Zeros    Chi-Squared    df     p-value
  All         Yes      349.0273       211    < 0.005
  All         No       216.1534       172    < 0.02
  Students    Yes      202.0814        78    < 0.005
  Students    No       80.93076        51    < 0.005
  Non-Hub     Yes      309.8525       183    < 0.005
  Non-Hub     No       218.9603       168    < 0.01
  Survey      Yes      99.99840       577    < 0.005

We performed a similar KW analysis for the questions on the
motivation survey and for a binary variable indicating whether
or not the student completed the survey at all. For this analysis
we evaluated the clusters on all of the graphs. We found no
significant relationship between the community structure on
any of the graphs and the survey question results or the survey
completion variable. Thus, while the clusters may be driven by
separate factors, those factors are not reflected in the survey content.

6.   CONCLUSIONS AND FUTURE WORK
Our goal in this paper was to expand upon our prior community
detection work with the goal of aligning that work with prior
research on peer impacts, notably the work of Fire et al. [11].
We also sought to examine the impact of hub students and
student motivations on our prior results.

To that end we performed a novel community clustering analysis
of student performance data and forum communications taken
from a single well-structured MOOC. As part of this analysis we
described a novel heuristic method for selecting natural numbers
of clusters, and replicated the results of prior studies of both
immediate neighbors and second-order assortativity.

Consistent with prior work, we found that students’ grades
were significantly correlated with their most closely associated
peers in the new networks. We also found that this correlation
extended out to their second-order neighborhood. This is consistent
with our prior work showing that students form stable user
communities that are homogeneous by performance. We found
that those results were stable even if instructors, hub players,
students with 0 scores, and students who did not fill out the survey
were removed from consideration. This suggests that either
the students are forming communities that are homogeneous or
that the effect of those individual and network features on the
communities and on performance is minimal.

We also found that community membership was not a significant
predictor of whether students would complete the motivation
survey or of students’ motivations. We were surprised by the
fact that even when we focused solely on individuals who had
completed the survey, the students did not connect by stated
goals. This suggests to us that the students are more likely
coalescing around the pragmatic needs of the class or conceptual
challenges rather than on the winding paths that brought them
there. One limitation of this work is that by relying on the
forum data we were focused solely on the comparatively small
proportion of enrolled students (6%) who actively participated
in the forums. This group is, by definition, a smaller set of more
actively-involved participants.

In addition to addressing our primary questions this study also
raised a number of open issues for further exploration. Firstly,
this work focused solely on the final course structure, grades, and
motivations. We have not yet addressed whether these communities
are stable over time or how they might change as students
drop in or out. Secondly, while we ruled out motivations as a
basis for the communities in this work, we were not able to identify
what mechanisms do support the communities. And finally this
study raises the question of generality and whether or not these
results can be applied to MOOCs offered on different topics or
whether the results apply to traditional and blended courses.

In subsequent studies we plan to examine both the evolution of
the networks over time as well as additional demographic data
with the goal of assessing both the stability of these networks
and the role of other potential latent factors. We will also
examine other potential clustering mechanisms that control for
other user features such as frequency of involvement and thread
structure. We also plan to examine other similar datasets to
determine if these features transition across classes and class
types. We believe that these results may change somewhat in
settings where students can coordinate face to face far more
easily than online.

7.   ACKNOWLEDGMENTS
This work was supported by NSF grant #1418269: “Modeling
Social Interaction & Performance in STEM Learning,” Yoav
Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes,
Co-PIs.

8.   REFERENCES
 [1] R. Azen and D. Budescu. The dominance
     analysis approach for comparing predictors in multiple
     regression. Psychological Methods, 8(2):129–48, 2003.
 [2] Y. Belanger and J. Thornton.
     Bioelectricity: A quantitative approach Duke University’s
     first MOOC. Journal of Learning Analytics, 2013.
 [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes,
     R. Baker, Y. Bergner, and D. McNamara. Good
     communities and bad communities: Does membership
     affect performance? In C. Romero and M. Pechenizkiy,
     editors, Proceedings of the 8th International
     Conference on Educational Data Mining, 2015. submitted.
 [4] R. B. Cattell. The scree test for the number of factors.
     Multivariate Behavioral Research, 1(2):245–276, 1966.
 [5] E. Choo, T. Yu, M. Chi, and
     Y. Sun. Revealing and incorporating implicit communities
     to improve recommender systems. In M. Babaioff,
     V. Conitzer, and D. Easley, editors, ACM Conference
     on Economics and Computation, EC ’14, Stanford,
     CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014.
 [6] K. Clayton, F. Blumberg,
     and D. P. Auld. The relationship between motivation
     learning strategies and choice of environment whether
     traditional or including an online component. British
     Journal of Educational Technology, 41(3):349–364, 2010.
 [7] G. Csardi and T. Nepusz.
     The igraph software package for complex network
     research. InterJournal, Complex Systems:1695, 2006.
 [8] S. Dawson. ’Seeing’ the learning
     community: An exploration of the development of a
     resource for monitoring online student networking. British
     Journal of Educational Technology, 41(5):736–752, 2010.
 [9] J. Eckles and E. Stradley. A
     social network analysis of student retention using archival
     data. Social Psychology of Education, 15(2):165–180, 2012.
[10] A. Fini. The technological
     dimension of a massive open online course: The
     case of the CCK08 course tools. The International Review
     Of Research In Open And Distance Learning, 10(5), 2009.
[11] M. Fire,
     G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting
     student exam’s scores by analyzing social network data. In
     Active Media Technology, pages 584–595. Springer, 2012.
[12] S. Fortunato and M. Barthélemy.
     Resolution limit in community detection. Proc.
     of the National Academy of Sciences, 104(1):36–41, 2007.
[13] M. Girvan and M. E. J. Newman. Community structure
     in social and biological networks. Proc. of the National
     Academy of Sciences, 99(12):7821–7826, June 2002.
[14] J. Huang, A. Dasgupta, A. Ghosh,
     J. Manning, and M. Sanders. Superposter behavior
     in MOOC forums. In Proc. of the first ACM conference
     on Learning@ scale conference, pages 117–126. ACM, 2014.
[15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala.
     What is the source of social capital? the association
     between social network position and social presence in
     communities of inquiry. In S. G. Santos and O. C. Santos,
     editors, Proc. of the Workshops held at Educational
     Data Mining 2014, co-located with 7th International
     Conference on Educational Data Mining (EDM
     2014), London, United Kingdom, July 4-7, 2014., volume
     1183 of CEUR Workshop Proc. CEUR-WS.org, 2014.
[16] A. D. I. Kramer, J. E. Guillory,
     and J. T. Hancock. Experimental evidence of massive-scale
     emotional contagion through social networks. Proc. of the
     National Academy of Sciences, 111(24):8788–8790, 2014.
[17] W. H. Kruskal and W. A. Wallis. Use
     of ranks in one-criterion variance analysis. Journal of the
     American Statistical Association, 47(260):583–621, 1952.
[18] N. Li, H. Verma, A. Skevi,
     G. Zufferey, J. Blom, and P. Dillenbourg. Watching
     MOOCs together: investigating co-located MOOC
     study groups. Distance Education, 35(2):217–233, 2014.
[19] T. R. Liyanagunawardena, A. A. Adams, and S. A.
     Williams. MOOCs: A systematic study of the published
     literature 2008-2012. The International Review of Research
     in Open and Distributed Learning, 14(3):202–227, 2013.
[20] J. L. Meece,
     E. M. Anderman, and L. H. Anderman. Classroom goal
     structure, student motivation, and academic achievement.
     Annual Review of Psychology, 57:487–503, 2006.
[21] C. Midgley, M. L.
     Maehr, L. Hruda, E. Anderman, L. Anderman, and K. E.
     Freeman. Manual for the Patterns of Adaptive Learning
     Scales (PALS). University of Michigan, Ann Arbor, 2000.
[22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh
     2013 - report #1. Journal of Learning Analytics, 2013.
[23] M. E. Newman. Assortative Mixing in Networks.
     Physical Review Letters, 89(20):208701, Oct. 2002.
[24] R Core Team. R: A Language
     and Environment for Statistical Computing. R Foundation
     for Statistical Computing, Vienna, Austria, 2012.
[25] B. Rienties, P. Alcott, and D. Jindal-Snape.
     To let students self-select or not: That is the
     question for teachers of culturally diverse groups. Journal
     of Studies in International Education, 18(1):64–83, 2014.
[26] T. Rizzuto, J. LeDoux, and J. Hatala. It’s not just what you
     know, it’s who you know: Testing a model of the relative
     importance of social networks to academic performance.
     Social Psychology of Education, 12(2):175–189, 2009.
[27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick,
     P. Goldman, and J. Sherer. Social factors that contribute to
     attrition in MOOCs. In Proc. of the first ACM conference
     on Learning@ scale conference, pages 197–198. ACM, 2014.
[28] A. M. Ryan and H. Patrick. The classroom
     social environment and changes in adolescents’ motivation
     and engagement during middle school. American
     Educational Research Journal, 38(2):437–460, 2001.
[29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and
     D. Pritchard. Who does what in a massive open online
     course? Communications of the ACM, 57(4):58–65, 2014.
[30] G. Stahl, T. Koschmann,
     and D. Suthers. Computer-supported collaborative
     learning: An historical perspective. Cambridge
     handbook of the learning sciences, 2006:409–426, 2006.
[31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen.
     Interactive lectures in engineering education. European
     Journal of Engineering Education, 26(1):15–28, 2001.
[32] Y. Wang and R. Baker. Content or platform:
     Why do students complete MOOCs? MERLOT Journal
     of Online Learning and Teaching, 11(1):191–218, 2015.
[33] Y. Wang, L. Paquette, and R. Baker.
     A longitudinal study on learner career advancement
     in MOOCs. Journal of Learning Analytics, 1(3), 2014.
[34] Wikipedia. Spearman’s
     rank correlation coefficient - Wikipedia, the free
     encyclopedia, 2013. [Online; accessed 27-February-2013].
[35] Wikipedia. Modularity (networks) - Wikipedia, the free
     encyclopedia, 2014. [Online; accessed 5-February-2015].
[36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on,
     tune in, drop out: Anticipating student dropouts in massive
     open online courses. In Proc. of the 2013 NIPS Data-Driven
     Education Workshop, volume 10, page 13, 2013.
[37] W. W. Zachary. An information
     flow model for conflict and fission in small groups.
     Journal of Anthropological Research, 33:452–473, 1977.
[38] J. Zhang, M. S. Ackerman, and L. Adamic.
     Expertise networks in online communities: structure and
     algorithms. In Proc. of the 16th international conference
     on World Wide Web, pages 221–230. ACM, 2007.