=Paper=
{{Paper
|id=Vol-1446/GEDM_2015_Submission_2
|storemode=property
|title=Communities of Performance & Communities of Preference
|pdfUrl=https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_2.pdf
|volume=Vol-1446
|dblpUrl=https://dblp.org/rec/conf/edm/BrownLWEABBBM15
}}
==Communities of Performance & Communities of Preference==
Rebecca Brown, North Carolina State University, Raleigh, NC, rabrown7@ncsu.edu
Collin Lynch, North Carolina State University, Raleigh, NC, cflynch@ncsu.edu
Yuan Wang, Teachers College, Columbia University, New York, NY, elle.wang@columbia.edu
Michael Eagle, North Carolina State University, Raleigh, NC, mjeagle@ncsu.edu
Jennifer Albert, North Carolina State University, Raleigh, NC, jennifer_albert@ncsu.edu
Tiffany Barnes, North Carolina State University, Raleigh, NC, tmbarnes@ncsu.edu
Ryan Baker, Teachers College, Columbia University, New York, NY, ryanshaunbaker@gmail.com
Yoav Bergner, Educational Testing Service, Princeton, NJ, ybergner@gmail.com
Danielle McNamara, Arizona State University, Phoenix, AZ, dsmcnamara1@gmail.com

===ABSTRACT===
The current generation of Massive Open Online Courses (MOOCs) operates under the assumption that good students will help poor students, thus alleviating the burden on instructors and Teaching Assistants (TAs) of having thousands of students to teach. In practice, this may not be the case. In this paper, we examine social network graphs drawn from forum interactions in a MOOC to identify natural student communities and characterize them based on student performance and stated preferences. We examine the community structure of the entire course, of students only, and of students minus low performers and hubs. The presence of these communities, and the fact that they are homogeneous with respect to grade but not motivations, has important implications for planning in MOOCs.

===Keywords===
MOOC, social network, online forum, community detection

===1. INTRODUCTION===
The current generation of Massive Open Online Courses (MOOCs) is designed to leverage student interactions to augment instructor guidance. The activity in courses on sites such as Coursera and edX is centered around user forums that, while curated and updated by instructors and TAs, are primarily constructed by students. When planning and building these courses, it is hoped that students will help one another through the course and that interacting with stronger students will help to improve the performance of weaker ones. It has not yet been shown, however, that this type of support occurs in practice.

Prior research on social networks has shown that social groups, even those that gather face-to-face, can fragment into disjoint sub-communities [37]. This small-group separation, if it takes place in an online course, can be considered negative or positive, depending on one’s perspective. If poor students communicate only with similarly-floundering peers, then they run the risk of perpetuating misunderstandings and of missing insights discussed by better-performing peers and teaching staff. An instructor may wish to avoid this fragmentation to encourage poor students to connect with better ones.

These enduring subgroups may be beneficial, however, by helping students to form enduring supportive relationships. Research by Li et al. has shown that such enduring relationships can enhance students’ social commitment to a course [18]. We believe that this social commitment will in turn help to reduce feelings of isolation and alienation among students in a course. Eckles and Stradley [9] have shown that such isolation is a key predictor of student dropout.

We have previously shown that students can form stable communities and that those communities are homogeneous with respect to performance [3]. However, that work did not show whether these results are consistent with prior work on immediate peer relationships, address the impact of hub students on these results, or discuss whether students’ varying goals and preferences motivate the community structure. Our goal in this paper is to build upon our prior work by addressing these issues.
In the remainder of this paper we will survey prior educational literature on community formation in traditional and online classrooms. We will then build upon our prior work by examining the impact of hub users. Finally, we will look at the impact of user motivations on community formation.

===2. RELATED WORK===
====2.1 MOOCs, Forums, & Student Performance====
A survey of the literature on MOOCs shows the beginnings of a research base generating an abundance of data that has not yet been completely analyzed [19]. According to Seaton et al. [29], most of the time students spend on a MOOC is spent in discussion forums, making them a rich and important data source. Stahl et al. [30] illustrate how students collaborate through this online interaction to create knowledge. Thus students’ forum activity benefits not only the individual student posting content or receiving answers, but the class as a whole. Huang et al. [14] investigated the behavior of the highest-volume posters in 44 MOOC-related forums. These “superposters” tended to enroll in more courses and do better in those courses than the average. Their activity also added to the overall volume of forum content, and they left fewer questions unanswered in the forums. Huang et al. also found that these superposters did not suppress the activity of less-active users. Rienties et al. [25] examined the way in which user interaction in MOOCs is structured. They found that allowing students to self-select collaborators is more conducive to learning than randomly assigning partners. Further, Van Dijk et al. [31] found that simple peer instruction is significantly less effective in the absence of a group discussion step, pointing again to the importance of a class discussion forum.

More recently, Rosé et al. [27] examined students’ evolving interactions in MOOCs using a Mixed-Membership Stochastic Block model, which seeks to detect partially overlapping communities. They found that the likelihood that students would drop out of the course is strongly correlated with their community membership. Students who actively participated in forums early in the course were less likely to drop out later. Furthermore, they found one forum sub-community that was much more prone to dropout than the rest of the class, suggesting that MOOC communities are made up of students who behave in similar ways. Such a community can in turn reflect or affect a student’s level of motivation and their overall experience in a course, much like the “emotional contagion” model used in the Facebook mood manipulation study by Kramer, Guillory, and Hancock [16].

Yang et al. [36] also note that, unlike in traditional courses, students can join MOOCs at different times, and they observed that students who join a course early are more likely to be active and connected in the forums, and less likely to drop out, than those who join later. MOOCs also attract users with a range of individual motivations. In a standard classroom setting students are constrained by availability, convention, and goals. Few students enroll in a traditional course without seeking to complete it and to get formal credit for doing so. MOOCs, by virtue of their openness and flexibility, attract a wide range of students with unique personal motivations [10]. Some join the course with the intent of completing it. Others may seek only to brush up on existing knowledge, obtain specific skills, or just watch the videos. These distinct motivations in turn lend themselves to different in-class behaviors, including assignment viewing and forum access. The impact of user motivations in online courses has been previously discussed by Wang et al. [32, 33]; we will build upon that work here. It remains an open question whether these motivations affect students’ community behaviors.

====2.2 Communities, Hubs, & Peers====
Kovanovic et al. [15] examined the relationship between social network position, or centrality, and social capital formation in courses. Their work is specifically informed by the Community of Inquiry (COI) framework. The COI framework is focused on distance education and is particularly suited to online courses of the type that we study here. The model views course behavior through three presences which mediate performance: cognitive, teaching, and social.

The social presence considers the nature and persistence of student interactions and the extent to which they reinforce students’ behaviors. In their analysis, the authors sought to test whether network relationships, specifically students’ centrality in their social graph, are related to their social performance as measured by the nature and type of their interactions. To that end, they examined a set of course logs taken from a series of online courses offered within a public university. They found that students’ position within their social graph was positively correlated with the nature and type of their interactions, thus indicating that central players also engaged in more useful social interactions. They did not extend this work to groups, however, focusing solely on individual hub students.

Other authors have also examined the relationship between network centrality, neighbor relationships, network density, and student performance factors. Eckles and Stradley [9] applied network analysis to student attrition, finding that students with strong social relationships with other students who drop out are significantly more likely to drop out themselves. Rizzuto et al. [26] studied the impact of social network density on student performance. Network density is defined as the fraction of possible edges that are present in a given graph; it is thus a measure of how “clique-like” the graph is. The authors examined self-reported social networks for students in a large traditional undergraduate psychology course. They found that denser social networks were significantly correlated with performance. However, a dominance analysis [1] showed that this factor was less predictive than pure academic ability. These results serve to motivate a focus on the role of social relationships in student behavior. Their analysis is complicated, however, by their reliance on self-report data, which will skew the strength and recency of the reported relationships.

Fire et al. [11] studied student interaction in traditional classrooms, constructing a social network based on cooperation on class assignments. Students were linked based on partnership on group work as well as inferred cooperation based on assignment submission times and IP addresses. The authors found that a student’s grade was significantly correlated with the grade of the student with the strongest links to that student in the social network. We perform a similar analysis in this paper to examine whether the same correlation exists in MOOCs.

Online student interaction in blended courses has also been linked to course performance. Dawson [8] extracted student and instructor social networks from a blended course’s online discussion forums and found that students in the 90th grade percentile had larger social networks than those in the 10th percentile. The study also found that high-performing students primarily associated with other high-performing students and were more likely to be connected to the course instructor, while low-performing students tended to associate with other low-performers. In a blended course, this effect may be offset by face-to-face interaction not captured in the online social network, but if the same separation happens in MOOC communities, low-performing students are less likely to have other chances to learn from high-performing ones.

====2.3 Community Detection====
One of the primary activities students engage in on forums is question answering. Zhang et al. [38] conducted a social network analysis on an online question-and-answer forum about Java programming. Using vertex in-degree and out-degree, they were able to identify a relatively small number of active users who answered many questions. This allowed the researchers to develop various algorithms for calculating a user’s Java expertise. Dedicated question-and-answer forums are more structured than MOOC forums, with question and answer posts identified, but a similar approach might help identify which students in a MOOC ask or answer the most questions.

Choo et al. [5] studied community detection in Amazon product-review forums. Based on which users replied to each other most often, they found communities of book and movie reviewers who had similar tastes in these products. As in MOOC forums, users did not declare any explicit social relationships in the system, but they could still be grouped by implicit connections.

In the context of complex networks, a community is a subgraph which is more densely connected internally than it is to the rest of the network. We chose to apply the Girvan-Newman edge-betweenness algorithm (GN) [13]. This algorithm takes as input a weighted graph and a target number of communities. It then ranks the edges in the graph by their edge-betweenness value and removes the highest-ranking edge. To calculate edge-betweenness we identify the shortest paths p(a,b) between each pair of nodes a and b in the graph. The edge-betweenness of an edge is defined as the number of these shortest paths that it participates in. This is one of the centrality measures explored by Kovanovic et al. above [15]. The algorithm then recalculates the edge-betweenness values and iterates until the desired number of disjoint community subgraphs has been produced. Thus the algorithm operates by iteratively finding and removing the highest-value communication channel between communities until the graph is segmented into the target number of communities. For this analysis, we used the iGraph library [7] implementation of GN within R [24].

The strength of a candidate community can be estimated by modularity. The modularity score of a given subgraph is defined as the fraction of the graph’s edges that fall within the subgraph minus the fraction of such edges that would be expected if edges were distributed at random [13, 35]. A subgraph with a high modularity score represents a dense sub-community within the graph.

===3. DATA SET===
This study used data collected from the “Big Data in Education” MOOC hosted on the Coursera platform as one of the inaugural courses offered by Columbia University [32]. It was created in response to the increasing interest in the learning sciences and educational technology communities in using EDM methods with fine-grained log data. The overall goal of this course was to enable students to apply each method to answer education research questions and to drive intervention and improvement in educational software and systems. The course covered roughly the same material as a graduate-level course, Core Methods in Educational Data Mining, at Teachers College Columbia University. The MOOC spanned from October 24, 2013 to December 26, 2013. The weekly course was composed of lecture videos and 8 weekly assignments. Most of the videos contained in-video quizzes (that did not count toward the final grade).

All of the weekly assignments were structured as numeric input or multiple-choice questions. The assignments were graded automatically. In each assignment, students were asked to conduct analyses on a data set provided to them and answer questions about it. In order to receive a grade, students had to complete each assignment within two weeks of its release. They had up to three attempts for each assignment, and the best score out of the three attempts was counted. The course had a total enrollment of over 48,000, but a much smaller number actively participated. 13,314 students watched at least one video, 1,242 students watched all the videos, 1,380 students completed at least one assignment, and 778 made a post or comment in the weekly discussion sections. Of those with posts, 426 completed at least one class assignment. 638 students completed the online course and received a certificate (meaning that some students could earn a certificate without participating in forums at all).

In addition to the weekly assignments, the students were sent a survey that was designed to assess their personal motivations for enrolling in the course. This survey consisted of 3 sets of questions: MOOC-specific motivational items; two PALS (Patterns of Adaptive Learning Survey) sub-scales [21], Academic Efficacy and Mastery-Goal Orientation; and an item focused on confidence in course completion. It was distributed through the course’s e-mail messaging system to students who enrolled in the course prior to the official start date. Data on whether participants successfully completed the course was downloaded from the same course system after the course concluded. The survey received 2,792 responses; 38% of the participants were female and 62% were male. All of the respondents were over 18 years of age.

The MOOC-specific items consisted of 10 questions drawn from previous MOOC research studies (cf. [2, 22]) asking respondents to rate their reasons for enrollment. These 10 items address traits of MOOCs as a novel online learning platform; specifically, they included questions on both the learning content and the features of MOOCs as a new platform. Two PALS scales [21] measuring mastery-goal orientation and academic efficacy were used to study standard motivational constructs. PALS scales have been widely used to investigate the relation between a learning environment and a student’s motivation (cf. [6, 20, 28]). Altogether ten items, five under each scale, were included. The participants were asked to select a number from 1 to 5, with 1 meaning least relevant and 5 most relevant. Respondents were also asked to self-rate their confidence, on a scale of 1 to 10, that they could complete the course according to the pace set by the course instructor. All three groups of items were domain-general.

===4. METHODS===
For our analysis, we extracted a social network from the online forum associated with the course. We assigned a node to each student, instructor, or TA in the course who contributed to it. Nodes representing students were labeled with their final course grade out of 100 points.

The Coursera forums operate as standard threaded forums. Course participants could start a new thread with an initial post, add a post to an existing thread, and add a comment or child element below an existing post. We added a directed edge from the author of each post or comment to the authors of the parent post and of all posts or comments that preceded it on the thread, based upon their timestamps. We made a conscious decision to omit the textual content of the replies with the goal of isolating the impact of the structure alone.

We therefore treat each reply or followup in the graph as an implicit social connection and hence a possible relationship. Such implicit social relationships have been explored in the context of recommender systems to detect strong communities of researchers [5]. This is, by design, a permissive definition that is based upon the assumption that individuals generally add to a thread after viewing the prior content within it, and that individual threads can be treated as group conversations with each reply being a conscious statement to everyone who has already spoken. The resulting network forms a multigraph with each edge representing a single implicit social interaction. We removed self loops from this graph, as they indicate general forum activity but not any meaningful interaction with another person. We also removed vertices with a degree of 0, and collapsed the parallel edges to form a simple weighted graph for analysis.

In the analyses below we will focus on isolating student performance and assessing the impact of the faculty and hub students. We will therefore consider four classes of graphs: ALL, the complete graph; Student, the graph with the instructor and TAs removed; NoHub, the graph with the instructor and hub users removed; and Survey, which includes only students who completed the motivation survey. We will also consider versions of the above graphs without students who obtained a score of 0, and without the isolated individuals who connect with at most one other person. As we will discuss below, a number of students received a zero grade in the course. Because this is an at-will course, however, we cannot readily determine why these scores were obtained. They may reflect a lack of engagement with the course, differing motivations for taking the course, a desire to see the course materials without completing assignments, or genuinely poor performance.

====4.1 Best-Friend Regression & Assortativity====
Fire et al. [11] applied a similar social network approach to traditional classrooms and found a correlation between the grade of a student’s most highly connected neighbor (“best friend”) and the student’s own grade. The links in that graph included cooperation on assignments as well as partnership on group assignments. To examine whether the same correlation existed in a massive online course, in which students were less likely to know each other beforehand and there were no group assignments, we calculated each student’s best friend in the same manner and performed a similar correlation.

The simple best-friend analysis gives a straightforward mechanism for correlating individual students. However, it is also worthwhile to ask about students who are one step removed from their peers. Therefore we will also calculate the grade assortativity (r<sub>G</sub>) of the graphs. Assortativity describes the correlation of values between vertices and their neighbors [23]. The assortativity metric r ranges between −1 and 1, and is essentially the Pearson correlation between the values of vertices and those of their neighbors [23]. A network with r = 1 would have each vertex only sharing edges with vertices of the same score. Likewise, if r = −1, vertices in the network would only share edges with vertices of different scores. Thus grade assortativity allows us to measure not just whether individuals are connected directly to individuals with similar scores, but whether they correlate with individuals who are one step removed.

Several commonly studied classes of networks tend to have patterns in their assortativity. Social networks tend to have high assortativity, while biological and technological networks tend to have negative values (disassortativity) [23]. In a homogeneous course, or one where students only form stratified communities, we would expect the assortativity to be very high, while in a heterogeneous class with no distinct communities we would expect it to be quite low.

====4.2 Community Detection====
The community detection process we employed was described in our prior work [3] and is summarized briefly here. As noted there, we elected to ignore edge direction when constructing our graph. Our goal in doing so was to focus on communities of learners who shared the same threads, even when they were not directly replying to one another. We believe this to be a reasonable assumption given the role of class forums as a knowledge-building environment in which students exchange information with the group. Individuals who participate in a thread generally review prior posts before submitting their contribution and are likely to return to view the followups. Homogeneity in this context would mean either that students gathered and communicated primarily with equally-performing peers, and thus did not consistently draw on better-performing classmates or help lower-performing ones, or that the at-will communities served to homogenize performance, with the students in a given cluster evening out over time.

While algorithms such as GN are useful for finding clusters, they do not, in and of themselves, determine the right number of communities. Rather, when given a target number they will seek to identify the best possible set of communities. In some implementations the algorithm can be applied iteratively to select the clustering with the maximum modularity over a range of possible community counts. Determining the correct number of communities to detect, however, is a non-trivial task, especially in large and densely connected graphs where changes to smaller communities will have comparatively small effects on the global modularity score. As a consequence we cannot simply optimize for the best modularity score, as we would risk missing small but important communities [12].

Therefore, rather than select the clusterings based solely on the highest modularity, we have opted to estimate the correct number of clusters visually. To that end we plotted a series of modularity curves over the set of graphs. For each graph G we applied the GN algorithm iteratively to produce all clusterings in the range (2, |GN|). For each clustering, we then calculated the global modularity score. We examined the resulting scores to identify a crest where the modularity gain leveled off or began to decrease, thus indicating that further subdivisions added no meaningful information or created schisms in existing high-quality communities. This is a necessarily heuristic process that is similar to the use of scree plots in Exploratory Factor Analysis [4]. We define the number identified as the natural cluster number.

===5. RESULTS AND DISCUSSION===
Before removing self-loops and collapsing the edges, the network contained 754 nodes and 49,896 edges. The final social network contained 754 nodes and 17,004 edges. 751 of the participants were students, with 1 instructor and 2 TAs. One individual was incorrectly labeled as a student while acting as the Chief Community TA. Since this person’s posts clearly indicated that he or she was acting in a TA capacity with regard to the forums, we relabeled him/her as a TA. Of the 751 students, 304 obtained a zero grade in the course, leaving 447 nonzero students. 215 of the 751 students responded to the motivation survey.

There were a total of 55,179 registered users, so the set of 754 forum participants is a small fraction of the entire course audience. However, forum users are not necessarily those who will make an effort or succeed in the course. Not all forum users participated in the rest of the course, and some students who participated in the course did not use the forums: 1,381 students in the course got a grade greater than 0, and 934 of those did not post or comment on the forums, while 304 of the 751 students who did participate in the forums received a grade of 0. Clearly students who go to the trouble of posting forum content are in some respect making an effort in the course beyond those who do not, but this does not necessarily correspond to course success.

Figure 1: Modularity for each number of clusters, including students with zeros.

====5.1 Best-Friend Regression & Assortativity====
We followed Fire et al.’s methodology for identifying best friends in a weighted graph and calculated a simple linear regression over the pairs. This correlation did not include the instructor or TAs in the analysis.
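The best-friend pairing and the grade assortativity r<sub>G</sub> described in Section 4.1 can be illustrated with a short Python sketch. It is not the authors’ implementation (they used the iGraph library within R), and several choices in it are our assumptions: the collapsed reply graph is a dictionary mapping undirected node pairs to interaction counts, a student’s best friend is the neighbor with the heaviest edge (the first one encountered wins a tie), assortativity ignores edge weights, and the six-student graph is hypothetical toy data, not course data.

```python
"""Hypothetical sketch of the best-friend and grade-assortativity analyses.

Assumptions (not from the paper): the collapsed reply graph is a dict that maps
an undirected node pair to its interaction count; a student's best friend is
the neighbour with the largest count (the first one encountered wins a tie);
assortativity ignores edge weights. The six-node graph below is toy data.
"""
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the first one in it.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def best_friends(edges):
    """Map each node to the neighbour it shares its heaviest edge with."""
    best = {}  # node -> (neighbour, weight)
    for (a, b), w in edges.items():
        for u, v in ((a, b), (b, a)):
            if u not in best or w > best[u][1]:
                best[u] = (v, w)
    return {u: v for u, (v, _) in best.items()}


def grade_assortativity(edges, grade):
    """Newman's scalar assortativity for an unweighted, undirected graph."""
    xs, ys = [], []
    for a, b in edges:
        # Count each edge in both directions so the coefficient is symmetric.
        xs += [grade[a], grade[b]]
        ys += [grade[b], grade[a]]
    return pearson(xs, ys)


# Toy data: final grades out of 100 and collapsed reply counts (hypothetical).
grade = {"a": 95, "b": 90, "c": 88, "d": 10, "e": 0, "f": 5}
edges = {("a", "b"): 5, ("a", "c"): 2, ("b", "c"): 4, ("c", "d"): 1,
         ("d", "e"): 6, ("e", "f"): 3, ("d", "f"): 2}

friends = best_friends(edges)
students = sorted(friends)
rho = spearman([grade[s] for s in students],
               [grade[friends[s]] for s in students])
r_g = grade_assortativity(edges, grade)
print("best friends:", friends)
print(f"best-friend Spearman rho = {rho:.3f}")
print(f"grade assortativity r_G = {r_g:.3f}")
```

On this toy graph the high-grade trio and the low-grade trio mostly pair up among themselves, so both coefficients come out clearly positive. Spearman’s ρ is rank-based and therefore does not assume normally distributed grades; with real data, SciPy’s scipy.stats.spearmanr would additionally report a p-value.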
We calculated the correlation between the students’ grades and their best friends’ grades using Spearman’s Rank Correlation Coefficient (ρ) [34]. The two variables were significantly correlated, ρ(748) = 0.44, p < 0.001. However, the correlation was also affected by the dense clusters of students with 0 grades. After removing the 0-grade students we still found a weaker but significant correlation, ρ(444) = 0.29, p < 0.001. Thus the significant correlation between a student’s grade and their best friend’s grade holds over the transition from the traditional classroom to a MOOC. This suggests that students in a MOOC, excluding the many who drop out or do not submit assignments, behave similarly to those in a traditional classroom in this respect.

These results are also consistent with our calculations for assortativity. There we found a small assortative trend for the grades, as shown in Table 1. These values reflect that a student was frequently communicating with students who in turn communicated with students at a similar performance level. This in turn supports our belief that homogeneous communities may be found. As Table 1 also illustrates, the zero-score students contribute substantially to the assortativity correlation as well, with the correlation dropping by as much as a third when they were removed.

{| class="wikitable"
|+ Table 1: The grade assortativity for each network.
! Users !! Zeros included !! Vertices !! Edges !! Grade assortativity (r<sub>G</sub>)
|-
| All || Yes || 754 || 17004 || 0.29
|-
| All || No || 447 || 5678 || 0.20
|-
| Students || Yes || 751 || 15989 || 0.32
|-
| Students || No || 447 || 5678 || 0.20
|-
| Non-Hub || Yes || 716 || 9441 || 0.37
|-
| Non-Hub || No || 422 || 3119 || 0.24
|}

Figure 2: Modularity for each number of clusters, excluding students with zeros.

====5.2 Community Structure====
The modularity curves for the graphs both with and without zero-score students are shown in Figures 1 and 2. We examined these plots to select the natural cluster numbers, which are shown in Table 2. As the values illustrate, the instructor, TAs, and hub students have a disproportionate impact on the graph structure. The largest hub student in our graph connects to 444 out of 447 students in the network. The graph with all users had lower modularity and required more clusters than the graphs with only students or only non-hubs (see Table 2), with the non-hub graph having the highest modularity. This suggests that non-hub students formed more isolated communities, while teaching staff and hubs communicated across these communities to connect them.

{| class="wikitable"
|+ Table 2: Graph sizes and natural number of clusters for each graph.
! Users !! Zeros included !! Vertices !! Edges !! Natural number of clusters
|-
| All || Yes || 754 || 17004 || 212
|-
| All || No || 447 || 5678 || 173
|-
| Students || Yes || 751 || 15989 || 184
|-
| Students || No || 447 || 5678 || 169
|-
| Non-Hub || Yes || 716 || 9441 || 79
|-
| Non-Hub || No || 422 || 3119 || 52
|-
| Survey || Yes || 215 || 1679 || 58
|}

This is largely consistent with the intent of the forums and the active role played by the instructor and TAs in monitoring and replying to all relevant posts in the forums. It is particularly interesting how closely the curves for the ALL and Student graphs mirror one another. This may indicate that the hub students are also those that followed the instructor and TAs closely, thus giving them isomorphic relationships, or it may indicate that they are more connected than even the instructors and thus came to bind the forums together on their own. This impact is further illustrated by the cluster plots shown in Figure 3. Here the absence of the hub students results in a noticeable thinning of the graph, which in turn highlights the frequency of communication that can be attributed to this comparatively small group.

The differences between the plots with and without zero-score students are also notable, as the zero-grade students were clearly a major factor in community formation. A direct examination of the user graph showed that many of the zero students were only connected to other zero students or were not connected at all. This is also highlighted in Figure 3. In both graphs the bulk of the zero-score students are clustered in a tight network of communities on the left-hand side. That super-community consists primarily of zero-score students communicating with other zero-score students, a structure we have nicknamed the ‘deathball.’

Figure 3: View of the student communities with edges of frequency < 2 removed. The Student network with (left) and without (right) hub students, with each vertex representing a student and grade represented as color.

====5.3 Student Performance & Motivation====
As the color coding in Figure 3 illustrates, the students did cluster by performance. Table 3 shows the average grade and standard deviation for a small selection of the communities in the ALL reply network including zero-grade students, hub students, and teaching staff. Several of the communities, particularly the larger ones, do show a blend of good and poor students, with a high standard deviation. However, many if not most of the communities are more homogeneous, with good and poor students each sharing a community with similarly-performing peers. These clusters have markedly lower standard deviations.

{| class="wikitable"
|+ Table 3: Grade statistics by community, selected to show examples of more and less homogeneous communities.
! Members !! Average grade !! Standard deviation
|-
| 118 || 21.62 || 36.58
|-
| 41 || 22.00 || 32.45
|-
| 34 || 25.41 || 40.44
|-
| 31 || 56.13 || 47.69
|-
| 20 || 49.05 || 45.64
|-
| 16 || 12.44 || 31.13
|-
| 14 || 88.43 || 22.47
|-
| 12 || 96.08 || 6.36
|-
| 11 || 96.45 || 7.38
|-
| 4 || 3.00 || 6.00
|-
| 4 || 8.50 || 9.81
|-
| 4 || 4.25 || 8.50
|-
| 4 || 96.25 || 3.50
|}

An examination of the grade distribution for each of the clusters showed that the scores within each cluster were non-normal. Therefore we opted to apply the Kruskal-Wallis (KW) test to assess the association between cluster membership and performance. The KW test is a nonparametric, rank-based analogue to the common one-way Analysis of Variance [17]. Here we tested grade by community number, with the community treated as a categorical variable. The results of this comparison are shown in Table 4. As that illustrates, cluster membership was a significant predictor of student performance for all of the graphs, although the graphs without zero-score students generally have lower chi-squared statistics and weaker significance levels than those with zero-score students included. These results are consistent with our hypothesis that students would form clusters of equal performers, and we find that those results hold even when the highly-connected instructors, TAs, and hub students are included.

{| class="wikitable"
|+ Table 4: Kruskal-Wallis test of student grade by community, for each graph.
! Users !! Zeros included !! Chi-squared !! df !! p-value
|-
| All || Yes || 349.0273 || 211 || < 0.005
|-
| All || No || 216.1534 || 172 || < 0.02
|-
| Students || Yes || 202.0814 || 78 || < 0.005
|-
| Students || No || 80.93076 || 51 || < 0.005
|-
| Non-Hub || Yes || 309.8525 || 183 || < 0.005
|-
| Non-Hub || No || 218.9603 || 168 || < 0.01
|-
| Survey || Yes || 99.99840 || 577 || < 0.005
|}

We performed a similar KW analysis for the questions on the motivation survey and for a binary variable indicating whether or not the student completed the survey at all. For this analysis we evaluated the clusters on all of the graphs. We found no significant relationship between the community structure on any of the graphs and the survey question results or the survey completion variable. Thus, while the clusters may be driven by separate factors, those factors are not reflected in the survey content.

===6. CONCLUSIONS AND FUTURE WORK===
Our goal in this paper was to expand upon our prior community detection work with the aim of aligning that work with prior research on peer impacts, notably the work of Fire et al. [11]. We also sought to examine the impact of hub students and student motivations on our prior results. We also found that community membership was not a significant predictor of whether students would complete the motivation survey or of students’ motivations. We were surprised by the fact that even when we focused solely on individuals who had completed the survey, the students did not connect by stated goals. This suggests to us that the students are more likely coalescing around the pragmatic needs of the class or conceptual challenges than around the winding paths that brought them there. One limitation of this work is that by relying on the forum data we focused solely on the comparatively small proportion of enrolled students (6%) who actively participated in the forums. This group is, by definition, a smaller set of more actively-involved participants.

In addition to addressing our primary questions, this study also raised a number of open issues for further exploration. First, this work focused solely on the final course structure, grades, and motivations. We have not yet addressed whether these communities are stable over time or how they might change as students drop in or out. Second, while we ruled out motivations as a basis for the communities, in this work we were not able to identify what mechanisms do support them. Finally, this study raises the question of generality: whether these results can be applied to MOOCs offered on different topics, and whether they apply to traditional and blended courses.

In subsequent studies we plan to examine both the evolution of the networks over time and additional demographic data, with the goal of assessing both the stability of these networks and the role of other potential latent factors. We will also examine other potential clustering mechanisms that control for other user features, such as frequency of involvement and thread structure. We also plan to examine other similar datasets to determine whether these findings hold across classes and class types. We believe that these results may change somewhat in settings where students can coordinate face to face far more easily than online.

===7. ACKNOWLEDGMENTS===
This work was supported by NSF grant #1418269: “Modeling
Social Interaction & Performance in STEM Learning” Yoav Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes To that end we performed a novel community clustering analysis Co-PIs. of student performance data and forum communications taken from a single well-structured MOOC. As part of this analysis we 8. REFERENCES described a novel heuristic method for selecting natural numbers [1] R. Azen and D. Budescu. The dominance of clusters, and replicated the results of prior studies of both analysis approach for comparing predictors in multiple immediate neighbors and second-order assortativity. regression. Psychological Methods, 8(2):129–48, 2003. [2] Y. Belanger and J. Thornton. Consistent with prior work, we found that students’ grades Bioelectricity: A quantitative approach Duke University’s were significantly correlated with their most closely associated first MOOC. Journal of Learning Analytics, 2013. peers in the new networks. We also found that this correlation [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes, extended out to their second-order neighborhood. This is consis- R. Baker, Y. Bergner, and D. McNamara. Good tent with our prior work showing that students form stable user communities and bad communities: Does membership communities that are homogeneous by performance. We found affect performance? In C. Romero and M. Pechenizkiy, that those results were stable even if instructors, hub players, editors, Proceedings of the 8th International students with 0 scores, and students who did not fill out the sur- Conference on Educational Data Mining, 2015. submitted. vey were removed from consideration. This suggests that either [4] R. B. Cattell. The scree test for the number of factors. the students are forming communities that are homogeneous or Multivariate Behavioral Research, 1(2):245–276, 1966. that the effect of those individual and network features on the communities and on performance is minimal. [5] E. Choo, T. Yu, M. Chi, and Y. Sun. 
Revealing and incorporating implicit communities to improve recommender systems. In M. Babaioff, V. Conitzer, and D. Easley, editors, ACM Conference on Economics and Computation, EC ’14, Stanford, structure, student motivation, and academic achievement. CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014. Annual Review of Psychology, 57:487–503, 2006. [6] K. Clayton, F. Blumberg, [21] C. Midgley, M. L. and D. P. Auld. The relationship between motivation Maehr, L. Hruda, E. Anderinan, L. Anderman, and K. E. learning strategies and choice of environment whether Freeman. Manual for the Patterns of Adaptive Learning traditional or including an online component. British Scales (PALS). University of Michigan, Ann Arbor, 2000. Journal of Educational Technology, 41(3):349–364, 2010. [22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh [7] G. Csardi and T. Nepusz. 2013 - report #1. Journal of Learning Analytics, 2013. The igraph software package for complex network [23] M. E. Newman. Assortative Mixing in Networks. research. InterJournal, Complex Systems:1695, 2006. Physical Review Letters, 89(20):208701, Oct. 2002. [8] S. Dawson. ’Seeing’ the learning [24] R Core Team. R: A Language community: An exploration of the development of a and Environment for Statistical Computing. R Foundation resource for monitoring online student networking. British for Statistical Computing, Vienna, Austria, 2012. Journal of Educational Technology, 41(5):736–752, 2010. [25] B. Rienties, P. Alcott, and D. Jindal-Snape. [9] J. Eckles and E. Stradley. A To let students self-select or not: That is the social network analysis of student retention using archival question for teachers of culturally diverse groups. Journal data. Social Psychology of Education, 15(2):165–180, 2012. of Studies in International Education, 18(1):64–83, 2014. [10] A. Fini. The technological [26] T. Rizzuto, J. LeDoux, and J. Hatala. 
It’s not just what you dimension of a massive open online course: The know, it’s who you know: Testing a model of the relative case of the CCK08 course tools. The International Review importance of social networks to academic performance. Of Research In Open And Distance Learning, 10(5), 2009. Social Psychology of Education, 12(2):175–189, 2009. [11] M. Fire, [27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick, G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting P. Goldman, and J. Sherer. Social factors that contribute to student exam’s scores by analyzing social network data. In attrition in MOOCs. In Proc. of the first ACM conference Active Media Technology, pages 584–595. Springer, 2012. on Learning@ scale conference, pages 197–198. ACM, 2014. [12] S. Fortunato and M. Barthélemy. [28] A. M. Ryan and H. Patrick. The classroom Resolution limit in community detection. Proc. social environment and changes in adolescents’ motivation of the National Academy of Sciences, 104(1):36–41, 2007. and engagement during middle school. American [13] M. Girvan and M. E. J. Newman. Community structure Educational Research Journal, 38(2):437–460, 2001. in social and biological networks. Proc. of the National [29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and Academy of Sciences, 99(12):7821–7826, June 2002. D. Pritchard. Who does what in a massive open online [14] J. Huang, A. Dasgupta, A. Ghosh, course? Communications of the ACM, 57(4):58–65, 2014. J. Manning, and M. Sanders. Superposter behavior [30] G. Stahl, T. Koschmann, in MOOC forums. In Proc. of the first ACM conference and D. Suthers. Computer-supported collaborative on Learning@ scale conference, pages 117–126. ACM, 2014. learning: An historical perspective. Cambridge [15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala. handbook of the learning sciences, 2006:409–426, 2006. What is the source of social capital? the association [31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen. 
between social network position and social presence in Interactive lectures in engineering education. European communities of inquiry. In S. G. Santos and O. C. Santos, Journal of Engineering Education, 26(1):15–28, 2001. editors, Proc. of the Workshops held at Educational [32] Y. Wang and R. Baker. Content or platform: Data Mining 2014, co-located with 7th International Why do students complete MOOCs? MERLOT Journal Conference on Educational Data Mining (EDM of Online Learning and Teaching, 11(1):191–218, 2015. 2014), London, United Kingdom, July 4-7, 2014., volume [33] Y. Wang, L. Paquette, and R. Baker. 1183 of CEUR Workshop Proc. CEUR-WS.org, 2014. A longitudinal study on learner career advancement [16] A. D. I. Kramer, J. E. Guillory, in MOOCs. Journal of Learning Analytics, 1(3), 2014. and J. T. Hancock. Experimental evidence of massive-scale [34] Wikipedia. Spearman’s emotional contagion through social networks. Proc. of the rank correlation coefficient — Wikipedia, the free National Academy of Sciences, 111(24):8788–8790, 2014. encyclopedia, 2013. [Online; accessed 27-February-2013]. [17] W. H. Kruskal and W. A. Wallis. Use [35] Wikipedia. Modularity (networks) — Wikipedia, the free of ranks in one-criterion variance analysis. Journal of the encyclopedia, 2014. [Online; accessed 5-February-2015]. American statistical Association, 47(260):583–621, 1952. [36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on, [18] N. Li, H. Verma, A. Skevi, tune in, drop out: Anticipating student dropouts in massive G. Zufferey, J. Blom, and P. Dillenbourg. Watching open online courses. In Proc. of the 2013 NIPS Data-Driven MOOCs together: investigating co-located MOOC Education Workshop, volume 10, page 13, 2013. study groups. Distance Education, 35(2):217–233, 2014. [37] W. W. Zachary. An information [19] T. R. Liyanagunawardena, A. A. Adams, and S. A. flow model for conflict and fission in small groups. Williams. 
MOOCs: A systematic study of the published Journal of Anthropological Research, 33:452–473, 1977. literature 2008-2012. The International Review of Research [38] J. Zhang, M. S. Ackerman, and L. Adamic. in Open and Distributed Learning, 14(3):202–227, 2013. Expertise networks in online communities: structure and [20] J. L. Meece, algorithms. In Proc. of the 16th international conference E. M. Anderman, and L. H. Anderman. Classroom goal on World Wide Web, pages 221–230. ACM, 2007.
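Section 5.3 tests student grade by community with the Kruskal-Wallis test. The sketch below computes the tie-corrected H statistic in plain Python. It assumes grades and community labels arrive as two parallel lists. The function names average_ranks and kruskal_wallis are hypothetical, and this is not the authors' analysis code. The degrees of freedom equal the number of communities minus one. In Table 4, for example, the All graph with zero-grade students has 211 degrees of freedom, one fewer than the 212 clusters listed in Table 2. The "Chi-Squared" column presumably reports this H statistic, which R's kruskal.test labels "Kruskal-Wallis chi-squared".

```python
from collections import Counter, defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(grades, communities):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1).

    Hypothetical helper: grades[i] is the grade of the student whose
    community label is communities[i].
    """
    n = len(grades)
    if n < 2 or n != len(communities):
        raise ValueError("need at least two grades and one label per grade")
    ranks = average_ranks(grades)
    sizes = Counter(communities)
    rank_sums = defaultdict(float)
    for rank, community in zip(ranks, communities):
        rank_sums[community] += rank
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[c] ** 2 / sizes[c] for c in sizes
    ) - 3 * (n + 1)
    # Correct for ties (e.g. many students sharing a grade of 0).
    ties = Counter(grades).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction > 0:
        h /= correction
    return h, len(sizes) - 1
```

Consider grades [10, 20, 30] in one community and [40, 50, 60] in another. The sketch returns H = 27/7 (about 3.857) with 1 degree of freedom. If SciPy is available, scipy.stats.chi2.sf(h, df) converts H into a p-value. The tie correction matters for data like these, where a large share of students have the same grade of zero.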
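Table 1 reports grade assortativity, rG. The following is a minimal sketch of Newman's scalar assortativity coefficient [23] for an undirected, unweighted graph. It assumes a hypothetical edge list of (u, v) pairs and a dictionary of grades. This section does not state whether the reply networks were treated as directed or weighted, so the sketch illustrates the measure and is not a reproduction of Table 1. For an undirected graph, the coefficient is the Pearson correlation between the grades at the two ends of an edge, with each edge counted once in each direction.

```python
def grade_assortativity(edges, grade):
    """Newman's scalar assortativity of `grade` over an undirected edge list.

    Assumes an undirected, unweighted graph. Each edge (u, v) is counted in
    both directions, so the result is the Pearson correlation of the grades
    at the two ends of an edge.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [grade[u], grade[v]]
        ys += [grade[v], grade[u]]
    if not xs:
        raise ValueError("graph has no edges")
    # After symmetrizing, xs and ys share the same mean and variance.
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("all endpoint grades are equal; r is undefined")
    return cov / var
```

Take a path a-b-c-d whose grades are 0, 0, 100, 100. The sketch gives r = 1/3. Two disconnected pairs with equal grades within each pair give r = 1, and a single edge between grades 0 and 100 gives r = -1. On this scale, the values of 0.20 to 0.37 in Table 1 indicate moderate assortative mixing by grade.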
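The opening discussion compares graphs by modularity. Newman-Girvan modularity [13] scores a given partition as Q = sum over communities c of (L_c / m - (d_c / 2m)^2). Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of its vertices. A higher Q means that more edges fall inside communities than a random graph with the same degrees would produce. That is the sense in which the text describes the non-hub graph's communities as more isolated. The sketch below assumes an undirected, unweighted edge list and a hypothetical dictionary mapping each vertex to a community label. It only scores a partition. It does not reproduce the paper's clustering algorithm or its heuristic for choosing the number of clusters.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of vertices in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )
```

Take two triangles joined by a single edge and split them into the two triangles. The sketch gives Q = 5/14 (about 0.357). Placing every vertex in one community always gives Q = 0.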