Social Positioning and Performance in MOOCs

Suhang Jiang, School of Education, University of California, Irvine, Irvine, CA 92697, suhangj@uci.edu
Sean M. Fitzhugh, Department of Sociology, University of California, Irvine, Irvine, CA 92697, sean.fitzhugh@uci.edu
Mark Warschauer, School of Education, University of California, Irvine, Irvine, CA 92697, markw@uci.edu

ABSTRACT
Literature indicates that centrality is correlated with learners’ engagement in MOOCs. This paper explores the relationship between centrality and performance in two MOOCs. We found one positive and one null correlation between centrality and grade scores at the end of the MOOCs. In both MOOCs, we found that learners tend to communicate with learners in different performance groups. This suggests that the MOOC discussion forum serves to facilitate information flow and help-seeking among learners.

Keywords
MOOCs; Social Positioning; Performance

1. INTRODUCTION
Massive Open Online Courses (MOOCs) have attracted over 7 million users in the past two years. In addition to offering videos and online quizzes that users can watch and take, a key feature of MOOCs is that they contain some platform for discussion among users. Indeed, discussion forums can even be considered a defining feature of a MOOC, because, without such forums, a MOOC is more like a collection of online instructional resources than an interactive course.

Our own preliminary data analysis of 15 MOOCs offered at the University of California, Irvine, indicates that the number of posts in MOOC discussion forums significantly predicts the number of people who complete MOOCs. Online discussion forums serve an important role in the collaborative learning process of learners [9]; however, little research explores the relationship between social positioning in the forum and performance at the end of the course in online learning environments. To better understand learners’ interaction patterns in MOOC discussions, we employed social network analysis to study the collaborative learning process in the discussions of two large MOOCs. Social network analysis is a methodology that identifies the underlying patterns of social relations of actors [11]. This paper compares the discussion forum activities of two MOOCs and examines three centrality metrics of online learners (degree centrality, betweenness centrality, and closeness centrality) and their relationship with learner performance.

2. RELATED WORK
Threaded discussion forums, an important component of computer-assisted collaborative learning, allow learners to connect, exchange ideas, and stimulate thinking [3]. Social network analysis (SNA) is valuable for analyzing the dynamics of these discussions, as it emphasizes the structure and the relationships of actors [2]. SNA is thus a practical means for gaining insight into the relations and collaborative patterns of learners in the forum [8]. Learners’ behaviors measured by social network metrics (e.g., authority and hub) in discussion forums have been identified as positively correlated with learners’ engagement in MOOCs [12]. Previous research on online education indicates that network measures of centrality (out-degree) and prestige (in-degree) are strongly associated with learners’ cognitive learning outcomes [10]. Research on an online collaborative learning community found that central actors tend to have higher final grades and suggested that communication and social networks should be central elements in distributed learning environments [4].

Embeddedness theory holds that learners’ embeddedness in the social networks that pervade educational programs predicts their satisfaction and performance [1]. We hypothesize that learners’ embeddedness in an online learning environment is also positively correlated with their performance. Three centrality metrics, i.e., degree centrality, betweenness centrality, and closeness centrality, are proposed to reflect embeddedness in the online learning networks.

This paper explores whether the correlation between the three centrality metrics and academic performance exists in the MOOC setting. The study mainly focused on learners who took part in the discussion forum.

3. DATASET
The project focuses on two online courses named “Intermediate Algebra” and “Fundamentals of Personal Financial Planning” delivered via the Coursera platform. The Intermediate Algebra MOOC was 10 weeks long and developed by professors from the University of California, Irvine. It was open for all to enroll for free. A total of 63,100 learners registered in the course, among which 43,342 learners had a record in the gradebook and 23,662 learners accessed course materials. The course consisted of lecture videos, weekly quizzes, and the final exam. The quizzes accounted for 20% of the final course grade while the final exam accounted for 80% of the final grade. Learners who obtained 65% or more of the maximum possible score were awarded the Statement of Accomplishment, i.e., the Normal certificate. Learners who achieved 85% or more of the maximum possible score were awarded the Statement of Accomplishment with Distinction, i.e., the Distinction certificate.

The Financial Planning MOOC was 7 weeks long and developed by a certified financial planner practitioner from the University of California, Irvine. Over 110,000 learners enrolled in the course, among which 84,234 learners had a record in the gradebook and about 55,000 learners accessed course materials. The course evaluation consisted of weekly quizzes (30%), one peer assessment (30%), and the final exam (40%). Learners who received a minimum of 70% on all graded assignments received the Statement of Accomplishment; those who received a minimum of 85% on all graded assignments obtained the Statement of Accomplishment with Distinction.
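The performance groups used in the rest of the paper follow from these grading rules. As a minimal sketch for the Intermediate Algebra course only, assuming quiz and exam results are available as percentages of their maximum scores, the weighting and certificate cutoffs stated above could be expressed as follows. The function names are hypothetical and are not taken from the course platform.

```python
def algebra_final_score(quiz_pct, exam_pct):
    """Weighted final score (0-100): quizzes count 20%, the final exam 80%."""
    return 0.20 * quiz_pct + 0.80 * exam_pct


def performance_group(score_pct, normal_cutoff=65.0, distinction_cutoff=85.0):
    """Map a final score to the three groups used in the paper.

    The default cutoffs are the Intermediate Algebra thresholds (65% and 85%).
    """
    if score_pct >= distinction_cutoff:
        return "Distinction"
    if score_pct >= normal_cutoff:
        return "Normal"
    return "None"


# Hypothetical learner: perfect quizzes, 60% on the final exam.
example_score = algebra_final_score(quiz_pct=100.0, exam_pct=60.0)
example_group = performance_group(example_score)
```

Because the final exam carries 80% of the weight, a learner with perfect quiz scores still needs a solid exam result to cross the 65% cutoff.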
In the Algebra course, 2,126 learners participated in the forum during the 10-week course. Among them, 1,558 were identified as learners with an academic record, that is, learners who can be found in the gradebook. It is unclear why some users who participated in the forum did not have a record in the gradebook. A possible explanation is that some are instructors and teaching assistants. The share of forum participants in each of the three performance groups is relatively constant across the two courses, with 68% of forum participants earning no certificate. Table 1 shows the composition of forum participants.

Table 1: Composition of Discussion Forum Participants

Performance Group    Algebra         Financial Planning
Distinction          311 (20%)       998 (24%)
Normal               193 (12%)       337 (8%)
None                 1,054 (68%)     2,897 (68%)
Total                1,558 (100%)    4,232 (100%)

[Figure 1: Algebra Network]

3.1 Network Descriptive
To create each network we used the following procedure. The forum consists of several sub-forums. Users can initiate a thread in a sub-forum, make posts to a thread, and make comments on a post. Each thread and post serves as a site of interaction among learners. Learners engage in a variety of actions: asking questions, seeking help, and providing assistance to fellow learners. We treat individuals as tied if they co-participate in a thread or a post. These ties represent communication among learners. Although one could create directed ties between individuals who address each other directly in the posts and comments, doing so would require extensive reading and coding of the data and tackling issues such as how to define direct communication (e.g., is implied communication sufficient, or must the alter be directly named?). Given the size of our data, such an approach is infeasible for our purposes.

The Algebra course discussion network has 1,389 nodes, as not all 1,558 individuals who participated in the discussion forum have a record in the gradebook. The network has 3,540 edges.
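The tie-construction procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code used for the study: it assumes the forum data have already been reduced to a mapping from a thread (or post) identifier to the learners who participated in it, and the function names, learner names, and toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coparticipation_edges(threads):
    """Return undirected ties between learners who share a thread or post."""
    edges = set()
    for participants in threads.values():
        # Sorting stores each unordered pair once, as (u, v) with u < v.
        for u, v in combinations(sorted(set(participants)), 2):
            edges.add((u, v))
    return edges


def degrees(nodes, edges):
    """Degree of every node; learners with no ties (isolates) get 0."""
    deg = {node: 0 for node in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def mean_degree_by_group(deg, group_of):
    """Average degree within each performance group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [degree sum, node count]
    for node, d in deg.items():
        totals[group_of[node]][0] += d
        totals[group_of[node]][1] += 1
    return {group: s / c for group, (s, c) in totals.items()}


# Hypothetical toy data: three threads and five learners.
threads = {"t1": ["ann", "bob", "cy"], "t2": ["bob", "dee"], "t3": ["eve"]}
group_of = {"ann": "Distinction", "bob": "Normal",
            "cy": "None", "dee": "None", "eve": "None"}

nodes = {learner for members in threads.values() for learner in members}
edges = coparticipation_edges(threads)
deg = degrees(nodes, edges)
overall_mean_degree = 2 * len(edges) / len(nodes)
group_means = mean_degree_by_group(deg, group_of)
```

In an undirected network the mean degree equals twice the number of edges divided by the number of nodes. Applied to the reported sizes, this gives 2 × 3,540 / 1,389 ≈ 5.10 for the algebra network and 2 × 5,505 / 3,317 ≈ 3.32 for the financial planning network, which matches the mean degrees reported in this section.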
We illustrate the algebra network in Figure 1, with nodes colored according to their performance groups. The network is dominated by a large, dense component with a periphery of low-degree actors. A few isolates and lone dyads are also present. Nodes of different performance groups appear to be intermixed throughout the main component and the rest of the graph.

Mean degree is 5.10, although mean degree varies by performance group. Those in the “none” category have the lowest mean degree (4.36), while those in the “normal” category have a mean degree of 8.249 and individuals earning “distinction” have a mean degree of 5.502.

More than twice as large as the algebra course discussion network, the financial planning course discussion network has 3,317 nodes and 5,505 edges. We depict the network in Figure 2. Like the algebra network, the financial planning network is dominated by a large component with a mix of isolates and smaller components. Although the financial planning discussion network is much larger than the algebra network, its mean degree is lower, at 3.32. As in the algebra network, nodes with performance achievements of “normal” or “distinction” have higher degree than those in the “none” category. Those in the “none” category have an average of 2.80 ties, followed by the “normal” category with 4.15 ties and the “distinction” category with 4.48 ties.

[Figure 2: Financial Planning Network]

4. METHOD
Our analysis examines graph-level centralization and node-level centrality using permutation tests.

4.1 Centrality
Among the most common structural indices employed in the analysis of networks are centrality indices. These measures demonstrate the extent to which a node has a central position in the network [5][11]. Several measures of centrality exist and we utilize three of the most common measures in this paper: degree, betweenness, and closeness. One of the simplest centrality indices, degree, measures the total number of alters to which a node is tied. In the context of our MOOC network, this represents the number of other learners to which one is tied through participation in discussion forum threads. Those with high degree have greater levels of participation in a variety of threads that put them in contact with other learners. We also utilize betweenness, which measures the extent to which a node bridges other nodes by lying on a large number of shortest paths between them. Nodes with high betweenness have been described as having some degree of control over the communication of others [5] as well as greater opportunities to exert interpersonal influence over others [11]. Nodes with high betweenness in these MOOCs participate in discussions in such a way as to link learners across multiple forum threads. Finally, we measure closeness, which captures the extent to which a node has short paths to other nodes in the network. Nodes with high closeness centrality are described as being in the “middle” of the network structure [2]. Because the standard definition of closeness does not accommodate networks with multiple components, we use the Gil and Schmidt [6] approach of measuring closeness of a node as the sum of the inverse distances to all other nodes.

In addition to measuring node-level centrality, we also measure graph-level centralization. Unlike the node-level centrality indices described above, these graph-level indices produce one measure for the entire graph. These indices measure the difference between the most central node and the centrality scores for all other nodes in the network in order to provide a graph-level measure of the extent to which centrality is concentrated on a small portion of the network’s nodes. We compute these centralization scores for the three aforementioned centrality measures: degree, betweenness, and closeness. These measures demonstrate the extent to which centrality is dominated by a small number of learners in the discussion network.

4.2 Permutation Test
Because we cannot guarantee the normality assumptions required by many statistical tests, we use a variety of permutation tests to assess various features of the network. While we use standard, non-parametric correlation tests, we also use non-parametric network methods. These network methods uncover structural biases by using baseline models to determine the likelihood of observing particular structural traits [2]. The results demonstrate the extent to which the network deviates from a reasonable baseline network. These tests allow us to test our hypotheses despite the statistical complexities of the network representation. We use conditional uniform graph (CUG) tests to determine whether features of our observed graph occur at levels exceeding what we would expect by chance. The CUG test conditions on a certain set of network features (typically, size, number of edges, or dyad census) and treats all graphs within that set as equally likely. It then draws at random from this set of graphs and measures whether the statistic of interest is greater than, less than, or equal to the measure from our original, observed graph. To the extent that few graphs drawn from the set exceed our observed measure, the measure is higher than we expect by chance. In our analyses, we measure whether the observed levels of centralization in the discussion network are greater than what we could expect from graphs of the same size with the same number of edges.

The second non-parametric network method we employ is the matrix permutation test, often referred to as the quadratic assignment procedure or QAP test [7]. This test evaluates correlations between matrices by permuting rows and columns of the matrices, recalculating the test statistic, and measuring whether it is greater or less than the observed value. This test controls for the structure of the network and allows us to determine whether the labels (i.e., categorical attributes) of the network explain its structure. Where the correlation computed on the permuted graphs rarely exceeds the observed test statistic, we find evidence that the observed statistic is greater than we would expect by chance. We use this technique in our MOOC network to measure whether similarity in grades between any given pair of individuals is associated with the presence of a tie between those individuals.

5. RESULTS
To determine whether observed graph-level centralization exceeds levels we would expect by chance, we use conditional uniform graph (CUG) tests conditioned on the dyad census. We hold constant the number of nodes and number of dyads (either mutual or null, given our undirected graph) when running the test. In our algebra network, degree centralization (.164), betweenness centralization (.269), and closeness centralization (.0001) all exceed chance levels, with p-values less than .01. These results are consistent with the financial planning course, where degree centralization (.354), betweenness centralization (.626), and closeness centralization (.001) were all significantly higher than baseline (p < .01). These results indicate that both of our observed networks have much higher levels of centralization than we would expect by chance. These networks are characterized by concentrations of centrality on a handful of nodes. While certain nodes have high levels of centrality, others lack centrality in the network.

We assess node-level centrality by relating our three centrality measures with attainment measures in the course. For each of the nodes in the network, we calculate its degree, betweenness, and closeness and measure the correlation of centrality with the final grade in the course. The correlations between the algebra course grade and degree (r=.043, p=.029) and betweenness (r=.046, p=.018) are significant, while closeness (r=.028, p=.125) failed to achieve significance in a non-parametric correlation test. Those with high levels of degree and betweenness centrality have higher grades in the algebra course. In the financial planning course we found no evidence of a significant correlation between course grade and degree (r=.003, p=.811), betweenness (r=-.002, p=.848), or closeness (r=-.006, p=.582). Individuals who are more central in the financial planning discussion network did not appear to have notable differences in performance compared to those with lower centrality. Although we find that both these networks have a high level of centralization, the correlation between centrality and course grade differs between them. While we find no relation between the two in the financial planning course, we find a weakly positive relation between centrality (except closeness) and grade in the algebra network.

Finally, we look for an association between learners’ scores and their propensities to form ties with one another. We use the matrix permutation test, or QAP test, to find an association between tie formation and similar performance in the classes, where performance is measured as the overall grade or end-of-course distinction status. To measure this association, we correlate the sociomatrix with a similarity matrix m, such that the i,j cell in the matrix represents the similarity in final grade between individual i and individual j. To produce this matrix we found the difference between i’s grade and j’s grade and subtracted it from 100, the maximum possible difference. The resulting scores represent similarity, where larger scores indicate similar final grades while smaller scores indicate large discrepancies between their final grades. We use the same approach to construct a matrix for achievement status, where learners who did not pass the class were scored as 0, while learners who passed received a 1. In the algebra course we found a significant, negative correlation between the observed sociomatrix and grade (r=-.005, p=.01) and achievement (r=-.007, p < .01). These results suggest that there is an association between tie formation and difference in achievement; that is, algebra learners with high achievement and high grades are more likely to be tied to learners with lower performance, and vice versa. In the financial planning course we found similar results: negative correlations with grade similarity (r=-.002, p=.08) and achievement status (r=-.005, p < .01). Although the relation is weak, it suggests that learners are more likely to form ties with learners who ended up with different achievement statuses. Learners who failed were more likely to communicate with learners who passed, and vice versa.

6. DISCUSSION AND CONCLUSION
The descriptive statistics show that the discussion forum is mainly dominated by a small percentage of learners who contributed far more than the rest of the learners. This group of opinion leaders or knowledge sources helps to build up and maintain the network. It also implies that the MOOC network is more an information network than a social network.

According to the literature, a likely hypothesis would be that learners who perform well in a MOOC are more central in online discussions. However, our data demonstrated mixed results. In one MOOC (Algebra) we found a significant relationship between centrality in online discussions and student performance, while in the other MOOC (Financial Planning) we found no relationship.

It is worthwhile to consider why there might have been differences in outcomes between the two courses. Though our study was not designed to pinpoint the cause of these differences, they could be related to the differing purposes and audiences of the two MOOCs. The Algebra MOOC is more academically oriented and aims to prepare learners to succeed in higher education, whereas the Financial Planning MOOC is more geared toward assisting people in life skills. Due to the content of the Financial Planning MOOC, learners who were actively involved in the forum discussion may not have been very concerned about obtaining a certificate. Further social network analysis among a larger corpus of MOOC courses could reveal more about the relationship of course content to forum participation; we have recently obtained a corpus of data from 15 Coursera MOOCs at UCI and will conduct follow-up research in this area. Additionally, moving beyond permutation tests to model-based approaches such as ERGMs could provide further insight into the properties of these networks and the relations between individual positions and outcomes.

In addition, we find in both networks a weak propensity for individuals to form ties with classmates with very different grades or attainment. This suggests that the discussion forum serves an important role in facilitating help seeking and promoting communication between the knows and the know nots.

The study also has some limitations. For example, it mainly analyzed the behavior of learners who participated in the discussion forum, who make up only a small proportion of learners in MOOCs. In addition, we did not consider passive forum participation, such as viewing posts or comments. Future research should include content analysis to examine the cognitive engagement of MOOC learners.

7. ACKNOWLEDGMENTS
We are very indebted to the Digital Learning Lab, University of California, Irvine.

8. REFERENCES
1. Baldwin, T.T., Bedell, M.D., and Johnson, J.L. The Social Fabric of a Team-Based M.B.A. Program: Network Effects on Student Satisfaction and Performance. The Academy of Management Journal 40, 6 (1997), 1369–1397.
2. Butts, C.T. Social network analysis: A methodological introduction. Asian Journal of Social Psychology 11, 1 (2008), 13–41.
3. Calvani, A., Fini, A., Molino, M., and Ranieri, M. Visualizing and monitoring effective interactions in online collaborative groups. British Journal of Educational Technology 41, 2 (2010), 213–226.
4. Cho, H., Gay, G., Davidson, B., and Ingraffea, A. Social networks, communication styles, and learning performance in a CSCL community. Computers & Education 49, 2 (2007), 309–329.
5. Freeman, L.C. Centrality in social networks conceptual clarification. Social Networks 1, 3 (1978), 215–239.
6. Gil, J. and Schmidt, S. The Origin of the Mexican Network of Power. Proceedings of the International Social Network Conference, (1996), 22–25.
7. Krackhardt, D. QAP partialling as a test of spuriousness. Social Networks 9, 2 (1987), 171–186.
8. Nurmela, K., Lehtinen, E., and Palonen, T. Evaluating CSCL Log Files by Social Network Analysis. Proceedings of the 1999 Conference on Computer Support for Collaborative Learning, International Society of the Learning Sciences (1999).
9. Rabbany, R., Elatia, S., Takaffoli, M., and Zaïane, O.R. Collaborative Learning of Students in Online Discussion Forums: A Social Network Analysis Perspective. In A. Peña-Ayala, ed., Educational Data Mining. Springer International Publishing, 2014, 441–466.
10. Russo, T.C. and Koesten, J. Prestige, Centrality, and Learning: A Social Network Analysis of an Online Class. Communication Education 54, 3 (2005), 254–261.
11. Wasserman, S. and Faust, K. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
12. Yang, D., Sinha, T., Adamson, D., and Rose, C.P. Anticipating student dropouts in Massive Open Online Courses. (2013).
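As a supplementary illustration of the matrix permutation (QAP) test described in Section 4.2 and applied in Section 5, the following minimal sketch uses only the Python standard library. It rests on several assumptions that the paper does not specify: the test statistic is taken to be the Pearson correlation over the upper triangle of the two matrices, only the sociomatrix is relabeled in each permutation, and the function names and toy data are hypothetical. It is not the implementation used for the reported results.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def grade_similarity(grades, max_diff=100.0):
    """Similarity matrix: max_diff minus the absolute grade difference."""
    return [[max_diff - abs(gi - gj) for gj in grades] for gi in grades]


def upper_triangle(matrix, order):
    """Upper-triangle cells after relabeling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def qap_test(adj, sim, n_perm=1000, seed=0):
    """Return (observed r, share of permuted r <= observed, share >= observed)."""
    identity = list(range(len(adj)))
    sim_vec = upper_triangle(sim, identity)
    observed = pearson(upper_triangle(adj, identity), sim_vec)
    rng = random.Random(seed)
    lower = upper = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # relabel nodes; the network structure is unchanged
        r = pearson(upper_triangle(adj, order), sim_vec)
        lower += r <= observed
        upper += r >= observed
    return observed, lower / n_perm, upper / n_perm


# Hypothetical toy network: six learners, ties only between low and high scorers.
grades = [20, 90, 20, 90, 20, 90]
adj = [[1 if gi != gj else 0 for gj in grades] for gi in grades]
observed_r, p_lower, p_upper = qap_test(adj, grade_similarity(grades))
```

Because every tie in the toy network joins learners with different grades, the observed correlation between ties and grade similarity is strongly negative, which is the same direction of association as the weak negative correlations reported in Section 5. Relabeling rows and columns together keeps the degree sequence and component structure fixed, so the reference distribution reflects only the assignment of grades to positions in the network.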