=Paper= {{Paper |id=Vol-1446/GEDM_2015_Submission_9 |storemode=property |title=Exploring the function of discussion forums in MOOCs: comparing data mining and graph-based approaches |pdfUrl=https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_9.pdf |volume=Vol-1446 |dblpUrl=https://dblp.org/rec/conf/edm/VigentiniC15 }} ==Exploring the function of discussion forums in MOOCs: comparing data mining and graph-based approaches== https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_9.pdf
   Exploring the function of discussion forums in MOOCs:
    comparing data mining and graph-based approaches
Lorenzo Vigentini
Learning & Teaching Unit
UNSW Australia,
Lev 4 Mathews, Kensington 2065
+61 (2) 9385 6226
l.vigentini@unsw.edu.au

Andrew Clayphan
Learning & Teaching Unit
UNSW Australia,
Lev 4 Mathews, Kensington 2065
+61 (2) 9385 6226
a.clayphan@unsw.edu.au


ABSTRACT
In this paper we present an analysis (in progress) of a dataset containing forum exchanges from
three different MOOCs. The forum data is enhanced because, together with the exchanges and the
full text, we have a description of the design and pedagogical function of forums in these
courses and a certain level of detail about the users, which includes achievement, completion,
and in some instances more details such as education, employment, age, and prior MOOC exposure.

Although a direct comparison between the datasets is not possible because the nature of the
participants and the courses are different, what we hope to identify using graph-based
techniques is a characterization of the patterns in the nature and development of communication
between students and the impact of ‘teacher presence’ in the forums. With the awareness of the
differences, we hope to demonstrate that student engagement can be directed ‘by-design’ in
MOOCs: teacher presence should therefore be planned carefully in the design of large-scale
courses.

Keywords
MOOCs, Discussion forums, graph-based EDM, pedagogy.

1. INTRODUCTION
In the past couple of years MOOCs (Massive Open Online Courses) have become the center of much
media hype as disruptive and transformational [1, 2]. Although the focus has been on a few
characteristics of MOOCs (i.e. free courses, massive numbers, massive dropouts and implicit
quality warranted by the status of the institutions delivering these courses), a rapidly growing
research interest has started to question the effectiveness of MOOCs for learning and their
pedagogies. If one ignores entirely the philosophies of teaching driving the design and delivery
of MOOCs, ranging from the socio-constructivist (cMOOC [4, 5]) to the instructivist (xMOOC [3]),
at the practical level instructors have to make specific choices about how to use the tools
available to them. One of these tools is the discussion forum. Forums are one of the most
popular asynchronous tools to support students’ communication and collaboration in web-based
learning environments [6]. These can be deployed in a variety of ways, ranging from a tangential
support resource which students can refer to when they need help, to a space for learning with
others, driven by the activities students have to carry out (usually sharing work and eliciting
feedback). The latter, in a sense, emulates class-time in traditional courses, providing a space
for structured discussions about the topics of the course. One could argue that, as in
face-to-face classes, the value of the interaction depends on the importance attributed to the
forums by the instructors. This makes it an interesting point from which to explore teachers’
presence and the value of their input in directing such conversations. Mazzolini & Maddison
characterize the role of the teacher and teacher presence in online discussion forums as varying
from being the ‘sage on the stage’, to the ‘guide on the side’ or even ‘the ghost in the wings’
[7]. Furthermore they argue that the ‘ideal’ degree of visibility of the instructor in
discussion forums depends on the purpose of forums and their relationship to assessment. There
are also a number of accounts indicating that students’ learning in forums is not very effective
[8, 9]. However, if one looks at the data there are numerous examples indicating that behaviours
in forums are good predictors of performance in the courses using them, particularly if forum
activities are assessed [10, 11, 12, 13]. Yet, forums in MOOCs tend to attract only a small
portion of the student activity [14]. This sets forums in MOOCs apart from ‘tutorial-type’
forums used to support students’ learning in online or blended courses in higher education.
Furthermore, some argue that active engagement is not the only way of benefiting from discussion
forums [15] and that students’ characteristics and preferences could be more important than the
course design in determining the way in which they take full advantage of online resources [16].

2. THE THREE MOOCS IN DETAIL
In order to investigate the way in which students use the discussion forums, we have extracted
data from three MOOCs delivered by a large, research intensive Australian university. The three
courses are: P2P (From Particles to Planets - Physics); LTTO (Learning to Teach Online); and
INTSE (Introduction to Systems Engineering), which are broadly characterised in the top of
Table 1. The courses were specifically designed in quite different ways to test hypotheses about
their design, delivery and effectiveness.
In particular, P2P was designed emulating a traditional university course in a sequential
manner. All content was released on a week-by-week basis dictating the pace of instruction. LTTO
and INTSE, instead, were designed to provide a certain level of flexibility for the students to
elect their learning paths. All content
was readily available at the start; however, for LTTO the delivery followed a week-by-week
schedule focusing on the interaction with students and selective attention to particular weekly
topics (i.e. weekly feedback videos driven by the discussion forums as well as weekly
announcements). Although announcements were also used in INTSE, the lack of weekly activities in
the forums did not impose a strong pacing. In INTSE, the forums had only a tangential support
value and were used mainly to respond to students’ queries and to clarify specific topics
emerging from the quizzes. Table 1 provides an overview of the different courses. This also
shows that the forum activity in the various courses is a very small portion of all actions
emerging from the logs of activity, which has been reported in the literature [9].

                      INTSE               LTTO                     P2P
Target group          Engineers           Teachers at all levels   High school and teachers
Course length         9 weeks             8 weeks                  8 weeks
Forums                54 (14 top-level)   105 (17 top-level)       63 (15 top-level)
Design mode           All-at-once         All-at-once              Sequential
Delivery mode         All-at-once         Staggered                Staggered
Use of forums         Tangential          Core activity            Support
N in forum            422 (2.1%)          1685 (9.3%)              293 (2.8%)
Tot posts             1361 (avg=3.3)      6361 (avg=3.8)           1399 (avg=4.8)
Tot comments          285 (avg=0.7)       2728 (avg=1.7)           901 (avg=3.1)
Registrants           32705               28558                    22466
Active students (1)   60%                 63%                      47%
Completing (2)        4.2% (0.3% D)       4.4% (2.4 D)             0.7% (0.2%)

Table 1. Summary of the courses under investigation. NOTE: (1) Active students are those
appearing in the clickstream; (2) Completing students achieve the pass grade or Distinction (D)

3. DETAILS OF THE DATASET
3.1 The dataset
The data under consideration is an export from the Coursera platform. Raw forum database tables
(posts, comments, tags, votes) as well as a JSON-based web clickstream were used. The
clickstream events consist of a key which specifies the action: either a ‘pageview’ or ‘video’
item. Forum clickstream events were identified by a common ‘/forum’ prefix.
The clickstream was further classified into: browsing; profile lookups; social interaction
(looking at contributions); search; tagging; and threads. From the classification it became
evident that the clickstream did not record all events, such as when a post or comment was made,
or when votes were applied. For these, specific database tables were used. In order to manage
different data sets and sources, a standardized schema was built, allowing disparate sources to
feed into it while exposing a common interface to conduct analysis over forum activities. This
is shown in Figure 1.

          Figure 1. Forum data transformation process

3.2 An overview of forums activity
There are very interesting trends which require more detailed examination (bottom of Table 1).
As expected, in LTTO the forum activity is larger than in the other courses and this is probably
due to the fact that students were asked to submit posts in forums following the learning
activities. The proportion of active students in the forum is about four times larger than in
the other courses. Yet, if we look at the average amount of posts or comments, the patterns are
not straightforward to interpret, as the level of engagement is similar across the courses with
3 to 5 posts per student and 1 to 3 comments (i.e. replies to existing posts), but with P2P
showing a higher level of engagement than the other courses. One possible explanation is the
different target group of the different courses, with INTSE including a majority of professional
engineers with postgraduate qualifications, P2P focusing on high school students and teachers,
and LTTO targeting a broad base of teachers across different educational levels.
The type of activity is summarised in Figure 2. In the chart, the five categories refer to the
following: View corresponds to listing forums, threads and viewing posts; Post is the writing of
a post or start of a new thread; Comment is a reply to an existing post; Social refers to all
actions engaging directly with others’ status (up-vote, down-vote and looking at
profiles/reputation); Engage refers to the additional interaction with forum content (searching,
tagging, ‘watching’ or subscribing to posts or threads).
The viewing behaviour is the most prominent for both the student and instructor groups and the
figures are similar across the board. A two-way ANOVA (2x5, role by activity) on the percentage
distributions shows that there is no significant difference between students and instructors,
but there is an obvious difference between views and the other types of behaviour
(F(4,29) = 1656.3, p < .01).
If we consider the engagement over the timeline and compare the type of activities carried out
by students and instructors, Figure 3 (end of the paper) shows the patterns for the three
courses. The most striking pattern is that there doesn’t seem to be an obvious one. With regard
to posts and views, in all three courses there is a sense of synchronicity between the two
groups; however, from this chart it is not possible to understand in more detail the connections
between what students and teachers do.
Instructors’ comments are slightly offset, possibly as a reaction to students’ posts. An
interesting aspect is the amount of ‘social’ engagement in the P2P course, which merits further
analysis.

4. DIRECTIONS AND OPEN QUESTIONS
From this coarse analysis it is apparent that there seem to be minimal behavioural differences
in the way students and instructors interact in the different courses; however, more analysis is
required to tackle questions about the individual differences in students’ and instructors’
patterns of interaction and their interrelations. Furthermore, little can be said about how the
nature of interactions drives the development of communication and engagement. A number of
questions like the following remain open and unanswered: How do discussions develop over time?
How does teacher presence affect the development of discussions? Does the number of forums
affect how students engage with them (i.e. causing disorientation)?

       Figure 2. Distribution of forum activities by role

4.1 The DM and graph-based approaches
A possible way to answer the questions about the types/patterns of behaviours, the structure and
development of networks and the growth of groups/communities over time might be using data
mining and graph-based approaches. For example, Romero et al. [6] used a combination of
quantitative, qualitative and social network information about forum usage to predict students'
success or failure in a course by applying classification algorithms and classification via
clustering algorithms. In their approach the activity of students in the forums is organized
according to a set of commonly used quantitative metrics and a couple of measures borrowed from
Social Network Analysis (Table 2). Although this seems to be a promising approach, there are two
issues with this methodology in MOOCs: 1) only a tiny proportion of students can be considered
active and 2) it is hard to scale the instructor’s evaluation. The first problem is not easily
resolved and it is an issue in the literature reviewed [17, 18]; non-posting behavior is
considered as an index of disengagement, partly because this is easy to measure. In principle
the latter could be substituted by peer evaluation (up-vote, down-vote), but there is no easy
way to ensure consistency.

Indicator      Type           Description
Messages       Quantitative   Number of messages written by the student.
Threads        Quantitative   Number of new threads created by the student.
Words          Quantitative   Number of words written by the student.
Sentences      Quantitative   Number of sentences written by the student.
Reads          Quantitative   Number of messages read on the forum by the student.
Time           Quantitative   Total time, in minutes, spent on forum by the student.
AvgScoreMsg    Qualitative    Average score on the instructor's evaluation of the student's
                              messages.
Centrality     Social         Degree centrality of the student.
Prestige       Social         Degree prestige of the student.

Table 2. Possible indicators characterising forum engagement

An alternative that can be explored is the use of graph-based approaches. For example,
Bhattacharya et al. [19] used graph-based techniques to explore the evolution of software and
source branching, providing an insight into the process. Kruck et al. [20] developed GSLAP, an
interactive, graph-based tool for analyzing web site traffic based on user-defined criteria.
Kobayashi and Yung [18] used a method to quickly identify and track the evolution of topics in
large datasets using a mix of assignment of documents to time slices and clustering to identify
discussion topics. Yang et al. [21] integrated graph-based clustering to characterize the
emergence of communities and text-based analysis to portray the nature of exchanges. In fact,
students move in the various sub-forums taking different roles or stances as they engage with
different subsets of students. As the reasons to engage in these discussions are partly
determined by different interests, goals, and issues, it is possible to construct a social
network graph based on the post-reply-comment structure within threads. The network generated
provides a possible view of a student’s social participation within a MOOC, which may indicate
some detail about their values, beliefs and intentions.

Furthermore, Brown et al. [22] have already shown the value of exploring the communities in
discussion forums in MOOCs, particularly with regard to the homogeneity of performance but
dissimilarity of motivations characterizing student hubs.
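
As an illustration of the graph-based indicators discussed in Section 4.1, the following minimal
sketch builds a directed network from the post-reply-comment structure and derives the two
social indicators of Table 2. It is not the procedure used in this paper or in [6]: the input
format (pairs of replier and original author), the function names, and the reading of
‘centrality’ as normalised out-degree and ‘prestige’ as normalised in-degree are all assumptions
made for the example.

```python
from collections import defaultdict


def build_reply_graph(replies):
    """Build a weighted directed graph from (replier, author) pairs.

    An edge u -> v with weight w means that u commented w times on posts by v.
    The pair format is hypothetical; self-replies are ignored.
    """
    graph = defaultdict(lambda: defaultdict(int))
    nodes = set()
    for replier, author in replies:
        nodes.update((replier, author))
        if replier != author:
            graph[replier][author] += 1
    return graph, nodes


def degree_measures(graph, nodes):
    """Return (centrality, prestige) dictionaries keyed by participant.

    Assumed definitions: centrality = distinct people replied to / (n - 1),
    prestige = distinct people replying to this participant / (n - 1).
    """
    denom = max(len(nodes) - 1, 1)
    in_degree = {u: 0 for u in nodes}
    for targets in graph.values():
        for v in targets:
            in_degree[v] += 1
    centrality = {u: len(graph.get(u, {})) / denom for u in nodes}
    prestige = {u: in_degree[u] / denom for u in nodes}
    return centrality, prestige


# Toy data: three participants comment on s2's post, s2 answers the teacher.
sample_replies = [
    ("s1", "s2"),
    ("s3", "s2"),
    ("teacher", "s2"),
    ("s2", "teacher"),
    ("s1", "s1"),  # self-reply, ignored
]
sample_graph, sample_nodes = build_reply_graph(sample_replies)
sample_centrality, sample_prestige = degree_measures(sample_graph, sample_nodes)
```

Computing the same measures separately for each week of a course would give one possible starting
point for the time-based questions raised in Section 4, again under the assumptions stated above.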
4.2 Discussion points
The examples above provide evidence of the potential for using graph-based methods to obtain
better insights into the process and content analysis for our dataset and to extend its
applicability to MOOCs; however, there are a number of contentious points to raise which will
provide opportunities for discussion.
Firstly, the number of students who are actively involved in discussion is a very small
proportion of the active participants. This means that the subset may not be representative at
all. One could argue that these students are already engaged or desperately need help. Previous
literature [21, 22, 23] focused on the ability to predict performance and on the peer effect
which can emerge from the analysis of the graphs/social networks.
Secondly, one could question the value of the communities in xMOOCs: especially when courses are
designed with an instructivist approach leading to mastery, by definition this is an
individualistic perspective focused on the testing of one’s own skills/learning. Of course in
cMOOCs (connectivist by design) the development of social support is essential. This seems to be
supported by Brown et al. [22]: they were not able to uncover a direct relation between stated
goals and motivations and the participation in forums, and attributed this to pragmatic needs.
However, as the authors suggested earlier, the instructors might play a fundamental role in
shaping the communities based on the value attributed to forums in their plans/design and the
level of engagement/interaction. Considering the split between cMOOCs and xMOOCs again,
interesting work might come out of the experiment conducted by Rosé and colleagues in the
DALMOOC, in which automated agents were deployed to support students’ conversations. In Coursera
the deployment of ‘community mentors’ will be an interesting space to explore, given that the
importance of design seems to be removed from instructors in the ‘on-demand’ model.
Lastly, more research is needed in the time-based dimension of development of forums in MOOCs.
Questions like how students bond and create stable relations, how they become authoritative and
what motivates them to contribute over time are all open questions which the analysis of graphs
over time might be able to address.

5. REFERENCES
[1] Dirk Jan van den Berg and Edward Crawley. Why MOOCS Are Transforming the Face of Higher
    Education. Retrieved April 12, 2015 from
    http://www.huffingtonpost.co.uk/dirk-jan-van-den-berg/why-moocs-are-transforming_b_4116819.html
[2] Chris Parr. The evolution of Moocs. Retrieved April 12, 2015 from
    http://www.timeshighereducation.co.uk/comment/opinion/the-evolution-of-moocs/2015614.article
[3] C. Osvaldo Rodriguez. 2012. MOOCs and the AI-Stanford Like Courses: Two Successful and
    Distinct Course Formats for Massive Open Online Courses. European Journal of Open, Distance
    and E-Learning (January 2012).
[4] George Siemens. 2005. Connectivism: A learning theory for the digital age. International
    journal of instructional technology and distance learning 2, 1 (2005), 3–10.
[5] Stephen Downes. 2008. Places to go: Connectivism & connective knowledge, Innovate.
[6] Cristóbal Romero, Manuel-Ignacio López, Jose-María Luna, and Sebastián Ventura. 2013.
    Predicting students’ final performance from participation in on-line discussion forums.
    Computers & Education 68 (October 2013), 458–472.
    DOI: http://dx.doi.org/10.1016/j.compedu.2013.06.009
[7] Margaret Mazzolini and Sarah Maddison. 2007. When to jump in: The role of the instructor in
    online discussion forums. Computers & Education 49, 2 (September 2007), 193–213.
    DOI: http://dx.doi.org/10.1016/j.compedu.2005.06.011
[8] M.J.W. Thomas. 2002. Learning within incoherent structures: the space of online discussion
    forums. Journal of Computer Assisted Learning 18, 3 (September 2002), 351–366.
    DOI: http://dx.doi.org/10.1046/j.0266-4909.2002.03800.x
[9] Daniel F.O. Onah, Jane Sinclair, and Russell Boyatt. 2014. Exploring the use of MOOC
    discussion forums. In Proceedings of London International Conference on Education. London:
    LICE, 1–4.
[10] Alstete, J.W. and Beutell, N.J. Performance indicators in online distance learning courses:
     a study of management education. Quality Assurance in Education 12, 1 (2004), 6–14.
[11] Cheng, C.K., Paré, D.E., Collimore, L.-M., and Joordens, S. Assessing the effectiveness of
     a voluntary online discussion forum on improving students’ course performance. Computers &
     Education 56, 1 (2011), 253–261.
[12] Palmer, S., Holt, D., and Bray, S. Does the discussion help? The impact of a formally
     assessed online discussion on final student results. British Journal of Educational
     Technology 39, 5 (2008), 847–858.
[13] Patel, J. and Aghayere, A. Students’ Perspective on the Impact of a Web-based Discussion
     Forum on Student Learning. Frontiers in Education Conference, 36th Annual, (2006), 26–31.
[14] Jacqueline Aundree Baxter and Jo Haycock. 2014. Roles and student identities in online
     large course forums: Implications for practice. The International Review of Research in
     Open and Distributed Learning 15, 1 (January 2014).
[15] Vanessa Paz Dennen. 2008. Pedagogical lurking: Student engagement in non-posting discussion
     behavior. Computers in Human Behavior 24, 4 (July 2008), 1624–1633.
     DOI: http://dx.doi.org/10.1016/j.chb.2007.06.003
[16] René F. Kizilcec, Chris Piech, and Emily Schneider. 2013. Deconstructing Disengagement:
     Analyzing Learner Subpopulations in Massive Open Online Courses. In Proceedings of the
     Third International Conference on Learning Analytics and Knowledge. LAK ’13. New York, NY,
     USA: ACM, 170–179. DOI: http://dx.doi.org/10.1145/2460296.2460330
[17] Dennen, V.P. Pedagogical lurking: Student engagement in non-posting discussion behavior.
     Computers in Human Behavior 24, 4 (2008), 1624–1633.
[18] Kobayashi, M. and Yung, R. Tracking Topic Evolution in On-Line Postings: 2006 IBM
     Innovation Jam Data. In T. Washio, E. Suzuki, K.M. Ting and A. Inokuchi, eds., Advances in
     Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2008, 616–625.
[19] Bhattacharya, P., Iliofotou, M., Neamtiu, I., and Faloutsos, M. Graph-based Analysis and
     Prediction for Software Evolution. Proceedings of the 34th International Conference on
     Software Engineering, IEEE Press (2012), 419–429.
[20] Kruck, S.E., Teer, F., and Jr, W.A.C. GSLAP: a graph-based web analysis tool. Industrial
     Management & Data Systems 108, 2 (2008), 162–172.
[21] D. Yang, M. Wen, A. Kumar, E. P. Xing, and C. P. Rose, “Towards an integration of text and
     graph clustering methods as a lens for studying social interaction in MOOCs,” The
     International Review of Research in Open and Distributed Learning, vol. 15, no. 5, Oct.
     2014.
[22] R. Brown, C Lynch, Y. Wang, M. Eagle, J. Albert, T. Barnes, R. Baker, Y. Bergner, D.
     McNamara. Communities of performance and communities of preference. GEDM 2015, in press.
[23] M. Fire, G. Katz, Y. Elovici, B. Shapira, and L. Rokach, “Predicting Student Exam’s Scores
     by Analyzing Social Network Data,” in Active Media Technology, R. Huang, A. A. Ghorbani, G.
     Pasi, T. Yamaguchi, N. Y. Yen, and B. Jin, Eds. Springer Berlin Heidelberg, 2012, pp.
     584–595.

     Figure 3. Time sequence of activity in the forums in the three courses by students and instructors grouped by activity type