=Paper=
{{Paper
|id=Vol-1841/R07_124
|storemode=property
|title=Toward Criteria-Based Automatic Group Formation in MOOCs
|pdfUrl=https://ceur-ws.org/Vol-1841/R07_124.pdf
|volume=Vol-1841
|authors=Luisa Sanz-Martínez,Juan A. Muñoz-Cristóbal,Miguel L. Bote-Lorenzo,Alejandra Martínez-Monés,Yannis Dimitriadis
|dblpUrl=https://dblp.org/rec/conf/emoocs/Sanz-MartinezMB17
}}
==Toward Criteria-Based Automatic Group Formation in MOOCs==
<pdf width="1500px">https://ceur-ws.org/Vol-1841/R07_124.pdf</pdf>
<pre>
                                       Proceedings of EMOOCs 2017:
    Work in Progress Papers of the Experience and Research Tracks and Position Papers of the Policy Track


    Toward Criteria-Based Automatic Group Formation in
                         MOOCs

        Luisa Sanz-Martínez, Juan A. Muñoz-Cristóbal, Miguel L. Bote-Lorenzo,
                  Alejandra Martínez-Monés, and Yannis Dimitriadis

                   GSIC-EMIC Research Group, Universidad de Valladolid, Spain.
                                  luisa@gsic.uva.es


        Abstract. Effective use of Collaborative Learning in MOOC contexts faces many
        challenges. One of them is the possibility of creating groups according to a set
        of criteria, which current MOOC platforms do not support. This paper
        presents our work in progress on this problem. We introduce the design and
        initial results of an experiment in which groups formed according to homogeneous
        levels of activity are compared with randomly created control groups.
        The preliminary results provide initial evidence about the feasibility and potential
        advantages of using criteria-based group formation in MOOCs.

        Keywords: MOOCs, collaboration, CL, teams, group formation.


1       Introduction
The emergence and popularity of MOOCs have fostered many discussions in the
educational technology community regarding, among other issues, their instructional quality
and their high dropout rates [2]. Active learning and peer interaction can promote
students’ engagement [4], and collaboration can enrich learning through the
achievement of social and cognitive competences [9]. Therefore, many authors are
trying to include Collaborative Learning (CL) in MOOCs, and they have identified important
research challenges related to the promotion of social interactions that generate knowledge
[5].
     One of the challenges of including CL in MOOCs, given their massive and variable
scale, is the management of groups of students [10]. Moreover, in MOOCs, the notable
differences between the students’ engagement levels and their learning paces strongly
affect the composition and structure of teams. Furthermore, the teachers’ orchestration
tasks become more complex, and the amount of information they need in order to stay aware
of the groups’ progress increases significantly, so manual group management becomes
infeasible. All these reasons prompted us to investigate how we can support MOOC
teachers in managing groups to perform CL.
     A few MOOC hosting platforms incorporate features for group management (e.g.,
Canvas Network, NovoEd), but they only allow groups self-selected by the students,
random groups created automatically, or groups created manually by the instructors.
Hence, the first research objective we want to accomplish is to provide support for the
creation of criteria-based groups, so that teachers can select pedagogical or pragmatic
criteria such as those they would apply in a non-massive context. Furthermore, we
want to test how useful the information about the students’ activity registered in the
platform is for group formation. This type of dynamic data could reflect relevant
features of this context, such as the variable level of students’ engagement, the high
dropout rate, or the differences between learning paces [10].
    To reach this goal, our initial step has been to intervene in a MOOC using a
research prototype that creates groups based on data collected by the system about the
students’ registered activity. Then, the interactions and performance of the criteria-based
groups will be compared with those of the control groups (formed using the
random group creation feature provided by the platform). Such a study may allow us to
draw conclusions about whether it is advisable to use criteria-based groups in
MOOC contexts.
    The rest of the paper is organized as follows. First, the research design is
presented, including the experiment carried out and its preliminary results. Finally,
some provisional conclusions are presented together with short-term and longer-term
future work.


2         Research Design
2.1      Context: The TraduEco MOOC

The course is an introduction to translation from Spanish to English of economic and
financial texts. It was originally conceived as a seven-week instructor-led MOOC.
We formed a co-design team of instructors and researchers, which redesigned the
course to incorporate CL activities and to identify the challenges this entails [7].
As a result, a compulsory collaborative task was included in weeks four and six.
The task consists of extracting terminology from some given texts in teams of six
members. Each team has to create a group artifact including 20 economic or financial
English terms and their corresponding Spanish translations. The teams should use the
group forums to share opinions, discuss, and reach agreements in order to select the
terms and choose a spokesperson who will be in charge of submitting the task. Finally,
the activity is considered complete when each member has individually reviewed the
artifact produced by another team.
    The course was deployed on the Canvas Network platform and began on February 6.
At the time of writing this paper, 1025 students had enrolled, but only 909 remained
registered.
2.2      Methods

The primary research methodology adopted to conduct our work is based on the Design
Science Research Methodology (DSRM) [8]. The study reported in this paper is part of
the iterations defined in DSRM, and its main goal is to evaluate the initial ideas of
the proposal in order to improve them in the next iterations.
    We collected data from questionnaires, interviews and meetings with the MOOC’s
teachers to co-design the compulsory collaborative activity, which is the basis of the
grouping experiment. The Canvas LMS REST API provides us with the data needed to
analyze the experiment results. We will combine the quantitative data obtained from
the platform with a qualitative analysis of: (a) communications between teachers and
students in Canvas during the mandatory collaborative activity, and (b) a final student
satisfaction survey.
    We will analyze this information to find out the differences between the
experimental (criteria-based) and the control (random) groups (see Section 2.3)
regarding: (i) active teams, (ii) active participants per team, (iii) interactions within a
team, (iv) task completion rate, (v) student complaints, and (vi) student satisfaction
level. This analysis may provide initial evidence about the benefits and drawbacks of
using criteria-based teams to perform effective CL in MOOC contexts.
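    As an illustration of how indicators (i)-(iii) could be computed from forum data, the
Python sketch below counts messages and distinct posting members per team and tallies
teams by their number of active members, as in Figure 1 (right). It is a sketch under
stated assumptions, not the tool used in the study: the input format, the function
names team_metrics and active_member_histogram, and the rule that a member counts as
"active" once they post at least one message in their team forum are all hypothetical.
In practice, the records would be retrieved through the Canvas LMS REST API.

```python
from collections import Counter, defaultdict


def team_metrics(team_members, messages):
    """Per-team activity indicators from forum posts (hypothetical helper).

    team_members: dict mapping a team id to a list of student ids.
    messages: iterable of (team_id, author_id) pairs, one per forum post.
    Assumption: only posts written by members of the team are counted.
    Returns the number of active teams and a per-team summary.
    """
    posts = defaultdict(int)      # number of messages per team
    authors = defaultdict(set)    # distinct members who posted in each team
    for team_id, author_id in messages:
        if author_id in team_members.get(team_id, ()):
            posts[team_id] += 1
            authors[team_id].add(author_id)
    summary = {
        team_id: {"messages": posts[team_id],
                  "active_participants": len(authors[team_id])}
        for team_id in team_members
    }
    active_teams = sum(1 for s in summary.values() if s["messages"] > 0)
    return active_teams, summary


def active_member_histogram(summary):
    """Number of teams with 0, 1, 2, ... active members (cf. Figure 1, right)."""
    return Counter(s["active_participants"] for s in summary.values())


teams = {"A": ["s1", "s2", "s3"], "B": ["s4", "s5", "s6"]}
forum_posts = [("A", "s1"), ("A", "s2"), ("A", "s1"), ("B", "teacher")]
n_active, summary = team_metrics(teams, forum_posts)
print(n_active, summary, active_member_histogram(summary))
```

Posts by non-members (for example, a teacher answering in a team forum) are ignored in
this sketch; whether such posts should count as team interactions is a design decision
the analysis would have to make explicit.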

2.3      The experiment

The learning design of the course includes, in the fourth week, a mandatory
collaborative activity that has to be performed in groups of six members. Our
experiment consists of the automatic creation of teams using homogeneous criteria over
the students’ activity, and their comparison with a baseline of random teams used as a
control group.
     Several decisions conditioned the development of the experiment. One of
the most important was the selection of the criteria to be used for creating the
experimental groups. We used dynamic factors (i.e., data from the activity of the
students in the platform) to address our research question regarding the relevance of
these data to reflect some peculiarities of the context (i.e., the variable engagement
level). Therefore, we chose three variables to cover three aspects of the
students’ engagement level: (i) page views, as a reflection of their activity, (ii)
submitted tasks (both mandatory and optional), as a measure of their commitment, and
(iii) messages posted on discussion forums, to reveal their active participation [3].
Another major decision was to apply homogeneity over the criteria instead of
heterogeneity. The underlying reason was that, taking into account the group size (six
members) and the MOOC completion rates reported in the literature (5-15%), heterogeneity
over students’ activity criteria could be very similar to random grouping (a feature
already available in the Canvas platform) and could result in many teams with only one
active student.
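    The argument about random or heterogeneous grouping can be made concrete with a
back-of-the-envelope binomial model. Assuming, hypothetically, that each of the six
members of a randomly formed team is active independently with a probability p equal to
the completion rate (5-15%), the sketch below computes the probability that a team has
no active member or exactly one, and the share of teams with any activity that contain
a single active student. The function names are illustrative.

```python
from math import comb


def prob_k_active(team_size, p_active, k):
    """Binomial probability that exactly k of team_size members are active,
    assuming each member is active independently with probability p_active."""
    return comb(team_size, k) * p_active**k * (1 - p_active) ** (team_size - k)


def share_single_active(team_size, p_active):
    """Among teams with at least one active member, the share with exactly one."""
    at_least_one = 1 - prob_k_active(team_size, p_active, 0)
    return prob_k_active(team_size, p_active, 1) / at_least_one


# Completion rates of 5-15% used as a rough proxy for the probability of being active.
for p in (0.05, 0.10, 0.15):
    print(f"p={p:.2f}  no active: {prob_k_active(6, p, 0):.3f}  "
          f"one active: {prob_k_active(6, p, 1):.3f}  "
          f"single-active share of active teams: {share_single_active(6, p):.3f}")
```

For p = 0.10, about 53% of six-member teams would have no active member and about 35%
exactly one, so roughly three out of four teams with any activity would contain a single
active student. This idealized model ignores correlations in real activity data and only
illustrates the order of magnitude behind the design decision.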
     For the composition of the control group, we chose random grouping because it can
be performed automatically in Canvas and guarantees that all students will be included
in a group. However, the fact that in our approach the students with a no-show activity
profile [1] were clustered together could be a big advantage over the random teams,
where the no-show students would be spread across all teams. Therefore, we decided to
strengthen the baseline in order to obtain richer conclusions about the advantages of
using a criteria-based approach for grouping. Hence, in the control group, we grouped
together the students with zero page views before creating the random teams.
     The algorithm selected for implementing the homogeneous grouping was k-means
clustering, because it is a well-known, effective technique that works with big datasets
[11]. We combined it with a balancing algorithm to obtain clusters with exactly the
same number of members (a same-size k-means variation, see footnote 1).
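    The following sketch illustrates standardization followed by a capacity-constrained
k-means, in the spirit of the same-size k-means variation referenced in footnote 1. It
is a simplified heuristic written for illustration, not the ELKI implementation
described in that tutorial: each iteration assigns students greedily, closest
(student, centroid) pair first, to the nearest centroid that still has room, and then
recomputes the centroids as in ordinary k-means. The function names, the iteration
limit and the random initialization are assumptions.

```python
import math
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every column (page views, submitted tasks, forum messages)."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]  # 1.0 guards zero variance
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_size_kmeans(points, size=6, iters=20, seed=0):
    """Capacity-constrained k-means heuristic (illustrative, not ELKI's algorithm).

    Returns one cluster label per point. No cluster exceeds `size` members, and
    when len(points) is a multiple of `size` every cluster has exactly `size`.
    """
    n = len(points)
    k = math.ceil(n / size)
    rng = random.Random(seed)
    centroids = [list(points[i]) for i in rng.sample(range(n), k)]
    labels = None
    for _ in range(iters):
        capacity = [size] * k
        new_labels = [None] * n
        # Consider every (point, centroid) pair, closest pairs first.
        pairs = sorted((sq_dist(p, c), i, j)
                       for i, p in enumerate(points)
                       for j, c in enumerate(centroids))
        for _, i, j in pairs:
            if new_labels[i] is None and capacity[j] > 0:
                new_labels[i] = j
                capacity[j] -= 1
        if new_labels == labels:
            break  # assignments unchanged: stop early
        labels = new_labels
        # Standard k-means update: each centroid moves to the mean of its members.
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            centroids[j] = [mean(col) for col in zip(*members)]
    return labels
```

Each input row would hold one student's page views, submitted tasks and forum messages,
so a call such as same_size_kmeans(standardize(rows)) yields a team label per student.
Because the total capacity is at least the number of students, every student is always
assigned, and no team is left empty.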
     The experiment was carried out in the following steps:
     - Data preprocessing. Prior to the clustering process, the data were standardized in
order to assign the same weight to the three selected variables (page views had a much
larger magnitude than the other two), as recommended in [6].
     - Finding out the statistical distribution of the selected variables (page views, tasks
submitted and forum messages). We used the Kolmogorov-Smirnov and the
D’Agostino-Pearson tests, which indicated a non-Gaussian distribution for all three variables.
     - Creation of two subsets (the experimental group and the control group), checking
that they were equivalent regarding the variables used as grouping criteria. Because the
variables are not normally distributed, a Wilcoxon test was selected to verify that the
subsets did not differ regarding these variables. The array of students was shuffled
and split into two equal-size subsets until the Wilcoxon test returned a p-value greater
than 0.5 for each of the three variables used as grouping criteria (if p < 0.05, the
samples would be considered different at the 95% confidence level; if p >= 0.05, we
cannot state that the samples differ; we required p > 0.5 as a stricter condition on
the similarity between the samples).
     - Creation of the teams in the control group. Firstly, students with zero page views
were grouped together and then, the rest of the students in the control group were
distributed randomly in six-member teams.
     - Creation of the teams in the experimental group. The selected clustering
algorithms were used to obtain clusters of six members based on homogeneity in the
three standardized variables.
     - Monitoring of teams’ activity. We retrieved data about: (i) the number of messages in
each group discussion forum, (ii) the number of different participants in each team, and
(iii) the teams that completed the task submission.
     - Analysis of gathered data. Quantitative data about the students’ activity, and
qualitative data collected from students’ messages and a final satisfaction survey, will
serve to draw conclusions about the potential advantages of homogeneous-activity
criteria-based teams.
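    The subset-creation step can be sketched as follows. Because the two subsets are
independent samples, we assume that the Wilcoxon test meant here is the rank-sum variant
(equivalent to the Mann-Whitney U test); the sketch implements its two-sided normal
approximation with a tie correction using only the standard library, and reshuffles until
every variable yields p > 0.5. The function names, the input format and the retry limit
are hypothetical; in practice a library routine such as SciPy's scipy.stats.mannwhitneyu
would normally be preferred.

```python
import math
import random
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])          # rank sum of the first sample
    mu = n1 * (n + 1) / 2
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        return 1.0                               # all values identical
    z = (w - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))


def balanced_split(features, threshold=0.5, max_tries=10_000, seed=0):
    """Shuffle and halve the students until no grouping variable differs.

    features: dict mapping a student id to a tuple of variables
    (page views, submitted tasks, forum messages). Hypothetical format.
    """
    rng = random.Random(seed)
    ids = list(features)
    n_vars = len(next(iter(features.values())))
    half = len(ids) // 2                          # sizes differ by one if the count is odd
    for _ in range(max_tries):
        rng.shuffle(ids)
        a, b = ids[:half], ids[half:]
        ps = [rank_sum_p([features[s][v] for s in a], [features[s][v] for s in b])
              for v in range(n_vars)]
        if min(ps) > threshold:
            return a, b, ps
    raise RuntimeError("no split reached the required p-values")
```

The tie correction matters in this setting, because many students share the same
values (for instance, zero forum messages), which would otherwise inflate the variance
estimate of the rank sum.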
2.4       Preliminary Results

At the time of writing this paper, there were 18 experimental vs. 39 control teams with
registered activity. The total number of messages registered in the homogeneous-activity
teams was 167, versus 143 in the random teams (roughly 9.3 vs. 3.7 messages per active
team). Figure 1 (left) shows that there are fewer active teams in the experimental group,
but they exchange a high number of messages. Figure 1 (right) shows 64 non-active
experimental teams versus 42 non-active control teams, because in the experimental group
the students with a very low level of activity during the course were grouped together.
Furthermore, this figure also shows that the number of teams with a single active
student is more than four times higher in the control group than in the
homogeneous-activity one. The number of teams with two active participants is also much
higher in the control group (12 versus 2). However, the number of teams with more than
two active participants is greater (or equal in the case of four participants) in the
homogeneous-activity group. We can also observe that fully active teams (with five or
six active members) appear only in the experimental group.

1 https://elki-project.github.io/tutorial/same-size_k_means


 Fig. 1. Messages exchanged in the teams’ forums (left side) and number of teams with a certain
                           number of active members (right side)


3       Conclusions and Future Work

Due to the dispersion of active students in the control group, we can observe a higher
number of active teams in it, but many of them are teams with only one active
participant. The number of teams with an isolated participant is more than four times
higher in the random group than in the homogeneous-activity one. Taking into account that
we decided to segregate the students with zero page views in the control group to
strengthen the baseline, this result suggests that our approach presents advantages
regarding student isolation. Moreover, teams with five or six active members appear only
in the experimental group, and the interactions and messages exchanged within them are
more numerous. Therefore, at the moment of writing this paper, the preliminary results
suggest that there is more collaboration in the experimental groups than in the control
groups.
    In the short term, our work focuses on supporting the experiment and gathering data
while the fourth-week activity is taking place. Then, we will repeat the experiment in the
sixth week in order to compare and analyze the evolution of the data and the results. In
the long term, we plan new DSRM iterations with an evolved version of the tool prototype,
including different types of grouping criteria, and new experiments to evaluate it.


Acknowledgements

This research has been partially supported by the Junta de Castilla y León, Spain
(VA082U16) and Ministerio de Economía y Competitividad, Spain (TIN2014-53199-C3-2-R).
The authors thank the rest of the GSIC/EMIC research team as well as the Canvas team
for their valuable ideas and support.

References
1. Alario-Hoyos, C., Pérez-Sanagustín, M., Delgado-Kloos, C., Parada-G., H.A.,
    Muñoz-Organero, M.: Delving into participants’ profiles and use of social tools in MOOCs.
    IEEE Transactions on Learning Technologies 7(3), 260-266 (2014)
2. Dillenbourg, P., Fox, A., Kirchner, C., Wirsing, M.: Massive Open Online Courses: Current
    State and Perspectives. Tech. Rep. 1 (2014)
3. Ferguson, R., Clow, D., Beale, R., Cooper, A.J., Morris, N., Bayne, S., Woodgate, A.:
    Moving through MOOCS: Pedagogy, learning design and Patterns of Engagement. In:
    Proceedings of the 10th European Conference on Technology Enhanced Learning (EC-TEL
    2015), 15-18 September 2015, vol. 9307, pp. 70-84. Springer Verlag, Toledo, Spain (2015)
4. Hew, K.F.: Promoting engagement in online courses: What strategies can we learn from
    three highly rated MOOCS. British Journal of Educational Technology 47(2), 320-341
    (2016)
5. Manathunga, K., Hernández-Leo, D.: Has Research on Collaborative Learning Technologies
    Addressed Massiveness? A Literature Review. Educational Technology & Society 4522, 1-
    14 (2015)
6. Mohamad, I.B., Usman, D.: Standardization and its effects on K-means clustering algorithm.
    Research Journal of Applied Sciences, Engineering and Technology 6(17), 3299-3303
    (2013)
7. Ortega-Arranz, A., Sanz-Martínez, L., Álvarez-Álvarez, S., Muñoz-Cristóbal, J.A., Bote-
    Lorenzo, M.L., Martínez-Monés, A., Dimitriadis, Y.: From Low-Scale to Collaborative,
    Gamified and Massive-Scale Courses: Redesigning a MOOC. In: Proceedings of the 5th
    European MOOCs Stakeholders Summit (eMOOCs 2017) (2017)
8. Peffers, K., Tuunanen, T., Rothenberger, M.A., Chatterjee, S.: A design science research
    methodology for information systems research. Journal of Management Information
    Systems 24(3), 45-77 (2007)
9. Roschelle, J., Teasley, S.D.: The construction of shared knowledge in collaborative problem
    solving. In: O'Malley, C. (ed.) Computer-Supported Collaborative Learning, pp. 69-97
    (1995)
10. Sanz-Martínez, L., Dimitriadis, Y., Martínez-Monés, A., Alario-Hoyos, C., Bote-Lorenzo,
    M.L., Rubia-Avi, B., Ortega-Arranz, A.: Influential factors for managing virtual groups in
    massive and variable scale courses. In: 2016 International Symposium on Computers in
    Education (SIIE). pp. 1-4 (2016)
11. Wen, M.: Investigating Virtual Teams in Massive Open Online Courses: Deliberation-based
    Virtual Team Formation, Discussion Mining and Support. PhD thesis, Carnegie Mellon
    University (2016)

</pre>