Dataset on an online collaborative learning situation in a computer networks course Cristina Villa-Torrano Pankaj Chejara Juan I. Asensio-Pérez Universidad de Valladolid Tallinn University Universidad de Valladolid cristina@gsic.uva.es pankajch@tlu.ee juaase@tel.uva.es Yannis Dimitriadis Miguel L. Bote-Lorenzo Alejandra Universidad de Valladolid Universidad de Valladolid Martínez-Monés yannis@tel.uva.es migbot@tel.uva.es Universidad de Valladolid amartine@infor.uva.es Eduardo Gómez-Sánchez Universidad de Valladolid edugom@infor.uva.es ABSTRACT rating. This paper presents a dataset of a collaborative learning sit- uation. Students were enrolled in two undergradute courses The literature has shown that students’ motivation and strate- on computer networks where they were required to carry out gic regulation play a critical role in their success in Science, a set of learning activities supported by Moodle and an on- Technology, Engineering and Mathematics (STEM) courses line collaborative environment called CoTrackV2. The data [2]. For example, in [8] the authors studied how motivation, collected includes logs of the writing process of shared doc- strategic self-regulation, and creative competency were as- uments, logs of the chat messages between the group mem- sociated with computational thinking knowledge and skills bers, and logs from Moodle with coarser-grained information in introductory computer science courses. They found that about course-level interactions. This dataset has been gen- student performance and long-term retention were positively erated with the aim of allowing researchers to study self- correlated with the use of self-regulated strategies. Con- and socially-shared regulation in online environments. cerning the motivation, higher pursuit of goals, and posi- tive affect were also correlated with high performers, higher Keywords knowledge retention, strategic self-regulation and engage- ment. Moreover, collaborative activities, especially those Computer-Supported Collaborative Learning, Socially-Shared including CSCL tools, have been shown to favor knowledge Regulation of Learning, Self-Regulated Learning building [7] and also can benefit from socially-shared reg- ulation in order to be successful. However, although there 1. INTRODUCTION are studies that show that it is necessary to develop regula- Academic and work contexts are increasingly demanding the tory processes while collaborating, further study is needed in competence of being able to collaborate with peers [5] as one STEM courses and, specifically, in computer science courses. of the 21st Century Skills [6]. In order to have a success- ful collaboration, many studies show that it is necessary to Concerning the latter, we have not found shared datasets develop regulatory processes where group members can acti- enabling the study of regulation in collaborative activities vate and maintain their cognition, motivation, and emotion in computer science. This is a challenging issue, because the towards their common goals [4]. This need is also present in study of regulation in collaborative learning settings requires computer science and engineering courses. Moreover, the use the collection and integration of a variety of data sources of Information Communication and Technology (ICT) tools like, for example, logs of different learning platforms, the to support collaboration (leading to Computer-Supported communication between group members, and self-reported Collaborative Learning settings (CSCL) [3]) enables the col- data. The absence of such a dataset led us to the need of lection of traces to model students’ behavior while collabo- generating one of them. Computer networks are part of the ACM Computer Curricula [1], and we had access to two courses on this topic. Therefore, we generated one dataset related to a learning situation on this subject, designed to fulfill the aforementioned requirements. Further details will be provided in the following sections. 2. CONTEXT AND DATASET Copyright ©2021 for this paper by its authors. Use permitted under Cre- ative Commons License Attribution 4.0 International (CC BY 4.0) 2.1 Description of the learning situation Table 1: Attributes provided in the document logs.csv file Attribute Description Example Timestamp Timestamp of the action 14:42:57 17-02-2021 Author Student ID a.I6ZFAmhSZ4KY2HU1 Group Group ID 1 How many access points do Char bank Characters added during this action you have throughout the hotel? Source length Length of the text before performing the action 2352 Type of operation (>: writing, Operation > <: deleting) The difference in number of characters caused Difference 56 by the current action and source length Have you contacted your Internet service Text Text from the document at the current time provider, i.e. your operator? [...] Table 2: Attributes provided in the chat logs.csv file Attribute Description Example Timestamp Timestamp of the action 22:15:18 16-02-2021 Author Student ID a.WCpdVcSKpEcVM13V Group Group ID 1 OK, let’s put a section of definitive questions at the end if Message Text message you want. Table 3: Attributes provided in the table moodle.csv file Attribute Description Example Timestamp Timestamp of the action 16/02/2021 22:40:00 The user ID, it can represent a student or a User ID a.WCpdVcSKpEcVM13V teacher If the teacher does an action it can involve other teachers or students. User involved a.I6ZFAmhSZ4KY2HU1 This attribute represents the user ID of the user involved. The section in Moodle in which the event Course: TRAFFIC ENGINEERING IN Event context occurred TELEMATIC NETWORKS (1-211-460-45033-1-2020) Component The type of resource in Moodle Questionnaire Event name The name of the event Course module viewed The user with id ’a.WCpdVcSKpEcVM13’ viewed Description The description of the action the ’resource’ activity with course module id ’974963’ The source from where Moodle has been Source web accessed and the action has been performed The IP address from which the action was IP Address 83.58.29.136 performed The learning situation took place at two undergraduate courses order to submit a common solution). The different activities on Computer Networks during 4 days in the spring semester were carried out during 4 two-hour face-to-face sessions. of the academic year 2021 in a European University. There were 33 students, that were grouped into 8 different groups The learning situation was based on the following scenario: of 4-5 people to carry out an introductory learning situa- A hotel owner (role played by the teacher) goes to a team tion aimed at challenging their previous knowledge and be- of telco engineers (role played by the students) to ask them liefs about certain computer network topics. Before starting to solve his problem: the internet connection is not work- the learning situation, students were asked to fill out an in- ing properly; the internet access is very slow and sometimes formed consent. does not work at all. The hotel owner and the telco engi- neers agree to an interview in a few days. In order for the The situation was designed following the so-called pyramid telco engineers to think about the problem, the hotel owner or snowfall pattern, where the students had to first carry sends them a diagram of the current network. The different out the proposed activities individually and then in groups activities that students needed to complete were: (thus fostering the agreement among the group members in • Questions ind (individual): Thinking of questions to 3. ANALYSIS ask the hotel owner to find out more about his network. The dataset we have generated may allow researchers to an- swer different research questions related to Socially-Shared • Questions group (in groups of 4-5 students): Agreeing Regulation of Learning (SSRL). For example, the research on 7 final questions to ask the hotel owner. questions that have guided the design of this learning situ- ation are the following: 1) How do self- and socially-shared • Questions class (whole class): Asking the hotel owner regulation processes occur in groups that complete group ac- about his network. For this task, there was a spokesper- tivities with different levels of success?; 2) Are there differ- son in each group. The teacher, playing the role of the ent patterns of regulation associated with the performance hotel owner, answered those questions posed by the of groups when solving activities? To answer these ques- groups. tions, we want to analyse the data from a temporal per- spective using different techniques, like process mining (e.g.: Heuristic Miner or Fuzzy Miner algorithms), Markov models • Diagnosis ind (individual): Proposing a solution to the (e.g.: pMiner algorithm), social network analysis (temporal hotel’s Internet access problem. networks) and epistemic network analysis. Beforehand, we want to identify SSRL features that allow us to map low- • Diagnosis group (in groups of 4-5 students): Agreeing level data to higher-level constructs. After that, we could on a final proposal with the rest of the group members. make use of the techniques mentioned above and compare the results of the different approaches. Beyond detecting the • Diagnosis class (whole class): Creating a concept map different processes, we would like to build predictive models of the technical concepts that emerged during the whole with the identified features. However, at this stage of the situation. research, it would be very beneficial to get feedback from the community to better guide the analysis. Students had to work through an online collaborative en- 4. ACKNOWLEDGMENTS vironment called CoTrackV21 . This environment offered This research is partially funded by the European Regional the possibility to write documents collaboratively and had Development Fund and the Spanish National Research Agency a built-in chat so that the different members of the group under project grant TIN2017-85179-C3-2-R. could communicate. In addition, students used Moodle to submit individual assignments, to visit subject-related con- tent and to access the link to the CoTrackV2 sessions, so we 5. REFERENCES [1] ACM and IEEE-CS. Computing Curricula 2020–CC were able to obtain traces of the content visited by the stu- 2020: Paradigms for Global Computing Education, A dents, the writing process and the chat messages. Besides Computing Curricula Series Report, 2020. these traces, at the end of the learning situation, the stu- dents answered a questionnaire related to group regulation. [2] N. R. Council. How students learn: History, mathematics, and science in the classroom. National Academies Press, 2004. 2.2 Description of the dataset [3] P. Dillenbourg, S. Järvelä, and F. Fischer. The The dataset collected2 is based on the logs of two differ- evolution of research on computer-supported ent tools: CoTrackV2 and Moodle. The data obtained by collaborative learning. In Technology-enhanced learning, CoTrackV2 is divided into 2 files: 1) document_logs.csv, pages 3–19. Springer, 2009. with actions from the writing process of the shared doc- [4] S. Järvelä, D. Gašević, T. Seppänen, M. Pechenizkiy, uments; and 2) chat_logs.csv, that contains the logs of and P. A. Kirschner. Bridging learning sciences, the communication between the group members. The at- machine learning and affective computing for tributes in the two files are presented in Table 1, and Table understanding cognition and affect in collaborative 2, respectively. Regarding the data obtained through Moo- learning. British Journal of Educational Technology, dle, we have 2 files: 1) moodle_logs.csv, with the logs of 51(6):2391–2406, 2020. the contents visited by the students. Details are given in [5] J. Malmberg, S. Järvelä, H. Järvenoja, and Table 33 ; and 2) individual_submissions.csv, where the E. Panadero. Promoting socially shared regulation of individual submissions for activities 1 and 4 are collected, learning in CSCL: Progress of socially shared regulation containing the timestamp of the submission, the id of the among high-and low-performing groups. Computers in student submitting the solution and the solution itself. Be- Human Behavior, 52:562–572, 2015. sides these files, we have two others: 1) a file containing the [6] A. J. Rotherham and D. Willingham. 21st century. learning design, including the start time and the name of Educational leadership, 67(1):16–21, 2009. the tasks; and 2) a file containing the students’ answers to [7] M. Scardamalia and C. Bereiter. Knowledge building. the final questionnaire. All files provided have been properly The Cambridge, 2006. anonymized. [8] D. F. Shell, M. P. Hazley, L.-K. Soh, E. Ingraham, and 1 S. Ramsay. Associations of students’ creativity, CotrackV2 website: https://www.cotrack.website/ motivation, and self-regulation with learning and 2 The dataset will be available at achievement in college computer science courses. In https://zenodo.org/record/5033198#.YNsQv-gzaUk 3 The examples presented in the different tables have been 2013 IEEE Frontiers in Education Conference (FIE), translated into English for a better understanding, but the pages 1637–1643. IEEE, 2013. dataset is in Spanish.