Learning analytics tells: Know your basics and go to class

Elena Tiukhova1,*, Charlotte Verbruggen1, Bart Baesens1,2 and Monique Snoeck1
1 Research Centre for Information Systems Engineering, KU Leuven, Naamsestraat 69, 3000 Leuven, Belgium
2 Department of Decision Analytics and Risk, University of Southampton, University Road, SO17 1BJ, Southampton, UK

Abstract
Research on conceptual modeling education and learning analytics often lacks grounding in instructional design theories. Furthermore, the analysis of blended courses often neglects data about offline activities. This paper investigates the data of one conceptual modeling course whose design is grounded in Bloom's taxonomy and the 4C/ID model, and for which the intent to participate in on-campus lab sessions was tracked. The results demonstrate that attending the on-campus lab sessions has high predictive value for study success. In addition, using Bloom's cognitive levels confirms the value of organising assessment along these cognitive levels and offers perspectives for more efficient evaluation of conceptual modeling skills.

Keywords
Conceptual Modeling, Learning Analytics, Instructional Design

ER2023: Companion Proceedings of the 42nd International Conference on Conceptual Modeling: ER Forum, 7th SCME, Project Exhibitions, Posters and Demos, and Doctoral Consortium, November 06-09, 2023, Lisbon, Portugal
* Corresponding author.
elena.tiukhova@kuleuven.be (E. Tiukhova); charlotte.verbruggen@kuleuven.be (C. Verbruggen); bart.baesens@kuleuven.be (B. Baesens); monique.snoeck@kuleuven.be (M. Snoeck)
ORCID: 0000-0002-5050-9417 (E. Tiukhova); 0000-0003-0418-2633 (C. Verbruggen); 0000-0002-5831-5668 (B. Baesens); 0000-0002-3824-3214 (M. Snoeck)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

Teaching conceptual modeling to university students poses a complex challenge as it entails fostering essential competencies such as problem solving, system analysis, and abstract thinking [1]. To effectively cultivate these skills, a sophisticated course design is essential. Instructors must provide not only corrective feedback but also cognitive feedback on modeling solutions, enabling students to enhance their cognitive processes and reflect on the quality of their modeling [2]. Additionally, a shift towards a student-centered approach is required to promote active and motivated learning [3]. Lastly, achieving a deep understanding of modeling and the development of high-level cognitive skills requires technological support, which can manifest itself in various forms. On the one hand, employing modeling tools enriched with cognitive feedback can serve as a means to test the solutions developed, resulting in improved student performance [4]. On the other hand, incorporating online components into the course using an online course authoring platform empowers students to process the material at their own pace, thereby shifting the focus from passive to active learning [5].

Course design should be carefully crafted in order to meet the aforementioned requirements. To do so, instructors can rely on frameworks such as Bloom's taxonomy [6] and the Four-Component Instructional Design (4C/ID) model [7].
Unfortunately, research on conceptual modeling education lacks grounding in educational theories [8]. Learning analytics (LA) on student data can further inform teachers about the usage of course elements and their impact on study success, thus providing information about the effectiveness of their course design. However, in blended learning courses, data on offline activities is typically missing. The problem is worsened by the fact that LA often lacks grounding in educational design too.

This paper presents the analysis of student activity in a conceptual modeling course. The course's learning objectives and assessments (both formative and summative) are based on CaMeLOT, an educational framework that aligns learning outcomes with the cognitive levels of Bloom's taxonomy [9]. The design of the course has been grounded in the 4C/ID model, and the intent to attend the offline lab sessions has been recorded during five weeks. As such, the data collected for this course allows investigating the importance of features capturing in-person activities in predicting study success. Furthermore, the analysis can be related to instructional design elements, thus informing the teacher about the effectiveness of the course's design.

The remainder of the paper is structured as follows. Section 2 gives a general overview of the related conceptual modeling education literature. Section 3 describes the detailed research questions and methodology, while Section 4 presents and discusses the results. Finally, Section 5 highlights the main findings and outlines future research opportunities.

2. Related work

The intersection between conceptual modeling (CM) and learning analytics (LA) is an emerging area of cross-domain research, with only a few articles discussing the application of LA to CM education. LA, which involves the measurement, collection, analysis, and reporting of learner and contextual data, aims to enhance learning and optimize the environments in which it occurs [10]. LA is particularly valuable in the context of CM as it allows gaining deeper insights into the learning processes of novice modelers and optimizing these processes.

A prevalent research topic in CM education is modeling tool support, i.e., modeling functionalities which assist in learning or teaching CM [8]. A predominant LA approach in the CM domain is to collect the data generated while executing modeling tasks in these tools [8]. As an example, Sedrakyan et al. [11], Deeva et al. [12] and Claes et al. [13] analyse the event logs generated by the interactions with the modeling tool using process mining techniques to discover the relationship between modeling behavior and the final grade on the assignments. These studies, however, do not consider instructional design or learning objective scaffolds.

Integrating online components in courses offers the advantage of gathering data for LA. Even basic statistics from online platforms can aid teachers in providing study advice and improving the course. By using such data, possibly combined with evaluation surveys and assessment scores, student profiles can be created based on their behavior. Such profiles have repeatedly been shown to correlate with study performance, see e.g. [14, 15]. However, despite the wealth of LA research, it too often does not adequately consider the instructional conditions of the analysed course [16].
3. Methodology

To investigate the importance of offline activities in predicting study success, we answer the following research question:

RQ1: What is the relationship between various types of online and offline study activity and successful completion of the course?

Furthermore, we relate the results of the data analysis to the cognitive levels of Bloom's taxonomy by answering the following questions:

RQ2: How do achievements of learning objectives at different cognitive levels relate?

RQ3: How do achievements of learning objectives at different cognitive levels relate to the study activity level?

In order to address these research questions, we first utilize several data sources of one master-level course on conceptual modeling and engineer study indicators from these data to represent the students' self-regulation. Second, we use these data to cluster the students based on their study activity and analyse the distribution of the final exam grades with respect to these clusters and particular activity types. Third, we map the different parts of the summative evaluation to Bloom's taxonomy levels and analyse the performance on the exam with respect to these levels as well as to the study activity levels.

3.1. Course design

The AMMIS (Architecture and Modelling of Management Information Systems) course is a one-semester course taught in the Master's programs in Information Management and in Business and Information Systems Engineering at KU Leuven. The course has 13 teaching weeks and is designed in a blended learning format, with learning material on an online platform and live lectures and exercise sessions offered on campus. The course design is based on the Four-Component Instructional Design (4C/ID) model [7] (Figure 1a).

Figure 1: Instructional design frameworks: (a) the 4C/ID model (source: https://www.4cid.org); (b) Bloom's taxonomy.

The online component offers supportive information, learning tasks and part-task practices. The online supportive information consists of video lectures, slides, exercises with automated grading and a library of cases with their solutions (https://merode.econ.kuleuven.be/). This information is always available to learners throughout the course and is organised per task class: requirements analysis, structural modeling and behavioral modeling. Following the 4C/ID principles, the learning tasks are authentic tasks, building up in complexity throughout the course with a diminishing amount of support [7]. Part-task practices are implemented through online quizzes and aim at training "mechanical" skills: applying the modeling language syntax and using the modeling tool. Finally, throughout the entire semester students are able to post questions on an online discussion board.

The offline component of the course offers supportive information, learning tasks and just-in-time (JIT) information. Supportive information is provided through live lectures and the course book. Students can participate in eight in-person exercise sessions where they solve larger modeling cases related to that week's web lecture, using the modeling and prototyping tool. A key difference from performing the learning tasks individually online is that the in-person exercise sessions provide students with a presentation of the model solution, clarifying the solution path and frequent mistakes, and allow them to ask questions at any time during the lab. The exercise sessions, together with the automated feedback from the modeling tools, thus fulfill the "Just In Time" procedural information component of the 4C/ID model.
After the two most important chapters of the course (on class diagrams and state charts), students are given the option to complete two home assignments in the form of a complete modeling task, for which they receive both collective and individual feedback in the next lab session.

The online part-task practice exercises and formative quizzes can be further classified according to Bloom's taxonomy (Figure 1b, adapted from [6]). Some exercises and quizzes deal with remembering and understanding the notation, others deal with understanding and analyzing requirements, and with evaluating whether a model satisfies the given requirements. Finally, the modeling cases solved during the exercise sessions deal with creating a model for a given set of requirements. In this way, the course addresses all the different cognitive levels, rather than focusing predominantly on the create level as most conceptual modeling courses do [17].

Correspondingly, the exam of this course is also structured according to Bloom's taxonomy (Figure 1b). For this run of the course, Part 1 of the exam was planned during the last week of the semester and evaluated the students' capability for remembering and understanding the notation (Part 1.A), and for recognizing and analyzing requirements and evaluating models based on these requirements (Part 1.B). Both Part 1.A and Part 1.B were tested using multiple-choice questions (MCQ) and one open question in each part. The maximum possible score for Part 1 of the exam is 8 points, with 2.7 points for Part 1.A and 5.3 points for Part 1.B. Part 2 of the exam was organized during the exam period, approximately three weeks after Part 1. Only students who successfully passed Part 1 of the exam (≥ 4, without rounding) were allowed to take Part 2. This part consisted of two exercises: creating a class diagram and creating a set of state charts for a given class diagram. The maximum possible score for Part 2 of the exam is 12 points. A student passes when obtaining a rounded score of at least 10 on the sum of both parts.

3.2. Behavioral study data

By logging various aspects of students' interactions, the online platform enables tracking and analysis of their online engagement, which is recognized as a valuable indicator of motivated learning choices [18]. While MOOC technology allows for detailed logging, we focus on log data that is readily available to teachers. In particular, the grades for quizzes and an attempt indicator can easily be extracted to assess a student's activity in the online component of the course. We utilize the average grade on the quizzes (Avg. quiz score) and the total number of quizzes performed (#quizzes) as study indicators (the terms indicator and feature are used interchangeably) for the downstream LA task.

We view completing the home assignments as another behavioral expression of metacognitively guided study motivation. Therefore, we incorporate the grades from the two assignments (HA1 and HA2) as features in our LA task. Additionally, students were provided with an online practice test, made available during the Easter break, to assess their understanding of the course material and to prepare for Part 1 of the exam. The grade achieved in this test serves as another feature.
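As an illustration only, the online indicators described above could be derived from standard platform exports along the following lines. This is a minimal sketch assuming hypothetical file and column names (quiz_attempts.csv, grades.csv, student_id, etc.); it is not the actual data pipeline used for the course.

```python
import pandas as pd

# Hypothetical exports: one row per student per quiz attempt, and one row
# per student with home assignment and practice test grades.
quiz_attempts = pd.read_csv("quiz_attempts.csv")   # student_id, quiz_id, score
grades = pd.read_csv("grades.csv")                 # student_id, HA1, HA2, practice_test

# Online indicators: average quiz score and number of distinct quizzes attempted.
quiz_indicators = (
    quiz_attempts.groupby("student_id")
    .agg(avg_quiz_score=("score", "mean"),
         n_quizzes=("quiz_id", "nunique"))
    .reset_index()
)

# One indicator row per student; students without quiz activity get 0.
indicators = grades.merge(quiz_indicators, on="student_id", how="left")
indicators[["avg_quiz_score", "n_quizzes"]] = (
    indicators[["avg_quiz_score", "n_quizzes"]].fillna(0)
)
print(indicators.head())
```

The attendance indicator introduced below would simply be merged into the same per-student table.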
Finally, we could also observe the offline behavior of the students in the form of participation in the offline exercise sessions. In particular, after the first two exercise sessions, we noticed a drop in attendance. In order to anticipate the required space and number of teaching assistants, we asked students to register if they intended to attend the exercise session. We use this registration data (#exercise sessions) as another proxy of a motivated learning choice to master higher-level modeling skills, as only the live exercise sessions offered the JIT information component. Using the aforementioned indicators, which measure formative assessment scores and attempts, practice test scores, home assignment performance and attendance, is in line with their wide adoption in the LA community [19].

3.3. Clustering

Clustering plays a crucial role in student performance analysis as it can unveil study patterns in an unsupervised manner [20]. Among the various clustering algorithms, the non-hierarchical K-Means algorithm is the most widely adopted one in the LA community [21]. In line with the results of Lust et al. [14], Sher et al. [22] and the authors' teaching experience, we set the desired number of clusters to three, corresponding to three groups of students with different levels of study activity: highly active, moderately active (selective activity) and inactive students.

Because clustering is unsupervised and does not rely on study outcomes, it can be applied at different stages of the course. We therefore perform the clustering at several points during the course. In line with Van Goidsenhoven et al. [23], who demonstrated that student success can be accurately predicted as early as mid-course, we initiate our first clustering analysis using the data of week 7. It is important to note that the data for HA2 and the online practice test are not yet available at this point, and the other indicators are calculated based on the activities completed up to week 7. Given that the examination is divided into two parts, we identify the days prior to Exam Part 1 and Exam Part 2 as two additional time points for conducting the clustering analysis. At these points, we have access to both HA1 and HA2 scores, along with the score on the online practice test and the attendance records for all exercise sessions. The only difference lies in the activity on the online platform, which is calculated based on the activity and grades obtained on the online quizzes leading up to the days of Exam Part 1 and Exam Part 2, respectively.
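A minimal sketch of this clustering step is shown below. It assumes an indicator table like the one sketched in Section 3.2, extended with hypothetical n_exercise_sessions and final_grade columns; at week 7, only the indicators available by then would be included. The column names are illustrative and the snippet is not the exact analysis code behind the results.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Per-student indicator table (see the sketch in Section 3.2), extended with
# the attendance count and, after the exam, the final grade.
indicators = pd.read_csv("indicators.csv")

feature_cols = ["avg_quiz_score", "n_quizzes", "HA1", "HA2",
                "practice_test", "n_exercise_sessions"]
X = StandardScaler().fit_transform(indicators[feature_cols])

# Three clusters: highly active, moderately (selectively) active, inactive.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
indicators["cluster"] = kmeans.fit_predict(X)

# Cluster profiles (cf. Figure 2a-c) and their relation to the final exam
# grade (cf. Figure 2d-f); the grade is only inspected afterwards and is
# never used as a clustering feature.
print(indicators.groupby("cluster")[feature_cols].mean())
print(indicators.groupby("cluster")["final_grade"].describe())
```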
4. Results and Discussion

4.1. RQ1

Figure 2: Clustering solution: mean values per indicator for (a) week 7, (b) week 12, (c) course end, and boxplots of the final grade distribution per cluster for (d) week 7, (e) week 12, (f) course end.

Figure 2a shows the mean values of each indicator for each activity cluster discovered using the data of week 7. The green and yellow clusters are similar in terms of online activity. However, they are very different in terms of the HA1 grade and exercise session attendance. The red cluster represents the students who are inactive in both the online and offline components. Figure 2d shows how the detected clusters relate to the final exam grade. 75% of the students belonging to the green cluster obtain a pass grade (≥ 10). Surprisingly, the yellow and red clusters have almost identical distributions of the final grade, with approximately 50% of students passing and 50% failing the final exam.

These results highlight the importance of active participation in exercise sessions and home assignments early in the course: students who attend exercise sessions and complete home assignments are much more likely to pass. This is in line with the findings in [23], which accurately predicted study success as early as mid-course.

Figures 2b and 2e illustrate the clustering solution on the data of week 12. While the study patterns represented by the mean values of the indicators remain largely the same as in week 7, we see a greater difference in the distribution of the final exam grade: more than 50% of the students from the red cluster (inactive students) fail. This means that inactivity during the semester weeks correlates with failing the course. Almost 75% of the students with moderate activity (i.e., high activity on the online platform but low or no grades for the home assignments) pass the course, whereas the highly active students from the green cluster successfully complete the course (except for a few outlying cases). The difference between the yellow and green clusters lies in the minimum grades obtained: the students in the green cluster have a minimum of 9 (excluding the outliers), while the students in the yellow cluster have a minimum of 2.

The clustering solution built on the most complete data is shown in Figures 2c and 2f. The separation of the clusters along the final score dimension becomes even more apparent compared with the clustering solutions built on the data of weeks 7 and 12, with 75% of the inactive students (red cluster) failing the course.

The aforementioned findings highlight the importance of the in-person component in blended learning when it comes to teaching conceptual modeling. The inclusion of live exercise sessions within the course provided students with the exclusive opportunity to receive JIT feedback from the instructors, while simultaneously benefiting from all the other 4C/ID components. This comprehensive learning environment represents an optimal setting for acquiring knowledge. However, intrinsic motivation to attend the lab sessions in person still played a significant role. The students who used all aspects of the provided setting (green cluster) benefited the most, while the students not making use of the different course components (red cluster) mostly failed the course.

4.2. RQ2 & RQ3

In order to address RQ2, we map the learning objectives tested in the different parts of the exam onto Bloom's taxonomy (Figure 1b). The first analysis compares Part 1.A and Part 1.B of the exam, while the second analysis compares Part 1 as a whole with Part 2. As mentioned in Section 3.1, Part 1.A of the exam covers the basic concepts of the course that map onto the "Remember" and "Understand" levels of Bloom's taxonomy. Part 1.B of the exam tests the more complex skills of applying basic concepts, analyzing requirements and evaluating models. The scatter plots in Figures 3a and 3b display the scores the students obtained for Part 1.A (x-axis) and Part 1.B (y-axis) of the exam, while the color represents the activity cluster the students are classified into.

First, we look at the relationship between the exam scores obtained in Parts 1.A and 1.B of the exam. The horizontal purple line in Figures 3a and 3b represents the general passing threshold for Part 1.B (y = 5.3/2 = 2.65), while the blue line represents the passing threshold for Part 1 as a whole.
The vertical purple line corresponds to a score of 2/2.7 (≈ 75%) on Part 1.A and can be considered a threshold that "secures" passing Part 1.B of the exam and Part 1 as a whole. We can observe that the majority of the students who obtained a score ≥ 2 for Part 1.A of the exam succeed in Part 1.B, and almost all of these students pass Part 1 (except for three outlying cases). Obtaining less than 1.5/2.7 (≈ 55%) on Part 1.A results in failing Part 1.B and Part 1 as a whole (gray vertical line). The students scoring between 1.5 and 2 have an almost equal chance to pass or fail Part 1 of the exam. These findings imply that the ability to reason on a lower cognitive level (Part 1.A) affects the ability to reason on a higher cognitive level (Part 1.B), which is in line with the recommendation of the CaMeLOT framework to assess the knowledge of novice modellers step by step, according to the cognitive levels of Bloom's taxonomy [9].

Figure 3: Part 1.A vs. Part 1.B vs. activity cluster: (a) week 7; (b) week 12.

Figure 4: Part 1 vs. Part 2 vs. activity cluster: (a) week 7; (b) week 12; (c) course end.

Second, we assess the relationship between performance on Parts 1 and 2 of the exam. The horizontal purple line in Figures 4a-4c represents the general passing threshold for Part 2 (y = 12/2 = 6), while the blue line represents the passing threshold for the exam as a whole. The purple vertical line in Figures 4a-4c represents a threshold of 6/8 (75%) on Part 1. Here we observe the same pattern as for the Part 1.A vs. Part 1.B comparison: the majority of the students surpassing this threshold succeed in Part 2 of the exam, while all of them pass the exam as a whole. This supports the aforementioned findings and the usefulness of mapping the course's learning objectives to the cognitive levels of Bloom's taxonomy.

To address RQ3, we look at the relationship between the level of study activity and the scores obtained for the different parts of the exam. Already based on the activity in week 7, it becomes apparent that most of the highly active students obtain high scores on each part of the exam. This pattern becomes even stronger as the course progresses, and at the end of the course (Figure 4c) we can see a clear prevalence of highly and moderately active students in the passing region, while inactive students are mostly located in the failing region. When progressing from week 7 to the end of the course, one can notice that the numbers of red and green students shrink whereas the moderately active group grows. Nevertheless, moving from inactive to moderately active does not suffice to secure a passing grade. These findings strengthen the insights described in Section 4.1 about the importance of in-person teaching and intrinsic motivation.
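As an aside, the threshold analysis underlying Figures 3 and 4 can be reproduced directly from a table of exam scores. The following sketch assumes a hypothetical exam_scores.csv with columns part1A and part1B and applies the 1.5 and 2.0 cut-offs discussed above; it is an illustration, not the exact analysis performed for the paper.

```python
import pandas as pd

# Hypothetical exam-score table: one row per student with part1A (out of 2.7)
# and part1B (out of 5.3).
exam = pd.read_csv("exam_scores.csv")

# Bands around the 1.5 and 2.0 cut-offs on Part 1.A discussed above.
exam["band"] = pd.cut(exam["part1A"], bins=[0, 1.5, 2.0, 2.7],
                      labels=["< 1.5", "1.5-2.0", ">= 2.0"],
                      include_lowest=True)
exam["passes_part1B"] = exam["part1B"] >= 5.3 / 2              # 2.65 threshold
exam["passes_part1"] = (exam["part1A"] + exam["part1B"]) >= 4  # pass Part 1

# Share of students clearing each threshold within each Part 1.A band.
print(exam.groupby("band", observed=True)[["passes_part1B", "passes_part1"]].mean())
```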
5. Conclusion

In this paper, we demonstrate the value of offline activity as an essential instrument to provide JIT procedural information that significantly contributes to study success. The registration of offline activities in blended learning therefore contributes to additional insights through LA. Activity levels of students also clearly correlate with achievements at different cognitive levels. The results furthermore support the validity of CaMeLOT as a scaffolding of learning goals according to Bloom's taxonomy: when students do not achieve a minimum level of competence for lower-level learning objectives, one knows for sure they will fail the assessment of higher-level competences. This opens perspectives for more efficient evaluations: as lower cognitive levels are easier to check with multiple-choice and closed questions, automated grading of sub-exams can be used to assess these competences before assessing the higher cognitive levels with open questions. The results furthermore demonstrate the importance of grounding course design and LA in instructional design theories. Given the limitations of an analysis based on a single course run, the analysis will be replicated in future runs, which will hopefully further strengthen the conclusions.

References

[1] S. Strecker, U. Baumöl, D. Karagiannis, A. Koschmider, M. Snoeck, R. Zarnekow, Five inspiring course (re-)designs: Examples of innovations in teaching and learning BISE, Business & Information Systems Engineering 61 (2019) 241–252.
[2] E. Serral, M. Snoeck, Conceptual framework for feedback automation in SLEs, in: Smart Education and e-Learning 2016, Springer, 2016, pp. 97–107.
[3] G. D. Catalano, K. Catalano, Transformation: From teacher-centered to student-centered engineering education, Journal of Engineering Education 88 (1999) 59–64.
[4] G. Sedrakyan, S. Poelmans, M. Snoeck, Assessing the influence of feedback-inclusive rapid prototyping on understanding the semantics of parallel UML statecharts by novice modellers, Information and Software Technology 82 (2017) 159–172.
[5] M. Kaur, Blended learning: its challenges and future, Procedia - Social and Behavioral Sciences 93 (2013) 612–617.
[6] D. R. Krathwohl, A revision of Bloom's taxonomy: An overview, Theory into Practice 41 (2002) 212–218.
[7] J. J. Van Merriënboer, P. A. Kirschner, Ten steps to complex learning: A systematic approach to four-component instructional design, Routledge, 2017.
[8] K. Rosenthal, B. Ternes, S. Strecker, Learning conceptual modeling: structuring overview, research themes and paths for future research, in: ECIS, 2019.
[9] D. Bogdanova, M. Snoeck, CaMeLOT: An educational framework for conceptual data modelling, Information and Software Technology 110 (2019) 92–107.
[10] Y.-S. Tsai, What is learning analytics?, 2022. URL: https://www.solaresearch.org/about/what-is-learning-analytics/.
[11] G. Sedrakyan, M. Snoeck, J. De Weerdt, Process mining analysis of conceptual modeling behavior of novices: empirical study using JMermaid modeling and experimental logging environment, Computers in Human Behavior 41 (2014) 486–503.
[12] G. Deeva, M. Snoeck, J. De Weerdt, Discovering the impact of students' modeling behavior on their final performance, in: The Practice of Enterprise Modeling: 11th IFIP WG 8.1 Working Conference, PoEM 2018, Vienna, Austria, October 31 - November 2, 2018, Proceedings, Springer, 2018, pp. 335–350.
[13] J. Claes, I. Vanderfeesten, J. Pinggera, H. A. Reijers, B. Weber, G. Poels, A visual analysis of the process of process modeling, Information Systems and e-Business Management 13 (2015) 147–190.
[14] G. Lust, J. Elen, G. Clarebout, Students' tool-use within a web enhanced course: Explanatory mechanisms of students' tool-use pattern, Computers in Human Behavior 29 (2013).
[15] D. Bogdanova, M. Snoeck, Using MOOC technology and formative assessment in a conceptual modelling course: an experience report, in: Proceedings of the 21st ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, 2018, pp. 67–73.
[16] D. Gašević, S. Dawson, G. Siemens, Let's not forget: Learning analytics are about learning, TechTrends 59 (2015) 64–71.
[17] D. Bogdanova, M. Snoeck, Domain Modelling in Bloom: Deciphering How We Teach It, in: The Practice of Enterprise Modeling: 10th IFIP WG 8.1 Working Conference, PoEM 2017, Leuven, Belgium, November 22-24, 2017, Proceedings, Springer, 2017, pp. 3–17.
[18] P. H. Winne, R. S. Baker, et al., The potentials of educational data mining for researching metacognition, motivation and self-regulated learning, Journal of Educational Data Mining 5 (2013) 1–8.
[19] A. Ahmad, J. Schneider, D. Griffiths, D. Biedermann, D. Schiffner, W. Greller, H. Drachsler, Connecting the dots: a literature review on learning analytics indicators from a learning design perspective, Journal of Computer Assisted Learning (2022).
[20] A. Khan, S. K. Ghosh, Student performance analysis and prediction in classroom learning: A review of educational data mining studies, Education and Information Technologies 26 (2021) 205–240.
[21] A. Dutt, M. A. Ismail, T. Herawan, A systematic review on educational data mining, IEEE Access 5 (2017) 15991–16005.
[22] V. Sher, M. Hatala, D. Gašević, Analyzing the consistency in within-activity learning patterns in blended learning, in: Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, 2020, pp. 1–10.
[23] S. Van Goidsenhoven, D. Bogdanova, G. Deeva, S. v. Broucke, J. De Weerdt, M. Snoeck, Predicting student success in a blended learning environment, in: Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, 2020, pp. 17–25.