Generative Multimodal Analysis (GMA) for Learning Process Data Analytics

Ridwan Whitehead (ridwan.whitehead@oulu.fi), Learning and Educational Technology (LET) Research Lab, University of Oulu, Pentti Kaiteran katu 1, Oulu, Finland

Andy Nguyen (andy.nguyen@oulu.fi), Learning and Educational Technology (LET) Research Lab, University of Oulu, Pentti Kaiteran katu 1, Oulu, Finland

Sanna Järvelä (sanna.jarvela@oulu.fi), Learning and Educational Technology (LET) Research Lab, University of Oulu, Pentti Kaiteran katu 1, Oulu, Finland

ISSN: 1613-0073

Keywords: Generative Artificial Intelligence, Learning Analytics, Multimodal Data

This paper introduces Generative Multimodal Analysis (GMA), a novel method for applying Generative Artificial Intelligence (GenAI) to the analysis of multimodal data derived from learning processes. The method is encapsulated in a systematic framework that integrates and optimizes GenAI technology with multimodal large language models (MLLMs) for application in multimodal learning analytics. The recent emergence and advancement of GenAI, particularly MLLMs, has opened new avenues for the automated interpretation and meaningful analysis of varied data sources. Current research in the field has seen diverse applications of GenAI in transforming learning and teaching practice. However, there is a noticeable gap in systematic methodologies for applying GenAI to scrutinize learning process data. This paper aims to bridge this gap by proposing the GMA method for multimodal learning analytics with learning process data. In addition to the methodological framework, this study presents an operational prototype for the practical implementation of GMA. This prototype serves as a tool for examining multimodal data in learning processes. To demonstrate the applicability and effectiveness of the proposed method, we conducted and present a case study. Our approach offers essential guidance for learning scientists and educational technology application developers, reflecting the contemporary trends and needs in educational technologies. By providing a structured, innovative approach for employing GenAI in learning process data analysis, this study contributes to the advancement of learning analytics methods.

Introduction

The rapid evolution of educational technologies and learning sciences, particularly the strides in artificial intelligence (AI) technology, has heralded an era where data-driven insights have become valuable in enhancing learning processes. This intersection of AI with educational methodologies is reshaping how information is analyzed, interpreted, and applied to improve teaching strategies and learning outcomes. For instance, Luckin and Cukurova [1] highlighted the potential of AI to provide personalized learning experiences that adapt to individual learners' needs. Likewise, Holmes et al. [2] discussed the transformative role of AI in education, particularly in providing insights into learning patterns. Recently, Järvelä et al. [3] proposed a human-AI collaboration approach for better unfolding the learning processes in the context of socially shared regulation of learning. Despite these advancements, the application of AI in learning analytics has largely been fragmented. Current research predominantly focuses on isolated applications of AI in educational contexts [4], [5] and lacks a comprehensive methodology for systematic analysis. This gap is particularly apparent in the sphere of Generative Artificial Intelligence (GenAI), where its potential for learning process data analytics remains underexplored.

The advent of GenAI and its integration into educational contexts, particularly through multimodal large language models (MLLMs), represents a significant leap in the domain of educational technology. GenAI, a specialized subset of artificial intelligence that focuses on generating new content, has evolved rapidly over the past decade. A key development in this evolution has been the creation of sophisticated MLLMs, such as OpenAI's GPT (Generative Pre-trained Transformer) series, which exemplify the advancements in natural language processing and understanding. The release of ChatGPT, based on the GPT architecture, in November 2022 represented a significant milestone [6], [7]. ChatGPT, with its advanced language understanding and generation capabilities, offered a more interactive and intuitive way for users to engage with AI. Furthermore, GenAI opens new possibilities in content creation and data analysis. Accordingly, learning analytics, which encompasses the collection, measurement, analysis, and reporting of data about learners and their contexts, could greatly benefit from the integration of GenAI. This integration enables more nuanced extraction of insights from diverse learning process data, thereby enhancing our understanding of student learning behaviors and outcomes.

This paper addresses the identified gap in comprehensive methodologies for GenAI in multimodal learning analytics and proposes Generative Multimodal Analysis (GMA) as a structured approach to employ GenAI effectively in this field. Such a methodology is essential due to the growing complexity of learning environments and the variety of data they produce. As argued by Järvelä and Bannert [8], the integration of multimodal data analysis in educational research is crucial for a holistic understanding of learning processes. As an integral part of this study, we have developed a specialized software tool designed for GMA. This tool is intended to serve as a practical resource for researchers and educational developers, providing them with a robust and user-friendly platform to apply GMA methodologies in their work. The effectiveness and relevance of this software tool are demonstrated through its application in a case study. This implementation not only showcases the tool's functionality but also affirms GMA's efficacy and broad applicability in the field of educational research and development.

Generative Multimodal Analysis (GMA) for Learning Process Data Analytics

Generative Multimodal Analysis (GMA) represents a comprehensive methodological framework designed to transform the approach of researchers and analysts in educational settings. This framework leverages the capabilities of GenAI to expedite and enrich the process of extracting and integrating verbal and nonverbal elements from learning process data. Furthermore, GMA extends its integration to include material objects and the environment, capturing how they are utilized by learners engaged in active interaction within the learning context. By harnessing the power of GenAI, GMA facilitates a more profound and holistic understanding of the multifaceted nature of learning environments, where verbal communication, nonverbal cues, physical objects, and the surrounding environment all play integral roles in the learning process.

Figure 1 provides a visual depiction of the Generative Multimodal Analysis (GMA) Framework, illustrating how it leverages generative AI for educational research. Within this framework, generative AI produces various types of outputs, including 1) predefined events, 2) improvisational events, 3) detailed descriptions of the learning process, and 4) comprehensive descriptions of the learning context. Crucially, following the human-AI collaboration approach in research suggested by Järvelä, Nguyen and Hadwin [3], it is imperative that these AI-generated outputs are evaluated and validated by human researchers or analysts. This collaborative approach ensures that the insights offered by AI are reliable and grounded in human understanding and expertise.
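As a minimal sketch of the human validation step described above, the following Python snippet shows one way AI-generated outputs could be recorded and compared against human coding. The record schema (`GmaOutput`), its field names, and the choice of Cohen's kappa as the agreement measure are illustrative assumptions; the framework itself does not prescribe a particular data structure or statistic.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# The four output types named in the framework description.
OUTPUT_TYPES = (
    "predefined_event",
    "improvisational_event",
    "process_description",
    "context_description",
)


@dataclass
class GmaOutput:
    """One AI-generated output awaiting human review (hypothetical schema)."""

    output_type: str
    content: str          # AI-generated label or description
    start_s: float        # segment start, in seconds
    end_s: float          # segment end, in seconds
    human_label: Optional[str] = None  # filled in by a human analyst

    def __post_init__(self):
        if self.output_type not in OUTPUT_TYPES:
            raise ValueError(f"unknown output type: {self.output_type}")

    @property
    def validated(self) -> bool:
        """True once a human analyst has reviewed this output."""
        return self.human_label is not None


def cohens_kappa(ai_labels, human_labels):
    """Cohen's kappa between two equally long sequences of labels."""
    if len(ai_labels) != len(human_labels) or not ai_labels:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(ai_labels)
    # Proportion of segments on which the two coders agree.
    observed = sum(a == h for a, h in zip(ai_labels, human_labels)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    ai_freq, human_freq = Counter(ai_labels), Counter(human_labels)
    expected = sum(ai_freq[c] * human_freq[c] for c in ai_freq) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one and the same label throughout
    return (observed - expected) / (1 - expected)
```

Agreement would be computed only over outputs for which `validated` is true, by pairing each record's `content` with its `human_label`.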

Furthermore, the information generated through the GMA Framework can be dissected and examined through various analytical lenses. These include a) process-oriented analysis, which focuses on the dynamics and phases of the learning process; b) quantitative modeling, offering a statistical perspective and uncovering patterns and correlations; and c) qualitative inquiry, which delves into the deeper, nuanced aspects of the learning environment and experiences. This multifaceted analytical approach allows for a comprehensive and multi-dimensional understanding of the learning process, harnessing the strengths of both AI and human analysis.
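To make the process-oriented and quantitative lenses more concrete, the sketch below summarizes a sequence of validated, time-stamped events in two ways: total time spent in each state, and counts of transitions between consecutive states. The event tuple layout and the state names are hypothetical and serve only as an illustration.

```python
from collections import Counter


def state_durations(events):
    """Total seconds per state from (state, start_s, end_s) tuples."""
    totals = Counter()
    for state, start_s, end_s in events:
        totals[state] += end_s - start_s
    return dict(totals)


def transition_counts(events):
    """Count transitions between consecutive, differing states in time order."""
    ordered = sorted(events, key=lambda e: e[1])  # sort by start time
    counts = Counter()
    for (prev, _, _), (nxt, _, _) in zip(ordered, ordered[1:]):
        if prev != nxt:
            counts[(prev, nxt)] += 1
    return dict(counts)


# Hypothetical validated events for one learner: (state, start_s, end_s).
events = [
    ("leaning_forward", 0.0, 12.0),
    ("upright", 12.0, 30.0),
    ("leaning_forward", 30.0, 41.0),
    ("leaning_back", 41.0, 50.0),
]

durations = state_durations(events)
transitions = transition_counts(events)
```

Durations feed naturally into quantitative models, while the transition counts give a simple view of how states follow one another over time; qualitative inquiry would instead return to the underlying descriptions and recordings.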

As an illustration, Figure 2 demonstrates how GMA is applied to automatically extract specific pre-defined events. In this case, the focus is on identifying and analysing the non-verbal posture states of learners engaged in a collaborative learning setting. The figure shows how GMA discerns and categorizes non-verbal cues, particularly the postures of learners, which play a significant role in understanding engagement, interaction dynamics, and the overall effectiveness of collaborative learning processes. This example highlights the analytical power of GMA in recognizing and interpreting subtle yet crucial aspects of learner behaviour in collaborative learning.

Figure 3 shows the user interface of our newly developed Generative Multimodal Analysis (GMA) Toolkit for describing video data. This interface is designed to facilitate direct interaction with video data, particularly observational data from learning processes. The example in the figure illustrates an analysis generated by the toolkit from a video capturing a collaborative learning session. It highlights the toolkit's capabilities in processing and interpreting complex, real-time learning environments, thereby offering valuable insights into the dynamics of collaborative learning.
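The extraction of pre-defined events such as posture states can be sketched as a prompt-and-parse loop around a multimodal model. The snippet below is an assumption-laden illustration, not the toolkit's implementation: the posture categories, the prompt wording, the JSON reply format, and the `call_model` callable (a stand-in for whatever MLLM client is used) are all hypothetical.

```python
import json

# Hypothetical coding scheme; the actual posture states are not specified here.
POSTURE_STATES = ("leaning_forward", "upright", "leaning_back")


def build_prompt(states, n_learners):
    """Compose an instruction asking for one posture state per learner."""
    return (
        f"You are coding a video frame of {n_learners} learners in a "
        "collaborative task. For each learner, choose exactly one posture "
        f"state from: {', '.join(states)}. "
        'Reply with JSON only, e.g. {"learner_1": "upright"}.'
    )


def parse_reply(reply_text, states, n_learners):
    """Validate the model's reply; return a dict of learner -> state."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    expected_keys = {f"learner_{i}" for i in range(1, n_learners + 1)}
    if set(data) != expected_keys:
        raise ValueError(f"expected keys {sorted(expected_keys)}")
    for learner, state in data.items():
        if state not in states:
            raise ValueError(f"{learner}: unknown state {state!r}")
    return data


def code_frame(frame_bytes, call_model, states=POSTURE_STATES, n_learners=3):
    """Code one frame. `call_model(prompt, image_bytes) -> str` is a
    hypothetical stand-in for a real multimodal model client."""
    prompt = build_prompt(states, n_learners)
    reply = call_model(prompt, frame_bytes)
    return parse_reply(reply, states, n_learners)


def fake_model(prompt, image_bytes):
    """Canned reply so the sketch runs without any external service."""
    return json.dumps(
        {"learner_1": "upright", "learner_2": "leaning_back", "learner_3": "upright"}
    )


coded = code_frame(b"", fake_model)
```

Restricting the model to a closed list of states and rejecting any reply outside it keeps the output machine-checkable, which in turn makes the subsequent human validation step easier to carry out.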

Discussion and Future Research Directions

Generative Multimodal Analysis (GMA) represents a groundbreaking approach in the field of learning analytics, addressing the complexities of learning processes through the integration of generative AI and multimodal data. This approach is particularly relevant given the diverse nature of learning environments and the myriad forms of data they generate.

In learning analytics, the focus is traditionally on quantifiable data such as test scores, completion rates, and engagement metrics [9], [10]. However, the advent of GMA heralds a shift towards a more nuanced understanding of the learning process. By incorporating GenAI with MLLMs, GMA can seamlessly interpret and synthesize vast and varied datasets, including textual, auditory, and visual inputs, which are often overlooked in conventional analytics models. The inclusion of multimodal data is crucial for a comprehensive understanding of learning dynamics. As indicated by research in the field of educational technology, learning is not a unidimensional process but a complex interplay of cognitive, emotional, and social factors [11]. GMA's ability to analyze multimodal data allows for insights into these dimensions, offering a holistic view of the learning experience.

Future research endeavors should focus on conducting systematic examinations of the various components and procedural steps crucial for effective implementation of GMA. This direction of inquiry is essential to establish a set of clear, well-defined guidelines that can assist researchers in optimally employing GMA methodologies. Additionally, given the ethical complexities surrounding AI in education [12], there is a pressing need for comprehensive research aimed at establishing practical ethical guidelines for the application of GenAI methodologies in educational research. Such guidelines would not only streamline the application of GMA across diverse educational settings but also ensure that its integration into learning analytics is both efficient and impactful. By laying out these parameters,

Figure 1: Generative Multimodal Analysis (GMA) Framework

Figure 2: Example of GMA for detecting pre-defined events

Figure 3: Generative Multimodal Analysis (GMA) Toolkit for Video Data

Acknowledgements

This research has been funded by the Research Council of Finland (also known as the Academy of Finland), grant 350249, and the University of Oulu profiling project Profi7 Hybrid Intelligence (352788).

References

[1] R. Luckin, M. Cukurova, Designing educational technologies in the age of AI: A learning sciences-driven approach, British Journal of Educational Technology 50(6), Nov. 2019. doi:10.1111/bjet.12861

[2] W. Holmes et al., Ethics of AI in Education: Towards a Community-Wide Framework, Int J Artif Intell Educ, Apr. 2021. doi:10.1007/s40593-021-00239-1

[3] S. Järvelä, A. Nguyen, A. Hadwin, Human and artificial intelligence collaboration for socially shared regulation in learning, British Journal of Educational Technology, 2023.

[4] H. Crompton, D. Burke, Artificial intelligence in higher education: the state of the field, International Journal of Educational Technology in Higher Education 20(1), Apr. 2023. doi:10.1186/s41239-023-00392-8

[5] O. Zawacki-Richter, V. I. Marín, M. Bond, F. Gouverneur, Systematic review of research on artificial intelligence applications in higher education - where are the educators?, International Journal of Educational Technology in Higher Education 16(1), Oct. 2019. doi:10.1186/s41239-019-0171-0

[6] Y. K. Dwivedi, "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy, International Journal of Information Management 71, 102642, Aug. 2023. doi:10.1016/j.ijinfomgt.2023.102642

[7] E. Kasneci, ChatGPT for good? On opportunities and challenges of large language models for education, Learning and Individual Differences 103, 102274, Apr. 2023. doi:10.1016/j.lindif.2023.102274

[8] S. Järvelä, M. Bannert, Temporal and adaptive processes of regulated learning - What can multimodal data tell?, Learning and Instruction 72, 101268, Apr. 2021. doi:10.1016/j.learninstruc.2019.101268

[9] R. S. Baker, Challenges for the Future of Educational Data Mining: The Baker Learning Analytics Prizes, Journal of Educational Data Mining 11(1), Jun. 2019. doi:10.5281/ZENODO.3554745

[10] A. Nguyen, L. Gardner, D. Sheridan, Data Analytics in Higher Education: An Integrated View, Journal of Information Systems Education 31(1), Jan. 2020.

[11] A. Nguyen, S. Järvelä, C. Rosé, H. Järvenoja, J. Malmberg, Examining socially shared regulation and shared physiological arousal events with multimodal learning analytics, British Journal of Educational Technology 54(1), 2023. doi:10.1111/bjet.13280

[12] A. Nguyen, H. N. Ngo, Y. Hong, B. Dang, B.-P. T. Nguyen, Ethical principles for artificial intelligence in education, Educ Inf Technol 28(4), 2023. doi:10.1007/s10639-022-11316-w