<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Generative Multimodal Analysis (GMA) for Learning Process Data Analytics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ridwan</forename><surname>Whitehead</surname></persName>
							<email>ridwan.whitehead@oulu.fi</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Learning and Educational Technology (LET) Research Lab</orgName>
								<orgName type="institution">University of Oulu</orgName>
								<address>
									<addrLine>Pentti Kaiteran katu 1</addrLine>
									<settlement>Oulu</settlement>
									<country key="FI">Finland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Andy</forename><surname>Nguyen</surname></persName>
							<email>andy.nguyen@oulu.fi</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Learning and Educational Technology (LET) Research Lab</orgName>
								<orgName type="institution">University of Oulu</orgName>
								<address>
									<addrLine>Pentti Kaiteran katu 1</addrLine>
									<settlement>Oulu</settlement>
									<country key="FI">Finland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sanna</forename><surname>Järvelä</surname></persName>
							<email>sanna.jarvela@oulu.fi</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Learning and Educational Technology (LET) Research Lab</orgName>
								<orgName type="institution">University of Oulu</orgName>
								<address>
									<addrLine>Pentti Kaiteran katu 1</addrLine>
									<settlement>Oulu</settlement>
									<country key="FI">Finland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Generative Multimodal Analysis (GMA) for Learning Process Data Analytics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">E7FF23CEA0428806D9F2E5D5EFDD3CE7</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:52+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Generative Artificial Intelligence</term>
					<term>Learning Analytics</term>
					<term>Multimodal Data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper introduces Generative Multimodal Analysis (GMA), a novel method for applying Generative Artificial Intelligence (GenAI) to the analysis of multimodal data derived from learning processes. The method is encapsulated in a systematic framework that integrates GenAI technology, in particular multimodal large language models (MLLMs), into multimodal learning analytics. The recent emergence and advancement of GenAI, particularly MLLMs, has opened new avenues for the automated interpretation and meaningful analysis of varied data sources. Current research in the field has seen diverse applications of GenAI in transforming learning and teaching practice. However, there is a noticeable gap in systematic methodologies for applying GenAI to learning process data. This paper aims to bridge this gap by proposing the GMA method for multimodal learning analytics with learning process data. In addition to the methodological framework, this study presents an operational prototype for the practical implementation of GMA, which serves as a tool for examining multimodal data in learning processes. To demonstrate the applicability and effectiveness of the proposed method, we present a case study. Our approach offers guidance for learning scientists and educational technology application developers, reflecting contemporary trends and needs in educational technologies. By providing a structured approach for employing GenAI in learning process data analysis, this study contributes to the advancement of learning analytics methods.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The rapid evolution of educational technologies and the learning sciences, and in particular recent advances in artificial intelligence (AI), has ushered in an era in which data-driven insights are valuable for enhancing learning processes. This intersection of AI with educational methodologies is reshaping how information is analyzed, interpreted, and applied to improve teaching strategies and learning outcomes. For instance, Luckin and Cukurova <ref type="bibr" target="#b0">[1]</ref> highlighted the potential of AI to provide personalized learning experiences that adapt to individual learners' needs. Likewise, Holmes et al. <ref type="bibr" target="#b1">[2]</ref> discussed the transformative role of AI in education, particularly in providing insights into learning patterns. Recently, Järvelä et al. <ref type="bibr" target="#b2">[3]</ref> proposed a human-AI collaboration approach for better unfolding learning processes in the context of socially shared regulation of learning. Despite these advancements, the application of AI in learning analytics has remained largely fragmented. Current research predominantly focuses on isolated applications of AI in educational contexts <ref type="bibr" target="#b3">[4]</ref>, <ref type="bibr" target="#b4">[5]</ref> and lacks a comprehensive methodology for systematic analysis. This gap is particularly apparent for generative artificial intelligence (GenAI), whose potential for learning process data analytics remains underexplored.</p><p>The advent of GenAI and its integration into educational contexts, particularly through multimodal large language models (MLLMs), represents a significant leap in educational technology. GenAI, a subset of artificial intelligence that focuses on generating new content, has evolved rapidly over the past decade. 
A key development in this evolution has been the creation of sophisticated MLLMs, such as OpenAI's GPT (Generative Pretrained Transformer) series, which exemplify the advances in natural language processing and understanding. The release of ChatGPT, based on the GPT architecture, in November 2022 represented a significant milestone <ref type="bibr" target="#b5">[6]</ref>, <ref type="bibr" target="#b6">[7]</ref>. ChatGPT, with its advanced language understanding and generation capabilities, offered a more interactive and intuitive way for users to engage with AI. Furthermore, GenAI opens new possibilities in content creation and data analysis. Accordingly, learning analytics, which encompasses the collection, measurement, analysis, and reporting of data about learners and their contexts, could benefit greatly from the integration of GenAI. This integration enables more nuanced extraction of insights from diverse learning process data, thereby enhancing our understanding of student learning behaviors and outcomes.</p><p>This paper addresses the identified gap in comprehensive methodologies for GenAI in multimodal learning analytics and proposes Generative Multimodal Analysis (GMA) as a structured approach to employing GenAI effectively in this field. Such a methodology is essential given the growing complexity of learning environments and the variety of data they produce. As argued by Järvelä and Bannert <ref type="bibr" target="#b7">[8]</ref>, the integration of multimodal data analysis in educational research is crucial for a holistic understanding of learning processes. As an integral part of this study, we have developed a specialized software tool for GMA. This tool is intended as a practical resource for researchers and educational developers, providing them with a robust and user-friendly platform for applying GMA methodologies in their work. 
The effectiveness and relevance of this software tool are demonstrated through its application in a case study. This implementation not only showcases the tool's functionality but also affirms GMA's efficacy and broad applicability in the field of educational research and development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Generative Multimodal Analysis (GMA) for Learning Process Data Analytics</head><p>Generative Multimodal Analysis (GMA) is a methodological framework for researchers and analysts working with learning process data in educational settings. The framework leverages the capabilities of GenAI to expedite and enrich the extraction and integration of verbal and nonverbal elements from learning process data. GMA also covers material objects and the environment, capturing how learners use them while actively interacting within the learning context. In this way, GMA supports a more holistic understanding of learning environments, in which verbal communication, nonverbal cues, physical objects, and the surrounding environment all play integral roles in the learning process.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> depicts the GMA framework and illustrates how it leverages generative AI for educational research. Within this framework, generative AI can produce several types of output, including 1) predefined events, 2) improvisational events, 3) detailed descriptions of the learning process, and 4) comprehensive descriptions of the learning context. Crucially, in line with the human-AI collaboration approach to research suggested by Järvelä, Nguyen and Hadwin <ref type="bibr" target="#b2">[3]</ref>, these AI-generated outputs must be evaluated and validated by human researchers or analysts. This collaborative approach ensures that the insights offered by AI are reliable and grounded in human understanding and expertise.</p><p>Furthermore, the information generated through the GMA framework can be examined through various analytical lenses. 
These include a) process-oriented analysis, which focuses on the dynamics and phases of the learning process; b) quantitative modeling, which offers a statistical perspective and uncovers patterns and correlations; and c) qualitative inquiry, which delves into the deeper, nuanced aspects of the learning environment and experiences. This multifaceted analytical approach allows for a comprehensive, multi-dimensional understanding of the learning process that harnesses the strengths of both AI and human analysis.</p><p>As an illustration, Figure <ref type="figure" target="#fig_1">2</ref> demonstrates how GMA is applied to automatically extract specific predefined events. In this case, the focus is on identifying and analyzing the nonverbal posture states of learners engaged in a collaborative learning setting. The figure shows how GMA discerns and categorizes nonverbal cues, particularly learners' postures, which play a significant role in understanding engagement, interaction dynamics, and the overall effectiveness of collaborative learning processes. This example highlights the capacity of GMA to recognize and interpret subtle yet crucial aspects of learner behavior in collaborative learning. Figure <ref type="figure" target="#fig_2">3</ref> shows the user interface of our newly developed GMA Toolkit for describing video data. The interface is designed to facilitate direct interaction with video data, focusing in particular on observational data from learning processes. The example in the figure demonstrates how the toolkit can be used to analyze observational video data. Specifically, it illustrates an analysis generated by the toolkit from a video capturing a collaborative learning session. 
This visual representation highlights the toolkit's capabilities in processing and interpreting complex, real-time learning environments, thereby offering valuable insights into the dynamics of collaborative learning.</p></div>
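<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four output types and the human validation step described in this section can be pictured as a small data structure. The following Python sketch is illustrative only and does not reproduce the GMA Toolkit: the names OutputKind, GMAOutput, validate, validated_only, and event_counts, the posture labels, and the learner and analyst identifiers are all assumptions introduced here. It records AI-generated outputs per video segment, marks those accepted by a human analyst, and counts validated predefined events as one simple form of the quantitative lens.</p>
<p>
```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OutputKind(Enum):
    """The four output types named in the GMA framework description."""
    PREDEFINED_EVENT = "predefined_event"
    IMPROVISATIONAL_EVENT = "improvisational_event"
    PROCESS_DESCRIPTION = "process_description"
    CONTEXT_DESCRIPTION = "context_description"


@dataclass
class GMAOutput:
    """One AI-generated annotation for a video segment (hypothetical schema)."""
    kind: OutputKind
    start_s: float                      # segment start, in seconds
    end_s: float                        # segment end, in seconds
    content: str                        # event label or free-text description
    learner: Optional[str] = None       # hypothetical learner identifier
    validated_by: Optional[str] = None  # set once a human analyst accepts it


def validate(output: GMAOutput, analyst: str) -> GMAOutput:
    """Record that a human analyst has reviewed and accepted the output."""
    output.validated_by = analyst
    return output


def validated_only(outputs: list[GMAOutput]) -> list[GMAOutput]:
    """Keep only the outputs that passed human review."""
    return [o for o in outputs if o.validated_by is not None]


def event_counts(outputs: list[GMAOutput]) -> Counter:
    """Count validated predefined events by label (a simple quantitative view)."""
    return Counter(
        o.content
        for o in validated_only(outputs)
        if o.kind is OutputKind.PREDEFINED_EVENT
    )


# Hypothetical posture annotations for two learners in a ten-second clip.
outputs = [
    GMAOutput(OutputKind.PREDEFINED_EVENT, 0.0, 5.0, "leaning_forward", "L1"),
    GMAOutput(OutputKind.PREDEFINED_EVENT, 5.0, 10.0, "leaning_back", "L1"),
    GMAOutput(OutputKind.PREDEFINED_EVENT, 5.0, 10.0, "leaning_forward", "L2"),
    GMAOutput(OutputKind.CONTEXT_DESCRIPTION, 0.0, 10.0,
              "Three learners at a shared table."),
]

# The analyst accepts three outputs; the second annotation stays unvalidated.
for index in (0, 2, 3):
    validate(outputs[index], "analyst_a")

counts = event_counts(outputs)
```
</p>
<p>Keeping the validation marker on each record, rather than in a separate list, means every downstream analysis can filter on it, so unreviewed AI output never enters the process-oriented, quantitative, or qualitative analyses by accident.</p></div>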
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Discussion and Future Research Directions</head><p>Generative Multimodal Analysis (GMA) represents a groundbreaking approach in the field of learning analytics, addressing the complexities of learning processes through the integration of generative AI and multimodal data. This approach is particularly relevant given the diverse nature of learning environments and the myriad forms of data they generate.</p><p>In learning analytics, the focus is traditionally on quantifiable data such as test scores, completion rates, and engagement metrics <ref type="bibr" target="#b8">[9]</ref>, <ref type="bibr" target="#b9">[10]</ref>. However, the advent of GMA heralds a shift towards a more nuanced understanding of the learning process. By incorporating GenAI with MLLMs, GMA can seamlessly interpret and synthesize vast and varied datasets, including textual, auditory, and visual inputs, which are often overlooked in conventional analytics models. The inclusion of multimodal data is crucial for a comprehensive understanding of learning dynamics. As indicated by research in the field of educational technology, learning is not a unidimensional process but a complex interplay of cognitive, emotional, and social factors <ref type="bibr" target="#b10">[11]</ref>. GMA's ability to analyze multimodal data allows for insights into these dimensions, offering a holistic view of the learning experience.</p><p>Future research endeavors should focus on conducting systematic examinations of the various components and procedural steps crucial for effective implementation of GMA. This direction of inquiry is essential to establish a set of clear, well-defined guidelines that can assist researchers in optimally employing GMA methodologies. 
Additionally, given the ethical complexities surrounding AI in education <ref type="bibr" target="#b11">[12]</ref>, there is a pressing need for comprehensive research aimed at establishing practical ethical guidelines for the application of GenAI methodologies in educational research. Such guidelines would not only streamline the application of GMA across diverse educational settings but also ensure that its integration into learning analytics is both efficient and impactful. By laying out these parameters, </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Generative Multimodal Analysis (GMA) Framework</figDesc><graphic coords="2,77.00,445.08,440.75,225.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example of GMA for detecting pre-defined events</figDesc><graphic coords="3,135.23,300.19,257.77,372.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Generative Multimodal Analysis (GMA) Toolkit for Video Data</figDesc><graphic coords="4,124.11,150.30,248.51,341.84" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Acknowledgements</head><p>This research has been funded by the Research Council of Finland (also known as the Academy of Finland), grant 350249, and by the University of Oulu profiling project Profi7 Hybrid Intelligence (352788).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Designing educational technologies in the age of AI: A learning sciences-driven approach</title>
		<author>
			<persName><forename type="first">R</forename><surname>Luckin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cukurova</surname></persName>
		</author>
		<idno type="DOI">10.1111/bjet.12861</idno>
	</analytic>
	<monogr>
		<title level="j">British Journal of Educational Technology</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="2824" to="2838" />
			<date type="published" when="2019-11">Nov. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Ethics of AI in Education: Towards a Community-Wide Framework</title>
		<author>
			<persName><forename type="first">W</forename><surname>Holmes</surname></persName>
		</author>
		<idno type="DOI">10.1007/s40593-021-00239-1</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Artificial Intelligence in Education</title>
		<imprint>
			<date type="published" when="2021-04">Apr. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Human and artificial intelligence collaboration for socially shared regulation in learning</title>
		<author>
			<persName><forename type="first">S</forename><surname>Järvelä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hadwin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">British Journal of Educational Technology</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Artificial intelligence in higher education: the state of the field</title>
		<author>
			<persName><forename type="first">H</forename><surname>Crompton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Burke</surname></persName>
		</author>
		<idno type="DOI">10.1186/s41239-023-00392-8</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Educational Technology in Higher Education</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2023-04">Apr. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Systematic review of research on artificial intelligence applications in higher education - where are the educators?</title>
		<author>
			<persName><forename type="first">O</forename><surname>Zawacki-Richter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Marín</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Gouverneur</surname></persName>
		</author>
		<idno type="DOI">10.1186/s41239-019-0171-0</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Educational Technology in Higher Education</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2019-10">Oct. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">&quot;So what if ChatGPT wrote it?&quot; Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ijinfomgt.2023.102642</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Information Management</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page">102642</biblScope>
			<date type="published" when="2023-08">Aug. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">ChatGPT for good? On opportunities and challenges of large language models for education</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kasneci</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.lindif.2023.102274</idno>
	</analytic>
	<monogr>
		<title level="j">Learning and Individual Differences</title>
		<imprint>
			<biblScope unit="volume">103</biblScope>
			<biblScope unit="page">102274</biblScope>
			<date type="published" when="2023-04">Apr. 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Temporal and adaptive processes of regulated learning - What can multimodal data tell?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Järvelä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bannert</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.learninstruc.2019.101268</idno>
	</analytic>
	<monogr>
		<title level="j">Learning and Instruction</title>
		<imprint>
			<biblScope unit="volume">72</biblScope>
			<biblScope unit="page">101268</biblScope>
			<date type="published" when="2021-04">Apr. 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Challenges for the Future of Educational Data Mining: The Baker Learning Analytics Prizes</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Baker</surname></persName>
		</author>
		<idno type="DOI">10.5281/ZENODO.3554745</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Educational Data Mining</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2019-06">Jun. 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Data Analytics in Higher Education: An Integrated View</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sheridan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Information Systems Education</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="61" to="71" />
			<date type="published" when="2020-01">Jan. 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Examining socially shared regulation and shared physiological arousal events with multimodal learning analytics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Järvelä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rosé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Järvenoja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malmberg</surname></persName>
		</author>
		<idno type="DOI">10.1111/bjet.13280</idno>
	</analytic>
	<monogr>
		<title level="j">British Journal of Educational Technology</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="293" to="312" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Ethical principles for artificial intelligence in education</title>
		<author>
			<persName><forename type="first">A</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">N</forename><surname>Ngo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B.-P</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10639-022-11316-w</idno>
	</analytic>
	<monogr>
		<title level="j">Education and Information Technologies</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="4221" to="4241" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
