<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Educational Data Analysis using Generative AI</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Abdul</forename><surname>Berr</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electrical Engineering and Computer Science</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sukrit</forename><surname>Leelaluk</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Information Science and Electrical Engineering</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Cheng</forename><surname>Tang</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Information Science and Electrical Engineering</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Li</forename><surname>Chen</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Information Science and Electrical Engineering</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fumiya</forename><surname>Okubo</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Information Science and Electrical Engineering</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Atsushi</forename><surname>Shimada</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Graduate School of Information Science and Electrical Engineering</orgName>
								<orgName type="institution">Kyushu University</orgName>
								<address>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Educational Data Analysis using Generative AI</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0CAFF511C4C30A86E6DA9A9F39F572AD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:53+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the advent of generative artificial intelligence (AI), the scope of data analysis, performance prediction, and real-time feedback in learning analytics has widened. The purpose of this study is to explore the possibility of using generative AI to analyze educational data. The performances of two large language models (LLMs), GPT-4 and text-davinci-003, are compared across different types of analyses. Additionally, the LangChain framework is integrated with the LLMs to provide deeper insight into the analysis, which can be beneficial for beginner data scientists. LangChain has a component called an agent, which makes it possible to study the analysis step by step as it is performed. Furthermore, we study the impact of the OpenLA library, which pre-processes the data by calculating the number of seconds students spend reading, counting the number of operations students perform, and building the page-wise behavior of each student. The analysis also identifies the factors with the most significant impact on students' performances. The results show that GPT-4, when using the data pre-processed by OpenLA, provides the best analysis in terms of both the accuracy of the final answer and the step-by-step insights provided by LangChain's agent. We also find that reading time and the interactions students use (adding markers, bookmarks, and memos) are significant in predicting grades.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Over the past few years, the rise of distance learning has helped e-learning environments grow further. This has made the collection of different types of student data easier. Since the start of student data collection in the 1960s, the research in this field has progressed from database management systems to advanced database systems and then finally to advanced data analysis, which incorporates different data mining techniques <ref type="bibr" target="#b0">[1]</ref>. Such developments have led to an increase in leveraging data mining techniques to predict and analyze students' academic performances. For example, intelligent systems have been built using data mining algorithms such as neural networks and decision trees to analyze and uncover the relationship between different data inputs and students' performances <ref type="bibr" target="#b1">[2]</ref>. Moreover, different machine learning algorithms have been implemented and compared to find the attributes with the greatest impact on students' performances <ref type="bibr" target="#b2">[3]</ref>. Similarly, studies have examined students' behaviors and interactions captured by the logging functionality of e-books, and an interpretable model, the Dempster-Shafer classifier (DS classifier), was implemented and compared with other models in order to predict students' academic performances <ref type="bibr" target="#b3">[4]</ref>.</p><p>With the recent advancements in the field of Artificial Intelligence (AI), numerous opportunities have opened up in education and research. A few of the significant benefits include personalized learning experiences, adaptive learning materials, real-time feedback, and assessments <ref type="bibr" target="#b4">[5]</ref>. The advent of generative AI, in particular, has given education a completely new dynamic. With the help of generative AI, we can generate new content, in contrast to traditional AI, which works on predefined rules. As one class of AI models, generative AI can produce content such as text, figures, and other media based on what it has learned from existing data. These models are built from deep learning techniques and neural networks that help in analyzing and generating human-like content. Recently, generative AI has been used in many different fields, including business, marketing, education, and research <ref type="bibr" target="#b5">[6]</ref>. Teaching and learning in particular have benefited from generative AI. For instance, many instructors gave very positive feedback about generative AI's range of applications for language teaching <ref type="bibr" target="#b6">[7]</ref>. Similarly, teaching assistant chatbots have been pioneered that provide personalized learning and encourage student inquiry <ref type="bibr" target="#b7">[8]</ref>. However, in order to extend the reach of generative AI in education, it is imperative to analyze students' data and infer the relationships needed to develop applications that can enhance the learning experiences of both students and instructors.</p><p>Generative AI has been popularized mainly by OpenAI's development of ChatGPT. The OpenAI API offers a diverse set of models with varying capabilities. The purpose of this study is to explore further applications of generative AI in learning analytics by using OpenAI's GPT-4 model <ref type="bibr" target="#b8">[9]</ref> and text-davinci-003 model <ref type="bibr" target="#b9">[10]</ref> for the tabular data analysis of students' interactions with the e-book, and to provide deeper insights into the analysis by integrating the LangChain framework <ref type="bibr" target="#b10">[11]</ref> and the OpenLA library.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>The data was acquired from the e-book environment in the form of log data (Event Stream) <ref type="bibr" target="#b11">[12]</ref>. Two data sets are then prepared: one is pre-processed with the OpenLA library, and the other is left without pre-processing. We then integrate the LangChain framework with the GPT-4 and text-davinci-003 models to study the insights produced during the analysis and the difference in the models' performance across different types of analyses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data Collection</head><p>The dataset contains different types of data from 163 students. These include the student's id, the content id, the reading time of each page in the form of event stream data, and the interactions performed by students on the e-book, for example, adding a marker or bookmark, moving on to the next page, returning to the previous page, and adding a memo. The students' grades, ranging from A to F, are stored in a separate file. A is the highest passing grade, D is the lowest passing grade, and F is the failing grade. Figure <ref type="figure" target="#fig_0">1</ref> below shows an example of the data. The focus of this particular study is the analysis of students' reading behaviors and their interactions, which can be used to deduce significant correlations. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Integrated Analysis System</head><p>LangChain is an open-source framework that allows software developers and data scientists to work with artificial intelligence. Its purpose is to integrate LLMs such as OpenAI's text-davinci-003 and GPT-4 with external sources. This study set out to compare the table data analysis of GPT-3.5 and GPT-4. To choose the best model for comparison with GPT-4, all the GPT-3.5 models were first tried for the analysis. text-davinci-003's ability to understand the prompt appeared to surpass that of the other models, and hence it was chosen for the comparison. LangChain incorporates many components that make it convenient to link LLMs with external sources <ref type="bibr" target="#b13">[13]</ref>. In this research, however, the component used most is the agent.</p><p>Agents use an LLM to choose a sequence of actions to take. More importantly, agents have access to many tools and decide which tool to use according to the user's input: the language model takes the prompt constructed by the prompt template and returns some output. The agent must be provided with both the data and the LLM. The agent created for this study is the Pandas Dataframe agent <ref type="bibr" target="#b15">[15]</ref>. In this case, the data is provided in the form of dataframes. This is a powerful component that allows the handling of large datasets and enables question answering over Pandas Dataframes. Another agent, the csv agent, is also created in order to study the performance of the LLMs on data that is not pre-processed by OpenLA. The data provided in this case is in csv file format.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">OpenLA Library</head><p>As discussed earlier, the OpenLA library is used to pre-process the log data <ref type="bibr" target="#b10">[11]</ref>. Here, we only discuss how it reshapes the data before the analysis. In this analysis, we use OpenLA to convert the data into three different forms: the operation count in each content, the behavior on each page, and the behavior on each page with consideration of page transitions, e.g., going back and jumping to a page. Pre-processing with OpenLA allows the extraction of logs with the required information. In this experiment, we extract the total number of each operation performed by students in each content. The average number of reading seconds and the total count of each operation for each page are also acquired. Finally, the time of entry to and exit from each page, along with the total number of reading seconds for each page, is tabulated through pre-processing by OpenLA. This makes it easy to retrieve precise information.</p></div>
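<div xmlns="http://www.tei-c.org/ns/1.0"><p>The kind of reshaping described above can be illustrated with a minimal sketch in plain Python. It does not use OpenLA's own API and is not the library's implementation. The row layout, the operation names, and the helper names operation_counts and reading_seconds are assumptions made for illustration. The sketch also assumes that each logged page is the page shown when the operation occurs and that reading time is the gap until the student's next event.</p>
<p>
```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event-stream rows: (student id, content id, page, operation, timestamp).
# The field layout and operation names are assumptions for illustration only.
EVENTS = [
    ("s1", "c1", 1, "OPEN", "2023-04-01 10:00:00"),
    ("s1", "c1", 1, "ADD MARKER", "2023-04-01 10:00:20"),
    ("s1", "c1", 2, "NEXT", "2023-04-01 10:00:30"),
    ("s1", "c1", 2, "ADD MEMO", "2023-04-01 10:00:50"),
    ("s1", "c1", 1, "PREV", "2023-04-01 10:01:30"),
    ("s1", "c1", 1, "CLOSE", "2023-04-01 10:01:40"),
]


def operation_counts(events):
    """Count each operation per (student, content) pair."""
    counts = defaultdict(Counter)
    for student, content, _page, operation, _timestamp in events:
        counts[(student, content)][operation] += 1
    return counts


def reading_seconds(events):
    """Sum, per (student, content, page), the gaps between consecutive events."""
    fmt = "%Y-%m-%d %H:%M:%S"
    sessions = defaultdict(list)
    for student, content, page, _operation, timestamp in events:
        sessions[(student, content)].append((datetime.strptime(timestamp, fmt), page))

    seconds = defaultdict(float)
    for (student, content), rows in sessions.items():
        rows.sort()  # chronological order within one student's reading of one content
        # The time until the next event is credited to the page of the earlier event.
        for (start, page), (end, _next_page) in zip(rows, rows[1:]):
            seconds[(student, content, page)] += (end - start).total_seconds()
    return seconds
```
</p>
<p>With the hypothetical rows above, operation_counts(EVENTS) reports one ADD MARKER for student s1 in content c1, and reading_seconds(EVENTS) credits 40 seconds to page 1 and 60 seconds to page 2. Tables of this shape are easier to query than the raw event stream, which is the point made about pre-processing in this section.</p></div>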
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Prompt Categorization with Respect to Analysis</head><p>In order to study the performance of the two LLMs, GPT-4 and text-davinci-003, three different levels of analysis are performed, and prompts are categorized accordingly. Each level contains 20 prompts. Table <ref type="table" target="#tab_0">1</ref> shows each type of analysis performed. In level 1, Reading time analysis refers to the calculation of the number of reading seconds for different parts of the e-book, Students and interaction use determines the type and number of interactions used on different parts of the e-book, Grade distribution trend describes how the grades are spread out across the whole dataset, and Reading time relationships illustrates how different factors, such as student interactions, are related to the number of seconds students read. In level 2, Factors affecting grades draws relationships between different factors, such as reading seconds and interactions, and students' grades, and Content and reading analysis covers reading patterns across different content ids. Finally, in level 3, Predictive modelling prompts suggest different algorithms to predict grades, Optimal learning strategies discovers how students can improve their learning experiences to achieve better grades, Personalized interventions helps generate models that give personalized summaries of the e-book based on factors such as reading patterns, and Predictive models consists of prompts that generate models to predict the performances of future students. Example prompts include "Give a decision tree model to determine the most influential factors in predicting grades" and "Develop a predictive model to estimate the grade of a student based on their reading behavior and interaction patterns".</p><p>To sum up the whole procedure of building the integrated analysis system: first, we collect data from the e-book environment and pre-process it with the OpenLA library. Next, we create two agents, a pandas dataframe agent and a csv agent, for the data with and without OpenLA pre-processing, respectively. Finally, we give the prompts and evaluate the results on two parameters: task-specific performance and the agent's thought/observation/output from the AgentExecutor chain. Task-specific performance refers to the accuracy of the final answer provided; in other words, we confirm whether the final output is a logical answer. When the agent is unable to provide a correct answer, we refer to the AgentExecutor chain to study the thoughts and observations made during the analysis. This further insight is helpful on many occasions regardless of the accuracy of the final output. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results</head><p>In this study, we evaluate the performance of two models, GPT-4 and text-davinci-003, in analyzing students' performances. We also compare both models' effectiveness with and without the use of the OpenLA library and study the analysis that the models provide. Finally, we identify the best generative AI model for educational table data analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Comparison of Models</head><p>In the experiment, 20 prompts were used for each type of analysis. Each category's label is defined in detail in Section 2.4. The results were evaluated by comparing the final analysis produced by the models with a manual analysis of the raw log data that we carried out ourselves. Each level was further subdivided into categories to make the comparison more precise. For each prompt, we attempted to derive the correct result manually from the csv files and then compared it with the result generated by the AI model. For level 3's predictive modelling analysis, the results were evaluated after human analysis of the AgentExecutor chain, an example of which is shown in Figure <ref type="figure" target="#fig_2">3</ref>. For any model or algorithm proposed by the AI model, the practicality of that algorithm was tested manually and separately. The following results give the percentage of prompts the models were able to answer accurately. From Table <ref type="table" target="#tab_2">3</ref>, we can deduce that the use of the OpenLA library is integral to this table data analysis. We can also see a slight difference in accuracy between GPT-4 and text-davinci-003. Here, only one of the analyses, Grade distribution trend, shows GPT-4 outperforming text-davinci-003 when used with OpenLA (65% versus 50%). In contrast, all the systems underperform when analysing reading time relationships. Similarly, neither model gives accurate answers when analysing data that has not been pre-processed by OpenLA. In reading time analysis in particular, GPT-4 was not able to answer any question, and text-davinci-003 answered only 15% of the questions. In the case of reading time relationships, text-davinci-003 without OpenLA could not answer any question, while GPT-4 without OpenLA could answer only 35% of the time. Overall, students and interaction use was the easiest for both models to analyse, and reading time relationships was the hardest. From Table <ref type="table" target="#tab_3">4</ref>, we again see that the use of the OpenLA library has a positive impact on the analysis. When OpenLA is used, GPT-4 is more accurate than text-davinci-003 on factors affecting grade (65% versus 45%), and the two models tie on content and reading analysis (65% each). Overall, it is notable that both models are less effective for level 2 than for level 1. As before, Table 5 indicates that using OpenLA significantly improves the analysis regardless of which language model is used. We can also see considerably better performance from GPT-4 than from text-davinci-003 in the level 3 category of analysis. Moreover, the performance of text-davinci-003, with or without OpenLA, drops for level 3 compared with the previous levels, except in the case of optimal learning strategies. All in all, both models perform best in the level 1 analysis of students and interaction use, while generating predictive models in the level 3 analysis seemed to be harder for all the integrated analysis systems. It is important to note that the performance of GPT-4 with OpenLA is consistent across most of the analyses.</p></div>
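<div xmlns="http://www.tei-c.org/ns/1.0"><p>The reported percentages follow from simple counting over the 20 prompts in each category. The sketch below shows that arithmetic under the assumption that each prompt receives a binary correct/incorrect judgement from the manual comparison. The function name accuracy_percent and the example judgements are hypothetical and are not the study's records.</p>
<p>
```python
def accuracy_percent(judgements):
    """Return the share of prompts judged correct, as a percentage."""
    if not judgements:
        raise ValueError("no prompts were judged")
    correct = sum(1 for is_correct in judgements if is_correct)
    return 100.0 * correct / len(judgements)


# Hypothetical manual judgements for one category of 20 prompts:
# 13 answers matched the manually derived result and 7 did not.
EXAMPLE_JUDGEMENTS = [True] * 13 + [False] * 7
```
</p>
<p>Here accuracy_percent(EXAMPLE_JUDGEMENTS) evaluates to 65.0. With 20 prompts per category, every reported value is a multiple of 5%, so 15% corresponds to 3 correct answers and 80% to 16.</p></div>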
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Study of Students' Data analysis</head><p>Many key results and relationships were found through the analysis provided by generative AI. The most important of these are the relationships between students' performances and their reading times and interaction use. Figure <ref type="figure" target="#fig_4">5</ref> compares the graphs generated by GPT-4 with and without OpenLA when prompted with: "plot graph of an average number of different interactions (marker, memo, bookmark) for each grade". On the left is the graph generated by GPT-4 with OpenLA. When compared with the log data, the plot turned out to be a correct representation of the average number of interactions for each grade, i.e., students with grade B interacted with the e-book the most. The graph on the right was also generated by GPT-4, but from data not pre-processed by OpenLA. In this case, because the original log data contained so many interactions, the plot was not able to represent all of them correctly. In addition, it shows students with grade C using slightly more interactions than students with grade A, which contradicts the original log data. Figure <ref type="figure" target="#fig_5">6</ref> shows, on the left, the plot given by GPT-4 with OpenLA and text-davinci-003 without OpenLA when prompted with: "plot graph of number of reading seconds for each grade." Its main result, i.e., that students with grade B read the most, correctly depicts the actual data. However, the difference in reading time between A and C is not as plotted: in the log data, students with grade A read considerably more than students with grade C. Also, text-davinci-003 with OpenLA failed to generate any plot at all. On the right is the plot for the analysis done by GPT-4 without OpenLA. Although it correctly shows that students with grade B read the most, the plots for students with grades C and A do not match the log data.</p></div>
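<div xmlns="http://www.tei-c.org/ns/1.0"><p>The quantity behind the first prompt, the average number of each interaction per grade, can be computed directly once per-student counts are available. The following is a minimal sketch in plain Python of that aggregation, not the code produced by the agent. The rows are invented for illustration and do not come from the study's dataset, and the name mean_interactions_by_grade is hypothetical.</p>
<p>
```python
from collections import defaultdict

# Hypothetical per-student rows: (grade, marker count, memo count, bookmark count).
# These values are made up for illustration and are not the study's data.
ROWS = [
    ("A", 4, 1, 1),
    ("A", 2, 0, 2),
    ("B", 6, 2, 3),
    ("B", 8, 4, 1),
    ("C", 1, 0, 1),
]


def mean_interactions_by_grade(rows):
    """Average (marker, memo, bookmark) counts over the students in each grade."""
    totals = defaultdict(lambda: [0, 0, 0])
    students = defaultdict(int)
    for grade, marker, memo, bookmark in rows:
        for index, value in enumerate((marker, memo, bookmark)):
            totals[grade][index] += value
        students[grade] += 1
    return {
        grade: tuple(total / students[grade] for total in sums)
        for grade, sums in totals.items()
    }
```
</p>
<p>For the invented rows, mean_interactions_by_grade(ROWS) gives (3.0, 0.5, 1.5) for grade A and (7.0, 3.0, 2.0) for grade B. A manually computed table of this kind is what a generated plot can be checked against, as was done for Figure 5.</p></div>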
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Discussion</head><p>Having obtained the analyses, we can now note the differences between the two models. From the experiment, we can say that, of the four configurations discussed, GPT-4 with OpenLA matches or exceeds the others in every type of analysis performed, and every integrated analysis system achieved its best performance in level 1. To understand why GPT-4 performs better in general, we should be aware of the difference between text-davinci-003's and GPT-4's level of instruction comprehension. Prompt engineering is a vital factor for text-davinci-003 to comprehend instructions clearly <ref type="bibr" target="#b9">[10]</ref>. GPT-4, on the other hand, has been further fine-tuned in accordance with human feedback, and infrastructure for better predictability has also been implemented <ref type="bibr" target="#b8">[9]</ref>. To discuss the effect of OpenLA, it is important to look at the features of the data before and after pre-processing. The importance of data pre-processing has previously been addressed in different areas, for example in the web usage mining process <ref type="bibr" target="#b16">[16]</ref>. An example of data before pre-processing can be seen in Figure <ref type="figure" target="#fig_0">1</ref>. Figure <ref type="figure" target="#fig_6">7</ref> below shows an example of the pre-processed data. Comparing Figure <ref type="figure" target="#fig_0">1</ref> and Figure <ref type="figure" target="#fig_6">7</ref>, we can see that OpenLA breaks down the information we want to know more precisely, and this leads to the better performance of the models used with the data pre-processed by OpenLA.</p><p>Another important inference concerns the insights provided by the AgentExecutor chain. As shown in Figure <ref type="figure" target="#fig_2">3</ref>, we are provided with a step-by-step account of the analysis done by the LLMs. The quality of insight also varies depending on the model used. From the experiment, we find that the insights provided by GPT-4 are clearer and deeper, making the procedure of analysis easier to understand. Finally, we also studied the impact of reading behaviour and interaction with the e-book on students' grades. These inferences are important for the prediction of grades in future courses as well. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Work</head><p>In this study, we proposed a new mechanism for educational table data analysis using the generative AI models GPT-4 and text-davinci-003. We compared the performances of both models and highlighted the impact of the OpenLA library on the analysis. We learned that GPT-4, when provided with data pre-processed by OpenLA, gives the best results for the analysis. We also found that all the systems work best with the level 1 analysis of students and interaction use and worst with the level 3 analysis of generating predictive models. A potential reason for this result is the difference in the approach taken by the agent for these two analyses. Because level 1 involves simpler calculations and statistics, the agent runs into errors less frequently than in level 3 analysis. When generating predictive models in level 3, many algorithms and functions are applied, resulting in more errors and in the agent stopping because of time or iteration limits. Furthermore, we drew many significant relationships from students' data, which are helpful for studying the impact of different factors on students' performances. Finally, we were able to follow the analysis step by step with the help of LangChain's agent. The study of this analysis can be very important for the development of educational applications in the future. The chronological procedure of data analysis provided by the agent can help beginner data scientists learn. Furthermore, many applications can be developed to improve students' learning experiences by giving them insights into their daily learning routines. The analysis can be further enhanced by integrating the memory chain into the LLM using the LangChain framework. This will allow further room for prompt engineering in cases where the LLM does not understand the prompt at first.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Characteristics of the data</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Flowchart of Agent's mechanism</figDesc><graphic coords="3,166.25,56.70,262.23,181.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: AgentExecutor Chain</figDesc><graphic coords="3,72.00,367.69,406.08,149.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Summary of whole method</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Comparison of graphs generated by GPT-4 with and without OpenLA for average number of different interactions (marker, memo, bookmark) for each grade, level 1 analysis.</figDesc><graphic coords="7,81.05,56.70,224.23,160.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Comparison of graphs generated by GPT-4 with OpenLA and text-davinci-003 without OpenLA and GPT-4 without OpenLA for number of reading seconds for each grade, level 1 analysis.</figDesc><graphic coords="7,57.10,384.85,224.70,178.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Pre-processed data by OpenLA</figDesc><graphic coords="8,86.20,340.41,442.86,65.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Types of analysis performed at each level. In level 2, Factors affecting grades draws relationships between different factors, such as reading seconds and interactions, and students' grades, and Content and reading analysis covers reading patterns across different content ids. In level 3, Predictive modelling prompts suggest different algorithms to predict grades, Optimal learning strategies discovers how students can improve their learning experiences to achieve better grades, Personalized interventions helps generate models that give personalized summaries of the e-book based on factors such as reading patterns, and Predictive models consists of prompts that generate models to predict the performances of future students.</figDesc><table><row><cell>Level 1 (Reading interaction analysis)</cell><cell>Level 2 (Factors influencing students' performance)</cell><cell>Level 3 (Prediction analysis)</cell></row><row><cell>1. Reading time analysis</cell><cell>1. Factors affecting grade</cell><cell>1. Predictive modelling for grades of students</cell></row><row><cell>2. Students and interaction use</cell><cell>2. Content and reading analysis</cell><cell>2. Optimal learning strategies</cell></row><row><cell>3. Grade distribution trend</cell><cell></cell><cell>3. Personalized intervention</cell></row><row><cell>4. Reading time relationships</cell><cell></cell><cell>4. Predictive models</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Prompt example of each level of analysis.</figDesc><table><row><cell>Level 1 (Reading interaction analysis)</cell><cell>Level 2 (Factors influencing students' performance)</cell><cell>Level 3 (Prediction analysis)</cell></row><row><cell>Identify any significant differences in reading times between different content ids.</cell><cell>Give a decision tree model to determine the most influential factors in predicting grades.</cell><cell>Develop a predictive model to estimate the grade of a student based on their reading behavior and interaction patterns.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Level 1 results of each category of analysis.</figDesc><table><row><cell>Type of analysis</cell><cell>GPT-4 with OpenLA</cell><cell>GPT-4 without OpenLA</cell><cell>text-davinci-003 with OpenLA</cell><cell>text-davinci-003 without OpenLA</cell></row><row><cell>Reading time analysis</cell><cell>80%</cell><cell>0%</cell><cell>80%</cell><cell>15%</cell></row><row><cell>Students and interaction use</cell><cell>100%</cell><cell>75%</cell><cell>100%</cell><cell>75%</cell></row><row><cell>Grade distribution trend</cell><cell>65%</cell><cell>65%</cell><cell>50%</cell><cell>30%</cell></row><row><cell>Reading time relationships</cell><cell>65%</cell><cell>35%</cell><cell>65%</cell><cell>0%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Level 2 results of each category of analysis.</figDesc><table><row><cell>Type of analysis</cell><cell>GPT-4 with OpenLA</cell><cell>GPT-4 without OpenLA</cell><cell>text-davinci-003 with OpenLA</cell><cell>text-davinci-003 without OpenLA</cell></row><row><cell>Factors affecting grade</cell><cell>65%</cell><cell>40%</cell><cell>45%</cell><cell>15%</cell></row><row><cell>Content and reading analysis</cell><cell>65%</cell><cell>15%</cell><cell>65%</cell><cell>35%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Level 3 results of each category of analysis.</figDesc><table><row><cell>Type of analysis</cell><cell>GPT-4 with OpenLA</cell><cell>GPT-4 without OpenLA</cell><cell>text-davinci-003 with OpenLA</cell><cell>text-davinci-003 without OpenLA</cell></row><row><cell>Predictive modelling for grades of students</cell><cell>65%</cell><cell>35%</cell><cell>50%</cell><cell>0%</cell></row><row><cell>Optimal learning strategies</cell><cell>50%</cell><cell>0%</cell><cell>50%</cell><cell>50%</cell></row><row><cell>Personalized intervention</cell><cell>75%</cell><cell>50%</cell><cell>50%</cell><cell>0%</cell></row><row><cell>Predictive models</cell><cell>65%</cell><cell>35%</cell><cell>15%</cell><cell>15%</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Acknowledgements</head><p>This work was supported by JST CREST Grant Number JPMJCR22D1 and JSPS KAKENHI Grant Number JP22H00551, Japan.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Data mining: concepts and techniques</title>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kamber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pei</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Elsevier/Morgan Kaufmann</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">An overview of using academic analytics to predict and improve students&apos; achievement: A proposed proactive intelligent intervention</title>
		<author>
			<persName><forename type="first">U</forename><surname>Bin Mat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Buniyamin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Arsad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kassim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Xplore</title>
		<imprint>
			<date type="published" when="2013-12-01">December 1, 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Students&apos; Performance Analyses Using Machine Learning Algorithms in WEKA</title>
		<author>
			<persName><forename type="first">V</forename><surname>Nedeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pehlivanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IOP Conference Series: Materials Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">1031</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">012061</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Applying an interpretable and accurate model to Learning analytics</title>
		<author>
			<persName><forename type="first">N</forename><surname>Baloian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cobaise</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Peñafiel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Majumdar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ogata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">DC@LAK23 Workshop</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>In press</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Generative AI in Education and Research: Opportunities, Concerns, and Solutions</title>
		<author>
			<persName><forename type="first">Eman</forename><surname>Alasadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Baiz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemical Education</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="2965" to="2971" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A survey of Generative AI Applications</title>
		<author>
			<persName><forename type="first">R</forename><surname>Gozalo-Brizuela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">C</forename><surname>Garrido-Merchán</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023-06-14">June 14, 2023</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">&apos;To generate or stop generating response&apos;: Exploring EFL teachers&apos; perspectives on ChatGPT in English language teaching in Thailand</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Ulla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">F</forename><surname>Perales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stephenie</forename><forename type="middle">Ong</forename><surname>Busbus</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Learning: Research and Practice</title>
		<imprint>
			<biblScope unit="page" from="1" to="15" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Supporting self-directed learning and self-assessment using TeacherGAIA, a generative AI chatbot application: Learning approaches and prompt engineering</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Choy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Divaharan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hui</forename><forename type="middle">Yong</forename><surname>Tay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Learning: Research and Practice</title>
		<imprint>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">GPT-4 Technical Report</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>Cornell University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Shao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>Cornell University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">ArXiv</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">OpenLA: Library for Efficient Ebook Log Analysis and Accelerating Learning Analytics</title>
		<author>
			<persName><forename type="first">R</forename><surname>Murata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Minematsu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shimada</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computer in Education (ICCE 2020)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="301" to="306" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">H</forename><surname>Ogata</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Oi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kousuke</forename><surname>Mohri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Okubo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shimada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yamada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Learning Analytics for E-Book-Based Educational Big Data in Higher Education</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hirokawa</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="327" to="350" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Creating Large Language Model Applications Utilizing LangChain: A Primer on Developing LLM Apps Fast</title>
		<author>
			<persName><forename type="first">Oguzhan</forename><surname>Topsakal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tahir</forename><forename type="middle">Cetin</forename><surname>Akinci</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1050" to="1056" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<ptr target="https://python.langchain.com/docs/modules/agents/" />
		<title level="m">Agents | 🦜🔗 Langchain</title>
				<imprint/>
	</monogr>
	<note>Python.langchain</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<ptr target="https://python.langchain.com/docs/integrations/toolkits/pandas" />
		<title level="m">Pandas Dataframe</title>
				<imprint>
			<date type="published" when="2023-12-11">December 11, 2023</date>
		</imprint>
	</monogr>
	<note>Python.langchain</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A review paper on data preprocessing: A critical phase in web usage mining process</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Dwivedi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rawat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Xplore</title>
		<imprint>
			<date type="published" when="2015-10-01">October 1, 2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
