Large Language Models for Sustainable Assessment and Feedback in Higher Education: Towards a Pedagogical and Technological Framework

Daniele Agostini1,*,†, Federica Picasso1,†

1 University of Trento, Palazzo Fedrigotti, corso Bettini 31, Rovereto (TN), 38068, Italy

Abstract
There is growing attention to enhancing the quality of teaching, learning and assessment processes. As a recent EU report underlines, assessment and feedback remain problematic areas with regard to the training of educational professionals and the adoption of new practices. In fact, traditional summative assessment practices are predominantly used in European countries, against the recommendations of the Bologna Process guidelines, which promote alternative assessment practices that seem crucial for engaging students and providing them with lifelong learning skills, also through the use of technology. The literature shows that a series of sustainability problems arise when these requests meet real-world teaching, particularly when academic instructors have to assess large classes. With the fast advancement of Large Language Models (LLMs) and their increasing availability, affordability and capability, part of the solution to these problems might be at hand. LLMs can process large amounts of text, summarise it and give feedback on it following predetermined criteria. The insights from that analysis can be used both to give feedback to the student and to help the instructor assess the text. With the proper pedagogical and technological framework, LLMs can relieve instructors of some of the time-related sustainability issues, so that multiple-choice tests and similar formats are no longer the only viable choice. For this reason, as a first step, we propose a starting point for such a framework to a panel of experts, following the Delphi methodology, and report the results.
Keywords
Assessment, Evaluation, Higher Education, Educational Technology, Technology-Enhanced Assessment, Artificial Intelligence, Large Language Models

1st International Workshop on High-performance Artificial Intelligence Systems in Education, 2023, Rome, IT
* Corresponding author.
† These authors contributed equally.
daniele.agostini@unitn.it (D. Agostini); federica.picasso@unitn.it (F. Picasso)
ORCID: 0000-0002-9919-5391 (D. Agostini); 0000-0002-8381-6456 (F. Picasso)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. AI Teaching, Learning and Assessment in Higher Education: the state of the art
Recent attention has focused on enhancing teaching, learning and assessment quality [1]. However, traditional summative assessments are still dominant in Europe, despite Bologna Process guidelines promoting alternative practices to develop students’ lifelong learning skills [2, 3]. With large classes, implementing these practices raises sustainability issues for instructors. Large language models (LLMs) may help by processing large amounts of text, summarising it, and providing feedback based on specified criteria [4]. Used within a sound pedagogical design, LLMs could relieve instructors’ time pressures and allow assessment to expand beyond multiple-choice tests. UNICEF defines AI as follows: “AI refers to machine-based systems that can, given a set of human-defined objectives, make predictions, recommendations, or decisions that influence real or virtual environments. AI systems interact with us and act on our environment, either directly or indirectly. Often, they appear to operate autonomously and can adapt their behaviour by learning about the context” [6, p.16].
Turning to education, the field of Artificial Intelligence in Education (AIED) has seen success over the past 25 years in terms of technological advancements, theoretical contributions and impact on education [7, 8, 9, 10]. Instead of just automating the instruction of students sitting in front of computers, AI could help open up teaching and learning opportunities that would otherwise be difficult to achieve, question conventional pedagogies, or assist teachers in becoming more successful. Other AIED technologies currently track student progress and give tailored feedback to determine whether the student has mastered the topic in question. AIED technologies built to support collaborative learning might gather similar information, and intelligent essay assessment tools can potentially draw conclusions about a student’s knowledge. All of this information and more could be gathered during a student’s time in formal educational settings (the learning sciences have long recognised the value of students engaging in constructive assessment activities), along with information about the student’s participation in non-formal learning (such as learning a musical instrument, a craft, or other skills) and informal learning (such as language learning or enculturation by immersion) [11]. The potential of AI in educational settings, as well as the necessity for AI literacy, places educators at the forefront of these new and exciting breakthroughs, which were previously relegated to obscure computer science laboratories. At the same time, teachers and administrators are required to have a clear view of the potential of AI in education and, eventually, to incorporate this ground-breaking technology into their practice [12].
To characterise the AIED concept more closely, Holmes and colleagues [13] created the AIED taxonomy, which categorises AIED tools and applications into three distinct but intersecting categories: (1) student-focused, (2) teacher-focused, and (3) institution-focused AIED (Table 1).

Table 1
A taxonomy of AIED systems [13, 12 p.550]
Student-focused AIED:
• Intelligent Tutoring Systems (ITS)
• AI-assisted Apps (e.g., maths, text-to-speech, language learning)
• AI-assisted Simulations (e.g., games-based learning, VR, AR)
• Automatic Essay Writing (AEW)
• Chatbots
• Automatic Formative Assessment (AFA)
• Learning Network Orchestrators
• Dialogue-based Tutoring Systems (DBTS)
• Exploratory Learning Environments (ELE)
• AI-assisted Lifelong Learning Assistant
Teacher-focused AIED:
• Plagiarism detection
• Smart Curation of Learning Materials
• Classroom Monitoring
• Automatic Summative Assessment
• AI Teaching Assistant (including assessment assistant)
• Classroom Orchestration
Institution-focused AIED:
• Admissions (e.g., student selection)
• Course-planning, Scheduling, Timetabling
• School Security
• Identifying Dropouts and Students at risk
• e-Proctoring

2. Theoretical background
2.1. AI and Education: what about sustainable and authentic assessment?
Higher education aims to provide meaningful, relevant courses through which graduates can learn to work and live in an increasingly digital society [14]. In our contemporary education system, students need to be supported in becoming effective lifelong learners who are prepared for the assessment tasks they will encounter in their lives, but they also need to become lifelong assessors, possessing assessment skills acquired through continuous work. This could be made possible through the implementation of Assessment for Learning, understood as an approach that emphasises the assessment process as an "essential moment of the educational experience, characterised by situations
in which learners are enabled to analyse and understand the processes in which they are involved and can thus participate in decisions about their learning goals by becoming increasingly aware of their progress" [15, 16, p.56].
Learners as lifelong assessors must be able to:
• Judge whether they possess the criteria needed to evaluate a situation or carry out an assignment;
• Seek and understand contextual feedback to construct new knowledge;
• Interpret and use feedback to achieve daily goals and challenges [17, 16, p.73].
During their learning process, students need to gain assessment expertise, that is, the competence required to understand assessment criteria effectively and to use the feedback received to close the gap and improve their own learning [19]. They will be supported in this process by teachers who, through continuous assessment and by making judgements about students’ learning products, will develop more effective standards of judgement to define the competence expected of the students themselves [18, 16, p.76]. But how can students’ development of assessment expertise really be supported? As Sadler pointed out [19], it is necessary to involve students in direct and authentic assessment experiences, supporting them in acquiring a concept of quality and training them to make complex judgements according to multiple criteria. Which principles should be applied in order to create sustainable assessment contexts and then scaffold authentic assessment experiences? Boud [17] drew up nine useful principles for reflecting on and designing sustainable assessment and feedback practices.
For the author, clear assessment criteria should be shared with students in a timely manner. Students should be seen as individuals who can achieve success, and assessment processes should help to make them confident in that success; in this sense, it seems useful to consider separating feedback processes from the awarding of grades. The focus on learning during the assessment process must take priority over the focus on performance: it is therefore important to support the development of self-assessment competencies and to encourage the use of reflective peer assessment practices. A further fundamental aspect is the completion of the feedback loop as a tool for reviewing student work. Finally, the author emphasises the importance of reviewing assessment practices through the implementation of formative assessment processes. With regard to assessment and feedback in connection with AI systems, Swiecki and colleagues state that “AI-based techniques have been developed to fully or partially automate parts of the traditional assessment practice. AI can generate assessment tasks, find appropriate peers to grade work, and automatically score student work. These techniques offload tasks from humans to AI and help to make assessment practices more feasible to maintain” [20, p.2]. The strength of AI in assessment and feedback processes lies in the fact that, while traditional assessment practices may provide only a partial overview of students’ performance, several AI techniques can offer a wider view of the learning process and of students’ progress.
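As an illustration only, the ideas discussed above (clear criteria shared in advance, feedback kept separate from the awarding of grades, and partial automation under human supervision) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not part of the proposed model: the names Criterion, FeedbackReport, draft_feedback and release are hypothetical, and the keyword check is a deliberately naive placeholder for whatever AI system would actually draft the comments.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Criterion:
    """One rubric criterion, shared with students before the task (hypothetical structure)."""
    name: str
    description: str
    keywords: List[str]  # toy evidence markers; a real system would rely on a model


@dataclass
class FeedbackReport:
    """Formative comments only: no grade is stored here, it is awarded separately."""
    student_id: str
    comments: Dict[str, str] = field(default_factory=dict)
    approved_by_academic: bool = False


def draft_feedback(student_id: str, text: str, rubric: List[Criterion]) -> FeedbackReport:
    """Draft one comment per criterion with a naive keyword check (placeholder for an AI scorer)."""
    report = FeedbackReport(student_id=student_id)
    lowered = text.lower()
    for criterion in rubric:
        found = [k for k in criterion.keywords if k.lower() in lowered]
        if found:
            report.comments[criterion.name] = (
                f"Evidence found for '{criterion.name}': {', '.join(found)}."
            )
        else:
            report.comments[criterion.name] = (
                f"No clear evidence yet for '{criterion.name}': {criterion.description}"
            )
    return report


def release(report: FeedbackReport) -> Dict[str, str]:
    """Hand the comments to the student only after an academic has approved them."""
    if not report.approved_by_academic:
        raise PermissionError("Feedback must be moderated by an academic before release.")
    return report.comments
```

In this sketch the report carries comments per criterion but no mark, and release refuses to return anything until the academic sets approved_by_academic, which mirrors the supervised, human-mediated use of automation discussed in the literature cited above.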
In relation to the authenticity and sustainability of assessment, AI systems can help to collect, represent and assess data in a complex way: authentic assessment processes can be highly articulated in terms of tasks and overall design, so AI can help academics to monitor the learning process with a view to assessing student progress [21, 20]. In fact, authentic assessment requires students to “use the same competencies, or combinations of knowledge, skills, and attitudes, that they need to apply in the criterion situation in professional life” [22, p.69]. Authenticity has been recognised as a fundamental element of assessment design that encourages learning. Authentic assessment tries to reproduce the activities and performance criteria often encountered in the workplace and has been shown to have a favourable influence on student learning, autonomy, motivation, self-regulation and metacognition, qualities that are significantly associated with employability [23]. Moreover, several authors suggest that authentic assessment tasks are necessary for reaching an expert level of problem-solving. Likewise, strengthening the authenticity of an assessment has the potential to have an encouraging effect on student learning and motivation [24, 25, 26, 22]. Finally, UNESCO [26], which in recent years has been among the most influential and active institutions reflecting on the implications of AI in society, provided the following guidelines for AI in assessment:
1. Testing and implementing artificial intelligence technologies is crucial for supporting the assessment of various dimensions of competencies and outcomes.
2. Caution is essential when adopting automated assessment with responses to rule-based closed questions.
3. Employing formative assessment leveraged by artificial intelligence as an integrated function of Learning Management Systems (LMS) is key to analysing student learning data with increased accuracy and efficiency and reducing human biases.
4. Progressive assessments based on artificial intelligence are imperative to provide regular updates to teachers, students, and parents.
5. Examining and evaluating the use of facial recognition and other artificial intelligence for user authentication and monitoring in remote online assessments is paramount.
This study moves along those research axes.

2.2. Large Language Models
Over the past few years, Large Language Models (LLMs) have become increasingly prevalent in society and educational settings. These AI-powered models are capable of generating, analysing and summarising text, as well as engaging in dialogic interactions with humans [27]. One of the best-known examples is OpenAI’s ChatGPT, which is based on the GPT-3.5 and GPT-4 models. Other notable LLMs include Anthropic’s Claude (1 and 2), Bing Chat (another GPT-4-based model) and Google Bard. While these models are extremely powerful, there are concerns about data privacy and the consistency of results [28]. However, there are other options available. With the release of open-source and open-access models such as Meta’s LLaMA and Llama 2, as well as TII’s Falcon, and the growth of platforms like Hugging Face, which acts as a repository and framework, there are many possibilities for running capable LLMs locally. These models can be customised, fine-tuned, or even trained specifically for one’s use case, allowing for greater flexibility and control [29]. To better understand the possibilities related to the use of these models, Tamkin et al. [4] proposed the following crucial points. LLMs can:
• Generate: LLMs can generate human-like text.
This can be used to provide detailed explanations, create content, or even generate potential essay or report structures.
• Summarise: LLMs can summarise long pieces of text. This can help in providing concise summaries of lengthy student submissions. The summary can take into account different parameters in the text, providing information on exactly the aspects that the teacher wants to assess.
• Pose and Answer Questions: LLMs can understand a piece of text and answer questions about it, as well as ask questions about it if required. This can be used to create interactive feedback and learning experiences.
• Translate: LLMs can translate text from one language to another. This can be useful in multilingual educational settings and to adapt content for the inclusion of foreign-language students. It also adds to the overall sustainability of the teacher’s job in such situations.
• Analyse Sentiment: LLMs can understand the sentiment expressed in a piece of text. This can be used to gauge student sentiment in feedback, assignments or discussion forums.
• Classify: LLMs can classify text into predefined categories. This can be used for assisted grading or for categorising student feedback.
• Detect Plagiarism: By comparing the similarity between different pieces of text, LLMs can help detect potential cases of plagiarism, both between students and between students and the source material.
• Measure Semantic Similarity: LLMs can measure the semantic similarity between two pieces of text. This can be used to match student queries with relevant answers or resources and to help the teacher in the assessment of the student’s work.
• Generate Feedback: Based on the assessment of a student’s work, LLMs can generate personalised feedback. This works even better if the LLM has some teacher’s notes on the assignment to work with.
• Assess Knowledge: LLMs can be used to assess a student’s understanding of a topic based on their written submissions, especially if properly trained on correct assignments and given an assessment rubric to refer to.
LLMs are able to analyse massive amounts of text, aggregate it, and then offer feedback based on previously established standards [4]. The outcomes of that analysis can be used to provide feedback to the student as well as to assist the instructor in evaluating the text. With the correct pedagogical and technical framework, LLMs can relieve teachers of some of the time-related sustainability difficulties, so that the multiple-choice test and similar formats are no longer the only choice. In detail, Kasneci and colleagues [27] define the following opportunities for teachers and students regarding the implementation of AI in university teaching and learning contexts:
• “For university students, large language models can assist in the research and writing tasks, as well as in the development of critical thinking and problem-solving skills. These models can be used to generate summaries and outlines of texts, which can help students to quickly understand the main points of a text and to organise their thoughts for writing. Additionally, large language models can also assist in the development of research skills by providing students with information and resources on a particular topic and hinting at unexplored aspects and current research topics, which can help them to better understand and analyse the material.
• For personalised learning, teachers can use large language models to create personalised learning experiences for their students. These models can analyse student’s writing and responses, and provide tailored feedback and suggest materials that align with the student’s specific learning needs.
Such support can save teachers’ time and effort in creating personalised materials and feedback, and also allow them to focus on other aspects of teaching, such as creating engaging and interactive lessons” [27, pp.2-3].
With specific regard to assessment and feedback practice, the contribution of LLMs can be summarised in four points:
1. Automation: assessment and feedback can be totally or partially automated. At the moment, only supervised, human-mediated assessment is advised [20, 30], although there are some cases in which even complete automation worked very well [29].
2. Sustainability: with respect to time, these models make scalable some types of assessment that previously were not. This allows the teacher always to apply the most suitable method of assessment for verifying learning objectives [31, 20].
3. Objectivity: if trained correctly, LLMs should not exhibit bias; they tend to be more objective and consistent than a human being and to adhere to the established criteria.
4. AI in the loop: LLMs, teachers and students can be part of the same process, in which the AI is assigned only those tasks at which it is super-human [32, 33].

3. Research methods
3.1. Objectives and research question
In light of the evidence already produced by the international literature on AI and education, this research study aims to create and validate a model for the use of AI in Educational Assessment in Higher Education. The work is based on one main research question:
• Could university teachers use AI tools to adopt approaches that support more effective, sustainable and authentic assessment?

3.2. The Model
The designed model takes into account the existing literature on Assessment for Learning, Authentic Assessment and Sustainable Assessment [15, 22, 17].
Starting from this evidence in the literature, we are developing a model to support the adoption of AI in assessment processes in the Higher Education context. The model considers the role that AI plays in the assessment and feedback practices connecting academics and students in the virtuous cycle of the learning spiral. The model itself will be assessed at the four levels proposed by Kaptelinin and colleagues [34] through the Activity Theory checklists for the design, evaluation and use of technology:
• Design: we will introduce this checklist in order to evaluate the design process itself.
• Evaluation Phase 1: in the first phase of the evaluation process, through the Delphi study and the collaboration of the experts, we will use the Activity Theory checklist to assess the structure of the model itself and to collect prompts and suggestions.
• Evaluation Phase 2: in the second phase of the evaluation process, we aim to evaluate how we are going to propose the use of the model itself, again based on the validated checklists.
• Use: in this last phase, we will introduce a specific checklist to evaluate the model and its related impacts on the teaching, learning and assessment processes.
Every checklist is organised into four areas:
1. Means and ends: the extent to which the technology facilitates and constrains the attainment of users’ goals and the impact of the technology on provoking or resolving conflicts between different goals.
2. Social and physical aspects of the environment: integration of target technology with requirements, tools, resources, and social rules of the environment.
3. Learning, cognition, and articulation: internal versus external components of activity and support of their mutual transformations with target technology.
4. Development: developmental transformation of the foregoing components as a whole [34].
The model will be composed of two levels of adoption:
• AI-Mediated Summative Assessment: a level focused on assessment processes connected to Technology Enhanced Assessment practices, that is, on the power of AI to make assessment and feedback timely, customised and informed by AI data [30].
• AI-Mediated Formative Assessment: a level focused on the power of AI in assessment and feedback to monitor the whole learning process and to guide formative design actions and students’ self and peer assessment processes [35, 36].

3.3. The Delphi technique
To validate the proposed model, we planned to use the Delphi study technique, understood as “a scientific method to organise and manage structured group communication processes with the aim of generating insights on either current or prospective challenges [...] the Delphi technique builds on the anonymity of participating experts who are invited to assess and comment on different statements or questions related to a specific research topic” [37, p.2]. A Delphi survey collects the opinions that a selected group of experts generates across multiple discussion rounds on a specific topic. The rounds can be run sequentially or, thanks to specific software, immediately. The structured group communication process is expected to bring out a convergence or a divergence of opinions, producing a more dynamic and accurate collection of data than traditional opinion-polling techniques. This method allows researchers to focus on the sharing process, reducing risks related to the group dynamics that may emerge during in-person collaborative processes [38, 39, 40, 41, 42, 37]. To sum up, the Delphi process can be divided into the following steps:
1. Defining and recruiting experts: experts could be professionals with specific knowledge and relevant experience in a particular discipline and research area.
The panel size is calibrated depending on the study’s purpose.
2. Developing the Delphi questionnaire: a Delphi questionnaire can be built from primary data (interviews) or literature analysis to enhance validity. Experts can be involved through a paper-based Delphi or an e-Delphi.
3. Round 1: this phase can be qualitative (e.g. open-ended questions), which is particularly effective for generating ideas, or quantitative (e.g. rating scales). To ensure the rigour of each round, Kilroy and Driscoll (2006) suggest that the response rate should not fall below 70%.
4. Analysis and design of Round 2: the Round 1 results are analysed and, for the issues on which no consensus was reached, another questionnaire containing those issues and the Round 1 results is sent out to the experts (Round 2). This feedback supports the experts in comparing their initial opinions with the group result. Additional rounds are organised until an acceptable level of consensus, free of issues or controversies, is achieved [43].

4. Results and discussions
Starting from JISC’s [30] and UNESCO’s [26] guidelines, we developed our model, called AI-MAAS (AI-Mediated Assessment Academics and Students), composed of two levels of application and interpretation, with a focus on the implementation of AI in Technology Enhanced Assessment and Formative Assessment processes. The model revolves around three focal points, that is, the cyclic and balanced intersection of AI, teachers and students, following Vygotskij and Leont’ev’s model of artefact mediation [44]. The AI-Mediated Summative Assessment level of implementation describes the three elements (Academics, AI and Students) and the connections between them as follows.

4.1. AI Mediated Summative Assessment Level
This first level (Fig. 1) focuses on general, conventional assessment.
In this kind of process, AI can give the assessment a formative twist and significance, adding to the interaction with the student while keeping the whole process sustainable for the academic. In this approach, most formative exchanges would be between AI and Student, and mostly one-way (i.e. AI to Student), with early feedback on the product and a final report on the assessment and future actions being the most important. It is important to note that the AI’s final feedback to the student must, at this stage, be moderated by the academic. This approach is supported by the capability of AI to make assessment and feedback timely, customised and informed by AI data [31].

Figure 1: AI-Mediated Summative Assessment level

Elements of this process are:
• AI:
  - Constructive role: AI can help teachers with the construction and delivery of early feedback and assessment. In this connection, academics can define and share rubrics and assessment criteria to scaffold the assessment process.
  - Feedback mechanism: academics play a key role as actors who can give reinforcement feedback to the AI system itself, in order to keep improving the jointly developed evaluation processes.
  - Evaluation and Reporting: the relationship between AI and students is characterised by the exchange of the students’ products to be assessed, with AI then producing specific reports that contain suggestions for learning improvement. AI takes the role of a tutor that shares early and timely feedback supported by the academics’ expertise.
• Academics:
  - Expertise provision: academics as experts able to build and share tailored information to sustain AI actions.
  - Feedback management: academics as professionals who are able to manage timely, personalised and AI-informed feedback.
• Students (students’ product):
  - Product creation: students as crucial actors are able to build specific products to be assessed thanks to the collaboration between academics and AI.
  - Guidance role: students as important elements to guide the AI Mediated Assessment processes with focused feedback.

4.2. AI Mediated Formative Assessment Level
The AI-Mediated Formative Assessment level of implementation describes the three elements and the connections between them as follows. The second level (Fig. 2) focuses on formative assessment processes proper. Here the lecturer has designed the teaching around this approach, and the assessment is continuous, not relegated to the final stages of the course. Most interactions are bi-directional and occur between AI and Students, and between Students and the lecturer. In this model, AI is directed by the lecturer and takes on various roles, always in collaboration with the students, as a mentor, a tutor or a peer. At the same time, AI capabilities to monitor the whole learning process and to inform formative design actions will be employed to support the academic [36, 37].

Figure 2: AI-Mediated Formative Assessment level

Elements of this process are:
• AI:
  - Constructive role: the relationship between AI and academics is set up as a dynamic process of exchange in terms of expertise, resources and tasks.
  - Feedback mechanism: the data produced by AI can be fundamental for monitoring and redesigning academics’ teaching and assessment practice.
  - Evaluation and Reporting: AI can play the role of student and the students can act as teachers in order to support and give prompts to AI that, at the same time, can play the role of tutor, mentor or group member (peer teaching relationship with students).
• Academics:
  - Expertise provision: in connection with AI, academics define roles, rules and criteria for AI itself.
  - Feedback management: in terms of relationships with students, academics pay attention to the assessment of the whole learning process, giving timely, personalised and AI-informed feedback.
• Students (students’ product):
  - Constructive role: students can apply critical thinking to AI answers, in order to stimulate deep and complex reflective processes, and through their own inquiry produce effective insights for AI. At the same time, students can discuss and share results produced by academics and AI, and also activate peer learning and assessment processes.
  - Guidance role: students can generate feedback on AI Mediated Assessment itself.
As previously mentioned, the model will be assessed and validated using the Delphi method [5] and following the Activity Theory checklist [34] during the design, validation and experimentation processes.

5. Conclusions and future research actions
Starting from the opportunities connected to the use of AI in education from the perspectives of both students and teachers [27], it is important to understand how best to include these new opportunities to enhance teaching, learning and assessment processes in Higher Education contexts. For this purpose, our research is set in an academic environment that has to cope with constant renewal of approaches and strategies in order to deal with major change at the design, organisational and conceptual levels. In connection with assessment and feedback from the perspective of assessment for learning, sustainability and authenticity [15, 17, 22], it is important to reflect on and design specific formative and practical actions to sustain students and teachers in the implementation of AI systems as powerful agents supporting the progress of the educational system. In terms of future research, the planned actions include, after the validation of the AI-MAAS model through the Delphi study, experimentation with the model together with academics, followed by a phase comprising impact analysis and the assessment of efficacy.

References
[1] EHEA. (2015). Yerevan Communiqué.
https://www.ehea.info/page-ministerial-conference- yerevan-2015 [2] European Commission/EACEA/Eurydice, 2018 https://eurydice.eacea.ec.europa.eu/publications/2018- eurydice-publications [3] Y. Punie, C. Redecker, editor(s) (2017). European Framework for the Digital Competence of Educators. DigCompEdu, EUR 28775 EN, Publications Office of the European Union, Luxembourg, 10.2760/159770 (online), JRC107466. [4] A. Tamkin, M. Brundage, J. Clark, and D. Ganguli, "Understanding the capabilities, limi- tations, and societal impact of large language models," arXiv preprint arXiv:2102.02503, 2021. [5] N. Dalkey, "An experimental study of group opinion: the Delphi method," Futures, vol. 1, no. 5, pp. 408-426, 1969. [6] UNICEF, "Policy guidance on AI for children," Author, 2021. [Online]. Avail- able: https://www.unicef.org/globalinsight/media/2356/file/UNICEF-Global-Insight-policy- guidance-AI-children-2.0-2021.pdf.pdf [7] K. VanLehn, "The behavior of tutoring systems," International journal of artificial intelli- gence in education, vol. 16, no. 3, pp. 227-265, 2006. [8] K. R. Koedinger and A. Corbett, "Cognitive tutors: Technology bringing learning sciences to the classroom," 2006. [9] N. T. Heffernan and C. L. Heffernan, "The ASSISTments ecosystem: Building a platform that brings scientists and teachers together for minimally invasive research on human learning and teaching," International Journal of Artificial Intelligence in Education, vol. 24, pp. 470-497, 2014. [10] I. Roll and R. Wylie, "Evolution and revolution in artificial intelligence in education," International Journal of Artificial Intelligence in Education, vol. 26, pp. 582-599, 2016. [11] W. Holmes, M. Bialik, and C. Fadel, "Artificial intelligence in education," Globethics Publi- cations, 2023. [12] W. Holmes and I. Tuomi, "State of the art and practice in AI in education," European Journal of Education, vol. 57, no. 4, pp. 542-570, 2022. [13] W. Holmes, M. Bialik, and C. 
Fadel, "Artificial intelligence in education: Promises and implications for teaching & learning," The Center for Curriculum Redesign, 2019.
[14] J. Nieminen, M. Bearman, and R. Ajjawi, "Designing the digital in authentic assessment: is it fit for purpose?," Assessment & Evaluation in Higher Education, vol. 48, no. 4, pp. 529-543, 2023.
[15] K. Sambell, L. McDowell, and C. Montgomery, "Assessment for learning in higher education," Routledge, 2013.
[16] V. Grion and A. Serbati, "Valutazione sostenibile e feedback nei contesti universitari. Prospettive emergenti, ricerche e pratiche," PensaMultimedia, 2019.
[17] D. Boud, "Sustainable assessment: rethinking assessment for the learning society," Studies in Continuing Education, vol. 22, no. 2, pp. 151-167, 2000.
[18] D. J. Nicol and D. Macfarlane-Dick, "Formative assessment and self-regulated learning: A model and seven principles of good feedback practice," Studies in Higher Education, vol. 31, no. 2, pp. 199-218, 2006.
[19] D. R. Sadler, "Formative assessment: Revisiting the territory," Assessment in Education, vol. 5, no. 1, pp. 77-84, 1998.
[20] Z. Swiecki, H. Khosravi, G. Chen, R. Martinez-Maldonado, J. M. Lodge, S. Milligan, N. Selwyn, and D. Gašević, "Assessment in the age of artificial intelligence," Computers and Education: Artificial Intelligence, vol. 3, 100075, 2022.
[21] V. Murphy, J. Fox, S. Freeman, and N. Hughes, "“Keeping it Real”: A review of the benefits, challenges and steps towards implementing authentic assessment," All Ireland Journal of Higher Education, vol. 9, no. 3, 2017.
[22] J. T. Gulikers, T. J. Bastiaens, and P. A. Kirschner, "A five-dimensional framework for authentic assessment," Educational Technology Research and Development, vol. 52, no. 3, pp. 67-86, 2004.
[23] V. Villarroel, S. Bloxham, D. Bruna, C. Bruna, and C. Herrera-Seda, "Authentic assessment: creating a blueprint for course design," Assessment & Evaluation in Higher Education, vol. 43, no. 5, pp. 840-854, 2018.
[24] J.
Herrington and A. Herrington, "Authentic assessment and multimedia: How university students respond to a model of authentic assessment," Higher Education Research & Development, vol. 17, no. 3, pp. 305-322, 1998.
[25] K. Sambell, L. McDowell, and S. Brown, "But is it fair?: An exploratory study of student perceptions of the consequential validity of assessments," Studies in Educational Evaluation, vol. 23, no. 4, pp. 349-371, 1997.
[26] S. Gielen, F. Dochy, and S. Dierick, "The influence of assessment on learning," in M. Segers, F. Dochy, & E. Cascallar (Eds.), Optimising new modes of assessment: In search of quality and standards, pp. 37-54, Dordrecht, The Netherlands: Kluwer Academic Publishers, 2003.
[27] F. Miao, W. Holmes, R. Huang, and H. Zhang, "AI and education: A guidance for policymakers," UNESCO Publishing, 2021. [Online]. Available: https://doi.org/10.54675/PCSP7350
[28] E. Kasneci, K. Sessler, S. Küchemann, M. Bannert, D. Dementieva, F. Fischer, U. Gasser, G. Groh, S. Günnemann, E. Hüllermeier, S. Krusche, G. Kutyniok, T. Michaeli, C. Nerdel, J. Pfeffer, O. Poquet, M. Sailer, A. Schmidt, T. Seidel, M. Stadler, J. Weller, J. Kuhn, and G. Kasneci, "ChatGPT for good? On opportunities and challenges of large language models for education," Learning and Individual Differences, vol. 103, 102274, ISSN 1041-6080, 2023. [Online]. Available: https://doi.org/10.1016/j.lindif.2023.102274
[29] L. Chen, M. Zaharia, and J. Zou, "How is ChatGPT’s behavior changing over time?," arXiv preprint arXiv:2307.09009, 2023. [Online]. Available: https://arxiv.org/pdf/2307.09009.pdf
[30] P. P. Martin, D. Kranz, P. Wulff, and N. Graulich, "Exploring new depths: Applying machine learning for the analysis of student argumentation in chemistry," Journal of Research in Science Teaching, pp. 1–36, 2023. [Online]. Available: https://doi.org/10.1002/tea.21903
[31] M. Webb, "A Generative AI Primer," JISC, 2023. [Online].
Available: https://nationalcentreforai.jiscinvolve.org/wp/2023/05/11/generative-ai-primer/#3-1
[32] V. González-Calatayud, P. Prendes-Espinosa, and R. Roig-Vila, "Artificial intelligence for student assessment: A systematic review," Applied Sciences, vol. 11, no. 12, 5467, 2021.
[33] D. C. Engelbart, "Augmenting human intellect: A conceptual framework," SRI Summary Report AFOSR-3223, October 1962. [Online]. Available: https://www.dougengelbart.org/pubs/augment-3906.html
[34] T. W. Malone, "Superminds: The surprising power of people and computers thinking together," Little, Brown Spark, 2018.
[35] V. Kaptelinin and B. Nardi, "Acting with Technology: Activity Theory and Interaction Design," MIT Press, 2006. [Online]. Available: https://doi.org/10.5210/fm.v12i4.1772
[36] E. R. Mollick and L. Mollick, "Assigning AI: Seven Approaches for Students, with Prompts," June 12, 2023. [Online]. Available at SSRN: https://ssrn.com/abstract=4475995 or http://dx.doi.org/10.2139/ssrn.4475995
[37] OpenAI, "Teaching with AI," 2023. [Online]. Available: https://openai.com/blog/teaching-with-ai
[38] D. Beiderbeck, N. Frevel, H. von der Gracht, S. L. Schmidt, and V. M. Schweitzer, "Preparing, conducting, and analyzing Delphi surveys: Cross-disciplinary practices, new directions, and advancements," MethodsX, vol. 8, 101401, 2021.
[39] S. Aengenheyster, K. Cuhls, L. Gerhold, M. Heiskanen-Schüttler, J. Huck, and M. Muszynska, "Real-Time Delphi in practice - A comparative analysis of existing software-based tools," Technological Forecasting and Social Change, vol. 118, pp. 15-27, 2017.
[40] T. Gnatzy, J. Warth, H. von der Gracht, and I. L. Darkow, "Validating an innovative real-time Delphi approach - A methodological comparison between real-time and conventional Delphi studies," Technological Forecasting and Social Change, vol. 78, no. 9, pp. 1681-1694, 2011.
[41] T. Gordon and A.
Pease, "RT Delphi: An efficient, ’round-less’ almost real time Delphi method," Technological Forecasting and Social Change, vol. 73, no. 4, pp. 321-333, 2006.
[42] H. P. McKenna, "The Delphi technique: a worthwhile research approach for nursing?," Journal of Advanced Nursing, vol. 19, no. 6, pp. 1221-1225, 1994.
[43] P. L. Williams and C. Webb, "The Delphi technique: A methodological discussion," Journal of Advanced Nursing, vol. 19, no. 1, pp. 180-186, 1994.
[44] S. Chuenjitwongsa, "How to conduct a Delphi study," Medical Education, 2017. [Online]. Available: https://meded.walesdeanery.org/how-to-guides.
[45] N. V. Cong-Lem, "Vygotsky’s, Leontiev’s and Engeström’s cultural-historical (activity) theories: Overview, clarifications and implications," Integrative Psychological and Behavioral Science, vol. 56, no. 4, pp. 1091-1112, 2022.