A pedagogical agent with embedded data mining functions to support collaborative writing

Daniel Epstein
Graduate Program of Computers in Education
Federal University of Rio Grande do Sul (UFRGS)
Porto Alegre - Brazil
daepstein@gmail.com

Eliseo Reategui
Graduate Program of Computers in Education
Federal University of Rio Grande do Sul (UFRGS)
Porto Alegre - Brazil
eliseoreategui@gmail.com

ABSTRACT
Internet growth has induced the development of a large number of collaborative tools for online writing and information sharing. Educators quickly realized the benefits of such tools for learners, allowing them to work online, to share their knowledge and help each other. Distance learning is a key concept in today's educational research; collaborative learning environments are becoming widespread, being more dynamic and resourceful. However, distance learning has also introduced a series of problems, such as high dropout rates resulting from lack of support and personalized feedback. It has also made it harder for educators to follow and review students' assignments. Based on this scenario, the work presented here proposes the development of a pedagogical agent supported by an intelligent tutoring system to provide students and teachers with assistance in order to minimize some of these problems. The use of a pedagogical agent allows students to have constant feedback and guidance based on the identification of problems that may emerge from an online collaborative writing activity. The presence of this agent is intended to help students coordinate their efforts in writing a text collaboratively, and improve their work in terms of coherence. Furthermore, the pedagogical agent is also able to assist teachers, reporting problems and simplifying their tasks related to the analysis of the work produced by each student. To support our pedagogical agent we propose the use of data mining tools to extract information related to the students' writings, and a recommender system to suggest additional resources.

Keywords
Pedagogical Agent, Intelligent Tutoring System, Data Mining, Collaborative, Distance Learning

1. INTRODUCTION AND MOTIVATION
In the last few years the Web 2.0 and 3.0 have helped proliferate a large number of educationally driven tools, such [...] collaboration, increasing group awareness, creating a sense of consciousness in the group members about cooperative team work, and exposing authors to different aspects of writing [1].

Alongside the growth of the internet and of online collaborative learning environments, data mining has become more important and popular in the education field. From the students' perspective, the possibility to easily search for learning material using indexes and other reference tools increases their resources while reducing the effort needed to find relevant information. From the teachers' perspective, it can help them to summarize students' writings in a distance learning context [2], to assess the quality of posts in discussion forums [3] and to evaluate the students' participation in discussion forums [4]. Furthermore, it can provide useful feedback to teachers so they can easily identify the main concepts in students' writings and the connections between these concepts [2].

The possibility to work collaboratively is attractive both for students and teachers. It allows students to exchange knowledge, to help each other and to complement their work with different ideas. However, it also creates several difficulties for teachers when evaluating the work done, since it is hard and demanding to monitor each student's production [5]. Besides, in a distance learning context, certain barriers may hinder the establishment and maintenance of distance learning programs, such as technical problems, infrastructure, motivational difficulties, necessary skills, social problems and time/interruptions [6]. This project's goal is to contribute to the development of a social learning environment in which collaborative writing takes place in a cohesive way, minimizing technical and interactive/social difficulties that are inherent to online collaborative work.

This work has also been motivated by the fast increase in the number of collaborative writing tools available online, and by the problems that often originate from their use, for instance the lack of interaction and actual collaboration in collaborative writing tasks, and the lack of supervision and feedback on students' work.

This project proposes a pedagogical agent to be used in a collaborative writing environment, so that it may assist teachers and students in their tasks. The agent uses data mining to identify problems in the students' writings, and based on this information it tries to guide them in improving their collaborative text production. Besides, the agent gives teachers more accurate information about the students' participation, their difficulties and their interactive writing process.

2. RELATED WORK
Intelligent tutoring systems (ITS) are computer tools capable of providing customized instruction or feedback to learners [7]. They usually operate without the need for human intervention. They differ from traditional content-delivery computerized learning systems in their ability to improve the effectiveness of a learner's experience through the use of artificial intelligence [8].

An ITS often uses a variety of computational resources for analyzing the users' interaction with the system. These are adaptive mechanisms, capable of personalizing learning according to individual student characteristics, such as knowledge of the subject, mood and emotion [9] and learning style [10]. They may be programmed to identify users' information as they interact with the system and to choose, from many actions, the one most likely to be beneficial to each particular user.

However, the ability to properly help the user often depends on the interface between the system and the user. ITS frequently rely on a virtual character to interact with users. These characters are called pedagogical agents. The use of pedagogical agents (PAs) in educational applications has demonstrated that these animated characters may improve students' engagement and learning experience [11].

A PA is a human-like virtual character that has the advantage of operating continuously and autonomously. It is capable of searching and interpreting information received or perceived through the system, and it provides a more natural interaction with the user. PAs are capable of adapting their actions and interventions, providing feedback and guiding problem solving, reflection, understanding and collaborative learning [12, 13, 14].

Among the many benefits of using PAs are the increase of motivation, the perception of ease and comfort in the learning environment, the promotion of fundamental behaviors of learning, the realization of a need for personal relationships in learning and gains in terms of memory, understanding and problem solving [15]. PAs can not only present content to users but also suggest additional resources, highlighting important issues and recommending new exercises and reference materials according to the user's progress [16]. Studies have shown that the use of a PA with text mining features could help students bring relevant contributions to a reading discussion [17].

According to sociocultural theory, learning can be considered a regulatory process that is mediated by social interaction among individuals, cultural artifacts (computer, pedagogical agent) and speech [18]. Users' interactions help them in constructing and sharing knowledge, which they internalize during this process. The experiences and knowledge acquired may be reused in their future experiences [19, 20, 21, 22]. The PA may also be seen as a tool capable of mediating learning, since it may interact with learners, individualize feedback and foster autonomy and collaborative skills. Social interaction, according to the sociocultural perspective, is essential for the promotion of learning and development [19, 20, 21].

3. RESEARCH PROPOSAL
Based on the highlighted difficulties of developing collaborative work online, we propose a pedagogical agent to be inserted in an online collaborative writing environment. This agent will be capable of helping students through immediate feedback about their collaborative text production, and it will assist teachers through the presentation of information and indicators regarding students' participation and progress in the assigned tasks. Our goal is to provide full-time assistance to the users of the collaborative environment and to reduce the amount of work needed to analyze their interactions and work.

In order to do that, we designed a pedagogical agent to be integrated in the intelligent tutoring system. The ITS will be responsible for collecting all information regarding the students' activities in the environment. The student could be adding text, images, audio, video or any other resource to the project, or simply reviewing and modifying some previous work. In any case, the ITS must keep a log of those interactions in order to determine which action to perform (when needed). Among the information collected by the system, we may list: the student's contribution to the project (either by the addition of new content or by the revision/editing of previous work); frequency (how many times each student accessed the collaborative environment, for how long and when); and a concept map summarizing what has been written by each student. Since we are considering a collaborative environment, several users may be in the system at the same time, and it is important for the ITS to identify which student is responsible for which action. Whether the students access the environment simultaneously or separately, it is important for all users to have an identifier that will inform the system which user is currently online and modifying any given document.

All the information collected by the ITS will be processed using data mining techniques. This allows the system to identify what type of contribution the user has made to the project and to infer how cohesive and coherent the collaboratively produced text is. A data miner similar to Sobek [2] will be used to perform these tasks. Sobek is a text miner that uses statistical analysis to obtain the most relevant concepts in a text and the relationships between them. A data mining process will be used to convert multimedia resources present in the project into concepts and relations.

The system will combine the data extracted from the users' writings with the data provided by the teacher to evaluate whether the students' project is related to the requirements. In order to do so, the agent will compare the concepts and relationships extracted from the students' writings with the concepts and relationships extracted from the task specification and the resources provided by the teacher. Breno et al. [4] showed that this kind of comparison could provide useful information regarding the quality/relevance of students' contributions in discussion forums. The results of this comparison will determine whether the PA has to make any intervention to help students improve their text. It is particularly important for this intervention to result in positive reinforcement.

There are several types of interventions planned for both students and teachers. The most common type of intervention for students is a direct message sent by the PA inquiring about some aspect of the project. Those inquiries are intended to foster critical thinking and help students correct what the ITS identifies as a problem. The messages are sent when the students' work is incomplete or lacks cohesion. In both cases, it is possible for the agent to suggest additional material that may help them correct the problem. Another possible intervention is the use of e-mail messages when students are not participating in the collaborative work, or when their contributions are not coherent with the rest of the project. The last type of intervention is not an automatic answer from the PA, but a response to an explicit request for help from a student. Specific functions are being developed to enable students to ask the PA for further information about some aspect of the project, or about its structure and coherence.

A key feature in those interactions between PA and students is the agent's ability to identify additional resources that may help the students' text production. As the students' project may include several types of media and different resource formats, the agent is able to recommend additional learning objects extracted from repositories as well as from the web. Learning objects are usually indexed using metadata, in which keywords and object descriptions are very common; this information is used to search for the learning objects that are most related to the topic at hand. Using a technique similar to the one used by Breno et al. [4], the learning objects that present the highest similarity values are presented to the students. The selected learning objects will be separated by their format (video, audio, image, text, etc.) so that it will be easier for the student to select the most appropriate ones.

Some of the PA's functions are specifically meant to help teachers. As it is often difficult and time consuming for teachers to analyze the individual production of each student in collaborative writing tasks, the PA will provide accurate information about each student's contributions and progress based on the information collected by the ITS.

[...] work is constantly edited by other students). The second form of interaction will be a support interface, where the teacher will be able to request specific information regarding students' activities in the environment. All those PA features are meant to provide a more personalized contact between the agent, the students and/or teachers.

Another aspect the agent is concerned with is the coherence of the project. The students may not be together when writing the project, which may result in disjoint texts and unrelated or redundant information. Therefore, it is important for the agent to identify coherence problems and contact the students who produced the incoherent parts of the project. The agent may also use the information gathered from those incoherent parts to search for additional material and learning objects that could help students fill the possible gaps in the project. Although we considered using Latent Semantic Analysis to perform this task [23], it is still problematic to decide how to interpret multimedia resources that are neither text nor learning objects (for learning objects, it is possible to use their keywords and descriptions).

4. STATE OF THE PROJECT
In the current state of the project, we are developing a script program that will collect the data from students' writings and send it to the pedagogical agent. This is part of the development of our intelligent tutoring system. This is also one of the most challenging parts of this project, as it is important to correctly interpret and evaluate multimedia resources. Through the use of scripts without a particular user interface, we intend to create a more reusable system, allowing it to be used in many tools and environments without the need to change the program's core.

Our data mining system based on Sobek is already being modified to provide useful information regarding resource similarity, concepts and relationships. The input for our data miner is very restrictive, but we are working on making it more general. This is most useful for investigating how to successfully mine multimedia resources. Using the learning object repository, we may conduct experiments that will assess the quality of the results and the reliability of our data mining tool.

The experiments will be carried out in projects using Google Drive¹. This environment was chosen for its extensive database and for the number of projects that are developed with this technology daily. It is one of the most known and
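As an illustration of the comparison step proposed in Section 3 (concepts and relationships mined from the students' text checked against those mined from the teacher's task specification), the following is a minimal sketch in Python. It does not reproduce Sobek's algorithm: a plain term-frequency cutoff and sentence-level co-occurrence stand in for the statistical analysis, and every name in it (`extract_concepts`, `extract_relations`, `compare_to_task`, the stopword list, the `min_count` and `threshold` values, and the sample texts) is a hypothetical choice made only for this example.

```python
import re
from typing import NamedTuple

# Tiny illustrative stopword list (assumption; a real miner would use a fuller one).
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is",
             "of", "on", "that", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_concepts(text: str, min_count: int = 2) -> set[str]:
    """Treat every term occurring at least min_count times as a 'concept'."""
    counts: dict[str, int] = {}
    for word in tokenize(text):
        counts[word] = counts.get(word, 0) + 1
    return {word for word, n in counts.items() if n >= min_count}


def extract_relations(text: str, concepts: set[str]) -> set[tuple[str, str]]:
    """Link two concepts whenever they co-occur in the same sentence."""
    relations = set()
    for sentence in re.split(r"[.!?]+", text):
        present = sorted(concepts & set(tokenize(sentence)))
        for i, first in enumerate(present):
            for second in present[i + 1:]:
                relations.add((first, second))
    return relations


def coverage(student: set, reference: set) -> float:
    """Fraction of the reference items that also appear in the student's set."""
    return len(student & reference) / len(reference) if reference else 1.0


class Comparison(NamedTuple):
    concept_coverage: float
    relation_coverage: float
    missing_concepts: list[str]
    intervene: bool


def compare_to_task(student_text: str, task_text: str,
                    threshold: float = 0.5) -> Comparison:
    """Compare the student's concepts and relations with those of the task."""
    task_concepts = extract_concepts(task_text)
    student_concepts = extract_concepts(student_text)
    concept_cov = coverage(student_concepts, task_concepts)
    relation_cov = coverage(
        extract_relations(student_text, student_concepts),
        extract_relations(task_text, task_concepts),
    )
    return Comparison(
        concept_coverage=concept_cov,
        relation_coverage=relation_cov,
        missing_concepts=sorted(task_concepts - student_concepts),
        # Hypothetical rule: intervene when either score falls below the threshold.
        intervene=min(concept_cov, relation_cov) < threshold,
    )


# Made-up sample texts, used only to exercise the sketch.
TASK = ("Photosynthesis converts light into energy. Plants use photosynthesis "
        "to store energy. Light drives the process in plants.")
ON_TOPIC = ("Plants need light. Light gives plants energy, and energy is stored "
            "through photosynthesis. Photosynthesis happens in leaves.")
OFF_TOPIC = "Volcanoes erupt when magma rises. Magma cools and volcanoes grow."

if __name__ == "__main__":
    for label, text in [("on-topic", ON_TOPIC), ("off-topic", OFF_TOPIC)]:
        print(label, compare_to_task(text, TASK))
```

In this sketch the list of missing concepts could seed the agent's inquiry message or a search for related learning objects. Coverage of the teacher's concepts is used rather than a symmetric similarity so that students are not penalized for introducing extra ideas; that, too, is a design assumption and not something the proposal specifies.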