Using Structural Domain and Learner Models to Link Multiple Data Sources for Learning Analytics
Michael D. Kickmeier-Rust, Dietrich Albert
Graz University of Technology, Graz, Austria
michael.kickmeier-rust@tugraz.at, dietrich.albert@tugraz.at
Abstract. The uptake of Learning Analytics is largely limited to online courses and other digital environments such as MOOCs; in schools it is used only sparsely. The reason is that school-based teaching and learning is still largely an ‘analogue’ and personal process that does not produce the digital data necessary for in-depth learning analytics. The Lea’s Box project addresses this problem by supporting teachers in collecting data in their daily practice as easily and completely as possible, so that they have at least the ‘little data’ required to make their teaching more individualized and more formative. In addition, the project aims to develop competence-oriented learning analytics techniques grounded in solid theories that originated in the context of intelligent tutoring systems. In this paper we summarize these developments and our experiences with the tools and techniques in schools.
Keywords: Learning Analytics, Competence-based Knowledge Space Theory, Formal Concept Analysis
1 Introduction
Learning analytics (LA) and educational data mining (EDM) are more than recent buzzwords in educational research: they signify one of the most promising developments for improving teaching and learning. While many past attempts to enhance learning with mere technology failed, making sense of large amounts of data collected over long periods of time and conveying them to teachers in a suitable form is an area where computers and technology can genuinely add value for future classrooms. However, reasoning about data, and in particular learning-related data, is not trivial and requires a robust foundation of well-elaborated psycho-pedagogical theories.
The fundamental idea of learning analytics is, of course, not new. In essence, the aim is to use as much information about learners as possible and to interpret it in terms of the learners’ strengths and weaknesses, abilities, competencies and declarative knowledge, attitudes and social networks, and learning progress, with the final goal of providing the best and most appropriate personalized support. In this respect, the concept of learning analytics is quite similar to the idea of formative assessment, and “good” teachers have always strived to achieve exactly this goal. However, collecting, aggregating, storing, and interpreting information about learners that originates from various sources over a longer period of time (e.g., a semester, a school year, or even in a lifelong-learning sense) requires smart technology. To analyze this vast amount of data, give it educational meaning, visualize the results, represent the learner in a holistic and fair manner, and provide appropriate feedback, teachers need to be equipped with appropriate technology. In this regard, a substantial body of research and tools already exists. The European Lea’s Box project aims to continue and enrich ongoing developments and to facilitate the broad use of learning analytics in the “real educational world”.
Lea’s Box stands for “Learning Analytics Tool Box” and concentrates on a competence-centered, multi-source formative analytics methodology based on sound psycho-pedagogical models, such as Competence-based Knowledge Space Theory (CbKST) and Formal Concept Analysis (FCA), which place structural domain and learner models at the center of LA. The tangible result of Lea’s Box is a Web platform for teachers and learners that provides links to existing components and interfaces to a broad range of educational data sources.
Teachers will be able to link, in one central location, the various tools and methods they already use in their daily practice that provide software APIs (e.g., Moodle courses, electronic tests, Google Docs). More importantly, the platform hosts the newly developed LA/EDM services, empowering educators to conduct competence-based analyses of rich data sets. A key focus of the platform is to enable teachers not only to combine existing bits of data but also to “generate” and collect data in very simple ways that do not require sophisticated hardware or software. Finally, we want to open new ways of displaying the results of learning analytics, moving away from the largely statistical dashboard approach towards structural visualizations and towards opening up the internal learner models.
2 Structural Domain and Learner Modelling
The original Knowledge Space Theory (KST), founded by Doignon and Falmagne [1, 2], and its extensions such as CbKST originate from the field of autonomous, intelligent, and adaptive tutoring systems. The idea was to go beyond linear Item Response Theory (IRT) scaling, in which items are arranged on a single, linear dimension of “difficulty”. In essence, KST provided a basis for structuring a knowledge domain and for representing knowledge on the basis of prerequisite relations. More recent advancements of the theory adopted a probabilistic view of test results and introduced a separation between observable performance and a person’s actual underlying abilities and knowledge. These developments led to a variety of theoretical, competence-based approaches (cf. [3] for an overview).
An empirically well-validated approach to CbKST was introduced by [4]. Basically, the idea was to assume a finite set of more or less atomic competencies (in the sense of well-defined, small-scale descriptions of some sort of aptitude, ability, knowledge, or skill) and a prerequisite relation between those competencies. In a first step, CbKST develops a model of the learning domain, e.g., algebra. Examples of such competencies are knowing what an integer is or being able to add two positive integers. The level of granularity to which a domain is broken down depends on the envisaged application and may range from a very coarse-grained level based on lessons (for example, to plan a school term) to a very fine-grained level of atomic entities of knowledge or ability (for example, as the basis of an intelligent problem-solving support application). In a second step, CbKST considers the natural course of learning and development and the logical prerequisites between competencies. Usually, learning, the development of new abilities, and the stabilization of skills occur along developmental trajectories. On the basis of a set of competencies and a set of prerequisite relationships between them, we can formally derive a collection of so-called competence states (Figure 1). Because of the prerequisite relations, not all subsets of competencies (i.e., not the entire power set) are plausible competence states: a subset is plausible only if, for every competency it contains, it also contains all of that competency’s prerequisites.
Fig. 1: A prototypical competence space.
So far, the structural model focuses on latent, unobservable competencies; loosely speaking, the model makes hypotheses about the brain’s black box. By means of interpretation and representation functions, the latent competencies are mapped to evidence or indicators relevant for a given domain.
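The derivation of competence states from a prerequisite relation can be illustrated with a minimal Python sketch. The four competencies and their prerequisites below are hypothetical, loosely following the algebra examples above, and the code is not the Lea’s Box implementation. It keeps every subset that contains the direct prerequisites of all its members and connects two states whenever one extends the other by exactly one competency; for a prerequisite relation without cycles, as here, these pairs are the edges of a Hasse diagram like the one in Figure 1.

```python
from itertools import combinations

# Hypothetical competencies and their *direct* prerequisites (illustration only).
PREREQUISITES = {
    "integer": set(),                   # knows what an integer is
    "add_pos": {"integer"},             # can add two positive integers
    "sub_pos": {"integer"},             # can subtract positive integers
    "add_neg": {"add_pos", "sub_pos"},  # can add integers with negative signs
}


def competence_states(prereqs):
    """Return all subsets that contain the direct prerequisites of each member.

    Checking direct prerequisites suffices: if every member's prerequisites are
    present, their prerequisites are present too, and so on down the chain.
    """
    comps = sorted(prereqs)
    states = []
    for k in range(len(comps) + 1):
        for subset in combinations(comps, k):
            state = frozenset(subset)
            if all(prereqs[c] <= state for c in state):
                states.append(state)
    return states


def hasse_edges(states):
    """Pairs (s, t) where state t extends state s by exactly one competency."""
    return [(s, t) for s in states for t in states if s < t and len(t - s) == 1]


if __name__ == "__main__":
    states = competence_states(PREREQUISITES)
    print(f"{len(states)} of {2 ** len(PREREQUISITES)} subsets are competence states")
    for s, t in hasse_edges(states):
        print(sorted(s), "->", sorted(t))
```

With this toy relation, 6 of the 16 subsets qualify. The brute-force enumeration grows as 2^n in the number of competencies, so it only suits very small domains; realistic domains would need a more efficient construction.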
Such indicators might be test items but can also refer to all sorts of performance or behavior (e.g., the concrete steps taken when working with a spreadsheet application). Through these functions, latent competencies and observable performance can be linked in a broad form: an entire series of indicators can be linked to the underlying competencies. CbKST accounts for the fact that indicators such as test items cannot be perfect evidence of the latent knowledge or ability. A person may make a lucky guess or exhibit a correct behavior or activity just by chance. Conversely, a person may fail a test item although the necessary knowledge is actually available (e.g., by being inattentive or careless). CbKST therefore treats indicators probabilistically: mastering a test item suggests that the learner holds the underlying competencies with a certain probability. Conceptually, this constitutes a probability distribution over the competence structure. A further significant advantage of this approach is that learning is not considered a one-dimensional course along a linear trajectory that is equal for all learners; rather, learning occurs along one of an entire range of possible learning paths.
3 Visualizing Structural Models
Competence spaces such as the one in Figure 1 are commonly displayed as Hasse diagrams. Such diagrams can hold a number of important pieces of information that help an educator to evaluate learning progress and to make recommendations. In this paper we want to highlight these advantages.
3.1 Competence States and Levels
As outlined, a competence space is the collection of meaningful states a learner can be in. Depending on the domain, the number of possible states might be huge. The big advantage, however, is that, depending on the degree of structure in the domain, by far not all possible combinations of competencies are reasonable and thus part of the space.
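The probability-based treatment of indicators described in Section 2 (a correct answer may be a lucky guess, a wrong answer a careless error) can be sketched as a Bayesian update of a distribution over competence states. This is a minimal illustration under stated assumptions: the states, the competencies required by the item, and the slip and guess values are hypothetical, and the exact update used in Lea’s Box is not specified in this paper. The two parameters mirror the careless-error and lucky-guess parameters of the basic local independence model (BLIM) from knowledge space theory.

```python
# Hypothetical competence states of a toy structure
# (integer -> add_pos / sub_pos -> add_neg); illustration only.
STATES = [
    frozenset(),
    frozenset({"integer"}),
    frozenset({"integer", "add_pos"}),
    frozenset({"integer", "sub_pos"}),
    frozenset({"integer", "add_pos", "sub_pos"}),
    frozenset({"integer", "add_pos", "sub_pos", "add_neg"}),
]


def uniform_prior(states):
    """Start with no preference for any state."""
    return {s: 1.0 / len(states) for s in states}


def update(dist, required, correct, slip=0.1, guess=0.2):
    """Bayes update of P(state) after one observed item response.

    required -- competencies assumed necessary to solve the item (hypothetical
                assignment standing in for a representation function)
    slip     -- assumed probability of failing although all are present
    guess    -- assumed probability of succeeding although some are missing
    """
    posterior = {}
    for state, prior in dist.items():
        p_correct = 1.0 - slip if required <= state else guess
        posterior[state] = prior * (p_correct if correct else 1.0 - p_correct)
    total = sum(posterior.values())
    return {s: v / total for s, v in posterior.items()}


def competence_probability(dist, competency):
    """Marginal probability that the learner holds one competency."""
    return sum(p for s, p in dist.items() if competency in s)


if __name__ == "__main__":
    dist = uniform_prior(STATES)
    print("before:", round(competence_probability(dist, "add_pos"), 3))
    dist = update(dist, frozenset({"integer", "add_pos"}), correct=True)
    print("after a correct answer:", round(competence_probability(dist, "add_pos"), 3))
```

With these made-up parameters, one correct answer raises the marginal probability of `add_pos` from 0.5 to about 0.82 rather than to 1, which is exactly the hedged reading of evidence described above.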
When zooming into the diagram, a teacher can identify exactly the set of competencies that is most likely for the learner; when zooming out, color coding can illustrate the most likely locations of a learner within the space. When looking at the entire space, it is obvious at first sight approximately which completion level a learner has reached (rather at the beginning or almost finished). These zoom levels are shown in Figure 2. Technically, there is a variety of options to achieve this coding, for example, bolding, greying, or color coding, whereby likely states are displayed more distinctly than states with low probability.
In the same way as individual states, Hasse diagrams can represent group distributions. Defined by a certain confidence interval of probabilities, the states and areas that hold the highest percentage of learners of a group can be made more salient. By this means, the specific areas of the competence space in which most learners are located become apparent, and, in contrast, positive or negative outliers stand out in the diagram. A different method was suggested by [4], who altered the size of the nodes to represent group sizes: the larger a node, the more learners hold that particular state.
3.2 Learning Paths
In addition to insight into groups’ and individuals’ current states of learning, the learning history, the so-called learning paths, is of interest to educators: on the one hand for planning future activities, on the other hand for negotiating and documenting the achievements of a learning episode (e.g., a semester). Learning paths can simply be displayed by highlighting the edges between the most likely state(s) over time. As for the states, several probable paths can be shown by making more likely paths more intensive (by color coding or line thickness). Figure 2 shows a simple example (red line).
Fig. 2: Hasse diagram illustrating the probability distribution over a competence space on three zoom levels.
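The group and path encodings described in Sections 3.1 and 3.2 can be derived from per-learner probability distributions. The following Python sketch is one possible reading, not the Lea’s Box code: it assumes each distribution is a dict from competence state to probability, takes the single most likely state per time point as the learning path, counts learners per most likely state for node sizes, and uses an arbitrary 80% coverage threshold to pick the salient states of a group.

```python
from collections import Counter


def most_likely_state(dist):
    """State with the highest probability (ties: first in dict order)."""
    return max(dist, key=dist.get)


def learning_path(history):
    """Most likely state at each time point, merging repeated states."""
    path = []
    for dist in history:
        state = most_likely_state(dist)
        if not path or path[-1] != state:
            path.append(state)
    return path


def path_edges(path):
    """Consecutive pairs of the path, e.g. for highlighting in red."""
    return list(zip(path, path[1:]))


def node_sizes(group):
    """Learners per most likely state, usable as node size in a group view."""
    return Counter(most_likely_state(d) for d in group)


def salient_states(group, coverage=0.8):
    """Fewest states whose group-averaged probability reaches `coverage`."""
    avg = Counter()
    for dist in group:
        for state, p in dist.items():
            avg[state] += p / len(group)
    chosen, mass = [], 0.0
    for state, p in avg.most_common():
        if mass >= coverage:
            break
        chosen.append(state)
        mass += p
    return chosen


if __name__ == "__main__":
    # Hypothetical states and probabilities, for illustration only.
    A, B, C = frozenset(), frozenset({"integer"}), frozenset({"integer", "add_pos"})
    history = [
        {A: 0.7, B: 0.2, C: 0.1},
        {A: 0.2, B: 0.6, C: 0.2},
        {A: 0.1, B: 0.5, C: 0.4},
        {A: 0.05, B: 0.15, C: 0.8},
    ]
    print([sorted(s) for s in learning_path(history)])
    group = [{A: 0.1, B: 0.2, C: 0.7}, {A: 0.2, B: 0.5, C: 0.3}, {A: 0.1, B: 0.1, C: 0.8}]
    print(node_sizes(group))
    print([sorted(s) for s in salient_states(group)])
```

One design caveat: two consecutive most likely states need not be neighbors in the Hasse diagram, so a faithful highlighting would also mark the intermediate edges between them.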
A key strength of presenting learning paths, as indicated, is opening up the learner model to the learners themselves (and perhaps their parents) [4], to explain where they started at the beginning of a course, how they proceeded during the course, and which competencies they hold today. This can perhaps be complemented with comparisons to other learners or groups. Not least, learning paths can reveal information about the effectiveness and impact of certain learning activities, materials, or the teacher herself.
4 Multi-Modal, Multi-Source Data
The features of CbKST-style modelling bear clear advantages for LA. However, the key question is where the data for computing the probabilities of competence states come from. In typical digital learning scenarios such as e-Learning lessons, MOOCs, or webinars, the data come rather naturally: students act continuously within one closed digital system, and consequently the data basis is rich and clean. In most other educational settings (e.g., in typical K-12 classroom scenarios), students do not act in or in front of electronic devices. Most activities occur in an analogue way, and digital activities are carried out with a vast range of different devices, apps, and software tools, from Google Docs to Facebook, from a multimedia app to a Moodle quiz. This means that data are recorded sparsely, the sources are manifold, and the data sets, and thus their explanatory power, are extremely heterogeneous. With multi-modal data, that is, data collected by a multitude of sensory devices such as gaze trackers, smart watches, or other wearables, the problem becomes worse and more complicated. For example, it is a delicate task to interpret heart rate information in terms of engagement or learning performance. Building one solid picture of a learner on this basis demands an underlying anchor in the form of a robust learner and domain model.
In this paper we argue that CbKST modelling offers a range of options to bring together data from multiple sources and even from multiple devices. Heart rates or gaze paths can serve as probabilistic indicators of strong or weak learning processes, of engagement, or of motivation. Separating latent competencies (which more or less develop and exist in the black box named ‘human brain’) from performance (what we can observe) bears particular advantages. On the one hand, performance (e.g., test scores, classroom participation, homework) is not determined by competencies or aptitude alone; a variety of aspects contribute to a certain performance, e.g., motivation, daily constitution, tiredness, external distractors, nutrition, or health status. On the other hand, CbKST-style structures are rather stable once they have been set up and validated properly. The advantage lies in the fact that performance such as test results, behaviors, or achievements is considered as probability-based indicators of certain competencies. Mathematically, this relationship is established in the form of interpretation and representation functions [5], which link an arbitrary set of performances or behaviors to one or more competencies, either in an increasing or in a decreasing sense. In the end, this allows all available, and perhaps changing, data sources to be linked to one and the same competence space. It is not about a single test but about all the information we can gather; even information considered to be of little importance may help to strengthen the model, that is, the view of the learner. If the amount or quality of data is weak, CbKST allows conservative interpretations based on the resulting probability distributions; if the data basis is richer, the probability distributions are more reliable, valid, and robust. For the educator, and this is important, the uncertainty is mirrored in the degree of likelihood.
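Linking heterogeneous sources to one and the same competence space can be sketched by treating every observation, whatever its source, as a probabilistic indicator with its own reliability. This Python sketch makes several assumptions that the paper does not state: observations are combined as if conditionally independent given the competence state, and the source labels, competency mappings, and slip/guess values are all hypothetical. It contrasts a weak data basis (one quiz answer) with a richer one, and uses Shannon entropy as one possible measure of how flat, and thus how uncertain, the resulting distribution is. The failed `teacher_note` observation illustrates an indicator acting in a decreasing sense.

```python
import math
from dataclasses import dataclass

# Hypothetical competence states of a toy structure (illustration only).
STATES = [
    frozenset(),
    frozenset({"integer"}),
    frozenset({"integer", "add_pos"}),
    frozenset({"integer", "sub_pos"}),
    frozenset({"integer", "add_pos", "sub_pos"}),
    frozenset({"integer", "add_pos", "sub_pos", "add_neg"}),
]


@dataclass(frozen=True)
class Evidence:
    """One observation from any source, linked to the competence space."""
    source: str          # hypothetical source label, e.g. "moodle_quiz"
    required: frozenset  # competencies the indicator is mapped to (assumed)
    success: bool        # observed outcome
    slip: float          # assumed P(failure | all required competencies held)
    guess: float         # assumed P(success | some required competency missing)


def posterior(states, evidence):
    """Combine all observations, assuming independence given the state."""
    dist = {s: 1.0 / len(states) for s in states}
    for e in evidence:
        for s in dist:
            p_success = 1.0 - e.slip if e.required <= s else e.guess
            dist[s] *= p_success if e.success else 1.0 - p_success
        total = sum(dist.values())
        dist = {s: v / total for s, v in dist.items()}
    return dist


def entropy_bits(dist):
    """Shannon entropy; higher values mean a flatter, more uncertain distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


ADD = frozenset({"integer", "add_pos"})
SUB = frozenset({"integer", "sub_pos"})
NEG = frozenset({"add_neg"})

WEAK = [Evidence("moodle_quiz", ADD, True, slip=0.1, guess=0.2)]
RICH = WEAK + [
    Evidence("homework", SUB, True, slip=0.15, guess=0.25),
    Evidence("teacher_note", NEG, False, slip=0.2, guess=0.1),
    Evidence("moodle_quiz", ADD, True, slip=0.1, guess=0.2),
    Evidence("homework", SUB, True, slip=0.15, guess=0.25),
]

if __name__ == "__main__":
    for label, data in (("weak", WEAK), ("rich", RICH)):
        dist = posterior(STATES, data)
        best = max(dist, key=dist.get)
        print(label, sorted(best), round(dist[best], 2), round(entropy_bits(dist), 2))
```

With these made-up numbers, the single quiz answer leaves three states tied at about 0.27, whereas the five observations concentrate about 0.73 of the probability mass on {integer, add_pos, sub_pos} and lower the entropy from about 2.27 to about 1.23 bits. This is the pattern described above: on a weak data basis the state probabilities differ much less.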
On a weak data basis, the probabilities of competence states differ substantially less than on a richer data basis. Such information, however, can change the educator’s view and evaluation of a student’s achievements. In the end, this approach supports a fairer and better-substantiated approach to grading and to providing formative feedback.
The described approach is realized in the Lea’s Box project in the form of an easy-to-use web platform. This platform is designed for teachers and provides a number of internal tools for recording and analyzing data. The platform also provides APIs (e.g., xAPI) to link various external data sources to the central domain and learner models. Teachers and students can retrieve information about the learning processes, the competence probabilities, and the learning trajectories in manifold ways and with a variety of visualizations, ranging from simple cartoon-like visualizations to complex directed graphs (cf. Figure 1). Our experiences show that the approach provides practical LA solutions for practitioners and interesting new insights into learning processes. Moreover, the structural models offer promising beacons for bringing together the data of diverse data sources in a meaningful way.
5 Acknowledgements
The work described in this paper is based on the LEA’s BOX project, supported by the European Commission under contract number 619762 of the 7th Framework Programme. This document does not represent the opinion of the EC, and the EC is not responsible for any use that might be made of its content.
References
1. Doignon, J.-P., & Falmagne, J.-C. Knowledge Spaces. Berlin: Springer (1999).
2. Falmagne, J.-C., & Doignon, J.-P. Learning Spaces. Berlin: Springer (2011).
3. Albert, D., & Lukas, J. (Eds.). Knowledge Spaces: Theories, Empirical Research, and Applications. Mahwah, NJ: Lawrence Erlbaum Associates (1999).
4. Nakamura, Y., Tsuji, H., Seta, K., Hashimoto, K., & Albert, D. Visualization of Learner’s State and Learning Paths with Knowledge Structures. In A. König et al. (Eds.), KES 2011, Part IV. Lecture Notes in Artificial Intelligence 6884, pp. 261-270. Berlin: Springer (2011).
5. Korossy, K. Modelling knowledge as competence and performance. In D. Albert & J. Lukas (Eds.), Knowledge Spaces: Theories, Empirical Research, and Applications (pp. 103-132). Mahwah, NJ: Lawrence Erlbaum Associates (1999).