Graph-based Modelling of Students’ Interaction Data from Exploratory Learning Environments

Alexandra Poulovassilis, London Knowledge Lab, Birkbeck, Univ. of London, ap@dcs.bbk.ac.uk
Sergio Gutierrez-Santos, London Knowledge Lab, Birkbeck, Univ. of London, sergut@dcs.bbk.ac.uk
Manolis Mavrikis, London Knowledge Lab, UCL Institute of Education, m.mavrikis@lkl.ac.uk


ABSTRACT
Students’ interaction data from learning environments has an inherent temporal dimension, with successive events being related through the “next event” relationship. Exploratory learning environments (ELEs), in particular, can generate very large volumes of such data, making its interpretation a challenging task. Using two mathematical microworlds as exemplars, we illustrate how modelling students’ event-based interaction data as a graph can open up new querying and analysis opportunities. We demonstrate the possibilities that graph-based modelling can provide for querying and analysing the data, enabling investigation of student-system interactions and leading to the improvement of future versions of the ELEs under investigation.

Keywords
Exploratory Learning Environments, Interaction Data, Graph Modelling

1.   INTRODUCTION
Much recent research has focussed on Exploratory Learning Environments (ELEs) which encourage students’ open-ended interaction within a knowledge domain, coupled with intelligent techniques that aim to provide pedagogical support to ensure students’ productive interaction [9]. The data gathered from students’ interactions with such ELEs provides a rich source of information for both pedagogical and technical research, to help understand how students are using the ELE and how the intelligent support that it provides may be enhanced to better support students’ learning.

In this paper, we consider how modelling students’ event-based interaction data as a graph makes possible graph-based queries and analyses that can provide insights into the ways that students are using the affordances of the system and the effects of system interventions on students’ behaviour. Our case studies are two intelligent ELEs: the MiGen system, which aims to foster 11-14 year old students’ learning of algebraic generalisation [15]; and the iTalk2Learn system, which aims to support 8-10 year old students’ learning of fractions [7]. Both systems provide students with mathematical microworlds in which they undertake construction tasks: in MiGen, creating 2-dimensional tiled models using a tool called eXpresser; and in iTalk2Learn, creating fractions using the FractionsLab tool. In eXpresser, tasks typically require the construction of several models, moving from specific models involving specific numeric values to a general model involving the use of one or more variables; in parallel, students are asked to formulate algebraic rules specifying the number of tiles of each colour that are needed to fully colour their models. In FractionsLab, tasks require the construction, comparison and manipulation of fractions, and students are encouraged to talk aloud about aspects of their constructions, such as whether two fractions are equivalent.

Both systems include intelligent components that provide different levels of feedback to students, ranging from unsolicited prompts and nudges, to low-interruption feedback that students can choose to view if they wish. The aim of this feedback is to balance students’ freedom to explore while at the same time providing sufficient support to ensure that learning is being achieved [9]. The intelligent support is designed through detailed cognitive task analysis and Wizard-of-Oz studies [13], and it relies on meaningful indicators being detected as students are undertaking construction tasks. Examples of such indicators in MiGen are ‘student has made a building block’ (part of a model), ‘student has unlocked a number’ (i.e. has created a variable), ‘student has unlocked too many numbers for this task’; while examples of such indicators in FractionsLab are ‘student has created a fraction’, ‘student has changed a fraction’ (numerator or denominator), ‘student has released a fraction’ (i.e. has finished changing it).

Teacher Assistance tools can subscribe to receive real-time information relating to occurrences of indicators for each student, and can present aspects of this information visually to the teacher [8]. Indicators are either task independent (TI) or task dependent (TD). The former refer to aspects of the student’s interaction that are related to the microworld itself and do not depend on the specific task the student is working on, while the latter require knowledge of the task the student is working on, may relate to combinations of student actions, and require intelligent reasoning for their detection (a mixture of case-based, rule-based and probabilistic techniques). Detailed discussions of MiGen’s TI
and TD indicators and how the latter are inferred may be found in [8].

In this paper we explore how graph-based representation of event-based interaction data arising from ELEs such as MiGen and FractionsLab can aid in the querying and analysis of such data, with the aim of exploring both the behaviours of the students in undertaking the exploratory learning tasks set and the effectiveness of the intelligent support being provided by the system to the students. Data relating to learning environments has often been modelled as a graph in previous work, for example in [10] for providing support to moderators in e-discussion environments; in [16, 18] for supporting learning of argumentation; in [17] for modelling data and metadata relating to episodes of work and learning in a lifelong learning setting; in [1] for learning path discovery as students “navigate” through learning objects; in [3] for recognising students’ activity planning in ELEs; and in [23] for gaining better understanding of learners’ interactions and ties in professional networks.

Previous work that is close to ours is the work on interaction networks and hint generation [6, 21, 20, 4, 5], in which the graphs used consist of nodes representing states within a problem-solving space and edges representing students’ actions in transitioning between states. This approach targets learning environments where students are required to select and apply rules, and the interaction network aims to represent concisely information relating to students’ problem-solving sequences in moving from state to state. Our focus here differs from this in that we are using graphs to model fine-grained event-based interaction data arising from ELEs. In our graphs, nodes are used to represent indicator occurrences (i.e. events, not problem states) and edges between such nodes represent the “next event” relationship. Also, rather than using the information derived from querying and analysing this data to automatically generate hints, our focus is on investigating how students are using the system and the effects of the system’s interventions, in order to understand how students interact with the ELEs and to improve future versions of the ELEs.

2.   GRAPH-BASED MODELLING
Figure 1 illustrates our Graph Data Model for ELE interaction data. We see two classes of nodes: Event, representing indicator occurrences; and EventType, representing different indicator types. The instances of the Event class are occurrences of indicators that are detected or generated by the system as each student undertakes a task. We see that instances of Event have several attributes: dateTime: the date and time of the indicator occurrence; userID: the student it relates to; sessionID: the class session that the student was participating in at the time; taskID: the task that the student was working on; and constrID: the construction that the student was working on.¹

¹ The model in Fig. 1 focusses on the interaction data. The full data relating to ELEs such as eXpresser and FractionsLab would also include classes relating to users, tasks, sessions and constructions; and attributes describing instances of these classes, such as a user’s name and year-group, a task’s name and description, a construction’s content and description, and a session’s description and duration.

Figure 1: Core Graph Data Model
[Two node classes: Event (attributes dateTime, taskID, constrID, userID, sessionID) and EventType (attributes eventID, eventStatus, eventCat). An edge ‘occurrenceOf’ runs from Event to EventType; an edge ‘next’ runs from Event to Event.]

There is a relationship ‘next’ linking an instance of Event to the next Event that occurs for the same user, task and session. There is a relationship ‘occurrenceOf’ linking each instance of Event to an instance of the EventType class.

The instances of the EventType class include: startTask, endTask, numberCreated, numberUnlocked, unlockedNumberChanged, buildingBlockMade, correctModelRuleCreated, incorrectModelRuleCreated, interventionGenerated, interventionShown, in the case of eXpresser (see [8] for the full list); and startTask, endTask, fractionCreated, fractionChanged, fractionReleased, interventionShown, in the case of FractionsLab.

We see that instances of the EventType class have several attributes, including:

   • eventID: a unique numerical identifier for each type of indicator;

   • eventStatus: this may be -1, 0 or 1, respectively stating that an occurrence of this type of indicator shows that the student is making negative, neutral or positive progress towards achieving the task goals; an additional status 2 is used for indicators relating to system interventions;

   • eventCat: the category into which this indicator type falls; for example, startTask and endTask are task-related indicators; interventionGenerated and interventionShown are system-related ones; numberCreated, numberUnlocked, unlockedNumberChanged are number-related; and fractionCreated, fractionChanged, fractionReleased are fraction-related.

Figure 2 shows a fragment of MiGen interaction data conforming to this graph data model. Specifically, it relates to the interactions of user 5 as they work on task 2 during session 9. The user makes three constructions during this task (with constrIDs 1, 2 and 3). The start and end of the task are delimited by an occurrence of the startTask and endTask indicator type, respectively (events 23041 and 33154). We see that the two events following 23041 relate to an intervention being generated and being shown to the student (this is likely to be because the student was inactive for over a minute after starting the task); following which, the student creates a number (event 24115).

There are additional attributes relating to events, not shown here for simplicity, capturing values relating to the student’s
constructions and information relating to the system’s interventions. For example, these attributes would record: for event 24115, the value of the number created, say 5; for event 23921, the feedback strategy used by the system to generate this intervention, say strategy 8; and for event 23923, the content of the message displayed to the user, say “How many green tiles do you need to make your pattern?” and whether this is a high-level interruption by the system or a low-level interruption that the student can choose to view or not. Such information can be captured through additional edges outgoing from an event instance to a literal-valued node: $24115 \xrightarrow{\text{value}} 5$, $23921 \xrightarrow{\text{strategy}} 8$, $23923 \xrightarrow{\text{message}}$ “How many green tiles do you need to make your pattern?”, $23923 \xrightarrow{\text{level}}$ “high”. Since graph data models are semi-structured (and graph data therefore does not need to strictly conform to a single schema), this kind of heterogeneity in the data is readily accommodated.

Figure 2: Fragment of Graph Data
[MiGen data; every event shown has taskID:2, userID:5, sessionID:9. Each event has an ‘occurrenceOf’ edge to the event type listed.]

   Event   dateTime         constrID   occurrenceOf            eventID   eventStatus   eventCat
   23041   20150331091524   1          startTask               0         0             taskEv
   23921   20150331091637   1          interventionGenerated   6001      2             systemEv
   23923   20150331091638   1          interventionShown       6002      2             systemEv
   24115   20150331091655   1          numberCreated           1006      1             numberEv
   ...
   33154   20150331094453   3          endTask                 9999      0             taskEv

   ‘next’ edges: 23041 -> 23921 -> 23923 -> 24115 -> ... -> 33154

Figure 3: Fragment of Graph Data
[FractionsLab data; every event shown has taskID:56, constrID:4, userID:5, sessionID:1. Each event has an ‘occurrenceOf’ edge to the event type listed.]

   Event    dateTime         occurrenceOf        eventID   eventStatus   eventCat
   344712   20150215091741   startTask           0         0             taskEv
   ...
   344758   20150215091828   fractionChanged     1002      1             fractionEv
   344759   20150215091828   fractionReleased    1003      1             fractionEv
   344760   20150215091832   interventionShown   6002      2             systemEv
   344761   20150215091833   clickButton         3002      0             taskEv

   ‘next’ edges: 344712 -> ... -> 344758 -> 344759 -> 344760 -> 344761

Figure 3 similarly shows a fragment of FractionsLab interaction data, relating to the interactions of user 5 working on task 56 during session 1. The user makes one construction during this task. We see events relating to the student changing and ‘releasing’ a fraction, following which the system displays a message (in this case, it was a high-interruption message of encouragement “Great! Well Done”).

We see from Figures 2 and 3 that the sub-graph induced by edges labelled ‘next’ consists of a set of paths, one path for each task undertaken by a specific user in a specific session. The entire graph is a DAG (directed acyclic graph): there are no cycles induced by the edges labelled ‘next’ since each links an earlier indicator occurrence to a later one; while the instances of EventType and other literal-valued nodes can have only incoming edges. Setting aside the ‘next’ edges, which link pairs of Event instances, the graph is also bipartite, with the two parts comprising (i) the instances of Event, and (ii) the instances of EventType and the literal-valued nodes.

As a final observation, we note that Figures 1–3 adopt a “property graph” notation (e.g. as used in the Neo4j graph database, neo4j.com) in which nodes may have attributes. In a “classical” graph data model, each attribute of a node would be represented by an edge and its value by a literal-valued node. So, for example, the information that the taskID of event 23041 is 2 would be represented by an edge $23041 \xrightarrow{\text{taskID}} 2$. The query examples in the next section assume this “classical” graph representation.

3.   GRAPH QUERIES AND ANALYSES
Because the sub-graph induced by edges labelled ‘next’ consists of a set of paths, the data readily lends itself to exploration using conjunctive regular path (CRP) queries [2]. A CRP query, $Q$, consisting of $n$ conjuncts is of the form

   $(Z_1, \ldots, Z_m) \leftarrow (X_1, R_1, Y_1), \ldots, (X_n, R_n, Y_n)$

where each $X_i$ and $Y_i$ is a variable or a constant, each $Z_i$ is a variable that appears also in the right hand side of $Q$, and each $R_i$ is a regular expression over the set of edge labels. In this context, a regular expression, $R$, has the following syntax:

   $R := \epsilon \mid a \mid \_ \mid (R_1 . R_2) \mid (R_1 | R_2) \mid R^* \mid R^+$

where $\epsilon$ denotes the empty string, $a$ denotes an edge label, $\_$ denotes the disjunction of all edge labels, and the operators have their usual meaning. The answer to a CRP query on a graph $G$ is obtained by finding for each $1 \le i \le n$ a binary relation $r_i$ over the scheme $(X_i, Y_i)$, where there is a tuple $(x, y)$ in $r_i$ if and only if there is a path from $x$ to $y$ in $G$ such that: $x = X_i$ if $X_i$ is a constant; $y = Y_i$ if $Y_i$ is a constant; and the concatenation of the edge labels in the path satisfies the regular expression $R_i$. The answer is then given by forming the natural join of the binary relations $r_1, \ldots, r_n$ and finally projecting on $Z_1, \ldots, Z_m$.

To illustrate, the following CRP query returns pairs of events x, y such that x is an intervention message shown to the user by the system and y indicates that the user’s next action, in eXpresser, was to create a number (note, variables in queries are distinguished by an initial question mark):

(?X,?Y) <- (?X,occurrenceOf,interventionShown),
           (?X,next,?Y),
           (?Y,occurrenceOf,numberCreated)
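The evaluation procedure just described (one binary relation per conjunct, then a natural join, then a projection) can be sketched in a few lines of Python. This is an illustration only, not the query engine used by the authors: the function names are hypothetical, regular expressions are restricted to a single label `a` or `a+`, and the events elided from Figure 2 between 24115 and 33154 are collapsed into one ‘next’ edge purely so that the example is self-contained.

```python
# "Classical" edge-labelled graph for the Figure 2 fragment, as (source, label, target) triples.
TRIPLES = {
    (23041, "next", 23921), (23921, "next", 23923), (23923, "next", 24115),
    (24115, "next", 33154),  # ASSUMPTION: stands in for the events elided ("...") in Figure 2
    (23041, "occurrenceOf", "startTask"),
    (23921, "occurrenceOf", "interventionGenerated"),
    (23923, "occurrenceOf", "interventionShown"),
    (24115, "occurrenceOf", "numberCreated"),
    (33154, "occurrenceOf", "endTask"),
    (23041, "constrID", 1), (23921, "constrID", 1), (23923, "constrID", 1),
    (24115, "constrID", 1), (33154, "constrID", 3),
}


def is_var(term):
    """Query variables are strings with an initial question mark."""
    return isinstance(term, str) and term.startswith("?")


def conjunct_relation(triples, x, regex, y):
    """Binary relation r_i for one conjunct (x, regex, y).

    Only the regular expressions 'a' and 'a+' are handled in this sketch.
    """
    label = regex.rstrip("+")
    step = {(s, t) for (s, l, t) in triples if l == label}
    pairs = set(step)
    if regex.endswith("+"):
        # Transitive closure: paths of one or more edges with this label.
        while True:
            extra = {(a, d) for (a, b) in pairs for (c, d) in step if b == c} - pairs
            if not extra:
                break
            pairs |= extra
    # Constants in the conjunct must match the path endpoints.
    return {(a, b) for (a, b) in pairs
            if (is_var(x) or a == x) and (is_var(y) or b == y)}


def evaluate_crp(triples, head, conjuncts):
    """Natural join of the conjunct relations, projected on the head variables."""
    bindings = [{}]
    for x, regex, y in conjuncts:
        relation = conjunct_relation(triples, x, regex, y)
        joined = []
        for env in bindings:
            for a, b in relation:
                candidate = dict(env)
                consistent = True
                for term, value in ((x, a), (y, b)):
                    if is_var(term) and candidate.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    joined.append(candidate)
        bindings = joined
    return {tuple(env[v] for v in head) for env in bindings}


# The CRP query above: an intervention shown, directly followed by a number being created.
answers = evaluate_crp(TRIPLES, ("?X", "?Y"), [
    ("?X", "occurrenceOf", "interventionShown"),
    ("?X", "next", "?Y"),
    ("?Y", "occurrenceOf", "numberCreated"),
])
print(answers)  # {(23923, 24115)}

# A query using next+ and a shared construction, in the style of the later examples.
later = evaluate_crp(TRIPLES, ("?X", "?Y", "?Z"), [
    ("?X", "occurrenceOf", "interventionGenerated"),
    ("?X", "constrID", "?C"),
    ("?X", "next+", "?Y"),
    ("?Y", "constrID", "?C"),
    ("?Y", "occurrenceOf", "?Z"),
])
print(sorted(later))
```

The second query shows how the shared variable ?C acts as a join condition: event 33154 is reachable from 23921 via ‘next’ edges but is dropped because it belongs to construction 3 rather than construction 1.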
The result would contain pairs such as (23923,24115) from          (?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated),
Figure 2, demonstrating that there are indeed situations                         (?X,constrID,?C), (?X,next+,?Y),
where an intervention message displayed by the MiGen sys-                        (?Y,constrlID,?C), (?Y,occurrenceOf,?Z)
tem leads directly to the creation of a number by the student.

The following query returns pairs of events x, y such that         The result would contain triples such as
that x is an intervention message shown to the user by the         (23921, 23923, interventionShown),
system and y is the user’s next action; the type of y is also      (23921, 24115, numberCreated),
returned, through the variable ?Z:                                 (23921, 24136, numberUnlocked),
                                                                   (23921, 24189, unlockedNumberChanged),
                                                                   relating to construction 1 made by user 5 during session 9
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionShown),                 for task 2 (two more events — 24136 and 24189 — relat-
              (?X,next,?Y),                                        ing to construction 1 have been assumed here, in addition
              (?Y,occurrenceOf,?Z)                                 to 23923 amd 24115 shown in Figure 2, for illustrative pur-
                                                                   poses). The results would not contain (23921,33154,end-
The result would contain triples such as (23923,24115,num-         Task), since event 33154 relates to construction 3.
berCreated) from Figure 2 and (344760,344761,clickButton)
from Figure 3, allowing researchers to see what types of           To show more clearly the answers to the previous query in
events directly follow the display of an intervention mes-         the form of possible event paths, we can use extended regular
sage. This would allow the confirmation or contradiction of        path (ERP) queries [11], in which a regular expression can
researchers’ expectations regarding the immediate effect of        be associated with a path variable and path variables can
intervention messages on students’ behaviours.                     appear in the left-hand-side of queries. Thus, for example,
                                                                   the following query returns the possible paths from x to y:
Focussing for the rest of this section on the data in Figure 2,
the following query returns pairs of events x, y such that x is
                                                                   (?X,?P,?Y,?Z) <-
any type of event and y indicates that the user’s next action
                                                                         (?X,occurrenceOf,interventionGenerated),
was to unlock a number; the type of x is also returned,
                                                                         (?X,constrID,?C), (?X,next+:?P,?Y),
through the variable ?Z:
                                                                         (?Y,constrID,?C), (?Y,occurrenceOf,?Z)

(?X,?Y,?Z) <- (?X,occurrenceOf,?Z),
              (?X,next,?Y),                                        The result would contain answers such as
              (?Y,occurrenceOf,numberUnlocked)                     (23921, [next], 23923, interventionShown),
                                                                   (23921, [next, 23923, next], 24115, numberCreated),
                                                                   (23921, [next, 23923, next, 24115, next], 24136, numberUn-
The result would allow researchers to see what types of
                                                                   locked),
events immediately precede the unlocking of a number (i.e.
                                                                   (23921, [next, 23923, next, 24115, next, 24136, next], 24189,
the creation of a variable). This would allow confirmation
                                                                   unlockedNumberChanged).
of researchers’ expectations about the design of the MiGen
system’s intelligent support in guiding students towards gen-
                                                                   The use of the regular expressions next and next+ in the
eralising their models by changing a fixed number to an ‘un-
                                                                   previous queries matches precisely one edge labelled ‘next’,
locked’ one.
                                                                   or any number of such edges (greater than or equal to 1),
                                                                   respectively. However, for finer control and ranking of query
The following query returns pairs of events x, y such that
                                                                   answers, it is possible to use approximate answering of CRP
x is an intervention generated by the system and y is
                                                                   and ERP queries (see [11, 17]), in which edit operations such
any subsequent event linked to x through a path comprising
                                                                   as insertion, deletion or substitution of an edge label can be
one or more ‘next’ edges; the type of y is also returned,
                                                                   applied to regular expressions.
through the variable ?Z:
                                                                   For example, using the techniques described in [11, 17], the
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated),             user can choose to allow the insertion of the label ‘next’ into
              (?X,next+,?Y),                                       a regular expression, at an edit cost of 1. Submitting
              (?Y,occurrenceOf,?Z)                                 this query:

The result would contain triples such as (23921, 23923, inter-     (?X,?P,?Y,?Z) <-
ventionShown), (23921, 24115, numberCreated), ... (23921,                (?X,occurrenceOf,interventionGenerated),
33154, endTask), allowing researchers to see what types of               (?X,constrID,?C), APPROX(?X,next:?P,?Y),
events directly or indirectly follow the display of an interven-         (?Y,constrID,?C), (?Y,occurrenceOf,?Z)
tion message by the system. This would allow the confirma-
tion or contradiction of researchers’ expectations regarding
the longer-term effect of intervention messages on students’       would return first exact answers, such as
behaviours.                                                        (23921, [next], 23923, interventionShown). The regular ex-
                                                                   pression next in the conjunct APPROX(?X,next:?P,?Y) would
We can modify the query to retain only pairs x, y that relate      then be automatically approximated to next.next, leading
to the same construction:                                          to answers such as
(23921, [next, 23923, next], 24115, numberCreated)

at an edit distance of 1 from the original query. Following
this, the regular expression next.next would be automatically
approximated to next.next.next, leading to answers such as
(23921, [next, 23923, next, 24115, next], 24136, numberUnlocked)

at distance 2. This incremental return of paths of increasing
length can continue for as long as the user wishes, and
allows researchers to examine increasingly longer-term effects
of intervention messages on students’ behaviours. It would
also be possible for users to specify from the outset a
minimum and maximum edit distance to be used in approximating
and evaluating the query, for example to request paths
encompassing between 2 and 4 edges labelled ‘next’.

Figure 4: Transitions between Event Types (panel title: Transition Matrix, Session 3)
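
The incremental behaviour of APPROX described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the automaton-based evaluation of [11, 19]: the event IDs and types are those quoted in the example answers, the constrID value "c1" and the function name approx_next are hypothetical, and each event is assumed to have at most one ‘next’ successor.

```python
from typing import Dict, Iterator, List, Tuple, Union

# Toy event chain built from the IDs and types quoted in the text.
EVENT_TYPE: Dict[int, str] = {
    23921: "interventionGenerated",
    23923: "interventionShown",
    24115: "numberCreated",
    24136: "numberUnlocked",
    24189: "unlockedNumberChanged",
}
# 'next' edges; assumption: at most one successor per event.
NEXT: Dict[int, int] = {23921: 23923, 23923: 24115, 24115: 24136, 24136: 24189}
# Hypothetical construction ID shared by all events in this toy chain.
CONSTR_ID: Dict[int, str] = {event: "c1" for event in EVENT_TYPE}

Path = List[Union[str, int]]
Answer = Tuple[int, Path, int, str, int]


def approx_next(start_type: str, max_distance: int) -> Iterator[Answer]:
    """Yield (x, path, y, type of y, distance) in order of increasing distance.

    Distance 0 is the exact match of `next` (one edge). Each inserted
    `next` label costs 1, so distance d corresponds to d + 1 edges.
    """
    # Each frontier entry is (start event x, path so far, current event).
    frontier: List[Tuple[int, Path, int]] = [
        (x, [], x) for x in sorted(EVENT_TYPE) if EVENT_TYPE[x] == start_type
    ]
    for distance in range(max_distance + 1):
        extended: List[Tuple[int, Path, int]] = []
        for x, path, node in frontier:
            y = NEXT.get(node)
            if y is None:
                continue  # reached the end of the event chain
            # Paths list intermediate events between edge labels,
            # e.g. [next, 23923, next].
            new_path = path + [node, "next"] if path else ["next"]
            # Keep only answers that relate to the same construction.
            if CONSTR_ID[x] == CONSTR_ID[y]:
                yield (x, new_path, y, EVENT_TYPE[y], distance)
            extended.append((x, new_path, y))
        frontier = extended


for answer in approx_next("interventionGenerated", 2):
    print(answer)
```

With a maximum distance of 2, the generator yields the three answers quoted above in the same order, each tagged with its edit distance. Stopping the iteration early corresponds to the user ending the incremental return of longer paths.
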
Queries based on evaluating regular expressions over a graph-
based representation of interaction data, such as those above,
can aid in the exploration of students’ behaviours as they are           applied across a whole dataset, or focussing on partic-
undertaking tasks using ELEs and the effectiveness of the                ular students, tasks or sessions;
intelligent support being provided by the ELE. The query
                                                                       • nodes that have a high probability of being visited on a
processing techniques employed are based on incremental
                                                                         randomly chosen shortest path between two randomly
query evaluation algorithms which run in polynomial time
                                                                         chosen nodes have high betweenness centrality; deter-
with respect to the size of the database graph and the size
                                                                         mining this measure for pairs of event type nodes (ig-
of the query and which return answers in order of increasing
                                                                         noring the directionality of the ‘occurrenceOf’ edges)
edit distance [11]. A recent paper [19] gives details of an
                                                                         would identify event types that play key mediating
implementation, which is based on the construction of an
                                                                         roles between other event types.
automaton (NFA) for each query conjunct, the incremental
construction of a weighted product automaton from each
conjunct’s automaton and the data graph, and the use of           We have already undertaken some ad hoc analyses of in-
a ranked join to combine answers being incrementally pro-         teraction data arising from classroom sessions using ELEs.
duced from the evaluation of each conjunct. The paper also        For example, Figure 4 shows the normalised incoming tran-
presents a performance study undertaken on two data sets          sitions for a 1-hour classroom session involving 22 students
— lifelong learning data and metadata [17] and YAGO [22].         using MiGen (in the diagram, s denotes the ‘startTask’ and
The first of these has rather ‘linear’ data, similar to the in-   e the ‘endTask’ event types). Event types with an adja-
teraction data discussed here, while the second has ‘bushier’     cent circle show transitions where this type of event occurs
connectivity. Query performance is generally better for the       repeatedly in succession. The thickness of each arrow or
former than the latter, and the paper discusses several pos-      circle indicates the value of the transition probability: the
sible approaches towards query optimisation.                      thicker the line, the higher the probability. Red (light grey)
                                                                  is used for probabilities < 0.2 and black for probabilities
In addition to evaluating queries over the interaction data,      ≥ 0.2. We can observe a black arrow 3007 → 1005, indicat-
by representing the data in the form of a graph it is possible    ing transitions from events of type 3007 (detection by the
to apply graph structure analyses such as the following:          system that the student has made an implausible building
                                                                  block for this task) to events of type 1005 (modification of a
   • path finding and clustering: this would be useful for        rule by the student). Such an observation raises a hypoth-
     determining patterns of interest across a whole dataset,     esis for more detailed analysis or further student observa-
     or focussing on particular students, tasks or sessions       tion, namely: “does the construction of an incorrect building
     (cf. [4]);                                                   block lead students to self-correct their rules?”. Developing
                                                                  a better understanding of such complex interaction can lead
   • average path length: this would be useful for determin-      to improvement of the system. For this particular example,
     ing the amount of student activity (i.e. the number of       we designed a new prompt that suggests to students to first
     indicator occurrences being generated per task) across       consider the building block against the given task before
     a whole dataset, or focussing on particular students,        proceeding unnecessarily in correcting their rules. More ex-
     tasks or sessions;                                           amples of such ad hoc analyses are given in [14]. Represent-
                                                                  ing the interaction data in graph form will allow more sys-
   • graph diameter: to determine the greatest distance be-       tematic, flexible and scalable application of graph-structure
     tween any two nodes (which, due to the nature of the         algorithms such as those identified above.
     data, would be event type nodes); this would be an
     indication of the most long-running and/or most intensive
     task(s);                                                     4.    CONCLUSIONS AND FUTURE WORK
                                                                  We have presented a graph model for representing event-
   • degree centrality: determining the in-degree centrality      based interaction data arising from Exploratory Learning
     of event type nodes would identify key event types oc-       Environments, drawing on the data generated when students
     curring in students’ interactions; this analysis could be    undertake exploratory learning tasks with the eXpresser and
FractionsLab microworlds. Although developed in the con-                networks: Generating high level hints based on
text of these systems, the model is a very general one and              network community clustering. EDM, 2012.
can easily be used or extended to model similar data from           [7] B. Grawemeyer et al. Light-bulb moment?:
other ELEs.                                                             Towards adaptive presentation of feedback based on
                                                                        students’ affective state. IUI, pages 400–404, 2015.
We have explored the possibilities that evaluating regular          [8] S. Gutierrez-Santos, E. Geraniou, D. Pearce-Lazard,
path queries over this graph-based representation might pro-            and A. Poulovassilis. Design of Teacher Assistance
vide for exploring the behaviours of students as they are               Tools in an Exploratory Learning Environment for
working in the ELE and the effectiveness of the intelligent             Algebraic Generalization. IEEE Trans. Learn. Tech.,
support that it provides to them. We have also identified               5(4):366–376, 2012.
additional graph algorithms that may yield further insights         [9] S. Gutierrez-Santos, M. Mavrikis, and G. D. Magoulas.
about learners, tasks and significant indicators.                       A Separation of Concerns for Engineering Intelligent
                                                                        Support for Exploratory Learning Environments. J.
Planned work includes transformation and uploading of                   Research and Practice in Inf. Tech., 44:347–360, 2013.
the interaction data sets gathered during trials and full class-   [10] A. Harrer, R. Hever, and S. Ziebarth. Empowering
room sessions of the two systems into an industrial-strength            researchers to detect interaction patterns in
graph database such as Neo4J, following the graph model                 e-collaboration. Frontiers in Artificial Intelligence and
presented in Section 2; followed by the design, implemen-               Applications, 158:503, 2007.
tation and evaluation of meaningful queries, analyses and
                                                                   [11] C. Hurtado, A. Poulovassilis, and P. Wood. Finding
visualisations over the graph data, building on the work
                                                                        top-k approximate answers to path queries. FQAS,
presented in Section 3. Equipped with an appropriate user
                                                                        pages 465–476, 2009.
interface, educational researchers, designers or even teach-
ers with less technical expertise could in this way explore        [12] K. Koedinger et al. A data repository for the
the data from their perspective. This has the potential to              EDM community: The PSLC datashop. Handbook of
lead to an improved understanding of interaction in this                Educational Data Mining, 43, 2010.
context and to feed back to the design of the ELEs. We             [13] M. Mavrikis and S. Gutierrez-Santos. Not all Wizards
see this approach very much in the spirit of “polyglot per-             are from Oz: Iterative design of intelligent learning
sistence” (i.e. using different data storage methods to ad-             environments by communication capacity tapering.
dress different data manipulation problems), and hence be-              Computers and Education, 54(3):641–651, 2010.
ing used in conjunction with other EDM resources such as           [14] M. Mavrikis, Z. Zheng, S. Gutierrez-Santos, and
DataShop [12]. Another direction of research is investigation           A. Poulovassilis. Visualisation and analysis of
of how the flexible query processing techniques for graph               students’ interaction data in exploratory learning
data (including both query approximation and query relax-               environments. Workshop on Web-Based Technology
ation) that have been developed in the context of querying              for Training and Education (at WWW), 2015.
lifelong learners’ data and metadata [11, 17] might be ap-         [15] R. Noss et al. The design of a system to support
plied or adapted to the much finer-granularity interaction              exploratory learning of algebraic generalisation.
data described here and the more challenging pedagogical                Computers and Education, 59(1):63–82, 2012.
setting of providing effective intelligent support to learners     [16] N. Pinkwart et al. Graph grammars: An ITS
undertaking exploratory tasks in ELEs.                                  technology for diagram representations. FLAIRS,
                                                                        pages 433–438, 2008.
Acknowledgments                                                    [17] A. Poulovassilis, P. Selmer, and P. Wood. Flexible
                                                                        querying of lifelong learner metadata. IEEE Trans.
This work has been funded by the ESRC/EPSRC MiGen
                                                                        Learn. Tech., 5(2):117–129, 2012.
project, the EU FP7 projects iTalk2Learn (#318051) and
M C Squared (#610467). We thank all the members of                 [18] O. Scheuer and B. McLaren. CASE: A configurable
these projects for their help and insights.                             argumentation support engine. IEEE Trans. Learn.
                                                                        Tech., 6(2):144–157, 2013.
                                                                   [19] P. Selmer, A. Poulovassilis, and P. T. Wood. Implementing
5.   REFERENCES                                                         flexible operators for regular path queries. GraphQ
 [1] N. Belacel, G. Durand, and F. LaPlante. A binary                   (EDBT/ICDT Workshops), pages 149–156, 2015.
     integer programming model for global optimization of          [20] V. Sheshadri, C. Lynch, and T. Barnes. InVis: An
     learning path discovery. G-EDM, 2014.                              EDM tool for graphical rendering and analysis of
 [2] D. Calvanese et al. Containment of conjunctive                     student interaction data. G-EDM, 2014.
     regular path queries with inverse. KRR, pages                 [21] J. Stamper, M. Eagle, T. Barnes, and M. Croy.
     176–185, 2015.                                                     Experimental evaluation of automatic hint generation
 [3] R. Dekel and K. Gal. On-line plan recognition in                   for a logic tutor. Artificial Intelligence in Education,
     exploratory learning environments. G-EDM, 2014.                    pages 345–352, 2011.
 [4] M. Eagle and T. Barnes. Exploring differences in              [22] F. Suchanek, G. Kasneci, and G. Weikum. YAGO: a
     problem solving with data-driven approach maps.                    core of semantic knowledge. WWW, 2007.
     EDM, 2014.                                                    [23] D. Suthers. From contingencies to network-level
 [5] M. Eagle, D. Hicks, B. Peddycord III, and T. Barnes.               phenomena: Multilevel analysis of activity and actors
     Exploring networks of problem-solving interactions.                in heterogeneous networked learning environments.
     LAK, pages 21–30, 2015.                                            LAK, 2015.
 [6] M. Eagle, M. Johnson, and T. Barnes. Interaction