Student Behavioral Embeddings and Their Relationship to Outcomes in a Collaborative Online Course

Renzhe Yu (UC Irvine, renzhey@uci.edu), Zachary Pardos (UC Berkeley, pardos@berkeley.edu), John Scott (UC Berkeley, jmscott212@berkeley.edu)

ABSTRACT
In online collaborative learning environments, prior work has found moderate success in correlating behaviors to learning after passing them through the lens of human knowledge (e.g., hand-labeled content taxonomies). However, these manual approaches may not be cost-effective for triggering in-time support, especially given the complexity of interpersonal and temporal behavioral patterns under rich interactions. In this paper, we test the hypothesis that a neural embedding of students that synthesizes their event-level course behaviors, without hand labels or knowledge about the specific course design, can be used to make predictions of desired outcomes and thus inform intelligent support at scale. While our student representations predicted student interactivity (i.e., sociality) measures, they failed to predict course grades and grade improvement better than a naive baseline. We reflect on this result as a data point added to the nascent trend of raw student behaviors (e.g., clickstream) proving difficult to directly correlate to learning outcomes and discuss the implications for big education data modeling.

Keywords
Collaborative learning environment, neural embedding, skip-gram, online course, higher education, behavior, predictive modeling

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION
Representation of collaborative learning behaviors in their raw formats has been challenging due to their complicated internal dependencies. Theory-driven approaches can extract some conceptually important measures of these learning processes but might not give good grounds for real-time learner support due to the human effort required. In this paper, we examine an aggregate, unsupervised representation of these collaborative learning behaviors in the context of a formal course that features sharing, remixing and interacting with student artifacts. We use a connectionist, neural network approach to representing a student as a function of a co-interaction network temporally formed by peers interacting in different ways in different weeks of the course. In reflection of the prior empirical work, we test the correspondence of these representations to learning outcomes. First, we investigate whether the sociality of a student, or how much she is involved in the collaborative community, can be predicted from these low-level behavioral representations, as this is a direct goal of the special course design we analyze. Second, given the moderate relationship between interpersonal connections and learning performance in the literature, we test whether these vector representations are indicative of students' final course performance. This exploration has strong pedagogical implications because an unsupervised student-level representation that captures signals of effective learning can be further deployed in intelligent systems to give just-in-time feedback and interventions in the face of interconnected behavioral streams.

1.1 Collaborative learning behavior and outcomes
Generations of learning theories and pedagogies have highlighted the benefits of social processes for effective learning [15, 13]. Accordingly, there has been a multitude of studies that characterize these processes and examine how they relate to learning outcomes from granular learning behavior data [2]. One typical context of these studies is collaborative learning environments where students are required to work together in one way or another. As the interpersonal and temporal dependencies complicate the social processes, multiple methodological paradigms have been adopted to represent students' collaborative learning behavior.

To model the structures of interpersonal connections, social network analysis (SNA) conceptualizes learners as nodes and their various formats of interaction as edges, and typically identifies global or local structures. Some studies concentrate on the discovery of global structures such as core-periphery structures [6] and cohesive groups [3], while a number of others take more local perspectives and find predictive power of network positions for learning outcomes [1, 5]. An alternative paradigm is the extension of psychometric or knowledge tracing models to collaborative settings, where collaboration status or group membership information is used to construct additional terms in the original functions [16, 9]. These adapted models have shown improved predictive power for students' learning performance.

The approaches above represent students' collaborative learning behaviors via theory-based or human-engineered models, and each captures some dimensions that are predictive of various learning outcomes. At the same time, they run the risk of misspecifying the model form and leaving some of the behavioral signals unattended, compared to more bottom-up, data-driven methods.

1.2 Connectionist student representation
In domains where it is difficult to enumerate and give values to features that satisfactorily represent the items, distributional approaches to modeling them might be useful. For example, the meaning of words in a lexicon is socially mediated and does not lend itself well to description through feature engineering. Thus, the connectionist representation approach, which uses neural networks to learn a continuous feature vector representing all of the contexts of a word in a corpus, has become popular [8]. Similar challenges are present when it comes to positioning students based on their fine-grained behavior in open-ended learning environments. In response, recent research has attempted to apply connectionist models to learn a continuous vector of a student which represents all the contexts of her raw behaviors. For example, sequences of student responses in intelligent tutoring systems or student actions in MOOCs have been used to map students to continuous vector spaces [14, 11]. Co-enrollment sequences with other students, although not in a micro-level learning context, have been used to represent undergraduate students throughout their degree [7]. While low-level behavioral embeddings have been used in non-social contexts to predict student performance, applying these techniques to collaborative settings may offer further insight into the complicated social processes.
2. DATASET
In this study, we analyzed a fully online course offered to residential students at a four-year public university in the United States. The course was focused on sociocultural aspects of literacy and global education. To facilitate collaborative learning, the course design featured a number of activities related to sharing, discussing, remixing and composing media with peer students. These activities were enabled by SuiteC, a toolkit that was integrated into the Canvas learning management system (LMS) [4]. There were three main components of SuiteC:

• Asset Library is a social platform where students contribute and share various media content in the form of "assets," and interact with peer assets by viewing, liking and commenting on them. Figure 1a shows the gateway page of the Asset Library with the feed of recently contributed assets.

• Whiteboards is an authoring tool that allows students to work individually or collaboratively on designing multimedia artifacts. Students can import assets as whiteboard elements and export finished whiteboards as assets for peer interaction. Figure 1b illustrates the interface when students collaborate on a whiteboard.

• Engagement Index is a gamification tool that tracks and evaluates student engagement in the SuiteC tools and provides a leaderboard for social comparison.

Figure 1: SuiteC components. (a) Gateway page of the Asset Library. (b) Interface of Whiteboard collaboration.

The course lasted for 14 weeks in Spring 2016. Each week except for the spring break, students worked through five activity phases that involved sharing, commenting, and creating assets and whiteboards, organized under course hashtags that students included in their posts. These SuiteC activities accounted for 25% of the final grade, whereas another 55% came from two long-form written papers that required students to integrate course readings. These two major assessments occurred around the middle and the end of the semester, respectively. The remaining 20% of the course grade consisted of eight ethnographic field notes authored by students on their site visits.

We acquired all the time-stamped click events within SuiteC for this course, a total of 684,095 entries. Each entry recorded a granular action that a student performed on the foregoing tools, e.g., view an asset or add an element to a whiteboard. Attributes of the action included event type, timestamp, associated asset/whiteboard id, anonymized user id, and user role, among others. After removing events that fell outside the normal period of the semester or that were not triggered by a student, we kept 658,967 entries for our analysis. In addition, the gradebook, which contained scores for the two major assessments and the final course grades, was also available.

3. METHODS
3.1 Student representation using the skip-gram model
In this section, we describe our methodological approach to unsupervised student feature learning by way of neural embeddings. We model our student representation after [8], who used a neural network architecture called a skip-gram to learn word representations from their context distributions in a corpus. Given a word sequence $\{w_1, w_2, \ldots, w_T\}$, this model maximizes the average log probability of contextual words:

$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \qquad (1)$$

where $c$ is the contextual window size and the conditional probability $p(w_{t+j} \mid w_t)$ is computed using a softmax function over all possible words in the corpus for each given $w_t$. Because words that share meanings are more likely to occur in similar contexts, the word vectors of synonyms learned via this model should be in proximity in the high-dimensional space. Moreover, the learnt word vectors encode semantic relationships as interesting yet simple mathematical properties. For example, $v_{Paris}$ is closest to $v_{Berlin} - v_{Germany} + v_{France}$. This simplicity is why we are particularly interested in whether this technique can similarly characterize students from their complicated collaborative learning behaviors, thus facilitating easy identification of targeted actions (e.g., pairing students that sum up to a "beacon").
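Equation (1) can be made concrete with a minimal NumPy sketch (the toy corpus, matrix names, and sizes below are ours for illustration, not from the paper): two randomly initialized embedding matrices define a softmax distribution $p(w_{t+j} \mid w_t)$, and the objective is the average log probability of context tokens within a window, which training would maximize.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, c = 10, 4, 2               # vocabulary size, embedding size, window size (toy values)
W_in = rng.normal(size=(V, d))   # "input" embedding per token (the learned representation)
W_out = rng.normal(size=(V, d))  # "output" (context) embedding per token

def log_p(context, center):
    """log p(w_context | w_center) via a full softmax over the vocabulary."""
    scores = W_out @ W_in[center]   # dot product of the center vector with every output vector
    scores = scores - scores.max()  # shift for numerical stability
    return scores[context] - np.log(np.exp(scores).sum())

def skipgram_objective(seq):
    """Average log probability of Eq. (1) for one token sequence."""
    T = len(seq)
    total = 0.0
    for t in range(T):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < T:
                total += log_p(seq[t + j], seq[t])
    return total / T

print(skipgram_objective([1, 3, 5, 3, 2, 1]))  # a negative average log probability
```

A real implementation would replace the full softmax with hierarchical softmax or negative sampling as in [8]; this sketch only shows what quantity the gradient updates push upward.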
In our implementation, student sequences are constructed according to their order of appearance in the raw clickstream events sorted by time. We construct separate sequences for each week because, with the weekly course design, procrastinators for Week 1 and early birds for Week 2, although chronologically adjacent, may not share common traits. Moreover, as different event types in the raw dataset may or may not represent distinct behavioral signals, we experiment with three approaches to grouping raw event types:

• Raw event type grouping: There are 38 unique values in the "event" column of the raw data set, depicting the action that a student takes (e.g., create asset comment). We construct separate weekly sequences for each of these values and feed all resulting sequences to the model.

• Instructor coding grouping: We ask the instructor to group the 38 events based on their perceived nature to more accurately capture the kind of participation represented by a specific event. This process produces 15 groups, where each event belongs to one group only. For example, when students are authoring a Whiteboard, 9 different events could be triggered as they add shapes, assets, and free-hand drawing elements to their canvas, so all nine events are categorized as "Whiteboard Composing." We separately construct weekly sequences for each group and feed all sequences to the model.

• No grouping: We do not differentiate event types and simply construct weekly sequences from the entire dataset.

Table 1: Example of student tokenization for connectionist representation (student2vec)

(a) Raw clickstream data table
Timestamp | Week | Event | Student ID
2/22 23:19 | 3 | View asset | 101
2/23 13:12 | 3 | Create whiteboard | 104
2/25 21:23 | 3 | View asset | 102
2/26 12:10 | 3 | Create whiteboard | 102
2/27 14:27 | 3 | Create whiteboard | 104
2/27 15:03 | 3 | View asset | 103
2/28 13:08 | 4 | Create whiteboard | 102
3/1 15:27 | 4 | View asset | 103
3/2 16:04 | 4 | Create whiteboard | 101
3/2 21:21 | 4 | Create whiteboard | 104
3/3 15:23 | 4 | View asset | 101
3/5 12:13 | 4 | View asset | 102

(b) Student sequences as input to the student2vec model
Event × week | Student ID sequence
View asset, Week 3 | 101, 102, 103
Create whiteboard, Week 3 | 104, 102, 104
View asset, Week 4 | 103, 101, 102
Create whiteboard, Week 4 | 102, 101, 104

Table 1 gives a generalized example of our approach. In the "raw event" approach, the "event" column contains the original event name in the dataset. For "instructor coding", that column is the group that the raw event belongs to. The "no grouping" approach, however, treats the column as if it held the same value for all entries in the table. Whenever a student appears two or more times consecutively in a sequence, we remove the duplicate occurrence(s). In the remainder of this paper, we refer to this representation approach as student2vec. As for the hyperparameters of the model, we search over {8, 32, 64} for the vector size and {5, 20, 40} for the contextual window size, and plot all the results in Section 4.2.
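The tokenization illustrated in Table 1 can be sketched with a short stdlib-only snippet (the row layout and function name are ours; the paper does not specify its implementation): group events by (event type, week), keep students in timestamp order, and collapse consecutive repeats of the same student.

```python
from collections import defaultdict

# Rows from Table 1a, already sorted by timestamp: (week, event, student_id)
rows = [
    (3, "View asset", 101), (3, "Create whiteboard", 104),
    (3, "View asset", 102), (3, "Create whiteboard", 102),
    (3, "Create whiteboard", 104), (3, "View asset", 103),
    (4, "Create whiteboard", 102), (4, "View asset", 103),
    (4, "Create whiteboard", 101), (4, "Create whiteboard", 104),
    (4, "View asset", 101), (4, "View asset", 102),
]

def build_sequences(rows):
    """One student-ID sequence per (event group, week), collapsing consecutive repeats."""
    seqs = defaultdict(list)
    for week, event, student in rows:
        key = (event, week)
        if not seqs[key] or seqs[key][-1] != student:  # remove consecutive duplicates
            seqs[key].append(student)
    return dict(seqs)

sequences = build_sequences(rows)
print(sequences[("View asset", 3)])  # [101, 102, 103]
```

The resulting lists, with IDs cast to string tokens, could then be fed as "sentences" to any skip-gram implementation, for instance gensim's `Word2Vec(sentences, sg=1, vector_size=8, window=20, min_count=1)`; gensim is our illustrative choice here, not a tool named by the paper.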
3.2 Predicting sociality and learning outcome measures
We are interested in how well the unsupervised student2vec representations capture signals of students' social learning behavior, as they conceptually should. Thus, we test the ability of these student vectors to predict an array of human-engineered measures of learning. We use predictive modeling as a more formal alternative to qualitatively examining algebraic properties of these vectors or looking at whether they exhibit meaningful clusters with respect to the learning measures.

First, we collaborate with the instructor (the third author on this paper) and construct four metrics of sociality (tendency to engage in interactive activities) for each student:

• median asset popularity: across all assets that a student (co-)creates throughout the semester, the median of their popularity values, where the popularity of an asset is defined as the number of unique non-author students who interact with it

• total asset popularity: across all assets that a student (co-)creates throughout the semester, the sum of their popularity values

• count asset authored: the total number of assets that a student (co-)creates throughout the semester

• count peer asset visited: the total number of assets that a student interacts with of which she is not an author

The first two variables measure popularity, or "passive" processes of socialization, while the latter two capture "active" processes. All four variables are calculated from asset-related logged events, which are a small fraction (~5%) of all recorded activities.

We further look at course grades as reflected in formal assessments, including the following variables:

• final score: the final grade in the gradebook, out of 100

• grade gain: the difference between the scores of the second (final paper) and the first (midterm paper) assessment
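The four sociality metrics above could be computed from asset-level logs roughly as below (a stdlib-only sketch; the toy authorship/interaction tables and helper names are ours, and the paper's actual event schema is richer):

```python
from statistics import median

# Toy data for illustration: asset_id -> set of (co-)authors, and (student, asset) interactions
authors = {
    "a1": {101}, "a2": {101, 102}, "a3": {103},
}
interactions = [  # view/like/comment events
    (102, "a1"), (103, "a1"), (103, "a1"), (101, "a3"),
    (104, "a2"), (101, "a2"),
]

def popularity(asset):
    """Number of unique non-author students who interact with the asset."""
    return len({s for s, a in interactions if a == asset and s not in authors[asset]})

def sociality(student):
    own = [a for a, auth in authors.items() if student in auth]
    pops = [popularity(a) for a in own]
    visited = {a for s, a in interactions if s == student and student not in authors[a]}
    return {
        "median_asset_popularity": median(pops) if pops else None,
        "total_asset_popularity": sum(pops),
        "count_asset_authored": len(own),
        "count_peer_asset_visited": len(visited),
    }

print(sociality(101))
```

For student 101 this yields a median popularity of 1.5 over the two authored assets, a total popularity of 3, and one peer asset visited; students with no authored assets get a missing median, mirroring the missing values discussed in Section 4.1.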
We then build models to predict these six measures using the learned student vectors. Because the number of data points is much smaller than in typical deep learning applications, we implement two simple models: linear regression and a feed-forward neural network with a single layer of 8 neurons. Each dimension of the student vector serves as a feature in the model input. As the magnitude of these vectors might correlate with students' number of occurrences and hence with sociality measures, we standardize them to unit length before feeding them into the model. For each target measure, only students with valid values are included in the model training and testing processes. Four-fold cross-validation is performed for all the models, and in each fold, 20% of the training data is used as the validation set during the training process to avoid overfitting.
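The evaluation setup for the regression arm can be sketched as follows (NumPy only, with synthetic vectors and targets of our own making; the neural-network model and the 20% validation split are omitted for brevity): unit-length normalization, four-fold cross-validation, and a mean-of-training-sample baseline.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 8))                        # toy "student2vec" vectors: 80 students, 8 dims
X = X / np.linalg.norm(X, axis=1, keepdims=True)    # standardize each vector to unit length
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=80)  # toy target with genuine signal in dim 0

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def four_fold_cv(X, y):
    """Mean test RMSE of OLS regression vs. a mean-of-training-sample baseline."""
    folds = np.array_split(np.arange(len(y)), 4)
    model_err, base_err = [], []
    for k in range(4):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(4) if i != k])
        Xtr = np.hstack([X[train], np.ones((len(train), 1))])  # add intercept column
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        Xte = np.hstack([X[test], np.ones((len(test), 1))])
        model_err.append(rmse(Xte @ beta, y[test]))
        base_err.append(rmse(np.full(len(test), y[train].mean()), y[test]))
    return float(np.mean(model_err)), float(np.mean(base_err))

m, b = four_fold_cv(X, y)
print(m < b)  # with real signal in the vectors, regression beats the naive baseline
```

On the synthetic data the regression easily beats the baseline because the target is constructed from the vectors; the paper's finding is precisely that this does not happen for course grades on real data.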
4. RESULTS
4.1 Descriptive analysis
A summary of the basic statistics of the six variables we use as prediction targets, plus the scores for the two assessments, is shown in Table 2. The inconsistent numbers of observations reflect missing values in some of the variables. The last two measures of students' activity have valid values for all 114 students appearing in the dataset. Among them, 15 students did not author any asset throughout the course and therefore have missing values for the two popularity measures. Moreover, only 79 students finished the course with grades.

All three course grades average 85-90 points with a standard deviation of around 6 points (out of 100). Also, the difference between median and mean is small for all three, suggesting relatively symmetric distributions. A student authored on average 4.8 assets a week (62.11 in total), which aligns with the weekly course requirements. Each of these assets had around 2.4 peer visitors (150.36 in total). If we recalculate these measures only among students who received grades, the average number of authored assets goes up to 6.5, the popularity per asset remains similar at 2.1 peers, and the standard deviation of both measures shrinks substantially due to the removal of a large number of zero values (not reported here).

4.2 Predictive analysis
We first examine Spearman's rank correlations between course performance and sociality measures. Figure 2 depicts the correlation matrix in graphical terms. The three course grades are moderately to highly correlated with each other, all statistically significant at the 0.1 level (upper-left quadrant). The correlation between sociality and performance is more complicated (lower-left quadrant): in more cases the correlation is weak or insignificant, but two sociality measures (the number and the total popularity of assets authored) and two final outcomes (paper and course total) have moderate to high correlations. Lastly, the four sociality measures are mostly significantly correlated with each other, with low to moderate magnitudes (lower-right quadrant).

Figure 2: Rank correlation between learning outcome measures (first four rows/columns) and student interaction measures (last four rows/columns), with statistically insignificant correlations (p > 0.1) crossed out.

We illustrate the prediction performance by target variable in Figure 3. In each model configuration, root mean squared error (RMSE) is used as the evaluation metric on testing results. We define a naive baseline where the mean value of the training sample is used as the predicted value in each fold. To evaluate the performance of a model in relation to this baseline, we calculate the percentage of improvement over the baseline:

$$\%\Delta RMSE = \frac{RMSE_{baseline} - RMSE_{model}}{RMSE_{baseline}} \qquad (2)$$
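Equation (2) is a one-line computation; the sketch below (function name ours) checks it against two rows of Table 3:

```python
def pct_delta_rmse(rmse_baseline, rmse_model):
    """Eq. (2): fractional RMSE improvement of a model over the naive mean baseline."""
    return (rmse_baseline - rmse_model) / rmse_baseline

# Table 3, count asset authored: baseline RMSE 34.68, neural net RMSE 27.33
print(round(100 * pct_delta_rmse(34.68, 27.33), 2))  # 21.19, matching the "% improved" column
```

A negative value means the model did worse than predicting the training mean, which is exactly the situation reported for the course-grade targets.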
Each histogram in Figure 3 depicts the %ΔRMSE across different combinations of student2vec hyperparameters, including vector size, contextual window size and event grouping (each combination referred to as a "case"). This approach to presenting results allows for a high-level view of the predictive power of this student representation approach.

Table 2: Descriptive statistics of main variables
Variable | N | mean | std | min | median | max
midterm paper | 79 | 88.21 | 6.18 | 72 | 88 | 100
final paper | 79 | 85.94 | 6.10 | 65 | 86 | 98.67
final score | 79 | 87.82 | 6.35 | 69.27 | 89.06 | 98.86
grade gain | 79 | -2.27 | 6.14 | -16 | -2.33 | 15
median asset popularity | 99 | 1.52 | 1.48 | 0 | 1 | 10.5
total asset popularity | 99 | 150.36 | 91.92 | 0 | 148 | 502
count asset authored | 114 | 62.11 | 39.96 | 0 | 74.5 | 148
count peer asset visited | 114 | 154.46 | 134.74 | 0 | 132 | 534

Table 3: Summary of prediction error (RMSE) using the best-performing student2vec representation (vector size: 8; context window size: 20; event grouping: instructor's coding)
Target | Baseline | Neural net (% improved) | Regression (% improved)
median asset popularity | 1.48 | 1.45 (2.26) | 1.50 (-1.08)
total asset popularity | 91.51 | 78.40 (14.33) | 80.27 (12.28)
count asset authored | 34.68 | 27.33 (21.19) | 27.57 (20.50)
count peer asset visited | 129.90 | 119.36 (8.11) | 113.97 (12.26)
final score | 6.39 | 6.39 (0.06) | 8.01 (-25.40)
grade gain | 6.12 | 6.41 (-4.80) | 6.30 (-2.94)

Figure 3: Histograms of prediction results for different target variables using student2vec representations: (a) sociality prediction targets; (b) learning outcome prediction targets. Each graph illustrates the performance of predicting the variable in its title across different combinations of model hyperparameters (i.e., "cases" on the y-axis).

Figure 3a suggests that student2vec has moderate predictive power for the sociality measures, especially the total amount of popularity a student gains and the number of assets a student authors, where it beats the baseline by 12% on average. By contrast, Figure 3b shows a complete failure of these student representations to predict learning outcomes: in most cases the prediction performance is worse than that of the naive baseline. These results suggest that the connectionist representation can, at least, extract low-level behavioral signals that relate to social processes, but not those that contribute to performance.

Finally, we qualitatively compare the performance of different cases. Across the three event grouping approaches, instructor coding produces performance similar to raw event grouping, while both generally perform better than no grouping. To give an example of the best results, we select the vector size of 8 and context window size of 20 coupled with the instructor's coding, and report the detailed performance metrics associated with the different prediction targets in Table 3. With regard to sociality measures, student2vec can improve the baseline RMSE by 8-21%, except for median asset popularity. In predicting course outcomes, however, this student representation performs 0.06% better than the baseline at best (6.39 vs. 6.39 for final score).

5. DISCUSSIONS AND CONCLUSIONS
Granular learning process data in online learning environments afford the possibility of real-time personalized learner support by way of detecting behavioral signals of unsuccessful learning. However, the correspondence between low-level student actions and their performance on assessments, outside of social pedagogies, has been a tenuous one, challenging this possibility in the wild. In the context of an edX MOOC, it was found that the addition of video viewing and other passive learning activity information did not improve prediction of future assessment performance beyond what past assessment performance alone achieved [10]. This result re-emerged in a college-level chemistry tutor setting, where past assessment performance alone predicted future assessment performance as well as or better than when mixed with detailed eye-tracking telemetry [12].

The analyses presented in this paper reveal similar challenges, yet some opportunity, for using student clickstream data from a mostly collaborative course to predict learning outcomes. We found that our representations of students, summarized from their low-level behaviors of sharing, creating, and socializing around artifacts, did correspond to human-engineered sociality measures, but not to assessed performance in the course as much as a naive baseline. Given our relatively low magnitude of data, an exceptionally high prediction accuracy was not expected, and the results may be seen as a lower bound of these representations' predictive power. However, their null relationship with summative assessment results still serves as another data point suggesting the difficulty of linking raw behavior, absent prior grade information, with assessment performance.

On the other hand, the model was able to predict measures of students' interactivity above baselines, and these manually engineered measures do not consistently predict course performance. These observations suggest that vector representations in general might not be the culprit. A similar methodology for representing undergraduate students also predicted on-time graduation with over 90% accuracy [7], an improvement over their baseline. These mixed results nudge us to reflect on the roles of data-driven behavioral representations and theory-based feature engineering [5, 9, 17] in building useful predictive models of student learning (and thus, support systems) in the context of collaborative learning. It is perhaps not enough to learn representations of students based on behavior without a more careful dissection of the nature of the behavior. This takeaway parallels the observation in EDM that refined knowledge component modeling is often necessary to accurately estimate cognitive mastery.

Nevertheless, it was a natural expectation, in our data-driven approach, that students who are similar in terms of when and what they do would also be similar in their course outcomes. Although this turned out not to be the case in the instance we examined, it remains an open question for learning science researchers to consider whether this is merely an anomaly or part of a greater lesson to be learned about effective ways to fit behavior into the learner process tracing picture. For our research, a combination of interpretable activity representation and the current embedding approach may be tested in the future to gain insights into the mechanism of interaction between the two in the learning process.

6. REFERENCES
[1] H. Cho, G. Gay, B. Davidson, and A. Ingraffea. Social networks, communication styles, and learning performance in a CSCL community. Computers & Education, 49(2):309–329, 2007.
[2] R. Ferguson and S. B. Shum. Social Learning Analytics: Five Approaches. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, pages 23–33, Vancouver, BC, Canada, 2012. ACM Press.
[3] N. Gillani and R. Eynon. Communication patterns in massively open online courses. Internet and Higher Education, 23:18–26, 2014.
[4] S. M. Jayaprakash, J. M. Scott, and P. Kerschen. Connectivist Learning Using SuiteC - Create, Connect, Collaborate, Compete! In Practitioner Track Proceedings of the 7th International Learning Analytics & Knowledge Conference, pages 69–76, Vancouver, BC, Canada, 2017.
[5] S. Joksimović, A. Manataki, D. Gašević, S. Dawson, V. Kovanović, and I. F. de Kereki. Translating network position into performance: Importance of centrality in different network configurations. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, pages 314–323, Edinburgh, United Kingdom, 2016. ACM.
[6] S. B. Kellogg, S. Booth, and K. M. Oliver. A Social Network Perspective on Peer Support Learning in MOOCs for Educators. International Review of Research in Open and Distance Learning, 15(5):263–289, 2014.
[7] Y. Luo and Z. A. Pardos. Diagnosing University Student Subject Proficiency and Predicting Degree Completion in Vector Space. In Proceedings of the Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2018.
[8] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, pages 3111–3119. Curran Associates Inc., 2013.
[9] J. K. Olsen, V. Aleven, and N. Rummel. Predicting Student Performance in a Collaborative Learning Environment. In Proceedings of the 8th International Conference on Educational Data Mining (EDM), 2015.
[10] Z. A. Pardos, Y. Bergner, D. T. Seaton, and D. E. Pritchard. Adapting Bayesian Knowledge Tracing to a Massive Open Online Course in edX. In Proceedings of the 6th International Conference on Educational Data Mining (EDM), pages 137–144, Memphis, TN, 2013.
[11] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein. Deep Knowledge Tracing. In Advances in Neural Information Processing Systems, pages 505–513, 2015.
[12] M. A. Rau and Z. Pardos. Adding eye-tracking AOI data to models of representation skills does not improve prediction accuracy. In Proceedings of the 9th International Conference on Educational Data Mining, pages 622–623, 2016.
[13] G. Siemens. Connectivism: A Learning Theory for the Digital Age. International Journal of Instructional Technology and Distance Learning, 2(1):1–7, 2005.
[14] M. Teruel and L. A. Alemany. Co-embeddings for Student Modeling in Virtual Learning Environments. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, pages 73–80, Singapore, 2018. ACM Press.
[15] L. S. Vygotsky. Interaction between Learning and Development. In Mind in Society: Development of Higher Psychological Processes, pages 71–91. Harvard University Press, Cambridge, MA, USA, 1978.
[16] M. Wilson, P. Gochyyev, and K. Scalise. Modeling Data From Collaborative Assessments: Learning in Digital Interactive Social Networks. Journal of Educational Measurement, 54(1):85–102, Feb. 2017.
[17] D. Yang, M. Wen, and C. Rose. Weakly Supervised Role Identification in Teamwork Interactions. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1671–1680, Stroudsburg, PA, USA, 2015.