Patterns of Confusion: Using Mouse Logs to Predict User's Emotional State

Avar Pentel
Tallinn University, Institute of Informatics, Tallinn, Estonia
pentel@tlu.ee

Abstract. This paper describes an unobtrusive method for detecting user confusion by monitoring mouse movements. A special computer game was designed to collect mouse logs. Users' self-reports and statistical measures were used to identify states of confusion. Mouse movement rate, the ratio of full path length to shortest path length, and changes in direction and speed were used as features in the training dataset. Support Vector Machines, Logistic Regression, C4.5, and Random Forest were used to build classification models. The models generated by the Support Vector Machine yielded the best classification results, with an F-score of 0.946.

Keywords: confusion detection, behavioral biometrics, mouse dynamics.

1 Introduction

The ability to recognize, interpret, and express emotions plays a key role in human communication and, increasingly, in HCI. In the context of learning systems, the ability to detect user emotional states opens promising applications in adaptive recommendations, adaptive interfaces, etc. Usually, special equipment is used for emotion detection: electroencephalogram, skin conductance, blood volume pressure [1,2], or gaze and facial data [3,4]. But when it comes to real-life applications, we can rely on nothing more than unobtrusive standard computer inputs such as the mouse or keyboard.

The theory of "embodied cognition" [5] provides a theoretical framework for studying mouse movements in order to predict mental states. Barsalou suggests that this bidirectional relationship between mental states and bodily states emerges because the core of social and cognitive information processing lies in the simulation of original information [6]. There are several studies [7,8,9,10] on mouse movements and emotions, all of which suggest a link between the two.
Yet most of these studies were conducted with relatively small samples. Moreover, all of them depend on the specific context of an experiment, and the general link between emotions and mouse movements has not been investigated. In the current study, we aim to find a link between confusion and mouse movements while avoiding both of these shortcomings: we use a larger sample and avoid a specific context in our experiment.

2 Methodology

2.1 Data Collection Procedure and Sample

A simple computer game was built to collect user mouse data. The idea of the game came from Christmas calendar chocolate boxes, where chocolates are hidden behind numbered doors. The doors are usually numbered from 1 to 24, and in order to make the right door harder to find, the numbers are randomly arranged and look different from one another. Similarly, we designed a game that fills the screen with randomly arranged buttons labeled with the numbers 1 to 24. All buttons differ in size and color (Fig. 1). The user's task is to click on all buttons in the right order as fast as possible. To keep up motivation, the game was installed in a school computer class as part of the login system, i.e., in order to log in, users were forced to play the game. There was also an option to play it multiple times, and it was publicly announced that the best performers would be awarded. For every game session, mouse activity (movements and clicks) was logged.

Fig. 1. The Christmas calendar game built for data collection. The user has to click as fast as possible on all buttons in the right order.

Our logging procedure was event-based, which means that the mouse position was not recorded at fixed intervals, but only when the position of the mouse changed. In our case, this change threshold was set to 30 pixels. Our mouse logs consisted of triples of x and y coordinates and a timestamp. We recorded data from 516 game sessions played by 262 individual users.
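The event-based logging procedure can be sketched as follows. This is a minimal illustration in Python; the class and method names are our own and not taken from the actual logging system, which only recorded a new (x, y, timestamp) triple once the cursor had moved at least 30 px from the last recorded position.

```python
import math
import time

class EventBasedMouseLogger:
    """Records (x, y, timestamp) triples only when the cursor has moved
    at least `threshold` pixels since the last recorded position."""

    def __init__(self, threshold=30):
        self.threshold = threshold
        self.log = []          # list of (x, y, timestamp) triples
        self.last = None       # last recorded (x, y) position

    def on_mouse_move(self, x, y, timestamp=None):
        t = time.time() if timestamp is None else timestamp
        # Record the first event, or any event far enough from the last one.
        if self.last is None or math.dist(self.last, (x, y)) >= self.threshold:
            self.log.append((x, y, t))
            self.last = (x, y)

logger = EventBasedMouseLogger()
logger.on_mouse_move(0, 0, 0.00)    # recorded (first event)
logger.on_mouse_move(10, 10, 0.05)  # ignored: ~14 px < 30 px threshold
logger.on_mouse_move(40, 30, 0.10)  # recorded: 50 px from (0, 0)
```

Event-based logging keeps log sizes proportional to actual movement rather than elapsed time, which is why the later features are normalized by the number of movements rather than by duration.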
As each game session consisted of 24 search tasks (finding the next number), we had altogether 12,384 comparable records, each of them representing the mouse movement log between two button clicks.

2.2 Labeling Data with Emotional State

We also interviewed selected participants (N = 44) right after the game. Reviewing the whole game session together, we asked them to describe their emotions during the game. Initially, we asked users to position their emotions on Russell's circumplex model [11], but pre-testing revealed that, in the current experimental setup, users were only able to describe two categories of emotion: the state of confusion and the state of contentment. Therefore, we continued to collect self-report data on a 7-point Likert scale where 1 = content and 7 = confused. Since users were not able to specify the exact time when the state of confusion began or ended, we divided the game session into 24 separate search tasks and linked the emotion feedback data to whole tasks. Altogether we obtained 44 × 24 = 1,056 tasks labeled with emotion data.

It is intuitively clear that, in such circumstances, confusion and target-finding speed are related. Because target-finding speed differs individually, all finding times were standardized session-wise, and then the Pearson correlation with the confusion self-reports was computed. As expected, there was a significant correlation between confusion and standardized finding time (r = 0.86). Moreover, all tasks associated with confusion had a standardized finding speed half a standard deviation below the mean, and those associated with a feeling of contentment half a standard deviation above the mean. Although our interviews covered less than 10% of all game sessions, we extended this relation to all other game sessions as well. We suppose that very quick results may not involve confusion at all, i.e., the user knows the location of the target from the beginning.
But in order to minimize the possible confusion that may be present at the beginning of each task, we divided the finding time in half and used only the last half of the log data as characterizing non-confusion. Similarly, it is obvious that in tasks characterized as confusing, the state of confusion does not cover the whole time between two button clicks. The confusion must end at some moment, when the user notices the next button, and it is reasonable to suppose that this happens somewhere in the second half of the search process. Therefore, we split each of these slower-result logs in half and used only the first half of the search task as a characterization of confusion (Fig. 2).

Fig. 2. Separation of mouse logs representing the states of confusion and non-confusion.

From these two subsets we excluded repeated sessions by the same users, as well as extreme results. From the remaining data we created a balanced training dataset of 2,282 records.

2.3 Features

In the current study, we extracted 33 features based on distance, speed, direction, and direction-change angles (Table 1). Feature selection with Chi-squared attribute evaluation and SVM attribute evaluation revealed that the strongest features were the speed-based ones and those based on the ratio of shortest distance to actual distance. The best models with those attributes yielded an F-score of 0.96 with SVM and Logistic Regression.

Table 1. Features.

Distance*
  Precision: Ratio of the shortest distance between two button clicks to the actual mouse path length.
Speed**
  Speed: Actual mouse path length between two button clicks divided by task completion time.
  AdjSpeed: Actual mouse path length between two button clicks divided by the shortest path, and then divided by task completion time.
Direction
  DirectionX: Number of mouse movements in a particular direction. We divided movement directions into 8 distinct segments: north, northeast, east, etc. We counted all movements in a particular direction segment and divided by the total number of movements.
Direction changes
  Turn10, Turn20, ..., Turn180: The mouse movement path was recorded as consecutive straight lines of 30 px length. We measured each angle between two consecutive movements and extracted 18 features representing turns from 0 to 180 degrees in 10-degree steps. The counts were normalized by the total number of movements.
  TurnA+: All turns greater than angle A (A counted in 45-degree steps).

* Excluded from the training feature set of the models titled "target unknown" in Table 2.
** Excluded from all training feature sets.

For our final model we had to exclude the speed-related features, because speed had previously been used by us for associating tasks with emotional states. Without speed-related features, the models' F-score dropped from 0.96 to 0.946. As our goal was to identify confusion patterns without knowing the real target, we also excluded the feature calculated using information about the shortest distance. All remaining features were based on movement direction and direction changes. Direction-based features were the number of movements in a specific direction divided by the mouse path length. Direction changes were measured as the angle between the previous and the next movement. Among these, the strongest features were direction changes close to 180 degrees, greater than 135 degrees, and between 160 and 170 degrees.

2.4 Machine Learning Algorithms and Technology

For classification, we tested four popular machine-learning algorithms: Logistic Regression, Support Vector Machine, Random Forest, and C4.5. The choice of these algorithms is based on the literature [12,13]. The suitability of the listed algorithms for the given data types and for the given binary classification task was also taken into account. We used the Java implementations of these algorithms available in the free data analysis package Weka [14].
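As an illustration, the direction and turn-angle features from Table 1 could be computed roughly as follows. This is a sketch in Python, not the study's actual extraction code; the exact sector boundaries and bin edges are our assumptions.

```python
import math
from collections import Counter

def direction_and_turn_features(points):
    """Sketch of the DirectionX and TurnA features from Table 1.
    `points` is a list of (x, y) positions logged every ~30 px of movement.
    Returns 8 direction-sector shares and 18 turn-angle-bin shares,
    both normalized by the number of movements."""
    # Heading (degrees, 0..360) of each consecutive 30 px segment.
    headings = [math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360
                for (x0, y0), (x1, y1) in zip(points, points[1:])]
    n = len(headings)
    if n == 0:
        return [0.0] * 8, [0.0] * 18

    # DirectionX: share of movements in each of 8 sectors of 45 degrees,
    # sector 0 centered on heading 0.
    dir_counts = Counter(int(((h + 22.5) % 360) // 45) for h in headings)
    direction_shares = [dir_counts[k] / n for k in range(8)]

    # TurnA: absolute angle between consecutive movements, folded into
    # 0..180 degrees and binned in 10-degree steps (18 bins).
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = abs(h1 - h0) % 360
        turns.append(360 - d if d > 180 else d)
    turn_counts = Counter(min(int(t // 10), 17) for t in turns)
    turn_shares = [turn_counts[k] / n for k in range(18)]

    return direction_shares, turn_shares
```

For example, a path that goes 30 px right and then 30 px up produces one 90-degree turn, so the bin covering 90–100 degrees receives one count out of two movements. Sharp turns (bins near 180 degrees) are the ones the paper identifies as the strongest confusion predictors.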
For evaluation, we used 10-fold cross-validation. We partitioned our data into 10 equally sized random parts, then used one part for validation and the other 9 as the training dataset. We repeated this 10 times and averaged the validation results.

3 Results

As mentioned before, when all speed-based features were excluded, our SVM model with standardized data yielded an F-score of 0.946. When all distance-based features were also excluded, the results dropped considerably, but all our classifiers still yielded F-scores over 0.8. Table 2 presents the results of the different classifiers, trained with and without the features calculated using data about the known target (i.e., the shortest path).

Table 2. Results of the models trained with different feature sets.

                        Target known                 Target unknown
Model                   Accuracy  F-score  ROC      Accuracy  F-score  ROC
SVM (standardized)      94.61%    0.946    0.946    82.38%    0.824    0.825
Logistic Regression     93.49%    0.935    0.978    82.72%    0.827    0.889
Random Forest           92.07%    0.921    0.971    84.47%    0.845    0.825
C4.5                    91.96%    0.919    0.937    83.59%    0.835    0.836

4 Discussion and Conclusion

A simple feature set of directions, direction changes, and the relation between actual and shortest distance proved useful in classifying confused and non-confused users. As we can see from Table 2, knowing the target makes predictions better, but even without knowing the target, frequent direction changes in mouse movement are still good predictors of confusion. This might be an indirect confirmation of studies on the correlation between gaze and mouse movements.

However, we have to address the limitations of such an experimental setup. Depending on the tasks and page layout, user mouse movements might differ considerably. Our results are applicable in situations where users have to find something particular in an unfamiliar (web) environment, in a set of menus, links, or graphical elements. But our approach might not work on web pages intended for reading.
For example, if someone is used to following the line of text with the mouse cursor while reading, the mouse logs will show frequent changes in direction, which our model would interpret as confusion. Therefore, more study is needed in different types of environments.

References

1. Chanel, G., Kronegg, J., Grandjean, D., Pun, T.: Emotion assessment: arousal evaluation using EEG's and peripheral physiological signals. In: Proc. Int. Workshop on Multimedia Content Representation, Classification and Security, pp. 530-537 (2006)
2. Leon, E., Clarke, G., Callaghan, V., Sepulveda, F.: A user-independent real-time emotion recognition system for software agents in domestic environments. Engineering Applications of Artificial Intelligence, vol. 20, no. 3, pp. 337-345 (2007)
3. Happy, S.L., et al.: Automated alertness and emotion detection for empathic feedback during e-learning. IEEE Digital Library (2013)
4. Jaques, N., et al.: Predicting affect from gaze data during interaction with an intelligent tutoring system. Lecture Notes in Computer Science, vol. 8474, pp. 29-38 (2014)
5. Niedenthal, P.M.: Embodying emotion. Science, vol. 316, pp. 1002-1005 (2007)
6. Barsalou, L.W.: Grounded cognition. Annual Review of Psychology, vol. 59, pp. 617-645 (2008)
7. Scheirer, J., Fernandez, R., Klein, J., Picard, R.W.: Frustrating the user on purpose: a step toward building an affective computer. Interacting with Computers, vol. 14, pp. 93-118 (2002)
8. Zimmermann, P., Guttormsen, S., Danuser, B., Gomez, P.: Affective computing - a rationale for measuring mood with mouse and keyboard. International Journal of Occupational Safety and Ergonomics, vol. 9, pp. 539-551 (2003)
9. Zimmermann, P.: Beyond usability - measuring aspects of user experience. Doctoral thesis (2008)
10. Maehr, W.: eMotion: Estimation of User's Emotional State by Mouse Motions. VDM Verlag (2008)
11. Russell, J.A.: A circumplex model of affect. Journal of Personality and Social Psychology, vol. 39, no. 6, pp. 1161-1178 (1980)
12. Wu, X., et al.: Top 10 algorithms in data mining. Knowledge and Information Systems, vol. 14, pp. 1-37. Springer (2008)
13. Mihaescu, M.C.: Applied Intelligent Data Analysis: Algorithms for Information Retrieval and Educational Data Mining, pp. 64-111. Zip Publishing, Columbus, Ohio (2013)
14. Weka: Weka 3: Data Mining Software in Java. Machine Learning Group at the University of Waikato. http://www.cs.waikato.ac.nz/ml/weka/