Classifeye: Classification of Personal Characteristics Based on Eye Tracking Data in a Recommender System Interface. CEUR Workshop Proceedings, Vol-2903. Authors: Martijn Millecamp, Cristina Conati, Katrien Verbert. PDF: https://ceur-ws.org/Vol-2903/IUI21WS-HUMANIZE-4.pdf. DBLP: https://dblp.org/rec/conf/iui/MillecampCV21
Classifeye: Classification of Personal
Characteristics Based on Eye Tracking Data in a
Recommender System Interface
Martijn Millecamp (a), Cristina Conati (b) and Katrien Verbert (a)
(a) Department of Computer Science, KU Leuven, Celestijnenlaan 200A bus 2402, Leuven, Belgium
(b) Department of Computer Science, University of British Columbia, ICICS/CS 107, 2366 Main Mall, Vancouver, BC, Canada



Abstract
Due to the increasing importance of recommender systems in our lives, the call to make these systems more transparent is becoming louder. However, providing explanations is not as easy as it seems, as research has shown that different users react differently to explanations. So not only the recommendations, but also the explanations should be personalised. As a first step towards such personalised explanations, we explore the possibility of classifying users based on their gaze pattern during the interaction with a music recommender system. More specifically, we classify three personal characteristics that have been shown to play a role in the interaction with music recommendations: need for cognition, openness and musical sophistication. Our results show that classification based on eye tracking has potential for need for cognition and openness, as we are able to do better than random, but not for musical sophistication, as no classifier did better than a uniform random baseline.

Keywords
eye tracking, classification, recommender system, openness, need for cognition, musical sophistication


HUMANIZE: Joint Proceedings of the ACM IUI 2021 Workshops, April 13–17, 2021, College Station, USA
Email: martijn.millecamp@kuleuven.be (M. Millecamp); conati@cs.ubc.ca (C. Conati); katrien.verbert@kuleuven.be (K. Verbert)
ORCID: 0000-0002-5542-0067 (M. Millecamp); 0000-0002-8434-9335 (C. Conati); 0000-0001-6699-7710 (K. Verbert)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

In the field of recommender systems (RS), researchers are increasingly aware that optimizing accuracy is not enough to reach the full potential of these systems [1, 2]. For example, users will not choose a recommended item unless they have trust in the system [3]. One possible way to increase this trust is to provide explanations which reveal (a part of) the internal reasoning of the RS to the user [4, 5]. Especially the combination of these explanations with control can help users not only to understand the RS, but also to steer it with input and feedback [5]. Despite the increased interest in explanations for RS, it is still not clear how to implement explanations in practice, as users have varying reactions to them, which shows the need to personalize explanations to the user [6].

However, before the system can adapt explanations to personal characteristics (PCs), it needs to be aware of the PCs of the user. A possible way to obtain these characteristics is by explicitly asking users to fill in questionnaires [7] or by implicitly inferring PCs through an analysis of the social media of the user [8]. Nonetheless, asking users to fill in questionnaires or to give access to their social media is often not desirable. Moreover, to personalize explanations it is not necessary
to obtain a fine-grained result, but a classification into two categories suffices [9].

For this reason, we explore in this paper whether it is possible to classify users' personal characteristics during their interaction with a music RS with explanations by analyzing their gaze. We will focus on three different PCs: openness, need for cognition (NFC) and musical sophistication (MS) [9, 10]. These PCs will be explained in detail in Section 2.

Openness is one of the Big Five personality traits and measures how open a person is to new experiences. Millecamp et al. [9] showed that there was a significant difference in the gaze pattern between low and high openness users. This is the reason we hypothesize that classifying openness based on gaze might be possible.

Similarly, we hypothesize that inferring MS, which is a measure of domain knowledge in the music domain, from gaze data might be possible, as the study of Millecamp et al. [9] also found significant differences in gaze pattern between low and high MS users.

NFC is a cognitive style which influences the way a person prefers to process information and thus looks at information. Previous studies already showed that NFC moderates the perception of explanations in a music recommender system, which was the motivation to explore whether inferring NFC from gaze would be possible.

Next to exploring the general accuracy, we also want to explore how much data we need to infer these PCs.

The contribution of this paper is twofold. First, to our knowledge, we are the first to explore whether it is possible to infer PCs during the interaction with a RS in the presence of explanations. Second, we make the gathered dataset publicly available to support research in this area. This dataset is unique because it provides both gaze data and data about PCs.

2. Related work

With the increasing role of RS in our daily lives, the call for explainable, transparent RS also becomes louder, so that users can make better informed decisions about whether or not to follow the recommendations [11, 6]. In combination with controls, this transparency also enables users to correct the RS whenever they feel it makes wrong assumptions [5]. However, research has shown that different users have different reactions to explanations [6, 12, 13]. In the field of music RS, recent research has shown that there are three PCs that could influence the way users perceive explanations: openness, NFC and MS [10, 9].

Openness is one of the five factors of the Five Factor Model, also known as the Big Five model [14]. This model describes personality in five different traits, and it has been used in several studies which showed the positive impact of considering personality in RS [15]. The factor openness describes the breadth, depth and complexity of an individual's mental and experiential life [16]. It has been shown that openness is related to the preferred amount of diversity in RS and to the willingness to use a system with explanations [17, 18, 9].

Need for cognition has been shown to influence the success of a RS [13, 12, 19, 20] and is defined as "a measure of the tendency for an individual to engage in, and enjoy, effortful cognitive activities" [21]. NFC has been shown to have an impact on the willingness of users to rely on a RS [12], on the confidence in a playlist created in a music RS with explanations [10], on preference matching [22], on the style of explanations users prefer [13] and on the reason why users need a transparent RS [23].

Musical sophistication is defined by Müllensiefen et al. [24] as a concept to describe the multi-faceted nature of musical expertise. In the music domain, Millecamp et al. [9] showed that users with high MS feel more
supported to make a decision in a RS interface that provided explanations than in an interface without such explanations, while this made no difference for users with low MS. Another study showed that users with high domain experience perceive a higher diversity in a scatter plot than in a simpler bubble chart [25].

To acquire the PCs of users, the most common way is to ask them to fill in validated questionnaires [7], but there also exist other approaches, such as inferring PCs by analyzing the social media of the user [26, 8], by analyzing a conversation with a chatbot [27], or by analyzing physical signals such as brain activity [28] and gaze data [7].

The previously mentioned works rely on fine-grained personality scores. In contrast, in our work we focus on adapting interfaces to users, for which we only need a classification into two groups. We aim to base this classification on the gaze pattern during the interaction with a music RS interface instead of asking users to watch carefully selected stimuli, to fill in questionnaires or to share their social media profile. Previous studies which classified users based on their gaze pattern during normal activities are almost all focused only on cognitive abilities and visualization experience [29, 30, 31, 32]. One exception is the study of Hoppe et al. [33], which inferred the Big Five personality traits by studying the gaze of participants during a walk through a campus. This study is different from our work, as we investigate whether it is possible to infer PCs while interacting with a music RS and we also focus on different PCs.

3. Data

The gaze data used in this study was generated in a user study by Millecamp et al. [9]. We will provide a brief summary of this experiment; a more elaborate description can be found in [9]. As mentioned in Section 2, we focus in this study on openness, NFC and MS, as previous research has found that these PCs could affect the perception of explanations in a music RS [9, 10], and the study of Millecamp et al. [9] already showed that openness and MS change the gaze pattern between an interface with and without explanations. To measure these three characteristics, users were asked to fill out three questionnaires before the experiment started. To measure openness, we used the 44-item Big Five Inventory [34] and afterwards selected the questions related to openness. For NFC, we used the 18-item questionnaire of Cacioppo et al. [21], and for MS the Goldsmiths Musical Sophistication Index(1) was used. The dataset we used in this study consists of the gaze data of 30 participants (21 male). For each of the three PCs, the participants were divided into a high and a low group based on a median split. This resulted in equally distributed groups for MS and NFC and almost equal groups for openness (16 in the low and 14 in the high openness group). A short overview of the characteristics of the participants can be found in Table 1.

(1) https://www.gold.ac.uk/music-mind-brain/gold-msi/ May 2020

Table 1
An overview of the personal characteristics measured, together with their possible score ranges and the median scores of the participants

PC          Possible Range    Median Score
Age         18-65             24
MS          18-126            64
NFC         0-100             68.75
Openness    0-100             55

The gaze data was recorded with a Tobii 4C remote eye tracker at a sampling rate of 90 Hz. Each sample contained information about
the focus point on the screen denoted as x and y coordinates, the distance between the participant and the screen, and the validity of these measures. To calibrate the eye tracker, the experiment started with a standard calibration procedure provided by Tobii Core Software. After the calibration, users were asked to explore the interface of a music RS in the presence of feature-based explanations until they understood all functionalities. A screenshot of the interface is shown in Figure 1.

As shown in Part A of this figure, users can first search for an artist they like through a search bar in the top left corner. When they add the artist, this artist is shown in Part B. Based on this artist, the system starts to generate recommendations, which are listed in a two-column format as shown in Part F. When users hover over the cover picture of a recommended song, they can click a play button to listen to a 30 s preview of the song. On the right side of each explanation, they can click on the thumbs-up icon to add the song to their playlist. Through the sliders shown in Part D of Figure 1, users can modify several audio features(2) such as popularity, energy and danceability, which are also taken into account in the recommendation process. To help users steer these sliders, the minimum and the maximum of each audio feature are shown for each artist.

After the user had explored all the options of the interface, the recording of the gaze started. As shown in Part E of Figure 1, users were asked to create a playlist of five songs. To create this playlist, they could use all functionalities without any restriction. When they added the fifth song to their playlist, we stopped the recording of the gaze. On average, users took 4 minutes and 26 seconds to complete their playlist. As part of this paper's contribution, this data is publicly available(3).

(2) https://developer.spotify.com/documentation/web-api/reference/tracks/get-audio-features/
(3) augment.cs.kuleuven.be/datasets/classifeye

4. Classifiers

4.1. Features

The Tobii 4C does not come with software to detect fixations and saccades, so we identified fixations and saccades using an implementation of the I-DT algorithm [35] with a dispersion threshold of one degree and a duration threshold of 100 ms [35]. This means that in this study a fixation is identified as a circle on the screen within which the user keeps focusing for at least 100 ms without moving their eyes more than one degree. All other movements are then identified as saccades, i.e. quick movements of gaze from one fixation to another [30].

Based on these saccades and fixations, we generated a set of eye-tracking features, as listed in Table 2. Most of these features were selected because they are widely used in previous eye tracking studies [7, 30, 36]. In addition to these features, we included Most frequent saccade direction and fixations in a 4x4 heatmap, as the study of Hoppe et al. [33] indicated that these features are important in the extraction of personality. We did not include features that contain explicit information about the content of the interface, so-called areas of interest (AOI), even though previous work has shown that these features could have more predictive power [30]. The reason for this is that this information is already partially captured in a more general way by Most frequent saccade direction and fixations in a 4x4 heatmap. Thus, at this stage we chose to investigate how far we can go with display-independent features, which also have the advantage of possibly being more generalizable to other interfaces.

4.2. Data windows

To explore whether classification of the three PCs would be possible with only a partial
Figure 1: The interface with the different parts highlighted in orange. A: Search box, B: Artist, C: Attributes of the artist, D: Preference of the user, E: Task, F: Recommendations, G: Cover of a song, H: Explanations, I: (Dis)like buttons, J: Play button, K: List of (dis)liked songs


Table 2
Description of eye tracking features

Features                            Description
Saccade rate                        Number of saccades divided by segment duration
Avg. saccade length                 Average distance between the two fixations delimiting the saccade
Avg. saccade amplitude              Average size of saccades in degrees of visual angle
Avg. saccade velocity               Average velocity (saccade amplitude / saccade duration) of saccades
Peak saccade velocity               Maximum saccade velocity in the segment
Most frequent saccade direction     Most frequent saccade direction (segments of 45°)
Fixation rate                       Number of fixations divided by segment duration
Avg. fixation duration              Average duration of fixations in ms
Ratio Fixations/Saccades            Total number of fixations divided by total number of saccades
4x4 Heatmap                         Percentage of fixations in each of 16 raster areas
Avg. pupil size                     Average pupil size of both eyes
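To make the fixation detection concrete, the dispersion-threshold (I-DT) procedure described in Section 4.1 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names and the (x, y) sample format are our assumptions, and for simplicity the dispersion threshold is expressed in the same units as the gaze coordinates (converting the paper's one-degree visual angle into those units would require the screen geometry and viewing distance).

```python
def dispersion(points):
    """Dispersion of a window of gaze samples: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def idt_fixations(samples, sample_rate_hz=90,
                  dispersion_threshold=1.0, duration_threshold_ms=100):
    """Detect fixations in a list of (x, y) gaze samples with the I-DT scheme.

    Returns (start_index, end_index_exclusive, centroid) tuples; samples that
    fall outside every fixation are treated as belonging to saccades.
    """
    # Minimum number of samples a fixation must span (100 ms at 90 Hz -> 9).
    min_len = max(1, int(duration_threshold_ms * sample_rate_hz / 1000))
    fixations = []
    i, n = 0, len(samples)
    while i + min_len <= n:
        window_end = i + min_len
        if dispersion(samples[i:window_end]) <= dispersion_threshold:
            # Grow the window while its dispersion stays under the threshold.
            # (Recomputing dispersion each step is O(n^2); fine for a sketch.)
            while (window_end < n and
                   dispersion(samples[i:window_end + 1]) <= dispersion_threshold):
                window_end += 1
            pts = samples[i:window_end]
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            fixations.append((i, window_end, (cx, cy)))
            i = window_end
        else:
            i += 1
    return fixations
```

Several of the features in Table 2 then follow directly: fixation rate and average fixation duration from the detected fixations, and the saccade features from the gaps between consecutive fixations.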


amount of data, we generated three different data windows to simulate partial observations of gaze data during the task, similar to Steichen et al. [30] and Conati et al. [31]. Each window consists of a partial observation of each participant based on relative duration: the first window consisted of the first 30% of the data, the second window of the first 60%, and the last window of the first 90% of the data. Despite the fact that this approach requires a task to be fully completed to determine what 100% of the data constitutes, it still allows us to provide valuable insights into trends and patterns about inferring PCs from gaze data [30]. Each of these windows consists of three different measurements, and for each of these measurements the data was divided into ten different segments of equal length. For each of these segments, we generated the set of eye-tracking features mentioned above, resulting in a feature vector of 260 features for each measurement.

The reasoning behind creating these different datasets is to verify whether we would be able to adapt the RS interface to the needs of the user during the task. As such, we did not include a window with 100% of the data, as the adaptation would then come too late. Additionally, previous research [30] already showed that after a certain amount of data the accuracy starts to converge, or even that the accuracy decreases after a certain amount of data. In this study, we want to explore whether we would notice similar trends for the different PCs.

4.3. Classification methods

To classify users into a low and a high category, we used scikit-learn to train five different classifiers and a baseline [37]. To evaluate the performance of the classifiers, we applied a leave-one-out methodology. Because of this evaluation methodology and the uniform groups, we could not use the most common majority class baseline, which predicts the most likely class (this would lead to 0% accuracy) [30, 31, 33]. As a consequence, we chose a uniform random baseline, which has a theoretical accuracy of 50%. To classify the characteristics, we trained Logistic Regression, Random Forest, Gaussian Naive Bayes, Linear Support Vector Machines and Gradient Boosting. The reasoning behind the implementation of all these classifiers is that in previous research there is no consensus about which classifiers work best. Steichen et al. [30] found that Logistic Regression performed better than Decision Trees, Support Vector Machines and Neural Networks. Lallé et al. [38] and Hoppe et al. [33] found that Random Forests worked best. However, Berkovsky et al. [7] conclude that Naive Bayes and Support Vector Machines are the best. Additionally, Gradient Boosting performed well in the study of Barral et al. [39]. Because of the small sample size, we chose not to use deep learning methods. For each of these classifiers we tried to optimize the accuracy. The resulting parameters can be found in Table 3.

Table 3
Description of parameters of the different classifiers

Classifier                        Parameter
Baseline                          strategy: uniform
Logistic Regression               solver: liblinear
Random Forest                     estimators: 100
Gaussian Naive Bayes              na
Linear Support Vector Machines    gamma: scale, probability: True
Gradient Boosting                 maximum depth: 4

To strengthen the stability of the results, we ran this evaluation 10 times with different random seeds. We calculated the average accuracy over all participants and all runs to measure the performance of each classifier.

5. Results

To examine whether it is possible to classify users into the correct personality group and whether this classification works better on specific windows, we ran for each PC a two-way repeated measures ANOVA with accuracy as the dependent variable and both classifier and window as independent variables. As we run multiple ANOVAs and pairwise comparisons, the reported p-values are adjusted using the Benjamini and Hochberg procedure [40] to control the false discovery rate. The main results of this analysis are shown in Figure 2, and we will report the results for
each of the PCs in detail in the next paragraphs.

Need for cognition. The results of the two-way repeated measures ANOVA revealed a significant main effect of classifier on accuracy (F(7,14)=18.8, p<.001). To investigate this main effect, we ran post-hoc pairwise comparisons, which showed that the mean accuracy of the Logistic Regression classifier (0.59) was statistically better than that of the baseline (p=.0491), as shown in Figure 2a. This figure also shows the accuracy in the three different windows and that the peak accuracy (0.67) is reached in the last window.

Musical sophistication. The results of the two-way repeated measures ANOVA revealed that no classifier could outperform the baseline and that most of the classifiers performed even worse.

Openness. The results of the two-way repeated measures ANOVA revealed a significant interaction effect of classifier with window on accuracy (F(14,28)=4.88, p<.001). An analysis of the effect of classifier showed a significant effect for the first window (F(7,16)=4.512, p=.006), and a post-hoc test revealed that in this window Gradient Boosting performed significantly better than the baseline (p=.020). The analysis of the effect of window showed a significant effect for the Gradient Boosting classifier (F(2,6)=8.12, p=.020), and a post-hoc analysis showed that the Gradient Boosting classifier performed significantly better in the first window than in the second (p=.028) and the third window (p=.029). Figure 2b shows that the highest accuracy of Gradient Boosting is reached in the first window (0.66). This accuracy is significantly higher than the accuracy of the baseline and the accuracy of Gradient Boosting in the other windows.

6. Discussion

Our results show that we achieve a higher accuracy than the random baseline for NFC and for openness in the first window, but that we were not able to beat the random baseline classifier for MS.

For the classification of openness, it is interesting that we are able to outperform the baseline, while openness was one of the few traits for which Hoppe et al. [33] could not outperform the baseline. This might be due to a different classification technique, as Hoppe et al. only used a Random Forest classifier, while we outperformed the baseline with a Gradient Boosting classifier. Another possible reason could be that we trained the classifiers on different data windows: our results show that the performance to classify openness is only significantly better than the baseline in the first window. As far as we know, no other studies have formally shown that classifiers trained on early stages of the task can outperform classifiers trained on more data. However, other studies, such as the study of Steichen et al. [30], already discussed this trend for perceptual speed, verbal working memory and visual working memory. They argued that these characteristics most strongly affect the gaze pattern of the user during the initial phase of a task and that other factors dilute the gaze pattern as the task continues. This is probably also the reason why we are only able to classify openness at the beginning of the task. However, this is not necessarily a problem, as we want to adapt an interface to the openness of a user as early as possible. Nevertheless, the obtained accuracy is still too low to be used to adapt the explanations. Also, more research is needed to verify whether openness will always affect the gaze during the beginning of a task or only when users see a new interface.

To classify NFC, our results show a signif-
Figure 2: Accuracy of classifiers that perform significantly better than the baseline. (a) Accuracy of Logistic Regression for NFC (mean 0.59*; accuracy per window: 0.53, 0.58, 0.67). (b) Accuracy of Gradient Boosting for openness (mean 0.41; accuracy per window: 0.66*, 0.31, 0.26).


icant main effect of Logistic Regression on sible reason for this could be that we did not
accuracy. The reason that we do not see a include AOI related features which were in-
significant difference between the windows cluded in the above-mentioned studies. An
could be that NFC is correlated with decision- interesting further line of research is to ver-
making processes [12] and creating a playlist ify whether including these AOI features can
in a music RS constantly involves making de- improve accuracy.
cisions. Despite the significant main effect,
the accuracy to classify NFC seems not high
enough to adapt the interface, especially not 7. Conclusion
in the first two windows. As a consequence,
                                                 In this paper, we explored whether it would
this means that further research needs to fo-
                                                 be possible to adapt the explanations in a mu-
cus on reaching a higher accuracy in the be-
                                                 sic RS interface based on personal character-
ginning of the interaction to be able to adapt
                                                 istics. To do so, we investigated whether a
explanations early on in the process or on
                                                 classification of personal characteristics could
adapting the interface if the user re-visits the
                                                 be inferred by studying the gaze pattern dur-
application. Additionally, further research
                                                 ing the creation of a playlist in this system.
should investigate why Logistic Regression
                                                 More concretely, we classified musical sophis-
performed the best to classify NFC as this is
                                                 tication, need for cognition and openness be-
similar to previous studies in which Logis-
                                                 cause these characteristics have shown to im-
tic Regression performed well to classify PCs,
                                                 pact the user experience of explanation in a
but we do not have an explanation why logis-
                                                 RS [9]. We trained the classifiers on different
tic regression outperforms other algorithms
                                                 windows to detect whether the classification
[30, 41].
                                                 would already work with only a partial ob-
   As a previous study in the field of music
                                                 servation of the creation of a playlist.
RS showed that MS influences the way users
                                                    Our results show that even as our accu-
look to a music RS interface and previous stud-
                                                 racy is not yet high enough for practical use,
ies in the field of information retrieval also
                                                 we are able to outperform a baseline to clas-
showed the potential of predicting domain
                                                 sify need for cognition with Logistic Regres-
knowledge based on eye tracking [42, 43, 9],
                                                 sion. If we only consider the first third of
we expected to be able to classify MS based
                                                 the data, our results show that the classifica-
on gaze data. However, our results show that
                                                 tion of openness with Gradient Boost beats
we could not outperform the baseline. A pos-
the baseline. Despite the limitations in terms of accuracy, this finding is important because it shows the potential to adapt explanations during the interaction with a music RS interface. As a next step, we want to increase the accuracy of the classifiers, particularly in the beginning of the interaction, which we plan to do by gathering more training data and by using different features, such as AOI-related features. Additionally, more research is needed to verify whether the results of this study can be generalized to different tasks and interfaces, which we also plan to address in future research.

Acknowledgments

Part of this research has been supported by the KU Leuven Research Council (grant agreement C24/16/017) and the Research Foundation Flanders (FWO).

References

[1] R. R. Sinha, K. Swearingen, et al., Comparing recommendations made by online systems and friends, DELOS 106 (2001).
[2] C. He, D. Parra, K. Verbert, Interactive recommender systems: A survey of the state of the art and future research challenges and opportunities, Expert Systems with Applications 56 (2016) 9–27.
[3] J. Kunkel, T. Donkers, L. Michael, C.-M. Barbu, J. Ziegler, Let me explain: Impact of personal and impersonal explanations on trust in recommender systems, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12.
[4] A. Springer, S. Whittaker, Making transparency clear, in: Algorithmic Transparency for Emerging Technologies Workshop, 2019, p. 5.
[5] N. Tintarev, J. Masthoff, A survey of explanations in recommender systems, in: 2007 IEEE 23rd International Conference on Data Engineering Workshop, IEEE, 2007, pp. 801–810.
[6] A. Springer, S. Whittaker, Progressive disclosure: Empirically motivated approaches to designing effective transparency, in: Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019, pp. 107–120.
[7] S. Berkovsky, R. Taib, I. Koprinska, E. Wang, Y. Zeng, J. Li, S. Kleitman, Detecting personality traits using eye-tracking data, in: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, pp. 1–12.
[8] G. Park, H. A. Schwartz, J. C. Eichstaedt, M. L. Kern, M. Kosinski, D. J. Stillwell, L. H. Ungar, M. E. Seligman, Automatic personality assessment through social media language, Journal of Personality and Social Psychology 108 (2015) 934.
[9] M. Millecamp, N. N. Htun, C. Conati, K. Verbert, What's in a user? Towards personalising transparency for music recommender interfaces, in: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, 2020, pp. 173–182.
[10] M. Millecamp, N. N. Htun, C. Conati, K. Verbert, To explain or not to explain: The effects of personal characteristics when explaining music recommendations, in: Proceedings of the 24th International Conference on Intelligent User Interfaces, 2019, pp. 397–407.
[11] M. Naiseh, N. Jiang, J. Ma, R. Ali, Explainable recommendations in intelligent systems: Delivery methods, modalities and risks, in: International Conference on Research Challenges in Information Science, Springer, 2020, pp. 212–228.
[12] S. T. Tong, E. F. Corriero, R. G. Matheny, J. T. Hancock, Online daters' willingness to use recommender technology for mate selection decisions, in: IntRS@RecSys, 2018, pp. 45–52.
[13] S. Naveed, T. Donkers, J. Ziegler, Argumentation-based explanations in recommender systems: Conceptual framework and empirical results, in: Adjunct Publication of the 26th Conference on User Modeling, Adaptation and Personalization, 2018, pp. 293–298.
[14] L. R. Goldberg, The structure of phenotypic personality traits, American Psychologist 48 (1993) 26.
[15] R. Hu, P. Pu, Enhancing collaborative filtering systems with personality information, in: Proceedings of the Fifth ACM Conference on Recommender Systems, 2011, pp. 197–204.
[16] V. Benet-Martinez, O. P. John, Los cinco grandes across cultures and ethnic groups: Multitrait-multimethod analyses of the Big Five in Spanish and English, Journal of Personality and Social Psychology 75 (1998) 729.
[17] N. Tintarev, M. Dennis, J. Masthoff, Adapting recommendation diversity to openness to experience: A study of human behaviour, in: International Conference on User Modeling, Adaptation, and Personalization, Springer, 2013, pp. 190–202.
[18] L. Chen, W. Wu, L. He, How personality influences users' needs for recommendation diversity?, in: CHI'13 Extended Abstracts on Human Factors in Computing Systems, 2013, pp. 829–834.
[19] U. Gretzel, D. R. Fesenmaier, Persuasion in recommender systems, International Journal of Electronic Commerce 11 (2006) 81–100.
[20] A. Felfernig, R. Burke, Constraint-based recommender systems: Technologies and research issues, in: Proceedings of the 10th International Conference on Electronic Commerce, 2008, pp. 1–10.
[21] J. T. Cacioppo, R. E. Petty, C. Feng Kao, The efficient assessment of need for cognition, Journal of Personality Assessment 48 (1984) 306–307.
[22] K. Y. Tam, S. Y. Ho, Web personalization as a persuasion strategy: An elaboration likelihood model perspective, Information Systems Research 16 (2005) 271–291.
[23] M. Millecamp, R. Haveneers, K. Verbert, Cogito ergo quid? The effect of cognitive style in a transparent mobile music recommender system, in: Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization, 2020, pp. 323–327.
[24] D. Müllensiefen, B. Gingras, J. Musil, L. Stewart, The musicality of non-musicians: An index for assessing musical sophistication in the general population, PLoS ONE 9 (2014) e89642.
[25] Y. Jin, N. Tintarev, K. Verbert, Effects of individual traits on diversity-aware music recommender user interfaces, in: Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization, 2018, pp. 291–299.
[26] J. Golbeck, C. Robles, M. Edmondson, K. Turner, Predicting personality from Twitter, in: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, IEEE, 2011, pp. 149–156.
[27] M. X. Zhou, G. Mark, J. Li, H. Yang, Trusting virtual agents: The effect of personality, ACM Transactions on Interactive Intelligent Systems (TiiS) 9 (2019) 1–36.
[28] J. Wache, R. Subramanian, M. K. Abadi, R.-L. Vieriu, N. Sebe, S. Winkler, Implicit user-centric personality recognition based on physiological responses to emotional videos, in: Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015, pp. 239–246.
[29] D. Toker, C. Conati, B. Steichen, G. Carenini, Individual user characteristics and information visualization: Connecting the dots through eye tracking, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 295–304.
[30] B. Steichen, C. Conati, G. Carenini, Inferring visualization task properties, user performance, and user cognitive abilities from eye gaze data, ACM Transactions on Interactive Intelligent Systems (TiiS) 4 (2014) 1–29.
[31] C. Conati, S. Lallé, A. Rahman, D. Toker, Further results on predicting cognitive abilities for adaptive visualizations, in: IJCAI International Joint Conference on Artificial Intelligence, 2017, pp. 1568–1574. doi:10.24963/ijcai.2017/217.
[32] M. Gingerich, C. Conati, Constructing models of user and task characteristics from eye gaze data for user-adaptive information highlighting, in: Proceedings of the AAAI Conference on Artificial Intelligence, 1, 2015.
[33] S. Hoppe, T. Loetscher, S. A. Morey, A. Bulling, Eye movements during everyday behavior predict personality traits, Frontiers in Human Neuroscience 12 (2018) 1–8. doi:10.3389/fnhum.2018.00105.
[34] O. P. John, E. M. Donahue, R. L. Kentle, The Big Five Inventory—Versions 4a and 54, 1991.
[35] D. D. Salvucci, J. H. Goldberg, Identifying fixations and saccades in eye-tracking protocols, in: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications, ACM, 2000, pp. 71–78.
[36] J. H. Goldberg, X. P. Kotval, Computer interface evaluation using eye movements: Methods and constructs, International Journal of Industrial Ergonomics 24 (1999) 631–645.
[37] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
[38] S. Lallé, C. Conati, G. Carenini, Prediction of individual learning curves across information visualizations, User Modeling and User-Adapted Interaction 26 (2016) 307–345.
[39] O. Barral, S. Lallé, G. Guz, A. Iranpour, C. Conati, Eye-tracking to predict user cognitive abilities and performance for user-adaptive narrative visualizations, in: Proceedings of the 2020 International Conference on Multimodal Interaction, 2020, pp. 163–173.
[40] Y. Benjamini, Y. Hochberg, Controlling the false discovery rate: A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995) 289–300.
[41] S. Kardan, C. Conati, Exploring gaze data for determining user learning with an interactive simulation, in: International Conference on User Modeling, Adaptation, and Personalization, Springer, 2012, pp. 126–138.
[42] M. J. Cole, J. Gwizdka, C. Liu, N. J. Belkin, X. Zhang, Inferring user knowledge level from eye movement patterns, Information Processing & Management 49 (2013) 1075–1091.
[43] X. Zhang, M. Cole, N. Belkin, Predicting users' domain knowledge from search behaviors, in: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011, pp. 1225–1226.