<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>User Activity Anomaly Detection by Mouse Movements in Web Surveys</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Mastrotto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anderson Nelson</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dev Sharma</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ergeta Muca</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristina Liapchin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Losada</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mayur Bansal</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>n S. S</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bauman Moscow State Technical University</institution>
          ,
          <addr-line>ul. Baumanskaya 2-ya, 5/1, 105005, Moscow, Russia, https://bmstu.ru/en</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Columbia University</institution>
          ,
          <addr-line>116th St and Broadway, New York, NY 10027, USA https://</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>dotin Inc</institution>
          ,
          <addr-line>Francisco Ln. 194, 94539, Fremont CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>63</fpage>
      <lpage>78</lpage>
      <abstract>
        <p>We present an approach to classifying user validity in survey responses using machine learning techniques. The approach is based on collecting user mouse activity in web surveys and quickly predicting the validity of the survey as a whole, without analysis of specific answers. Expert-rule-based, LSTM-based, and HMM-based approaches are considered. The approach might be used in web-survey applications to detect suspicious user behaviour and prompt such users to answer properly instead of recording false data.</p>
      </abstract>
      <kwd-group>
        <kwd>Psychometric datasets</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Survey validation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Survey responses can be a crucial data point for researchers and organizations
seeking to gain feedback and insight. Modern survey design incentivizes users to
complete as many surveys as possible in order to be compensated. In some
situations, users falsify their responses, rendering them invalid.
Organizations and researchers can reach the wrong conclusions if user responses are
largely invalid. The mouse and keyboard are the most common controls available to PC
users. Even now, with plenty of touch-screen devices, from a programmatic point
of view a touch screen generates mouse-related commands. We gathered mouse
tracking data and created features on time, screen coverage, distance traveled,
and direction of movements. These features were based on a
literature review of mouse path analytics as well as common business knowledge.
Although not all features ended up being used in our final models, they played a
big role in our exploratory data analysis and in developing our models.
A detailed table containing all features
created and used in modeling can be found in Table 6 in the Appendix.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Works</title>
      <p>Applying machine learning approaches to all data available for collection has
become very common among researchers in recent years. We found different directions
of research on mouse tracks: mood analysis, authentication based on user-specific
analysis, and common behaviour analysis.</p>
      <p>
        One of the early works related to emotion analysis [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] considered a specially
prepared mouse with additional sensors such as electrogalvanic skin conductance,
temperature, humidity and pressure sensors. Their mouse events subsystem
calculated the speed and acceleration of the mouse pointer's
movement, the amplitude of hand tremble, scroll wheel use, right- and left-click
frequency, and idle time. The authors use these values in their common regression
model, but no correlations are presented in terms of exact mouse movement
use.
      </p>
      <p>
        The work [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] demonstrates multimodal user identification based on
keyboard and mouse activity. The authors used False Rejection Rate as a quality
measure and report a value of 3.2%. The main features they used for mouse analysis were traveled
distance between clicks, time intervals between releasing and next pressing, and
vice versa, double-click values such as counts, time interval, and distance, and similar
drag-and-drop parameters.
      </p>
      <p>
        A somewhat simplified approach to user authentication was shown in the
paper [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Here, only the distance travelled by the mouse was used, and two hypotheses
were considered: mouse speed increases with the distance travelled, and mouse speed
differs considerably in different directions. The key idea was to restrict the
screen area for mouse activity recording to a set of 9 buttons placed inside a square.
The control parameters were false acceptance and false rejection rates (FAR
and FRR), with maximum values of 1.53 and 5.65 respectively.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] also describes an approach to user authentication, but the
extracted mouse features are operation frequency, silence ratio as a percentage of idle time,
movement time and offsets, average movement time and distance, distribution of
cursor positions, horizontal, vertical, and tangential velocity, acceleration and jerk,
slope angle and curvature. Dimensionality reduction was implemented with the diffusion
map algorithm, and the relationship between heat diffusion and a random walk
Markov chain was calculated. Diffusion distances were used in a Hopfield network
based classifier. The results were reported as FAR 5.05 and FRR 4.15.
      </p>
      <p>
        Later work [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses multiple classifiers to solve the same task of user
authentication and demonstrates better results: FAR 0.064 and FRR 0.576.
Their features in mouse tracking analysis were the total number of points for a certain
interval, the total amount of time when mouse movement was in delay, how many
times the trajectory was in delay, the number of actions, the total length and standard deviation
of the trajectory length and slope, and curvature as the number of changes between
the angles and total length of the trajectory. The authors used SVM, K-Nearest
Neighbor and Naive Bayes classifiers.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is devoted to user-specific behaviour in keyboard and mouse
use. The authors used the following set of mouse-related features: distance, speed,
acceleration, direction and angle, element clicks, click duration and scroll, and
pauses. For data collection the tool MOKEETO was developed, and that tool
provided both mouse- and keyboard-related events. The authors used SMOTE
oversampling and PCA for preprocessing, followed by decision tree, random forest,
support vector machine, and Naive Bayes classifiers. The results demonstrate
the ability to differentiate user behaviour, but no separate investigation of mouse and
keyboard features was shown.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] considered the use of mouse movement for e-learning activity
recognition. In the paper, Possibilistic Hidden Markov Model and Possibilistic
Conditional Random Fields approaches were described. The key idea of
the paper is to capture an area of interest as a mouse cursor fixation over some
image on a screen with the OGAMA tool. The tool gives some tasks and records
mouse activity. As features for analysis in that case, the authors used the total time
of a task, the time between two cursor fixations, and distance. The authors demonstrate
up to 90% accuracy in task recognition.
      </p>
      <p>
        In the paper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], mouse cursor motion analysis for emotion reading was
considered. The authors showed participants a set of images, asked them to indicate which ones
were appropriate answers, and recorded the cursor movements. As the authors tried
to work in the emotional area, they also combined tests with different music, movie,
and art backgrounds. The key features for them in cursor analysis were attraction
and direction changes (zigzag). The authors used an SVM method for mouse track
analysis and were able to recognize only some common emotions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methods Selection</title>
      <p>Based on our initial exploratory data analysis, we proceeded with building a few
different models to help us identify fraudulent survey responses with the goal of
improving the current validation method used by dotin Inc. We developed the
following three methods throughout the course of this project:
- Expert rules based approach
- Long Short-Term Memory based approach (LSTM) (Supervised Learning)
- Hidden Markov Model based approach (HMM) (Unsupervised Learning)
We decided to use these three approaches to compare how the different methods
would perform, considering the lack of accurately labeled data in the original
dataset. That way, we would be able to make better-informed
recommendations for dotin Inc. with regard to a new validation method to use
for their psychometric survey responses.</p>
    </sec>
    <sec id="sec-4">
      <title>Data</title>
      <p>Data Collection
We created a survey with 16 web pages consisting of 144 questions, and collected
the survey responses, mouse coordinates, clicks, scrolls, and radio clicks. The
survey was conducted by means of the Amazon Mechanical Turk service, and we
collected the country of origin and the occupation as additional data. Lastly, we
also collected the dimensions of the device the survey was being taken on. The
data allowed us to determine whether a survey response changed at any
time and whether the survey was being taken on a tablet or PC.</p>
      <p>Data Exploration
The data highlighted that completion time varied per user. We observed
instances where it would be improbable to complete the survey in good faith, e.g. a user
taking 11 seconds. As part of the data cleaning efforts, we filtered out the users that
did not click on all the radio buttons.</p>
      <p>Our hypothesis was proven correct once the data was visualized. We observed
a consistent pattern among the users identified in the outlier category. It would
be highly improbable that a user should need to select responses along only
one section of the survey page, especially since the team created questions that
would require different responses in different sections of the page.</p>
    </sec>
    <sec id="sec-5">
      <title>Expert Rules Approach</title>
      <p>From our exploratory data analysis we identified that the tracking method used
to generate the mouse path dataset presented some challenges, as many of the
users' paths weren't fully recorded. Out of the 755 users, only 54 fulfilled
the basic requirement of clicking the 196 radio buttons pertaining to individual
questions. Therefore, we added our data collection recommendations in the final
section of our paper. Before diving into the modeling, our team found it essential
to create alternative ways to flag anomalous users other than dotin's current
validation method. In order to generate such features, we used both common
business sense and advanced outlier detection techniques that allow us to
understand each user from different angles. Such features serve as a way to
validate dotin's current validation method and allow us to generate basic
business rules to flag suspicious behavior. Some of these features are then
used to test our models.</p>
      <p>Fig. 1. User 598 - Normal</p>
      <p>Anomalies by Scores
From our analysis, we discovered that 150 of the 755 users surveyed answered at
least one page of the survey with all of the same scores. We assume that
there is no page where such an event would be plausible; therefore these users
are flagged as suspicious.</p>
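      <p>The same-score rule can be sketched as follows. This is a minimal illustration rather than the project's code: the data layout (a list of per-page score lists for each user) and the names has_uniform_page and responses are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
def has_uniform_page(pages):
    """Return True if any page with at least two answers has identical scores."""
    return any(len(scores) > 1 and len(set(scores)) == 1 for scores in pages)


# Hypothetical responses: user id -> list of pages -> list of scores on that page.
responses = {
    "user_a": [[3, 4, 2, 5], [1, 2, 2, 4]],
    "user_b": [[3, 3, 3, 3], [1, 5, 2, 4]],
}

# Users with at least one page answered with a single repeated score.
flagged = sorted(uid for uid, pages in responses.items() if has_uniform_page(pages))
print(flagged)
```
]]></preformat>
      <p>Single-answer pages are skipped in this sketch because one score is trivially uniform.</p>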
      <p>Anomalies by Time
We then focused on the time perspective by estimating the
time that an honest user would take to read the survey and comparing it with
the actual completion time of each individual user. The benchmark read
time of a regular user was derived from Medium's read-time algorithm, which is
based on the average reading speed of an adult (about 256 wpm). The read time was
calculated for all the individual questions in our users' surveys and compared
to the time it took them to move from one radio button click to the next (an indication of
them moving from one question to the other). From our analysis, on average, a
user that completed the entire survey would need 5 minutes and 30 seconds to
at least read all the 196 questions, yet 33% of our surveyed users took less time
than that. Therefore, we flagged users that took less than the calculated reading
time as anomalous.</p>
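      <p>A rough sketch of the read-time check is given below. It assumes that read time is simply word count divided by reading speed, and that a user is flagged when the summed time between radio clicks is below the summed read time; the paper does not spell out the exact comparison, and the function names and sample questions are hypothetical.</p>
      <preformat><![CDATA[
```python
WORDS_PER_MINUTE = 256  # reading speed quoted in the text


def read_time_seconds(question, wpm=WORDS_PER_MINUTE):
    """Estimated time in seconds needed to read one question."""
    return len(question.split()) / wpm * 60.0


def is_too_fast(questions, click_gaps_s, wpm=WORDS_PER_MINUTE):
    """Flag a user whose total time between clicks is below the total read time."""
    expected = sum(read_time_seconds(q, wpm) for q in questions)
    return sum(click_gaps_s) < expected


# Hypothetical questions and per-question click gaps (seconds).
questions = [
    "I keep my desk tidy and organized",
    "I often leave my belongings lying around",
]
print(is_too_fast(questions, [0.8, 0.9]))  # answers arrive faster than reading allows
print(is_too_fast(questions, [4.0, 5.0]))  # plausible reading pace
```
]]></preformat>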
      <p>Anomalies by Topic
Finally, we focused on the first 40 questions of the survey to create our own
topics and scored each user based on how they deviate in answering the survey
questions. For each topic, we aggregated questions that are either positive or
negative (e.g. Tidy/Untidy) and analyzed how users answer differently for
similar questions. The underlying assumption is that if the user is deviating from
their answers each time, this indicates that he/she is not fully paying attention
to the questions. Questions with opposite behavioral traits should then present
scores that are opposite (low standard deviation). In our analysis, we chose a
standard deviation threshold of 2 to identify unfocused users, which
resulted in 33% of users answering opposite questions with similar answers (e.g.
Tidy = 5, Untidy = 5). Based on this analysis, such users are flagged
as suspicious.</p>
      <p>Aggregated Flag Scores
In order to identify our suspicious users based on these 3 features, we assign
a flag score to each user. This flag score indicates the level of suspicion that
our rule-based approach suggests. The value of the flag score ranges from 0 to 1,
where 0 indicates that the user can be validated and a value greater than 0 means
that the user appears as an anomaly in at least one feature, which suggests that
the user is a red flag. Based on our results, we decided to select as outliers all
users with a flag score greater than 0, consequently identifying 310 users (44% of the total
users).</p>
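      <p>One way to picture the aggregation is shown below. The text only states that the score lies between 0 and 1 and is positive when at least one rule fires, so computing it as the fraction of fired rules is an assumption, as are the rule names and sample users.</p>
      <preformat><![CDATA[
```python
# Hypothetical names for the three rules described above.
RULES = ("same_scores_on_page", "faster_than_read_time", "inconsistent_topics")


def flag_score(flags):
    """Fraction of the rules that fired for one user, from 0.0 to 1.0."""
    return sum(bool(flags[rule]) for rule in RULES) / len(RULES)


users = {
    "u1": dict.fromkeys(RULES, False),
    "u2": {"same_scores_on_page": True, "faster_than_read_time": False,
           "inconsistent_topics": False},
    "u3": dict.fromkeys(RULES, True),
}

scores = {uid: flag_score(flags) for uid, flags in users.items()}
# Any positive score marks the user as an outlier, as in the text.
outliers = sorted(uid for uid, score in scores.items() if score > 0)
print(scores, outliers)
```
]]></preformat>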
      <p>Generating a new validation variable with Autoencoders. Due to the
outlier detection nature of the features explained above, we decided to take an
unsupervised learning approach to create a new validation method. We used an
outlier detection algorithm to create our own labels of valid and non-valid users.
The nature of our dataset then required an approach that could deal with many
variables but few observations (704 observations that represent features for each
user).</p>
      <p>Training the Autoencoder. In order to train our autoencoder, we handpicked
144 users that, based on our analysis, had completed the entire survey and whose
mouse activity data was clean. The autoencoder model trained on these users
had 25 neurons on the input and output layers and two hidden layers of 2
neurons each. The compression used a sigmoid activation function and the mean
squared error of the process was 11.49. The results showed that 76% of the users
were classified as non-outliers while the rest were classified as outliers. Although
this method did take into account mouse behavior, we wanted to focus on mouse
movement at a more granular level. We further use the output of this validation
method as a dependent variable in our LSTM model.</p>
    </sec>
    <sec id="sec-6">
      <title>LSTM Based Approach</title>
      <p>Recurrent Neural Networks (RNN) have grown to be a popular tool in Natural
Language Processing for language modeling. Hence, RNN implementations are
no strangers to sequence-based applications. As in language modeling, an RNN
is responsible for predicting the next token. Our approach to applying RNNs to
the problem at hand consists of two key stages:
- Training a model that can predict a user's next movement.
- Transferring the learning from the first model to a classifier model for
predicting survey response validation, trained on labels generated by the autoencoder.</p>
      <p>Data Preparation
In order to feed an RNN, we needed to transform our data into a sequential
format that the RNN can understand. For this purpose, we created string-based
tokens which identified the cardinal directions and magnitudes of a user's
movements. Page changes are identified with the "pagechange" token. All of a user's
movements were appended to a single tokenized list of strings. For example, a
user's movements might start off as ["nw", "1", "sw", "3", ..., "pagechange",
"ne", "2"]. For memory efficiency, movements were averaged out between radio
clicks.</p>
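      <p>The tokenization described above might look roughly like the sketch below. The bin size of 100 pixels, the rule that magnitudes start at 1, and the function names are assumptions for illustration; the averaging of movements between radio clicks is omitted for brevity.</p>
      <preformat><![CDATA[
```python
import math


def direction(dx, dy):
    """Compass label for a displacement in screen coordinates (y grows downward)."""
    vertical = "n" if dy < 0 else "s" if dy > 0 else ""
    horizontal = "e" if dx > 0 else "w" if dx < 0 else ""
    return vertical + horizontal  # empty string means no movement


def tokenize(pages, bin_size=100.0):
    """Turn per-page coordinate lists into direction and magnitude tokens."""
    tokens = []
    for index, points in enumerate(pages):
        if index > 0:
            tokens.append("pagechange")
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            dx, dy = x1 - x0, y1 - y0
            label = direction(dx, dy)
            if not label:
                continue  # skip samples where the cursor did not move
            # Coarse magnitude bin: 1 for the shortest moves, larger for longer ones.
            magnitude = int(math.hypot(dx, dy) // bin_size) + 1
            tokens.extend([label, str(magnitude)])
    return tokens


# Hypothetical cursor samples for two survey pages.
pages = [[(500, 500), (400, 420), (250, 640)], [(100, 100), (260, 20)]]
print(tokenize(pages))
```
]]></preformat>
      <p>Keeping magnitudes as string tokens lets direction, magnitude, and page-change symbols share one vocabulary, which is what a next-token language model expects.</p>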
      <p>Since our RNN's loss function would be cross-entropy instead of mean
squared error, we scaled the magnitudes significantly to create large bins. This
means that if our model predicts "8" as a magnitude whereas it should have
predicted "7", for example, it is justified to penalize the model just as if it had
predicted a "2", because a one-point shift in magnitude is quite significant.
Lastly, we split our data into training and validation sets based on a 70:30 split
respectively.</p>
      <p>Model Architecture
We used Long Short-Term Memory (LSTM) networks because they are robust
against the vanishing gradient problem. Similar to RNNs, our models carried two
types of parameters: token embeddings and hidden states. Weights also included
those which the LSTM uses to determine how significant an adjustment should
be made for the new sequential input. Tokenized user movements were fed in
mini-batches of 8 and trained on a 6GB 1070 GPU. For batches
where input lengths differed, padding was added to the end of shorter sequences.
We used the cross-entropy loss function, and the evaluation metric for both
the language model and the classifier was accuracy. Once the first model was
trained, we replaced the final linear layer with a classification head of N x 2
dimensions, which produced a binary label, where N is the dimension of
the final hidden state from our LSTM.</p>
      <p>Results
In stage 1 of the language model, we trained the model on our training set. We
achieved the following results in predicting the next token on our validation set,
see Table 2.</p>
      <p>We obtained an accuracy of 64% after twenty-five epochs. Now that we
had developed a model that was able to predict the next token, we removed
the language model head and replaced it with a classifier head with randomly
generated parameters. We then trained this head to classify the validation
status of surveys. The results of predicting the survey validation
status after five epochs are shown in Table 3.</p>
      <p>The LSTM produced a 90% accuracy on predicting whether a user's survey
response is valid or invalid. The confusion matrix and the classification
report are shown in Fig. 4.</p>
      <p>This approach produced the highest recall. This means that this model was
the best at catching the largest number of invalid surveys identified by the
autoencoder.</p>
      <p>The LSTM approach was able to produce strong results, and it can certainly
be used in an ensemble of multiple models to prevent general overfitting. Given
its efficient runtime and high accuracy, we can also recommend it as a model
of choice to predict autoencoder-based labels if restrictions are posed. However,
we ultimately maintain that the most generalizable results are achieved using a
combination of approaches.</p>
    </sec>
    <sec id="sec-7">
      <title>HMM Based Approach</title>
      <p>Our third proposed method to determine the users' authenticity in survey
responses is by analyzing the sequence of user movement using a Hidden Markov
Model (HMM). HMM is an approach to model sequential data, and implies that
the Markov Model underlying the data is unknown. Probabilistic graphical
models such as HMM have been successfully used to identify user web activity. For
such models, the sequences of observation are crucial for training and inference
processes. We made a series of assumptions and data transformations; below we
provide an overview of the steps to produce the model, along with summary results
and findings.</p>
      <p>We converted the window aspect ratios into device types and discovered that
certain users elected to take the survey on a laptop or mobile device. We believe
that the movement patterns of people on mobile devices differ from
those on a laptop. For modeling purposes, we focused solely on users who completed
the survey using a laptop.</p>
      <p>We focused on users' coordinates across the survey duration and discovered
that there is a lot of noise in the movements. To run an effective model, we
converted the coordinates into discrete observations representing cardinal
directions. For instance, a movement to the right on the x-axis and up on the y-axis
is labeled as North East. In total, nine labels were created: North East, North
West, North, South East, South West, South, West, East, and No Movement.
Using these directions as states S, we create a sequence of observations
of mouse movement activity by observing a user as they complete the survey.
The priority is to understand the overall direction of the user's movement.</p>
      <p>We recognize that users are navigating through survey pages, so we use the
coordinates of the next button to estimate when each user moves to the next
page. After analyzing each survey page, we realized that each user has a unique
layout and the mouse path that users exhibit varies. Furthermore, considering
that the number of mouse movement records varies per page, we decided to
analyze the first 200 observations per user. We also removed the users that took
the survey multiple times. After multiple attempts, those users would have become
accustomed to the survey design and their movement would be based on memory.</p>
      <p>Only 66 users met the defined criteria for further analysis in this approach.
We trained the HMM using the Baum-Welch algorithm to estimate the transition
matrix, state distribution, and output distribution. We train the model to
recognize the patterns in each page and apply the forward algorithm to calculate
the observation log probability of each observed user sequence per page. A low
log probability is interpreted as a less likely occurrence. See Table 5 for
an example of the results.</p>
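      <p>The scoring step can be illustrated with a scaled forward pass for a discrete HMM. The sketch below is plain Python and does not reproduce the project's trained model: the two-state parameters are toy values chosen for the example, whereas in practice start, trans, and emit would come from Baum-Welch training.</p>
      <preformat><![CDATA[
```python
import math


def forward_log_prob(obs, start, trans, emit):
    """Log-likelihood of an observation sequence under a discrete HMM.

    obs   : list of observation symbol indices
    start : start[i] is the initial probability of hidden state i
    trans : trans[i][j] is the probability of moving from state i to state j
    emit  : emit[i][k] is the probability of emitting symbol k in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        # Rescale at every step so long sequences do not underflow.
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob


# Toy parameters (assumed for illustration only).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
print(forward_log_prob([0, 1, 2], start, trans, emit))
```
]]></preformat>
      <p>Per-step rescaling matters here because sequences of 200 observations would otherwise drive the raw forward probabilities toward zero.</p>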
      <p>We scale each observation and apply an isolation forest to identify
suspicious users. Out of the 66 users, 7 (11%) were labeled as suspicious.
User IDs: 422, 727, 866, 1272, 1297, 1314, 1495.</p>
      <p>We compare two users on page 7 to illustrate their mouse movements. User
1576's movements range across the entire page (Fig. 5), while user 1272's movements
are targeted and deliberate (Fig. 6).</p>
      <p>Fig. 5. User 1576 mouse movement
for page 7: Normal</p>
      <p>Fig. 6. User 1272 mouse movement
for page 7: Outlier</p>
      <p>
        Assumptions and Limitations
The accuracy of the HMM is dependent on the validity of the assumptions,
and the quality of the data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We therefore identify the assumptions and
limitations of this approach.
      </p>
      <p>- The captured data doesn't distinguish when users are using their mouse to
complete the survey vs browsing the internet.
- The model assumes that the majority of users are completing the survey in
good faith. If most users are falsely completing the survey, then the users
that are attempting to complete the survey in good faith will be flagged.
- The model was trained on the first 200 sequential observations, and a user's
patterns could differ as they progress through the pages. There are some
users with 15,000 observations. Using an analogy, we are assuming that we
can predict whether someone will win a 100m race using the first 10m.
- The page labels were estimated using the coordinates of the next button on
each page. Those labels represent our best estimate and may not truly reflect
when the user changes page.</p>
      <p>Despite the efficiency of such a probabilistic graphical model in segmenting and
labeling stochastic sequences, its performance is adversely affected by the
imperfect quality of the data used for the construction of sequential observations. While
the HMM can be useful in providing the probability of a sequence, due to the
quality of the data it shouldn't be the sole source. Therefore, we would suggest
using a combination of methods in order to identify invalid survey responses.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>To conclude, we have developed three different methods to validate psychometric
survey responses for dotin Inc. These three methods helped us answer our initial
research questions, in particular:
1. Does the level of suspicious behavior vary across different types of survey
questions?</p>
      <p>From our outliers section, we were able to create general business rules to
help us identify user behavior across pages.</p>
      <p>- Users that use the same scores across a single page can be flagged as
suspicious.
- Users that take less than the estimated read time of 5 minutes 30 seconds to answer
the survey can be flagged as suspicious.
- Users that score above a standard deviation of 2 in our topic modeling
will be flagged as suspicious.</p>
      <p>It is important to highlight the value of having such business rules in
the identification of suspicious behavior, as flagging users could be an
easy-to-implement expert rule approach to validating surveys. We envision this
method becoming the first line of defense against suspicious users, and an
easy-to-implement solution to flag suspicious behavior across each page and,
ultimately, the entire survey.
2. How do we use user mouse activity to validate survey answers to
psychometric questions?</p>
      <p>Through this analysis, we are looking to gain a better understanding of
the user journey throughout the survey. The goal is to see if different ways
of interacting with the survey could be a baseline for a model that,
through the direction and magnitude of mouse movement, would help us identify
whether a user is correctly filling out the survey.</p>
      <p>To tackle the question we used both supervised and unsupervised techniques:
- Unsupervised/Supervised: LSTM. We implemented an autoencoder
to generate a label independent of dotin's current
approach. We then used these labels in an LSTM model that
can classify suspicious user behavior.
- Unsupervised: HMM. We used a probabilistic approach that analyzed
the sequence of user movement with the Hidden Markov Model and
complemented it with the Isolation Forest algorithm to find the number
of suspicious users.</p>
      <p>Putting together our findings, we can now compare the performance and
results generated by the three different methods (Table 4):</p>
      <p>As we can see, each model was trained on a different set of users due to the
limitations we faced with the quality of the original data. Therefore we would
not recommend using one single model at this point, yet we could proceed with
a hybrid approach that takes all 3 models into consideration to validate users. We
believe that an improved data collection method will further help improve the
results of the individual models, as well as the overall hybrid model, enabling
dotin Inc. to improve the accuracy of its validation method for psychometric survey responses.</p>
      <p>As future work, we suggest extending the data sets with results of new surveys and combining all 3 models,
see Fig. 7: create a standard deviation score for each of the 3 approaches, and use
e.g. weighted averaged scores to classify users.</p>
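      <p>A possible reading of this hybrid scheme is sketched below: each approach's suspicion scores are standardized to z-scores and then combined with a weighted average. The model names, weights, sample scores, and the threshold of 0 are all hypothetical; the text does not fix any of them.</p>
      <preformat><![CDATA[
```python
from statistics import mean, pstdev


def standardize(scores):
    """Map raw per-user scores to z-scores (0.0 for everyone if there is no spread)."""
    mu, sigma = mean(scores.values()), pstdev(scores.values())
    return {u: 0.0 if sigma == 0 else (s - mu) / sigma for u, s in scores.items()}


def hybrid_score(per_model, weights):
    """Weighted average of standardized suspicion scores, per user."""
    total = sum(weights.values())
    z = {name: standardize(scores) for name, scores in per_model.items()}
    users = next(iter(per_model.values())).keys()
    return {u: sum(weights[m] * z[m][u] for m in z) / total for u in users}


# Hypothetical suspicion scores (higher means more suspicious) from each approach.
per_model = {
    "rules": {"u1": 0.0, "u2": 1.0, "u3": 0.0},
    "lstm": {"u1": 0.1, "u2": 0.9, "u3": 0.2},
    "hmm": {"u1": 0.2, "u2": 0.8, "u3": 0.5},
}
weights = {"rules": 1.0, "lstm": 2.0, "hmm": 1.0}

combined = hybrid_score(per_model, weights)
# Users scoring above the weighted average (threshold 0) are treated as suspicious.
suspicious = sorted(u for u, s in combined.items() if s > 0.0)
print(combined, suspicious)
```
]]></preformat>
      <p>Standardizing first keeps one approach from dominating merely because its raw scores live on a larger scale; the weights then express how much each approach is trusted.</p>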
      <p>Fig. 7. Validation Framework Overview</p>
    </sec>
    <sec id="sec-9">
      <title>Appendix</title>
    </sec>
    <sec id="sec-10">
      <title>Results of HMM for suspicious users</title>
      <p>Distance
Direction
Number of times a user has answered a question
Number of times user has performed any mouse activity
(scroll + moves + clicks)
Target Variable for supervised machine learning
(boolean); classi cation modeling
Average time taken between one click and the next;
aggregated by user
Max time lapsed Total time taken by the user to complete the survey
Time since last movement Total time since the last mouse movement
Time since last click Total time since the last mouse click on a radio button
Factor of di erence Quantify how the time it takes each user to complete the
survey compares to expected read time calculations.</p>
      <p>Total Distance Total distance traveled by the user (Euclidean distance)
Measure width covered A feature to give us a measure of screen coverage by user
in terms of width (x coordinate)
Measure height covered A feature to give us a measure of screen coverage by user
in terms of height (y coordinate)
Moves left , perc of left The count and percentage of instances when the user
movements moves from right to left on the screen
Moves right, perc of right The count and percentage of instances when the user
movements moves from left to right on the screen
Moves up, perc of up move- The count and percentage of instances when the user
ments moves from bottom to top on the screen
Moves down, perc of down The count and percentage of instances when the user
movements moves from top to bottom on the screen
No horizontal movement Count and percentage of instances when user shows no
horizontal movement on the screen
No vertical movement Count and percentage of instances when user shows no
vertical movement on the screen</p>
      <p>Bf votes 1,2,3,4,5; Bs votes 1,2,3,4,5; Miq votes 1,2,3,4,5; pgi votes 1,2,3,4,5,6,7: Choice of answers for each category for each question.
Bf abs min max response, Bs abs min max response, Miq abs min max response, pgi abs min max response: Checks whether the user has selected all 1s (absolute minimum value of question choice selection) or all 5s/7s (absolute maximum value of question choice selection) per question category type (bf questions, bs questions, miq questions, pgi questions). Boolean.
Standard deviation on similar questions: Checks how user responses deviate on questions that are similar in nature.</p>
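      <p>The two response-pattern checks above can be sketched as follows. This is an illustrative example, not the authors' code: the function names are hypothetical, answers are assumed to be integers on a scale starting at 1, and the population standard deviation is used because the text does not say which variant was applied.</p>

```python
from statistics import pstdev


def abs_min_max_response(answers, scale_max):
    """True if every answer in a category is 1 or every answer is scale_max.

    scale_max is 5 for the bf, bs and miq categories and 7 for pgi.
    """
    if not answers:
        return False
    all_min = all(a == 1 for a in answers)
    all_max = all(a == scale_max for a in answers)
    return all_min or all_max


def similar_question_spread(answers):
    """Standard deviation of a user's answers to similar questions.

    ASSUMPTION: population standard deviation; needs at least one answer.
    """
    return pstdev(answers)


# Hypothetical respondents
print(abs_min_max_response([5, 5, 5, 5], scale_max=5))  # True
print(abs_min_max_response([1, 5, 1, 5], scale_max=5))  # False
print(similar_question_spread([1, 5, 1, 5]))            # 2.0
```

      <p>A large spread on questions that are similar in nature suggests inconsistent answers, while a flagged all-minimum or all-maximum pattern suggests straight-lining within a category.</p>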
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Elbahi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omri</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahjoub</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garrouch</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Mouse movement and probabilistic graphical models based e-learning activity recognition improvement possibilistic model</article-title>
          .
          <source>Arabian Journal for Science and Engineering</source>
          <volume>41</volume>
          (
          <issue>8</issue>
          ),
          <fpage>2847</fpage>
          –
          <lpage>2862</lpage>
          (
          <year>2016</year>
          ). https://doi.org/10.1007/s13369-016-2025-6
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kaklauskas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavadskas</surname>
            ,
            <given-names>E.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seniut</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dzemyda</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stankevic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simkevicius</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stankevic</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paliskiene</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matuliauskaite</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kildiene</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bartkiene</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivanikovas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gribniak</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Web-based biometric computer mouse advisory system to analyze a user's emotions and work productivity</article-title>
          .
          <source>Eng. Appl. Artif. Intell</source>
          .
          <volume>24</volume>
          (
          <issue>6</issue>
          ),
          <fpage>928</fpage>
          –
          <lpage>945</lpage>
          (
          <year>2011</year>
          ). https://doi.org/10.1016/j.engappai.2011.04.006
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Karim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heickal</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasanuzzaman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>User authentication from mouse movement data using multiple classifiers</article-title>
          . In:
          <source>Proceedings of the 9th International Conference on Machine Learning and Computing</source>
          . pp.
          <fpage>122</fpage>
          –
          <lpage>127</lpage>
          . ICMLC 2017,
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA (
          <year>2017</year>
          ). https://doi.org/10.1145/3055635.3056620
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sondhi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A multimodal behavioral biometric technique for user identification using mouse and keystroke dynamics</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>111</volume>
          ,
          <fpage>15</fpage>
          –
          <lpage>20</lpage>
          (02
          <year>2015</year>
          ). https://doi.org/10.5120/19558-1307
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Salmeron-Majadas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>O.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>G. Boticario</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A machine learning approach to leverage individual keyboard and mouse interaction behavior from</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arya</surname>
            ,
            <given-names>K.V.</given-names>
          </string-name>
          :
          <article-title>Mouse interaction based authentication system by classifying the distance travelled by the mouse</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>17</volume>
          (03
          <year>2011</year>
          ). https://doi.org/10.5120/2181-2752
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Stamp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Introduction to Machine Learning with Applications in Information Security</article-title>
          . Chapman &amp; Hall/CRC, 1st edn. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Suganya</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muthumari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balasubramanian</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Improving the Performance of Mouse Dynamics Based Authentication Using Machine Learning Algorithm</article-title>
          .
          <source>International Journal of Innovation and Scientific Research</source>
          <volume>24</volume>
          (
          <issue>1</issue>
          ),
          <fpage>202</fpage>
          –
          <lpage>209</lpage>
          (
          <year>2016</year>
          ), http://www.ijisr.issr-journals.org/abstract.php?article=IJISR-16-073-02
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Yamauchi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Reading emotion from mouse cursor motions: Affective computing approach</article-title>
          .
          <source>Cognitive Science</source>
          <volume>42</volume>
          ,
          <fpage>1</fpage>
          –
          <lpage>49</lpage>
          (11
          <year>2017</year>
          ). https://doi.org/10.1111/cogs.12557
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>