Person Identification Based on Keystroke
       Dynamics: Demo and Open Challenge

                                  Krisztian Buza

            Brain Imaging Center, Research Center for Natural Sciences
               Hungarian Academy of Sciences, Budapest, Hungary
                           buza@biointelligence.hu
                        http://www.biointelligence.hu


      Abstract. Person identification based on keystroke dynamics is a chal-
      lenging task with applications in various domains ranging from online
      education to internet banking. State-of-the-art solutions for this task
      are based on machine learning. In this paper, we present our solution
      which is based on dynamic time warping (DTW) and ECkNN, a recent
      hubness-aware regression technique. We performed initial evaluation on a
      dataset containing 200 typing sessions and we show that the proposed ap-
      proach outperforms popular time-series classifiers. Additionally, we point
      out that we integrated the proposed approach into a Python-based web
      server which allows to demonstrate real-world applications of the pro-
      posed person identification technique. Furthermore, in order to motivate
      research in this domain, we announce an open challenge.

      Keywords: person identification, keystroke dynamics, hubness-aware
      regression, challenge


1   Introduction
Conventional techniques for person identification range from passwords to bio-
metric identification, such as fingerprints, iris-patterns, electroencephalograph-
based and electrocardiograph-based person identification [6, 8]. Online services,
such as online banking or online courses, require cheap, widely accessible and
reliable person identification techniques. It was shown that the dynamics of typ-
ing is characteristic to particular users, and users are hardly able to mimic the
typing dynamics of others [10].
    Although the dynamics of typing, e.g. the time series of the duration of
keystrokes, is characteristic to users, it is obvious that even the same user can
not always type with the exactly same dynamics. This is illustrated in Fig. 1. The
figure shows the durations of the first 25 keystrokes in case of typing the same
text by two different users. Each of the users typed the same text two times. The
time series of user1 are shown in the left of the figure, while the time series of
user2 are shown in the right of the figure. As one can see, the time series of the
same user are more similar to each other than the time series of different users.


Copyright c by the paper’s authors. Copying permitted only for private and academic
purposes.
In: S. España, M. Ivanović, M. Savić (eds.): Proceedings of the CAiSE’16 Forum at
the 28th International Conference on Advanced Information Systems Engineering,
Ljubljana, Slovenia, 13-17.6.2016, published at http://ceur-ws.org
162     Krisztian Buza


Fig. 1. The duration of 25 consecutive keystrokes in case of two different users (in the
left and right) when typing the same text.


In particular, a peak (i.e., an exceptionally long keystroke) close to the fifth
position is characteristic to user1, whereas exceptionally short keystrokes close
to position twenty are characteristic to user2. In case if we consider a large set
of users, such as millions of students participating in online education, it may be
difficult and time-consuming for human experts to identify patterns that are able
to reliably distinguishing users from each other. Therefore, approaches based on
machine learning are required for user identification based on typing patterns.
   In our solution, we consider the task of person identification based on the
dynamics of typing as a time-series classification problem, for which various ap-
proaches have been introduced ranging from neural networks [11, 18] over Hidden
Markov Models [7] to support vector machines [5] and Bayesian networks [2].
However, the 1-nearest neighbour (1NN) classifier with dynamic time warping
(DTW) as distance measure was shown to be an extremely competitive classifier,
outperforming many complex models, such as neural networks, Hidden Markov
Models or super-kernel fusion scheme [19].
    Although the empirical evidence is also justified by theoretical results [3, 4],
one of the recently observed shortcomings of nearest neighbour models is their
suboptimal performance in the presence of bad hubs [12, 17]. Informally, we say
that an instance x is a bad hub, if x appears as a nearest neighbour of surprisingly
many other instances, but x belongs to a class which is different from the class
of those instances that have x as their nearest neighbour. With hubness, we refer
to the presence of bad hubs, a phenomenon that has been observed in various
datasets, including time series datasets [15]. For a more formal definition of
bad hubs, we refer to [15], in which hubness-aware classifiers are surveyed and
applied to the classification of time series. Here, we only note that we made
                                     Person Identification Demo and Challenge        163

similar observations for time series representing keystroke dynamics, i.e., bad
hubs are present in such data as well.
    As the aforementioned studies show, bad hubs are responsible for surpris-
ingly large fraction of the total classification error of nearest neighbour classi-
fiers, therefore, reduction of the detrimental effect of bad hubs can substantially
improve the accuracy of time-series classification. Consequently, we base our so-
lution on ECkNN, one of the recent hubness-aware machine learning techniques.
In Section 4 we present the results of our initial evaluation on a dataset contain-
ing 200 typing sessions and we show that the proposed approach outperforms
popular time-series classifiers. Furthermore, we point out that we integrated the
proposed approach into a Python-based web server which allows to demonstrate
real-world applications of the proposed technique. Moreover, in order to motivate
research in this domain, we announce an open challenge.


2    Nearest neighbour regression with error correction


Nearest neighbour regression with error correction (ECkNN) is a hubness-aware
extension of the kNN regression. By design it is suitable to various types of data,
e.g. vector data, time series, etc., given that an appropriate distance measure
between the instances of the dataset is available. As we work with time-series
data describing the dynamics of typing, the instances are time series in our case.
Next, we describe ECkNN in more detail.
    In its training phase, ECkNN implements error correction on the training
data. In particular, the corrected label yc (x) of an instance x is defined as

                                  
                                   |I1 |
                                          P
                                       x
                                            y(xi )   if |Ix | ≥ 1
                       yc (x) =        xi ∈Ix                       ,                (1)
                                  y(x),             otherwise

where Ix denotes the set of those training instances that have x as one of their
k-nearest neighbours, |Ix | is the size of the set Ix and y(x) is the original (i.e., un-
corrected) label of instance x. When ECkNN is applied to predict labels for new
instances, it performs k-nearest neighbour regression using the corrected labels.
That is: for a new instance x*, ECkNN searches for the k-nearest neighbours of
x* among the training instances and outputs the average of the corrected labels
of the neighbours as the estimated label of x*. For more details about ECkNN
we refer to [1].
    As mentioned previously, the dynamics of typing is captured by time series
data. In order to use ECkNN with time series data, it is necessary to determine
the nearest neighbours of time series. For this purpose, we use Dynamic Time
Warping (DTW), an extraordinarily popular time series distance measure that
is robust to elongations and noise [13, 15].
164      Krisztian Buza

3     Person identification based on typing patterns

We base our solution on the wide-spread “classification-via-regression” approach.
In particular, we use ECkNN regression for the person identification task in the
following way: for each pair of users (u, v) we train a separate model. While
doing that, we associate time series describing the typing dynamics of user u
with label 0. Similarly, the time series of user v are associated with label 1.
When a new time series is presented to the trained model, the model outputs a
continuous value between 0 and 1 (bounds are inclusive). Values close to 0 (or 1,
resp.) indicate that, according to the model, the new time series is more likely
to represent the typing dynamics of user u (or v, respectively).
    Usage of pairwise models, i.e., models that distinguish between two users, is
consistent with the circumstances under which the person identification problem
arise in real-world applications: for example, in case of online exams and online
banking, the user claims an identity and the task is to decide if the claimed
identity matches the user’s true identity. Assume that the claimed identity is u∗ ,
and other users of the system are denoted by ui , 1 ≤ i ≤ n. In this case, for each
user ui (except u∗ ), we decide if the typing pattern is more consistent with the
typing patterns of u∗ than ui . These n decisions can be implemented in parallel
if the number of users is high and several computational units are available.1
    At each of the above decisions, a simple decision threshold of 0.5 could be
applied, i.e., given a model trained to distinguish between the typing patterns
of users u and v, if the model outputs less than 0.5 when a new time series
is presented to the model, then the decision is u, otherwise the decision is v.
However, the simple threshold of 0.5 may be suboptimal, therefore, we learn the
threshold in the following way: once the model is trained, we present the time
series of the training set to the model and obtain the output of the model for
the training time series. Then, we determine the threshold that gives the highest
accuracy on the training data.


4     Initial experimental evaluation

Typing Dynamics Data We collected time series data describing the dynamics
of typing, or typing patterns for short. In the initial evaluation, we used the
data from four different users, each of them donated approximately 50 typing
patterns, resulting in a collection of 200 typing patterns in total. In each of
the typing sessions, the users were asked to type the same short text of a few
sentences. In particular, the users were asked to type the following text based
on the English Wikipedia page about Neil Armstrong:
1
     As there might be users with similar typing dynamics, depending on the costs of
    different types of errors, in order to successfully authenticate user u∗ we might allow
    a few of these pairwise decisions to “fail” in the sense that the model outputs that
    the typing pattern is more likely to belong to user ui instead of u*. Concretely, in
    case if we allow t of the pairwise decisions to “fail” in the above sense, we say that
    we set the value of the tolerance parameter to t.
                                  Person Identification Demo and Challenge      165


           Fig. 2. Accuracy of our approach (ECkNN) and the baselines.


That’s one small step for a man, one giant leap for mankind. Armstrong prepared
his famous epigram on his own. In a post-flight press conference, he said that he
decided on the words just prior to leaving the lunar module.

In each typing session, we measured both (i) the time between consecutive
keystrokes and (ii) the duration of each keystroke, i.e., the time between pressing
and releasing a key. We used a self-made JavaScript application, a PHP script
and a Python script to capture the aforementioned time series, save and prepro-
cess the data. Note that due to typing errors, the length of the typing patterns
(and the corresponding time series) varies slightly.

Experimental protocol In order to simulate the scenario in which users provide
few typing patterns when they register into a system, we used the first five typing
patterns per user as training data. The remaining typing patterns were used as
test data in order to evaluate the system.
     We trained models to distinguish two users: i.e, we trained a separate model
for each pair of users. For each model, we measured the accuracy, i.e., the ratio
of correctly classified instances. We report accuracies averaged over all the pairs
of users. We used the binomial test suggested by Salzberg [14] to judge if the
differences between the models are statistically significant or not.
     We measured the accuracy of our approach, denoted as ECkNN, and the
baselines in case of (i) using only the time series of the times between consecutive
keystrokes, (ii) using only the time series of the duration of keystrokes, and
(iii) using both of the aforementioned two time series. In the latter case, we
combined the output of the two models used in the first two cases by averaging
their outputs.
     We used the publicly available ECkNN-implementation from the PyHubs
library.2 We set k = 5 for ECkNN which is in accordance with other works on
hubness-aware machine learning [1, 16].
     As described in Section I, 1NN-DTW was reported as an extremely com-
petitive time series classifier, therefore we used it as one of the baselines. Addi-
2
    http://biointelligence.hu/pyhubs
166     Krisztian Buza


            Fig. 3. Accuracy of our approach for various types of data.


tionally, we performed experiments with k-nearest neighbour regression (kNN)
with DTW and k=5 and a decision threshold learned on the training data (see
Section 3 for the details of the “classification-via-regression” approach and how
we learned the decision threshold).

Experimental results Fig. 2 and Fig. 3 summarize the results of our experiments.
Fig. 2 shows classification accuracy of the combined models that use both types
of time series. We observed that ECkNN outperforms both baselines statisti-
cally significantly according to binomial tests at significance level of p = 0.001.
In Fig. 3, we examine the performance of both types of time series in more detail:
keypress duration time-series seems to be more informative than the times be-
tween consecutive keystrokes. Most importantly, the combination of both types
of information leads to the best performance out of the examined cases.


5     Demo and open challenge
In order to demonstrate applications of the proposed approach in real-world
systems, we integrated it into a Python-based web server.
    In order to motivate further research in this domain and to allow the eval-
uation of various models and preprocessing pipelines, we published raw data
describing the dynamics of typing and announce an open challenge. The de-
scription of the challenge and the dataset is available at:
      http://www.biointelligence.hu/typing-challenge/index.php
The data used for the challenge contains more than 500 typing sessions. It was
collected from 12 users, therefore, the associated identification task is inherently
more challenging than the task we considered for the initial evaluation. With
“open challenge” we mean that the challenge shall run and submissions of new
solutions shall be possible as long as there is interest both from the side of the
scientific community and the organizers.
                                  Person Identification Demo and Challenge     167

    The performance of the proposed approach and the baselines is available
on the leaderboard as “ECKNN”, “1NN” and “KNN”. In the “Person Authen-
tication” task (Task 1), we set the tolerance parameter to t = 2, whereas in
the “Person Identification” task (Task 2) we used the majority vote of all the
pairwise decision models.


6   Conclusions and outlook
In this paper, we considered the task of person identification based on the dy-
namics of typing. We presented initial results of our experiments with ECkNN,
a hubness-aware regression technique. We compared the results of ECkNN with
1NN-DTW and kNN regression which are highly competitive baselines.
    We integrated the proposed approach into a Python-based web server which
aims to demonstrate the applicability in real-world systems. In order to encour-
age large scale evaluation of various machine learning techniques and future
research, we announced an open challenge.
    In case of person identification based on keystroke dynamics, a relatively
small set of reliably labeled data is given, e.g. the typing patterns that were
recorded when the user registered to the system. However, substantially more
unlabeled data (or “weakly labeled” data, under the assumption that the user
claims an identity when she tries to log in) may be collected during the usage
of the system. The presence of unlabeled data was not taken into account in
our work, but it might be exploited using semi-supervised machine learning
techniques, such as the SUCCESS approach [9] that was designed for time series
classification.
    As hubness-aware approaches performed well for the person identification
based on the dynamics of typing, we envision that similar techniques may be
applied to person identification based on electroencephalograph (EEG) and elec-
trocardiograph (ECG) signals as well.

Acknowledgments. We thank Ladislav Peška for implementing the submission
system used for the challenge. We thank Dóra Neubrandt for her help with
the implementation of the server demonstrating person identification based on
keystroke dynamics. This research was performed within the framework of the
grant of the Hungarian Scientific Research Fund – OTKA PD 111710. This
paper was supported by the János Bolyai Research Scholarship of the Hungarian
Academy of Sciences.


References
 1. Buza, K., Nanopoulos, A., Nagy, G.: Nearest neighbor regression in the presence
    of bad hubs. Knowledge-Based Systems 86, 250–260 (2015)
 2. Buza, K., Schmidt-Thieme, L.: Motif-based classification of time series with
    bayesian networks and svms. In: Advances in Data Analysis, Data Handling and
    Business Intelligence, pp. 105–114. Springer (2009)
168     Krisztian Buza

 3. Chen, G.H., Nikolov, S., Shah, D.: A latent source model for nonparametric time
    series classification. In: Advances in Neural Information Processing Systems. pp.
    1088–1096 (2013)
 4. Devroye, L., Györfi, L., Lugosi, G.: A probabilistic theory of pattern recognition,
    vol. 31. Springer Science & Business Media (2013)
 5. Eads, D.R., Hill, D., Davis, S., Perkins, S.J., Ma, J., Porter, R.B., Theiler, J.P.:
    Genetic algorithms and support vector machines for time series classification. In:
    International Symposium on Optical Science and Technology. pp. 74–85. Interna-
    tional Society for Optics and Photonics (2002)
 6. Gargiulo, F., Fratini, A., Sansone, M., Sansone, C.: Subject identification via ecg
    fiducial-based systems: Influence of the type of qt interval correction. Computer
    methods and programs in biomedicine 121(3), 127–136 (2015)
 7. Kim, S., Smyth, P., Luther, S.: Modeling waveform shapes with random effects
    segmental hidden markov models. In: Proceedings of the 20th conference on Un-
    certainty in artificial intelligence. pp. 309–316. AUAI Press (2004)
 8. Marcel, S., Del Millan, J.R.: Person authentication using brainwaves (eeg) and
    maximum a posteriori model adaptation. Pattern Analysis and Machine Intelli-
    gence, IEEE Transactions on 29(4), 743–752 (2007)
 9. Marussy, K., Buza, K.: Success: a new approach for semi-supervised classification
    of time-series. In: Artificial Intelligence and Soft Computing. pp. 437–447. Springer
    (2013)
10. Monrose, F., Rubin, A.D.: Keystroke dynamics as a biometric for authentication.
    Future Generation computer systems 16(4), 351–359 (2000)
11. Nanopoulos, A., Alcock, R., Manolopoulos, Y.: Feature-based classification of time-
    series data. International Journal of Computer Research 10(3), 49–61 (2001)
12. Radovanović, M., Nanopoulos, A., Ivanović, M.: Hubs in space: Popular nearest
    neighbors in high-dimensional data. The Journal of Machine Learning Research
    11, 2487–2531 (2010)
13. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken
    word recognition. Acoustics, Speech and Signal Processing, IEEE Transactions on
    26(1), 43–49 (1978)
14. Salzberg, S.L.: On comparing classifiers: Pitfalls to avoid and a recommended ap-
    proach. Data mining and knowledge discovery 1(3), 317–328 (1997)
15. Tomašev, N., Buza, K., Marussy, K., Kis, P.B.: Hubness-aware classification, in-
    stance selection and feature construction: Survey and extensions to time-series. In:
    Feature selection for data and pattern recognition, pp. 231–262. Springer (2015)
16. Tomašev, N., Mladenić, D.: Nearest neighbor voting in high-dimensional data:
    Learning from past occurrences. In: Data Mining Workshops (ICDMW), 2011 IEEE
    11th International Conference on. pp. 1215–1218. IEEE (2011)
17. Tomašev, N., Radovanovic, M., Mladenic, D., Ivanovic, M.: A probabilistic ap-
    proach to nearest-neighbor classification: Naive hubness bayesian knn,. In: Proc.
    20th ACM Int. Conf. on Information and Knowledge Management (CIKM). pp.
    2173–2176 (2011)
18. Wong, F.W.M.H., Supian, A.S.M., Ismail, A.F., Kin, L.W., Soon, O.C.: Enhanced
    user authentication through typing biometrics with artificial neural networks and k-
    nearest neighbor algorithm. In: Signals, Systems and Computers, 2001. Conference
    Record of the Thirty-Fifth Asilomar Conference on. vol. 2, pp. 911–915. IEEE
    (2001)
19. Xi, X., Keogh, E., Shelton, C., Wei, L., Ratanamahatana, C.A.: Fast time series
    classification using numerosity reduction. In: Proceedings of the 23rd international
    conference on Machine learning. pp. 1033–1040. ACM (2006)