The Warping Window Size Effects the Accuracy of Person Identification
                  based on Keystroke Dynamics

                                                Zakarya Farou and Krisztian Buza

                         Telekom Innovation Labortories, Department of Data Science and Engineering
                             Faculty of Informatics, ELTE - Eötvös Loránd University in Budapest
                                             {fzfxih,buza}@inf.elte.hu
                                               http://t-labs.elte.hu/

Abstract: 1-nearest neighbor (1NN) with Dynamic                      metrics is less obtrusive and they do not require extra
Time Warping (DTW) distance is a popular time se-                    hardware [20].
ries classification technique. In the last decades, re-                 Recently, keystroke dynamics has gained popular-
search on DTW aimed to improve its classification ac-                ity [2, 5]. Keystroke dynamics may be used for contin-
curacy, memory usage, and efficiency. According to                   uous user authentication or to enhance existing authen-
a recent study, the appropriate selection of the Warp-               tication methods, such as password-based authentica-
ing Window Size (WWS) is crucial for the accuracy                    tion or the credit card number-based access to payment
of 1NN-DTW. In this work, we consider person iden-                   services. Keystroke dynamics is appealing for many
tification based on keystroke dynamics as a time se-                 reasons: it is non-invasive (users type on the computer
ries classification task, and we examine whether WWS                 keyboard anyway), it does not require extra hardware
is crucial in this case. We performed experiments on                 and is available after the authentication step at the start
a real-world dataset containing more than 400 typ-                   of the computer session [11].
ing sessions obtained from 12 users. In the case of                     Various methods have been used for user authenti-
this dataset, each user typed the same fixed text. We                cation based on keystroke dynamics, such as Bayesian
found that the aforementioned hypothesis was correct,                classifiers, neural networks, SVM, Markov chains and
i.e., the classification accuracy indeed depends on the              dimensionality reduction [13, 19], see also [4] for a
WWS. Furthermore, according to our observations, in                  recent review.
our case, the optimal WWS is less than 3% which is
substantially different from 10% that has been used as
a default value in various works.                                    2     Background
Keywords: keystroke dynamics, person identification,
                                                                     Measuring the similarity between a pair of time se-
dynamic time warping, DTW, warping window size,
                                                                     ries is an important task which is usually applied in
nearest neighbor.
                                                                     many data mining applications such as anomaly de-
                                                                     tection [7], clustering [1, 9], and classification tasks
1    Introduction                                                    [10, 12]. In a naive solution, the Euclidean distance
                                                                     could be used to calculate the similarity between time
Biometric authentication systems are based on user                   series, however, the Dynamic Time Warping (DTW)
traits and characteristics. There are two major forms                distance allows for shifts and elongations, therefore it
of biometrics: those based on physiological attributes               is more appropriate in various applications [1, 3, 6].
and those based on behavioral characteristics. The                   We will first begin with a review of DTW and its re-
physiological type includes biometrics based on stable               cent extensions.
body traits, such as the face, iris, fingerprint, and the
hand, and are considered to be more robust and secure.               2.1   Measuring The Similarity of Time Series
However, they are also considered to be more vulnera-
ble to intrusions, expensive and require special equip-              Given two time series T1 of length n, and T2 of length m
ment [17]. On the other hand, behavioral biometrics
include learned movements such as handwritten sig-                          T1 = ( T1 [1] , T1 [2], ... , T1 [i] , ... , T1 [n] )
natures, keyboard dynamics (typing pattern), mouse
movements, gait, and speech. Collecting these bio-                         T2 = ( T2 [1] , T2 [2], ... , T2 [ j] , ... , T2 [m] ),
                                                                     their similarity can be computed in various ways.
     Copyright c 2019 for this paper by its authors. Use permitted
                                                                        The simplest way to define the similarity S is to
under Creative Commons License Attribution 4.0 International (CC     treat these sequences as vectors and to compute the
BY 4.0).                                                             Euclidean distance between them directly. However,
                                                                               Figure 2: Example of the calculation of the DTW-
                                                                               matrix. a) The DTW-matrix. The time series T1 and T2
                                                                               are shown o the left and top of the matrix. The marked
                                                                               entries of the matrix correspond to the mapping be-
                                                                               tween both time series. b) The calculation of the value
                                                                               of a cell.
Figure 1: Comparison of two time series: Euclidean
distance (top) compares the values at the same posi-
tions, whereas DTW (bottom) allows for shifts and                              the sum of entries of D along that path:
elongations.
                                                                                            DTW (T1 , T2 ) = min ∑ Dwk ,                    (1)
                                                                                                               W ∈P w ∈W
                                                                                                                     k
this requires both time series to be of the same length,
i.e., m = n, but this is not always the case in real-world                     where P is the set of all warping paths.
applications [15]. That’s why the similarity should                              This warping path can be found using dynamic pro-
be computed between sequences of various lengths                               gramming to evaluate the following recurrence:
(m 6= n). To this end, DTW is an appropriate choice
                                                                                                            
that allows for non-linear matching of two time series                                                       σ [i − 1, j − 1]
with different lengths [15], see Fig. 1.                                              σ [i, j] = di, j + min σ [i − 1, j]      (2)
                                                                                                               σ [i, j − 1]
                                                                                                            

2.2    Review of DTW
                                                                               Note that σ (i, j) is the cumulative distance along the
DTW [18] is an edit distance. Given various edit steps                         optimal warping path.
(elongation, replacement of value) and their costs, the                           In most applications, it may not be reasonable to al-
DTW distance of two time series is the minimal cost of                         low for arbitrarily large shifts and elongations, there-
transforming one of the time series into the other one.                        fore, the calculations are usually restricted to the en-
Technically, DTW distance is calculated by filling an n                        tries around the diagonal of the matrix, i.e.,
by m matrix, called DTW cost matrix (denoted as σ ),
where n and m refer to the length of time series T1 and                                   σi, j is calculated ⇔ |i − j| ≤ w,
T2 respectively, see Alg. 1 and Fig. 2.
   In order to understand DTW in more detail, we de-
fine an n by m matrix D, in which each entry di, j cor-                         Algorithm 1: DTW Distance
responds to the squared distance between two values                              Input: T1 : array [1..n], T2 : array [1..m]
of the time series: di, j = (T1 [i] − T2 [ j])2 . To find the                    Output: distance
best match between the two time series, we retrieve a                            σ = array[0..n][0..m]
path through the matrix, called warping path, that min-                          for i = 1, ..., n do
imizes the total cumulative distance. The warping path                               for j = 1, ..., m do
is a sequence of neighboring entries1 of D so that it be-                                σ [i,j] = ∞
gins in the top left (first row, first column) corner of D,                      σ [0,0] = 0
and it ends in the bottom right corner (last row, last                           for i = 1, ..., n do
column). In particular, the optimal warping path W ∗ is                              for j = 1, ..., m do
the path that minimizes the cost. The DTW distance is                                    cost = (T1 [i] − T2 [ j])2
                                                                                         σ [i,j] = cost +
                                                                                               min(σ [i-1, j], σ [i , j-1], σ [i-1, j-1])
      1 With neighbors of d we mean: d
                            i, j          i−1, j−1 , di−1, j ,   di−1, j+1 ,     return σ [n,m]
di, j−1 , di, j+1 , di+1, j−1 , di+1, j , di+1, j+1 .
where w is the Warping Window Size (WWS) which
controls the maximum amount of warping. In the
cases, where some of the terms σi, j−1 , σi−1, j , σi−1, j−1
are undefined (i.e., they have not been calculated), they
are excluded from Eq. (2). WWS is often set to 10%
of the length of the time series.
   The effect of WWS on the classification perfor-
mance has been studied, for example, in [14] and [8].
Dau et al. [8] claimed that obtaining the best perfor-
mance from DTW requires setting WWS carefully,
i.e., using the appropriate WWS can produce signif-
icant improvements in classification accuracy. In the
following section, we perform an experiment to exam-
ine whether this hypothesis is true or not in case of
typing dynamics data.


3     Experiments
The goal of our experiments is to examine the effect of
warping window size on the accuracy of person iden-
tification.
                                                                  Figure 3: Example of user typing pattern raw
                                                                  data,    see http://biointelligence.hu/
3.1    Person Authentication Dataset                              typing-challenge/task1/index.php for
                                                                  the description of the raw data.
We used the dataset associated with Task 1 of the Per-
son Identification Challenge. Typing dynamics of 12
users were recorded in 458 sessions with a JavaScript             typing pattern sequence. Before running the DTW dis-
(JS) application. In each typing session, the users were          tance between each pair of keystroke time series, the
asked to type some sentences and the keyboard events              data must be pre-proceeded, then we classify users’
keyup, keydown and keypress were captured by the JS               test data.
application. For each of the aforementioned events,
the time (in milliseconds) was also recorded, see [5]
and the webpage of the challenge2 for details.                    Pre-processing of Raw Typing Patterns In order to
   We are given the true identity of the users (coded by          extract down-down and down-up durations, the raw
integer numbers from 1 to 12) for 5 typing patterns per           data must be preprocessed. In particular, we per-
user. We use this data as training data.                          formed the following preprocessing steps:
   Additionally, hypothetical user identities for the rest
of the typing patterns are given. This data is used as              • select time and relevant event types, i.e., keydown
test data.                                                            and keyup.
   Our task is to decide if the hypothetical identities
match the true identities of the users who typed those
                                                                    • Sometimes the first event of the typing pattern is a
patterns.
                                                                      keyup event (see Fig. 3) which is caused by click-
   The submission system at the webpage of the chal-
                                                                      ing the shift key to change the cursor between
lenge is used to evaluate our predictions for the test
                                                                      registration form fields, so we removed it.
data.

                                                                    • For simplicity, we removed sequential successive
3.2    Preprocessing and Feature Extraction                           keydown or successive keyup events, and we kept
                                                                      the latest keydown and the first keyup (see Fig. 4).
In order to experimentally examine the role of the
warping window size, we use 1-nearest neighbor
with DTW with various warping window sizes w ∈
{0, 1, 2 . . . , lmax } where lmax is the length of the longest
                                                                  Extraction of Features Alg. 2 and Alg. 3 show
                                                                  the approach for extracting down-down (between
      2 biointelligence.hu/typing-challenge/                      keystroke) and down-up (keystroke duration) metrics.
                                                          Algorithm 3: Extraction of the duration of
                                                          keystrokes
                                                           Input: typingP : List []
                                                           Output: ksDuration: List []
                                                           counter = 0
                                                           ksDuration = []
                                                           while not end of list do
                                                               ksDuration[counter] =
                                                                 typingP[counter + 1] - typingP[counter]
                                                               counter = counter + 2
Figure 4: Elimination of successive keydown and
                                                           return ksDuration
keyup events

 Algorithm 2: Extraction of the times between
 keystrokes
  Input: typingP : List []
  Output: ksDuration: List []
  counter = 0
  typing = []
  betweenKS = []
  while not end of list do
      if typingP[counter]== ’keydown’ then
           typing.append(typingP[counter])
      counter = counter + 1                              Figure 5: The classification accuracy as function of
  counter = 0                                            warping window size using between-keystroke-times
  while not end of list do
      betweenKS[counter] =
        typing[counter + 1] - typing[counter]            4   Conclusion
      counter = counter + 1
  return betweenKS                                       Based on the result of Ratanamahatana et al.[16] and
                                                         our own experiments on the data associated with the
                                                         typing pattern-based person identification challenge,
3.3   Experimental Results                               we can conclude that working with a small warping
                                                         window (approximately 3% in our case) could be ben-
To test the effect of the warping window size on the     eficial not only to reduce the execution time and mem-
classification accuracy, we performed empirical exper-   ory usage but also to increase the accuracy of the clas-
iments on the datasets containing (i) keystroke dura-    sification. We also noticed that working with a lager
tions and (ii) between-keystroke-times. Additionally,    warping window size could reduce the accuracy of the
we experimented with a combination of the two afore-     classification in some cases. That’s why it is better to
mentioned options.                                       use a low warping window size.
   We vary the warping window size from 0 (Eu-              Our results confirm the hypothesis of Dau et al. [8]
clidean) to lmax (no constraint/full calculation) and
record the accuracy for each warping window size.
The results are shown in Fig. 5, Fig. 6 and Fig. 7.


3.4   Discussion

A study made by Ratanamahatana et al. [16] where a
similar experiment has been performed with 6 datasets
retrieved from “UEA & UCR Time Series Classifica-
tion Repository” ended by the following conclusions:
”Wider warping constraints do not always improve the
accuracy, as commonly believed [21].” More often,
the accuracy peaks very early at much smaller window     Figure 6: The classification accuracy as function of
size, as shown in Table 1.                               the warping window size using keystroke duration
        Table 1: The warping window size that yields maximum classification accuracy for each dataset.

                     Dataset                                        Accuracy (%)         WWS (%)
                     Face                                                     96.43                3
                     Gun                                                      99.00                3
                     Trace                                                   100.00                1
                     Leaf                                                     96.38               10
                     Control chart                                            99.67                8
                     Two-pattern                                                100                3
                     Typing Pattern – keystroke duration                      69.10              2.6
                     Typing Pattern – between-keystroke-times                 65.70              0.8
                     Typing Pattern – combined                                69.67              2.1


                                                             [3] Bar-Joseph, Z., Gerber, G., Gifford, D., Jaakkola, T. ,
                                                                 Simon, I.: A new approach to analyzing gene expres-
                                                                 sion time series data. In Proceedings of the 6th Annual
                                                                 International Conference on Research in Computational
                                                                 Molecular Biology (2002) 39–48
                                                             [4] Bhatt, S., Santhanam, T.: Keystroke dynamics for bio-
                                                                 metric authentication–A survey. In Proceedings of the
                                                                 International Conference on Pattern Recognition, Infor-
                                                                 matics and Mobile Engineering. IEEE (2013) 17–23
                                                             [5] Buza, K.: Person Identification Based on Keystroke Dy-
                                                                 namics: Demo and Open Challenge. In CAiSE Forum
Figure 7:      The classification accuracy as func-              (2016) 161–168
tion of the warping window size using the minimum            [6] Caiani, E. G., Porta, A., Baselli, G., Turiel, M., Muz-
of the two DTW-distances (DTW-distance based on                  zupappa, S., Pieruzzi, F., Cerutti, S.: Warped-average
between-keystroke-times and DTW-distance based on                template technique to track on a cycle-by-cycle basis
keystroke duration)                                              the cardiac filling phases on left ventricular volume.
                                                                 In Computers in Cardiology 1998. IEEE 25 (Cat. No.
                                                                 98CH36292) (1998) 73–76
for the case of person identification based on typing        [7] Dasgupta, D., Forrest, S.: Novelty detection in time se-
patterns. According to the aforementioned hypothesis,            ries data using ideas from immunology. In Proceedings
”obtaining the best performance from DTW requires                of the international conference on intelligent systems
setting its only parameter, the maximum amount of                (1996) 82–87
warping (w) and thus can produce significant improve-        [8] Dau, H. A., Silva, D. F., Petitjean, F., Forestier, G., Bag-
ments in classification accuracy”.                               nall, A., Mueen, A., Keogh, E.: Optimizing dynamic
                                                                 time warping’s window width for time series data min-
                                                                 ing applications. Data Mining and Knowledge Discov-
Acknowledgments                                                  ery 32(4) (2018) 1074–1120
                                                             [9] Debrégeas, A., Hébrail, G.: Interactive Interpretation of
This work was supported by the project no. 20460-                Kohonen Maps Applied to Curves. In KDD (1998) 179–
3/2018/ FEKUTSTRAT within the Institutional Excel-               183
lence Program in Higher Education of the Hungarian           [10] Diez, J. J. R., González, C. J. A.: Applying boost-
Ministry of Human Capacities.                                    ing to similarity literals for time series classification. In
                                                                 International workshop on multiple classifier systems.
                                                                 Springer, Berlin, Heidelberg (2000) 210–219
References                                                   [11] Gunetti, D., Picardi, C.: Keystroke analysis of free text.
                                                                 ACM Transactions on Information and System Security
[1] Aach, J., Church, G. M.: Aligning gene expression            (TISSEC) 8(3) (2005) 312–347
    time series with time warping algorithms. Bioinformat-   [12] Kadous, M. W.: Learning Comprehensible Descrip-
    ics 17(6) (2001) 495–508                                     tions of Multivariate Time Series. In ICML 454 (1999)
[2] Antal, M., Szabó, L. Zs., László, I.: Keystroke dynam-       463
    ics on android platform. Procedia Technology 19 (2015)   [13] Kalina, J., Schlenker, A.: Dimensionality reduction
    820–826
    methods for biomedical data. Lékař a technika – Clin-
    ician and Technology, 48(1) (2018) 29-35
[14] Kurbalija, V., Radovanović, M., Geler, Z., Ivanović,
    M.: The influence of global constraints on similarity
    measures for time-series databases. Knowledge-Based
    Systems 56 (2014) 49–67
[15] Rabiner, L. R., Juang, B. H., Rutledge, J. C.: Funda-
    mentals of speech recognition. Englewood Cliffs: PTR
    Prentice Hall 14 (1993)
[16] Ratanamahatana, A., Keogh, E.: Everything you know
    about dynamic time warping is wrong. In 3rd Workshop
    on Mining Temporal and Sequential Data, in conjunc-
    tion with 10th ACM SIGKDD Int. Conf. Knowledge
    Discovery and Data Mining (KDD-2004), Seattle, WA
    1 (2004) 1–11
[17] Revett, K., Gorunescu, F., Gorunescu, M., Ene, M.,
    Magalhães, P. S. T., Santos, H. D. D.: A machine learn-
    ing approach to keystroke dynamics based user authen-
    tication. International Journal of Electronic Security and
    Digital Forensics1(1) 55–70
[18] Sakoe, H., Chiba, S., Waibel, A., Lee, K. F.: Dynamic
    programming algorithm optimization for spoken word
    recognition. Readings in speech recognition 159 (1990)
    224
[19] Stefan, D., Shu, X., Yao, D.D.: Robustness of
    keystroke-dynamics based biometrics against synthetic
    forgeries. Computers & security 31(1) (2012) 109–121
[20] Sultana, M., Paul, P. P., Gavrilova, M.: A concept of
    social behavioral biometrics: motivation, current devel-
    opments, and future trends. In 2014 International Con-
    ference on Cyberworlds. IEEE (2014) 271–278
[21] Zhu, Y., Shasha, D.: Warping indexes with envelope
    transforms for query by humming. In Proceedings of the
    2003 ACM SIGMOD international conference on Man-
    agement of data. ACM (2003) 181–192