The Warping Window Size Effects the Accuracy of Person Identification based on Keystroke Dynamics Zakarya Farou and Krisztian Buza Telekom Innovation Labortories, Department of Data Science and Engineering Faculty of Informatics, ELTE - Eötvös Loránd University in Budapest {fzfxih,buza}@inf.elte.hu http://t-labs.elte.hu/ Abstract: 1-nearest neighbor (1NN) with Dynamic metrics is less obtrusive and they do not require extra Time Warping (DTW) distance is a popular time se- hardware [20]. ries classification technique. In the last decades, re- Recently, keystroke dynamics has gained popular- search on DTW aimed to improve its classification ac- ity [2, 5]. Keystroke dynamics may be used for contin- curacy, memory usage, and efficiency. According to uous user authentication or to enhance existing authen- a recent study, the appropriate selection of the Warp- tication methods, such as password-based authentica- ing Window Size (WWS) is crucial for the accuracy tion or the credit card number-based access to payment of 1NN-DTW. In this work, we consider person iden- services. Keystroke dynamics is appealing for many tification based on keystroke dynamics as a time se- reasons: it is non-invasive (users type on the computer ries classification task, and we examine whether WWS keyboard anyway), it does not require extra hardware is crucial in this case. We performed experiments on and is available after the authentication step at the start a real-world dataset containing more than 400 typ- of the computer session [11]. ing sessions obtained from 12 users. In the case of Various methods have been used for user authenti- this dataset, each user typed the same fixed text. We cation based on keystroke dynamics, such as Bayesian found that the aforementioned hypothesis was correct, classifiers, neural networks, SVM, Markov chains and i.e., the classification accuracy indeed depends on the dimensionality reduction [13, 19], see also [4] for a WWS. Furthermore, according to our observations, in recent review. our case, the optimal WWS is less than 3% which is substantially different from 10% that has been used as a default value in various works. 2 Background Keywords: keystroke dynamics, person identification, Measuring the similarity between a pair of time se- dynamic time warping, DTW, warping window size, ries is an important task which is usually applied in nearest neighbor. many data mining applications such as anomaly de- tection [7], clustering [1, 9], and classification tasks 1 Introduction [10, 12]. In a naive solution, the Euclidean distance could be used to calculate the similarity between time Biometric authentication systems are based on user series, however, the Dynamic Time Warping (DTW) traits and characteristics. There are two major forms distance allows for shifts and elongations, therefore it of biometrics: those based on physiological attributes is more appropriate in various applications [1, 3, 6]. and those based on behavioral characteristics. The We will first begin with a review of DTW and its re- physiological type includes biometrics based on stable cent extensions. body traits, such as the face, iris, fingerprint, and the hand, and are considered to be more robust and secure. 2.1 Measuring The Similarity of Time Series However, they are also considered to be more vulnera- ble to intrusions, expensive and require special equip- Given two time series T1 of length n, and T2 of length m ment [17]. On the other hand, behavioral biometrics include learned movements such as handwritten sig- T1 = ( T1 [1] , T1 [2], ... , T1 [i] , ... , T1 [n] ) natures, keyboard dynamics (typing pattern), mouse movements, gait, and speech. Collecting these bio- T2 = ( T2 [1] , T2 [2], ... , T2 [ j] , ... , T2 [m] ), their similarity can be computed in various ways. Copyright c 2019 for this paper by its authors. Use permitted The simplest way to define the similarity S is to under Creative Commons License Attribution 4.0 International (CC treat these sequences as vectors and to compute the BY 4.0). Euclidean distance between them directly. However, Figure 2: Example of the calculation of the DTW- matrix. a) The DTW-matrix. The time series T1 and T2 are shown o the left and top of the matrix. The marked entries of the matrix correspond to the mapping be- tween both time series. b) The calculation of the value of a cell. Figure 1: Comparison of two time series: Euclidean distance (top) compares the values at the same posi- tions, whereas DTW (bottom) allows for shifts and the sum of entries of D along that path: elongations. DTW (T1 , T2 ) = min ∑ Dwk , (1) W ∈P w ∈W k this requires both time series to be of the same length, i.e., m = n, but this is not always the case in real-world where P is the set of all warping paths. applications [15]. That’s why the similarity should This warping path can be found using dynamic pro- be computed between sequences of various lengths gramming to evaluate the following recurrence: (m 6= n). To this end, DTW is an appropriate choice  that allows for non-linear matching of two time series  σ [i − 1, j − 1] with different lengths [15], see Fig. 1. σ [i, j] = di, j + min σ [i − 1, j] (2) σ [i, j − 1]  2.2 Review of DTW Note that σ (i, j) is the cumulative distance along the DTW [18] is an edit distance. Given various edit steps optimal warping path. (elongation, replacement of value) and their costs, the In most applications, it may not be reasonable to al- DTW distance of two time series is the minimal cost of low for arbitrarily large shifts and elongations, there- transforming one of the time series into the other one. fore, the calculations are usually restricted to the en- Technically, DTW distance is calculated by filling an n tries around the diagonal of the matrix, i.e., by m matrix, called DTW cost matrix (denoted as σ ), where n and m refer to the length of time series T1 and σi, j is calculated ⇔ |i − j| ≤ w, T2 respectively, see Alg. 1 and Fig. 2. In order to understand DTW in more detail, we de- fine an n by m matrix D, in which each entry di, j cor- Algorithm 1: DTW Distance responds to the squared distance between two values Input: T1 : array [1..n], T2 : array [1..m] of the time series: di, j = (T1 [i] − T2 [ j])2 . To find the Output: distance best match between the two time series, we retrieve a σ = array[0..n][0..m] path through the matrix, called warping path, that min- for i = 1, ..., n do imizes the total cumulative distance. The warping path for j = 1, ..., m do is a sequence of neighboring entries1 of D so that it be- σ [i,j] = ∞ gins in the top left (first row, first column) corner of D, σ [0,0] = 0 and it ends in the bottom right corner (last row, last for i = 1, ..., n do column). In particular, the optimal warping path W ∗ is for j = 1, ..., m do the path that minimizes the cost. The DTW distance is cost = (T1 [i] − T2 [ j])2 σ [i,j] = cost + min(σ [i-1, j], σ [i , j-1], σ [i-1, j-1]) 1 With neighbors of d we mean: d i, j i−1, j−1 , di−1, j , di−1, j+1 , return σ [n,m] di, j−1 , di, j+1 , di+1, j−1 , di+1, j , di+1, j+1 . where w is the Warping Window Size (WWS) which controls the maximum amount of warping. In the cases, where some of the terms σi, j−1 , σi−1, j , σi−1, j−1 are undefined (i.e., they have not been calculated), they are excluded from Eq. (2). WWS is often set to 10% of the length of the time series. The effect of WWS on the classification perfor- mance has been studied, for example, in [14] and [8]. Dau et al. [8] claimed that obtaining the best perfor- mance from DTW requires setting WWS carefully, i.e., using the appropriate WWS can produce signif- icant improvements in classification accuracy. In the following section, we perform an experiment to exam- ine whether this hypothesis is true or not in case of typing dynamics data. 3 Experiments The goal of our experiments is to examine the effect of warping window size on the accuracy of person iden- tification. Figure 3: Example of user typing pattern raw data, see http://biointelligence.hu/ 3.1 Person Authentication Dataset typing-challenge/task1/index.php for the description of the raw data. We used the dataset associated with Task 1 of the Per- son Identification Challenge. Typing dynamics of 12 users were recorded in 458 sessions with a JavaScript typing pattern sequence. Before running the DTW dis- (JS) application. In each typing session, the users were tance between each pair of keystroke time series, the asked to type some sentences and the keyboard events data must be pre-proceeded, then we classify users’ keyup, keydown and keypress were captured by the JS test data. application. For each of the aforementioned events, the time (in milliseconds) was also recorded, see [5] and the webpage of the challenge2 for details. Pre-processing of Raw Typing Patterns In order to We are given the true identity of the users (coded by extract down-down and down-up durations, the raw integer numbers from 1 to 12) for 5 typing patterns per data must be preprocessed. In particular, we per- user. We use this data as training data. formed the following preprocessing steps: Additionally, hypothetical user identities for the rest of the typing patterns are given. This data is used as • select time and relevant event types, i.e., keydown test data. and keyup. Our task is to decide if the hypothetical identities match the true identities of the users who typed those • Sometimes the first event of the typing pattern is a patterns. keyup event (see Fig. 3) which is caused by click- The submission system at the webpage of the chal- ing the shift key to change the cursor between lenge is used to evaluate our predictions for the test registration form fields, so we removed it. data. • For simplicity, we removed sequential successive 3.2 Preprocessing and Feature Extraction keydown or successive keyup events, and we kept the latest keydown and the first keyup (see Fig. 4). In order to experimentally examine the role of the warping window size, we use 1-nearest neighbor with DTW with various warping window sizes w ∈ {0, 1, 2 . . . , lmax } where lmax is the length of the longest Extraction of Features Alg. 2 and Alg. 3 show the approach for extracting down-down (between 2 biointelligence.hu/typing-challenge/ keystroke) and down-up (keystroke duration) metrics. Algorithm 3: Extraction of the duration of keystrokes Input: typingP : List [] Output: ksDuration: List [] counter = 0 ksDuration = [] while not end of list do ksDuration[counter] = typingP[counter + 1] - typingP[counter] counter = counter + 2 Figure 4: Elimination of successive keydown and return ksDuration keyup events Algorithm 2: Extraction of the times between keystrokes Input: typingP : List [] Output: ksDuration: List [] counter = 0 typing = [] betweenKS = [] while not end of list do if typingP[counter]== ’keydown’ then typing.append(typingP[counter]) counter = counter + 1 Figure 5: The classification accuracy as function of counter = 0 warping window size using between-keystroke-times while not end of list do betweenKS[counter] = typing[counter + 1] - typing[counter] 4 Conclusion counter = counter + 1 return betweenKS Based on the result of Ratanamahatana et al.[16] and our own experiments on the data associated with the typing pattern-based person identification challenge, 3.3 Experimental Results we can conclude that working with a small warping window (approximately 3% in our case) could be ben- To test the effect of the warping window size on the eficial not only to reduce the execution time and mem- classification accuracy, we performed empirical exper- ory usage but also to increase the accuracy of the clas- iments on the datasets containing (i) keystroke dura- sification. We also noticed that working with a lager tions and (ii) between-keystroke-times. Additionally, warping window size could reduce the accuracy of the we experimented with a combination of the two afore- classification in some cases. That’s why it is better to mentioned options. use a low warping window size. We vary the warping window size from 0 (Eu- Our results confirm the hypothesis of Dau et al. [8] clidean) to lmax (no constraint/full calculation) and record the accuracy for each warping window size. The results are shown in Fig. 5, Fig. 6 and Fig. 7. 3.4 Discussion A study made by Ratanamahatana et al. [16] where a similar experiment has been performed with 6 datasets retrieved from “UEA & UCR Time Series Classifica- tion Repository” ended by the following conclusions: ”Wider warping constraints do not always improve the accuracy, as commonly believed [21].” More often, the accuracy peaks very early at much smaller window Figure 6: The classification accuracy as function of size, as shown in Table 1. the warping window size using keystroke duration Table 1: The warping window size that yields maximum classification accuracy for each dataset. Dataset Accuracy (%) WWS (%) Face 96.43 3 Gun 99.00 3 Trace 100.00 1 Leaf 96.38 10 Control chart 99.67 8 Two-pattern 100 3 Typing Pattern – keystroke duration 69.10 2.6 Typing Pattern – between-keystroke-times 65.70 0.8 Typing Pattern – combined 69.67 2.1 [3] Bar-Joseph, Z., Gerber, G., Gifford, D., Jaakkola, T. , Simon, I.: A new approach to analyzing gene expres- sion time series data. In Proceedings of the 6th Annual International Conference on Research in Computational Molecular Biology (2002) 39–48 [4] Bhatt, S., Santhanam, T.: Keystroke dynamics for bio- metric authentication–A survey. In Proceedings of the International Conference on Pattern Recognition, Infor- matics and Mobile Engineering. IEEE (2013) 17–23 [5] Buza, K.: Person Identification Based on Keystroke Dy- namics: Demo and Open Challenge. In CAiSE Forum Figure 7: The classification accuracy as func- (2016) 161–168 tion of the warping window size using the minimum [6] Caiani, E. G., Porta, A., Baselli, G., Turiel, M., Muz- of the two DTW-distances (DTW-distance based on zupappa, S., Pieruzzi, F., Cerutti, S.: Warped-average between-keystroke-times and DTW-distance based on template technique to track on a cycle-by-cycle basis keystroke duration) the cardiac filling phases on left ventricular volume. In Computers in Cardiology 1998. IEEE 25 (Cat. No. 98CH36292) (1998) 73–76 for the case of person identification based on typing [7] Dasgupta, D., Forrest, S.: Novelty detection in time se- patterns. According to the aforementioned hypothesis, ries data using ideas from immunology. In Proceedings ”obtaining the best performance from DTW requires of the international conference on intelligent systems setting its only parameter, the maximum amount of (1996) 82–87 warping (w) and thus can produce significant improve- [8] Dau, H. A., Silva, D. F., Petitjean, F., Forestier, G., Bag- ments in classification accuracy”. nall, A., Mueen, A., Keogh, E.: Optimizing dynamic time warping’s window width for time series data min- ing applications. Data Mining and Knowledge Discov- Acknowledgments ery 32(4) (2018) 1074–1120 [9] Debrégeas, A., Hébrail, G.: Interactive Interpretation of This work was supported by the project no. 20460- Kohonen Maps Applied to Curves. In KDD (1998) 179– 3/2018/ FEKUTSTRAT within the Institutional Excel- 183 lence Program in Higher Education of the Hungarian [10] Diez, J. J. R., González, C. J. A.: Applying boost- Ministry of Human Capacities. ing to similarity literals for time series classification. In International workshop on multiple classifier systems. Springer, Berlin, Heidelberg (2000) 210–219 References [11] Gunetti, D., Picardi, C.: Keystroke analysis of free text. ACM Transactions on Information and System Security [1] Aach, J., Church, G. M.: Aligning gene expression (TISSEC) 8(3) (2005) 312–347 time series with time warping algorithms. Bioinformat- [12] Kadous, M. W.: Learning Comprehensible Descrip- ics 17(6) (2001) 495–508 tions of Multivariate Time Series. In ICML 454 (1999) [2] Antal, M., Szabó, L. Zs., László, I.: Keystroke dynam- 463 ics on android platform. Procedia Technology 19 (2015) [13] Kalina, J., Schlenker, A.: Dimensionality reduction 820–826 methods for biomedical data. Lékař a technika – Clin- ician and Technology, 48(1) (2018) 29-35 [14] Kurbalija, V., Radovanović, M., Geler, Z., Ivanović, M.: The influence of global constraints on similarity measures for time-series databases. Knowledge-Based Systems 56 (2014) 49–67 [15] Rabiner, L. R., Juang, B. H., Rutledge, J. C.: Funda- mentals of speech recognition. Englewood Cliffs: PTR Prentice Hall 14 (1993) [16] Ratanamahatana, A., Keogh, E.: Everything you know about dynamic time warping is wrong. In 3rd Workshop on Mining Temporal and Sequential Data, in conjunc- tion with 10th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD-2004), Seattle, WA 1 (2004) 1–11 [17] Revett, K., Gorunescu, F., Gorunescu, M., Ene, M., Magalhães, P. S. T., Santos, H. D. D.: A machine learn- ing approach to keystroke dynamics based user authen- tication. International Journal of Electronic Security and Digital Forensics1(1) 55–70 [18] Sakoe, H., Chiba, S., Waibel, A., Lee, K. F.: Dynamic programming algorithm optimization for spoken word recognition. Readings in speech recognition 159 (1990) 224 [19] Stefan, D., Shu, X., Yao, D.D.: Robustness of keystroke-dynamics based biometrics against synthetic forgeries. Computers & security 31(1) (2012) 109–121 [20] Sultana, M., Paul, P. P., Gavrilova, M.: A concept of social behavioral biometrics: motivation, current devel- opments, and future trends. In 2014 International Con- ference on Cyberworlds. IEEE (2014) 271–278 [21] Zhu, Y., Shasha, D.: Warping indexes with envelope transforms for query by humming. In Proceedings of the 2003 ACM SIGMOD international conference on Man- agement of data. ACM (2003) 181–192