=Paper= {{Paper |id=Vol-2803/paper8 |storemode=property |title=Method of user authentication on the basis of recognition of computer handwriting peculiarities (short paper) |pdfUrl=https://ceur-ws.org/Vol-2803/paper8.pdf |volume=Vol-2803 |authors=Leonid S. Kryzhevich }} ==Method of user authentication on the basis of recognition of computer handwriting peculiarities (short paper)== https://ceur-ws.org/Vol-2803/paper8.pdf
Method of user authentication on the basis of recognition of computer handwriting peculiarities
Leonid S. Kryzhevich
Kursk State University, 33 Radisheva str., Kursk, 305000, Russian Federation


                  Abstract
                  This article deals with the following hypothesis: each person has unique peculiarities of text
                  typing. The process of typing can be expressed in the form of various metrics and analyzed
                  with the help of statistical methods.

                  Keywords
                  normal distribution, de Moivre–Laplace integral theorem, Pearson's nonparametric test χ2


1. Introduction¹

    Nowadays people keep almost all kinds of data in digital form, in databases or in cloud storage services that can be accessed online: important documents, contracts, banking data, passwords. If such data are stolen, people can lose personal or business information, and their bank accounts can be emptied. At the same time, the number of intruders who want to steal various kinds of information is increasing.
    There are different ways to protect information, but they constantly become outdated. To detect an intruder, it is necessary to find out whether the person has system access rights. This has led to the idea of authenticating users by their digital (keyboard) handwriting.
    Each person has unique peculiarities of text typing. People type texts at a particular speed, and the duration of keystrokes varies as well. We decided to measure these characteristics and analyze them.

2. Conditions of the experiment

    An experiment was carried out to obtain test results. About one hundred students of the faculty of mathematics, physics and information science of Kursk State University participated in the experiment [1]. Their task was to type a text of at least four sentences. Meanwhile, a special program recorded the following characteristics for each symbol: the time of the keystroke event, measured from the moment the program was started (in milliseconds); the ASCII code of the key; and whether the key was pressed (1) or released (0).
    Figure 1 shows the file that contains the statistical data for further analysis.

Figure 1: data file

    The purpose of the experiment is to determine the individual features of one typing session in order to find out how it differs from the test patterns of other users.

¹ Models and Methods for Researching Information Systems in Transport, Dec. 11-12, 2020, St. Petersburg, Russia
EMAIL: Leonid@programist.ru (L.S. Kryzhevich)
ORCID: 0000-0002-6736-498X (L.S. Kryzhevich)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

3. Data analysis

    Let us examine the statistics of the first feature noted, the keystroke duration. If we take all consecutive pairs of measurements for the same symbol (when it was pressed and when it was released) from the test pattern and subtract the press time from the release time, we can see
the duration of each keystroke. Let us plot the durations for all the symbols in a two-dimensional chart. The horizontal axis shows the keystroke duration in milliseconds, and the vertical axis shows its frequency (the ratio of the number of keystrokes of a given duration to the total number of keystrokes). With the data sorted by press duration, the chart looks as shown in Figure 2.

Figure 2: the time/frequency bar chart for the first typing session of a test person

3.1. Checking for normal distribution

    Let us assume that this distribution is normal. To check this assumption, we analyze the data with Pearson's nonparametric χ² test.
    Let us divide the series into fourteen disjoint intervals and count the number of values that fall into each of them. Each interval must contain at least five values [2]. If this rule is followed, the values within each interval can be averaged (arithmetic mean), which gives a new chart (Figure 3).

Figure 3: the averaged time/frequency bar chart for the first typing session of a test person

    To find out whether the distribution is normal, we use Pearson's χ² test [3]. The following indices are used:
    xi – the abscissa, i.e. the keystroke time;
    fi – the frequency;
    (xi·fi) – the products used to calculate the weighted arithmetic mean;
    S – the cumulative frequency, obtained by adding each frequency to the sum of all the previous ones;
    (|xi − x̄|·fi) – the absolute difference between the current xi and the weighted arithmetic mean, multiplied by the current frequency;
    ((xi − x̄)²·fi) – the difference between the current xi and the weighted arithmetic mean, squared and multiplied by the current frequency;
    (fi/f) – the relative frequency, i.e. the ratio of the frequency to the total sum of frequencies.
    We calculate the weighted arithmetic mean:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{47656}{479} = 99.49$$

    These values are necessary for further calculations. Let us create Table 1, which includes them.
    The dispersion (variance) D measures the scatter of all the values of the series around the mean value.
    Let us calculate the mean square (standard) deviation:

$$\sigma = \sqrt{D} = \sqrt{626.079} = 25.022$$

    We check the hypothesis that X is normally distributed with Pearson's chi-squared test

$$K = \sum \frac{(n_i - n_i^*)^2}{n_i^*},$$

where n*i are the theoretical frequencies, calculated by the formula

$$n_i^* = \frac{n h}{\sigma}\,\varphi_i.$$

    Let us find the mode of this distribution. The mode is the most frequent value among the examined indices; in our case it is xi = 96 (its frequency is 59).
    The median is also xi = 96, because it is the first value at which the cumulative frequency exceeds 479/2 ≈ 240.
    In a symmetrical distribution series, the mode and the median coincide with the mean (x̄ = Me = Mo); in moderately asymmetrical series they are approximately related as follows:

$$3(\bar{x} - Me) \approx \bar{x} - Mo.$$
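The first processing step described at the beginning of Section 3 (pairing each press event with the following release of the same key) and the grouping into intervals of Section 3.1 can be sketched in Python as follows. This is a minimal sketch, not the author's program: the event tuple layout (time in ms, ASCII code, 1/0 flag) only mirrors the description of Figure 1, and the function names hold_durations and group_into_intervals are hypothetical. Labelling each interval by its lower bound is also an assumption, since the text does not say whether the values xi in Table 1 are lower bounds or midpoints.

```python
from collections import Counter

# Hypothetical event record mirroring the description of Figure 1:
# (time in ms since the program was started, ASCII code of the key, 1 = pressed / 0 = released)
Event = tuple[int, int, int]


def hold_durations(events: list[Event]) -> list[int]:
    """Pair each press with the next release of the same key; return hold times in ms."""
    pressed_at: dict[int, int] = {}  # ASCII code -> time of a press not yet released
    durations: list[int] = []
    for time_ms, code, is_pressed in events:  # events are assumed to be in recording order
        if is_pressed == 1:
            # If the key auto-repeats before release, keep the time of the first press.
            pressed_at.setdefault(code, time_ms)
        elif code in pressed_at:
            durations.append(time_ms - pressed_at.pop(code))
        # A release without a matching press is ignored.
    return durations


def group_into_intervals(durations: list[int], start: int, width: int) -> dict[int, int]:
    """Count durations in [start + k*width, start + (k+1)*width), keyed by the lower bound."""
    counts = Counter((d - start) // width for d in durations if d >= start)
    return {start + k * width: f for k, f in sorted(counts.items())}


events = [(0, 97, 1), (95, 97, 0), (120, 98, 1), (130, 98, 1), (230, 98, 0)]
print(hold_durations(events))  # [95, 110]
print(group_into_intervals([95, 110, 50, 60], start=48, width=8))  # {48: 1, 56: 1, 88: 1, 104: 1}
```

With start = 48 and width = 8 (the interval width h used in the text), the resulting counts play the role of the frequencies fi in Table 1.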
Table 1
The calculation table for empirical frequencies of the first typing session

  xi    fi    pi = fi/f   xi·fi   Cumulative   |xi − x̄|·fi   (xi − x̄)²·fi   Cumulative
                                  relative                                  frequency, S
                                  frequency
  48    10    0.0209        480   0.0209         514.906      26512.824          10
  56    14    0.0292        784   0.0501         608.868      26480.059          24
  64    30    0.0626       1920   0.1127        1064.718      37787.492          54
  72    35    0.0731       2520   0.1858         962.171      26450.669          89
  80    51    0.106        4080   0.2918         994.021      19374.069         140
  88    52    0.109        4576   0.4008         597.511       6865.769         192
  96    59    0.123        5664   0.5238         205.946        718.875         251
 104    50    0.104        5200   0.6278         225.47        1016.732         301
 112    54    0.113        6048   0.7408         675.507       8450.187         355
 120    37    0.0772       4440   0.818          758.848      15563.505         392
 128    30    0.0626       3840   0.8806         855.282      24383.567         422
 136    27    0.0564       3672   0.937          985.754      35989.269         449
 144    16    0.0334       2304   0.9704         712.15       31697.379         465
 152    14    0.0292       2128   0.9996         735.132      38601.311         479
Total  479    1           47656                 9896.284    299891.708


    The range of variation, i.e. the difference between the maximum and minimum values of x, is R = 152 − 48 = 104.
    We can calculate the mean deviation:

$$d = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i} = \frac{9896.284}{479} = 20.66.$$

    Let us calculate the dispersion:

$$D = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i} = \frac{299891.708}{479} = 626.079.$$

    The following indices are used in the formula for the theoretical frequencies: n = 479, h = 8 (the interval width), σ = 25.022, x̄ = 99.49, and φi, the value of the function φ(u) = e^(−u²/2)/√(2π) at ui = (xi − x̄)/σ, taken from Laplace's table.
    We calculate the theoretical frequencies in Table 2.
    Now we should compare the empirical and theoretical frequencies.
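The summary statistics of the first session (x̄, d, D, σ, R, mode and median) can be recomputed from the grouped series in Table 1. The sketch below is a hypothetical helper (grouped_stats is not from the paper) that applies the formulas of Section 3.1 to the (xi, fi) pairs copied from Table 1; the median rule (first value whose cumulative frequency exceeds n/2) follows the text.

```python
import math

# Grouped series of the first typing session, copied from Table 1.
XI = [48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152]  # interval values, ms
FI = [10, 14, 30, 35, 51, 52, 59, 50, 54, 37, 30, 27, 16, 14]         # frequencies f_i


def grouped_stats(x: list[float], f: list[int]) -> dict:
    """Weighted statistics of a grouped series, following the formulas of Section 3.1."""
    n = sum(f)
    mean = sum(xi * fi for xi, fi in zip(x, f)) / n                      # weighted mean
    mean_dev = sum(abs(xi - mean) * fi for xi, fi in zip(x, f)) / n      # mean deviation d
    dispersion = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f)) / n  # dispersion D
    mode = x[f.index(max(f))]                                            # most frequent value
    cumulative, median = 0, x[-1]
    for xi, fi in zip(x, f):  # median: first value whose cumulative frequency exceeds n / 2
        cumulative += fi
        if cumulative > n / 2:
            median = xi
            break
    return {
        "n": n, "mean": mean, "d": mean_dev, "D": dispersion,
        "sigma": math.sqrt(dispersion), "R": max(x) - min(x),
        "mode": mode, "median": median,
    }


stats = grouped_stats(XI, FI)
print(f"mean = {stats['mean']:.2f}, d = {stats['d']:.2f}, "
      f"D = {stats['D']:.3f}, sigma = {stats['sigma']:.3f}")
# mean = 99.49, d = 20.66, D = 626.079, sigma = 25.022
```

The same function applies unchanged to the second session once its grouped frequencies are available.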
Table 2
The calculation table for theoretical frequencies of the first typing session

  i    xi      ui       φi       n*i
  1    48   -2.0578   0.0478    7.32
  2    56   -1.7381   0.0878   13.446
  3    64   -1.4184   0.1456   22.298
  4    72   -1.0987   0.2179   33.371
  5    80   -0.779    0.2943   45.071
  6    88   -0.4592   0.3589   54.965
  7    96   -0.1395   0.3951   60.509
  8   104    0.1802   0.3918   60.003
  9   112    0.4999   0.3521   53.923
 10   120    0.8197   0.285    43.647
 11   128    1.1394   0.2083   31.901
 12   136    1.4591   0.1374   21.043
 13   144    1.7788   0.0818   12.527
 14   152    2.0986   0.044     6.739

Table 3
The calculation table for comparison of theoretical and empirical frequencies of the first typing session
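The theoretical frequencies of Table 2 follow from n*i = (n·h/σ)·φ(ui) with ui = (xi − x̄)/σ. The sketch below evaluates φ directly instead of reading it from a printed table, so its values may differ slightly (in the second or third decimal place) from Table 2; the function names phi and theoretical_frequencies are hypothetical. The parameters n = 479, h = 8, x̄ = 99.49 and σ = 25.022 are those of the first session.

```python
import math


def phi(u: float) -> float:
    """Standard normal density phi(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)


def theoretical_frequencies(x, n, h, mean, sigma):
    """n*_i = (n * h / sigma) * phi(u_i), where u_i = (x_i - mean) / sigma."""
    scale = n * h / sigma
    return [scale * phi((xi - mean) / sigma) for xi in x]


XI = [48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152]
expected = theoretical_frequencies(XI, n=479, h=8, mean=99.49, sigma=25.022)
for i, (xi, e) in enumerate(zip(XI, expected), start=1):
    # Print i, x_i, u_i and n*_i in the same order as the columns of Table 2.
    print(f"{i:2d}  {xi:4d}  u = {(xi - 99.49) / 25.022:7.4f}  n* = {e:7.3f}")
```

Computing φ exactly rather than looking it up avoids the rounding of the tabulated values, at the cost of not matching the printed table digit for digit.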


    We can create one more table, Table 3, with which we find the observed value of Pearson's test

$$\chi^2 = \sum \frac{(n_i - n_i^*)^2}{n_i^*}.$$

    Table 3 should include the following indices: i – the sequence number; ni – the observed frequencies; n*i – the theoretical frequencies; (ni − n*i) – the difference between the observed and theoretical frequencies; (ni − n*i)²/n*i – this difference squared and divided by the theoretical frequency.
    The more Kemp exceeds Kcr, the more convincing the arguments against our main hypothesis [3]. The bound Kcr = χ²(k − r − 1; α) is found from the χ² distribution tables; here k = 14 intervals, r = 2 parameters (x̄ and σ, estimated from the series), and the significance level is α = 0.05:
    Kcr(0.05; 11) = 19.67514; Kemp = 17.99.
    The observed value of Pearson's statistic does not fall into the critical region: Kemp < Kcr => the distribution is normal.

Figure 4: the averaged time/frequency bar chart for the second typing session of a test person

    Let us create Table 4 for the second distribution in the same way as described above.
    For the second typing session, the mode is 96. Half of the total frequency is about 216 (the median position), and it falls at xi = 96. Thus, the median is 96.
    The range of variation is 152 − 56 = 96.
    The mean deviation is

$$d = \frac{\sum |x_i - \bar{x}|\, f_i}{\sum f_i} = \frac{7168.921}{430} = 16.67.$$
    On average, the values of the series deviate from the mean by 16.67 ms.
    Let us calculate the dispersion:

$$D = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i} = \frac{175228.847}{430} = 407.509.$$

    The mean square deviation is

$$\sigma = \sqrt{D} = \sqrt{407.509} = 20.187.$$

    We check the hypothesis that X is normally distributed with Pearson's chi-squared test [3]. To calculate the theoretical frequencies, we use n = 430, h = 8 (the interval width), σ = 20.187 and x̄ = 98.36:

$$n_i^* = \frac{n h}{\sigma}\,\varphi_i = \frac{430 \cdot 8}{20.187}\,\varphi_i = 170.41\,\varphi_i.$$

    Let us calculate the theoretical frequencies (Table 5), using the corresponding values from Laplace's table.

Table 5
The calculation table for theoretical frequencies of the second typing session

  i    xi      ui       φi       n*i
  1    56   -2.0983   0.044     7.498
  2    64   -1.702    0.0925   15.763
  3    72   -1.3057   0.1691   28.816
  4    80   -0.9094   0.2637   44.937
  5    86   -0.6122   0.3292   56.098
  6    88   -0.5131   0.3485   59.387
  7    96   -0.1168   0.3961   67.499
  8   104    0.2795   0.3825   65.181
  9   112    0.6758   0.3166   53.951
 10   120    1.0721   0.2227   37.95
 11   128    1.4684   0.1354   23.073
 12   136    1.8647   0.0694   11.826
 13   144    2.261    0.0303    5.163
 14   152    2.6573   0.0116    1.977

    Let us compare the empirical and theoretical frequencies. We create calculation Table 6 for the second typing session, which includes the values mentioned above. The table gives the observed value of the test:

$$\chi^2 = \sum \frac{(n_i - n_i^*)^2}{n_i^*}.$$

Table 6
The calculation table for comparison of theoretical and empirical frequencies of the second typing session

  i    ni     n*i       n*i - ni   (ni - n*i)²   (ni - n*i)²/n*i
  1     5    7.498      2.498       6.2398         0.832
  2    15   15.7627     0.7627      0.5818         0.0369
  3    35   28.816     -6.184      38.242          1.327
  4    38   44.9366     6.9366     48.1161         1.071
  5    45   56.0983    11.0983    123.1722         2.196
  6    53   59.3872     6.3872     40.796          0.687
  7    56   67.4986    11.4986    132.2176         1.959
  8    51   65.181     14.181     201.102          3.085
  9    41   53.9512    12.9512    167.7325         3.109
 10    35   37.9499     2.9499      8.7016         0.229
 11    30   23.0732    -6.9268     47.98           2.079
 12    15   11.8263    -3.1737     10.0723         0.852
 13     8    5.1634    -2.8366      8.0465         1.558
 14     3    1.9767    -1.0233      1.0471         0.53
  ∑   430   430                                   19.551

    According to the principle described above, Kcr(0.05; 11) = 19.67514 and Kemp = 19.55. Thus, Kemp < Kcr => the distribution is normal.
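Pearson's statistic and the comparison with the critical value can be reproduced from the observed and theoretical frequencies already listed in Tables 1, 2, 5 and 6. In this sketch the critical value 19.67514 is copied from the text (χ² table, α = 0.05, 11 degrees of freedom) rather than computed, and the helper name pearson_chi_square is hypothetical.

```python
def pearson_chi_square(observed, expected):
    """K = sum((n_i - n*_i)^2 / n*_i) over all intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical value from the chi-square table for alpha = 0.05 and
# k - r - 1 = 14 - 2 - 1 = 11 degrees of freedom, as given in the text.
K_CRITICAL = 19.67514

# Observed frequencies n_i and theoretical frequencies n*_i, copied from the tables.
SESSIONS = {
    "first": (   # n_i from Table 1, n*_i from Table 2
        [10, 14, 30, 35, 51, 52, 59, 50, 54, 37, 30, 27, 16, 14],
        [7.32, 13.446, 22.298, 33.371, 45.071, 54.965, 60.509,
         60.003, 53.923, 43.647, 31.901, 21.043, 12.527, 6.739],
    ),
    "second": (  # n_i from Table 6, n*_i from Table 5
        [5, 15, 35, 38, 45, 53, 56, 51, 41, 35, 30, 15, 8, 3],
        [7.498, 15.763, 28.816, 44.937, 56.098, 59.387, 67.499,
         65.181, 53.951, 37.95, 23.073, 11.826, 5.163, 1.977],
    ),
}

for name, (observed, expected) in SESSIONS.items():
    k_emp = pearson_chi_square(observed, expected)
    verdict = "normality not rejected" if k_emp < K_CRITICAL else "normality rejected"
    print(f"{name} session: K_emp = {k_emp:.2f} -> {verdict}")
```

With these inputs the statistic evaluates to about 17.99 for the first session and 19.55 for the second, matching Kemp in the text; both values stay below Kcr, so normality is not rejected at α = 0.05.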
3.2. Comparison of series

    Two sets of samples for one person are shown in Figure 5.

Figure 5: joint graphs for the sets of samples of the first and the second typing sessions

    To make the comparison clearer, we can depict the graphs as bar charts (Figure 6). The red bars denote the averaged chart of the first typing session; the blue bars relate to the second typing session.

Figure 6: the graphs of the sample sets

    To determine how much two typing sessions of the same test person differ from each other, we examine the crossing (intersection) area of the graphs [4]. The first set of samples completely covers the second set. Therefore, we take the second set as the intersection area and the first set of samples as the union area.
    We use the following formula:

$$K_1 = \frac{\sum_{i=1}^{n} h\, l_{\min}}{\sum_{i=1}^{n} h\, l_{\max}},$$

where
    h – the width of the bars;
    lmax = max(li1, li2) – the larger of the two bar heights, taken in pairs from the two graphs;
    lmin = min(li1, li2) – the smaller one, respectively.
    According to this formula, for the first typing session we get Σ h·lmax = 0.010438 + 0.029228 + 0.06263 + 0.073069 + 0.093946 + 0.108559 + 0.11691 + 0.104384 + 0.085595 + 0.073069 + 0.06263 + 0.031315 + 0.016701 + 0.006263 = 2.0459.
    For the second typing session we get Σ h·lmin = 0.020876827 + 0.03131524 + 0.073068894 + 0.079331942 + 0.106471816 + 0.110647182 + 0.123173278 + 0.106471816 + 0.112734864 + 0.077244259 + 0.06263048 + 0.056367432 + 0.033402923 + 0.029227557 = 1.749.
    The hit rate K1 = 1.749/2.0459 = 0.85510 ≈ 86% is the level of coincidence between the two results of the same user.
    We can also check the hit rate between the normal distributions that correspond to these sets [5]. For this we use the de Moivre–Laplace integral formula for the normal distribution:

$$\Phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt,$$

where
    σ – the standard deviation;
    t – the keystroke time in milliseconds;
    m – the expected value.
    Using this function, we can plot the two normal distributions shown in Figure 7.

Figure 7: graph of normal distributions, corresponding to both samples

    Here S1 is the area under the first graph and S2 is the area under the second graph. The hit rate of the theoretical graphs, i.e. their level of coincidence, is

$$K_2 = \frac{S_1 \cap S_2}{S_1 \cup S_2} = \frac{53.082}{59.075} = 0.8986 \approx 90\%.$$

    Even taking into account the high error level, we have 86% coincidence for the empirical values and 90% coincidence for the theoretical values. Therefore, we can conclude that each person has individual peculiarities in the duration of key presses that he or she follows while typing texts.
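The two hit rates of Section 3.2 can be sketched as follows. histogram_hit_rate implements K1 as the ratio of the summed smaller bar heights to the summed larger ones (the common bar width h cancels). normal_hit_rate is an assumption about how K2 might be evaluated: it integrates the minimum and the maximum of two unit-area normal densities numerically with the midpoint rule. The text does not say how S1 and S2 were obtained (the reported areas 53.082 and 59.075 are not unit areas), so this sketch is not guaranteed to reproduce the 90% value; all function names are hypothetical.

```python
import math


def histogram_hit_rate(heights_a, heights_b):
    """K1 = sum(h * l_min) / sum(h * l_max); the common bar width h cancels out."""
    pairs = list(zip(heights_a, heights_b))
    return sum(min(a, b) for a, b in pairs) / sum(max(a, b) for a, b in pairs)


def normal_pdf(t, m, sigma):
    """Density of the normal distribution N(m, sigma^2) at t."""
    return math.exp(-((t - m) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_hit_rate(m1, s1, m2, s2, steps=20_000):
    """K2 = area(S1 ∩ S2) / area(S1 ∪ S2), estimated with the midpoint rule."""
    lo = min(m1 - 8 * s1, m2 - 8 * s2)  # integrate over +/- 8 sigma around both means
    hi = max(m1 + 8 * s1, m2 + 8 * s2)
    dt = (hi - lo) / steps
    intersection = union = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        f1, f2 = normal_pdf(t, m1, s1), normal_pdf(t, m2, s2)
        intersection += min(f1, f2) * dt
        union += max(f1, f2) * dt
    return intersection / union


# Parameters of the two sessions as estimated in Section 3.1 (mean and sigma, ms).
k2 = normal_hit_rate(99.49, 25.022, 98.36, 20.187)
print(f"K2 = {k2:.3f}")
```

Both measures equal 1 for identical inputs and decrease toward 0 as the two sessions diverge, which is the property the comparison relies on.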
4. Scaling by multiple series

    Figure 8 shows the range of the expected value of the hold time of different keys [4] in the sessions of the same user recorded on different days (the days are marked in different colours).

Figure 8: the graph for the cases of normal distribution

    The average hold time of the keys is shown on the ordinate axis in the bottom right corner.
    Figure 9 illustrates the same characteristics for the typing sessions of different users.

Figure 9: the graph for the distribution of typing sessions of different users

    The comparative analysis of the results leads to the conclusion that the hold time of different keys is a very informative value that characterizes a user's typing technique [6]. Despite the partly random scatter of the averaged hold times, a statistical analysis of the differences makes it possible to identify different variants of keyboard typing of the same user and to distinguish the typing of different users [7].
    The results of the experiment show that, in most cases, key hold times form random samples that are normally distributed.

5. Summary

    This identification method can be used during user authorization with samples of various sizes. The K value of each user can differ slightly between typing sessions. How close the K value is to 1 depends on how well developed the user's keyboard handwriting is. If a user has weak typing skills, the critical value of K for his authorization can be determined from a comparative analysis of several of his typing sessions [8]. The analysis of such users' typing sessions can be made more accurate by ignoring the keys whose hold times have a high standard deviation (for example, far higher than the standard deviation of the whole typing session).

References

[1] Aragón-Mendizábal E., Delgado-Casas C., Romero-Oliva M. F., A comparative study of handwriting and computer typing in note-taking by university students, Comunicar (2016). doi: 10.3916/C48-2016-10
[2] Summary and classification of statistics, 2018, URL: http://www.grandars.ru/student/statistika/gruppirovka-statisticheskih-dannyh.html
[3] Shulenin V. P., Mathematical statistics, NTL Publishing House, Tomsk, 2012.
[4] Kryzhevich L. S., Rakov A. S., Kostenko I. V., Arkhipova V. V., Lukin D. E., Testing statistical hypotheses about the time parameters of keytyping, "Problems of cybersecurity, modeling and information processing in modern sociotechnical systems", KSU, Kursk, 2017.
[5] Gmurman V. E., Probability theory and mathematical statistics, 9th edition, Vysshaya shkola, Moscow, 2003.
[6] Fedorowich L. M., Côté J. N., Effects of standing on typing task performance and upper limb discomfort, vascular and muscular indicators, Applied Ergonomics (2018). doi: 10.1016/j.apergo.2018.05.009
[7] Kryzhevich L. S., Matyushina S. N., Kostenko I. V., Providing access to electronic equipment based on computer handwriting recognition, "Current research in the field of exact sciences and their study in secondary and higher educational institutions", KSU, Kursk, 2015.
[8] Yoo W. G., Effects of different computer typing speeds on acceleration and peak contact pressure of the fingertips during computer typing, Journal of Physical Therapy Science (2015). doi: 10.1589/jpts.27.57