<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Method of user authentication on the basis of recognition of computer handwriting peculiarities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leonid S. Kryzhevich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kursk State University</institution>
          ,
          <addr-line>33 Radisheva str., Kursk, 305000, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <fpage>52</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>This article deals with the following hypothesis: each person has unique peculiarities of text typing. The process of typing can be expressed in the form of various metrics and analyzed with the help of statistical methods.</p>
      </abstract>
      <kwd-group>
        <kwd>normal distribution</kwd>
        <kwd>de Moivre-Laplace integral theorem</kwd>
        <kwd>Pearson's nonparametric test χ<sup>2</sup></kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Nowadays people keep almost all kinds of data in digital form, in databases or in cloud storage services that can be accessed online: important documents, contracts, banking data, passwords. If such data are stolen, people can lose personal or business information, and their bank accounts can be emptied. Consequently, the number of attackers trying to steal such information keeps growing.</p>
      <p>There are different ways to protect information, but they constantly become outdated. To detect an intruder, the system has to establish whether the person at the keyboard really holds the access rights being used. This has led to the idea of authenticating users by their digital (keyboard) handwriting.</p>
      <p>Each person has unique typing peculiarities. People type at a characteristic speed, and the duration of individual keystrokes varies from person to person as well. We decided to measure these characteristics and analyze them.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Conditions of the experiment</title>
      <p>
        An experiment was carried out to obtain test data. About one hundred students of the faculty of mathematics, physics and information science of Kursk State University participated in the experiment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Their task was to type a text of at least four sentences. Meanwhile, a special program recorded the following characteristics for each key event: the time of the event, counted from the moment the program was started (in milliseconds); the ASCII code of the key; and whether the key was pressed (1) or released (0).
      </p>
      <p>Figure 1 shows the data file that holds the statistical data for further analysis.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data analysis</title>
      <p>Let us analyze the first recorded feature, the keystroke time. If we take the consecutive pairs of records for the same symbol from the test sample (the press and the release) and subtract the press time from the release time, we obtain the duration of the press for each symbol. Let us plot these durations for all the symbols in a two-dimensional chart. The horizontal axis shows the duration of a keystroke in milliseconds, and the vertical axis shows the frequency of that duration, that is, the ratio of the number of keystrokes of the given duration to the total number of keystrokes. With the data sorted by duration, the chart looks as shown in Figure 2.</p>
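      <p>The pairing of press and release records described above can be sketched as follows. This is a minimal illustration, not the program used in the experiment: the tuple layout (time in milliseconds, ASCII code, state), the function names and the sample log are assumptions made for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def hold_durations(events):
    """Pair each key press with its release and return hold times in ms.

    `events` is an iterable of (time_ms, key_code, state) tuples, where
    state is 1 for a press and 0 for a release (assumed log layout).
    """
    pressed_at = {}  # key_code -> time of the press still waiting for a release
    durations = []
    for time_ms, key_code, state in events:
        if state == 1:
            # Keep the first press; ignore auto-repeat presses of a held key.
            pressed_at.setdefault(key_code, time_ms)
        elif key_code in pressed_at:
            durations.append(time_ms - pressed_at.pop(key_code))
    return durations


def relative_frequencies(durations):
    """Map each distinct duration to its share of all keystrokes."""
    total = len(durations)
    counts = Counter(durations)
    return {d: c / total for d, c in sorted(counts.items())}


# Made-up sample log: (ms since program start, ASCII code, press=1/release=0)
sample_log = [
    (1000, 72, 1), (1096, 72, 0),    # 'H' held for 96 ms
    (1150, 105, 1), (1230, 105, 0),  # 'i' held for 80 ms
    (1300, 33, 1), (1396, 33, 0),    # '!' held for 96 ms
]
durations = hold_durations(sample_log)
print(durations)
print(relative_frequencies(durations))
```
]]></preformat>
      <p>Keying the pending presses by key code lets the sketch cope with overlapping keystrokes, where the next key goes down before the previous one is released.</p>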
      <p>Let us hypothesize that this distribution is normal. To check this, we should analyze the data with Pearson's nonparametric χ<sup>2</sup> test.</p>
      <p>
        Let us divide our series into fourteen disjoint intervals and, for each interval, count the number of test values that fall into it. Each interval must contain at least five observations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. If this rule is satisfied, we can average the values within each interval (arithmetic mean) and build a new chart (Figure 3).
      </p>
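      <p>The grouping step might look like the sketch below. The interval boundaries are an assumption: the intervals are taken to be 8 ms wide and centred on 48, 56, 64 and so on, so the first one starts at 44 ms. The sample durations are invented for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def bin_durations(durations, start, width, count):
    """Count durations per interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * count
    for d in durations:
        i = int((d - start) // width)
        if 0 <= i < count:  # values outside the covered range are skipped
            counts[i] += 1
    return counts


def sparse_intervals(counts, minimum=5):
    """Indices of intervals holding fewer than `minimum` observations."""
    return [i for i, c in enumerate(counts) if c < minimum]


# Invented durations in ms; three intervals centred on 48, 56 and 64 ms.
sample = [45, 50, 51, 52, 59, 60, 61, 62, 63, 64, 100]
counts = bin_durations(sample, start=44, width=8, count=3)
print(counts)
print(sparse_intervals(counts))
```
]]></preformat>
      <p>Intervals reported by the second function would have to be merged with their neighbours, or more data collected, before the five-observation rule holds.</p>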
      <p>Let us find the mode of this distribution. The mode is the most frequent value among the examined values. In our case, the mode is x<sub>i</sub> = 96 (its frequency is 59).</p>
      <p>The median is also x<sub>i</sub> = 96, because it is the first value at which the cumulative frequency exceeds 479/2 ≈ 240.</p>
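      <p>The same two rules can be written as a short sketch. Because the full frequency table of the first session is not listed in the text, the example uses the observed frequencies given later in this section for the second typing session (n = 430); the function names are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import accumulate

x = list(range(48, 153, 8))  # interval values 48, 56, ..., 152
n = [5, 15, 35, 38, 45, 53, 56, 51, 41, 35, 30, 15, 8, 3]  # second session


def grouped_mode(values, freqs):
    """Value with the highest frequency."""
    return values[freqs.index(max(freqs))]


def grouped_median(values, freqs):
    """First value whose cumulative frequency exceeds half of the total."""
    half = sum(freqs) / 2
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative > half:
            return value
    return None


print(grouped_mode(x, n), grouped_median(x, n))  # prints: 96 96
```
]]></preformat>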
      <p>In symmetrical distribution series the mode and the median coincide with the average value (x<sub>av</sub> = Me = Mo), and in moderately asymmetrical series they are related approximately as 3(x<sub>av</sub> − Me) ≈ x<sub>av</sub> − Mo.</p>
      <p>The range of variation, which is the difference between the maximum and minimum values of x, is R = 152 − 48 = 104.</p>
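      <p>We can calculate the mean deviation:
        <disp-formula><tex-math><![CDATA[d = \frac{\sum |x_i - \bar{x}| \, f_i}{\sum f_i} = \frac{9896.28}{479} = 20.66.]]></tex-math></disp-formula>
      </p>
      <p>Let us calculate the dispersion D:
        <disp-formula><tex-math><![CDATA[D = \frac{\sum (x_i - \bar{x})^2 \, f_i}{\sum f_i} = \frac{299891.708}{479} = 626.079.]]></tex-math></disp-formula>
      </p>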
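      <p>As an illustration of these two formulas, the following sketch computes the weighted mean, the mean deviation, the dispersion and the standard deviation of a grouped series. The three-value series is made up to keep the arithmetic easy to follow; the last line only checks that the square root of the dispersion quoted above agrees with σ = 25.022.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import sqrt


def grouped_stats(values, freqs):
    """Return (mean, mean deviation, dispersion, sigma) of a grouped series."""
    total = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / total
    mean_dev = sum(abs(x - mean) * f for x, f in zip(values, freqs)) / total
    dispersion = sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / total
    return mean, mean_dev, dispersion, sqrt(dispersion)


# Made-up grouped series: value 88 once, 96 twice, 104 once.
mean, mean_dev, dispersion, sigma = grouped_stats([88, 96, 104], [1, 2, 1])
print(mean, mean_dev, dispersion, round(sigma, 3))

# Consistency check against the figures quoted in the text.
print(round(sqrt(626.079), 3))
```
]]></preformat>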
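      <p>We can calculate the theoretical frequencies in Table 2:
        <disp-formula><tex-math><![CDATA[n_i^{*} = \frac{n h}{\sigma} \, \varphi_i .]]></tex-math></disp-formula>
        The following values are used in the formula: n = 479, h = 8 (the interval width), σ = 25.022, x<sub>av</sub> = 99.49, and φ<sub>i</sub>, the corresponding values from Laplace's table for u<sub>i</sub> = (x<sub>i</sub> − x<sub>av</sub>)/σ.</p>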
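      <p>A sketch of this calculation is given below. It assumes that the tabulated value φ<sub>i</sub> is the standard normal density evaluated at u<sub>i</sub> = (x<sub>i</sub> − x<sub>av</sub>)/σ, which is computed here directly instead of being read from a table; the function name is illustrative. The parameters are those quoted for the first session.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import exp, pi, sqrt


def theoretical_frequencies(values, n, h, mean, sigma):
    """Return rows (x_i, u_i, n_i*) with n_i* = n*h/sigma * phi(u_i)."""
    rows = []
    for x in values:
        u = (x - mean) / sigma
        phi = exp(-u * u / 2) / sqrt(2 * pi)  # standard normal density
        rows.append((x, u, n * h / sigma * phi))
    return rows


values = list(range(48, 153, 8))  # 48, 56, ..., 152
rows = theoretical_frequencies(values, n=479, h=8, mean=99.49, sigma=25.022)
for x, u, n_theor in rows:
    print(f"{x:4d} {u:8.4f} {n_theor:7.2f}")
```
]]></preformat>
      <p>The second column can be compared with the u<sub>i</sub> values listed in this section, for example u<sub>1</sub> ≈ −2.0578 for x<sub>1</sub> = 48.</p>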
      <p>Now we should compare the empirical and
theoretical frequencies.</p>
      <p>[Table columns: x<sub>i</sub>; |x − x<sub>av</sub>|·p<sub>i</sub>; (x − x<sub>av</sub>)<sup>2</sup>·p<sub>i</sub>]</p>
      <p>We can create one more table, Table 3, with which we find the observed value of Pearson's test:
        <disp-formula><tex-math><![CDATA[\chi^2 = \sum_i \frac{(n_i - n_i^{*})^2}{n_i^{*}} .]]></tex-math></disp-formula>
      </p>
      <p>Table 3 should include the following columns: i, the sequence number; n<sub>i</sub>, the observed frequencies; n<sub>i</sub><sup>*</sup>, the theoretical frequencies; (n<sub>i</sub> − n<sub>i</sub><sup>*</sup>), the difference between the observed and theoretical frequencies; and (n<sub>i</sub> − n<sub>i</sub><sup>*</sup>)<sup>2</sup>/n<sub>i</sub><sup>*</sup>, the squared difference divided by the corresponding theoretical frequency.</p>
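      <p>The statistic accumulated in the last column of such a table can be computed as in the sketch below. The frequencies are invented for the example and are not the values of Table 3.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pearson_chi2(observed, theoretical):
    """Sum of (n_i - n_i*)^2 / n_i* over all intervals."""
    if len(observed) != len(theoretical):
        raise ValueError("both series must have the same number of intervals")
    return sum((n - n_t) ** 2 / n_t for n, n_t in zip(observed, theoretical))


# Invented frequencies for four intervals.
observed = [18, 30, 32, 20]
theoretical = [20.0, 30.0, 30.0, 20.0]
chi2_value = pearson_chi2(observed, theoretical)
print(round(chi2_value, 4))
```
]]></preformat>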
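      <p>Next we should calculate the following values: K<sub>emp</sub>, the observed value of the test statistic, and K<sub>cr</sub>, the theoretical bound of the critical region.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>i</th><th>x<sub>i</sub></th><th>u<sub>i</sub></th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>48</td><td>-2.0578</td></tr>
            <tr><td>2</td><td>56</td><td>-1.7381</td></tr>
            <tr><td>3</td><td>64</td><td>-1.4184</td></tr>
            <tr><td>4</td><td>72</td><td>-1.0987</td></tr>
            <tr><td>5</td><td>80</td><td>-0.779</td></tr>
            <tr><td>6</td><td>88</td><td>-0.4592</td></tr>
            <tr><td>7</td><td>96</td><td>-0.1395</td></tr>
            <tr><td>8</td><td>104</td><td></td></tr>
            <tr><td>9</td><td>112</td><td></td></tr>
            <tr><td>10</td><td>120</td><td></td></tr>
            <tr><td>11</td><td>128</td><td></td></tr>
            <tr><td>12</td><td>136</td><td></td></tr>
            <tr><td>13</td><td>144</td><td></td></tr>
            <tr><td>14</td><td>152</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>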
      <p>
        The more K<sub>emp</sub> exceeds K<sub>cr</sub>, the stronger the evidence against our main hypothesis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The bound K<sub>cr</sub> = χ<sup>2</sup>(k − r − 1; α) is found from the χ<sup>2</sup> distribution tables, where k = 14 is the number of intervals, r = 2 is the number of parameters estimated from the series (x<sub>av</sub> and σ), and the significance level is α = 0.05.</p>
      <p>K<sub>cr</sub>(0.05; 11) = 19.67514; K<sub>emp</sub> = 17.99.</p>
      <p>The observed value of Pearson's statistic does not fall into the critical region (K<sub>emp</sub> &lt; K<sub>cr</sub>). It is therefore fair to say that the data in the series are consistent with the normal distribution.</p>
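      <p>The decision rule can be summarised in a few lines. The sketch takes the critical value from the text instead of computing it; the function names are illustrative, and the second observed value (19.55) is the one reported further on for the second typing session.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def degrees_of_freedom(k, r):
    """k intervals, r distribution parameters estimated from the sample."""
    return k - r - 1


def consistent_with_hypothesis(k_emp, k_cr):
    """True when the observed statistic stays outside the critical region."""
    return k_emp < k_cr


K_CR = 19.67514  # table value quoted in the text for alpha = 0.05
df = degrees_of_freedom(k=14, r=2)
results = {
    "first session": consistent_with_hypothesis(17.99, K_CR),
    "second session": consistent_with_hypothesis(19.55, K_CR),
}
print(df, results)
```
]]></preformat>
      <p>If SciPy is available, scipy.stats.chi2.ppf(0.95, 11) should reproduce the table value, so the constant would not have to be typed in by hand.</p>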
      <p>In the same way, we can use Pearson's test to check the second data series (Figure 4), obtained from the same person for different text extracts.</p>
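      <table-wrap>
        <caption><p>Observed frequencies n<sub>i</sub> of the second data series</p></caption>
        <table>
          <thead>
            <tr><th>i</th><th>n<sub>i</sub></th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>5</td></tr>
            <tr><td>2</td><td>15</td></tr>
            <tr><td>3</td><td>35</td></tr>
            <tr><td>4</td><td>38</td></tr>
            <tr><td>5</td><td>45</td></tr>
            <tr><td>6</td><td>53</td></tr>
            <tr><td>7</td><td>56</td></tr>
            <tr><td>8</td><td>51</td></tr>
            <tr><td>9</td><td>41</td></tr>
            <tr><td>10</td><td>35</td></tr>
            <tr><td>11</td><td>30</td></tr>
            <tr><td>12</td><td>15</td></tr>
            <tr><td>13</td><td>8</td></tr>
            <tr><td>14</td><td>3</td></tr>
            <tr><td>∑</td><td>430</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Let us calculate the theoretical frequencies (Table 5), using the corresponding values from Laplace's table:
        <disp-formula><tex-math><![CDATA[n_i^{*} = \frac{n h}{\sigma} \, \varphi_i = \frac{430 \cdot 8}{20.187} \, \varphi_i = 170.41 \, \varphi_i .]]></tex-math></disp-formula>
      </p>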
      <p>Let us compare the empirical and theoretical frequencies. For the second typing session we can create a calculation table, Table 6, that includes the values mentioned above. The table helps us to determine the observed value of the test:
        <disp-formula><tex-math><![CDATA[\chi^2 = \sum_i \frac{(n_i - n_i^{*})^2}{n_i^{*}} .]]></tex-math></disp-formula>
      </p>
      <p>By the same procedure as above, we obtain K<sub>cr</sub>(0.05; 11) = 19.67514 and K<sub>emp</sub> = 19.55. Thus K<sub>emp</sub> &lt; K<sub>cr</sub>, and the distribution can be considered normal.</p>
    </sec>
    <sec id="sec-4">
      <title>3.2. Comparison of series</title>
      <p>The two sets of samples for one person are shown in Figure 5.</p>
      <p>
        For clarity, we can redraw the graphs as bar charts (Figure 6). The red bars show the averaged chart of the first typing session, and the blue bars show the second typing session.
      </p>
      <p>
        To determine how much the typing style of one test person differs between his or her own sessions, we should examine the intersection area of the graphs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The first set of samples covers the second set completely. Therefore, we take the second set as the intersection area and the first set as the union area.
      </p>
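      <p>We should use the following formula:
        <disp-formula><tex-math><![CDATA[K = \frac{\sum_{i=1}^{n} h \, l_{\min}}{\sum_{i=1}^{n} h \, l_{\max}},]]></tex-math></disp-formula>
        where h is the width of the bars; l<sub>max</sub> = max(l<sub>i1</sub>, l<sub>i2</sub>) is the larger of the two bar heights in the i-th pair of bars taken from the two graphs; and l<sub>min</sub> = min(l<sub>i1</sub>, l<sub>i2</sub>) is the smaller one, respectively.</p>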
      <p>According to this formula, with the bar width h = 2 (the value implied by the totals), for the first typing session we obtain ∑ h·l<sub>max</sub> = 2 · (0.020876827 + 0.03131524 + 0.073068894 + 0.079331942 + 0.106471816 + 0.110647182 + 0.123173278 + 0.106471816 + 0.112734864 + 0.077244259 + 0.06263048 + 0.056367432 + 0.033402923 + 0.029227557) = 2.0459.</p>
      <p>For the second typing session we obtain ∑ h·l<sub>min</sub> = 2 · (0.010438 + 0.029228 + 0.06263 + 0.073069 + 0.093946 + 0.108559 + 0.11691 + 0.104384 + 0.085595 + 0.073069 + 0.06263 + 0.031315 + 0.016701 + 0.006263) = 1.749.</p>
      <p>The hit rate K<sub>1</sub> = 1.749/2.0459 = 0.85510 ≈ 86% is the level of coincidence between the two results of the same user.</p>
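      <p>The hit rate can be reproduced from the two lists of bar heights quoted above with the following sketch. The function name is illustrative, and the bar width is left as a parameter because it cancels in the ratio.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def hit_rate(heights_a, heights_b, width=1.0):
    """Ratio of the intersection area to the union area of two bar charts."""
    pairs = list(zip(heights_a, heights_b))
    intersection = sum(width * min(a, b) for a, b in pairs)
    union = sum(width * max(a, b) for a, b in pairs)
    return intersection / union


# Bar heights of the two typing sessions as quoted in the text.
larger = [0.020876827, 0.03131524, 0.073068894, 0.079331942, 0.106471816,
          0.110647182, 0.123173278, 0.106471816, 0.112734864, 0.077244259,
          0.06263048, 0.056367432, 0.033402923, 0.029227557]
smaller = [0.010438, 0.029228, 0.06263, 0.073069, 0.093946, 0.108559,
           0.11691, 0.104384, 0.085595, 0.073069, 0.06263, 0.031315,
           0.016701, 0.006263]

k1 = hit_rate(larger, smaller)
print(round(k1, 4))
```
]]></preformat>
      <p>Taking the minimum and maximum bar by bar means the function also works when neither chart covers the other completely.</p>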
      <p>
        We can also check the hit rate between the normal distributions that correspond to these two sets [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We should use the de Moivre–Laplace integral formula for the normal distribution.
      </p>
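      <p>
        <disp-formula><tex-math><![CDATA[\Phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{+\infty} e^{-\frac{(t-m)^2}{2\sigma^2}} \, dt,]]></tex-math></disp-formula>
        where σ is the standard deviation; t is the duration of a keystroke in milliseconds; m is the expected value.</p>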
      <p>Using this function, we can plot the two normal distribution curves, which are shown in Figure 7. The ratio
        <disp-formula><tex-math><![CDATA[K_2 = \frac{S_1 \cap S_2}{S_1 \cup S_2} = \frac{53.082}{59.075} \approx 0.90 = 90\%]]></tex-math></disp-formula>
        is the level of coincidence.</p>
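      <p>A comparable ratio for two normal curves can be estimated numerically, as in the sketch below. The distribution parameters, the integration range of 0 to 300 ms and the function name are assumptions chosen for illustration, so the result is not expected to reproduce the areas quoted above. The sketch relies on NormalDist from Python's standard statistics module.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import NormalDist


def normal_hit_rate(dist_a, dist_b, lo=0.0, hi=300.0, steps=6000):
    """Intersection area over union area of two normal density curves.

    Both areas are integrated over [lo, hi] with the midpoint rule.
    """
    dt = (hi - lo) / steps
    intersection = 0.0
    union = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dt
        fa, fb = dist_a.pdf(t), dist_b.pdf(t)
        intersection += min(fa, fb) * dt
        union += max(fa, fb) * dt
    return intersection / union


# Hypothetical parameters (expected value and standard deviation in ms).
first = NormalDist(mu=100.0, sigma=25.0)
second = NormalDist(mu=96.0, sigma=20.0)
k2 = normal_hit_rate(first, second)
print(round(k2, 3))
```
]]></preformat>
      <p>Since each density integrates to one, the same ratio can be cross-checked as ovl / (2 − ovl), where ovl = first.overlap(second) is the overlapping coefficient provided by NormalDist.</p>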
      <p>Even allowing for the high error level, we obtain 86% coincidence for the empirical values and 90% coincidence for the theoretical values. Therefore, we can conclude that each person has individual peculiarities in how long he or she holds the keys down while typing.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Scaling by multiple series</title>
      <p>
        Figure 8 shows the range of the expected value of the time for which different keys are held down [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in the sessions of the same user on different days (the days are marked in different colours).
      </p>
      <p>In the bottom right corner, on the ordinate axis, we can see the average time for which the keys are held down.</p>
      <p>Figure 9 shows the same characteristics for the typing sessions of different users.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Summary</title>
      <p>
        The comparative analysis of the results allows us to conclude that the time for which different keys are held down is a very informative value that characterizes a user's typing technique [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Despite the partly random scatter of the averaged key hold times, statistical analysis of the differences makes it possible to identify different typing sessions of the same user and to distinguish the typing of different users [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>This method of identification during user authorization can be used with samples of various sizes. The K value of each user can differ slightly between typing sessions. How close the K value is to 1 depends on how well developed the user's keyboard handwriting is. If a user has weak typing skills, the critical value of K for his or her authorization can be determined from a comparative analysis of several of his or her typing sessions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The further analysis of the typing sessions of such users can be made more accurate if we exclude the keys whose hold times have a high standard deviation (for example, far higher than the standard deviation of the whole typing session).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Aragón-Mendizábal</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delgado-Casas</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romero-Oliva</surname>
            <given-names>M. F.</given-names>
          </string-name>
          ,
          <article-title>A comparative study of handwriting and computer typing in note-taking by university students</article-title>
          ,
          <source>Comunicar</source>
          (
          <year>2016</year>
          ). doi:
          <pub-id pub-id-type="doi">10.3916/C48-2016-10</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Summary and classification of statistics</article-title>
          ,
          <year>2018</year>
          , URL: http://www.grandars.ru/student/statistika/gruppirovkastatisticheskih-dannyh.html
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Shulenin</surname>
            <given-names>V.P.</given-names>
          </string-name>
          ,
          <source>Mathematical statistics</source>
          , NTL Publishing House, Tomsk,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Kryzhevich</surname>
            <given-names>L.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rakov</surname>
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostenko</surname>
            <given-names>I.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arkhipova</surname>
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lukin</surname>
            <given-names>D.E.</given-names>
          </string-name>
          ,
          <article-title>Testing statistical hypotheses about the time parameters of keytyping</article-title>
          ,
          <source>"Problems of cybersecurity, modeling and information processing in modern sociotechnical systems"</source>
          , KSU, Kursk,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Gmurman</surname>
            <given-names>V.E.</given-names>
          </string-name>
          ,
          <source>Probability theory and mathematical statistics, 9th edition</source>
          , Vysshaya shkola, Moscow,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Fedorowich</surname>
            <given-names>L. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Côté</surname>
            <given-names>J. N.</given-names>
          </string-name>
          ,
          <article-title>Effects of standing on typing task performance and upper limb discomfort, vascular and muscular indicators</article-title>
          ,
          <source>Applied Ergonomics</source>
          (
          <year>2018</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1016/j.apergo.2018.05.009</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Kryzhevich</surname>
            <given-names>L. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matyushina</surname>
            <given-names>S. N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kostenko</surname>
            <given-names>I. V.</given-names>
          </string-name>
          ,
          <article-title>Providing access to electronic equipment based on computer handwriting recognition</article-title>
          ,
          <source>"Current research in the field of exact sciences and their study in secondary and higher educational institutions"</source>
          , KSU, Kursk,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Yoo</surname>
            <given-names>W. G.</given-names>
          </string-name>
          ,
          <article-title>Effects of different computer typing speeds on acceleration and peak contact pressure of the fingertips during computer typing</article-title>
          ,
          <source>Journal of Physical Therapy Science</source>
          (
          <year>2015</year>
          ). doi:
          <pub-id pub-id-type="doi">10.1589/jpts.27.57</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>