284


      Assessing Password Protection Effectiveness Using
                     Markov Processes*

         Oleg V. Boychenko1[0000-0003-3326-1015], Ilya V. Gavrikov1[0000-0002-7047-9059]
               1 V. I. Vernadsky Crimean Federal University, Simferopol, Russia

                                     bolek61@mail.ru


        Abstract. Passwords are an integral part of modern digital life, but their effec-
        tiveness has been repeatedly challenged. This paper explores the role of password
        authentication in modern information systems, the changes it has experienced
        over time, as well as new alternative approaches to authentication and their inter-
        actions with classical password-based systems. Technical and organizational rec-
        ommendations are formulated based on security research and modern trends in
        information technology, and an application of Markov processes to assessing
        password quality is presented, as a quantitative measure of password system se-
        curity and effectiveness.


        Keywords: Security, Authentication, Passwords, Markov Processes.


1       Introduction

Passwords have been a mainstay of information system security ever since the early
days of computer use, as a natural security measure that is both easy to understand and
to implement. Any system that requires shared use conceivably requires a mechanism
to delineate responsibilities, limit access to resources and compartmentalize data be-
tween users. The first-ever computer system employing password-based authentication
is considered to be the MIT Compatible Time-Sharing System (CTSS), which began
service in 1963 [1].
    However, advances in computing power and the increasing complexity of modern
information systems have caused users, developers, and IT vendors to re-evaluate pass-
word authentication and password-based security as a whole. As passwords are simple
to implement a measure that provides a sufficient level of security, they became a stand-
ard way of authenticating users in computer systems with a high level of usability for
both the users and the implementing party. While past computer systems often only
required authentication locally, within the scope of a single machine, the advent of the
Internet has necessitated the use of passwords to authenticate thousands, even millions
of different users.


*   Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License
    Attribution 4.0 International (CC BY 4.0).
                                                                                       285


   However, with the wide availability of networked services come risks and dangers:
unlike before, modern attackers need not always have physical access to a user’s ma-
chine or even infect it with computer viruses or other malicious programs. The recent
shift to cloud-based services and data storage means attackers can gain access to sensi-
tive data as long as they can gain access to a user’s account with a cloud service. Be-
cause passwords remain the first (and sometimes only) line of defense in modern au-
thentication mechanisms, attackers can exploit weaknesses in IT systems and the hu-
man factor to gain illicit access to passwords and therefore sensitive data.


2      Technical Approaches to Password Authentication

As with many IT systems, a password authentication system must strike a balance be-
tween usability and security. A system that is exceedingly simple to use is likely not to
be very secure; conversely, an extremely secure system is likely not to be very user and
implementer-friendly.
   For example, the first password systems, like the CTSS, stored passwords in
plaintext; this was also the vector of the first “attack”, which could be described in
modern terms as a social engineering attack taking advantage of weak data security [1].
   With the development of cryptography, it became possible to encrypt passwords and
store them as ciphertext; that is, text that is unreadable to humans, but that could be
deciphered or otherwise used by the computer. In 1974, Evans, Kantrowitz, and Weiss
proposed a password authentication security measure in [2] that remains the de-facto
standard approach: instead of storing the passwords plainly, they should be transformed
using a one-way function, and the result of that transformation should be used for com-
parison purposes. Today, hash functions are used as one-way functions to this end –
hash functions produce a fixed-length cryptographic digest of a message of arbitrary
size. Some of the more well-known hash functions are MD5 (now considered outdated
and insecure), the SHA family (SHA-1, SHA-2, etc.), crypt, and others.
   While the main attack against hash-supported password authentication is the brute
force attack, iterating over probable values, hash functions are still not perfect or im-
mune from other forms of attack. One danger associated with hashing functions is the
possibility of collisions, which are instances of two different input messages producing
the same digest. Notably, MD5, once a widely-used hash function, was proven insecure
and susceptible to collisions along with a set of other hash functions in [3]. Weaknesses
in the SHA-1 algorithm were first discovered by Stevens et al., described in [4]; this
was later expanded upon by the same authors in [5] and by other researchers. Collisions
usually arise due to imperfections in the algorithm itself, and these attacks are generally
remediated by switching to a different, more secure hashing algorithm, as was done for
MD5 (to SHA-1) and is now being done for SHA-1 (to the SHA-2 and SHA-3 families
of algorithms).
   Another avenue of attack against hashed password authentication is the use of vari-
ous dictionaries, tables, and other means of hash digest lookup. Such attacks trade-off
storage space for computation time, storing pre-computed digests for commonly used
passwords to drastically reduce the time required to find the necessary input data.
286


Oechslin in [6] first proposed the concept of a so-called “rainbow table”, which uses
special chains between source data and hash digests to optimize lookup times. Lookup
attacks are rendered ineffective by the use of “salting”, a technique that uses an arbitrary
value concatenated with the input data to produce a new hash digest, different from the
one that would be generated from the raw input data alone. The salt must be stored with
the hash digest for the system to be able to reproduce the results later; however, as the
value is chosen is arbitrary, and different for each password, generating a lookup table
would be equivalent to a brute force attack in terms of time and resource requirements.


3      Passwords and the Human Factor

The previous section shows that, provided certain simple recommendations are applied,
the only feasible technical attack against hash-based password authentication is a brute
force one. However, the human factor is often the leading cause of breaches and a prime
vector of attack. The main avenues of human-targeted attacks are being deceived into
revealing their passwords, being compelled by force, and relying on personal associa-
tions and knowledge to create passwords, as well as relying on external information
storage to remember them.
   The problem of password creation and memorization is especially relevant to the
field of information security, as security guidelines that cannot be followed due to hu-
man psychology are largely pointless and unenforceable. Therefore, they must be for-
mulated concerning the average user’s capabilities and limitations.
   Uniformly random passwords are the hardest to guess, but also the hardest to re-
member, especially when there are many to remember. A 2007 study by Florencio and
Herley [8] showed that the average user accessed 25 accounts. However, even though
password policies dictate that the password for each service should be unique, the av-
erage user only had 6.5 unique passwords, indicating a relatively high degree of pass-
word reuse. It is therefore likely that the number of accounts for a user has grown in the
years since, and likely faster than the number of unique passwords.


4      Practical Applications

In a password security context, quantifying its efficiency means quantifying the quality
of a user’s password. Various algorithms attempt to assess the quality of a password.
Many are based on the concept of information entropy, which quantifies how “surpris-
ing”, or improbable, a random variable – here, the string of bytes – is on the whole.
   According to the definition of Shannon information entropy, if pi is the probability
of symbol number i appearing in the stream of symbols of length n > 0, the entropy for
that message for an information unit of base b would be:
                                                                                       287


   Assuming a perfectly random password is used – i.e. the password is a string of L
uniformly random symbols selected from a symbol set of length N – and that bits are
used as the information unit, (1) is simplified:


   This corollary is equivalent to the Hartley function introduced by Ralph Hartley in
1928. It follows, therefore, that for a truly random password it is preferable to draw
from as broad a set of symbols as possible from the start, adjusting the level of security
by adjusting the length of the password. However, symbols should be different enough
to avoid being present in a common subset, i.e. if the symbol set being used is “alpha-
numeric characters and punctuation marks”, care should be taken that the uniformly
random resulting password does not consist e.g. solely of letters, as that effectively
reduces the character set length from over 70 to just 52.
   However, (1), and by an extension (2), are unsuitable for assessing entropy of many
real-world passwords, as they are generated by people and thus contain patterns, based
on language or otherwise. The authors propose Markov processes as a framework for
calculating password quality because they are well-suited for quantifying how predict-
able a password is from the point of view of common patterns.
   Let X = {X0, X1, ..., XT } - be a sequence of T random variables (T ≤ 0), V = {V1,
V2, ..., VM} – the set of M states in a Markov process. In more familiar terms, X is the
password itself, where each of the random variables is a single character, and V is the
set of all possible characters in a password.
   The conditional probability P(Xt = xt | Xt-1 = xt-1, ..., X1 = x1) describes a situation
where the event at time t depends on all of the previous states of X. However, if it
depends only on the immediately preceding state, i.e.


  then it is a first-order Markov process. A zero-order Markov process is a process
where every event is independent of every other event, i.e. the events are perfectly ran-
dom. Entropy for a zero-order Markov process is calculated thus:


  For a first-order Markov process, the calculation becomes:


   And so on for higher-order processes. Markov processes are a useful estimate of how
predictable a given password is since many common passwords are words or word
combinations and thus well-described by Markov processes. A second-order Markov
process would generally be an acceptable compromise between complexity and accu-
racy.
   Markov probabilities may be calculated using publicly available databases of com-
monly used passwords and dictionaries of most common words in a given language;
288


for non-English-speaking users, it is generally necessary to assess both English- lan-
guage and non-English-language probabilities. Additionally, common number-letter
substitutions and casing differences could be accounted for on a software level for
added accuracy.
   The above technical and organizational recommendations have been tried and for-
mulated during the authors’ development of online university testing and learning sys-
tems. Salted hashing of passwords and the above approach to password quality assess-
ment have been used by the authors in multiple systems, described in [9] and [10]; the
systems have been copyrighted in [11-13].
   However, it is recommended that passwords not be the only factor in a login system.
Given the wide availability of mobile devices equipped with biometrics – mostly fin-
gerprint scanners – it becomes easy to use biometrics as a true second factor, instead of
OTPs or other solutions. The authors have explored the possibility of using mobile de-
vices as a biometric authentication platform in a 2016 study [14] and developed a pro-
totype for a mobile biometric authentication system used as an authentication module
for web services. The system was designed to be fully passwordless, utilizing a com-
mon protocol to authenticate users via biometrics modules built into mobile devices
and communicate with the web service requesting authentication. The prototype, cop-
yrighted in [15], included a fingerprint module, which used an Android device’s fin-
gerprint sensor to authenticate.


5      Conclusion

Password authentication has changed and evolved over the years, but the user experi-
ence has largely remained the same. However, their effectiveness is now challenged by
the ever-growing number of services that require them, and the ever-stricter require-
ments for their complexity as the number of attacks on Internet users grows.
   Passwords are easy to implement for developers of authentication systems and inte-
grators who connect them to services. However, modern passwords should be reason-
ably long and mostly random, which presents challenges for users – such passwords are
hard to remember, so weaker passwords are created, and later reused.
   The risk of “cracking” a password in a system has mostly been eliminated, but the
risk of correctly guessing a password, even if it is through brute force iteration, remains
significant. Some solutions have been proposed: password management software, or
different approaches to generating passwords that are easier to remember. However,
the problem remains.
   Today, issues with password security have largely been secured against using multi-
factor or multi-step authentication: the use of other means of authenticating a user in
addition to their password – from simple ones like temporary codes delivered by text
message or temporary time-limited passwords, to more complex solutions like biomet-
rics or hardware token authentication.
   As such, passwords will likely remain in some form as an authentication factor for a
long time yet. However, the advent of widespread biometrics and an overall higher level
                                                                                            289


of network security is making it possible to use passwordless and multi-factor authen-
tication more widely than ever before. On the whole, perfect must not be the enemy of
the good, but there must be a basic level of security for any system – and passwords
alone are no longer sufficient.


References
 1. McMillan, R.: The world’s First Computer Password? It was Useless too. WIRED. 2012.
    https://www.wired.com/2012/01/computer-password/ (2012)
 2. Evans, A., Kantrowitz, W., Weis,s E.: A User Authentication Scheme not Requiring Secrecy
    in      the      Computer.      Commun.         ACM17(8).        1974.     P.      437–442.
    https://doi.org/10.1145/361082.361087 (1974)
 3. Wang, X., Feng, D., Lai, X., Yu, H.: Collisions for Hash Functions MD4, MD5, HAVAL-
    128, and RIPEMD. Cryptology ePrint Archive. Report 2004/199. 2004.
    https://eprint.iacr.org/2004/199 (2004)
 4. Stevens, M., Karpman, P., Peyrin, T.: Freestart Collision for Full SHA-1. Cryptology ePrint
    Archive. Report 2015/967. 2015. https://eprint.iacr.org/ 2015/967 (2015)
 5. Stevens, M., Bursztein, E., Karpman, P., Albertini, A., Markov, Y.: The First Collision for
    Full SHA1. Annual International Cryptology Conference. 2017. P. 570—596. (2017)
 6. Oechslin, P.: Making a Faster Cryptanalytic Time-Memory Trade-Off. Annual International
    Cryptology Conference. 2003. P. 617–630. (2003)
 7. Ma, W., Campbell, J., Tran, D., Kleeman, D.: Password Entropy and Password Quality.
    2010 Fourth International Conference on Network and System security. РР. 583–587.
    (2010).
 8. Florencio, D., Herley, C.: A Large-Scale Study of Web Password Habits. Proceedings of the
    16th international conference on World Wide Web. P. 657–666. (2007).
 9. Boychenko, O., Gavrikov, I.: Using Two-Factor Authentication for Securing User Accounts
    in an Online Testing and Education System. International Scientific Review of the Problems
    and Prospects of Modern Science and Education: Proceedings of the XVI International Sci-
    entific and Practical Conference. V. 9 (19). P. 15–16. (2016).
10. Gavrikov, I.: Using Two-Factor Authentication for Securing User Accounts in an Online
    Testing and Education System. European Research: Innovation in Science, Education, and
    Technology: Proceedings of the XVII International Scientific and Practical Conference. N.
    6 (17). P. 32–33. (2016).
11. Apatova, N., Boychenko, O., Gavrikov, I.: BI Virtual Education System. Certificate
    2016660742          (in    Russian).     2017.     https://www1.fips.ru/registers-doc-view/
    fips_servlet?DB=EVM&DocNumber=2016660742&TypeFile=html (2017)
12. Apatova, N., Boychenko, O., Georgiadi, A., Gavrikov, I.: UWT Online Testing System.
    Certificate                 2017611276                   https://www1.fips.ru/registers-doc-
    view/fips_servlet?DB=EVM&DocNumber= 2017611276&TypeFile=html (in Russian).
    (2017).
290


13. Apatova, N., Boychenko, O., Kapustina, E., Gavrikov, I.: CALS: Computer Assisted Learn-
    ing System.           Certificate       2017615183 https://www1.fips.ru/registers-doc-
    view/fips_servlet?DB=EVM& DocNumber=2017615183&TypeFile=html (in Russian).
    (2017).
14. Boychenko, O., Gavrikov, I.: Implementing a Biometric Security Solution Using Mobile
    Devices. Regional information science and security: Proceedings of the XV International
    conference. P. 73–75. (in Russian). (2016).
15. Boychenko, O., Gavrikov, I.: FPAS: Mobile Biometric Authentication System. Certificate
    2017619639                        https://www1.fips.ru/                  registers-doc-
    view/fips_servlet?DB=EVM&DocNumber=2017619639& TypeFile=html (in Russian).
    (2017).