<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Revisiting Monte Carlo Strength Evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Stanek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Faculty of Mathematics</institution>
          ,
          <addr-line>Physics and Informatics</addr-line>
          ,
          <institution>Comenius University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Monte Carlo method, proposed by Dell'Amico and Filippone, estimates a password's rank within a probabilistic model for password generation, i.e., it determines the password's strength according to this model. We propose several ideas to improve the precision or speed of the estimation. Through experimental tests, we demonstrate that improved sampling can yield slightly better precision. Moreover, additional precomputation results in faster estimations with a modest increase in memory usage.</p>
      </abstract>
      <kwd-group>
        <kwd>Password</kwd>
        <kwd>Monte Carlo</kwd>
        <kwd>Strength Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Passwords remain a frequently used authentication method, despite numerous initiatives, technologies, and implementations aiming for passwordless authentication. Although the popularity of methods such as Windows Hello, Passkey, and WebAuthn has increased, the security of passwords continues to be a significant topic in many application areas.</p>
      <p>
        Evaluating the strength of a password is useful for providing users with feedback on their chosen passwords. This feedback can assist users in selecting stronger passwords. Often, the strength is calculated as a password’s rank, i.e., how many passwords will be generated by some chosen algorithm until our password is produced. There are various tools that calculate the strength of a password, for example zxcvbn [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], or the password scorer tool in the PCFG cracker [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Dell’Amico and Filippone proposed a Monte Carlo algorithm that estimates a password’s rank within a probabilistic model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The algorithm works for any probabilistic password generation model, and the authors proved that the estimated results converge to the actual ranks.
      </p>
      <p>
        The Monte Carlo estimator is also used to evaluate and compare different probabilistic models for password generation. The original paper compares n-gram models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the PCFG model using a probabilistic context-free grammar [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and the Backoff model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A recent example of using the Monte Carlo estimator is the evaluation of a password guessing method that employs a random forest [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Our contribution. We propose three ideas for improving the precision or speed of the Monte Carlo estimator. The first idea is to interpolate a password’s rank within the sampled interval it belongs to, according to its probability. The second idea aims to reduce probability overlap in sampled passwords. Both these ideas, presented in Section 3.2, seek to improve the estimator’s precision. The estimation speed for a password, originally based on binary search, can be enhanced with some additional data computed in advance (the third idea, see Section 3.3). All ideas have been tested experimentally to assess their merit. The results are presented in Section 4. Our experiments demonstrate that improved sampling can yield slightly better precision. However, the effect of interpolation on precision is inconclusive, and we cannot rely on this technique to improve precision.
      </p>
      <p>
        We utilize the reference implementation of the Monte Carlo estimator, which was published by one of the authors of the original paper on GitHub [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and we employ the RockYou dataset for our experiments. Given that our focus lies on the estimator itself, the choice of dataset is relatively unimportant.
      </p>
      <p>ITAT’24: Workshop on Applied Security, September 20–24, 2024, Drienica, SK. Contact: martin.stanek@fmph.uniba.sk (M. Stanek)</p>
      <p>2. How the Monte Carlo Estimator Works</p>
      <p>
        We mostly follow [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] in this section. Let Γ be the set of all allowed passwords. A probabilistic password model aims to capture how humans select passwords, assigning higher probabilities to more frequently chosen passwords and lower probabilities to less common ones. Let p(x) denote the probability assigned to password x by the model, such that ∑_{x ∈ Γ} p(x) = 1. Different models yield different probability distributions.
      </p>
      <p>When the model is used for an attack, it enumerates passwords in descending order of probability. Therefore, the strength of a password x is the number of passwords with a higher probability:</p>
      <p>s(x) = |{y ∈ Γ; p(y) &gt; p(x)}|.</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>Remark. In this context, the authors do not address the possibility that the model may assign identical probabilities to multiple passwords, resulting in a non-monotonic ranking. The definition of s(x) assigns all passwords that share the same probability the lowest rank in their group. This approach can be considered prudent from a security standpoint.</p>
      <p>Computing the exact value of s(x), for a random x, has a prohibitively large time complexity. The Monte Carlo estimator uses sampling and approximation to provide efficient and sufficiently accurate estimation. It relies on two properties of the underlying model:</p>
      <p>• The model allows for efficiently computing p(x) for any password x.</p>
      <p>• There is an efficient sampling method that generates a password according to the model’s distribution.</p>
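      <p>Both properties are easy to satisfy for a toy model. The following sketch (the class name and structure are our own, not from the paper; real models are n-grams, PCFG, or Backoff) shows a minimal word-list model exposing efficient p(x) and sampling:</p>

```python
import random

class ToyModel:
    """Tiny word-list model with the two properties the estimator needs:
    efficient computation of p(x) and sampling from the distribution."""

    def __init__(self, counts):
        total = float(sum(counts.values()))
        self.p = {w: c / total for w, c in counts.items()}
        self.words = list(self.p)
        self.weights = [self.p[w] for w in self.words]

    def prob(self, password):
        # p(x) lookup; unseen passwords get probability 0
        return self.p.get(password, 0.0)

    def sample(self):
        # draw one password according to the model's distribution
        return random.choices(self.words, weights=self.weights, k=1)[0]
```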
      <p>Precomputation. The estimator generates a sample Θ of n passwords (sampling with replacement). The sample Θ = {x_1, . . . , x_n} is sorted by descending probability, i.e., p(x_1) ≥ . . . ≥ p(x_n). The cumulative ranks of sampled passwords are calculated as follows:</p>
      <p>c_i = (1/n) · ∑_{j=1}^{i} 1/p(x_j), for i = 1, . . . , n.</p>
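      <p>The precomputation step can be sketched as follows. This is our own minimal rendering of the formula above, not the reference code:</p>

```python
def precompute(sample_probs):
    """Sort a sample of password probabilities in descending order and
    compute cumulative rank estimates c_i = (1/n) * sum_{j=1..i} 1/p(x_j)."""
    probs = sorted(sample_probs, reverse=True)
    n = len(probs)
    cum_ranks, acc = [], 0.0
    for p in probs:
        acc += 1.0 / (n * p)
        cum_ranks.append(acc)
    return probs, cum_ranks
```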
      <sec id="sec-1-1">
        <title>3.2. Precision</title>
        <p>Estimation. In order to estimate s(x) for some password x, the probability p(x) is computed first. Then binary search is used to compute the largest index i such that p(x_i) &gt; p(x). The result, the estimated rank of x, is s(x) ≈ c_i. Hence, the time complexity of the estimator is O(log n).</p>
        <p>
          Remark. The implementation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] uses negative log2 probabilities, i.e., scaling p(x_i) to −log2 p(x_i).
        </p>
        <p>The estimator assigns the same rank c_i to any password x for which the probability falls within the range p(x_i) &gt; p(x) ≥ p(x_{i+1}). Intuitively, passwords with distinct probabilities should not get the same numeric estimate. Certainly, this is not an issue when the password strength is presented on a reduced scale using descriptive characteristics like weak – medium – strong – very strong, or using a traffic-lights metaphor red – amber – green.</p>
        <p>The estimator needs to store the probabilities. The cumulative ranks can be easily recomputed. However, both these arrays are usually significantly smaller than the representation of the model, see Section 3.</p>
        <p>Idea 1. Interpolate rank values within intervals using an appropriate function. The most basic approach, without additional parameters, is linear interpolation. This has no impact on memory complexity and a negligible impact on time complexity. Figure 2 shows a graph of password ranks, on a logarithmic scale, for various models and the sample size of 10,000. It appears that linear interpolation on the logarithmic scale should perform well for these models.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Areas for improvement</title>
      <sec id="sec-2-1">
        <title>3.1. Memory requirement</title>
        <p>The RockYou dataset contains more than 14 million unique passwords. The more passwords are used to train a model, the better and more precise results we can expect, such as in our case for password strength estimation. However, there is a point beyond which additional training data provide only negligible improvement, while further increasing the model’s size. Notably, even the set of the 10,000 most frequent passwords generates models of substantial size: 3.17 MB for 4-gram, 7.45 MB for 5-gram, 43.5 MB for Backoff, and 0.99 MB for PCFG. An attempt to use up to 10% of the RockYou dataset for training leads to unacceptable model sizes, with the Backoff model being the largest, as shown in Figure 1.</p>
        <p>
          The model defines how passwords are represented, generated, and how their probabilities are calculated. Since these methods are specific to each model, we do not aim to improve the model size. However, the Monte Carlo estimator utilizes additional arrays, where the probabilities and ranks of sampled passwords are precomputed. The original paper [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] experiments with various sample sizes up to 100,000 (having “relative error 1%”), but mostly uses the default sample size of 10,000. The default sample size requires 160 kB of memory¹, and it is dominated by the memory required for any model trained on a dataset of reasonable length.
        </p>
        <p>¹Real numbers are represented as the numpy.float64 datatype.</p>
        <p>The precision of the estimator depends on the sample size. More specifically, it depends on the number of unique probabilities in the set P = {p(x_1), . . . , p(x_n)}.</p>
        <p>We define the overlap of Θ as the fraction of probability values that are already in the set, and therefore do not contribute to the estimator’s precision: 1 − |P|/n. Table 1 shows the average overlap for different models and sample sizes. Surprising differences in overlap are observed among different models. An expected increase in overlap is observed with an increasing sample size, since the overlap depends substantially on the password probability distribution, given by the model from which the passwords are generated. On the other hand, a larger training dataset results in greater diversity of passwords, leading to slightly lower overlap.</p>
        <p>[Figure 2: password ranks (log scale) versus position in the sample, for the 4-gram, 5-gram, PCFG, and Backoff models.]</p>
        <p>Idea 2. The estimator will sample random passwords for Θ until it gets n unique probabilities. It compresses the sample by discarding duplicate probabilities in such a way that it preserves the cumulative sum of the entry with the largest index. Hence, the rank calculation remains intact, and the overlap of the resulting Θ is 0. Since the sampling is done in the precomputation phase, it does not impact the estimation time or memory complexity in any way.</p>
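        <p>The compression part of Idea 2 can be sketched as follows. This is our own illustrative code, assuming the sample is already sorted by descending probability; for each group of equal probabilities only the last entry is kept, so the cumulative sum at the kept entry is unchanged (the sample-until-n-unique loop is omitted):</p>

```python
def compress_sample(sorted_probs):
    """Keep only the last entry of every group of equal probabilities;
    the cumulative rank at the kept entry is preserved, so the rank
    calculation stays intact and the overlap of the result is 0."""
    n = len(sorted_probs)
    uniq_probs, uniq_ranks, acc = [], [], 0.0
    for j, p in enumerate(sorted_probs):
        acc += 1.0 / (n * p)
        if j + 1 == n or sorted_probs[j + 1] != p:  # last of its group
            uniq_probs.append(p)
            uniq_ranks.append(acc)
    return uniq_probs, uniq_ranks
```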
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Estimation speed</title>
        <p>The binary search employed in the original estimator is fast enough for assessing individual passwords. However, when the estimator is used to evaluate or compare different models and their variants, the ranks of a large number of passwords need to be estimated. An optimization can be relevant in these scenarios.</p>
        <p>[Figure 1: model size as a function of the number of training passwords (thousands), for the 4-gram, 5-gram, PCFG, and Backoff models.]</p>
        <p>Idea 3. Divide the interval of possible probability values p(x), in our case expressed as negative log2 values, into t intervals (bins): [0, b_1), [b_1, b_2), . . . , [b_{t−1}, ∞), where 0 &lt; b_1 &lt; . . . &lt; b_{t−1}. For each interval, for 1 ≤ k ≤ t, we calculate minimal and maximal look-up indices that narrow the interval for the binary search (we use b_0 = 0 in the following equations):</p>
        <p>LUmin(k) = min{1 ≤ i ≤ n | −log2 p(x_i) ≥ b_{k−1}},</p>
        <p>LUmax(k) = max{1 ≤ i ≤ n | −log2 p(x_i) &lt; b_k}.</p>
        <p>The estimator is adapted accordingly. Given a password x, we calculate the appropriate interval k such that −log2 p(x) ∈ [b_{k−1}, b_k). Then, the binary search is performed within the set of indices {LUmin(k), . . . , LUmax(k)}, instead of the full set {1, . . . , n}. We expect to narrow the interval for the binary search substantially, so the benefit of fewer comparisons will be measurable. Trivially, the precision of the estimator remains unchanged.</p>
        <p>The price paid is the cost of computing the LUmin and LUmax arrays, which is a simple one-time precomputation, and the small amount of memory needed to store these arrays in the estimator².</p>
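        <p>Idea 3 can be sketched as follows. The code and names are our own illustration; the bin boundaries b_1 .. b_{t−1} are given in negative log2 units, and the sample is sorted ascending by negative log2 probability:</p>

```python
import bisect

def build_lookup(sample_nlp, boundaries):
    """Precompute, for every bin [b_{k-1}, b_k), the index range of the
    ascending sample_nlp array that a binary search has to examine."""
    ranges, prev = [], 0.0
    for b in list(boundaries) + [float("inf")]:
        lo = bisect.bisect_left(sample_nlp, prev)  # first index with nlp ≥ b_{k-1}
        hi = bisect.bisect_left(sample_nlp, b)     # first index with nlp ≥ b_k
        ranges.append((lo, hi))
        prev = b
    return ranges

def estimate_with_bins(neg_logp, sample_nlp, cum_ranks, boundaries, ranges):
    """Binary search restricted to the precomputed index range of x's bin;
    the result is identical to searching the full array."""
    k = bisect.bisect_right(boundaries, neg_logp)  # bin with nlp in [b_{k-1}, b_k)
    lo, hi = ranges[k]
    i = bisect.bisect_left(sample_nlp, neg_logp, lo, hi)
    return cum_ranks[i - 1] if i > 0 else 0.0
```

        <p>Entries below the bin's lower boundary are always more probable than x and entries above its upper boundary never are, so restricting the search window cannot change the answer, only the number of comparisons.</p>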
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>We implement the ideas presented in the previous section and present the results of our experiments.</p>
      <p>²For example, 100 intervals “cost” approximately 7.8 kB, even with a wasteful representation using Python’s int objects for the stored indices and lists for the arrays.</p>
      <sec id="sec-3-3">
        <title>Backoff</title>
        <p>Let rr(x) denote the real rank of a password x ∈ D, and let er(x) denote the rank estimated by a particular variant of the estimator. The weighted error of the estimator on the password set D is calculated as follows:</p>
        <p>∑_{x ∈ D} p(x) · |er(x) − rr(x)|.</p>
        <sec id="sec-3-3-1">
          <title>4.1. Precision</title>
          <p>The weighted error assumes that the estimators are used to assess passwords chosen by humans, following the original distribution. We also consider a simple error for completeness:</p>
          <p>(1/|D|) · ∑_{x ∈ D} |er(x) − rr(x)|.</p>
          <p>The ideas aimed at improving precision apply to the Monte Carlo estimator, regardless of the underlying model. We do not attempt to modify the models. For example, if the password -1-1-1-1 is assigned inf³ as its negative log2 probability in the PCFG model, because the pattern is outside of the trained grammar, we do not try to “fix this”. Moreover, we do not compare the performance of the models to each other.</p>
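          <p>Both error measures are straightforward to compute. A small sketch (our own helpers, assuming per-password tuples of model probability, estimated rank, and real rank):</p>

```python
def weighted_error(samples):
    """Sum of p(x) * |er(x) - rr(x)| over the password set."""
    return sum(p * abs(er - rr) for p, er, rr in samples)

def simple_error(samples):
    """Mean absolute rank error, ignoring the password probabilities."""
    return sum(abs(er - rr) for _, er, rr in samples) / len(samples)
```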
          <p>
            We assess the impact of our ideas on the real ranks of passwords generated by the models. Similarly to the original paper [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], we generate all passwords up to some probability threshold. The rank of a password is its position in the list sorted by the probabilities assigned by the model to the passwords.
          </p>
          <p>Table 3: Weighted and simple errors of various estimator variants; every number is an average of 100 experiments. variant (weighted error, simple error): original (16.54, 101.11); interpolation (15.33, 90.63); sampling (11.79, 70.63); all (10.86, 63.10).</p>
          <p>The first experiment uses the PCFG model trained on 10 million passwords from the RockYou dataset. The threshold for password generation was set at 20, i.e., all passwords with probability at least 2^{−20} were generated – there were 91,693 passwords in this dataset (let us denote it D). We consider various combinations of the proposed ideas:</p>
          <p>
            • original – the reference implementation of the estimator [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]; • interpolation – the estimator with Idea 1 applied; • sampling – the estimator with Idea 2 applied; • all – the estimator with both ideas applied.
          </p>
          <p>Table 3 shows the results of our experiment. We performed 100 experiments. We have to warn the reader – the reported errors are sensitive to the particular password distribution sampled into Θ. Unsurprisingly, the sampling (Idea 2) helps to reduce estimation errors in general. The situation with interpolation (Idea 1) is mixed, with a substantial fraction of experiments showing worse statistics. The reason is that the interpolation makes the error worse when passwords in Θ already “overshoot” their true ranks. Taking the same rank without interpolation compensates for this. Therefore, interpolation cannot be recommended for improving the precision of the estimator. On the other hand, it helps with the “same rank” problem, when different passwords are assigned the same rank by the estimator.</p>
          <p>³Python’s float(’inf’) value.</p>
          <p>
            Figure 3 compares visually the original variant with the “all” variant. It illustrates the simple difference of the calculated rank and the estimated rank. It also shows the relative error of the estimators. As expected, based on the convergence proof in [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], the relative error is rather small in both cases.
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>4.2. Estimation speed</title>
          <p>We tested two configurations: the first one with 100 intervals (bins), and the second with 1,000 intervals. Negative log probabilities are divided into the fixed intervals [0, 1), [1, 2), . . . , [99, ∞) in the first case, and into [0, 0.1), [0.1, 0.2), . . . , [99.9, ∞) in the second case, respectively. Both configurations were tested with four different sizes of Θ. Table 4 shows the relative speed of the different variants with respect to the baseline, which is the original algorithm with |Θ| = 10000. The results confirm a moderate speed-up for 100 intervals and a substantial speed-up for 1,000 intervals.</p>
          <p>5. Additional observation and conclusion</p>
          <p>Since passwords in Θ are generated according to their probability, with a sufficiently large sample size we expect that, for some k, the top-k most probable passwords will be in the correct order at the beginning of Θ. Therefore, simply reporting the order of these top-k passwords by the estimator can be beneficial to the precision. Figure 4 illustrates this phenomenon for the PCFG model and the sample size of 10,000, where approximately the top 180 passwords have the exact rank as their position in Θ. However, further down the precision quickly deteriorates.</p>
          <p>An interesting question is if we can improve the estimator’s precision by compensating for unusually large or small jumps (differences) between adjacent probabilities in the sampled passwords.</p>
          <p>An area outside this paper that deserves further focus is the precision of the estimator for low-probability passwords. The estimator’s precision worsens for passwords with high ranks. A potential approach might use different or additional sampling methods that focus on less probable passwords, so that we can cover this part of the probability space better.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Wheeler</surname>
          </string-name>
          ,
          <article-title>Zxcvbn: low-budget password strength estimation</article-title>
          ,
          <source>in: Proceedings of the 25th USENIX Conference on Security Symposium</source>
          , SEC'16,
          <string-name>
            <given-names>USENIX</given-names>
            <surname>Association</surname>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          . URL: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/wheeler.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] PCFG cracker,
          <year>2024</year>
          . URL: https://github.com/lakiw/pcfg_cracker.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dell'Amico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Filippone</surname>
          </string-name>
          ,
          <article-title>Monte carlo strength evaluation: Fast and reliable password checking</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security</source>
          , CCS '15, Association for Computing Machinery,
          <year>2015</year>
          , pp.
          <fpage>158</fpage>
          -
          <lpage>169</lpage>
          . URL: https://doi.org/10.1145/2810103.2813631.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Shmatikov</surname>
          </string-name>
          ,
          <article-title>Fast dictionary attacks on passwords using time-space tradeoff</article-title>
          ,
          <source>in: Proceedings of the 12th ACM Conference on Computer and Communications Security</source>
          , CCS '05, Association for Computing Machinery,
          <year>2005</year>
          , pp.
          <fpage>364</fpage>
          -
          <lpage>372</lpage>
          . URL: https://doi.org/10.1145/1102120.1102168.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Weir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Glodek</surname>
          </string-name>
          ,
          <article-title>Password cracking using probabilistic context-free grammars</article-title>
          ,
          <source>in: 30th IEEE Symposium on Security and Privacy</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>405</lpage>
          . URL: https://doi.org/10.1109/SP.2009.8.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          , W. Yang,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A study of probabilistic password models</article-title>
          ,
          <source>in: IEEE Symposium on Security and Privacy</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>689</fpage>
          -
          <lpage>704</lpage>
          . URL: https://doi.org/10.1109/SP.2014.50.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , K. Xiu,
          <article-title>Password guessing using random forest</article-title>
          ,
          <source>in: 32nd USENIX Security Symposium (USENIX Security 23)</source>
          , USENIX Association,
          <year>2023</year>
          , pp.
          <fpage>965</fpage>
          -
          <lpage>982</lpage>
          . URL: https://www.usenix.org/conference/usenixsecurity23/presentation/wang-ding-password-guessing.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dell'Amico</surname>
          </string-name>
          ,
          <article-title>montecarlopwd: Monte Carlo password checking</article-title>
          ,
          <year>2016</year>
          . URL: https://github.com/matteodellamico/montecarlopwd.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>