<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Password Strength Analysis Through Social Network Data Exposure: A Combined Approach Relying on Data Reconstruction and Generative Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maurizio Atzori</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eleonora Calò</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Loredana Caruccio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Cirillo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Polese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giandomenico Solimando</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Salerno</institution>
          ,
          <addr-line>Via Giovanni Paolo II, 132, 84084 Fisciano (SA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <addr-line>Via Ospedale, 72, 09124, Cagliari (CA)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>33</volume>
      <issue>0</issue>
      <fpage>16</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>Although passwords remain the primary defense against unauthorized access, users often choose passwords that are easy to remember. This behavior significantly increases security risks, partly because traditional password strength evaluation methods are often inadequate. In this discussion paper, we present soda advance, a data reconstruction tool also designed to enhance the evaluation of password strength. In particular, soda advance integrates a specialized module that evaluates password strength by leveraging publicly available data from multiple sources, including social media platforms. Moreover, we investigate the capabilities and risks associated with emerging Large Language Models (LLMs) in evaluating and generating passwords, respectively. Experimental assessments conducted with 100 real users demonstrate that LLMs can generate strong and personalized passwords, possibly defined according to user profiles. Additionally, LLMs were shown to be effective in evaluating passwords, especially when they can take user profile data into account.</p>
      </abstract>
      <kwd-group>
        <kwd>Privacy-Preserving</kwd>
        <kwd>Password-disclosure</kwd>
        <kwd>Data wrapping</kwd>
        <kwd>Data reconstruction</kwd>
        <kwd>Social Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Traditional password strength assessments often fall short, as they focus on static syntax rules
without considering the semantic context of user choices. Indeed, users generally choose
passwords by using keywords easy to remember. However, since much personal information is
shared on social networks, attackers can exploit these details to infer user passwords. Thus,
through data reconstruction tools, it is possible to reconstruct information semantically related
to a context close to users [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this landscape, Large Language Models (LLMs) emerge as
both an asset for evaluating password security and a potential threat in generating passwords.
      </p>
      <sec id="sec-1-1">
        <p>
          This discussion paper examines the privacy risks associated with sharing personal data online and explores the capabilities of LLMs in password evaluation and generation, as proposed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. The latter presents soda advance, an extension of the tool soda [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which includes a new module for evaluating password strength based on information publicly available on social
networks. This module exploits several approaches, such as cupp [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], leet [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], coverage [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and
force [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and introduces a new cumulative metric, namely Cumulative Password Strength
(cps). Furthermore, we present different pipelines, with the aim of investigating the capabilities and
threats associated with the generation and evaluation of passwords by different LLMs. The
overall evaluation is driven by the following research questions (RQs):
        </p>
      </sec>
      <sec id="sec-1-2">
        <p>RQ1: Can we rely on LLMs to suggest complex and easy-to-remember passwords based on publicly available information on social networks?</p>
        <p>RQ2: Can LLMs represent a valid tool to support users in evaluating the strength of passwords based on personal information?</p>
        <p>RQ3: How does the public availability of personal information across multiple social networks impact the capabilities of LLMs to generate passwords and evaluate their strength?</p>
        <p>RQ4: How effective is the prompt-based methodology for password generation and evaluation compared to state-of-the-art models?</p>
      </sec>
      <sec id="sec-1-5">
        <title>2. Combining soda advance and LLMs for Evaluating Passwords</title>
      </sec>
      <sec id="sec-1-6">
        <p>In this section, we describe the soda advance tool and the three proposed pipelines that
combine the capabilities of LLMs<sup>1</sup> (e.g., Google Gemini, ChatGPT, Claude, Dolly, Falcon, and
LLaMa) with those of soda advance to address password generation and evaluation problems.</p>
        <p>soda advance Tool. The soda advance tool evaluates password strength based on
personal data reconstructed from social networks. The soda advance pipeline (see Figure 1) starts
with basic user information, i.e., name and photo, as input (step 1). It then extracts public data from
Facebook, LinkedIn, and Instagram using web crawling and scraping techniques (step 2). The tool
uses facial recognition to verify the user’s identity across all platforms (step 3). Finally, it merges
the extracted information (step 4) and evaluates the strength of the provided password based on
the reconstructed data (step 5). The evaluation module in soda advance uses four methods (i.e.,
cupp, leet, coverage, and force) and a new metric, cps, that combines their results to provide a
cumulative value in the range [0, 1].</p>
      </sec>
      <sec id="sec-1-9">
        <p>Generation and Evaluation Pipelines. The first pipeline is designed to investigate the
capabilities of LLMs to generate strong passwords based on specific information provided by
users. The process begins with the generation of passwords using LLMs, where each template
creates a set of strong but memorable passwords based on user input. The generated passwords
are then evaluated with the soda advance module, which analyzes their strength. Consequently,
each password is labeled as weak or strong according to the strength score.</p>
        <p>The second pipeline is designed to investigate the effectiveness of LLMs in assessing the
strength of passwords by also considering their semantics in relation to user data. The
process begins by generating strong passwords using the best LLM of the Generation Pipeline.</p>
      </sec>
      <sec id="sec-1-11">
        <p><sup>1</sup> www.deepmind.google, www.chat.openai.com, www.claude.ai, www.databricks.com, www.falconllm.tii.ae, and www.llama.meta.com</p>
        <p>[Figure 1: The soda advance pipeline.]</p>
        <p>[Figure 2: The data reconstruction and password evaluation pipeline.]</p>
      </sec>
      <sec id="sec-1-12">
        <p>
          Simultaneously, weak passwords are generated by using cupp. Once passwords are created, a
new prompt is generated to evaluate their strength. The evaluation involves submitting the
user data along with the generated passwords to an LLM, which then assigns each password a
numerical strength score. Finally, passwords are categorized as weak or strong according to
the obtained score. Details concerning the above-described pipelines can be found in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>Data Reconstruction and Password Evaluation Pipeline. The third pipeline combines the
password strength evaluation of soda advance with that of the LLMs, using new automated
prompting functions for evaluating passwords. These functions directly include in the prompt both
the data reconstructed from the social networks and the results achieved by soda advance. As
shown in Figure 2, starting from a small set of user information, we used soda advance to
reconstruct the user profile from the publicly available information shared on social networks (step 1). Then, in
step 2, the reconstructed information is used to create a dataset containing both strong and weak
passwords associated with the user. In step 3, the set of user passwords is provided to soda advance,
which is responsible for their first evaluation. Before proceeding with the evaluation step, in step 4,
a new prompt explaining each metric adopted by soda advance is provided
to the LLMs. Moreover, for each LLM, in step 5, a new prompt that considers both the values resulting from
soda advance and the data reconstructed from the social networks is automatically generated.
In step 6, each prompt is filled with the user's reconstructed data and the evaluation results from soda advance, and
it is submitted to an LLM together with the passwords to be evaluated. Finally, in steps 7 and 8,
each password is assigned a strength score to identify its category: strong or weak.</p>
      </sec>
      <sec id="sec-1-13">
        <p>Prompt Engineering Approach for the Password Strength Problem. The process of generating passwords required the definition of an ad-hoc prompting function, namely password-generation,
as shown in the following.</p>
        <p>On the basis of the following personal information: [Name: George], [Surname: Smith], [City: Orange,
California], [Date: 10/23/1994]. Could you generate a set of passwords that do not have to directly
contain personal data, but must be easy for the user to memorize?</p>
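        <p>As a minimal sketch of how the password-generation prompting function could be filled from a user profile, the Python snippet below renders each profile field as a bracketed key-value pair and appends the request text of the prompt above. The function name build_generation_prompt and the dictionary layout are illustrative assumptions and are not taken from soda advance.</p>
        <preformat><![CDATA[
```python
from typing import Mapping


def build_generation_prompt(profile: Mapping[str, str]) -> str:
    """Fill the password-generation template with user profile fields.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # Render each field as "[Key: value]", mirroring the example prompt.
    fields = ", ".join(f"[{key}: {value}]" for key, value in profile.items())
    return (
        f"On the basis of the following personal information: {fields}. "
        "Could you generate a set of passwords that do not have to directly "
        "contain personal data, but must be easy for the user to memorize?"
    )


# Profile values copied from the example prompt in the text.
profile = {
    "Name": "George",
    "Surname": "Smith",
    "City": "Orange, California",
    "Date": "10/23/1994",
}
prompt = build_generation_prompt(profile)
print(prompt)
```
]]></preformat>
        <p>Keeping the template in a single function would make it easy to submit the same wording to each LLM under comparison.</p>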
      </sec>
      <sec id="sec-1-14">
        <p>Instead, the process behind the password evaluation pipeline requires interacting with LLMs
at several steps. Among these, we defined a new function prompt-generation to ask each LLM to
automatically generate prompts for password evaluation and a new function parsing-prompt to
ask each LLM to provide a strength score for each textual description. These prompts enabled
us to automatically create a new prompting function, namely evaluate-password for each LLM
involved in our study. An example of the prompt automatically generated for ChatGPT follows:</p>
        <p>User information: [Name: George], [Surname: Smith], [City: Orange, California], [Date: 10/23/1994],
[Education: University of California]. For each line containing a password that I could use for a social
network account, give me an answer for each of them and write whether the password can be
considered secure or not, giving secure or not secure. Assess the password’s strength using the
information supplied by the user, considering factors like its length and ability to resist guessing
techniques. Passwords: [OrangeSystems23], [MaleSystems*?], [GeorgeCali1023], [C@liforn1Sm1th49],
[Syst3msSm1th@], [0r@nge@n3@]</p>
      </sec>
      <sec id="sec-1-15">
        <p>In the third pipeline, the interaction with LLMs to evaluate password strength required
the use of some of the previous prompting functions and the definition of new ones to explain
the metrics (i.e., understanding-metrics) to LLMs and to evaluate the passwords while also considering
the results of soda advance. We manually defined two new prompting functions following the
Manual Template Engineering strategy [8], and we automatically generated those used to evaluate
passwords for each LLM by means of the metrics-prompt-generation function. Starting from the
generated function eval, a new specific prompt is automatically generated and provided to each LLM.
The prompt generated by ChatGPT is shown below:</p>
        <p>User information: [Name: George, Surname: Smith, City: Orange, California, Date: 10/23/1994]</p>
        <p>Passwords Evaluation Results:</p>
        <p>Password; Force; Leet; Coverage; CUPP; CPS</p>
        <p>OrangeSystems; 23; 57; 57; 0; 0.45</p>
        <p>MaleSystems*?; 27; 2; 71; 1; 1</p>
        <p>GeorgeCali1023; 63; 12; 76; 0; 0.50</p>
        <p>C@liforn1Sm1th49; 65; 0; 83; 0; 0.49</p>
        <p>Please assess the security of each password listed. Using the user information provided, analyze the
password strength based on the following methods: Leet Coverage, Force, CUPP, and Cumulative
Password Strength. Upon evaluation, please provide a response of Strong if the password is deemed
sufficiently strong and effectively safeguards the user’s information based on the provided data, or
Weak if the password could potentially be compromised or guessed based on the available details.</p>
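        <p>The results block embedded in the prompt above could be assembled programmatically. The following Python sketch is one possible way to do so: it assumes a hypothetical SodaResult record holding the scores returned by soda advance and a hypothetical format_results_table helper, and it reproduces the semicolon-separated layout of the example. Neither name comes from the tool itself.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Union

Number = Union[int, float]


@dataclass
class SodaResult:
    """Hypothetical container for one row of soda advance scores."""
    password: str
    force: Number
    leet: Number
    coverage: Number
    cupp: Number
    cps: Number


def format_results_table(results: List[SodaResult]) -> str:
    """Render the scores as the semicolon-separated block used in the prompt."""
    lines = [
        "Passwords Evaluation Results:",
        "Password; Force; Leet; Coverage; CUPP; CPS",
    ]
    for r in results:
        values = (r.password, r.force, r.leet, r.coverage, r.cupp, r.cps)
        lines.append("; ".join(str(v) for v in values))
    return "\n".join(lines)


# Two rows copied from the example prompt in the text.
results = [
    SodaResult("OrangeSystems", 23, 57, 57, 0, 0.45),
    SodaResult("C@liforn1Sm1th49", 65, 0, 83, 0, 0.49),
]
table = format_results_table(results)
print(table)
```
]]></preformat>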
      </sec>
      <sec id="sec-1-18">
        <p>The prompts generated by LLMs for evaluating password strength showed similarities in their structure but differed in formatting and language style. In the following section, we present a case study involving real users that allows us to investigate the capabilities of soda advance and LLMs to evaluate password strength.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Experimental Evaluation</title>
      <p>The experiments in this study aim to evaluate how password strength can be affected by the
information publicly available on social network platforms from both syntactic and semantic
perspectives. To this end, we investigate the behavior of soda advance and generative LLMs
following the three different pipelines discussed in the previous section. We involved 100 users,
each of whom filled out an information survey and an authorization form for profiling their
social network accounts using soda advance. Among the questions submitted to users, we asked for
their name, surname, and a photo. The collected data are used as the starting point of the evaluation.
Note that we obtained explicit consent from users, in compliance with GDPR [9].</p>
      <sec id="sec-2-2">
        <p>
          Technical Settings. soda advance was implemented using Python version 3.10.2 on the
server side and using web programming frameworks for graphical interfaces. Concerning LLMs,
we adopted ChatGPT 3.5.5, Claude 2.1, LLaMa 2024.2.19.1, Falcon in its 40B version, Google
Gemini 1.0, and Dolly-v2-12b. Moreover, for the analysis of the characteristics of the generated
passwords, we used two different tools (i.e., Passat and Node-password-analyzer)<sup>2</sup>. Furthermore,
to make a comparative evaluation with soda advance, we use the Zxcvbn library [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] in its
version 4.4.2, the CKL_PSM library [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and the Semantic PCFG [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] tool. The latter tool
was trained on plain text passwords extracted from the Evite<sup>3</sup> dataset. Finally, for generative
password comparison, we use the PassBERT model [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-4">
        <p>RQ1. The characteristics of the generated passwords revealed that each LLM exhibits distinct
patterns in the generation of strong passwords, with variations in syntactic complexity and
the combination of letters/characters. Thus, to evaluate the strength of passwords we used the
new metric cps of soda advance. On average, Claude, Google Gemini, and
ChatGPT outperform the other LLMs, achieving scores of 0.82, 0.75, and 0.74, respectively. On
the other hand, Dolly, LLaMa, and Falcon generated more weak passwords, achieving
scores of 0.65, 0.66, and 0.66, respectively. This is probably due to their tendency to generate
repetitive or predictable passwords, using recurring and easily guessable patterns.</p>
        <p>RQ2. Starting from the values provided by cps, we consider a password as strong when its
strength score is greater than or equal to 0.55, and weak otherwise. Thus, we are able to get a
binary evaluation of passwords and compare the results achieved by LLMs with those achieved
by state-of-the-art methods. Considering the average value achieved by
each LLM, Claude obtained the highest values for accuracy, precision, recall, and F1-score, i.e.,
0.75, 0.76, 0.75, and 0.75, respectively. The high precision score indicates that it has a low rate
of false positives, meaning that it correctly identifies strong passwords with a high degree of
confidence. To further investigate whether an ensemble of different LLMs improves the values of the
metrics, we considered two different ensembles: (i) including all the LLMs and (ii) including the
three LLMs with the highest scores; however, both performed worse than Claude.</p>
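        <p>The binary labeling and the four reported metrics can be illustrated with a short Python sketch. The 0.55 threshold comes from the text; the function names, the toy cps values, and the LLM answers are made-up assumptions for illustration, and "strong" is treated as the positive class.</p>
        <preformat><![CDATA[
```python
from typing import Dict, Sequence

THRESHOLD = 0.55  # from the text: strong when cps is at least 0.55


def label_from_cps(cps: float) -> str:
    """Turn a cps score into the binary reference label."""
    return "strong" if cps >= THRESHOLD else "weak"


def binary_metrics(truth: Sequence[str], predicted: Sequence[str],
                   positive: str = "strong") -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with `positive` as positive class."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy data (assumed): cps scores and the labels an LLM might return.
cps_scores = [0.82, 0.45, 0.55, 0.30]
truth = [label_from_cps(s) for s in cps_scores]
llm_labels = ["strong", "strong", "strong", "weak"]
metrics = binary_metrics(truth, llm_labels)
print(truth, metrics)
```
]]></preformat>
        <p>In this toy run, the single weak password that the LLM calls strong is a false positive, which lowers precision while recall stays at 1.</p>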
      </sec>
      <sec id="sec-2-6">
        <p><sup>2</sup> www.github.com/HynekPetrak/passat, www.github.com/T-PWK/node-password-analyzer</p>
        <p><sup>3</sup> www.haveibeenpwned.com</p>
      </sec>
      <sec id="sec-2-8">
        <p>RQ3. By combining social media data with the semantic capabilities of LLMs, password
strength evaluations significantly improved with respect to the scenario in which only a few user data are
provided to LLMs. Compared to the latter scenario, the inclusion of broader personal information
led to better performance across most models. For instance, Falcon improved its precision from
0.48 to 0.77 and ChatGPT reached high scores in accuracy, precision, recall, and F1-score.
Instead, Claude showed the best overall performance (i.e., accuracy equal to 0.77 and precision
equal to 0.89). Ensemble models also benefited, likely due to the enhanced performance of
individual LLMs. These improvements suggest that public social media data provides valuable
context, allowing LLMs to make more accurate assessments. However, this also raises privacy
concerns: as more personal data becomes accessible, users face increased risks. LLMs could be
exploited by attackers to guess passwords based on publicly shared information. This highlights
the importance of strong privacy settings, secure password practices, and the need for clear
ethical and legal guidelines regarding the use of LLMs.</p>
      </sec>
      <sec id="sec-2-9">
        <p>RQ4. To evaluate the capabilities of LLMs in both password generation and evaluation tasks,
as well as the effectiveness of soda advance in assessing password strength, we analyzed the
medium-security passwords and compared the results with state-of-the-art tools.</p>
        <p>Medium Password Strength evaluation. Starting from the initial dataset provided by 100 users,
we generated a set of 30 passwords for each user using the prompt password-generation. The values
of cps obtained for medium-strength passwords generated by LLMs and evaluated through soda
advance range between 0.36 and 0.60. In particular, Claude, Google Gemini, and ChatGPT
outperform all other LLMs achieving the highest number of medium passwords. Then, to assess
the evaluation capabilities of LLMs and soda advance, we asked each model to evaluate each
password. With the evaluation pipeline, the classification task involving multiple labels
(i.e., weak, medium, strong) significantly reduced the performance of all LLMs with respect to
the binary classification task (i.e., weak and strong). In particular, we have noticed that most of
the passwords correctly evaluated were weak passwords, containing recurrent patterns and
combinations of user data. Instead, LLMs were not able to discriminate passwords between
strong and medium levels. Conversely, with the data reconstruction and password evaluation
pipeline, the overall performance was higher, with Claude outperforming all other
LLMs. Our analysis shows that the initial evaluation provided by soda advance effectively supported LLMs in distinguishing between weak, medium, and strong passwords.</p>
      </sec>
      <sec id="sec-2-10">
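        <p>A three-level labeling of cps values can be sketched as follows. The text reports that medium-strength passwords obtained cps values between 0.36 and 0.60; using these two values as cut-offs between weak, medium, and strong is an assumption made here for illustration, as is the function name three_level_label.</p>
        <preformat><![CDATA[
```python
MEDIUM_LOW = 0.36   # lower end of the medium range reported in the text
MEDIUM_HIGH = 0.60  # upper end of the medium range reported in the text


def three_level_label(cps: float) -> str:
    """Map a cps score in [0, 1] to weak, medium, or strong.

    Assumption: the reported medium range is used directly as the
    pair of cut-offs; the source does not state the exact rule.
    """
    if not 0.0 <= cps <= 1.0:
        raise ValueError("cps is expected to lie in [0, 1]")
    if cps < MEDIUM_LOW:
        return "weak"
    if cps <= MEDIUM_HIGH:
        return "medium"
    return "strong"


# Illustrative scores, including both boundary values.
scores = (0.20, 0.36, 0.45, 0.60, 0.82)
labels = [three_level_label(s) for s in scores]
print(labels)
```
]]></preformat>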
        <p>Comparative evaluation with state-of-the-art tools. We compared soda
advance with some of the most recent state-of-the-art tools for password evaluation,
i.e., Zxcvbn, CKL_PSM, and Semantic PCFG. In order to compare the values
obtained from the library and tools with those of the evaluation module of soda advance, we
normalized the ranges so that password strength falls into three categories: weak, medium, and
strong. For the purposes of our evaluation, we extracted a random sample of 250 passwords,
ranging in length from 8 to 25 characters. Figure 3 shows the results of soda advance,
CKL_PSM, Zxcvbn, and Semantic PCFG on the considered set of passwords. As we can see, most
of the passwords have been classified as medium by all tools, and only a few of them as strong.
soda advance has demonstrated good evaluation capabilities for passwords containing
user information, classifying them as weak. Moreover, soda advance classified as
medium some passwords consisting of simple dictionary words not semantically linked to users.
These types of passwords have been considered strong by the methods that estimate crack attempts, i.e., CKL_PSM, Zxcvbn, and Semantic PCFG, since they have a medium-complex syntax that requires a large number of attempts to crack. This is probably due to the metrics for the analysis of syntax included in cps.</p>
        <p>[Figure 3: Results of soda advance, CKL_PSM, Zxcvbn, and Semantic PCFG on the considered set of passwords.]</p>
      </sec>
      <sec id="sec-2-11">
        <p>In summary, we have noticed that no model excels at evaluating password strength. As we
expected, soda advance demonstrated good evaluation capabilities for passwords that contain
some user information but overestimates the complexity of passwords when they contain words
not semantically linked to the user. On the other hand, tools that evaluate passwords based
on crack attempts often underestimate the strength of passwords with complex syntax if they
contain information related to the user. However, as also demonstrated for LLMs, evaluating
password strength based on semantics with three levels of strength is
considerably more difficult, and the evaluations are less accurate.</p>
        <p>
          Evaluating passwords with a state-of-the-art model. To further investigate the
password-generation capabilities of LLMs, we evaluated the strength of the passwords with PassBERT
[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which is one of the most recent models in the literature for making focused attacks on
passwords. PassBERT uses the fine-tuning paradigm for password-guessing attacks, with a
pre-trained password model and different fine-tuning approaches. Among them, we considered
Targeted Password Guessing (TPG), which aims to estimate the number of guesses needed to crack
the input password given a set of leaked passwords. For the purposes of our evaluation, we
considered 100 users and 250 strong passwords generated by LLMs for each of them. Moreover, we
considered the weak passwords inferred by cupp as leaked passwords. For each strong password,
we evaluated its strength with the PassBERT model and the TPG approach. By considering
250 passwords for each user, we collected a total of 25,000 strong passwords. The results
showed that, among the strong passwords, only those of a small set of users were
inferred by PassBERT. Specifically, PassBERT was able to identify only 22 passwords out of
the 25,000 evaluated (about 0.09%), probably due to the complexity of the syntax of these passwords. In fact,
although the passwords generated by LLMs are based on personal information about the user
and therefore easy to remember, they are also syntactically complex and difficult to crack for
models such as TPG. These results, together with those achieved from the previous evaluation,
underscore the robustness of using LLMs for generating secure passwords semantically related
to the information of the users and highlight the limited effectiveness of an advanced targeted
guessing model, i.e., PassBERT.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion and Future Directions</title>
      <p>We have investigated the threats related to the definition of passwords when users publicly
share their data on social network platforms. To this end, we have first proposed a new
data reconstruction tool, namely soda advance, capable of reconstructing public user data
and evaluating a password according to them. Moreover, we have designed three different
pipelines aiming to evaluate the performance of emerging LLMs in the generation of strong
passwords and the evaluation of their strength through new ad-hoc prompting functions based
on automatic and manual prompt engineering approaches. The experimental evaluations with
real users have shown that Claude has good capabilities in generating strong passwords
and evaluating password strength based on user data. Moreover, the combination of LLMs
with the soda advance tool has led to significant improvements in the password evaluation
process with LLMs. To further investigate the effectiveness of LLMs and soda advance in
password generation and evaluation, we compared them with state-of-the-art approaches. The
results highlight that LLMs do not perform well in the generation of medium-level passwords.
Instead, the evaluation methods included in soda advance performed better in this task. Finally,
it has been shown that only a very small percentage of the strong passwords generated by LLMs
are successfully guessed by PassBERT’s TPG model.</p>
      <sec id="sec-3-1">
        <p>The methodologies and results obtained in this study open the research in several new
directions. Future research could investigate in-depth the understanding and mitigation of
threats, including exploring alternative approaches to password management and authentication
in the context of widespread public data availability. In addition, further investigation could
focus on enhancing the capabilities of the data reconstruction tool to extract a large set of public
information from other Web platforms. Moreover, password strength assessment can be further
explored using LLMs by investigating the effectiveness of models trained specifically for this
problem. Finally, emerging trends related to LLMs require further investigation for a better
understanding of how these models treat personal information and whether they comply with
European and global regulations.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <p>This work was partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <sec id="sec-5-1">
        <p>The author(s) have not employed any Generative AI tools.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Desiato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scalera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Solimando</surname>
          </string-name>
          ,
          <article-title>A visual privacy tool to help users in preserving social network data</article-title>
          , in:
          <source>Proceedings of the Workshops, Work in Progress Demos and Doctoral Consortium at the IS-EUD 2023 co-located with the 9th International Symposium on End-User Development (IS-EUD 2023), Cagliari, Italy, June 6-8, 2023</source>
          , volume
          <volume>3408</volume>
          of CEUR Workshop Proceedings,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Atzori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Calò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Caruccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
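          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Solimando</surname>
          </string-name>
          ,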
          <article-title>Evaluating password strength based on information spread on social networks: A combined approach relying on data reconstruction and generative models</article-title>
          ,
          <source>Online Social Networks and Media</source>
          <volume>42</volume>
          (
          <year>2024</year>
          )
          <fpage>100278</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cerruto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Desiato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gambardella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polese</surname>
          </string-name>
          ,
          <article-title>Social network data analysis to highlight privacy threats in sharing data</article-title>
          ,
          <source>Journal of Big Data</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Mebus</surname>
          </string-name>
          ,
          <article-title>Common user password profiler</article-title>
          ,
          <year>2019</year>
          . URL: https://github.com/Mebus/cupp, accessed 20 March 2019
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <article-title>Leet usage and its effect on password security</article-title>
          ,
          <source>IEEE Transactions on Information Forensics and Security</source>
          <volume>16</volume>
          (
          <year>2021</year>
          )
          <fpage>2130</fpage>
          -
          <lpage>2143</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Personal information in passwords and its security implications</article-title>
          ,
          <source>IEEE Transactions on Information Forensics and Security</source>
          <volume>12</volume>
          (
          <year>2017</year>
          )
          <fpage>2320</fpage>
          -
          <lpage>2333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>X.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <article-title>A password strength evaluation algorithm based on sensitive personal information</article-title>
          , in:
          <source>Proceedings of the IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)</source>
          <year>2020</year>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1542</fpage>
          -
          <lpage>1545</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <article-title>Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <collab>Parlamento europeo e Consiglio</collab>
          ,
          <article-title>Regolamento (UE) 2016/679 relativo alla protezione delle persone fisiche con riguardo al trattamento dei dati personali</article-title>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Wheeler</surname>
          </string-name>
          ,
          <article-title>zxcvbn: Low-Budget password strength estimation</article-title>
          , in:
          <source>Proceedings of the 25th USENIX Security Symposium (USENIX Security 16)</source>
          , USENIX Association, Austin, TX,
          <year>2016</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>173</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Chunk-level password guessing: Towards modeling refined password composition representations</article-title>
          , in:
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vigna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shi</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the CCS '21: 2021 ACM SIGSAC Conference on Computer and Communications Security</source>
          , Virtual Event, Republic of Korea, November 15-19, 2021
          , ACM,
          <year>2021</year>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>20</lpage>
          . doi: 10.1145/3460120.3484743.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Veras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thorpe</surname>
          </string-name>
          ,
          <article-title>A large-scale analysis of the semantic password model and linguistic patterns in passwords</article-title>
          ,
          <source>ACM Transactions on Privacy and Security</source>
          <volume>24</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          . doi: 10.1145/3448608.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
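          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Han</surname>
          </string-name>
          ,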
          <article-title>Improving real-world password guessing attacks via bi-directional transformers</article-title>
          , in:
          <source>Proceedings of the 32nd USENIX Security Symposium (USENIX Security 23)</source>
          , USENIX Association, Anaheim, CA,
          <year>2023</year>
          , pp.
          <fpage>1001</fpage>
          -
          <lpage>1018</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>