TypoAlert: a browser extension against typosquatting

Francesco Blefari (1,2,*), Angelo Furfaro (1), Giovambattista Ianni (1) and Alessandro Viscomi
(1) University of Calabria, Rende (CS), Italy
(2) IMT School for Advanced Studies, Lucca (LU), Italy

Abstract
Web browsing is ubiquitous, and users routinely type website addresses by hand. Frequent typing leads to errors, and hence to the inadvertent input of incorrect domain names. Typosquatting is a malicious tactic that exploits such mistakes: attackers register domain names that differ only slightly from legitimate ones, so that a minor typing error redirects unwitting victims to entirely different or deceptively similar websites. While various techniques and tools have been developed to mitigate this threat, user-friendly tools for everyday web users are currently lacking. This paper describes TypoAlert, a Chrome-based extension engineered to fill this gap. TypoAlert analyzes the web domains that users visit, detects typosquatted ones, and alerts users in real time about their legitimacy.

Keywords
TypoSquatting, URL Hijacking, Privacy, Phishing

1. Introduction
In the current panorama of digital threats, cybersquatting is an illicit activity aimed at hijacking domain names that correspond to trademarks or famous personalities. Over time, this threat has evolved into the phenomenon known as typosquatting: a threat based on typing errors made by users when entering a URL into their browser. The attackers, called typosquatters, register domains that contain spelling errors with respect to legitimate domains, taking advantage of people's inevitable oversights.
This kind of attack is particularly effective when the reference domain is frequently visited, because even a small percentage of user typing errors generates a significant flow of traffic to typosquatted sites. Typosquatted sites are websites whose domain names are similar to legitimate domain names. They can host a wide range of content aimed at generating profit through advertising, and they often contain malicious elements and/or redirects to malicious websites. Usually, the attackers exploit typosquatted sites to conduct attack campaigns, such as phishing, or even to steal sensitive user information. Prior research [1] indicates that a considerable percentage, ranging from 10% to 20%, of manually entered URLs contain errors. For instance, an average user who erroneously inputs the URL of a popular website has a 1 out of 14 probability of landing on a typosquatted domain [1]. The consequences of typosquatting are profound and far-reaching. Companies suffer not only from traffic declines but also from subsequent financial losses, while users remain persistently vulnerable to potential online scams.

SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
* Corresponding author.
Email: francesco.blefari@unical.it (F. Blefari); angelo.furfaro@unical.it (A. Furfaro); ianni@unical.it (G. Ianni); a.viscomi00@gmail.com (A. Viscomi)
URL: https://blefari.xyz/ (F. Blefari); https://angelo.furfaro.dimes.unical.it/ (A. Furfaro); https://www.mat.unical.it/ianni/ (G. Ianni)
ORCID: 0009-0000-2625-631X (F. Blefari); 0000-0003-2537-8918 (A. Furfaro); 0000-0003-0534-6425 (G. Ianni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.
Despite the wealth of studies conducted in this domain [2], and the presentation of several prototype anti-typosquatting tools in the past, practical and effective solutions available to the general public are still lacking, particularly those offering real-time assistance to users. This paper presents the TypoAlert extension for Chrome-based browsers and shows how multiple anti-typosquatting methods can be combined into an integrated framework in order to implement an effective detection tool against typosquatting. Remarkably, as our methods are not machine learning-based, they do not require cycles of training on input datasets; thus the maintenance effort of TypoAlert is reduced to the bare minimum. Moreover, we assessed the effectiveness of TypoAlert on an appropriate set of domain names.

Section 2 summarizes previous studies on typosquatting and highlights the necessary background information. Section 3 briefly presents our typosquatting detection methodology and illustrates the experimental results obtained during the evaluation and validation phase. Section 4 presents the Chrome-based extension, its features and how it works. Section 5 draws the conclusions and indicates some future research directions.

2. Background and related work
The general term cyber-typosquatting might refer not just to domain name typosquatting but also to package typosquatting [3] and to other forms of typosquatting, like the exploitation of typing errors in mobile app names, social media names, etc. Typosquatting remains a widespread and persistent practice, primarily due to the lack of effective solutions to prevent it [1]. Research on the topic can be roughly categorized into: (i) general analyses; (ii) company-centric anti-typosquatting proposals; and (iii) user-centric anti-typosquatting research.

General studies.
The study presented in [4] identified over 8800 registered domains that were typographic variations of popular domain names; more than 90% of these redirected to sexually explicit content, often designed to make it hard to shut down the offending content. Initially, it was believed that shorter URLs were more susceptible to typosquatting [5]; however, [6] indicated that domains with longer names share a similar probability of being subject to typosquatting. Similarly, the popularity of domain names was originally seen as a factor related to typosquatting [5]. This assumption has also been revisited; indeed, a shift in typosquatters' behavior has been identified in [7]: around 95% of typosquatted domains now target less popular domains.

Over the years, various models for generating typosquatted domains have been proposed. Five primary models have been identified [8]: Missing-dot typos, Character-omission typos, Character-permutation typos, Character-substitution typos, and Character-duplication typos. A subsequent study [9] scrutinizes registered domains whose names have been generated according to each of the five described models, evaluating their saturation. This work also provided valuable insights into the level of awareness regarding the various typo domain generation models among distinct online entities. As shown in [9], both malicious and defensive registrations mirror the saturation trends. This implies that attackers and defenders share a similar perception of which typosquatted domains are worth registering. Existing models were extended by introducing additional approaches in [10].
These include: (i) 1-mod-inplace: involves replacing all domain name characters, one at a time, with every possible letter of the alphabet; (ii) 1-mod-deflate: entails removing, one at a time, all characters from the domain name; (iii) 1-mod-inflate: involves adding a character to the domain name, systematically considering all possible characters. All these generation models are based on the Levenshtein distance (also known as edit distance) [11, 12]. This metric quantifies the similarity between two strings, and is thus a crucial parameter for evaluating the similarity of typo domains to the original domain. However, when considering the character-permutation generation model, the more appropriate reference is the Damerau-Levenshtein distance [13], which differs from the plain Levenshtein distance by incorporating the transposition of adjacent characters in addition to the insertion, deletion, and substitution operations. In [2] it is highlighted that 99% of typosquatted sites exhibit a Damerau-Levenshtein distance of one from their target domains.

Company and user-centric anti-typosquatting tools.
The pioneering Strider Typo-Patrol tool [8] is meant for discovering large typosquatting campaigns. It employed a multifaceted approach, incorporating (i) a Typo-Neighborhood Generator to produce sets of URLs with potential typos, (ii) a Typo-Neighborhood Scanner to actively analyze domains and record information such as third-party URLs and page content, and (iii) a Domain-Parking Analyzer for in-depth analysis of typosquatted domains. The same work proposed Strider URL Tracer, an instrument meant to allow website owners to monitor typosquatted domains targeting their sites. A comprehensive and relatively recent analysis of typosquatting domain registrations within the .com TLD can be found in [7]. The analysis was conducted using the Yet Another Typosquatting Tool (YATT).
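The distance notions and the three 1-mod generation models recalled above can be illustrated with a short sketch. It is not taken from any of the cited tools: the function names `edit_distance` and `one_mod_variants` are hypothetical, the Damerau-Levenshtein variant shown is the restricted one (optimal string alignment), and the alphabet is assumed to be lowercase ASCII letters only.

```python
import string


def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between two strings.

    With transpositions=True this is the restricted Damerau-Levenshtein
    (optimal string alignment) distance; with False it is plain Levenshtein.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete the first i characters of a
    for j in range(cols):
        d[0][j] = j  # insert the first j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


# Assumption: only lowercase letters are tried as replacement/added characters.
ALPHABET = string.ascii_lowercase


def one_mod_variants(name: str) -> set:
    """Strings produced by the 1-mod-inplace, 1-mod-deflate and 1-mod-inflate models."""
    variants = set()
    for i, ch in enumerate(name):
        variants.add(name[:i] + name[i + 1:])                # 1-mod-deflate
        for c in ALPHABET:
            if c != ch:
                variants.add(name[:i] + c + name[i + 1:])    # 1-mod-inplace
    for i in range(len(name) + 1):
        for c in ALPHABET:
            variants.add(name[:i] + c + name[i:])            # 1-mod-inflate
    variants.discard(name)
    return variants
```

For the swapped-letter typo `gogole` of `google`, the plain Levenshtein distance is 2 while the variant with transpositions gives 1, which is why the latter is the better fit for the character-permutation model.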
Another approach was provided in the now defunct iTrustPage Firefox extension [14], which provided automated identification of legitimate web pages, utilizing user input and external sources such as search engine results, including whitelists and local caches. A browser extension called The Anti Typosquatting Tool (ATST) was proposed in [1]. It provided several features such as: (i) a User Customized Local Repository for monitoring popular domains, (ii) an Edit-distance Computation Module employing the Damerau-Levenshtein distance for typosquatted domain checks, and (iii) a User Customized Local Repository Update Module for dynamic updates based on user interactions. The Stop URL Typo-squatting (SUT) approach, proposed in [5], addresses the broader issue of detecting phony websites, whose domain name is not necessarily typosquatted. This solution integrates autonomous modules for: (i) network-level criteria that assess URL features (called the SUT-net module), and (ii) a site popularity assessment that leverages Google search results to evaluate domain legitimacy (called the SUT-pop module). Another tool worth mentioning is TypoWriter [15], which anticipates the most likely domain variations using Recurrent Neural Networks trained on DNS logs. Unfortunately, at the time of writing, none of the above-mentioned tools is still available on the web.

3. Detection Methodology
Behind our Chrome-based extension there is a detection algorithm that aims to classify the web domain at hand. Our tool aims to integrate several anti-typosquatting techniques and to provide real-time monitoring, detection and filtering software. We also included additional detection features, such as the identification of parked domains, i.e. domain names which are registered, yet are not used and/or are intended for malicious uses. The detection process takes as input a domain name $n$ and, after analysis, $n$ is classified.
The output is one of the following categories: NotTypo ($n$ is not a typosquat), ProbablyNotTypo, ProbablyTypo, Typo, ProbablyTypoPhishing, TypoPhishing, TypoMalware. The category is assigned according to a score ($av$, alert value) and to the value of a phishing indicator ($ph$), which are both obtained as outcomes of the evaluation step.

A pre-filtering step is performed using two lists: a blacklist ($BL$) and a whitelist ($WL$). The blacklist leverages the BlackBook list, a historical (black)list of malicious domains created as part of the periodic automated heuristic check (i.e. WHOIS, HTTP, etc.) of newly reported entries from public lists of malicious URLs [16]. The BlackBook blacklist is used to check whether the domain at hand is considered malware; if so, the domain is marked as a TypoMalware domain. Let $vd$ be the domain name eventually reached from $n$ after following a potential chain of redirects. If $n \in WL$ or $vd \in WL$, we give $n$ the minimum alert value, i.e. 0, classifying it as NotTypo.

The $WL$ list is constructed from a Top Domain Repository (TDR), and can be extended with further domains through the User Domain Repository (UDR); the latter can be populated directly from the web page of the extension. The TDR, which can reasonably be assumed to be reliable and authentic, is built from the top domains provided by DataForSEO [17]. The DataForSEO website allows exporting data concerning the top 1000 national web domains for each of the 74 distinct nations available, as well as the 1000 web domains with the highest ranking worldwide. We added to $WL$ all top domains present on the DataForSEO website (for a total of around 32000 distinct domain names) and the trusted domains added by the user. The TDR cannot be modified by the user, who can however customize the complementary UDR, which is initially empty. Afterwards, we build a set $CT$ of candidate targets.
𝐢𝑇 is built by considering each element having DL-distance equals to 1 from 𝑛 taken from: (i) π‘Š 𝐿, (ii) the top 10 domain names resulting by querying a search engine with 𝑛 as the search keyword and whenever available, (iii) the domain name dym ("Did you mean?" domain), i.e. the domain name suggested by the search engine at hand as the inferred correct search keyword. Once the 𝐢𝑇 list is built, the evaluation step starts by computing the Parking Alert (PARKA) indicator which is set to either 0 or 1 according to an analysis based on a set of keyphrases, in different speaking languages, usually present in parked web pages. Then for each element 𝑐𝑑 ∈ 𝐢𝑇 we evaluate the Top 10 Alert (T10A) indicator, the Did You Mean Alert (DYMA) indicator and the Phishing Alert (PHA) indicator. The T10A indicator considers the result list obtained by querying the input domain 𝑛 on a search engine; we compute the T10A𝑐𝑑 score. This indicator returns: (i) 1 if 𝑛 is present in the resulting list; (ii) -1 if 𝑛 is not present in the resulting list; (iii) 0 in all other cases. The DYMA indicator is based on the concept of domain popularity, and it exploits the suggested ProbablyTypo PHA = 0 is NotTypo ProbablyTypo PHA = 1 ProbablyTypoPhishing av > 1 n in WL Candidate Typo isPresent(ctypo) av = 0 ProbablyNotTypo n in BL Typo PHA = 0 av = 1 TypoMalware is Typo PHA = 1 TypoPhishing Figure 1: Labelling Algorithm sites coming from a search engine about possible typing errors in 𝑛. Similarly to the previous indicator, it returns a score that we call DYMA𝑐𝑑 that is set to 1 if 𝑛 triggers the suggestion of 𝑐𝑑 in the search engine and is set to 0 otherwise. Last but not least, there is the PHA𝑐𝑑 indicator that evaluates the similarity degree between the web page related to the input domain 𝑛 and the web page related to 𝑐𝑑. This evaluation is carried out using fuzzy hashing [18] and returns the score value 0 or 1. 
Based on the above indicators, the alert value $av$ is computed as follows:

$$
av = \begin{cases}
0 & \text{if } n \in WL \\
7 & \text{if } n \in BL \\
2 + \mathrm{PARKA} + av|_{CT} & \text{otherwise}
\end{cases}
$$

where $av|_{CT} = \max_{ct \in CT} \{ \mathrm{T10A}_{ct} + \mathrm{DYMA}_{ct} + \mathrm{PHA}_{ct} \}$. Along with $av$ we obtain the phishing alert ($ph$) value as $\mathrm{PHA}_{ct^*}$, where $ct^*$ is one of the arguments for which $av|_{CT}$ is reached and for which PHA is maximal, i.e.

$$
ct^* = \arg\max_{ct \in CT} \{ \mathrm{PHA}_{ct} \mid \mathrm{T10A}_{ct} + \mathrm{DYMA}_{ct} + \mathrm{PHA}_{ct} = av|_{CT} \}.
$$

Finally, in the last step (see Figure 1) we label $n$ according to $av$ and $ph$: for $av = 0$, we assign the label NotTypo; for $av = 1$, we assign the label ProbablyNotTypo; for $av = 2$, we assign either the label ProbablyTypo or ProbablyTypoPhishing, depending on whether $ph$ is 0 or 1, respectively; for $av = 7$, we assign the label TypoMalware; while for any value $av \in [3, 6]$ we assign either TypoPhishing if $ph = 1$ or Typo if $ph = 0$.

To assess the effectiveness of the classification techniques on which TypoAlert is based, we conducted an evaluation utilizing a purposely constructed dataset, named $TS$, including a set of potential typosquatted domains. To build the ground truth, each domain $d \in TS$ has been manually analyzed and classified as being or not being a typosquatted domain. Then we compared the results with the outcomes achieved by our classifier.

Figure 2: (a) ROC curve (True Positive Rate (TPR) against False Positive Rate (FPR), with the random-guess line); (b) Confusion matrix (rows: manual values; columns: predicted values; NotTypo row: 567, 45; Typo row: 1, 4493).
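The scoring and labelling rules of the detection methodology can be summarized in a minimal sketch. It assumes that the per-candidate indicator values have already been computed and are supplied by the caller; the names `Indicators`, `alert_value` and `label` are hypothetical. Two choices are assumptions of the sketch rather than statements from the text: the whitelist is checked before the blacklist (following the order of the cases in the formula), and an empty candidate set is rejected, since the maximum over an empty $CT$ is not defined above.

```python
from dataclasses import dataclass
from typing import Mapping, Set, Tuple


@dataclass(frozen=True)
class Indicators:
    """Indicator values for one candidate target ct."""
    t10a: int  # -1, 0 or 1
    dyma: int  # 0 or 1
    pha: int   # 0 or 1

    @property
    def total(self) -> int:
        return self.t10a + self.dyma + self.pha


def alert_value(n: str, whitelist: Set[str], blacklist: Set[str], parka: int,
                per_candidate: Mapping[str, Indicators]) -> Tuple[int, int]:
    """Return (av, ph) for the domain n."""
    if n in whitelist:   # assumption: WL is checked first, as in the formula
        return 0, 0
    if n in blacklist:
        return 7, 0
    if not per_candidate:
        # Assumption: the empty-CT case is left to the caller.
        raise ValueError("candidate target set CT is empty")
    av_ct = max(ind.total for ind in per_candidate.values())
    # ph is the largest PHA among the candidates that reach the maximum.
    ph = max(ind.pha for ind in per_candidate.values() if ind.total == av_ct)
    return 2 + parka + av_ct, ph


def label(av: int, ph: int) -> str:
    """Map the alert value and the phishing indicator to a category."""
    if av == 0:
        return "NotTypo"
    if av == 1:
        return "ProbablyNotTypo"
    if av == 2:
        return "ProbablyTypoPhishing" if ph == 1 else "ProbablyTypo"
    if 3 <= av <= 6:
        return "TypoPhishing" if ph == 1 else "Typo"
    if av == 7:
        return "TypoMalware"
    raise ValueError("alert value out of range: %d" % av)
```

For instance, a single candidate with T10A = 1, DYMA = 1 and PHA = 0 on a non-parked page yields $av = 2 + 0 + 2 = 4$ and $ph = 0$, hence the label Typo.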
To build the $TS$ dataset we started from the set $Top$, comprising the top 1000 websites globally ranked on Google, as per DataForSEO [17]. We extracted a subset of 300 domains by uniformly sampling $Top$ and, using the open source tool ail-typo-squatting [19], we built a set containing all domain names having a Damerau-Levenshtein distance equal to 1 from the name of a sampled domain. Then we extracted the subset of all domain names $d$ such that (i) $d$ was actually registered in a DNS at the time of construction of the dataset; and (ii) there was an active web server responding (directly or indirectly) to HTTP(S) requests made to $d$. The resulting dataset $TS$ includes 5106 potential typo domains.

During the evaluation phase we analyzed the accuracy of the tool by comparing its output with the manually obtained ground truth. During the manual classification we labelled domains as (i) Typo: designated for domains considered malicious; (ii) NotTypo: assigned to either a legitimate domain or a domain that redirects to the legitimate domain. Note that, to mitigate the role of human subjectivity in manual annotations, we opted for building binary ground truth values. However, since TypoAlert produces a score value between 0 and 7, data have been validated by mapping our scores to the binary ground truth. We consider an aggregation threshold $t$, and we build a family of binary classifiers, each defined by the two classes $NotTypo_t = \{x \in TS \mid s(x) < t\}$ and $Typo_t = TS \setminus NotTypo_t$, where $s(x)$ denotes the score assigned to $x$. We identified the classifier that maximizes the TPR/FPR ratio (True Positive Rate divided by False Positive Rate) as the one obtained for $t = 2$. The Receiver Operating Characteristic (ROC) curve shows the trade-off between True Positive Rate and False Positive Rate of the classifiers obtained for the various score thresholds, as depicted in Figure 2a.
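The family of threshold classifiers and the corresponding ROC points can be sketched in a few lines. The data below are toy values invented purely for illustration and are unrelated to the $TS$ dataset; the names `rates`, `roc_points`, `SCORES` and `TRUTH` are hypothetical, and returning 0.0 when a class is empty is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple


def rates(scores: Sequence[int], truth: Sequence[bool], t: int) -> Tuple[float, float]:
    """TPR and FPR of the classifier that predicts Typo when the score is >= t."""
    tp = fp = tn = fn = 0
    for s, is_typo in zip(scores, truth):
        predicted_typo = s >= t  # NotTypo_t is the set of items with score < t
        if predicted_typo and is_typo:
            tp += 1
        elif predicted_typo:
            fp += 1
        elif is_typo:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # assumption: 0.0 if no positives
    fpr = fp / (fp + tn) if fp + tn else 0.0  # assumption: 0.0 if no negatives
    return tpr, fpr


def roc_points(scores: Sequence[int], truth: Sequence[bool],
               thresholds: Sequence[int] = range(0, 9)) -> List[Tuple[int, float, float]]:
    """One (t, TPR, FPR) triple per threshold; scores range from 0 to 7."""
    return [(t,) + rates(scores, truth, t) for t in thresholds]


# Toy data (not from the paper): alert values and manual Typo/NotTypo labels.
SCORES = [0, 1, 2, 1, 3, 4, 7, 2]
TRUTH = [False, False, False, True, True, True, True, True]

for t, tpr, fpr in roc_points(SCORES, TRUTH):
    print(t, round(tpr, 3), round(fpr, 3))
```

Each threshold yields one point of the ROC curve; the two extremes, a threshold of 0 (everything labelled Typo) and a threshold of 8 (nothing labelled Typo), correspond to the corners (1, 1) and (0, 0).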
Figure 2b depicts the confusion matrix for $t = 2$, where 5060 out of 5106 domains (about 99%) were correctly classified.

4. The extension
We made several design choices in developing TypoAlert. First of all, since our software is a browser extension, we chose to support all Chrome-based browsers. TypoAlert aims to improve the user experience in browsing the web without being intrusive. To this end, once installed in the browser, TypoAlert shows, as its only visible additional feature, an icon in the dedicated extension section. This icon changes its color based on the website present on the active tab. The colors have been chosen to give users a rapid assessment of the kind of domain they are visiting, as shown in Figure 3. Given a domain name $n$, the TypoAlert icon can assume a different color: (i) Blue: if the analysis has not started yet; (ii) Dark-Red: if $n$ is marked as TypoMalware or TypoPhishing; (iii) Red: if $n$ is marked as Typo; (iv) Yellow: if $n$ is marked as ProbablyTypo or ProbablyTypoPhishing; (v) Green-Yellow: if $n$ is marked as ProbablyNotTypo; (vi) Green: if $n$ is marked as NotTypo.

Figure 3: Different colours of the toggle extension.

Different result values returned by the analysis involve different alert notifications, or none. If the extension's analysis indicates a label among ProbablyTypo, Typo, TypoPhishing, ProbablyTypoPhishing or TypoMalware (colors from yellow to dark red), an alert appears, warning the user about the detected severity level. When the analysis returns NotTypo or ProbablyNotTypo, no alert is given and the user is allowed to visit the related web page; in this case the TypoAlert icon becomes either Green or Green-Yellow. If a typosquatting attempt is detected, the extension's icon becomes red and an alert about the domain classification is shown. In Fig.
4a we show the alert that appears when the domain $n$ is a typo and is visited for the first time. As highlighted before, the Phishing Alert indicator evaluates whether a web domain is malicious and aims to conduct a phishing attack. If the Phishing Alert indicates that a web domain is a possible phishing domain, the user is notified through a specific pop-up alert highlighting this special (malicious) kind of web domain. Moreover, we added to the extension a caching mechanism that avoids multiple evaluations of the same site. Domain names classified as typosquatted are retained in the extension cache. If a domain name $b$ has been classified as typosquatted in the last 30 days, the web page of $b$ is blocked and replaced by a notification page, as depicted in Fig. 4b. Users can always access the extension's options and add misclassified domains to the verified user whitelist, excluding them from the analysis.

Figure 4: TypoAlert notification. (a) Alert popup for a typosquat domain; (b) Notification page for a typosquat domain.

5. Conclusions and future work
In this paper we presented TypoAlert, a tool for detecting typosquatted sites that, by combining some of the simplest known yet provably effective practices, is able to detect a relevant number of typosquatted domains. The validation phase confirms the effectiveness of the approach. As future work, we are planning to enrich TypoAlert with features that tackle typosquatting from an even more user-centric perspective, in the spirit of dynamic skins [20]. TypoAlert has been released under the LGPL license and can be downloaded from https://github.com/aleviscomi/typoalert.

Acknowledgments
This work was partially supported by projects SERICS (PE00000014) and FAIR (PE0000013) under the MUR National Recovery and Resilience Plan funded by the European Union - NextGenerationEU.

References
[1] G. Chen, M. F. Johnson, P. R. Marupally, N. K. Singireddy, X. Yin, V.
Paruchuri, Combating typo-squatting for safer browsing, in: 2009 International Conference on Advanced Information Networking and Applications Workshops, 2009, pp. 31–36. doi:10.1109/WAINA.2009.98.
[2] J. Spaulding, S. Upadhyaya, A. Mohaisen, The landscape of domain name typosquatting: Techniques and countermeasures, in: 2016 11th International Conference on Availability, Reliability and Security (ARES), 2016, pp. 284–289. doi:10.1109/ARES.2016.84.
[3] M. Taylor, R. Vaidya, D. Davidson, L. De Carli, V. Rastogi, Defending against package typosquatting, in: Network and System Security: 14th International Conference, NSS 2020, Melbourne, VIC, Australia, November 25–27, 2020, Proceedings, Springer-Verlag, Berlin, Heidelberg, 2020, pp. 112–131. doi:10.1007/978-3-030-65745-1_7.
[4] B. Edelman, Large-scale registration of domains with typographical errors, https://cyber.harvard.edu/archived_content/people/edelman/typo-domains/, 2003. Harvard University.
[5] A. Banerjee, M. S. Rahman, M. Faloutsos, Sut: Quantifying and mitigating url typosquatting, Computer Networks 55 (2011) 3001–3014. doi:10.1016/j.comnet.2011.06.005.
[6] T. Moore, B. Edelman, Measuring the Perpetrators and Funders of Typosquatting, Springer Berlin Heidelberg, 2010, pp. 175–191. doi:10.1007/978-3-642-14577-3_15.
[7] J. Szurdi, B. Kocso, G. Cseh, J. Spring, M. Felegyhazi, C. Kanich, The long "Taile" of typosquatting domain names, in: 23rd USENIX Security Symposium (USENIX Security 14), USENIX Association, San Diego, CA, 2014, pp. 191–206. URL: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/szurdi.
[8] Y.-M. Wang, D. Beck, J. Wang, C. Verbowski, B. Daniels, Strider Typo-Patrol: Discovery and analysis of systematic Typo-Squatting, in: 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI 06), USENIX Association, San Jose, CA, 2006, p. 6.
URL: https://www.usenix.org/conference/sruti-06/strider-typo-patrol-discovery-and-analysis-systematic-typo-squatting.
[9] P. Agten, W. Joosen, F. Piessens, N. Nikiforakis, Seven months' worth of mistakes: A longitudinal study of typosquatting abuse, in: Proceedings 2015 Network and Distributed System Security Symposium, NDSS 2015, Internet Society, 2015, p. 13. doi:10.14722/ndss.2015.23058.
[10] A. Banerjee, D. Barman, M. Faloutsos, L. N. Bhuyan, Cyber-fraud is one typo away, in: IEEE INFOCOM 2008 - The 27th Conference on Computer Communications, 2008, pp. 1939–1947. doi:10.1109/INFOCOM.2008.258.
[11] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet physics. Doklady 10 (1965) 707–710. URL: https://api.semanticscholar.org/CorpusID:60827152.
[12] F. J. Damerau, A technique for computer detection and correction of spelling errors, Commun. ACM 7 (1964) 171–176. doi:10.1145/363958.363994.
[13] G. Navarro, A guided tour to approximate string matching, ACM Comput. Surv. 33 (2001) 31–88. doi:10.1145/375360.375365.
[14] T. Ronda, S. Saroiu, A. Wolman, Itrustpage: a user-assisted anti-phishing tool, SIGOPS Oper. Syst. Rev. 42 (2008) 261–272. doi:10.1145/1357010.1352620.
[15] I. Ahmad, M. A. Parvez, A. Iqbal, Typowriter: A tool to prevent typosquatting, in: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), volume 1, 2019, pp. 423–432. doi:10.1109/COMPSAC.2019.00068.
[16] M. Stampar, Blackbook: a historical (black)list of malicious domains, https://github.com/stamparm/blackbook, 2024.
[17] DataForSEO, Top 1000 websites by ranking keywords, https://dataforseo.com/free-seo-stats/top-1000-websites, 2024.
[18] F. Breitinger, B. Guttman, M. McCarrin, V. Roussev, D. White, Approximate matching: definition and terminology, National Institute of Standards and Technology, 2014. doi:10.6028/nist.sp.800-168.
[19] AIL project, Ail-typo-squatting, https://github.com/typosquatter/ail-typo-squatting, 2023.
[20] R. Dhamija, J. D. Tygar, The battle against phishing: Dynamic security skins, in: L. F. Cranor (Ed.), Proceedings of the 1st Symposium on Usable Privacy and Security, SOUPS 2005, Pittsburgh, Pennsylvania, USA, July 6-8, 2005, volume 93 of ACM International Conference Proceeding Series, ACM, 2005, pp. 77–88. doi:10.1145/1073001.1073009.