AI-based user identification method for web services

Ihor Zakutynskyi1,∗,†, Oleksandr Kalishuk1,†, Maksim Iavich2,†, Vitalii Nebylytsia1,† and Vasyl Yehunko1,†

1 National Aviation University, Liubomyra Huzara Ave. 1, Kyiv, 03058, Ukraine
2 Caucasus University, Paata Saakadze Str., 1, Tbilisi, 0102, Georgia

Abstract
In this paper, we introduce a universal method for identifying users of web services. The method analyzes the visitor's digital fingerprint with a neural network. We compared our method with existing fingerprint detection services. The test results indicate that the fingerprint identification accuracy of our method surpasses that of fingerprint.com by 3.1% on desktop platforms and 6.3% on mobile devices. Furthermore, our method significantly reduces the number of false positive errors, making user identification more robust to variations in browser and device parameters.

Keywords
digital fingerprint, user identification, neural network, LSTM

1. Introduction
In this paper, we present a universal method for identifying users of web services. It is based on a digital fingerprint computed from data collected both on the client side (using a JS library) and on the server side (from the client's HTTP request data), which is then analyzed by a neural network. The method we have developed for calculating and evaluating a set of parameters, using a neural network trained on a test database of users, achieves: 1) greater overall accuracy in user identification, 2) an extended lifespan of the digital fingerprint, 3) correct cross-browser user identification, 4) accurate user identification through VPN. Moreover, the user recognition process requires no significant computational resources, maintains a high identification speed, and has a low collision rate and high accuracy [1].
The neural network helps us identify hidden patterns in the parameters and reveal implicit associations among sets of parameters in visitors' digital fingerprints [2, 3]. At the same time, our method provides strong protection of the privacy and security of user data.

2. Background
A browser fingerprint or device fingerprint, together referred to as a digital fingerprint, is information collected about the software and hardware of a remote device for the purpose of identifying it.

CH&CMiGIN’24: Third International Conference on Cyber Hygiene & Conflict Management in Global Information Networks, January 24–27, 2024, Kyiv, Ukraine
∗ Corresponding author.
† These authors contributed equally.
ihor.zakutynskyi@nau.edu.ua (I. Zakutynskyi); akalishuk@gmail.com (O. Kalishuk); miavich@cu.edu.ge (M. Iavich); tet129@gmail.com (V. Nebylytsia); 7253362@stud.nau.edu.ua (V. Yehunko)
0000-0003-2905-3205 (I. Zakutynskyi); 0009-0008-1577-6473 (O. Kalishuk); 0000-0002-3109-7971 (M. Iavich); 0009-0000-0154-9909 (V. Nebylytsia); 0000-0002-5316-8996 (V. Yehunko)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2.1. Fingerprinting techniques
The technique of digital fingerprinting has existed for many years. The first mentions of techniques for obtaining and analyzing digital fingerprints appeared in the scientific literature in 2003 [4], and they have been widely studied since 2009 [5].
Since then, many different techniques for determining the digital fingerprint have been described:
• JavaScript-Based Fingerprints
• CSS-Based Fingerprints
• Canvas-Based Fingerprint
• Hardware and Software-Based Fingerprints
• Fingerprint Based on Audio API
• Plugin-Based Fingerprint
• TLS Fingerprint
• Other Browser Fingerprint Acquisition Technologies (correlation between the visitor's gaze and mouse movement; characteristics of the HTML parser; font sets (font glyphs); methods based on the execution time of a set of JavaScript scripts; user lag time on websites; the nature of user interaction with the touchpad; speed and specifics of typing on the keyboard; speed and direction of mouse movement).

In most cases, a digital fingerprint is obtained using a scheme in which code based on a special JS library is executed on the client side. The code performs a set of tests and checks defined by the library and sends the resulting parameters to the server. Usually, the server is deployed as a separate service (Figure 1).

Figure 1: General fingerprinting process.

All modern digital fingerprinting methods have both advantages and disadvantages.

2.2. Fingerprinting advantages and disadvantages
The main drawbacks of fingerprinting solutions include:
• Low user identification accuracy,
• Computation time for generating a digital fingerprint,
• Time required for matching with previously known digital fingerprints in the system,
• Short lifespan of a specific digital fingerprint,
• High device load on the user's end,
• Dependence on JavaScript,
• Challenges in computing a digital fingerprint in homogeneous environments (computer labs, internet cafes, mobile network environments),
• Cross-browser digital fingerprinting,
• Low accuracy in identifying users operating in incognito mode,
• Matching digital fingerprints over VPN.
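The short lifespan noted in the list above can be illustrated with a minimal Python sketch. It assumes a naive scheme in which the fingerprint is a hash over all collected parameters; the parameter names, the values, and the helper functions `exact_fingerprint` and `attribute_overlap` are hypothetical and are not taken from any particular library. A single changed parameter (here, the browser version after an update) yields a completely different hash, although most parameters still agree.

```python
import hashlib
import json


def exact_fingerprint(params: dict) -> str:
    """Hash a canonical JSON encoding of all collected parameters."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def attribute_overlap(a: dict, b: dict) -> float:
    """Fraction of parameters (over the union of keys) with identical values."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    same = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return same / len(keys)


# Hypothetical parameters of one device before and after a browser update.
before = {
    "platform": "Linux x86_64",
    "timezone": "Europe/Kiev",
    "language": "uk",
    "hardware_concurrency": 4,
    "device_memory_gb": 8,
    "browser_version": "Chrome 118",
}
after = dict(before, browser_version="Chrome 119")

hash_changed = exact_fingerprint(before) != exact_fingerprint(after)
overlap = attribute_overlap(before, after)
print(hash_changed)       # True: the exact-match fingerprint no longer matches
print(round(overlap, 3))  # 0.833: five of six parameters are unchanged
```

Matching on similarity rather than on an exact hash is one common way to tolerate such drift; the sketch only motivates the problem and does not reproduce the method proposed in this paper.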
In addition to the drawbacks of open digital fingerprinting solutions listed above, ready-made commercial services have further disadvantages:
• High cost,
• Closed source nature,
• Data stored on third-party servers,
• Dependence on the service provider.

In our assessment, there are currently no effective methods that reliably identify a user based on their digital fingerprint over an extended period, especially when the user works through a VPN, in incognito mode, or across several browsers.

2.3. The literature review
We reviewed research papers that address the problems of fingerprinting and user identification on the Internet.

In [6], the authors reviewed and classified the existing fingerprinting techniques and their applications for user identification on the Internet and analyzed in detail the development of different research directions in browser fingerprinting. Based on this analysis, they point out the problems faced by each research direction, summarize the achievements in browser fingerprint recognition, and outline trends for future development. The authors also discuss the privacy issues associated with the use of fingerprinting techniques.

The authors of [7] show that GPU information obtained using WebGL and other technologies can be used to create a unique device fingerprint suitable for user identification. At the same time, they note that changing GPU settings and parameters can change the device fingerprint, which makes identification more difficult.

In [8], the authors demonstrate the correlation between gaze and mouse movements and argue that it is a valuable source for obtaining browser fingerprints.
At the same time, the authors point out that collecting data on a person's gaze in the browser has drawbacks, such as inaccuracies when using a webcam and the requirement that users grant permission for camera access. The study also reveals that, on computers used by multiple users, browser statistics may malfunction and can no longer differentiate between individuals.

In [9], the authors analyze the popularity of the Transport Layer Security (TLS) protocol on the Internet and its use in censorship circumvention tools. The researchers collected and analyzed a large volume of real-world TLS traffic to identify the different implementations of TLS clients used on the Internet. Censors can use deep packet inspection (DPI) to identify and block such tools based on their TLS fingerprints. Many circumvention tools fail to properly mimic popular TLS implementations, which leads to their detection and blocking. To address this problem, the authors proposed a solution that allows developers to automatically mimic other popular TLS implementations. Using real-world data, they propose methods for flexibly adapting a TLS fingerprint to the dynamic TLS ecosystem with minimal manual effort.

The authors of [10] propose a new method for identifying mobile device users based on the analysis of touch dynamics, that is, the stable patterns of interaction between users and their mobile devices, including factors such as touch force, swipe speed, and touch duration. This method has shown excellent results, but its scope is limited to a subset of mobile devices and depends on the availability of APIs for interacting with physical device elements.

In [11], the authors propose a browser fingerprinting defense tool to anonymize users' browsers. The authors show that browser fingerprinting cannot be prevented by the user.
Although new methods that counter browser fingerprinting are constantly being developed, they cannot prevent it completely.

The article [12] presents new algorithms for encoding and comparing fingerprints, which focus on the values of parameters with low stability and low entropy.

2.4. Benefits of our method
The method proposed by the authors allows for:
• Improved accuracy in user identification under specified conditions,
• Reduced percentage of false positives,
• Increased lifespan of the calculated digital fingerprint,
• Maintenance of the speed of digital fingerprint identification at an industry-standard level.

All of these improvements are achieved through the implementation of a novel neural network training algorithm. The results of determining the digital fingerprint of a web service user form a nonlinear time series consisting of a set of browser and user device parameters that may vary over time [13, 14]. As the practice of the last 10 years shows, recurrent neural networks (RNN) are the most effective architecture for solving time series problems that cannot be solved by feedforward networks [15]. We performed comparative tests of the two most common RNN architectures, LSTM and GRU, following the methodology described in [16]. The results of the digital fingerprint accuracy tests are presented in Figure 2.

Figure 2: LSTM vs GRU comparison.

For our solution, we used the LSTM architecture, as it demonstrated significantly better results over a small number of training epochs (50–100 epochs). This implies that, with equal resource consumption, LSTM yields superior results, which can be expressed by: → !→ " #. (1)

3. Experiment
3.1. Competitor
Currently, the majority of systems for obtaining a digital fingerprint are based on the fingerprint.js library or incorporate some of its functions. This library, one of the earliest to emerge, continues to evolve and periodically incorporates new developments.
The library is under active development, and the project repository is frequently updated. As of December 2023, the latest version is 4 [17]. Starting from this version, the developer has changed the distribution terms, and it is now offered under the Business Source License 1.1. Currently, the FingerprintJS service is considered an industry standard. The service allows for the identification of numerous browser and operating system parameters. The key modules of the fingerprint.js library are outlined in Table 1.

Table 1
Key Modules of the fingerprint.js Library
Parameter             Function                  Type of returned value
Audio fingerprint     getAudioFingerprint()     number or Promise
Fonts                 getFonts()                string[]
Plugins               getPlugins()              string[]
Canvas                getCanvasFingerprint()    object
Touchscreen           getTouchSupport()         object
OS CPU                getOsCpu()                string | undefined
Languages             getLanguages()            string[][]
Color depth           getColorDepth()           number
Memory                getDeviceMemory()         number
Resolution            getScreenResolution()     [number | null, number | null]
Screen frame size     getRoundedScreenFrame()   [number | null, number | null, number | null]
Hardware concurrency  getHardwareConcurrency()  number | undefined
Time zone             getTimezone()             string
Session storage       getSessionStorage()       boolean
Local storage         getLocalStorage()         boolean
Indexed DB            getIndexedDB()            boolean | undefined
Open DB               getOpenDatabase()         boolean
CPU class             getCpuClass()             string | undefined
Platform              getPlatform()             string
Vendor                getVendor()               string
Vendor flavors        getVendorFlavors()        string[]
Cookie enabled        areCookiesEnabled()       boolean
Ad blockers           getDomBlockers()          Promise
Color gamut           getColorGamut()           string | undefined
Color inverted mode   areColorsInverted()       boolean | undefined
Colors forced         areColorsForced()         boolean | undefined
Monochrome depth      getMonochromeDepth()      number | undefined
Contrast              getContrastPreference()   number | undefined
Reduced motion        isMotionReduced()         boolean | undefined
HDR                   isHDR()                   boolean | undefined
Math calc             getMathFingerprint()      Record
Font width            getFontPreferences()      Promise
Video card (WebGL)    getVideoCard()            object | undefined
PDF viewer            isPdfViewerEnabled()      boolean
Architecture          getArchitecture()         number

The general algorithm of operation of the fingerprint.js library is presented in Figure 3.

Figure 3: General fingerprinting algorithm.

3.2. Neural network training
At the initial stage of preparing data for training the neural network, we have a multidimensional dataset about the user collected in the previous stage. To reduce time and computational cost, this multidimensional dataset is flattened into a one-dimensional vector, which the neural network receives as input. Next, after normalization, the data is randomly split into testing and training sets in a 30%/70% ratio. Based on the testing set, a prediction is made as to whether the visitor is already known to our service, and the prediction result is compared with the result obtained based on the predefined parameters of the model. The training process is illustrated schematically in Figure 4.

The initial training of the model was conducted using the "Login Data Set for Risk-Based Authentication" dataset from Kaggle [13]. This dataset includes a list of parameters associated with each login attempt. The structure of the dataset is presented in Table 2.

Figure 4: Neural network training algorithm.

Table 2
Dataset Structure
Characteristics                    Type                         Range or example
User Agent                         String                       Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36.
HTTP Accept Headers                String                       Access-Control-Allow-Origin: *, Cache-Control: max-age=604800, Content-Type: multipart/form-data, If-Unmodified-Since: Mon, 27 Nov 2023 12:43:00 EET
Language                           String                       uk
Screen Resolution                  Integer (Width x Height)     2073600
Timezone                           String                       Europe/Kiev
Browser Plugins                    List of Strings []           [PDF Viewer, Chrome PDF Viewer, Chromium PDF Viewer, Microsoft Edge PDF Viewer, WebKit built-in PDF]
Platform (Operating System)        String                       Linux x86_64
Browser Version                    String                       Chrome 119
Device Memory                      Integer (in gigabytes)       8
Canvas Fingerprint                 String (hashed or raw data)  93a13b9b08d18393f5c731f8f5c58a11
WebGL Vendor and Renderer          String                       WebKit WebGL
Cookies Enabled                    Boolean                      TRUE
Do Not Track (DNT) Header          Boolean                      FALSE
Fonts List                         List of Strings []           ["4274,142 default, cursive, fantasy", "4314,143 sans-serif, Arial, Arimo, Helvetica, Liberation Sans", "4249,142 serif", "3780,149 monospace", "4431,143 system-ui, Ubuntu", "4189,143 aakar"]
Audio Fingerprint                  String (hashed or raw data)  13b9b08d18393f5c731f8f5c58a116dcb
Hardware Concurrency               Integer                      4
Touch Support                      Boolean                      FALSE
Geolocation                        Boolean                      FALSE
Connection Speed                   String                       4g
Ad Blocker Detection               Boolean                      FALSE
Local IP Address                   String                       0.0.0.0 - 255.255.255.255
WebRTC Leak                        Boolean                      TRUE
Battery Level                      Float (percentage)           78.2
CPU Cores                          Integer                      4
Device Type                        String                       Desktop
Hash of User Identity Information  String (hashed)              52d84b11737d980aef856699f885ca86

3.3. Experiment conditions
To compare the effectiveness of the developed method with digital fingerprinting using the FingerprintJS service, we used a set of parameters from 2134 devices of different types (desktop computers, mobile devices, tablets) and a set of user agents generated with the npm package User-Agents [18]. User-Agents is a JavaScript package for generating random user agents based on how often they are used in a real environment.
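Generating user agents "based on how often they are used" amounts to frequency-weighted random sampling. The Python sketch below illustrates that idea only: it does not use or reproduce the API of the User-Agents npm package, and the agent labels, usage shares, and the function `sample_user_agents` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical user-agent labels and usage shares, for illustration only.
USAGE_SHARES: Dict[str, float] = {
    "desktop-chrome": 0.60,
    "mobile-safari": 0.30,
    "desktop-firefox": 0.10,
}


def sample_user_agents(
    shares: Dict[str, float], count: int, seed: Optional[int] = None
) -> List[str]:
    """Draw `count` user agents, each with probability proportional to its share."""
    rng = random.Random(seed)  # a local generator keeps runs reproducible
    agents = list(shares)
    weights = [shares[agent] for agent in agents]
    return rng.choices(agents, weights=weights, k=count)


sample = sample_user_agents(USAGE_SHARES, count=10_000, seed=42)
frequencies = Counter(sample)
# With enough draws, the observed frequencies approach the configured shares.
print(frequencies.most_common())
```

Fixing the seed makes the generated test population repeatable, which is convenient when two identification methods are to be compared on the same inputs.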
The generated data includes hard-to-find browser fingerprint properties, and powerful filtering capabilities allow the generated user agents to be constrained to fit specific needs. The experiment measuring the performance of the developed user identification method was carried out on an operating web service using the algorithm shown in Figure 5.

3.4. The results of the experiment
The results of the experiment are summarized in Tables 3–5. In each table, paired values are given as Developed / Standard.

Figure 5: Algorithm of the experiment.

Table 3
Comparison Results: Desktop
Platform          MacIntel     Linux        Windows      Android      Total
Total executions  226          188          871          329          1614
Accuracy, %       93.1 / 91.5  94.4 / 89.1  93.8 / 91.3  93.6 / 89.3  93.7 / 90.7
False positive    8 / 12       5 / 15       28 / 60      11 / 18      52 / 105
False negative    8 / 7        5 / 6        26 / 16      10 / 18      49 / 47
Duration, ms      59 / 78      71 / 69      54 / 55      77 / 82      61 / 65

Table 4
Comparison Results: Mobile
Platforms: iPhone, Linux, Android type 1, Android type 2, Android type 3. The extracted table header does not preserve which platform heads which column, so the five platform columns are numbered in their original left-to-right order.
Platform column   1            2            3            4            5            Total
Total executions  73           114          93           17           122          419
Accuracy, %       98.9 / 87.1  96.5 / 88.7  95.2 / 91.2  95.2 / 88.3  94.6 / 91.4  96.0 / 89.7
False positive    0 / 6        2 / 6        2 / 6        0 / 1        4 / 7        8 / 26
False negative    0 / 3        2 / 7        2 / 2        0 / 1        3 / 3        7 / 16
Duration, ms      156 / 152    164 / 168    181 / 164    160 / 138    143 / 146    160 / 157

Table 5
Comparison Results: Tablet
Platform          Android type 3  iPad         Android type 1  Total
Total executions  19              28           54              101
Accuracy, %       97.2 / 88.4     94.3 / 87.1  93.8 / 90.6     94.6 / 89.2
False positive    0 / 1           1 / 2        2 / 4           3 / 7
False negative    0 / 1           1 / 2        2 / 1           3 / 4
Duration, ms      92 / 96         110 / 121    99 / 104        101 / 107

4. Conclusions
The accuracy comparison data for digital fingerprint identification indicate that for desktop computers, the accuracy of the existing identification method (FingerprintJS) is 90.7%, while the accuracy of our developed method is 93.7%, representing a 3.1% improvement. For mobile devices, the accuracy of the existing user identification method (FingerprintJS) is 89.7%, whereas the accuracy of our developed method is 96%, an improvement of 6.3%. In the case of tablets, the accuracy of the existing identification method (FingerprintJS) is 89.2%, which is 5.4% lower than that of our developed method (94.6%). The weighted average accuracy of our method is 3.8% higher than that of the existing method (94.2% versus 90.4%).

The stability of the algorithm directly depends on reducing the percentage of false positives and false negatives in user identification. The stability of the algorithm can be determined using the equation

$$ S = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (2) $$

where S is the stability, TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.

Our method shows a lower number of false positive fingerprint identification results on all investigated platforms:
• Desktop computers: 52 versus 105,
• Mobile devices: 8 versus 26,
• Tablets: 3 versus 7.

The weighted average number of false positive errors for the developed method is 41.0, compared to 84.9 for the existing method.

The number of false negative results in digital fingerprint identification is comparable for both methods on all investigated platforms, with the developed method notably better only on mobile devices:
• Desktop computers: 49 versus 47,
• Mobile devices: 7 versus 16,
• Tablets: 3 versus 4.

The weighted average number of false negative errors for the developed method is 38.6, compared to 38.9 for the existing method. According to formula (2), with a decrease in the number of errors, the overall stability of the method increases.
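The weighted averages quoted above can be checked with a short Python sketch that weights each device class by its number of executions, using the "Total" columns of Tables 3–5. The helper names are ours. The function `stability_from_totals` additionally assumes that every execution is exactly one of TP, TN, FP or FN, so that TP + TN equals executions minus FP minus FN; this is an assumption made for illustration, since the tables do not report TP and TN separately.

```python
# Per-device-class totals from the "Total" columns of Tables 3-5:
# (executions, accuracy in %, false positives, false negatives).
DEVELOPED = {
    "desktop": (1614, 93.7, 52, 49),
    "mobile": (419, 96.0, 8, 7),
    "tablet": (101, 94.6, 3, 3),
}
STANDARD = {
    "desktop": (1614, 90.7, 105, 47),
    "mobile": (419, 89.7, 26, 16),
    "tablet": (101, 89.2, 7, 4),
}
ACCURACY, FALSE_POS, FALSE_NEG = 1, 2, 3  # positions inside each tuple


def weighted_average(rows: dict, index: int) -> float:
    """Average of one column, weighted by the number of executions."""
    total_executions = sum(row[0] for row in rows.values())
    return sum(row[0] * row[index] for row in rows.values()) / total_executions


def stability(tp: int, tn: int, fp: int, fn: int) -> float:
    """Equation (2): share of correct decisions among all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)


def stability_from_totals(executions: int, fp: int, fn: int) -> float:
    """Equation (2) under the assumption TP + TN = executions - FP - FN."""
    return (executions - fp - fn) / executions


for name, rows in (("developed", DEVELOPED), ("standard", STANDARD)):
    print(
        name,
        round(weighted_average(rows, ACCURACY), 1),
        round(weighted_average(rows, FALSE_POS), 1),
        round(weighted_average(rows, FALSE_NEG), 1),
    )
# developed 94.2 41.0 38.6
# standard 90.4 84.9 38.9
```

The printed values agree with the weighted averages reported in the text. Values produced by `stability_from_totals` depend on the stated assumption and need not coincide with the stability figures reported in the paper.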
Based on the results obtained, owing to a significant reduction in the number of false positive results, the stability of the developed method is 2% higher than that of the existing method. The number of false negative results is comparable, so it did not significantly affect the final comparison.

The duration of the identification process using the developed method ranges from 54 to 77 ms for desktop computers, from 143 to 181 ms for mobile devices, and from 92 to 110 ms for tablets. Based on the comparison results, it can be concluded that the speed of user identification using the developed method is comparable to that of existing modern methods.

The analysis of the obtained results shows that the developed method has higher accuracy on all investigated types of devices and platforms. Additionally, it exhibited a lower overall identification error rate and comparable speed in determining the digital fingerprint.

Declaration on Generative AI
The author(s) have not employed any Generative AI tools.

References
[1] J. S. Al-Azzeh, M. Al Hadidi, R. S. Odarchenko, S. Gnatyuk, Z. Shevchuk, Z. Hu, Analysis of self-similar traffic models in computer networks, International Review on Modelling and Simulations 10(5) (2017) 328–336. doi: 10.15866/iremos.v10i5.12009.
[2] M. Zaliskyi, R. Odarchenko, S. Gnatyuk, Y. Petrova, A. Chaplits, Method of traffic monitoring for DDoS attacks detection in e-health systems and networks, CEUR Workshop Proceedings 2255 (2018) 193–204. URL: https://ceur-ws.org/Vol-2255/paper18.pdf.
[3] V. Tkachuk, Y. Yechkalo, S. Semerikov, M. Kislova, Y. Hladyr, Using mobile ICT for online learning during COVID-19 lockdown, Communications in Computer and Information Science 1308 (2021) 46–67. doi: 10.1007/978-3-030-77592-6_3.
[4] A. Hintz, Fingerprinting websites using traffic analysis, in: R. Dingledine, P. Syverson (Eds.), Privacy Enhancing Technologies. PET 2002, volume 2482 of Lecture Notes in Computer Science, Springer, Berlin, 2003. doi: 10.1007/3-540-36467-6_13.
[5] J. R. Mayer, Any person a pamphleteer: Internet Anonymity in the Age of Web 2.0, Undergraduate Senior Thesis, Princeton University, 2009. URL: http://arks.princeton.edu/ark:/88435/dsp01nc580n467.
[6] D. Zhang, J. Zhang, Y. Bu, B. Chen, C. Sun, T. Wang, A survey of browser fingerprint research and application, Wireless Communications and Mobile Computing, 2022. doi: 10.1155/2022/3363335.
[7] DRAWNAPART: A Device Identification Technique based on Remote GPU Fingerprinting, 2022. URL: https://arxiv.org/abs/2101.03793.
[8] W. Fuhl, N. I. Sanamrad, E. Kasneci, The Gaze and Mouse. Signal as additional Source for User Fingerprints in Browser, 2022. URL: https://arxiv.org/abs/2101.03793.
[9] E. Wustrow, S. Frolov (University of Colorado Boulder), The use of TLS in Censorship Circumvention. doi: 10.14722/ndss.2019.23511.
[10] B. Pelto, M. Vanamala, R. Dave, Your Identity is Your Behavior -- Continuous User Authentication based on Machine Learning and Touch Dynamics, 2022. URL: https://arxiv.org/abs/2305.09482.
[11] D. Moad, V. Sihag, G. Choudhary, Fingerprint defender: Defense against browser-based user tracking, in: I. You, H. Kim, T.-Y. Youn, F. Palmieri, I. Kotenko (Eds.), Mobile Internet Security. MobiSec 2021, volume 1544 of Communications in Computer and Information Science, Springer, Singapore, 2021. doi: 10.1007/978-981-16-9576-6_17.
[12] M. Gabryel, K. Grzanek, Y. Hayashi, Browser Fingerprint Coding Methods Increasing the Effectiveness of User Identification in the Web Traffic, Journal of Artificial Intelligence and Soft Computing Research 10(4) (2020). doi: 10.2478/jaiscr-2020-0016.
[13] Login Data Set for Risk-Based Authentication, 2022. URL: https://www.kaggle.com/datasets/dasgroup/rba-dataset.
[14] O. I. Abiodun, A. Jantan, A. E. Omolara, K. V. Dada, N. A. Mohamed, H. Arshad, State-of-the-art in artificial neural network applications: A survey, Heliyon 4(11) (2018) e00938. doi: 10.1016/j.heliyon.2018.e00938.
[15] A. Lheureux, Feed-forward vs feedback neural networks, 2022. URL: https://blog.paperspace.com/feed-forward-vs-feedback-neural-networks/.
[16] L. V. Sibruk, I. V. Zakutynskyi, Recurrent Neural Networks for Time Series Forecasting. Choosing the best Architecture for Passenger Traffic Data, Automation and computer-integrated technologies 2(72) (2022) 38–44. doi: 10.18372/1990-5548.72.16941.
[17] GitHub, fingerprintjs/fingerprintjs: Browser fingerprinting library, 2023. URL: https://github.com/fingerprintjs/fingerprintjs.
[18] User Agents, 2022. URL: https://www.npmjs.com/package/user-agents.