A Natural Language Processing-based Approach for Cyber Risk Assessment in the Healthcare Ecosystems

Stefano Silvestri (1,*), Giuseppe Tricomi (1,2,3), Giuseppe Felice Russo (1) and Mario Ciampi (1)

(1) Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), via Pietro Castellino 111, Naples, 80131, Italy
(2) Università degli Studi di Messina, Contrada di Dio 1, Messina, 98166, Italy
(3) CINI - Consorzio Interuniversitario Nazionale per l'Informatica, Via Ariosto 25, Roma, 00185, Italy

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
Email: stefano.silvestri@icar.cnr.it (S. Silvestri); giuseppe.tricomi@icar.cnr.it (G. Tricomi); giuseppefelice.russo@icar.cnr.it (G. F. Russo); mario.ciampi@icar.cnr.it (M. Ciampi)
ORCID: 0000-0002-9890-8409 (S. Silvestri); 0000-0003-3837-8730 (G. Tricomi); 0009-0001-2090-9647 (G. F. Russo); 0000-0002-7286-6212 (M. Ciampi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

Abstract

The cyber risk in the healthcare sector is constantly increasing, due to the wide adoption of digital services formed by a complex interconnection of different systems and technologies, which offers a larger attack surface to attackers. Therefore, the risk assessment of the assets involved in these services is crucial to prevent and mitigate possible critical consequences, which could also affect the health of the patients. A large source of constantly updated information about threats and vulnerabilities of the assets of the healthcare ecosystems is available in natural language text on the Internet (cyber security news, forums, social media, etc.), but it is not easy to fully exploit it for a risk assessment process, due to the complexity of natural language. This paper proposes an AI-based approach for the individual risk assessment of the assets of digital healthcare systems based on the use of NLP and Knowledge Bases, which exploits the information extracted from natural language news from the web. The methodology has been developed within the activities of the EC-funded H2020 AI4HEALTHSEC project, where it has also been successfully tested in real-world scenarios. Moreover, the datasets collected have been made publicly available on the SoBigData research infrastructure.

Keywords

Natural Language Processing, Large Language Models, Cyber Threats, Cyber Vulnerabilities, Impact Assessment, Cyber Risk Assessment

1. Introduction

The healthcare ecosystem is rapidly adopting a growing number of recent technologies, such as Internet of Things (IoT), wearable and implantable devices, Picture Archiving and Communication System (PACS), Electronic Health Records (EHRs), DICOM images, and others, interconnected to realise and offer innovative healthcare digital services. Their adoption and use improve the quality of service to patients, and support and ease the work of physicians and medical professionals. On the other hand, this complex and dynamic interconnection of several systems offers a larger attack surface to threat actors interested in attacking the system by exploiting the existing vulnerabilities [1]. Combined with the low level of cyber risk awareness among healthcare personnel [2], this often causes dramatic impacts on the healthcare ecosystem [3]. For example, a cyber-attack on an insecure PACS server could lead to the web exposure of sensitive patient information, and an attack on the remote monitoring software of a medical device could damage the equipment of the hospital or change the configuration of the device [4]. This sector has recently suffered several serious cyber attacks: for example, in 2017 and 2021 there were ransomware attacks on the U.K. National Health Service (NHS) and on Ireland's Department of Health and Health Service Executive, respectively [5]. Furthermore, inherent vulnerabilities have been found in some medical devices, such as B. Braun's infusion pump and Medtronic's insulin pump [3]. Finally, approximately 90% of healthcare organisations experienced a data breach in 2018 [6]. For these reasons, it is necessary to study the most frequent attacks in healthcare to make the services offered more secure and resilient [4, 7]. Due to the complexity of the healthcare ecosystems, performing an effective cyber risk assessment can help to limit and prevent cyber security incidents [8]. The cyber risk assessment process has the purpose of identifying, evaluating, and prioritising security risks to the assets of an organisation, making it possible to take the most appropriate actions to mitigate the risks and the vulnerabilities.

The Internet is a constantly updated source of threat, incident, and vulnerability-related information for healthcare ecosystem assets, in the form of unstructured Natural Language (NL) within blogs, specialized Cyber-Security (CS) websites, social media, Knowledge Bases (KBs) and others. Although these sources contain crucial information about risk management and assessment, it is difficult to fully leverage them, due to the inherent complexity of NL (polysemy, irony, long and complex sentences, non-standardized abbreviations, acronyms). Therefore, extracting relevant information from this mass of data becomes a demanding task [9]. Information extraction from NL text is currently addressed in the literature with AI-based Natural Language Processing (NLP) models, usually implementing Named Entity Recognition (NER) systems [10, 11, 12, 13] that use Large Language Models (LLMs) and CS KBs. However, the literature pays little attention to analyzing and prioritizing the threats and vulnerabilities that most frequently affect healthcare. In this context, this paper extends the ideas previously presented in [14, 15, 16], combining NLP-based threat and vulnerability approaches to define an impact and risk assessment for the healthcare ecosystems, and evaluating it by exploiting CS textual sources available on the Internet. It presents the final NLP cyber risk assessment methodology developed within the activities of the EC-funded H2020 AI4HEALTHSEC research project, as well as the collection of a textual CS dataset related to the "SoBigData.it" research project.

The paper is organized as follows: Section 2 outlines the most recent and related studies in the literature; Section 3 describes the details of the proposed approach; Section 4 shows the implementation of the proposed solution, a description of the datasets used, and the research project where the approach was tested in real-world scenarios. Finally, Section 5 provides conclusions and future works.

2. Related Works

There are several recent works in the literature dealing with risk assessment and CS information extraction from NL documents. The authors of [8] reviewed and compared different generic cyber risk assessment frameworks in the healthcare field, discussing their assessment methodology and the limitations associated with them. A threat and mitigation model tailored for IoT health devices is presented in [17], combining the STRIDE and DREAD models: threats are identified using the STRIDE model on the device access points, and then ranked using DREAD. This approach is suitable for both the designers and the users of health IoT devices.

The security and privacy challenges in Medical Cyber-Physical Systems (MCPS) are discussed in [18], highlighting that trust and threat models usually consider MCPS stakeholders, including healthcare practitioners, system administrators and non-medical staff, with incorrect levels of trust. Also, in [2], the issues related to the CS awareness of the healthcare personnel are underlined, reviewing the existing gaps in the CS strategies adopted by healthcare organizations and the risk assessment methodologies adopted. The authors demonstrated that in this domain there is often a lack of adequate training for healthcare workers and a lack of specialized figures, such as a chief information officer, highlighting the need to have security protocols updated to the latest standards.

AI-based information extraction from CS textual documents has also been recently developed and presented in the literature. SecureBERT, presented in [13], is a Bidirectional Encoder Representations from Transformers (BERT) model trained on large CS-domain NL corpora, which outperforms other similar models in NLP tasks in the CS domain. The authors of [10] collected a large corpus of labeled sequences from the documentation of Industrial Control Systems devices to pre-train and fine-tune a BERT language model, named CyBERT. The authors of [12] proposed another interesting CS NER system, which exploits an architecture based on BERT, an LSTM, Iterated Dilated Convolutional Neural Networks (ID-CNNs), and a Conditional Random Field to improve the obtained performance.

The main innovation of the proposed approach is the use of CS information extracted from NL texts to calculate the threat, vulnerability, and impact levels, from which the risk assessment for the various assets involved in digital healthcare services is finally obtained.

3. Methodology

The proposed risk assessment methodology is composed of the following five steps: i) Healthcare Ecosystem Assets Identification and Categorisation; ii) Threat Identification and Assessment; iii) Vulnerability Assessment; iv) Impact Assessment; and v) Risk Assessment.

3.1. Healthcare Ecosystem Assets Identification and Categorisation

The preliminary step of the methodology provides a list of the assets of the considered complex digital healthcare system by identifying the services involved and their assets, with the final purpose of measuring their criticality within the healthcare system. For instance, the assets of a remote patient consultation service could include a Database, a Linux Server, communication software, and a web server. After their identification, the assets are also categorized, using the Common Platform Enumeration (CPE, https://nvd.nist.gov/products/cpe) catalogue to map them to the corresponding area (based on their type) and category (depending on their functionalities), as shown in Table 1. This step allows us to understand the importance of each asset within the ecosystem and to provide a list of the assets that require risk assessment.

Table 1. Asset areas and categories.

| Area | Name |
|---|---|
| 1 | User interactions with implants and sensors |
| 2 | Medical equipment and IT devices |
| 3 | Services and processes |
| 4 | Interdependent HCIIs – Ecosystem |

| Category | Functionalities |
|---|---|
| Influence | Found in most organizations |
| Type | Software, hardware, Operating System (OS), Information |
| Sensitivity | Restricted, unrestricted |
| Criticality | Essential, required, deferrable |

These classifications are used to evaluate the criticality of each asset of the healthcare system, by measuring the dependency level that an asset has with other system components. We defined four dependency levels:

• Independent: the asset has a distinct operation and exhibits no dependency on other assets. If the asset fails, no cascading events occur.
• Incoming dependency: another asset uses the data or functionality of this asset. If such an asset fails, the operation of all related assets that use its data or functionality may be disrupted.
• Outgoing dependency: the asset uses the data or functionality of another asset. Therefore, if the latter asset fails, the operation of the former asset will be affected as well.
• Coupling relationship: two assets have both incoming and outgoing dependencies. Failures in one of the assets will therefore affect the functionality of the other.

Thus, the criticality level of an asset can be determined by the number of services and relevant business flows it participates in. Specifically, the General Asset Criticality level based on running services (GAC) is calculated as the weighted summation of the asset's interdependencies, normalized by the total number of services in the examined healthcare ecosystem. The Asset Criticality for a specific service (ACS) is then equal to the GAC value divided by the number of relevant/redundant assets that co-exist in the service. Finally, based on the ACS range values, it is possible to assign a criticality level to each asset, as shown in Table 2.

Table 2. Asset Criticality Levels.

| ACS Value Range | Asset Criticality Level |
|---|---|
| [0,1] | Low |
| (1,2] | Medium |
| (2,3] | High |

3.2. Threat Identification and Assessment

Once the assets have been identified, the next step aims to assess the threats that could affect those assets, following the approach previously described in [14, 15, 16]. Firstly, a threat identification phase is performed by exploiting the Common Attack Pattern Enumeration and Classification (CAPEC, https://capec.mitre.org), which also provides a detailed set of the characteristics of the threats, such as Likelihood of Attack, Related Attack Patterns, Execution Flow, Prerequisites and others. In this way, we obtain a distinct list of threats for each asset that operates in the considered healthcare service/system (identified in the previous step). Each threat also includes the CAPEC ID, a CAPEC category that will be used to rate the threat, and the corresponding characteristics.

Then, it is possible to assess the threats, assigning them a severity level. Our methodology exploits the NL history of reported incidents related to those threats, extracted with an AI-based NLP approach from large CS domain collections available online, such as forums, social media, news, and others. In detail, we use a NER architecture based on SecureBERT [13], a BERT model pre-trained on a very large CS domain text collection (more than 2.2 million documents), preprocessed with a CS customized tokenizer, and fine-tuned for the NER task, to extract the mentions of the threat and asset pairs found in each sentence of the NL source. In this case, we produced a custom training set, annotated with the entity types of interest (Asset and Threat) using the semi-supervised approach described in [19]. Then, the threat level is calculated based on the percentage of occurrence of the mentions of that threat within the considered dataset, following the ranges shown in Table 3. The assessment is finally performed through a mapping between the assets of the services of the healthcare system and the asset and threat pairs with the corresponding threat level.

Table 3. Threat Levels and corresponding percentage of occurrence.

| Threat Level | Occurrence Percentage | Description |
|---|---|---|
| Very High | [80-100] | Severe impact on critical services and assets |
| High | [60-80] | Significant impact on critical services and assets |
| Medium | [40-60] | Intermediate impact on services and assets and no critical service would be affected |
| Low | [20-40] | Low impact and no critical service would be affected |
| Very Low | [1-20] | Significantly low impact |

3.3. Vulnerability Assessment

The next step has the purpose of building a vulnerability exploit prediction scoring system specifically tailored for the healthcare domain. To this end, we adopted the NLP and Machine Learning (ML) approach described in [15], which leverages CS domain textual data sources to train a supervised ML classification model able to predict the vulnerability score, obtaining in this way the vulnerability assessment. In summary, this method uses the textual data included in the Common Vulnerabilities and Exposures (CVE) KB (the Report column of this KB) and the corresponding exploitability and impact metrics, namely the attack vector, attack complexity, privileges required, user interaction, scope, confidentiality impact, integrity impact and availability, to obtain a vector representation with the corresponding labels. These are used to train a set of ML XGBoost classifiers, which are able to predict the labels of the Attack Vector (Network, Adjacent Network, Local, Physical) and of the exploitability and impact metrics summarised in Table 4.

Table 4. Exploitability and impact metrics and corresponding labels.

| Exploitability and Impact metrics | Labels |
|---|---|
| Attack Complexity | Low, High |
| Privileges Required | None, Low, High |
| User Interaction | None, Required |
| Scope | Unchanged, Changed |
| Confidentiality | None, Low, High |
| Integrity | None, Low, High |
| Availability | None, Low, High |

Then, an extension of the CVE Exploit Prediction Scoring System (EPSS) is adopted [20], defining a Common Vulnerability Scoring System (CVSS)-like score using the labels predicted by the trained ML models on the NL texts, and following the specifications provided by [21]. The vulnerability level is based on the ranges of the computed CVSS-like score, as shown in Table 5.

Table 5. CVSS score ranges and corresponding vulnerability levels.

| CVSS-like Score Range | Vulnerability Level |
|---|---|
| 8.0, 10 | Very High |
| 6.0, 8.0 | High |
| 4.0, 6.0 | Medium |
| 2.0, 4.0 | Low |
| 0.0, 2.0 | Very Low |

3.4. Impact Assessment

The next step of the proposed methodology is the Individual Impact Assessment, where the impact level is calculated to measure the effect that can be expected as the result of the successful exploitation of a vulnerability that resides in a critical asset. In this case, the methodology leverages the CVE KB in conjunction with the same NER module used for the Threat Assessment (see Section 3.2), fine-tuned to extract the asset and vulnerability entity types. This methodology exploits an additional set of adjectives related to the vulnerabilities and belonging to a predefined dictionary. These adjectives, such as severe, serious, dangerous, etc., tend to indicate, via a weight coefficient, the severity level of the vulnerability. In detail, this dictionary is the result of processing features evaluated with two different classifiers that output scores to predict relevancy and severity, following the approach described in [22]. Each adjective is associated with a coefficient, calculated by taking the log-odds ratio, applying the exponential function to it to obtain the odds, and converting the odds to a probability with the formula $\text{probability} = \text{odds} / (1 + \text{odds})$. In this way, it is possible to associate the vulnerability with a scale of Low, Medium, and High, where Low corresponds to [0, 33) (meaning that there is a 0-33% impact assessment probability), Medium corresponds to [33, 66), and High corresponds to [66, 100].

For vulnerabilities expressed in CVSS (obtained in the previous step), the three security criteria Confidentiality (C), Integrity (I), and Availability (A) are rated on a three-tier scale: None, Low, and High (see Table 4). We can define a mapping from this three-tier scale onto a five-tier scale ranging from Very Low (VL) to Very High (VH) by combining these characteristics, as shown in Table 6, providing in this way an initial impact level of a specific asset/vulnerability combination.

Table 6. Initial Impact Level calculation. Columns are indexed by the Confidentiality (C) and Integrity (I) ratings, rows by the Availability (A) rating.

| C | None | None | None | Low | Low | Low | High | High | High |
|---|---|---|---|---|---|---|---|---|---|
| I | None | Low | High | None | Low | High | None | Low | High |
| A = None | VL | VL | L | L | L | M | M | M | H |
| A = Low | VL | L | M | L | M | H | M | H | VH |
| A = High | L | M | M | M | H | H | H | VH | VH |

Then, the final impact level per asset is obtained by combining the initial impact with the asset criticality level (see Table 2) and with the adjective-based scale described above for the corresponding vulnerabilities extracted by the NER module, as stated in Table 7.

Table 7. Final Impact Level calculation. Columns are indexed by the Asset Criticality and the NER Module level, rows by the Initial Impact Level; each cell is the Final Impact Level.

| Asset Criticality | Low | Low | Low | Medium | Medium | Medium | High | High | High |
|---|---|---|---|---|---|---|---|---|---|
| NER Module | Low | Medium | High | Low | Medium | High | Low | Medium | High |
| Initial Impact VL | VL | VL | L | VL | L | L | L | L | M |
| Initial Impact L | VL | L | M | L | M | M | L | M | H |
| Initial Impact M | L | L | M | M | M | M | M | M | H |
| Initial Impact H | L | M | M | M | M | H | M | H | H |
| Initial Impact VH | M | M | H | M | H | H | H | H | VH |

3.5. Risk Assessment

Finally, the Risk Assessment is obtained by combining the Threat, Vulnerability, and Impact levels obtained in the previous steps, calculating the individual risk level for each asset according to Table 8.

Table 8. Individual Risk Level calculation. Columns are indexed by the Threat level and the Vulnerability level, rows by the Impact level; each cell is the Risk level.

| Threat | VL | VL | VL | VL | VL | L | L | L | L | L | M | M | M | M | M | H | H | H | H | H | VH | VH | VH | VH | VH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vulnerability | VL | L | M | H | VH | VL | L | M | H | VH | VL | L | M | H | VH | VL | L | M | H | VH | VL | L | M | H | VH |
| Impact VL | VL | VL | L | L | L | VL | L | L | L | M | VL | L | L | M | M | L | L | M | M | M | L | L | M | M | M |
| Impact L | VL | L | L | L | M | L | L | L | M | M | L | L | M | M | M | L | M | M | M | H | L | M | M | H | H |
| Impact M | L | L | L | M | M | L | L | M | M | M | L | M | M | M | H | M | M | M | H | H | M | M | H | H | H |
| Impact H | L | L | M | M | M | L | M | M | M | H | M | M | M | H | H | M | M | H | H | H | M | M | H | VH | VH |
| Impact VH | L | M | M | M | H | M | M | M | H | H | M | M | H | H | H | M | H | H | H | VH | M | H | H | VH | VH |

4. Implementation and Experiments

To implement the Threat and Impact assessment methods, we first needed a large and up-to-date CS domain textual document collection. To this end, we collected the news published by The Hacker News website (https://thehackernews.com), a CS news platform that attracts over 8 million readers monthly and is updated daily with attacks, threats, vulnerabilities, and other CS news. A Python web crawler and scraper for this website has been specifically developed to retrieve, extract, collect, and normalise the text of each posted news item. The scraping task is performed bi-weekly, keeping this dataset constantly updated and increasing its size. Moreover, this corpus is also made publicly available on the SoBigData research infrastructure (available at https://data.d4science.org/ctlg/ResourceCatalogue/the_hackernews_dataset). The NER module is based on SecureBERT [13], a BERT model pre-trained on a very large CS domain text collection (more than 2.2 million documents), preprocessed with a CS customised tokenizer to improve its performance. This model has been fine-tuned for the NER task to extract, for the threat assessment, the mentions of the threat and asset pairs found in each corpus sentence and, for the impact assessment, the mentions of vulnerabilities, the corresponding adjectives, and the assets. To this end, we created two custom training sets, annotated with the entity types of interest (Asset and Threat in the first case; Asset, Vulnerability and Adjectives in the latter case) using the semi-supervised approach described in [19]. The implementation of this module is based on the Huggingface Transformers Python library. The vulnerability assessment ML classifiers have been implemented using the Dmlc XGBoost library, a distributed gradient boosting library designed to be highly efficient and flexible.

The proposed methodology has been developed and implemented within the activities of the EC-funded H2020 project "AI4HEALTHSEC - A Dynamic and Self-Organised Artificial Swarm Intelligence Solution for Security and Privacy Threats in Healthcare ICT Infrastructures". In this project, the proposed approach has been tested in real-world pilot scenarios provided by the Fraunhofer Institute for Biomedical Engineering (IBMT), a partner of the project. The pilots tested three different complex healthcare system scenarios, namely Implantable Medical Devices, Wearables, and Biobank. The results of the tests, reported in [14, 15, 16], confirmed the effectiveness and the applicability of our method.

5. Conclusion and Future Works

The paper proposes an AI-based approach for the individual risk assessment of the assets of digital healthcare systems. The approach, after the classification of the criticality of the assets using CS KBs, leverages NER and ML systems to extract and classify relevant information from textual CS sources, making it possible to calculate the threat, vulnerability and impact levels, which are finally combined to obtain the risk level of each asset. The methodology was successfully tested in real-world pilot scenarios of the EC-funded H2020 AI4HEALTHSEC project, demonstrating its applicability and effectiveness. Moreover, the datasets, which are constantly updated, are made publicly available on the SoBigData research infrastructure.

Acknowledgments

This work is supported by the European Union - NextGenerationEU - National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR) - Project: "SoBigData.it - Strengthening the Italian RI for Social Mining and Big Data Analytics" - Prot. IR0000013 - Notice (Avviso) no. 3264 of 28/12/2021.

We thank Simona Sada and Giuseppe Trerotola for the administrative and technical support provided.

References

[1] P. Ribino, M. Ciampi, S. Islam, S. Papastergiou, Swarm intelligence model for securing healthcare ecosystem, Procedia Computer Science 210 (2022) 149–156. doi:10.1016/j.procs.2022.10.131.
[2] S. Nifakos, K. Chandramouli, C. K. Nikolaou, P. Papachristou, S. Koch, E. Panaousis, S. Bonacina, Influence of human factors on cyber security within healthcare organisations: A systematic review, Sensors 21 (2021). doi:10.3390/s21155119.
[3] D. McKee, P. Laulheret, McAfee Enterprise ATR uncovers vulnerabilities in globally used B. Braun infusion pump, 2021. URL: https://www.trellix.com/blogs/research/mcafee-enterprise-atr-uncovers-vulnerabilities-in-globally-used-b-braun-infusion-pump/.
[4] S. Islam, S. Papastergiou, H. Mouratidis, A dynamic cyber security situational awareness framework for healthcare ICT infrastructures, in: Proceedings of the 25th Pan-Hellenic Conference on Informatics, PCI '21, ACM, Volos, Greece, 2022, pp. 334–339. doi:10.1145/3503823.3503885.
[5] D. Rees, Cyber attacks in healthcare: the position across europe, 2021. URL: https://www.pinsentmasons.com/out-law/analysis/cyber-attacks-healthcare-europe.
[6] Sixth annual benchmark study on privacy & security of healthcare data, 2016. Ponemon Institute.
[7] K. S. Bhosale, M. Nenova, G. Iliev, A study of cyber attacks: In the healthcare sector, in: 2021 Sixth Junior Conference on Lighting (Lighting), 2021, pp. 1–6. doi:10.1109/Lighting49406.2021.9598947.
[8] S. Memon, S. Memon, L. Das, B. R. Memon, Cyber security risk assessment methods for smart healthcare, in: 2024 IEEE 1st Karachi Section Humanitarian Technology Conference (KHI-HTC), 2024, pp. 1–6. doi:10.1109/KHI-HTC60760.2024.10481961.
[9] M. Tikhomirov, N. Loukachevitch, A. Sirotina, B. Dobrov, Using BERT and augmentation in named entity recognition for cybersecurity domain, in: 25th International Conference on Applications of Natural Language Processing and Information Systems, Springer, Saarbrücken, Germany, 2020, pp. 16–24.
[10] K. Ameri, M. Hempel, H. Sharif, J. Lopez Jr., K. Perumalla, Cybert: Cybersecurity claim classification by fine-tuning the bert language model, Journal of Cybersecurity and Privacy 1 (2021) 615–637. URL: https://www.mdpi.com/2624-800X/1/4/31. doi:10.3390/jcp1040031.
[11] S. Zhou, J. Liu, X. Zhong, W. Zhao, Named entity recognition using bert with whole world masking in cybersecurity domain, in: 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), volume 26, IEEE, Xiamen, China, 2021, pp. 316–320. doi:10.1109/ICBDA51983.2021.9403180.
[12] Y. Chen, J. Ding, D. Li, Z. Chen, Joint bert model based cybersecurity named entity recognition, in: 2021 The 4th International Conference on Software Engineering and Information Management, ICSIM, Yokohama, Japan, 2021, pp. 236–242. doi:10.1145/3451471.3451508.
[13] E. Aghaei, X. Niu, W. Shadid, E. Al-Shaer, SecureBERT: A domain-specific language model for cybersecurity, in: Security and Privacy in Communication Networks, Springer, Cham, 2023, pp. 39–56.
[14] S. Islam, S. Papastergiou, S. Silvestri, Cyber threat analysis using natural language processing for a secure healthcare system, in: 2022 IEEE Symposium on Computers and Communications (ISCC), 2022, pp. 1–7. doi:10.1109/ISCC55528.2022.9912768.
[15] S. Silvestri, S. Islam, S. Papastergiou, C. Tzagkarakis, M. Ciampi, A machine learning approach for the nlp-based analysis of cyber threats and vulnerabilities of the healthcare ecosystem, Sensors 23 (2023). doi:10.3390/s23020651.
[16] S. Silvestri, S. Islam, D. Amelin, G. Weiler, S. Papastergiou, M. Ciampi, Cyber threat assessment and management for securing healthcare ecosystems using natural language processing, International Journal of Information Security 23 (2024) 31–50. doi:10.1007/s10207-023-00769-w.
[17] A. Omotosho, B. A. Haruna, O. M. Olaniyi, Threat modeling of internet of things health devices, Journal of Applied Security Research 14 (2019) 106–121. doi:10.1080/19361610.2019.1545278.
[18] H. Almohri, L. Cheng, D. Yao, H. Alemzadeh, On threat modeling and mitigation of medical cyber-physical systems, in: 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2017, pp. 114–119. doi:10.1109/CHASE.2017.69.
[19] G. Aracri, A. Folino, S. Silvestri, Integrated use of KOS and deep learning for data set annotation in tourism domain, Journal of Documentation 79 (2023) 1440–1458. doi:10.1108/JD-02-2023-0019.
[20] J. Jacobs, S. Romanosky, B. Edwards, I. Adjerid, M. Roytman, Exploit prediction scoring system (EPSS), Digital Threats 2 (2021). doi:10.1145/3436242.
[21] A.A.V.V., Common Vulnerability Scoring System version 3.1 Specification Document, Technical Report, FIRST.Org, 2019. URL: https://www.first.org/cvss/v3-1/cvss-v31-specification_r1.pdf.
[22] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.
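
As an illustration of the threat assessment step (Section 3.2), the following minimal Python sketch counts (asset, threat) mentions and maps each pair's share of the mentions to a threat level using the ranges of Table 3. It is not the authors' implementation: the function names, the choice of denominator (all extracted mentions), and the assignment of a shared boundary such as 80% to the higher level are assumptions, and the NER output is replaced by a hand-written list.

```python
from collections import Counter

# Lower bounds of the Table 3 ranges, highest level first.
# Assumption: a value on a shared boundary (e.g. exactly 80%) gets the higher level.
THREAT_LEVEL_BOUNDS = [
    (80.0, "Very High"),
    (60.0, "High"),
    (40.0, "Medium"),
    (20.0, "Low"),
    (1.0, "Very Low"),
]


def threat_level(occurrence_pct):
    """Map an occurrence percentage to a Table 3 threat level."""
    for lower_bound, level in THREAT_LEVEL_BOUNDS:
        if occurrence_pct >= lower_bound:
            return level
    return None  # below 1%: Table 3 defines no level


def threat_levels_from_mentions(mentions):
    """mentions: iterable of (asset, threat) pairs, e.g. as produced by a NER step.

    Assumption: the percentage is taken over all extracted mentions.
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    return {
        pair: threat_level(100.0 * count / total)
        for pair, count in counts.items()
    }


# Hypothetical NER output: 9 mentions of one pair, 1 of another.
example_mentions = [("PACS server", "ransomware")] * 9 + [("database", "SQL injection")]
example_levels = threat_levels_from_mentions(example_mentions)
```

Here `example_levels` maps each (asset, threat) pair to its level, which could then be joined with the asset list produced in Section 3.1.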
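
The adjective coefficient of Section 3.4 can be sketched as follows: the log-odds value is exponentiated to obtain the odds, the odds are converted to a probability with probability = odds / (1 + odds), and the probability (as a percentage) is placed on the Low/Medium/High scale. The function names are hypothetical, and how the log-odds values themselves are obtained from the classifiers is outside this sketch.

```python
import math


def log_odds_to_probability(log_odds):
    """Convert a log-odds value to a probability via odds / (1 + odds)."""
    odds = math.exp(log_odds)  # note: math.exp overflows for very large inputs
    return odds / (1.0 + odds)


def adjective_severity(log_odds):
    """Place the probability on the scale of Section 3.4:
    Low = [0, 33), Medium = [33, 66), High = [66, 100] (percent)."""
    pct = 100.0 * log_odds_to_probability(log_odds)
    if pct < 33.0:
        return "Low"
    if pct < 66.0:
        return "Medium"
    return "High"
```

For instance, a log-odds value of 0 corresponds to odds of 1 and a probability of 0.5, which falls in the Medium band.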
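
The lookup steps of Sections 3.4 and 3.5 can be expressed as plain table lookups. The sketch below encodes Tables 6, 7 and 8 row by row and exposes three hypothetical helper functions; the encoding (space-separated strings indexed by position) is only one possible design, chosen here to keep the rows visually comparable with the tables.

```python
CIA = ["None", "Low", "High"]          # ratings for C, I, A (Table 4)
TIERS = ["Low", "Medium", "High"]      # asset criticality / NER module scale
LEVELS = ["VL", "L", "M", "H", "VH"]   # five-tier scale

# Table 6: one row per Availability rating; entries ordered by C, then I.
INITIAL_IMPACT = {
    "None": "VL VL L  L L M  M M H",
    "Low":  "VL L M  L M H  M H VH",
    "High": "L M M  M H H  H VH VH",
}

# Table 7: one row per initial impact; entries ordered by asset criticality,
# then NER module (adjective) level.
FINAL_IMPACT = {
    "VL": "VL VL L  VL L L  L L M",
    "L":  "VL L M  L M M  L M H",
    "M":  "L L M  M M M  M M H",
    "H":  "L M M  M M H  M H H",
    "VH": "M M H  M H H  H H VH",
}

# Table 8: one row per impact level; entries ordered by threat, then vulnerability.
RISK = {
    "VL": "VL VL L L L  VL L L L M  VL L L M M  L L M M M  L L M M M",
    "L":  "VL L L L M  L L L M M  L L M M M  L M M M H  L M M H H",
    "M":  "L L L M M  L L M M M  L M M M H  M M M H H  M M H H H",
    "H":  "L L M M M  L M M M H  M M M H H  M M H H H  M M H VH VH",
    "VH": "L M M M H  M M M H H  M M H H H  M H H H VH  M H H VH VH",
}


def initial_impact(c, i, a):
    """Table 6 lookup from the C, I, A ratings."""
    return INITIAL_IMPACT[a].split()[CIA.index(c) * 3 + CIA.index(i)]


def final_impact(initial, criticality, ner_level):
    """Table 7 lookup from initial impact, asset criticality and NER module level."""
    row = FINAL_IMPACT[initial].split()
    return row[TIERS.index(criticality) * 3 + TIERS.index(ner_level)]


def risk_level(threat, vulnerability, impact):
    """Table 8 lookup from threat, vulnerability and impact levels."""
    row = RISK[impact].split()
    return row[LEVELS.index(threat) * 5 + LEVELS.index(vulnerability)]
```

A full assessment for one asset would chain the three calls, e.g. `risk_level(threat, vulnerability, final_impact(initial_impact(c, i, a), criticality, ner_level))`, with the threat and vulnerability levels coming from Tables 3 and 5.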