=Paper=
{{Paper
|id=Vol-3762/527
|storemode=property
|title=A Natural Language Processing-based Approach for Cyber Risk Assessment in the Healthcare Ecosystems
|pdfUrl=https://ceur-ws.org/Vol-3762/527.pdf
|volume=Vol-3762
|authors=Stefano Silvestri,Giuseppe Tricomi,Giuseppe Felice Russo,Mario Ciampi
|dblpUrl=https://dblp.org/rec/conf/ital-ia/SilvestriTRC24
}}
==A Natural Language Processing-based Approach for Cyber Risk Assessment in the Healthcare Ecosystems==
Stefano Silvestri<sup>1,*</sup>, Giuseppe Tricomi<sup>1,2,3</sup>, Giuseppe Felice Russo<sup>1</sup> and Mario Ciampi<sup>1</sup>

<sup>1</sup> Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), via Pietro Castellino 111, Naples, 80131, Italy

<sup>2</sup> Università degli Studi di Messina, Contrada di Dio 1, Messina, 98166, Italy

<sup>3</sup> CINI (Consorzio Interuniversitario Nazionale per l’Informatica), Via Ariosto 25, Roma, 00185, Italy

Abstract
The cyber risk in the healthcare sector is constantly increasing, due to the large adoption of digital services formed by a complex
interconnection of different systems and technologies, which offer a larger attack surface for the attackers. Therefore, the risk
assessment of the assets involved in these services is crucial to prevent and mitigate possible critical consequences, which
could also affect the health of the patients. A large source of constantly updated information about threats and vulnerabilities
of the assets of the healthcare ecosystems is available in natural language text on the Internet (cyber security news, forums,
social media, etc.), but it is not easy to fully exploit it for a risk assessment process, due to the complexity of natural
language. This paper proposes an AI-based approach for the individual risk assessment of the assets of digital healthcare
systems based on the use of NLP and Knowledge Bases, which exploits the information extracted from natural language news
from the web. The methodology has been developed within the activities of the EC-funded H2020 AI4HEALTHSEC project,
where it has also been successfully tested in real-world scenarios. Moreover, the datasets collected have been made publicly
available on the SoBigData research infrastructure.
Keywords
Natural Language Processing, Large Language Models, Cyber Threats, Cyber Vulnerabilities, Impact Assessment, Cyber Risk
Assessment

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy

<sup>*</sup> Corresponding author.

Email: stefano.silvestri@icar.cnr.it (S. Silvestri); giuseppe.tricomi@icar.cnr.it (G. Tricomi); giuseppefelice.russo@icar.cnr.it (G. F. Russo); mario.ciampi@icar.cnr.it (M. Ciampi)

ORCID: 0000-0002-9890-8409 (S. Silvestri); 0000-0003-3837-8730 (G. Tricomi); 0009-0001-2090-9647 (G. F. Russo); 0000-0002-7286-6212 (M. Ciampi)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

===1. Introduction===

The healthcare ecosystem is rapidly adopting a growing number of recent technologies, such as Internet of Things (IoT), wearable and implantable devices, Picture Archiving and Communication System (PACS), Electronic Health Records (EHRs), DICOM images, and others, interconnected to realise and offer innovative healthcare digital services. Their adoption and use improve the quality of service to patients, and support and ease the work of physicians and medical professionals. On the other hand, this complex and dynamic interconnection of several systems offers a larger attack surface to threat actors interested in attacking the system by exploiting the existing vulnerabilities [1], also taking into account the low level of awareness of cyber risks among healthcare personnel [2], often causing dramatic impacts on the healthcare ecosystem [3]. For example, a cyber-attack on an insecure PACS server could lead to the web exposure of sensitive patient information, or an attack on the remote monitoring software of a medical device could damage the equipment of the hospital or change the configuration of the device [4]. This sector has recently suffered several serious cyber attacks: for example, in 2017 and 2021 there were ransomware attacks on the U.K. National Health Service (NHS) and on Ireland’s Department of Health and Health Service Executive, respectively [5]. Furthermore, inherent vulnerabilities have been found in some medical devices, such as B. Braun’s infusion pump and Medtronic’s insulin pump [3]. Finally, approximately 90% of healthcare organisations experienced a data breach in 2018 [6]. For these reasons, it is necessary to study the most frequent attacks in healthcare to make the services offered more secure and resilient [4, 7]. Due to the complexity of the healthcare ecosystems, performing an effective cyber risk assessment can help to limit and prevent cyber security incidents [8]. The cyber risk assessment process has the purpose of identifying, evaluating, and prioritising security risks to the assets of an organisation, making it possible to take the most appropriate actions to mitigate the risks and the vulnerabilities.

The Internet is a constantly updated source of threat, incident, and vulnerability-related information for healthcare ecosystem assets in the form of unstructured Natural Language (NL) within blogs, specialized Cyber-Security (CS) websites, social media, Knowledge Bases (KBs) and others. Although these sources contain crucial information about
risk management and assessment, it is difficult to fully leverage them, due to the inherent complexity of NL (polysemy, irony, long and complex sentences, non-standardized abbreviations, acronyms). Therefore, extracting relevant information from this mass of data becomes a demanding task [9]. Information extraction from NL text is currently addressed in the literature by adopting AI-based Natural Language Processing (NLP) models, usually implementing Named Entity Recognition (NER) systems [10, 11, 12, 13] using Large Language Models (LLMs) and CS KBs. However, the literature lacks a focus on analyzing and prioritizing the most frequent threats and vulnerabilities in healthcare. In this context, this paper extends the ideas previously presented in [14, 15, 16], combining NLP-based threat and vulnerability approaches to define an impact and risk assessment for the healthcare ecosystems, evaluated by exploiting CS textual sources available on the Internet. It presents the final NLP cyber risk assessment methodology developed within the activities of the EC-funded H2020 AI4HEALTHSEC research project, as well as the collection of a textual CS dataset related to the “SoBigData.it” research project.

The paper is organized as follows: in Section 2, the most recent and related studies in the literature are outlined; subsequently, the details of the proposed approach are described in Section 3; afterwards, Section 4 shows the implementation of the proposed solution, a description of the datasets used and the research project where the approach was tested in real-world scenarios. Finally, Section 5 provides conclusions and future works.

===2. Related Works===

There are several recent works in the literature dealing with risk assessment and CS information extraction from NL documents. The authors of [8] reviewed and compared different generic cyber risk assessment frameworks in the healthcare field, discussing their assessment methodologies and the limitations associated with them. A threat and mitigation model tailored for IoT health devices is presented in [17], combining the STRIDE and DREAD models: threats are identified using the STRIDE model on the device access points, and then ranked using DREAD. This approach is suitable for both the designers and the users of health IoT devices.

The security and privacy challenges in Medical Cyber-Physical Systems (MCPS) are discussed in [18], highlighting that trust and threat models usually consider MCPS stakeholders, including healthcare practitioners, system administrators and non-medical staff, with incorrect levels of trust. Also, in [2], the issues related to the CS awareness of healthcare personnel are underlined, reviewing the existing gaps in the CS strategies adopted by healthcare organizations and the risk assessment methodologies adopted. The authors demonstrated that in this domain there is often a lack of adequate training for healthcare workers and a lack of specialized figures, such as a chief information officer, highlighting the need to have security protocols updated to the latest standards.

AI-based information extraction from CS textual documents has also been recently developed and presented in the literature. SecureBERT, presented in [13], is a Bidirectional Encoder Representations from Transformers (BERT) model trained on large CS-domain NL corpora, which outperforms other similar models in NLP tasks in the CS domain. The authors of [10] collected a large corpus of labeled sequences from the documentation of Industrial Control Systems devices to pre-train and fine-tune a BERT language model, named CyBERT. Also, [12] proposed another interesting CS NER system, which exploits an architecture based on BERT, an LSTM, Iterated Dilated Convolutional Neural Networks (ID-CNNs), and a Conditional Random Field, to improve the obtained performance.

The main innovation of the proposed approach is the use of CS information extracted from NL texts to calculate the threat, vulnerability, and impact levels, from which the risk assessment for the various assets involved in digital healthcare services is finally obtained.

===3. Methodology===

The proposed risk assessment methodology is composed of the following five steps: i) Healthcare Ecosystem Assets Identification and Categorisation; ii) Threat Identification and Assessment; iii) Vulnerability Assessment; iv) Impact Assessment; and v) Risk Assessment.

====3.1. Healthcare Ecosystem Assets Identification and Categorisation====

The preliminary step of the methodology provides a list of the assets of the considered complex digital healthcare system by identifying the corresponding services involved and their assets, with the final purpose of measuring their criticality within the healthcare system. For instance, the assets of a remote patient consultation service could include a Database, a Linux Server, communication software, and a web server. After their identification, the assets are also categorized, using the Common Platform Enumeration (CPE, https://nvd.nist.gov/products/cpe) catalogue to map them to the corresponding area (based on their type) and category (depending on their functionalities), as shown in Table 1. This step allows us to understand the importance of each asset within the ecosystem and to provide a list of the assets that require risk assessment.
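
A minimal sketch of how the asset list of this step could be represented, assuming each asset is described by a CPE 2.3 formatted string. In that format the `part` field is `a` for applications, `o` for operating systems and `h` for hardware, which gives a first indication of the asset type. The `Asset` record, the `asset_from_cpe` helper, the type labels and the example CPE names are illustrative assumptions, not part of the published methodology.

```python
from dataclasses import dataclass

# CPE 2.3 "part" codes: a = application, o = operating system, h = hardware.
# The type labels on the right are illustrative.
PART_TO_TYPE = {"a": "Software", "o": "Operating System", "h": "Hardware"}


@dataclass(frozen=True)
class Asset:
    name: str        # label used inside the healthcare service (hypothetical)
    vendor: str
    product: str
    asset_type: str


def asset_from_cpe(name: str, cpe: str) -> Asset:
    """Build an Asset from a CPE 2.3 formatted string.

    Naive parser: it splits on ':' and does not handle escaped colons.
    """
    fields = cpe.split(":")
    if len(fields) < 5 or fields[0] != "cpe" or fields[1] != "2.3":
        raise ValueError(f"not a CPE 2.3 formatted string: {cpe!r}")
    part, vendor, product = fields[2], fields[3], fields[4]
    if part not in PART_TO_TYPE:
        raise ValueError(f"unknown CPE part: {part!r}")
    return Asset(name, vendor, product, PART_TO_TYPE[part])


# Hypothetical assets of a remote patient consultation service.
service_assets = [
    asset_from_cpe("Database", "cpe:2.3:a:postgresql:postgresql:15.0:*:*:*:*:*:*:*"),
    asset_from_cpe("Linux Server", "cpe:2.3:o:canonical:ubuntu_linux:22.04:*:*:*:*:*:*:*"),
]
```

Sensitivity and criticality cannot be derived from a CPE name, so in practice they would have to be added to such a record by the organisation that owns the asset.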

{| class="wikitable"
|+ Table 1: Assets areas and categories.
! Area !! Name
|-
| 1 || User interactions with implants and sensors
|-
| 2 || Medical equipment and IT devices
|-
| 3 || Services and processes
|-
| 4 || Interdependent HCIIs – Ecosystem
|-
! Category !! Functionalities
|-
| Influence || Found in most organizations, distinct
|-
| Type || Software, hardware, Operating System (OS), Information Sensitivity
|-
| Sensitivity || Restricted, unrestricted
|-
| Criticality || Essential, required, deferrable
|}

These classifications are used to evaluate the criticality of each asset of the healthcare system, by measuring the dependency level that an asset has with other system components. We defined four dependency levels:

* Independent assets have a distinct operation and exhibit no dependency on other assets. If the asset fails, no cascading events occur.
* Incoming dependency: another asset uses the data or functionality of the asset. If such an asset fails, the operation of all related assets that use its data or functionality may be disrupted.
* Outgoing dependency: the asset uses the data or functionality of another asset. Therefore, if the latter asset fails, the operation of the former asset will be affected as well.
* Coupling relationship: two assets have both incoming and outgoing dependencies. Failures in one of the assets will therefore affect the functionality of the other.

Thus, the criticality level of an asset can be determined by the number of services and relevant business flows it participates in. Specifically, the General Asset Criticality level based on running services (GAC) is calculated as the weighted summation of the interdependencies of the asset, normalized by the total number of services in the examined healthcare ecosystem. The Asset Criticality for a specific service (ACS) is then equal to the GAC value divided by the number of relevant/redundant assets that co-exist in the service. Finally, based on the ACS range values, it is possible to assign a criticality level to each asset, as shown in Table 2.

{| class="wikitable"
|+ Table 2: Asset Criticality Levels.
! ACS Value Range !! Asset Criticality Level
|-
| [0,1] || Low
|-
| (1,2] || Medium
|-
| (2,3] || High
|}

====3.2. Threat Identification and Assessment====

Once the assets have been identified, the next step aims to assess the threats that could affect those assets, following the approach previously described in [14, 15, 16]. Firstly, a threat identification phase is performed by exploiting the Common Attack Pattern Enumeration and Classification (CAPEC, https://capec.mitre.org), which also provides a detailed set of the characteristics of the threats, such as Likelihood of Attack, Related Attack Patterns, Execution Flow, Prerequisites and others. In this way, we obtain the list of the threats for each asset that operates in the considered healthcare service/system (identified in the previous step). Each threat also includes the CAPEC ID, a CAPEC category that will be used to rate the threat, and the corresponding characteristics.

Then, it is possible to assess the threats, assigning them a severity level. Our methodology exploits the NL history of reported incidents related to those threats, extracted from large CS domain collections available online, such as forums, social media, news, and others, using an AI-based NLP approach. In detail, we use a Named Entity Recognition (NER) architecture based on SecureBERT [13], a BERT model pre-trained on a very large CS domain text collection (more than 2.2 million documents), preprocessed with a CS customized tokenizer, and fine-tuned for the NER task, to extract the mentions of threat and asset pairs found in each sentence of the NL source. In this case, we produced a custom training set, annotated with the entity types of interest (Asset and Threat) using the semi-supervised approach described in [19]. Then, the threat level is calculated based on the percentage of occurrence of the mentions of that threat within the considered dataset, following the ranges shown in Table 3. The assessment is finally performed through a mapping between the assets of the services of the healthcare system and the asset and threat pairs with the corresponding threat level.

{| class="wikitable"
|+ Table 3: Threat Levels and corresponding percentage of occurrence.
! Threat Level !! Occurrence Percentage !! Description
|-
| Very High || [80-100] || Severe impact on critical services and assets
|-
| High || [60-80] || Significant impact on critical services and assets
|-
| Medium || [40-60] || Intermediate impact on services and assets and no critical service would be affected
|-
| Low || [20-40] || Low impact and no critical service would be affected
|-
| Very Low || [1-20] || Significantly low impact
|}

====3.3. Vulnerability Assessment====

The next step has the purpose of building a vulnerability exploit prediction scoring system specifically tailored for the healthcare domain. To this end, we adopted the NLP and Machine Learning (ML) approach described in [15], which leverages CS domain textual data sources to train a supervised ML classification model able to predict the
vulnerability score, obtaining in this way the vulnerability assessment. In summary, this method uses the textual data included in the Common Vulnerabilities and Exposures (CVE) KB (the Report column of this KB) and the corresponding exploitability and impact metrics, namely the attack vector, attack complexity, privileges required, user interaction, scope, confidentiality impact, integrity impact and availability impact, to obtain a vector representation with the corresponding labels related to the exploitability and impact metrics. This representation is used to train a set of ML XGBoost classifiers, which are able to predict the labels of the Attack Vector (Network, Adjacent Network, Local, Physical) and of the exploitability and impact metrics summarised in Table 4.

{| class="wikitable"
|+ Table 4: Exploitability and impact metrics and corresponding labels.
! Exploitability and Impact metrics !! Labels
|-
| Attack Complexity || Low, High
|-
| Privileges Required || None, Low, High
|-
| User Interaction || None, Required
|-
| Scope || Unchanged, Changed
|-
| Confidentiality || None, Low, High
|-
| Integrity || None, Low, High
|-
| Availability || None, Low, High
|}

Then, an extension of the CVE Exploit Prediction Scoring System (EPSS) is adopted [20], defining a Common Vulnerability Scoring System (CVSS)-like score using the labels predicted by the trained ML models on the NL texts, and following the specifications provided by [21]. The vulnerability level is based on the ranges of the computed CVSS-like score, as shown in Table 5.

{| class="wikitable"
|+ Table 5: CVSS score ranges and corresponding vulnerability levels.
! CVSS-like Score Range !! Vulnerability Level
|-
| 8.0, 10 || Very High
|-
| 6.0, 8.0 || High
|-
| 4.0, 6.0 || Medium
|-
| 2.0, 4.0 || Low
|-
| 0.0, 2.0 || Very Low
|}

====3.4. Impact Assessment====

The next step of the proposed methodology is the Individual Impact Assessment, where the impact level is calculated to measure the effect that can be expected as the result of the successful exploitation of a vulnerability that resides in a critical asset. In this case, the methodology leverages the CVE KB in conjunction with the same NER module used for the Threat Assessment (see Section 3.2), fine-tuned to extract the asset and vulnerability entity types. This methodology exploits an additional set of adjectives related to the vulnerabilities and belonging to a predefined dictionary. These adjectives, such as severe, serious, dangerous, etc., tend to indicate the severity level of the vulnerability via a weight coefficient. In detail, this dictionary is the result of the processed features evaluated with two different classifiers that output scores to predict relevancy and severity, following the approach described in [22]. Each adjective is associated with a coefficient, calculated by taking the log-odds ratio, computing the exponential function of the log-odds to obtain the odds, and converting the odds to a probability using the formula:

<math>\text{probability} = \frac{\text{odds}}{1 + \text{odds}}</math>

In this way, it is possible to associate the vulnerability with a scale of Low, Medium, and High, where Low corresponds to [0, 33) (meaning that there is a 0-33% impact assessment probability), Medium corresponds to [33, 66), and High corresponds to [66, 100].

For vulnerabilities expressed in CVSS (obtained in the previous step), the three security criteria Confidentiality (C), Integrity (I), and Availability (A) are rated on a three-tier scale: None, Low, and High (see Table 4). We can define a mapping from this three-tier scale onto a five-tier scale ranging from Very Low (VL) to Very High (VH) by combining these characteristics, as shown in Table 6, providing in this way an initial impact level of a specific asset/vulnerability combination.

{| class="wikitable"
|+ Table 6: Initial Impact Level calculation.
! C !! colspan="3" | None !! colspan="3" | Low !! colspan="3" | High
|-
! I !! None !! Low !! High !! None !! Low !! High !! None !! Low !! High
|-
! A = None
| VL || VL || L || L || L || M || M || M || H
|-
! A = Low
| VL || L || M || L || M || H || M || H || VH
|-
! A = High
| L || M || M || M || H || H || H || VH || VH
|}

Then, the final impact level per asset is obtained by combining the initial impact with the asset criticality level (see Table 2) and with the previous Low, Medium, High scale related to the adjectives and the corresponding vulnerabilities extracted by the NER module, as stated in Table 7.

{| class="wikitable"
|+ Table 7: Final Impact Level calculation.
! Asset Criticality !! colspan="3" | Low !! colspan="3" | Medium !! colspan="3" | High
|-
! NER Module !! Low !! Medium !! High !! Low !! Medium !! High !! Low !! Medium !! High
|-
! Initial Impact Level !! colspan="9" | Final Impact Level
|-
! VL
| VL || VL || L || VL || L || L || L || L || M
|-
! L
| VL || L || M || L || M || M || L || M || H
|-
! M
| L || L || M || M || M || M || M || M || H
|-
! H
| L || M || M || M || M || H || M || H || H
|-
! VH
| M || M || H || M || H || H || H || H || VH
|}

====3.5. Risk Assessment====

Finally, the Risk assessment is obtained by combining the Threat, Vulnerability, and Impact levels obtained in the previous steps, calculating the individual risk level for each asset according to Table 8.

===4. Implementation and Experiments===

To implement the Threat and Impact assessment methods, we first needed a large and updated CS domain textual document collection. To this end, we collected the news published by The Hacker News website (https://thehackernews.com), a CS news platform that attracts over 8 million readers monthly and is updated daily with attacks, threats, vulnerabilities, and other CS news. A Python web crawler and scraper for this website has been specifically developed to retrieve, extract, collect, and normalise the text of each posted news item. The scraping task is performed bi-weekly, keeping this dataset constantly updated and increasing its size. Moreover, this corpus is also made publicly available on the SoBigData research infrastructure (https://data.d4science.org/ctlg/ResourceCatalogue/the_hackernews_dataset). The NER module is based on SecureBERT [13], a BERT model pre-trained
on a very large CS domain text collection (more than 2.2 million documents), preprocessed with a CS customised tokenizer to improve its performance. This model has been fine-tuned for the NER task, to extract the mentions of the pairs of threat and asset found in each corpus sentence for the threat assessment, and the mentions of vulnerabilities, the corresponding adjectives, and the assets for the impact assessment. To this end, we created two custom training sets, annotated with the entity types of interest (Asset and Threat in the first case, and Asset, Vulnerability and Adjectives in the latter case) using the semi-supervised approach described in [19]. The implementation of this module is based on the Hugging Face Transformers Python library. The vulnerability assessment ML classifiers have been implemented using the DMLC XGBoost library, a distributed gradient boosting library designed to be highly efficient and flexible.

The proposed methodology has been developed and implemented within the activities of the EC-funded H2020 project “AI4HEALTHSEC – A Dynamic and Self-Organised Artificial Swarm Intelligence Solution for Security and Privacy Threats in Healthcare ICT Infrastructures”. In this project, the proposed approach has been tested in real-world pilot scenarios provided by the Fraunhofer Institute for Biomedical Engineering (IBMT), a partner of the project. The pilots tested three different complex healthcare system scenarios, namely Implantable Medical Devices, Wearables, and Biobank. The results of the tests, reported in [14, 15, 16], confirmed the effectiveness and the applicability of our method.

===5. Conclusion and Future Works===

The paper proposes an AI-based approach for the individual risk assessment of the assets of digital healthcare systems. After classifying the criticality of the assets using CS KBs, the approach leverages NER and ML systems to extract and classify relevant information from textual CS sources, making it possible to calculate the threat, vulnerability and impact levels, which are finally combined to obtain the risk level of each asset. The methodology was successfully tested in real-world pilot scenarios of the EC-funded H2020 AI4HEALTHSEC project, demonstrating its applicability and effectiveness. Moreover, the datasets, which are constantly updated, are made publicly available on the SoBigData research infrastructure.

===Acknowledgments===

This work is supported by the European Union, NextGenerationEU, National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR), Project: “SoBigData.it: Strengthening the Italian RI for Social Mining and Big Data Analytics”, Prot. IR0000013, Avviso n. 3264 del 28/12/2021.

We thank Simona Sada and Giuseppe Trerotola for the administrative and technical support provided.

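{| class="wikitable"
|+ Table 8: Individual Risk Level calculation.
! Threat !! colspan="5" | Very Low !! colspan="5" | Low !! colspan="5" | Medium !! colspan="5" | High !! colspan="5" | Very High
|-
! Vulnerability !! VL !! L !! M !! H !! VH !! VL !! L !! M !! H !! VH !! VL !! L !! M !! H !! VH !! VL !! L !! M !! H !! VH !! VL !! L !! M !! H !! VH
|-
! Impact !! colspan="25" | Risk
|-
! VL
| VL || VL || L || L || L || VL || L || L || L || M || VL || L || L || M || M || L || L || M || M || M || L || L || M || M || M
|-
! L
| VL || L || L || L || M || L || L || L || M || M || L || L || M || M || M || L || M || M || M || H || L || M || M || H || H
|-
! M
| L || L || L || M || M || L || L || M || M || M || L || M || M || M || H || M || M || M || H || H || M || M || H || H || H
|-
! H
| L || L || M || M || M || L || M || M || M || H || M || M || M || H || H || M || M || H || H || H || M || M || H || VH || VH
|-
! VH
| L || M || M || M || H || M || M || M || H || H || M || M || H || H || H || M || H || H || H || VH || M || H || H || VH || VH
|}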
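
The lookup defined by Table 8 can be sketched in Python as follows. The risk matrix is transcribed from Table 8, and the two bucketing helpers follow the ranges of Table 3 and Table 5. The function names and the treatment of range boundaries (a value lying on a shared boundary is assigned to the higher level) are assumptions made for this illustration, since the published ranges share their endpoints.

```python
LEVELS = ["VL", "L", "M", "H", "VH"]

# Table 8, one string per Impact row (VL..VH). Each row lists 25 risk levels:
# 5 Threat levels (Very Low..Very High) x 5 Vulnerability levels (VL..VH).
_TABLE_8 = {
    "VL": "VL VL L L L  VL L L L M  VL L L M M  L L M M M  L L M M M",
    "L":  "VL L L L M  L L L M M  L L M M M  L M M M H  L M M H H",
    "M":  "L L L M M  L L M M M  L M M M H  M M M H H  M M H H H",
    "H":  "L L M M M  L M M M H  M M M H H  M M H H H  M M H VH VH",
    "VH": "L M M M H  M M M H H  M M H H H  M H H H VH  M H H VH VH",
}
RISK = {impact: row.split() for impact, row in _TABLE_8.items()}


def _bucket(value: float, upper_bounds: list[float]) -> str:
    """Return the first level whose upper bound exceeds value.

    Assumption: a value on a shared boundary goes to the higher level.
    """
    for level, upper in zip(LEVELS, upper_bounds):
        if value < upper:
            return level
    return LEVELS[-1]


def threat_level(occurrence_pct: float) -> str:
    """Table 3: percentage of mentions of a threat in the corpus -> level."""
    return _bucket(occurrence_pct, [20, 40, 60, 80])


def vulnerability_level(cvss_like_score: float) -> str:
    """Table 5: CVSS-like score in [0, 10] -> level."""
    return _bucket(cvss_like_score, [2.0, 4.0, 6.0, 8.0])


def risk_level(threat: str, vulnerability: str, impact: str) -> str:
    """Table 8: combine the three levels into the individual risk level."""
    column = LEVELS.index(threat) * len(LEVELS) + LEVELS.index(vulnerability)
    return RISK[impact][column]


# Hypothetical asset: threat mentioned in 65% of the corpus, CVSS-like
# score 7.5, final impact level High.
print(risk_level(threat_level(65.0), vulnerability_level(7.5), "H"))
```

Keeping the matrix as data rather than as nested conditionals makes it easy to check each row against the published table.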

===References===

[1] P. Ribino, M. Ciampi, S. Islam, S. Papastergiou, Swarm intelligence model for securing healthcare ecosystem, Procedia Computer Science 210 (2022) 149–156. doi:10.1016/j.procs.2022.10.131.

[2] S. Nifakos, K. Chandramouli, C. K. Nikolaou, P. Papachristou, S. Koch, E. Panaousis, S. Bonacina, Influence of human factors on cyber security within healthcare organisations: A systematic review, Sensors 21 (2021). doi:10.3390/s21155119.

[3] D. McKee, P. Laulheret, McAfee Enterprise ATR uncovers vulnerabilities in globally used B. Braun infusion pump, 2021. URL: https://www.trellix.com/blogs/research/mcafee-enterprise-atr-uncovers-vulnerabilities-in-globally-used-b-braun-infusion-pump/.

[4] S. Islam, S. Papastergiou, H. Mouratidis, A dynamic cyber security situational awareness framework for healthcare ICT infrastructures, in: Proceedings of the 25th Pan-Hellenic Conference on Informatics, PCI ’21, ACM, Volos, Greece, 2022, pp. 334–339. doi:10.1145/3503823.3503885.

[5] D. Rees, Cyber attacks in healthcare: the position across europe, 2021. URL: https://www.pinsentmasons.com/out-law/analysis/cyber-attacks-healthcare-europe.

[6] Sixth annual benchmark study on privacy & security of healthcare data, 2016. Ponemon Institute.

[7] K. S. Bhosale, M. Nenova, G. Iliev, A study of cyber attacks: In the healthcare sector, in: 2021 Sixth Junior Conference on Lighting (Lighting), 2021, pp. 1–6. doi:10.1109/Lighting49406.2021.9598947.

[8] S. Memon, S. Memon, L. Das, B. R. Memon, Cyber security risk assessment methods for smart healthcare, in: 2024 IEEE 1st Karachi Section Humanitarian Technology Conference (KHI-HTC), 2024, pp. 1–6. doi:10.1109/KHI-HTC60760.2024.10481961.

[9] M. Tikhomirov, N. Loukachevitch, A. Sirotina, B. Dobrov, Using BERT and augmentation in named entity recognition for cybersecurity domain, in: 25th International Conference on Applications of Natural Language Processing and Information Systems, Springer, Saarbrücken, Germany, 2020, pp. 16–24.

[10] K. Ameri, M. Hempel, H. Sharif, J. Lopez Jr., K. Perumalla, Cybert: Cybersecurity claim classification by fine-tuning the bert language model, Journal of Cybersecurity and Privacy 1 (2021) 615–637. URL: https://www.mdpi.com/2624-800X/1/4/31. doi:10.3390/jcp1040031.

[11] S. Zhou, J. Liu, X. Zhong, W. Zhao, Named entity recognition using bert with whole world masking in cybersecurity domain, in: 2021 IEEE 6th International Conference on Big Data Analytics (ICBDA), volume 26, IEEE, Xiamen, China, 2021, pp. 316–320. doi:10.1109/ICBDA51983.2021.9403180.

[12] Y. Chen, J. Ding, D. Li, Z. Chen, Joint bert model based cybersecurity named entity recognition, in: 2021 The 4th International Conference on Software Engineering and Information Management, ICSIM, Yokohama, Japan, 2021, pp. 236–242. doi:10.1145/3451471.3451508.

[13] E. Aghaei, X. Niu, W. Shadid, E. Al-Shaer, SecureBERT: A domain-specific language model for cybersecurity, in: Security and Privacy in Communication Networks, Springer, Cham, 2023, pp. 39–56.

[14] S. Islam, S. Papastergiou, S. Silvestri, Cyber threat analysis using natural language processing for a secure healthcare system, in: 2022 IEEE Symposium on Computers and Communications (ISCC), 2022, pp. 1–7. doi:10.1109/ISCC55528.2022.9912768.

[15] S. Silvestri, S. Islam, S. Papastergiou, C. Tzagkarakis, M. Ciampi, A machine learning approach for the nlp-based analysis of cyber threats and vulnerabilities of the healthcare ecosystem, Sensors 23 (2023). doi:10.3390/s23020651.

[16] S. Silvestri, S. Islam, D. Amelin, G. Weiler, S. Papastergiou, M. Ciampi, Cyber threat assessment and management for securing healthcare ecosystems using natural language processing, International Journal of Information Security 23 (2024) 31–50. doi:10.1007/s10207-023-00769-w.

[17] A. Omotosho, B. A. Haruna, O. M. Olaniyi, Threat modeling of internet of things health devices, Journal of Applied Security Research 14 (2019) 106–121. doi:10.1080/19361610.2019.1545278.

[18] H. Almohri, L. Cheng, D. Yao, H. Alemzadeh, On threat modeling and mitigation of medical cyber-physical systems, in: 2017 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2017, pp. 114–119. doi:10.1109/CHASE.2017.69.

[19] G. Aracri, A. Folino, S. Silvestri, Integrated use of KOS and deep learning for data set annotation in tourism domain, Journal of Documentation 79 (2023) 1440–1458. doi:10.1108/JD-02-2023-0019.

[20] J. Jacobs, S. Romanosky, B. Edwards, I. Adjerid, M. Roytman, Exploit prediction scoring system (EPSS), Digital Threats 2 (2021). doi:10.1145/3436242.

[21] A.A.V.V., Common Vulnerability Scoring System version 3.1 Specification Document, Technical Report, FIRST.Org, 2019. URL: https://www.first.org/cvss/v3-1/cvss-v31-specification_r1.pdf.

[22] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.