=Paper=
{{Paper
|id=Vol-3762/527
|storemode=property
|title=A Natural Language Processing-based Approach for Cyber Risk Assessment in the Healthcare Ecosystems
|pdfUrl=https://ceur-ws.org/Vol-3762/527.pdf
|volume=Vol-3762
|authors=Stefano Silvestri,Giuseppe Tricomi,Giuseppe Felice Russo,Mario Ciampi
|dblpUrl=https://dblp.org/rec/conf/ital-ia/SilvestriTRC24
}}
==A Natural Language Processing-based Approach for Cyber Risk Assessment in the Healthcare Ecosystems==
<pdf width="1500px">https://ceur-ws.org/Vol-3762/527.pdf</pdf>
<pre>
                                A Natural Language Processing-based Approach for Cyber
                                Risk Assessment in the Healthcare Ecosystems
                                Stefano Silvestri¹,*, Giuseppe Tricomi¹,²,³, Giuseppe Felice Russo¹ and Mario Ciampi¹
                                ¹ Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), via Pietro Castellino 111, Naples, 80131, Italy
                                ² Università degli Studi di Messina, Contrada di Dio 1, Messina, 98166, Italy
                                ³ CINI - Consorzio Interuniversitario Nazionale per l’Informatica, Via Ariosto 25, Roma, 00185, Italy


                                                Abstract
                                                The cyber risk in the healthcare sector is constantly increasing, due to the large adoption of digital services formed by a complex
                                                interconnection of different systems and technologies, which offer a larger attack surface for the attackers. Therefore, the risk
                                                assessment of the assets involved in these services is crucial to prevent and mitigate possible critical consequences, which
                                                could also affect the health of the patients. A large source of constantly updated information about threats and vulnerabilities
                                                of the assets of the healthcare ecosystems is available in natural language text on the Internet (cyber security news, forums,
                                                social media, etc.), but it is not easy to fully exploit it for a risk assessment process, due to the complexity of natural
                                                language. This paper proposes an AI-based approach for the individual risk assessment of the assets of digital healthcare
                                                systems based on the use of NLP and Knowledge Bases, which exploits the information extracted from natural language news
                                                from the web. The methodology has been developed within the activities of the EC-funded H2020 AI4HEALTHSEC project,
                                                where it has also been successfully tested in real-world scenarios. Moreover, the datasets collected have been made publicly
                                                available on the SoBigData research infrastructure.

                                                Keywords
                                                Natural Language Processing, Large Language Models, Cyber Threats, Cyber Vulnerabilities, Impact Assessment, Cyber Risk
                                                Assessment


Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
stefano.silvestri@icar.cnr.it (S. Silvestri); giuseppe.tricomi@icar.cnr.it (G. Tricomi);
giuseppefelice.russo@icar.cnr.it (G. F. Russo); mario.ciampi@icar.cnr.it (M. Ciampi)
ORCID: 0000-0002-9890-8409 (S. Silvestri); 0000-0003-3837-8730 (G. Tricomi); 0009-0001-2090-9647 (G. F. Russo);
0000-0002-7286-6212 (M. Ciampi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The healthcare ecosystem is rapidly adopting a growing number of recent technologies, such as
Internet of Things (IoT), wearable and implantable devices, Picture Archiving and Communication
System (PACS), Electronic Health Records (EHRs), DICOM images, and others, interconnected to
realise and offer innovative healthcare digital services. While their adoption and use improve the
quality of service to patients, and support and ease the work of physicians and medical
professionals, this complex and dynamic interconnection of several systems offers a larger attack
surface to threat actors interested in attacking the system by exploiting the existing
vulnerabilities [1]. The low level of awareness of cyber risks among healthcare personnel [2]
compounds the problem, and attacks often have dramatic impacts on the healthcare ecosystem [3]. For
example, a cyber-attack on an insecure PACS server could lead to the web exposure of sensitive
patient information, and an attack on the remote monitoring software of a medical device could
damage the equipment of the hospital or change the configuration of the device [4]. This sector has
recently suffered several serious cyber attacks: for example, in 2017 and 2021 there were
ransomware attacks on the U.K. National Health Service (NHS) and on Ireland’s Department of Health
and Health Service Executive, respectively [5]. Furthermore, inherent vulnerabilities have been
found in some medical devices, such as Braun’s infusion pump and Medtronic’s insulin pump [3].
Finally, approximately 90% of healthcare organisations experienced a data breach in 2018 [6]. For
these reasons, it is necessary to study the most frequent attacks in healthcare to make the
services offered more secure and resilient [4, 7]. Due to the complexity of healthcare ecosystems,
performing an effective cyber risk assessment can help to limit and prevent cyber security
incidents [8]. The cyber risk assessment process has the purpose of identifying, evaluating, and
prioritising security risks to the assets of an organisation, making it possible to take the most
appropriate action to mitigate the risks and the vulnerabilities.
   The Internet is a constantly updated source of threat, incident, and vulnerability-related
information for healthcare ecosystem assets, in the form of unstructured Natural Language (NL)
within blogs, specialized Cyber-Security (CS) websites, social media, Knowledge Bases (KBs) and
others. Although these sources contain crucial information about


risk management and assessment, it is difficult to fully leverage them, due to the inherent
complexity (polysemy, irony, long and complex sentences, non-standardized abbreviations, acronyms)
of NL. Therefore, extracting relevant information from this mass of data becomes a demanding task
[9]. Information extraction from NL text is currently addressed in the literature by adopting
AI-based Natural Language Processing (NLP) models, usually implementing Named Entity Recognition
(NER) systems [10, 11, 12, 13] using Large Language Models (LLMs) and CS KBs. However, there is a
lack of focus in the literature on analyzing and prioritizing the most frequent threats and
vulnerabilities in healthcare. In this context, this paper extends the ideas previously presented
in [14, 15, 16], combining NLP-based threat and vulnerability approaches to define an impact and
risk assessment for healthcare ecosystems, evaluated by exploiting CS textual sources available on
the Internet. It presents the final NLP cyber risk assessment methodology developed within the
activities of the EC-funded H2020 AI4HEALTHSEC research project, as well as the collection of a
textual CS dataset related to the “SoBigData.it” research project.
   The paper is organized as follows: in Section 2, the most recent and related studies in the
literature are outlined; subsequently, the details of the proposed approach are described in
Section 3; afterwards, Section 4 shows the implementation of the proposed solution, a description
of the datasets used and the research project where the approach was tested in real-world
scenarios. Finally, Section 5 provides conclusions and future works.

2. Related Works

There are several recent works in the literature dealing with risk assessment and CS information
extraction from NL documents. The authors of [8] reviewed and compared different generic cyber risk
assessment frameworks in the healthcare field, discussing their assessment methodology and the
limitations associated with them. A threat and mitigation model tailored for IoT health devices is
presented in [17], combining the STRIDE and DREAD models: threats are identified using the STRIDE
model on the device access points, and then ranked using DREAD. This approach is suitable for both
the designers and the users of health IoT devices.
   The security and privacy challenges in Medical Cyber-Physical Systems (MCPS) are discussed in
[18], highlighting that trust and threat models usually consider MCPS stakeholders, including
healthcare practitioners, system administrators and non-medical staff, with incorrect levels of
trust. Also, in [2], the issues related to the CS awareness of healthcare personnel are underlined,
reviewing the existing gaps in CS strategies adopted by healthcare organizations and the risk
assessment methodologies adopted. The authors demonstrated that in this domain there is often a
lack of adequate training for healthcare workers and a lack of specialized figures, such as a chief
information officer, highlighting the need to have security protocols updated to the latest
standards.
   AI-based information extraction from CS textual documents has also been recently developed and
presented in the literature. SecureBERT, presented in [13], is a Bidirectional Encoder
Representations from Transformers (BERT) model trained on large CS-domain NL corpora, which
outperforms other similar models in NLP tasks in the CS domain. The authors of [10] collected a
large corpus of labeled sequences from Industrial Control Systems device documentation to pre-train
and fine-tune a BERT language model, named CyBERT. The authors of [12] proposed another interesting
CS NER system, which exploits an architecture based on BERT, an LSTM, Iterated Dilated
Convolutional Neural Networks (ID-CNNs), and a Conditional Random Field to improve the obtained
performance.
   The main innovation of the proposed approach is the use of CS information extracted from NL
texts to calculate the threat, vulnerability, and impact levels, from which the risk assessment for
the various assets involved in digital healthcare services is finally obtained.

3. Methodology

The proposed risk assessment methodology is composed of the following five steps: i) Healthcare
Ecosystem Assets Identification and Categorisation; ii) Threat Identification and Assessment;
iii) Vulnerability Assessment; iv) Impact Assessment; and v) Risk Assessment.

3.1. Healthcare Ecosystem Assets Identification and Categorisation

The preliminary step of the methodology provides a list of the assets of the considered complex
digital healthcare system by identifying the corresponding services involved and their assets, with
the final purpose of measuring their criticality within the healthcare system. For instance, the
assets of a remote patient consultation service could include a Database, a Linux Server,
communication software, and a web server. After their identification, the assets are also
categorized, using the Common Platform Enumeration (CPE)¹ catalogue to map them to the
corresponding area (based on their type) and category (depending on their functionalities), as
shown in the next Table 1. This step allows us to understand the importance of each asset within
the ecosystem and to provide a list of the assets that require risk assessment.

¹ https://nvd.nist.gov/products/cpe
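
The asset identification step can be illustrated with a minimal sketch of an inventory record that
carries a CPE name. This is not the authors' implementation: the `Asset` class, its fields and the
`asset_type` property are hypothetical names, and the example CPE string is only illustrative. The
part codes (a, o, h) are those of the CPE 2.3 naming scheme; splitting on ":" assumes the name
contains no escaped colons.

```python
from dataclasses import dataclass

# Part codes defined by the CPE 2.3 naming scheme.
CPE_PARTS = {"a": "application", "o": "operating system", "h": "hardware"}


@dataclass(frozen=True)
class Asset:
    """Hypothetical inventory record for one asset of a healthcare service."""
    name: str
    service: str
    cpe: str  # CPE 2.3 formatted string

    @property
    def asset_type(self) -> str:
        # Assumption: no escaped ':' characters inside the CPE components.
        fields = self.cpe.split(":")
        if len(fields) != 13 or fields[:2] != ["cpe", "2.3"]:
            raise ValueError(f"not a CPE 2.3 formatted string: {self.cpe!r}")
        try:
            return CPE_PARTS[fields[2]]
        except KeyError:
            raise ValueError(f"unknown CPE part: {fields[2]!r}") from None


server = Asset(
    name="Linux Server",
    service="remote patient consultation",
    cpe="cpe:2.3:o:linux:linux_kernel:5.10:*:*:*:*:*:*:*",
)
print(server.asset_type)
```

Deriving the type from the CPE name keeps the inventory consistent with the catalogue; the area and
category of Table 1 would still need to be assigned per asset.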
Table 1
Assets areas and categories.
   Area          Name
   1             User interactions with implants and sensors
   2             Medical equipment and IT devices
   3             Services and processes
   4             Interdependent HCIIs – Ecosystem
   Category      Functionalities
   Influence     Found in most organizations, distinct
   Type          Software, hardware, Operating System (OS), Information Sensitivity
   Sensitivity   Restricted, unrestricted
   Criticality   Essential, required, deferrable

   These classifications are used to evaluate the criticality of each asset of the healthcare
system, by measuring the dependency level that an asset has with other system components. We
defined four dependency levels:

       • Independent assets have a distinct operation and exhibit no dependency on other assets.
         If the asset fails, no cascading events occur.
       • Incoming dependency, if, syntactically, another asset uses its data or functionality. If
         such an asset fails, the operation of all related assets that use its data or
         functionality may be disrupted.
       • Outgoing dependency, if, syntactically, it uses data or functionality of another asset.
         Therefore, if the latter asset fails, the operation of the former asset will be affected
         as well.
       • Coupling relationship reveals that two assets have both incoming and outgoing
         dependencies. Thereupon, failures in one of the assets will affect the functionality of
         the other.

   Thus, the criticality level of an asset can be determined by the number of services and relevant
business flows it participates in. Specifically, the General Asset Criticality level based on
running services (GAC) is calculated as the weighted summation of their interdependencies,
normalized by the total number of services in the examined healthcare ecosystem. Thereupon, the
Asset Criticality for a specific service (ACS) is equal to its GAC value divided by the number of
relevant/redundant assets that co-exist in the service. Finally, based on the ACS range values, it
is possible to assign a criticality level to each asset, as shown in Table 2.

Table 2
Asset Criticality Levels.
   ACS Value Range     Asset Criticality Level
   [0,1]               Low
   (1,2]               Medium
   (2,3]               High

3.2. Threat Identification and Assessment

Once the assets have been identified, the next step aims to assess the threats that could affect
those assets, following the approach previously described in [14, 15, 16]. Firstly, a threat
identification phase is performed by exploiting the Common Attack Pattern Enumeration and
Classification (CAPEC)², which also provides a detailed set of the characteristics of the threats,
such as Likelihood of Attack, Related Attack Patterns, Execution Flow, Prerequisites and others. In
this way, we obtain the list of the threats for each asset that operates in the considered
healthcare service/system (identified in the previous step). Each threat also includes the CAPEC
ID, a CAPEC category that will be used to rate the threat, and the corresponding characteristics.

² https://capec.mitre.org

   Then, it is possible to assess the threats, assigning them a severity level. Our methodology
exploits the NL history of reported incidents related to those threats, extracted from large CS
domain collections available online, such as forums, social media, news, and others, using an
AI-based NLP approach. In detail, we use a Named Entity Recognition (NER) architecture based on
SecureBERT [13], a BERT model pre-trained on a very large CS domain text collection (more than 2.2
million documents), preprocessed with a CS customized tokenizer, and fine-tuned for the NER task,
to extract the mentions of the threat and asset pairs found in each sentence of the NL source. In
this case, we produced a custom training set, annotated with the entity types of interest (Asset
and Threat) using the semi-supervised approach described in [19]. Then, the threat level is
calculated based on the percentage of occurrence of the mentions of that threat within the
considered dataset, following the ranges shown in Table 3. The assessment is finally performed
through a mapping between the assets of the services of the healthcare system and the asset and
threat pairs with the corresponding threat level.

Table 3
Threat Levels and corresponding percentage of occurrence.
   Threat Level   Occurrence Percentage   Description
   Very High      [80-100]                Severe impact on critical services and assets
   High           [60-80]                 Significant impact on critical services and assets
   Medium         [40-60]                 Intermediate impact on services and assets and no
                                          critical service would be affected
   Low            [20-40]                 Low impact and no critical service would be affected
   Very Low       [1-20]                  Significantly low impact

3.3. Vulnerability Assessment

The next step has the purpose of building a vulnerability exploit prediction scoring system
specifically tailored for the healthcare domain. To this end, we adopted the NLP and Machine
Learning (ML) approach described in [15], which leverages CS domain textual data sources to train
a supervised ML classification model able to predict the
vulnerability score, obtaining in this way the vulnerability assessment. In summary, this method
uses the textual data included in the CVE (the Report column of this KB) and the corresponding
exploitability and impact metrics, namely the attack vector, attack complexity, privileges
required, user interaction, scope, confidentiality impact, integrity impact and availability, to
obtain a vector representation with the corresponding labels related to exploitability and impact
metrics. This representation is used to train a set of ML XGBoost classifiers, which are able to
predict the labels of the Attack Vector (Network, Adjacent Network, Local, Physical) and of the
exploitability and impact metrics, summarised in the next Table 4.

Table 4
Exploitability and impact metrics and corresponding labels.
   Exploitability and Impact metrics   Labels
   Attack Complexity                   Low, High
   Privileges Required                 None, Low, High
   User Interaction                    None, Required
   Scope                               Unchanged, Changed
   Confidentiality                     None, Low, High
   Integrity                           None, Low, High
   Availability                        None, Low, High

   Then, an extension of the CVE Exploit Prediction Scoring System (EPSS) is adopted [20], defining
a Common Vulnerability Scoring System (CVSS)-like score using the labels predicted by the trained
ML models on the NL texts, and following the specifications provided by [21]. The vulnerability
level is based on the ranges of the computed CVSS-like score, as shown in Table 5.

Table 5
CVSS score ranges and corresponding vulnerability levels.
   CVSS-like Score Range   Vulnerability Level
   8.0 to 10               Very High
   6.0 to 8.0              High
   4.0 to 6.0              Medium
   2.0 to 4.0              Low
   0.0 to 2.0              Very Low

3.4. Impact Assessment

The next step of the proposed methodology is the Individual Impact Assessment, where the impact
level is calculated to measure the effect that can be expected as the result of the successful
exploitation of a vulnerability that resides in a critical asset. In this case, the methodology
leverages the CVE KB in conjunction with the same NER module used for the Threat Assessment (see
Section 3.2), fine-tuned to extract the asset and vulnerability entity types. This methodology
exploits an additional set of adjectives related to the vulnerabilities and belonging to a
predefined dictionary. These adjectives, such as severe, serious, dangerous, etc., tend to
indicate, via a weight coefficient, the severity level of the vulnerability. In detail, this
dictionary is the result of the processed features evaluated with two different classifiers that
output scores to predict relevancy and severity, following the approach described in [22]. Each
adjective is associated with a coefficient, calculated by taking the log-odds ratio, then computing
the exponential function of the log-odds, and converting odds to probability, using the formula:
probability = odds / (1 + odds). In this way, it is possible to associate the vulnerability with a
Low, Medium, and High scale, where Low corresponds to [0, 33) (meaning that there is a 0-33% impact
assessment probability), Medium corresponds to [33, 66), and High corresponds to [66, 100].
   For vulnerabilities expressed in CVSS (obtained in the previous step), the three security
criteria Confidentiality (C), Integrity (I), and Availability (A) are rated on a three-tier scale:
None, Low, and High (see previous Table 4). We can define a mapping from this three-tier scale onto
a five-tier scale ranging from Very Low (VL) to Very High (VH) combining these characteristics, as
shown in Table 6, providing in this way an initial impact level of a specific asset/vulnerability
combination.
   Then, the final impact level per asset is obtained by combining the initial impact with the
asset criticality level (see Table 2) and with the previous scale related to the adjectives and the
corresponding vulnerabilities extracted by the NER module, as stated in the next Table 7.

3.5. Risk Assessment

Finally, the Risk assessment is obtained by combining the Threat, Vulnerability, and Impact levels
obtained in the previous steps, calculating the individual risk level for each asset following the
next Table 8.

4. Implementation and Experiments

To implement the Threat and Impact assessment methods, we first needed a large and updated CS
domain textual document collection. To this end, we collected the news published by The Hacker News
website³, a CS news platform that attracts over 8 million readers monthly and is updated daily with
attacks, threats, vulnerabilities, and other CS news. A Python web crawler and scraper for this
website has been specifically developed to retrieve, extract, collect, and normalise the text of
each posted news item. The scraping task is performed bi-weekly, keeping this dataset constantly
updated and increasing its size. Moreover, this corpus is also made publicly available on the
SoBigData research infrastructure⁴. The NER module is based on SecureBERT [13], a BERT model
pre-trained

³ https://thehackernews.com
⁴ Available at https://data.d4science.org/ctlg/ResourceCatalogue/the_hackernews_dataset
Table 6
Initial Impact Level calculation.
                 C:    None                  Low                   High
                 I:    None   Low    High    None   Low    High    None   Low    High
   A = None            VL     VL     L       L      L      M       M      M      H
   A = Low             VL     L      M       L      M      H       M      H      VH
   A = High            L      M      M       M      H      H       H      VH     VH
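
Table 6 can be read as a plain lookup keyed by the three CVSS ratings. The following minimal sketch
shows one way to encode it; the names `initial_impact`, `RATINGS` and `_TABLE_6` are hypothetical
and not taken from the paper, while the cell values are transcribed from Table 6.

```python
RATINGS = ("None", "Low", "High")

# One row per Availability rating. Each row lists nine cells in column order:
# C=None (I=None, Low, High), then C=Low (...), then C=High (...).
_TABLE_6 = {
    "None": "VL VL L   L L M   M M H".split(),
    "Low":  "VL L M    L M H   M H VH".split(),
    "High": "L M M     M H H   H VH VH".split(),
}


def initial_impact(confidentiality: str, integrity: str, availability: str) -> str:
    """Return the initial impact level (VL, L, M, H or VH) for a C/I/A rating."""
    column = RATINGS.index(confidentiality) * 3 + RATINGS.index(integrity)
    return _TABLE_6[availability][column]


print(initial_impact("Low", "High", "Low"))
```

An unknown rating raises `ValueError` (from `index`) or `KeyError` (from the row lookup), which
keeps a mislabelled prediction from silently mapping to an impact level.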

Table 7
Final Impact Level calculation.
                         Asset Criticality                      Low                                  Medium                                     High
                           NER Module              Low         Medium            High        Low    Medium       High             Low          Medium          High
                       Initial Impact Level                                                     Final Impact Level
                                 VL                    VL           VL            L          VL         L          L                 L           L             M
                                  L                    VL            L            M           L         M          M                 L           M              H
                                 M                      L            L            M          M          M          M                 M           M              H
                                 H                      L           M             M          M          M          H                 M           H              H
                                VH                     M            M             H          M          H          H                 H           H             VH
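
The two inputs that Table 7 adds to the initial impact level can be sketched in the same way. The
example below converts a log-odds coefficient to a probability with probability = odds / (1 + odds),
maps it to the Low/Medium/High adjective scale of Section 3.4, and then looks up the final impact
level. All function and variable names are hypothetical; the thresholds and cell values are
transcribed from the text and from Table 7, and this is an illustration, not the authors' code.

```python
import math

LEVELS = ("Low", "Medium", "High")

# One row per initial impact level. Each row lists nine cells in column order:
# criticality Low (NER Low, Medium, High), then Medium (...), then High (...).
_TABLE_7 = {
    "VL": "VL VL L   VL L L   L L M".split(),
    "L":  "VL L M    L M M    L M H".split(),
    "M":  "L L M     M M M    M M H".split(),
    "H":  "L M M     M M H    M H H".split(),
    "VH": "M M H     M H H    H H VH".split(),
}


def log_odds_to_probability(log_odds: float) -> float:
    """Exponentiate the log-odds, then apply probability = odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)


def adjective_level(probability: float) -> str:
    """Map a probability in [0, 1] to the scale [0, 33), [33, 66), [66, 100]."""
    percent = probability * 100
    if percent < 33:
        return "Low"
    if percent < 66:
        return "Medium"
    return "High"


def final_impact(asset_criticality: str, ner_level: str, initial: str) -> str:
    """Return the final impact level for one asset/vulnerability combination."""
    column = LEVELS.index(asset_criticality) * 3 + LEVELS.index(ner_level)
    return _TABLE_7[initial][column]


level = adjective_level(log_odds_to_probability(0.0))
print(level, final_impact("Medium", level, "H"))
```

Keeping the table as data rather than nested conditionals makes it easy to check the code against
the published table cell by cell.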


on a very large CS domain text collection (more than 2.2 million documents), preprocessed with a CS
customised tokenizer to improve its performance. This model has been fine-tuned for the NER task,
to extract the mentions of the pairs of threat and asset found in each corpus sentence for the
threat assessment, and the mentions of vulnerabilities, the corresponding adjectives, and the
assets for the impact assessment. To this end, we created two custom training sets, annotated with
the entity types of interest (Asset and Threat in the first case and Asset, Vulnerability and
Adjectives in the latter case) using the semi-supervised approach described in [19]. The
implementation of this module is based on the Huggingface Transformers Python library. The
vulnerability assessment ML classifiers have been implemented using the Dmlc XGBoost library, a
distributed gradient boosting library designed to be highly efficient and flexible.

5. Conclusion and Future Works

The paper proposes an AI-based approach for the individual risk assessment of the assets of digital
healthcare systems. The approach, after the classification of the criticality of the assets using
CS KBs, leverages NER and ML systems to extract and classify relevant information from textual CS
sources, making it possible to calculate the threat, vulnerability and impact levels, which are
finally combined to obtain the risk level of each asset. The methodology was successfully tested in
real-world pilot scenarios of the EC-funded H2020 AI4HEALTHSEC project, demonstrating its
applicability and effectiveness. Moreover, the datasets, which are constantly updated, are made
publicly available on the SoBigData research infrastructure.

   The proposed methodology has been developed and
implemented within the activities of the EC-funded
                                                                                               Acknowledgments
H2020 project “AI4HEALTHSEC–A Dynamic and Self-                                               This work is supported by the European Union—
Organised Artificial Swarm Intelligence Solution for Se-                                      NextGenerationEU—National Recovery and Resilience
curity and Privacy Threats in Healthcare ICT Infrastruc-                                      Plan (Piano Nazionale di Ripresa e Resilienza, PNRR)—
tures”. In this project, the proposed approach has been                                       Project: “SoBigData.it—Strengthening the Italian RI for
tested in real-world pilot scenarios provided by the Fraun-                                   Social Mining and Big Data Analytics”—Prot. IR0000013—
hofer Institute for Biomedical Engineering (IBMT), a part-                                    Avviso n. 3264 del 28/12/2021.
ner of the project. The pilots tested three different com-                                      We thank Simona Sada and Giuseppe Trerotola for the
plex healthcare systems scenarios, namely Implantable                                         administrative and technical support provided.
Medical Devices, Wearables, and Biobank. The results of
the tests, reported in [14, 15, 16], confirmed the effective-
ness and the applicability of our method.


Table 8
Individual Risk Level calculation.
Threat        |    Very Low    |      Low       |     Medium     |      High      |   Very High
Vulnerability | VL L  M  H  VH | VL L  M  H  VH | VL L  M  H  VH | VL L  M  H  VH | VL L  M  H  VH
Impact        |                                      Risk
VL            | VL VL L  L  L  | VL L  L  L  M  | VL L  L  M  M  | L  L  M  M  M  | L  L  M  M  M
L             | VL L  L  L  M  | L  L  L  M  M  | L  L  M  M  M  | L  M  M  M  H  | L  M  M  H  H
M             | L  L  L  M  M  | L  L  M  M  M  | L  M  M  M  H  | M  M  M  H  H  | M  M  H  H  H
H             | L  L  M  M  M  | L  M  M  M  H  | M  M  M  H  H  | M  M  H  H  H  | M  M  H  VH VH
VH            | L  M  M  M  H  | M  M  M  H  H  | M  M  H  H  H  | M  H  H  H  VH | M  H  H  VH VH
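
Table 8 combines the three assessed levels into one risk level per asset. A minimal Python
sketch of that lookup follows. The names RISK_ROWS and risk_level are hypothetical and are not
the authors' code; the cell values are transcribed from Table 8.

```python
# Hypothetical sketch of the Table 8 lookup; values transcribed from the table.
LEVELS = ("VL", "L", "M", "H", "VH")

# Key: impact level. Each row holds five groups (threat VL..VH) of five
# values (vulnerability VL..VH).
RISK_ROWS = {
    "VL": "VL VL L L L | VL L L L M | VL L L M M | L L M M M | L L M M M",
    "L":  "VL L L L M | L L L M M | L L M M M | L M M M H | L M M H H",
    "M":  "L L L M M | L L M M M | L M M M H | M M M H H | M M H H H",
    "H":  "L L M M M | L M M M H | M M M H H | M M H H H | M M H VH VH",
    "VH": "L M M M H | M M M H H | M M H H H | M H H H VH | M H H VH VH",
}


def risk_level(threat: str, vulnerability: str, impact: str) -> str:
    """Return the individual risk level for one (threat, vulnerability, impact) triple."""
    groups = [group.split() for group in RISK_ROWS[impact].split("|")]
    return groups[LEVELS.index(threat)][LEVELS.index(vulnerability)]


if __name__ == "__main__":
    print(risk_level(threat="H", vulnerability="VH", impact="L"))
```

Storing each row as a string mirrors the printed layout, which makes the transcription easy to
check against the table line by line.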

References

 [1] P. Ribino, M. Ciampi, S. Islam, S. Papastergiou, Swarm intelligence model for securing
     healthcare ecosystem, Procedia Computer Science 210 (2022) 149–156.
     doi:10.1016/j.procs.2022.10.131.
 [2] S. Nifakos, K. Chandramouli, C. K. Nikolaou, P. Papachristou, S. Koch, E. Panaousis,
     S. Bonacina, Influence of human factors on cyber security within healthcare
     organisations: A systematic review, Sensors 21 (2021). doi:10.3390/s21155119.
 [3] D. McKee, P. Laulheret, McAfee Enterprise ATR uncovers vulnerabilities in globally used
     B. Braun infusion pump, 2021. URL:
     https://www.trellix.com/blogs/research/mcafee-enterprise-atr-uncovers-vulnerabilities-in-globally-used-b-braun-infusion-pump/.
 [4] S. Islam, S. Papastergiou, H. Mouratidis, A dynamic cyber security situational awareness
     framework for healthcare ICT infrastructures, in: Proceedings of the 25th Pan-Hellenic
     Conference on Informatics, PCI ’21, ACM, Volos, Greece, 2022, pp. 334–339.
     doi:10.1145/3503823.3503885.
 [5] D. Rees, Cyber attacks in healthcare: the position across europe, 2021. URL:
     https://www.pinsentmasons.com/out-law/analysis/cyber-attacks-healthcare-europe.
 [6] Sixth annual benchmark study on privacy & security of healthcare data, 2016. Ponemon
     Institute.
 [7] K. S. Bhosale, M. Nenova, G. Iliev, A study of cyber attacks: In the healthcare sector,
     in: 2021 Sixth Junior Conference on Lighting (Lighting), 2021, pp. 1–6.
     doi:10.1109/Lighting49406.2021.9598947.
 [8] S. Memon, S. Memon, L. Das, B. R. Memon, Cyber security risk assessment methods for smart
     healthcare, in: 2024 IEEE 1st Karachi Section Humanitarian Technology Conference
     (KHI-HTC), 2024, pp. 1–6. doi:10.1109/KHI-HTC60760.2024.10481961.
 [9] M. Tikhomirov, N. Loukachevitch, A. Sirotina, B. Dobrov, Using BERT and augmentation in
     named entity recognition for cybersecurity domain, in: 25th International Conference on
     Applications of Natural Language Processing and Information Systems, Springer,
     Saarbrücken, Germany, 2020, pp. 16–24.
[10] K. Ameri, M. Hempel, H. Sharif, J. Lopez Jr., K. Perumalla, Cybert: Cybersecurity claim
     classification by fine-tuning the bert language model, Journal of Cybersecurity and
     Privacy 1 (2021) 615–637. URL: https://www.mdpi.com/2624-800X/1/4/31.
     doi:10.3390/jcp1040031.
[11] S. Zhou, J. Liu, X. Zhong, W. Zhao, Named entity recognition using bert with whole world
     masking in cybersecurity domain, in: 2021 IEEE 6th International Conference on Big Data
     Analytics (ICBDA), volume 26, IEEE, Xiamen, China, 2021, pp. 316–320.
     doi:10.1109/ICBDA51983.2021.9403180.
[12] Y. Chen, J. Ding, D. Li, Z. Chen, Joint bert model based cybersecurity named entity
     recognition, in: 2021 The 4th International Conference on Software Engineering and
     Information Management, ICSIM, Yokohama, Japan, 2021, pp. 236–242.
     doi:10.1145/3451471.3451508.
[13] E. Aghaei, X. Niu, W. Shadid, E. Al-Shaer, SecureBERT: A domain-specific language model
     for cybersecurity, in: Security and Privacy in Communication Networks, Springer, Cham,
     2023, pp. 39–56.
[14] S. Islam, S. Papastergiou, S. Silvestri, Cyber threat analysis using natural language
     processing for a secure healthcare system, in: 2022 IEEE Symposium on Computers and
     Communications (ISCC), 2022, pp. 1–7. doi:10.1109/ISCC55528.2022.9912768.
[15] S. Silvestri, S. Islam, S. Papastergiou, C. Tzagkarakis, M. Ciampi, A machine learning
     approach for the nlp-based analysis of cyber threats and vulnerabilities of the
     healthcare ecosystem, Sensors 23 (2023). doi:10.3390/s23020651.
[16] S. Silvestri, S. Islam, D. Amelin, G. Weiler, S. Papastergiou, M. Ciampi, Cyber threat
     assessment and management for securing healthcare ecosystems using natural language
     processing, International Journal of Information Security 23 (2024) 31–50.
     doi:10.1007/s10207-023-00769-w.
[17] A. Omotosho, B. A. Haruna, O. M. Olaniyi, Threat modeling of internet of things health
     devices, Journal of Applied Security Research 14 (2019) 106–121.
     doi:10.1080/19361610.2019.1545278.
[18] H. Almohri, L. Cheng, D. Yao, H. Alemzadeh, On threat modeling and mitigation of medical
     cyber-physical systems, in: 2017 IEEE/ACM International Conference on Connected Health:
     Applications, Systems and Engineering Technologies (CHASE), 2017, pp. 114–119.
     doi:10.1109/CHASE.2017.69.
[19] G. Aracri, A. Folino, S. Silvestri, Integrated use of KOS and deep learning for data set
     annotation in tourism domain, Journal of Documentation 79 (2023) 1440–1458.
     doi:10.1108/JD-02-2023-0019.
[20] J. Jacobs, S. Romanosky, B. Edwards, I. Adjerid, M. Roytman, Exploit prediction scoring
     system (EPSS), Digital Threats 2 (2021). doi:10.1145/3436242.
[21] A.A.V.V., Common Vulnerability Scoring System version 3.1 Specification Document,
     Technical Report, FIRST.Org, 2019. URL:
     https://www.first.org/cvss/v3-1/cvss-v31-specification_r1.pdf.
[22] L. Breiman, Random forests, Machine learning 45 (2001) 5–32.

</pre>