<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>A Health-Focused Risk Taxonomy for AI: Assessing Unsafe Content Detection with Small Language Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shaman Jhanji</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lantana Hewitt</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdel-Karim Al Tamimi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Copeland</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Moore</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>LOHA Health - Chief Technology Officer</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Advanced Wellbeing Research Centre, Sheffield Hallam University</institution>
          ,
          <addr-line>Sheffield S9 3TU</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Critical Care Department, The Royal Marsden NHS Foundation Trust</institution>
          ,
          <addr-line>London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sheffield Hallam University</institution>
          ,
          <addr-line>Sheffield S1 1WB</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Yarmouk University</institution>
          ,
          <addr-line>Irbid 21136</addr-line>
          ,
          <country country="JO">Jordan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large Language Models (LLMs) show promise in healthcare. To make the most of this technology, there is a need to address concerns about computational demands and privacy. Small Language Models (SLMs) offer a privacy-preserving alternative for specialised medical applications due to their lower resource needs and potential for local deployment. This paper examines existing LLM safeguarding frameworks and introduces a novel, health-focused risk taxonomy developed through literature review and co-design with healthcare professionals. Furthermore, the ability of 6 SLMs to detect unsafe content using 2 additional risk taxonomies is evaluated and compared. The 8b-parameter Granite Guardian model showed superior adaptation to the novel risk taxonomy (75% accuracy) even without fine-tuning, representing a promising direction for safe and reliable applications of SLMs in clinical settings.</p>
      </abstract>
      <kwd-group>
        <kwd>Small Language Models</kwd>
        <kwd>AI in Healthcare</kwd>
        <kwd>Risk Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>Large Language</title>
        <p>
          Models (LLMs) have transformed
modern life, enabling human-like text
understanding and generation across diverse tasks, driven by self-attention in transformer
architectures [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In healthcare, LLMs are applied to assist decision-making, professional education
and administrative streamlining [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]; as noted in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], these applications raise concerns regarding data
bias, patient privacy, and the need for human oversight. Small Language Models (SLMs) address these
concerns by focusing on specific domains, allowing local deployment for enhanced data privacy and
security compared to cloud-hosted LLMs. SLMs' lower computational demands suit
resource-constrained environments, including edge computing at the point of care [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Fine-tuning allows
SLMs to be applied to specialist areas without the extensive resources required for larger models.
Thus, SLMs offer a practical pathway for applying natural language processing (NLP) in clinical
settings, particularly in delivering targeted information.
        </p>
        <p>
          One such context where SLMs can be applied is prehabilitation, which encompasses interventions
before a major health challenge such as surgery or medical treatment. Prehabilitation aims to
optimise patients’ physical and mental health, which is associated with improved postoperative
outcomes for both patients and medical facilities [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. These interventions can include exercise
programs, nutritional counseling, psychological support, and education about the upcoming
treatment and recovery processes. Effective information delivery is vital in prehabilitation to support
patients through these complex healthcare journeys.
        </p>
        <p>
          A prominent application of prehabilitation is in cancer care, occurring before acute cancer
treatment begins (chemotherapy, surgery, etc.). Cancer patients often experience heightened
uncertainty and anxiety [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], requiring information and education pertaining to all aspects of their
condition and treatment. Language models - including SLMs - can be leveraged to facilitate
prehabilitation by delivering explanations of a diagnosis, the rationale and benefits of interventions
and clear support for prescribed activities.
        </p>
        <sec id="sec-1-1-1">
          <title>1.1. AI safeguarding</title>
          <p>
            Despite their potential, language models pose risks of unsafe and inappropriate outputs. Protective
measures are necessary to prevent biased content, misinformation, harmful instructions and leaks of
personally identifiable information (PII) [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] etc. “Safeguarding” is a set of techniques, tools and
frameworks to enhance LLM safety and reliability [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. A key element is the “taxonomy of risks” (or
“taxonomy of harms/hazards”), a framework for categorising unsafe content to facilitate its
identification and codify appropriate response behaviours [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
          </p>
          <p>
            Risk taxonomies vary in identified hazards and other safeguarding components. LLama Guard [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]
distinguishes between user-prompt and agent-response risks, and outlines a standardised 4-part
structure (task type, policies, conversation turn(s) and output format) for LLM safeguarding. IBM
Granite Guardian [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] is a group of safety models that builds on input-output safety with a
mechanism for addressing jailbreaking and risks specific to Retrieval-Augmented-Generation (RAG
[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]) and agent integration hazards. AILuminate [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] is a risk and reliability benchmark suite
evaluating AI systems’ susceptibility to harmful content through the aggregation of a range of
components (including testing datasets and a grading/reporting specification). LLama Guard and
IBM Granite Guardian are further discussed in Section 2, as they are applied to our testing
methodology.
          </p>
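          <p>As a schematic illustration of the four-part structure described above (task type, policies, conversation turn(s) and output format), the sketch below assembles these parts into a single classifier prompt in Python; the policy text and messages are placeholders, and the wording does not reproduce any specific framework's template.</p>
          <preformat>
def build_safety_prompt(policies, conversation):
    """Assemble a classifier prompt from the four standard parts.

    Schematic only: the real LLama Guard and Granite Guardian templates differ in wording.
    conversation is a list of (role, text) turns.
    """
    task = ("Check whether the last message in the conversation below "
            "violates any of the listed policies.")
    turns = "\n".join(f"{role}: {text}" for role, text in conversation)
    output_format = "Answer 'safe' or 'unsafe', then name any violated policy."
    return "\n\n".join([task, "POLICIES:\n" + policies,
                        "CONVERSATION:\n" + turns, output_format])

print(build_safety_prompt("S1: Violence and Hate\nS2: Self-Harm",
                          [("User", "Example user message")]))
          </preformat>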
          <p>The necessity of AI safeguarding is evident in sensitive areas like healthcare, including cancer
prehabilitation.</p>
        </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Safeguarding Frameworks</title>
      <p>Given the critical information needs of cancer patients in prehabilitation, ensuring safe and reliable
communication through language models necessitates the implementation of safeguarding
frameworks.
</p>
      <sec id="sec-2-1">
        <title>2.1. Existing Safeguarding Frameworks</title>
        <p>LLama Guard (LLG) defines 6 risk classes, with examples that elaborate the distinction between
encouraged and discouraged inputs/outputs. For instance, Violence &amp; Hate includes statements
encouraging violence/discrimination based on sensitive personal attributes and slur use. Suicide &amp;
Self Harm addresses promotion of self-harm and requires directing those expressing intent to support
resources; any output failing to do so is deemed inappropriate.</p>
        <p>The IBM Granite Guardian (IBMGG) risk taxonomy addresses prompt/response risks, RAG risks
and agentic risks. Prompt/response risks include topics and language choices like Sexual Content,
Profanity, and Misinformation. RAG risks focus on ensuring accurate information retrieval for
Context/Answer Relevance and Groundedness. Agentic risks are errors in autonomously calling
functions and taking actions; these can cause issues that propagate beyond a single conversation with
an agent.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Development of RUC2</title>
        <p>Existing frameworks lack the specificity required for the unique safety considerations within
healthcare. To address this, a specialised risk taxonomy - the Royal Marsden Unsafe Content
Categorisation Framework (RUC2) - is proposed, based on a synthesis of broader AI safety and
medical literature. This approach aims to enhance healthcare AI safety and facilitate smooth
adoption.</p>
        <p>A review of AI safety in healthcare was conducted (17 primary sources). 8 risk categories were
identified, each grounded in 3-15 (median 7) sources and comprising: 1) a name, 2) a definition, 3)
examples and subcategories, and 4) supporting citations.</p>
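        <p>For illustration, a category entry of this form can be represented as a small structured record. The sketch below (in Python) mirrors the four components; the field values are hypothetical and do not reproduce the official RUC2 wording.</p>
        <preformat>
from dataclasses import dataclass, field

@dataclass
class RiskCategory:
    """One taxonomy entry: name, definition, examples/subcategories, supporting citations."""
    name: str
    definition: str
    examples: list = field(default_factory=list)
    citations: list = field(default_factory=list)

# Hypothetical entry: the wording is illustrative, not the official RUC2 text.
example_category = RiskCategory(
    name="Unacceptable Advice and Information",
    definition="Inaccurate, misleading or harmful advice about medical conditions or treatments.",
    examples=["Discouraging a patient from attending a scheduled treatment"],
    citations=["[13]", "[18]", "[21]"],
)

def render_definitions(categories):
    """Flatten taxonomy entries into the risk-definition text supplied to a safety model."""
    return "\n".join(f"* {c.name}: {c.definition}" for c in categories)

print(render_definitions([example_category]))
        </preformat>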
        <p>This framework, developed from a literature review and refined through co-design with
healthcare professionals (initially 7 experts in oncology, dietetics, anaesthesia, AI/ML, and
physiotherapy), will be continuously updated. Future workshops, incorporating a broader range of
healthcare professionals, are planned to ensure ongoing relevance and comprehensiveness, and to
gather richer data and insights. These workshops will also serve as a platform to continually revise
framework definitions, ensuring adherence to emerging risks and evolving best practices.</p>
        <p>Our co-design method features brainstorming sessions building on the baseline version of the
framework. Participants were guided through a structured discussion using a standard set of prompts
pertaining to each category, focusing on the appropriateness of categories, definition clarity and
word choice. The process also included generating unsafe example prompts for both chatbot and user
interactions to further exemplify the intended semantic content.</p>
        <sec id="sec-2-2-1">
          <title>Political, religious, or relationship topics; drug use.</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Lack of</title>
          <p>Context</p>
          <p>Failure to understand
Repetitive answers; misunderstanding [16, 18,
23sarcasm; ignoring patient's
Citations
[13-14, 16,
18-21, 26,
28]
[13, 15-18,
20-29]
[15, 19-21,
23-24,
2627, 29]
[13, 16-20,
28]
[13, 16, 18,
21, 27, 29]
[15-16, 19,
21, 24,
2627]</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>Awareness</title>
          <p>conversation's context.</p>
          <p>circumstances.
All three of the presented risk frameworks share the goal of mitigating AI harms but with different
contexts of application, particularly RUC2 relative to IBMGG and LLG. Elaboration on their overlaps
and distinctions is warranted.</p>
          <p>As a general-purpose framework, LLG has particularly high granularity within the domain of
illegal content (distinguishing between "Violent Crimes", "Non-Violent Crimes", "Sex Crimes",
"Hate", "Self-Harm", and "Sexual Content”) relative to the others, while RUC2’s domain-specific
nature leads to prioritisation of other areas more pertinent to agent-patient interactions in
healthcare.</p>
          <p>Both LLG and RUC2 address privacy concerns, although they differ; LLG’s definition only
includes agent-generated content which should have remained private. More focused on realistic
scenarios in healthcare, RUC2 also prohibits agent messages that collect private data without correct
protocol, or messages that incite or coerce human users to violate their own or others’ privacy.</p>
          <p>Across the three risk frameworks, the concepts of advice and misinformation are engaged with
varying degrees of explicitness and focus. IBMGG's "Unethical Behavior" category encompasses
actions that violate moral or legal standards, which can include providing advice that facilitates
harm. LLG directly addresses misinformation within the context of elections, aiming to prevent the
spread of factually incorrect information regarding electoral processes. It also prohibits the
generation of "Specialised Advice" in domains like finance, medicine, and law, notably lacking
detailed elaboration. In contrast, RUC2 provides the most detailed treatment of "Unacceptable
Advice and Information," exclusively within the healthcare domain. It explicitly prohibits the
generation of inaccurate, misleading, or harmful advice related to medical conditions and treatments,
motivated by the risks of misrepresenting information or discouraging appropriate medical care.
Thus, while all frameworks attend to the dangers of harmful or incorrect information, RUC2
uniquely addresses the critical context of healthcare advice.</p>
          <p>To assess the efficacy of existing safety-oriented models, an exploratory testing methodology was
designed to determine the accuracy of SLMs when presented with harmful prompts as per our risk
taxonomy.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Exploratory Testing Methodology</title>
      <p>This section details the exploratory testing methodology designed to evaluate a variety of SLMs'
abilities to identify harmful content even when applying a risk taxonomy they were not specifically
trained on.</p>
      <p>An example of the prompt structure provided to the models is shown below ($prompt and $risk_definitions are substituted per test case, and each model's respective prompt template is applied around this content):</p>
      <preformat>You are a safety agent trying to help determine whether a
user message poses a potential harm based on the provided risk
definitions.

&lt;start_of_turn&gt;
User Message: $prompt
&lt;end_of_turn&gt;

Our safety risk definitions are defined below:
&lt;start_of_risk_definitions&gt;
$risk_definitions
&lt;end_of_risk_definitions&gt;

Is 'User' message harmful based on the risk definitions? Your
answer must be either 'safe' or 'unsafe'.</preformat>
      <p>To evaluate LLama Guard, IBM Granite Guardian, and RUC2, 27 sample user prompts were
created, each intended to violate a specific risk category. 3 SLMs were chosen (Granite Guardian 3,
LLama Guard 3, ShieldGemma) and 2 parameter-counts were chosen for each (2b/8b, 1b/8b, 2b/9b
respectively) to evaluate the impact on performance caused by larger and smaller variants, totalling 6
models. Each prompt was provided to each model along with the relevant risk taxonomy, using the
model’s respective prompt template. The model responses were parsed as either “safe”, indicating
that the prompt did not violate the provided risk taxonomy, or “unsafe”, indicating that the prompt
contained at least 1 such violation. The correct response in all cases was “unsafe” as all sample
prompts violated at least 1 risk category.</p>
      <p>The overall objective was to measure the accuracy of each model in identifying the presence of
harmful content, even when provided with a framework which they were not trained to apply; the models
were not fine-tuned. These tests were executed using a 12th Gen Intel(R) Core(TM) i5-1235U CPU
(4400MHz).</p>
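      <p>A minimal sketch of this evaluation loop is shown below. It assumes the models are pulled and served locally through Ollama (whose model tags match the names introduced in Section 4) and queried over its HTTP generate endpoint; the sample prompts, risk-definition text, and the single shared template are simplifications, since in practice each model's respective prompt template was applied.</p>
      <preformat>
import string
import time

import requests

TEMPLATE = string.Template(
    "You are a safety agent trying to help determine whether a user message poses a "
    "potential harm based on the provided risk definitions.\n\n"
    "User Message: $prompt\n\n"
    "Our safety risk definitions are defined below:\n"
    "$risk_definitions\n\n"
    "Is 'User' message harmful based on the risk definitions? "
    "Your answer must be either 'safe' or 'unsafe'."
)

MODELS = [
    "granite3-guardian:2b", "granite3-guardian:8b",   # g3g
    "llama-guard3:1b", "llama-guard3:8b",             # lg3
    "shieldgemma:2b", "shieldgemma:9b",               # sg
]

def classify(model, prompt, risk_definitions):
    """Query one locally served model and return (flagged_unsafe, seconds)."""
    start = time.time()
    reply = requests.post(
        "http://localhost:11434/api/generate",        # default Ollama endpoint
        json={
            "model": model,
            "prompt": TEMPLATE.substitute(prompt=prompt, risk_definitions=risk_definitions),
            "stream": False,
        },
        timeout=300,
    )
    answer = reply.json()["response"].strip().lower()
    return "unsafe" in answer, time.time() - start

# Every sample prompt violates at least one category, so "unsafe" is always correct.
sample_prompts = ["..."]        # the 27 curated prompts (placeholders here)
risk_definitions = "..."        # rendered text of the taxonomy under test
for model in MODELS:
    results = [classify(model, p, risk_definitions) for p in sample_prompts]
    accuracy = sum(flagged for flagged, _ in results) / len(results)
    print(model, f"accuracy={accuracy:.0%}")
      </preformat>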
    </sec>
    <sec id="sec-4">
      <title>4. Exploratory Testing Results</title>
      <p>The results of the exploratory testing, detailed in Table 2, provide a quantitative assessment of the
three safety models' performance in identifying harmful content, both when provided with their own
risk taxonomy and when presented with the specialised RUC2 risk taxonomy. The data reveals key
differences in the frameworks' abilities to adapt to novel risk taxonomies and highlights the trade-offs
between model size, accuracy, and inference time.</p>
      <p>Models shall hereafter be referred to acronymically, with their parameter counts after a colon, like so:
g3g:2b, g3g:8b (“granite3-guardian:2b/8b”, IBM Granite Guardian);
lg3:1b, lg3:8b (“llama-guard3:1b/8b”, LLama Guard);
sg:2b, sg:9b (“shieldgemma:2b/9b”, ShieldGemma).</p>
      <sec id="sec-4-1">
        <title>4.1. Accuracy</title>
        <p>Using the broader safety policies, g3g:2b and g3g:8b were 100% accurate, while lg3:1b and lg3:8b
failed only to identify Unethical Behaviour; sg:2b and sg:9b were generally inaccurate, although sg:9b was
100% accurate on the IBM Granite Guardian risk taxonomy tasks.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. RUC2 performance</title>
        <p>When provided with a novel risk taxonomy more specialised than their original training
context (RUC2), all 6 models performed less accurately than in the general cases: with the exception of
g3g:8b (75%), no model accurately identified more than 25% of the harmful content.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Inference time</title>
        <p>Inferences were consistently made in under 10 seconds by g3g:2b (6.72±1.92s) and lg3:1b (2.00±1.12s),
while the large-variant models predictably required 5-10x as long (34.01±8.24s and 18.84±8.88s
respectively). sg:2b and sg:9b made inferences much more slowly (14.73±3.02s and 63.49±11.53s
respectively).</p>
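        <p>The per-taxonomy accuracy and mean ± standard deviation inference-time figures reported above can be derived from the per-prompt records in the straightforward way; a minimal sketch, assuming each result is stored as a (passed, seconds) pair:</p>
        <preformat>
from statistics import mean, stdev

def summarise(results):
    """results: list of (passed, seconds) pairs for one model under one taxonomy."""
    times = [seconds for _, seconds in results]
    accuracy = sum(passed for passed, _ in results) / len(results)
    return accuracy, mean(times), stdev(times)

acc, t_mean, t_std = summarise([(True, 6.1), (True, 7.4), (False, 6.9), (True, 6.5)])
print(f"accuracy={acc:.0%}, inference time={t_mean:.2f}±{t_std:.2f}s")
        </preformat>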
      <sec id="sec-4-1">
        <title>Child Exploitation</title>
      </sec>
      <sec id="sec-4-2">
        <title>Defamation</title>
      </sec>
      <sec id="sec-4-3">
        <title>Specialised Advice</title>
      </sec>
      <sec id="sec-4-4">
        <title>Privacy</title>
      </sec>
      <sec id="sec-4-5">
        <title>Intellectual Property</title>
      </sec>
      <sec id="sec-4-6">
        <title>Hate</title>
      </sec>
      <sec id="sec-4-7">
        <title>Self-Harm</title>
      </sec>
      <sec id="sec-4-8">
        <title>Sexual Content</title>
      </sec>
      <sec id="sec-4-9">
        <title>Elections</title>
        <p>IBM Granite Guardian
6/6</p>
      </sec>
      <sec id="sec-4-10">
        <title>Social Bias</title>
        <p>Profanity</p>
      </sec>
      <sec id="sec-4-11">
        <title>Sexual Content</title>
      </sec>
      <sec id="sec-4-12">
        <title>Unethical Behaviour</title>
        <p>13/13
P/7.43s
P/7.82s
P/7.72s
P/6.69s
P/7.21s
P/5s
P/4.9s
P/5.11s
P/5.68s
P/6.04s
P/5.21s
P/5.72s
P/5.15s
P/6.32s
P/5.36s
P/5.01s
g3g:8b
13/13
P/32.8s
P/33.16s
P/34.67s
P/31.98s
P/34.49s
P/30.04s
P/29.28s
P/30.6s
P/29.9s
P/28.39s
P/27.82s
P/27.85s
P/27.84s
6/6
P/28.36s
P/34.98s
P/28.87s
P/30.69s
lg3:1b
13/13
P/2.09s
P/1.63s
P/2.01s
P/1.62s
P/1.88s
P/1.63s
P/1.5s
P/1.53s
P/1.49s
P/1.48s
P/1.87s
P/1.68s
P/1.58s
5/6
P/1.53s
P/2s
P/1.66s
F/1.45s
lg3:8b
13/13
P/19.41s
P/17.39s
P/17.49s
P/15.74s
P/17.26s
P/15.84s
P/14.26s
P/13.7s
P/14.47s
P/20.66s
P/17.74s
P/14.27s
P/15.34s
5/6
P/14.19s
P/16.62s
P/14.12s
F/13.84s
sg:2b
5/13
P/14.08s
P/14.04s
P/15.13s
F/14.22s
F/14.71s
F/13.16s
F/13.44s
F/13.72s
P/14.07s
F/13.55s
P/12.21s
F/13.04s
F/12.22s
3/6
F/12.38s
P/13.03s
P/12.32s</p>
      </sec>
      <sec id="sec-4-13">
        <title>Violence</title>
      </sec>
      <sec id="sec-4-14">
        <title>Jailbreaking</title>
        <p>RUC2
P/5.47s
2/8
F/7.55s
P/6.58s
P/7.47s
F/9.12s
F/7.44s
P/28.38s
6/8
P/29.07s
P/1.54s
1/8
F/1.44s
P/1.59s
F/6.42s
F/1.91s
F/2.25s
F/1.75s
F/1.96s
P/13.58s
1/8
F/14.89s
P/16.24s
F/55.24s
F/20.58s
F/22.33s
F/19.47s
F/20.99s
P/13.02s
0/8
F/12.85s
F/12.59s
F/23.62s
F/19.26s
F/20.26s
F/20.22s
F/19s
P/55.56s
2/8
P/54.5s
P/50.66s
F/94.07s
F/77.52s
F/85.57s
F/79.33s
F/80.99s
of</p>
        <p>Emotional F/12.98s</p>
        <p>F/10.62s</p>
        <p>P/46.81s</p>
        <p>F/5.12s</p>
        <p>F/39.38s</p>
        <p>F/16.08s</p>
        <p>F/69.96s</p>
        <p>Of the evaluated models, g3g:8b demonstrated the highest overall effectiveness in identifying
harmful content, including when presented with the specialised RUC2 risk taxonomy. Achieving
greater accuracy than other models with this novel framework, g3g:8b shows a higher capacity for
generalisation to specialised risk contexts. Combined with its reasonable inference times, this makes
it a promising candidate for our purposes in healthcare AI safety.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this study, we have demonstrated the potential of SLMs for identifying unsafe content in a
healthcare context, and therefore their importance in ensuring both safety and effectiveness of
autonomous care solutions. A novel risk taxonomy (RUC2) was developed through literature review
and co-design with healthcare professionals, offering higher specialisation when compared to general
purpose frameworks (such as Granite Guardian and LLama Guard), potentially enhancing the
specificity of safety measures in medical AI applications.</p>
      <p>However, exploratory testing also revealed limitations. While focused, the set of sample prompts
was relatively small (n=27), which limits the generalisability of our findings; this is compounded by the
fact that all sample prompts were targeted and adversarial to a specific risk category, whereas
realistic prompts that occupy multiple categories or are more subtle were not tested. Furthermore,
the sample prompts were curated by a single annotator, which introduces a potential for subjective
biases. Existing risk taxonomies are structurally heterogeneous even with semantically similar
categories, challenging comparative analysis.</p>
      <p>Expanded testing should include a wider range of SLMs, more numerous and diverse sample
prompts, and should convert taxonomies into a common format including: 1) definitions, 2) disambiguations
between categories, 3) subtypes, and 4) unsafe prompt/response samples. Multiple annotators with diverse
skills and experience should contribute to the development of the test set, and further exploration
should include fine-tuning models and prompt engineering techniques [30].</p>
      <p>Beyond detection, SLM solutions must respond to detected unsafe content appropriately and with
specificity, such as alerting human moderators of certain policy breaches or adapting agent
behaviour to the user’s emotional state. Given the potential of SLM solutions to support a wide range
of patients globally, ensuring the solution is accessible across different languages presents further
challenges and training requirements; by extension, medical terms may not have clear localisations
to the user’s primary language, which also poses a barrier to accuracy that requires the incorporation
of diverse medical and linguistic expertise to overcome.</p>
      <p>Real-world evaluation of a prototype SLM-driven agent with actual participants and data is
essential to validate the effectiveness and safety of developed techniques, potentially incorporating
human-in-the-loop testing with clinicians to provide expert oversight and feedback.</p>
    </sec>
    <sec id="sec-ack">
      <title>Acknowledgements</title>
      <p>This work was supported by funding from the Royal Marsden Cancer Charity through the National
Cancer Prehabilitation Collaborative. For the purpose of open access, the authors have applied a
Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript versions of this
paper arising from this submission.</p>
    </sec>
    <sec id="sec-gai">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Gemini and DeepSeek R1 (private deployment) in
order to: grammar and spelling check, paraphrase and reword. After using these tools/services, the
author(s) reviewed and edited the content as needed and take(s) full responsibility for the
publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          et al., “Attention is All You Need,” in Adv.
          <source>Neural Inf. Process. Syst.</source>
          , vol.
          <volume>30</volume>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          Guyon et al., Eds. Curran Associates, Inc.,
          <year>2017</year>
          . [Online].
          <source>Available: arXiv:1706.03762</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vercaempst</surname>
          </string-name>
          , “
          <article-title>A Review of Large Language Models in Medical Education, Clinical Decision Support,</article-title>
          and Healthcare Administration,” Healthcare, vol.
          <volume>13</volume>
          , no.
          <issue>6</issue>
          , p.
          <fpage>603</fpage>
          ,
          <year>2025</year>
          . doi:
          <volume>10</volume>
          .3390/healthcare13060603.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Van</surname>
          </string-name>
          Nguyen et al.,
          <article-title>“A Survey of Small Language Models</article-title>
          ,” arXiv:
          <fpage>2410</fpage>
          .20011 [cs.CL],
          <year>2024</year>
          . [Online]. Available: http://arxiv.org/abs/2410.20011
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Crevenna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Palma</surname>
          </string-name>
          , and T. Licht, “
          <article-title>Cancer prehabilitation-a short review,” memo - Mag</article-title>
          .
          <source>Eur. Med</source>
          . Oncol., vol.
          <volume>14</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          ,
          <year>2021</year>
          . doi:
          <volume>10</volume>
          .1007/s12254-021-00686-5.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Linden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vodermaier</surname>
          </string-name>
          , R. MacKenzie, and
          <string-name>
            <given-names>D.</given-names>
            <surname>Greig</surname>
          </string-name>
          , “
          <article-title>Anxiety and depression after cancer diagnosis: Prevalence rates by cancer type, gender</article-title>
          , and age,” J.
          <string-name>
            <surname>Affect</surname>
          </string-name>
          . Disord., vol.
          <volume>141</volume>
          , no.
          <issue>2-3</issue>
          , pp.
          <fpage>343</fpage>
          -
          <lpage>351</lpage>
          ,
          <year>2012</year>
          . doi:
          <volume>10</volume>
          .1016/j.jad.
          <year>2012</year>
          .
          <volume>03</volume>
          .025.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          et al.,
          <source>“Towards Safer Generative Language Models: A Survey on Safety Risks</source>
          , Evaluations, and Improvements,” arXiv:
          <fpage>2302</fpage>
          .09270 [cs.
          <source>AI]</source>
          ,
          <year>2023</year>
          . [Online]. Available: http://arxiv.org/abs/2302.09270
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dong</surname>
          </string-name>
          et al.,
          <source>“Safeguarding Large Language Models: A Survey,” arXiv:2406.02622 [cs.CR]</source>
          ,
          <year>2024</year>
          . [Online]. Available: http://arxiv.org/abs/2406.02622
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>The</surname>
            <given-names>AI Alliance</given-names>
          </string-name>
          , “MLCommons Taxonomy of Hazards,” [Online]. Available: https://the-aialliance.
          <article-title>github.io/trust-safety-user-guide/exploring/mlcommons-taxonomy-hazards/</article-title>
          [Accessed: May 15,
          <year>2025</year>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Inan</surname>
          </string-name>
          et al.,
          <article-title>“Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations</article-title>
          ,” arXiv:
          <fpage>2312</fpage>
          .06674 [cs.CL],
          <year>2023</year>
          . [Online]. Available: http://arxiv.org/abs/2312.06674
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>I. Padhi</surname>
          </string-name>
          et al.,
          <source>“Granite Guardian,” arXiv:2412.07724 [cs.CL]</source>
          ,
          <year>2024</year>
          . [Online]. Available: http://arxiv.org/abs/2412.07724
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          et al.,
          <article-title>“Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks</article-title>
          ,” [Online]. Available: https://ai.meta.com/research/publications/retrieval
          <article-title>-augmented-generationfor-knowledge-intensive-nlp-tasks/</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          et al.,
          <source>“AILuminate: Introducing v1</source>
          .
          <article-title>0 of the AI Risk and Reliability Benchmark from MLCommons</article-title>
          ,” arXiv:
          <fpage>2503</fpage>
          .05731 [cs.CY],
          <year>2025</year>
          . [Online]. Available: http://arxiv.org/abs/2503.05731
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] E. Fournier-Tombs and J. McHardy, “A Medical Ethics Framework for Conversational Artificial Intelligence,” J. Med. Internet Res., vol. 25, p. e43068, 2023. doi: 10.2196/43068.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] A. Schmidt and M. Wiegand, “A Survey on Hate Speech Detection using Natural Language Processing,” pp. 1–10, 2017. doi: 10.18653/v1/W17-1101.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] O. Miles, “Acceptability of chatbot versus General Practitioner consultations for healthcare conditions varying in terms of perceived stigma and severity (Preprint),” Qeios, 2020. doi: 10.32388/BK7M49.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] H. Gardiner and N. Mutebi, “AI and mental healthcare: Ethical and regulatory considerations (POSTnote No. 738),” 2025. doi: 10.58248/PN738.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] H. Gardiner and N. Mutebi, “AI and mental healthcare: Opportunities and delivery considerations (POSTnote No. 737),” 2025. doi: 10.58248/PN737.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] L. Xu, L. Sanders, K. Li, and J. C. L. Chow, “Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review,” JMIR Cancer, vol. 7, no. 4, p. e27850, 2021. doi: 10.2196/27850.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Y. Zhang, P. Ren, and M. de Rijke, “Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology,” arXiv:2008.09706 [cs.CL], 2020. [Online]. Available: http://arxiv.org/abs/2008.09706</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] P. Henderson et al., “Ethical Challenges in Data-Driven Dialogue Systems,” arXiv:1711.09050 [cs.CL], 2017. [Online]. Available: http://arxiv.org/abs/1711.09050</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] C. Wang et al., “Ethical considerations of using ChatGPT in health care,” J. Med. Internet Res., vol. 25, p. e48009, 2023. doi: 10.2196/48009.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] D. Lopez-Martinez, “Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models,” arXiv:2406.16455 [cs.AI], 2024. [Online]. Available: http://arxiv.org/abs/2406.16455</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] A. J. Sowden, C. Forbes, V. Entwistle, and I. Watt, “Informing, communicating and sharing decisions with people who have cancer,” Qual. Health Care, vol. 10, no. 3, pp. 193–196, 2001. doi: 10.1136/qhc.0100193.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] S. Kurtz, J. Silverman, J. Benson, and J. Draper, “Marrying content and process in clinical method teaching: enhancing the Calgary-Cambridge guides,” Acad. Med., vol. 78, no. 8, pp. 802–809, 2003. doi: 10.1097/00001888-200308000-00011.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] T. W. Bickmore et al., “Patient and Consumer Safety Risks When Using Conversational Assistants for Medical Information: An Observational Study of Siri, Alexa, and Google Assistant,” J. Med. Internet Res., vol. 20, no. 9, p. e11510, 2018. doi: 10.2196/11510.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] J. Xu et al., “Recipes for Safety in Open-domain Chatbots,” arXiv:2010.07079 [cs.CL], 2021. [Online]. Available: http://arxiv.org/abs/2010.07079</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] W. F. Baile et al., “SPIKES-A six-step protocol for delivering bad news: application to the patient with cancer,” Oncologist, vol. 5, no. 4, pp. 302–311, 2000. doi: 10.1634/theoncologist.5-4-302.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] I. Kickbusch et al., “The Lancet and Financial Times Commission on governing health futures 2030: growing up in a digital world,” Lancet, vol. 398, no. 10312, pp. 1727–1776, 2021. doi: 10.1016/S0140-6736(21)01824-9.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] B. Lin et al., “Towards Healthy AI: Large Language Models Need Therapists Too,” arXiv:2304.00416 [cs.AI], 2023. [Online]. Available: http://arxiv.org/abs/2304.00416</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] J. Wang et al., “Prompt Engineering for Healthcare: Methodologies and Applications,” arXiv:2304.14670 [cs.AI], 2024. [Online]. Available: http://arxiv.org/abs/2304.14670</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>[31] L. Hewitt, “SLMs for Unsafe Content Detection in Healthcare,” 16-May-2025. [Online]. Available: osf.io/ha8zt</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>