<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Enhancing Aphasia Speech Interpretation using Small Language Models (SLMs)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abdel-Karim Al-Tamimi</string-name>
          <email>a.al-tamimi@shu.ac.uk</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kate Radford</string-name>
          <email>kate.radford@nottingham.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacqueline Benfield</string-name>
          <email>jacqueline.benfield1@nottingham.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacob A Andrews</string-name>
          <email>jacob.andrews@nottingham.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Catherine Sweby</string-name>
          <email>cath.sweby@nca.nhs.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Derby</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Aphasia, Large Language Models, Small Language Models, SLLMs, AAC 1</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Northern Care Alliance (NCA)</institution>
          ,
          <addr-line>Salford M6 8HD</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Nottingham Biomedical Research Centre</institution>
          ,
          <addr-line>Nottingham</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Sheffield Hallam University</institution>
          ,
          <addr-line>Sheffield S11WB</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Nottingham</institution>
          ,
          <addr-line>Nottingham NG7 2RD</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Yarmouk University</institution>
          ,
          <addr-line>Irbid 21136</addr-line>
          ,
          <country country="JO">Jordan</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Aphasia is a severe communication disorder that significantly impairs an individual's ability to convey and process language, often resulting from stroke-related damage to brain regions critical for speech and language functions. With the emergence of Large Language Models (LLMs), their potential has been explored in various text-based tasks due to their exceptional language understanding capabilities, which are particularly valuable in medical applications where access to specialised data is crucial yet frequently restricted. In this paper, we present our research on leveraging Tiny and Small Language Models (SLMs) to improve speech interpretation for people living with aphasia (PwA). We benchmarked several LLMs to establish performance baselines that guided the development of our SLM-based solution. Our findings indicate that chain-of-thought prompting significantly enhances interpretation accuracy (median similarity score: 0.68 vs. 0.64 for zero-shot), with larger SLMs (e.g., Phi4-mini:3.8b) outperforming smaller counterparts while maintaining clinical utility. Notably, compact models like Qwen2.5:1.5b achieved competitive results, demonstrating feasibility for resource-constrained settings. This work advances accessible, privacy-preserving assistive technology for aphasia, balancing computational efficiency with clinical relevance.</p>
</abstract>
      <kwd-group>
        <kwd>Aphasia</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Small Language Models</kwd>
        <kwd>SLMs</kwd>
        <kwd>AAC</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Aphasia is a profoundly under-recognised yet widespread condition that significantly impacts
millions worldwide. In the USA alone, over two million individuals live with aphasia, a prevalence
surpassing that of multiple sclerosis, Parkinson’s disease, and muscular dystrophy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Similarly,
conservative estimates suggest that at least 350,000 individuals are affected in the UK, with
approximately 66 new cases per 100,000 people each year [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Stroke, a leading cause of aphasia,
remains the third most common cause of death in both the USA and Great Britain, with one-third of
stroke survivors experiencing aphasia and 12% remaining aphasic six months post-stroke [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Despite these alarming figures, awareness remains critically low: 84.5% of people have never
encountered the term aphasia, and only 8.8% can correctly identify it as a language disorder [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
consequences extend beyond communication difficulties; aphasia has been shown to have a greater
negative impact on quality of life than both cancer and Alzheimer’s disease [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Aphasia severely
impacts communication, leading to challenges in employment, social participation, and
relationships, ultimately reducing quality of life [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Post-stroke, individuals with aphasia face worse
recovery outcomes, including longer hospital stays and higher mortality rates [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Large Language Models (LLMs) are advanced AI systems trained on vast text data, enabling
human-like language understanding and generation. In digital health, they enhance diagnostics,
medical documentation, education, and project management [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Integrated into healthcare
applications, LLMs improve patient engagement, symptom analysis, and real-time health insights,
supporting precision medicine and personalised care [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        The authors in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a novel framework for detecting and analysing speech dysfluencies
(e.g., stuttering, repetitions, blocks) using articulatory gestures and a connectionist subsequence
aligner (CSA), achieving state-of-the-art accuracy in dysfluency detection and alignment. While the
LLM component enhanced usability (e.g., generating diagnostic reports), the study found that
scalability and accuracy depended more on the gestural and alignment modules, suggesting that
LLMs serve best as interactive interfaces rather than core dysfluency analysers.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
], the study found that LLMs, when integrated into Augmentative and Alternative Communication
(AAC) systems like SocializeChat, can enhance AAC-supported social communication
through personalised, context-aware responses, but face limitations in handling open-ended
dialogue, accurately modelling user preferences, and adapting to cultural and contextual nuances.
      </p>
<p>Small Language Models (SLMs) and Tiny LMs offer a more efficient alternative to LLMs,
operating with reduced computational resources. While not as powerful as full-scale LLMs, they
provide key advantages in security and privacy by processing data locally rather than relying on
cloud-based servers. Additionally, SLMs and Tiny LMs deliver faster response times and function in
low-connectivity environments, making them ideal for real-time health applications [<xref ref-type="bibr" rid="ref10">10</xref>].</p>
<p>This paper introduces our approach to using Natural Language Processing (NLP) and SLMs to
improve communication for individuals with mild-to-moderate expressive aphasia. Our proposed
AI-based assistive tool helps users express their thoughts more clearly by interpreting and
extracting meaning from their speech. Designed through a collaborative co-design process with
experts and healthcare professionals, it aims to enhance stroke survivors’ independence and quality
of life.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
<p>Our AI-based solution leverages recent breakthroughs in NLP, specifically the advent of LLMs and
their small and tiny variants, to augment the speech comprehension of people with aphasia (PwA).
By harnessing the language-understanding capabilities of LLMs, our proposed system generates
coherent speech by building upon the utterances of PwA. The process, illustrated in Figure 1, begins
with recording and transcribing the speech of PwA. These transcripts, along
with responses from their conversation partners, are used to craft precise prompts and provide
contextual data for LLMs. The LLMs then generate coherent interpretations by inferring unclear
terms, filling in missing words, and removing filler content. The user selects the most appropriate
output, which is subsequently rendered as speech, optionally using a synthesised voice that mimics
the user’s own voice. This study focuses on evaluating the performance of several open-source SLMs
within this pipeline, particularly their potential to replace LLMs in low-resource or embedded
systems.</p>
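      <p>For illustration, the central interpretation step of this pipeline can be sketched as follows. This is a minimal sketch, assuming a locally served model accessed through the Ollama Python client; the recording, transcription, and speech-synthesis stages are omitted, and the prompt is a simplified placeholder for the templates detailed in Section 3.</p>
      <preformat>
# Minimal sketch of the interpretation step, assuming a model served
# locally via the `ollama` Python client. The prompt is a simplified
# placeholder; the study's full templates appear in Table 1.
import ollama

def interpret(question: str, answer: str, model: str = "phi4-mini:3.8b") -> str:
    """Ask a local SLM to reconstruct the intended message behind aphasic speech."""
    prompt = (
        "You are interpreting speech from a person with aphasia.\n"
        f"Question: '{question}'\n"
        f"Answer from patient: '{answer}'\n"
        "Reconstruct the intended message in the first person. "
        "Output only the interpretation."
    )
    response = ollama.chat(model=model,
                           messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"].strip()
      </preformat>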
      <p>The use of tiny and SLMs is critical to enable deployment on mobile devices, ensuring widespread
accessibility and real-time assistance in daily communication. Furthermore, SLMs address pressing
privacy and security concerns by processing sensitive speech data locally, minimising reliance on
cloud-based servers and reducing risks associated with transmitting identifiable health information.
This innovative approach ensures the production of intelligible sentences readily understood by the
conversational partner, offering advantages such as personalisation opportunities, continuous
improvement, and potential integration with existing assistive technologies and services.</p>
      <p>To establish a ground truth or a consensus-derived reference for benchmarking the performance
of the selected tiny and SLMs in transforming spoken utterances from PwA into intelligible
sentences, we collected 30 question-answer pairs transcribed from AphasiaBank [<xref ref-type="bibr" rid="ref11">11</xref>]. These pairs
were processed using six LLMs: Mixtral:22x7b, Gemma2:9b, Qwen2:7b, Llama3:8b, Phi3:3.8b, and
WizardLM2:7b. These LLMs were chosen for their open-source nature, allowing local deployment
without the need for high-end hardware, which enabled multiple experimental runs to assess result
consistency. They are also known for their strong reasoning capabilities, which are essential for the
task at hand. Moreover, running entirely offline ensured that potentially sensitive medical data
remained secure, avoiding transmission over cloud-based services and supporting ethical
compliance.</p>
      <p>Five experts, specialising in speech-language therapy, clinical rehabilitation, and digital health
transformation, independently evaluated the LLM interpretations of aphasic speech for each pair,
selecting the output they deemed most accurate. Consensus analysis revealed that Mixtral
outperformed all other models, as shown in Figure 2, followed by Gemma and Qwen. The
highest-rated interpretations were then designated as the ground truth for subsequent evaluations. This
expert-derived reference standard enabled systematic evaluation of ten widely used tiny and SLMs
against clinically validated interpretations, as detailed in the following section.</p>
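      <p>For illustration, the consensus step can be sketched as follows; the data layout, with expert selections recorded per question-answer pair as lists of model names, is assumed rather than taken from the study’s actual tooling.</p>
      <preformat>
# Sketch of the consensus step: for each question-answer pair, five experts
# each pick the model whose interpretation they judge most accurate, and the
# majority choice supplies the ground-truth reference. Data layout assumed.
from collections import Counter

def consensus_reference(expert_picks: dict[int, list[str]],
                        interpretations: dict[int, dict[str, str]]) -> dict[int, str]:
    """Map each item id to the interpretation of its most-voted model."""
    reference = {}
    for item_id, picks in expert_picks.items():
        best_model, _ = Counter(picks).most_common(1)[0]
        reference[item_id] = interpretations[item_id][best_model]
    return reference

# Example: for item 0, the experts split 3-2 in favour of Mixtral.
picks = {0: ["mixtral", "mixtral", "gemma2", "mixtral", "qwen2"]}
texts = {0: {"mixtral": "I remember...", "gemma2": "...", "qwen2": "..."}}
print(consensus_reference(picks, texts)[0])
      </preformat>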
    </sec>
    <sec id="sec-3">
      <title>3. Implementation and Results</title>
      <p>Building on the expert-validated benchmark established in Section 2, we evaluated the performance
of 10 tiny and small language models (SLMs) in interpreting aphasic speech by comparing their
outputs to the consensus-derived interpretations. We implemented a structured evaluation
framework measuring cosine similarity between model-generated outputs and the consensus-derived
interpretations. The chosen models (gemma3:1b, llama3.2:1b, llama3.2:3b, qwen2.5:0.5b,
qwen2.5:1.5b, qwen2.5:3b, smollm2:1.7b, phi3:3.8b, phi4-mini:3.8b, and hermes3:3b), ranging from 0.5B
to 3.8B parameters, were tested across three prompting techniques: zero-shot, zero-shot
chain-of-thought (CoT), and explicit CoT prompting designed to mimic clinical reasoning [<xref ref-type="bibr" rid="ref12">12</xref>].</p>
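      <p>A minimal sketch of this evaluation grid, assuming all ten models are served locally through Ollama, is given below; the prompt builders are simplified stand-ins, and the full CoT prompt is reproduced in Table 1.</p>
      <preformat>
# Sketch of the evaluation grid: ten local SLMs, three prompting techniques.
# Assumes models are served locally through Ollama; the prompt builders are
# simplified stand-ins (the full CoT prompt appears in Table 1).
import ollama

MODELS = ["gemma3:1b", "llama3.2:1b", "llama3.2:3b", "qwen2.5:0.5b",
          "qwen2.5:1.5b", "qwen2.5:3b", "smollm2:1.7b", "phi3:3.8b",
          "phi4-mini:3.8b", "hermes3:3b"]

def zero_shot(q, a):
    return (f"Question: '{q}'\nAnswer from patient: '{a}'\n"
            "Interpret the patient's intended message in the first person.")

def zero_shot_cot(q, a):
    # Zero-shot CoT: the model generates its own reasoning steps first.
    return zero_shot(q, a) + "\nThink step by step, then output only the final interpretation."

def cot(q, a):
    # Abbreviated stand-in; the full clinical-reasoning prompt is in Table 1.
    return ("You are a speech-language pathologist interpreting responses from "
            "a person with aphasia. [...]\n" + zero_shot(q, a))

def run_grid(pairs):
    """Collect one interpretation per (model, technique, question-answer pair)."""
    outputs = {}
    for model in MODELS:
        for name, build in [("zero_shot", zero_shot),
                            ("zero_shot_cot", zero_shot_cot),
                            ("cot", cot)]:
            outputs[(model, name)] = [
                ollama.chat(
                    model=model,
                    messages=[{"role": "user", "content": build(q, a)}],
                )["message"]["content"]
                for q, a in pairs
            ]
    return outputs
      </preformat>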
      <p>This chosen parameter range, spanning two orders of magnitude, was deliberately selected to
assess the balance between computational efficiency and clinical efficacy, with smaller models (e.g.,
0.5B) optimised for deployment in re-source-limited environments (e.g., mobile devices), and larger
architectures (e.g., 3.8B) targeting enhanced linguistic reasoning for complex aphasic speech
patterns, deployable on higher-capacity portable clinical systems. The spectrum directly informs
realworld applicability, balancing latency-sensitive environments against scenarios requiring nuanced
semantic reconstruction. Table 1 defines these prompting techniques within the study context and
shares the CoT prompt used in this study.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Prompting techniques evaluated in this study; the full CoT prompt is reproduced below.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Prompt Technique</th>
              <th>Definition/Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Zero-Shot</td>
              <td>The model is directly instructed to provide interpretations without examples, in-context guidance, or hints.</td>
            </tr>
            <tr>
              <td>Zero-Shot CoT</td>
              <td>A variation of Zero-Shot in which the model is instructed to generate its own CoT steps before arriving at its final answer.</td>
            </tr>
            <tr>
              <td>CoT</td>
              <td>The model is provided with an explicit reasoning framework that guides the analysis steps, helping the model break down the problem and reach more accurate results.</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Prompt: You are a speech-language pathologist interpreting responses from a person
with aphasia. Aphasia is a communication disorder that can affect a person's ability to
speak, understand, read, or write. People with aphasia often use incomplete phrases,
incorrect words, or fragmented speech, yet their intended meaning can often be
inferred from context and key words. Your task is to reconstruct the person's intended
message based on their response.</p>
      <p>Question: '{question}'
Answer from patient: '{answer}'
Internally analyse the response step by step:
1. Identify meaningful or relevant words in the patient's answer
2. Infer what the patient is trying to communicate
3. Consider how the response relates to the context of the question
4. Construct a clear and natural interpretation of the patient's intended message</p>
      <p>Only output the final interpretation, written in the first person as if the patient is
saying it themselves. Do not include any internal reasoning, explanations, or
additional commentary.</p>
      <p>Interpretation:</p>
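      <p>For illustration, the prompt above can be instantiated for a single question-answer pair as follows; the opening instruction is abbreviated with [...] here, and the example pair is taken from Table 2.</p>
      <preformat>
# Instantiating the CoT prompt from Table 1 for a single question-answer pair.
# The opening instruction is abbreviated with [...]; the pair is the example
# shown in Table 2.
COT_TEMPLATE = """You are a speech-language pathologist interpreting responses from a person with aphasia. [...] Your task is to reconstruct the person's intended message based on their response.

Question: '{question}'
Answer from patient: '{answer}'
Internally analyse the response step by step:
1. Identify meaningful or relevant words in the patient's answer
2. Infer what the patient is trying to communicate
3. Consider how the response relates to the context of the question
4. Construct a clear and natural interpretation of the patient's intended message

Only output the final interpretation, written in the first person as if the patient is saying it themselves. Do not include any internal reasoning, explanations, or additional commentary.

Interpretation:"""

prompt = COT_TEMPLATE.format(
    question="Do you remember much about your strokes?",
    answer="well the first one . no . I could hear the doctors that morning .",
)
print(prompt)
      </preformat>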
      <p>Table 2 provides an example of the variation in aphasic speech interpretation across the ten
language models, contrasting a representative question-answer pair with model-generated outputs.
The ground-truth interpretation in this example (Wizardlm2:7b, zero-shot) reflects comprehensive
inference, while the best-performing SLM (Phi4-mini:3.8b, chain-of-thought) achieves high semantic
similarity (0.82) through concise synthesis of key medical details. In contrast, the weakest
interpretation (Llama3.2:1b, zero-shot) scores a substantially lower similarity (0.45),
demonstrating typical failure patterns in which models overlook temporal progression and diagnostic
markers. The comparison highlights how model architecture and prompting strategy influence a
model's ability to reconstruct communicative intent from fragmented aphasic speech.</p>
      <sec id="sec-3-7">
        <title>Example of LLM Interpretations of Aphasic Speech. Question</title>
        <sec id="sec-3-7-1">
          <title>Do you remember much about your strokes?</title>
        </sec>
        <sec id="sec-3-7-2">
          <title>Ground Truth</title>
          <p>Wizardlm2:7b
{zero_shot}</p>
        </sec>
        <sec id="sec-3-7-3">
          <title>Best SLM Interpretation</title>
          <p>Phi4-mini:3.8b
{cot} [acc=0.82]</p>
        </sec>
        <sec id="sec-3-7-4">
          <title>Worst SLM Interpretation</title>
          <p>Llama3.2:1b
{zero_shot}[acc=0.45]</p>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <title>Aphasic Speech Response</title>
        <p>"well the first one . no .</p>
        <p>I could hear the doctors that morning .</p>
        <p>I could hear them talking .</p>
        <p>I could hear what they were saying .
but I could not do things .</p>
        <p>I I could listen to them .
and then later it was just hard to say things .</p>
        <p>I couldn't take words .</p>
        <p>I didn't know words ."
When I had my stroke, I recall hearing the doctors that morning as they
discussed my condition, but at the time, I was unable to respond or
perform tasks on my own. Later, it became particularly challenging to
articulate words and express what I needed to say.</p>
        <p>I remember some details about my strokes; I could hear what was
happening when they occurred but couldn't respond physically at that
time and later found speaking difficult because of word-finding issues.</p>
        <p>I remember hearing doctors talking about me in the hospital room.</p>
      <p>For each model-prompt combination, we computed semantic similarity scores as the cosine
similarity between sentence embeddings generated by the all-MiniLM-L6-v2 sentence-transformer
model [<xref ref-type="bibr" rid="ref13">13</xref>]. We also compared two widely used evaluation metrics, BERTScore [<xref ref-type="bibr" rid="ref14">14</xref>]
and ROUGE [<xref ref-type="bibr" rid="ref15">15</xref>]. Our analysis reveals that these existing metrics yield inconsistent
evaluations and fail to adequately capture the semantic similarity that this research seeks to
measure. To illustrate this limitation, we examined the results obtained for the sample texts
presented in Table 2. When assessed using BERTScore, the best and worst SLM responses received
scores of 0.89 and 0.85, respectively, suggesting only a marginal difference in quality. In contrast,
ROUGE-1 scores for the same responses were substantially lower (0.5 and 0.3), while ROUGE-2
scores exhibited an even greater discrepancy (0.06 and 0.00). Additionally, ROUGE-L scores
displayed inconsistency, assigning values of 0.32 and 0.36 to the respective responses. These
divergent results highlight the lack of agreement between metrics and underscore their inadequacy
in reliably assessing semantic similarity.</p>
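      <p>The three metrics can be reproduced with publicly available implementations; the sketch below, using illustrative reference and candidate strings, assumes the sentence-transformers, bert-score, and rouge-score Python packages.</p>
      <preformat>
# Scoring one model output against the consensus reference with the three
# metrics discussed above. Assumes the sentence-transformers, bert-score,
# and rouge-score packages; the strings are illustrative.
from sentence_transformers import SentenceTransformer, util
from bert_score import score as bert_score
from rouge_score import rouge_scorer

reference = "I remember some details about my strokes; I could hear what was happening."
candidate = "I remember hearing doctors talking about me in the hospital room."

# Cosine similarity between sentence embeddings (the study's primary metric).
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = embedder.encode([reference, candidate], convert_to_tensor=True)
cosine = util.cos_sim(emb[0], emb[1]).item()

# BERTScore F1 (token-level matching of contextual embeddings).
_, _, f1 = bert_score([candidate], [reference], lang="en")

# ROUGE n-gram overlap scores.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, candidate)

print(f"cosine={cosine:.2f}  bertscore_f1={f1.item():.2f}  "
      f"rouge1={rouge['rouge1'].fmeasure:.2f}  rougeL={rouge['rougeL'].fmeasure:.2f}")
      </preformat>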
        <p>Performance was analysed through aggregate statistics (mean, median similarity scores) and
visualised to compare models and prompting strategies. Figure 3 demonstrates how larger language
models (e.g., Phi3:3.8b and Phi4-mini:3.8b) achieve significantly higher semantic similarity scores
compared to other models, while the smallest model (Qwen2.5:0.5b) performs weakest. Notably,
Qwen2.5:1.5b delivers competitive performance relative to its compact architecture, suggesting a
favourable trade-off between model size and accuracy.</p>
        <p>The boxplot analysis, shown in Figure 4, reveals distinct performance patterns among the tested
prompting techniques. CoT prompting achieves the highest median semantic similarity score
(~0.68), with a tighter interquartile range (IQR) suggesting more consistent performance compared
to zero-shot methods. Zero-shot techniques show broader score distributions (median ~0.64),
indicating higher variability in interpretation quality. Notably, zero-shot CoT (median ~0.63) bridges
this gap, demonstrating that implicit step-by-step reasoning improves reliability over basic zero-shot
approaches while remaining less constrained than full CoT. These results underscore that explicit
reasoning frameworks, represented in CoT, enhance both accuracy and consistency in aphasic
speech interpretation.</p>
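      <p>For reference, the per-technique aggregation underlying this comparison can be sketched as follows, assuming similarity scores are collected per model-technique pair; the scores shown are illustrative only.</p>
      <preformat>
# Aggregating similarity scores per prompting technique, as summarised in
# Figure 4. Assumes `results` maps (model, technique) to lists of scores;
# the example values are illustrative only.
import statistics

def summarise(results: dict[tuple[str, str], list[float]]) -> None:
    by_technique: dict[str, list[float]] = {}
    for (_, technique), scores in results.items():
        by_technique.setdefault(technique, []).extend(scores)
    for technique, scores in sorted(by_technique.items()):
        q1, med, q3 = statistics.quantiles(scores, n=4)
        print(f"{technique:14s} median={med:.2f}  IQR=[{q1:.2f}, {q3:.2f}]")

summarise({
    ("phi4-mini:3.8b", "cot"): [0.82, 0.71, 0.66],
    ("phi4-mini:3.8b", "zero_shot"): [0.64, 0.58, 0.70],
})
      </preformat>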
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
<p>This study demonstrates the potential of tiny and small language models (SLMs) to enhance
communication for individuals with aphasia, offering a balance between computational efficiency
and clinical utility. Our findings indicate that model size and prompting strategy significantly
influence interpretation accuracy, with chain-of-thought (CoT) techniques yielding the most reliable
results. While larger SLMs (e.g., Phi4-mini:3.8b) achieved higher semantic similarity to clinician
benchmarks, smaller models like Qwen2.5:1.5b showed promising performance relative to their
reduced size, suggesting feasibility for real-world deployment on mobile devices. However, it is
important to note this research's limitations, which include the modest sample size of expert-validated
utterances and the focus on expressive aphasia, which may not fully capture the diversity of aphasic
speech patterns. Additionally, the study's reliance on cosine similarity, though widely adopted, may
overlook nuanced semantic differences critical in clinical contexts.</p>
      <p>Future work will expand the dataset to include broader aphasia subtypes and multilingual
contexts, while incorporating real-time user feedback to refine model outputs dynamically.
Investigating hybrid approaches, combining SLMs with rule-based systems or personalised LLM
fine-tuning, could further improve the solution's accuracy, particularly for complex conversational
scenarios. Furthermore, participatory design methodologies should integrate stroke survivors as
co-evaluators in assessing system utility, ensuring solutions align with the lived experience of aphasic
communication challenges. By addressing these challenges, SLM-based assistive tools could evolve
into scalable solutions, bridging gaps in accessible communication support for underserved
populations.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
<p>This study was partially funded by EPSRC/Next Generation Rehabilitation Technologies
(EP/W000679/1. R-SPEAK: 27598876 R02794). For the purpose of open access, the authors have
applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript
version of this paper arising from this submission.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
<p>During the preparation of this work, the author(s) used Qwen2.5 for grammar and spelling
checking.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>National</given-names>
            <surname>Aphasia</surname>
          </string-name>
          <string-name>
            <surname>Association</surname>
          </string-name>
          , Aphasia statistics,
          <year>2023</year>
          . URL: https://aphasia.org/aphasiaresources/aphasia-statistics/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>National</given-names>
            <surname>Aphasia</surname>
          </string-name>
          <string-name>
            <surname>Association</surname>
          </string-name>
          , Aphasia fact sheet,
          <year>2023</year>
          . URL: https://aphasia.org/aphasiaresources/aphasia-factsheet/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] Royal College of Speech and Language Therapists, Resource manual for commissioning and planning services for</article-title>
          SLCN,
          <year>2023</year>
          . URL: https://rcslt.org/wp-content/uploads/media/Project/RCSLT/slcn-resource-manual.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Konishi</surname>
          </string-name>
          , et al.,
          <article-title>Exploring the impact of aphasia severity on employment, social participation, and quality of life</article-title>
          , medRxiv (
          <year>2025</year>
          ). doi:
          <volume>10</volume>
          .1101/
          <year>2025</year>
          .01.08.25320231.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Lazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Boehme</surname>
          </string-name>
          ,
          <article-title>Aphasia as a predictor of stroke outcome</article-title>
          ,
          <source>Curr. Neurol. Neurosci. Rep</source>
          .
          <volume>17</volume>
          (
          <year>2017</year>
          )
          <article-title>83</article-title>
          . doi:
          <volume>10</volume>
          .1007/s11910-017-0797-z.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.</given-names>
            <surname>Meng</surname>
          </string-name>
          , et al.,
          <article-title>The application of large language models in medicine: A scoping review</article-title>
          , iScience
          <volume>27</volume>
          (
          <issue>5</issue>
          ) (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Holley</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Mathur, LLMs and generative AI for healthcare: The next frontier,</article-title>
          <string-name>
            <surname>O'Reilly Media</surname>
          </string-name>
          , Inc.,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lian</surname>
          </string-name>
          et al,
          <article-title>SSDM: Scalable speech dysfluency modelling</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>37</volume>
          (
          <year>2024</year>
          ):
          <fpage>101818</fpage>
          -
          <lpage>101855</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fang</surname>
          </string-name>
          et al.,
          <article-title>SocializeChat: A GPT-based AAC tool for social communication through eye gazing</article-title>
          ,
          <source>Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing &amp; the 2023 ACM International Symposium on Wearable Computing</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>