<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Beyond Negation Detection: Comprehensive Assertion Detection Models for Clinical NLP</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Veysel Kocaman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yigit Gul</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>M. Aytug Kaya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hasham Ul Haq</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mehmet Butgul</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cabir Celik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Talby</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>John Snow Labs inc.</institution>
          <addr-line>16192 Coastal Highway, Lewes, DE 19958</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Assertion status detection is a critical yet often overlooked component of clinical NLP, essential for accurately attributing extracted medical facts. Past studies narrowly focused on negation detection, resulting in underperforming commercial solutions such as AWS Medical Comprehend, Azure AI Text Analytics, and GPT-4o due to their limited domain adaptation. To address this gap, we developed state-of-the-art assertion detection models, including fine-tuned LLMs, transformer-based classifiers, few-shot classifiers, and deep learning (DL) approaches, and evaluated our models against cloud-based commercial API solutions, the legacy rule-based NegEx approach, and GPT-4o. Our fine-tuned LLM achieves the highest overall accuracy (0.962), outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly excelling in Present (+4.2%), Absent (+8.4%), and Hypothetical (+23.4%) assertions. Our DL-based models surpass commercial solutions in Conditional (+5.3%) and Associated with Someone Else (+10.1%), while the few-shot classifier offers a lightweight yet highly competitive alternative (0.929), making it ideal for resource-constrained environments. Integrated within Spark NLP, our models consistently outperform black-box commercial solutions while enabling scalable inference and seamless integration with medical NER, Relation Extraction, and Terminology Resolution. These results reinforce the importance of domain-adapted, transparent, and customizable clinical NLP solutions over general-purpose LLMs and proprietary APIs.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The widespread adoption of Electronic Health Records (EHRs) has transformed healthcare,
with 96% of non-federal acute care hospitals and 78% of office-based physicians in the United
States using certified EHR systems by 2021. This digitization has created vast patient data
repositories, opening new avenues for clinical applications and research [
        <xref ref-type="bibr" rid="ref1 ref16">1</xref>
        ]. To harness this
valuable information and discover patterns in EHRs, various Natural Language Processing
(NLP) tasks have been performed. Among these, the classification of assertions stands out as a
critical but understudied task. Accurate assertion classification allows for the determination of
whether a medical concept is present, absent, possible, hypothetical, conditional, or associated with
someone other than the patient, which is crucial for extracting actionable insights from EHRs, driving
clinical decision-making, and facilitating healthcare analytics [
        <xref ref-type="bibr" rid="ref17 ref2">2</xref>
        ]. In other words, the status of
an assertion explains how a named entity (e.g. clinical finding, procedure, lab result) pertains to
the patient by assigning a label such as present (”patient is diabetic”), absent (”patient denies
nausea”), conditional (”dyspnea while climbing stairs”), or associated with someone else (”family
history of depression”). Table 1 illustrates different assertion classes with their label distribution
and sizes.
      </p>
      <p>
Although early studies often equated assertion detection with negation detection, over time
sophisticated machine learning and deep learning methodologies evolved beyond rudimentary rule-based
approaches. Early techniques such as NegEx [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], ConText [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], NegFinder [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and NegExpander
[
        <xref ref-type="bibr" rid="ref6">6</xref>
] relied on hand-crafted rules and regular expressions, achieving high precision but suffering
from low recall due to rigid patterns [
        <xref ref-type="bibr" rid="ref7">7</xref>
]. For a detailed treatment of rule-based approaches
to assertion detection, we refer the reader to a comprehensive study [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] evaluating
these approaches in detail. Deep learning methods, particularly transformer-based models and
attention mechanisms, emerged as powerful alternatives, offering a more nuanced understanding
of clinical text. However, these approaches consistently faced challenges such as requiring
large annotated datasets and struggling with minority classes, especially in detecting possible
medical assertions. Recent developments have focused on addressing these limitations through
innovative approaches like multi-task learning, pre-training techniques, and Large Language
Models (LLMs).
      </p>
      <p>
Bhatia et al. demonstrated the effectiveness of a multitask learning approach for jointly
modeling named entity recognition and negation assertion in clinical texts. By utilizing shared
parameters, their model achieved improved contextual representation and overcame challenges
associated with neural networks in negation detection, outperforming rule-based systems in
conjunction with the proposed conditional softmax decoder [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Chen et al. explored applying attention-based bi-LSTM architectures for negation and
assertion detection in clinical notes, leveraging the ability to selectively focus on relevant
information and automatically capture semantic details without relying on external knowledge
inputs [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
van Aken et al. proposed a comprehensive study of clinical assertion detection models by manually
annotating 5,000 assertions in the MIMIC-III dataset, evaluating medical language models’
performance and transferability across different medical domains, and releasing their annotated
dataset to address label sparsity and diversity challenges in existing research [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Similarly,
Wang et al. proposed a novel prompt-based learning approach for assertion classification
that addresses existing limitations by leveraging few-shot learning and advanced reasoning
techniques [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Yuan et al. proposed a deep learning approach for automatic Electronic Medical Record (EMR)
sectioning using MIMIC-III data, developing hand-crafted rules to create gold-standard labels
and generating multiple note versions with varied section heading formats to train models that
achieve robust adaptability and high accuracy in EMR segmentation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Ji et al. proposed a novel method leveraging Large Language Models (LLMs) with advanced
reasoning techniques like Tree of Thought (ToT), Chain of Thought (CoT), and Self-Consistency
(SC), combined with Low-Rank Adaptation (LoRA) fine-tuning, to transform assertion detection
into a generative task that enables more nuanced, contextually aware, and data-efficient medical
text understanding across multiple assertion categories [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        While existing clinical NLP approaches have made significant strides in assertion detection,
predominantly focusing on negation, they have consistently fallen short of providing a
comprehensive, multi-category framework capable of robustly addressing the full spectrum of medical
concept assertions. In this paper, we present a comprehensive implementation of assertion
detection within the Healthcare NLP library [
        <xref ref-type="bibr" rid="ref15">15</xref>
] (based on the Spark NLP [16][17][18] ecosystem),
utilizing state-of-the-art models and annotators to achieve high accuracy and efficiency in
clinical NLP tasks. Our approach transcends traditional negation detection methods by offering
a comprehensive, fully integrable end-to-end solution that addresses the entire spectrum of
assertion types, including present, absent, possible, hypothetical, conditional, and assertions
associated with someone other than the patient. This holistic method leverages advanced deep learning
architectures, few-shot learning techniques, and flexible rule-based systems to overcome
common challenges in clinical texts, such as class imbalance and ambiguous concept expressions.
Specifically, we explore the following architectures/modules that we developed during this
study to detect assertion status from clinical notes:
      </p>
      <p>• Assertion Detection with LLMs: To overcome the limitations coming from data collection
and annotation in designing ML/DL-based assertion detection models, we experiment
with leveraging LLMs pretrained on extensive medical datasets to enhance assertion
detection accuracy and comprehensiveness in zero-shot settings.
• Assertion Detection with a DL Model: A deep learning-based annotator built on a
Bi-LSTM architecture, inspired by [19]. This model processes medical concepts and their
surrounding tokens using word embeddings within a defined scope window.
• Assertion Detection with BERT for Sequence Classification (BFSC): This approach
leverages a transformer-based model, BERT, to classify assertion status in medical texts. By
encoding the contextual relationships within sequences, BERT enables accurate detection
of negations, affirmations, and other assertion types.
• Few-Shot Assertion Detection with Transformers: A few-shot learning-based
classifier that combines sentence embeddings with lightweight classification models to achieve
high accuracy with minimal training data.
• Rule-based Assertion Detection with Contextual Awareness: A rule-based
annotator designed to enhance assertion detection accuracy in complex clinical contexts. By
leveraging customizable keyword sets, regex patterns, and scope windows, this model
adapts to diverse clinical scenarios.</p>
      <p>Table 1. Assertion classes with their descriptions and label sizes:
present: Confirms the presence of a medical condition (8622).
absent: Indicates the negation or nonexistence of a medical condition (2594).
possible: Suggests uncertainty or potential presence of a condition (652).
hypothetical: Denotes speculative or conjectural conditions that are not currently present (445).
conditional: Represents conditions that might occur under specific circumstances or conditions (148).
associated with someone else (awse): Refers to medical conditions related to individuals other than the patient, such as family members (131).
Example (hypothetical): "Hydrocodone 5 mg with Tylenol, one to two tablets every four hours p.r.n. pain."</p>
      <p>The subsequent sections will systematically evaluate these assertion detection architectures
on a well-known benchmark dataset and compare them with GPT-4o, a rule-based algorithm
(NegEx), and cloud-based healthcare-specific APIs offered by commercial providers (AWS
Medical Comprehend and Azure AI Text Analytics for Health). Our analysis will showcase
a novel combined pipeline that integrates these models, demonstrating their complementary
strengths in enhancing assertion detection performance across various computational paradigms
and clinical scenarios.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>In this section, we explain the details of the various model architectures supported and shipped as
pretrained models in the Healthcare NLP library by John Snow Labs (JSL).</p>
      <sec id="sec-2-1">
        <title>2.1. Assertion Detection with LLMs</title>
        <p>Traditional approaches to assertion detection in medical text, such as rule-based NLP systems
and machine learning or deep learning models, often require significant manual effort to design
patterns and frequently fail to capture less common assertion types, resulting in incomplete
contextual understanding. To overcome these limitations, we explored fine-tuning an LLM with
assertion detection datasets to enhance assertion detection accuracy and comprehensiveness.</p>
        <p>We fine-tuned the Llama-3.1-8B model [20] on the i2b2 assertion training dataset using
the LoRA fine-tuning [21] approach without quantization. LoRA offers parameter efficiency
by updating only a small subset of parameters, reducing memory and computational overhead.
It minimizes overfitting risk by keeping pre-trained weights fixed, which makes it ideal for
small training datasets, and it preserves pre-trained knowledge, maintaining generalization
capabilities while allowing task-specific tuning. Our final configuration used a LoRA rank of 16, LoRA alpha of 32, and 5 training
epochs.</p>
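        <p>As a concrete illustration of this parameter efficiency, the following sketch counts the trainable parameters a rank-16 adapter adds to a single weight matrix. The 4096-dimensional projection is a hypothetical example of ours, not a detail taken from the model configuration.</p>

```python
# Hedged illustration of LoRA parameter efficiency (hypothetical dimensions).
# A rank-r adapter learns two small matrices B (d x r) and A (r x d) whose
# product approximates the update to a frozen d x d weight matrix, so only
# 2*r*d parameters are trained instead of d*d.
d = 4096                      # hypothetical hidden size of one adapted projection
r = 16                        # LoRA rank used in our final configuration
full_params = d * d           # parameters in the frozen weight matrix
lora_params = d * r + r * d   # parameters in B and A combined
fraction = lora_params / full_params
print(full_params, lora_params, round(fraction * 100, 2))
```

        <p>For this single matrix, the adapter trains well under 1% of the parameters, which is the source of the memory and overfitting advantages described above.</p>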
        <p>For fine-tuning, a simple and efficient prompt structure is of paramount importance. We
explicitly included a detailed description of each assertion status, which noticeably improved
performance. Additionally, we replaced the term Present with Confirmed, which yielded better results,
likely due to improved clarity and alignment with the task’s semantics. Including descriptions
of assertion statuses in the input prompt also allowed for minor adjustments during inference,
enhancing flexibility and adaptability. Our experiments on context size produced a counterintuitive
result: inputting the whole document created complexity and confusion, impairing
performance. We replaced this approach with a context windowing strategy, extracting two
sentences before and after the target text. This strategy substantially reduced training time and
increased the model’s ability to focus on relevant information.</p>
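        <p>The windowing strategy above can be sketched as a small helper; the function and parameter names are ours for illustration, not part of the library.</p>

```python
def context_window(sentences, target_idx, before=2, after=2):
    """Return the target sentence plus up to `before` preceding and
    `after` following sentences, clipped at document boundaries."""
    start = max(0, target_idx - before)
    end = min(len(sentences), target_idx + after + 1)
    return " ".join(sentences[start:end])

doc = ["S0.", "S1.", "S2.", "S3.", "S4.", "S5."]
print(context_window(doc, 3))  # "S1. S2. S3. S4. S5."
```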
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Assertion Detection via DL Model</title>
        <p>Assertion Detection via DL Model (AssertionDL) is a classification model based on a Bi-LSTM
framework, representing a modified version of the architecture proposed by [ 19]. In this
implementation, entities (also referred to as chunks) are processed alongside a context string.
The context string and entities are tokenized and embedded before being passed to the Bi-LSTM
model. It is important to balance the length of the context string, as excessively long sequences
can result in vanishing gradients, which may hinder the model’s performance.</p>
        <p>An analysis of the i2b2 dataset revealed that 95% of the relevant scope tokens (neighboring
words) are located within a window spanning 9 tokens to the left and 15 tokens to the right of
the target tokens. Based on this observation, we adopted the same window size for our model.</p>
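        <p>The asymmetric scope window described above can be sketched as follows; the helper and its names are illustrative rather than the annotator's internal API.</p>

```python
def scope_tokens(tokens, start, end, left=9, right=15):
    """Collect the target chunk tokens[start:end] together with up to
    `left` tokens before it and `right` tokens after it."""
    lo = max(0, start - left)
    hi = min(len(tokens), end + right)
    return tokens[lo:hi]

toks = ["t%d" % i for i in range(40)]
window = scope_tokens(toks, 20, 22)
print(len(window))  # 9 left + 2 chunk + 15 right = 26 tokens
```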
        <p>The model has been implemented in the Healthcare NLP library as an annotator called
AssertionDLModel, enabling seamless integration into the Spark NLP library for clinical and biomedical
text processing.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Assertion Detection via Bert For Sequence Classification (BFSC)</title>
        <p>While the LLM approach generates new tokens as part of its output, we also explored a more
direct approach by framing the problem as a classification task. In this setup, the input consists
of the entity chunk and its surrounding context, while the output is the predicted assertion
status class. Specifically, we implemented a classification layer on top of a transformer model,
such as BERT[22], to perform assertion status prediction, a technique known as BERT for
Sequence Classification.</p>
        <p>
          Rather than using the standard BERT model, we utilized the pre-trained BERT models from
[23], which have been fine-tuned on biomedical text. Among these, we selected the model
trained on BioBERT [24], as it demonstrated the best performance for our task. This approach
has previously shown promising results for assertion detection [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and helps the model focus
on the target entity, even in contexts that contain multiple entities.
        </p>
        <p>
          The input text was prepared by a novel approach as explained in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In addition, we
experimented with varying context lengths by incorporating additional sentences around the
target chunk. However, this approach yielded minimal performance improvements while
noticeably increasing training and processing time.
        </p>
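        <p>One way to keep a sequence classifier focused on the target entity when the context contains several entities, in the spirit of the input preparation of [11], is to wrap the entity span in marker tokens before encoding. The sketch below is illustrative; the exact markers and preprocessing in [11] may differ.</p>

```python
def mark_entity(tokens, start, end, open_tok="[entity]", close_tok="[entity]"):
    """Surround the target chunk tokens[start:end] with marker tokens so the
    sequence classifier can attend to it specifically."""
    return tokens[:start] + [open_tok] + tokens[start:end] + [close_tok] + tokens[end:]

sent = "patient denies nausea and vomiting".split()
print(" ".join(mark_entity(sent, 2, 3)))
# patient denies [entity] nausea [entity] and vomiting
```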
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Few Shot Assertion Detection via Transformers</title>
        <p>Few Shot Assertion Detection via Transformers (FewShotAssertion) in this study is built on a
modified version of the SetFit (Sentence Transformer Fine-Tuning) framework [25], which leverages
sentence-transformer embeddings and a lightweight classifier for few-shot learning. SetFit
enables efficient fine-tuning by coupling a pre-trained sentence-transformer model with a
classifier trained on task-specific data using contrastive learning.</p>
        <p>The model takes as input the assertion context and the target entity, embedding them using
a pre-trained transformer encoder. These embeddings are then fine-tuned using contrastive
learning to align positive examples while separating negative ones in the embedding space. A
lightweight linear classifier is subsequently trained on the refined embeddings to predict the
assertion status. This approach is particularly well-suited for assertion detection in the i2b2
dataset, as it effectively handles limited labeled data while maintaining robust performance.</p>
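        <p>The contrastive learning step can be illustrated by the pair-construction recipe below, a simplified sketch of the SetFit-style approach rather than the library's actual implementation: same-label examples form positive pairs and cross-label examples form negative pairs.</p>

```python
from itertools import combinations

def contrastive_pairs(examples):
    """Build (text_a, text_b, similar) training pairs from labeled examples:
    same-label pairs are positives (1), cross-label pairs negatives (0)."""
    pairs = []
    for (t1, l1), (t2, l2) in combinations(examples, 2):
        pairs.append((t1, t2, 1 if l1 == l2 else 0))
    return pairs

data = [("denies nausea", "absent"), ("no fever", "absent"), ("is diabetic", "present")]
print(contrastive_pairs(data))
```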
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Rule-based Assertion Detection with Contextual Awareness (ContextualAssertion)</title>
        <p>The model based on this architecture enables assertion detection by labeling entities (chunks)
based on user-defined rules and contextual patterns, building upon principles similar to ConText
[26] and the widely used NegEx framework [27]. Unlike NegEx, which focuses on negation
detection using fixed lexical patterns, the Contextual Assertion module provides advanced
configurability through prefix and suffix keywords, regex patterns, exception handling, and
customizable scope windows. These enhancements enable the establishment of complex linguistic
rules, allowing the annotator to function as a robust and flexible guardrail for NLP pipelines.</p>
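        <p>A minimal sketch of the kind of rule this module can express is shown below; the cue set, exception list, and scope size are illustrative choices of ours, and the annotator's real configuration surface is considerably richer.</p>

```python
NEG_PREFIXES = {"no", "denies", "without"}   # illustrative negation cue set
EXCEPTIONS = {"no increase"}                 # phrases that cancel the cue

def assert_status(tokens, ent_start, scope=5):
    """Label the entity 'absent' if a negation cue occurs within `scope`
    tokens before it and no exception phrase intervenes; else 'present'."""
    window = tokens[max(0, ent_start - scope):ent_start]
    text = " ".join(window).lower()
    if any(exc in text for exc in EXCEPTIONS):
        return "present"
    return "absent" if any(t.lower() in NEG_PREFIXES for t in window) else "present"

toks = "The patient denies any chest pain".split()
print(assert_status(toks, 4))  # cue 'denies' within scope -> absent
```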
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Using Assertion Detection Models within Healthcare NLP Pipeline</title>
        <p>While the i2b2 dataset provides pre-annotated named entities (including their indices), practical
applications require extracting these entities directly from unstructured text. To address this,
we propose an end-to-end, flexible pipeline with component sharing, as illustrated in Figure A1.</p>
        <p>In this pipeline, named entities are identified using Healthcare NLP’s NER models and
subsequently passed to assertion models for assertion status detection. The assertion model
utilizes the same embeddings as the NER model, enabling embedding sharing for improved
memory management and reduced latency.</p>
        <p>The pipeline also supports a stacking approach, allowing multiple assertion models to coexist
within a single framework. To enhance performance, we developed a merging mechanism
that combines predictions from three assertion models and prioritizes them to produce a
unified label for each entity, based on the performance of each assertion model on certain
entities. The key components of this pipeline include AssertionDL, FewShotAssertion, and
ContextualAssertion.</p>
        <p>To resolve conflicts in predictions across models, a majority voting mechanism is applied.
This approach ensures the final label reflects the consensus among models, mitigating the
impact of outlier predictions.</p>
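        <p>The merging step can be sketched as a majority vote with a fixed priority order as tiebreaker; the priority order below is illustrative, whereas the actual pipeline derives its prioritization from per-entity model performance.</p>

```python
from collections import Counter

PRIORITY = ["ContextualAssertion", "AssertionDL", "FewShotAssertion"]  # illustrative

def merge_labels(predictions):
    """predictions: dict of model name to label. Return the majority label;
    on a full tie, fall back to the highest-priority model's label."""
    counts = Counter(predictions.values())
    top, top_count = counts.most_common(1)[0]
    if top_count > 1:
        return top
    for model in PRIORITY:
        if model in predictions:
            return predictions[model]

print(merge_labels({"AssertionDL": "absent",
                    "FewShotAssertion": "absent",
                    "ContextualAssertion": "present"}))  # absent
```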
      </sec>
      <sec id="sec-2-7">
        <title>2.7. Pretrained Models Offered in Healthcare NLP</title>
        <p>The Healthcare NLP library by JSL offers a range of domain-specific pretrained assertion models
(e.g., oncology, radiology) that have been fine-tuned or trained using the architectures explored
in this study. These models are fully optimized for integration within a Healthcare NLP pipeline,
enabling scalable and efficient deployment. For a detailed list of pretrained clinical assertion
models and their corresponding benchmarks, refer to Table A4, which showcases the best
performance scores achieved by these models across multiple assertion categories (12 categories,
more than what is covered in this study), including Present, Past, Possible, Absent, Hypothetical,
Family, Someone Else, Planned, Conditional, Confirmed, Negative, and Suspected.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments and Results</title>
      <sec id="sec-3-1">
        <title>3.1. Experimental Setup</title>
        <p>In this study, we benchmarked the performance of our assertion classification approaches
(AssertionDL, FewShotAssertion, ContextualAssertion, BFSC, and a combined pipeline)
against available counterparts (NegEx, AWS Comprehend Medical, Azure AI Text Analytics,
and GPT-4o).</p>
        <p>
          NegEx is a rule-based algorithm designed to identify negation in clinical text, particularly to
determine whether a medical concept is absent or not. Introduced by [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], NegEx uses regular
expressions and predefined linguistic patterns to detect negation cues (e.g., “no,” “denies”) and
their scope within a sentence.
        </p>
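        <p>The flavor of this pattern matching can be conveyed with a toy trigger-and-scope regex; this is a drastically simplified sketch of ours, as the real NegEx uses a much larger trigger lexicon and richer scope-termination rules.</p>

```python
import re

# Illustrative pre-negation triggers, a lazily matched scope, and a few
# terminators that end the negation scope (conjunctions and punctuation).
NEGEX_LITE = re.compile(
    r"\b(?:no|denies|without|negative for)\s+"   # pre-negation trigger
    r"(\w+(?: \w+)*?)"                           # negated scope (lazy)
    r"(?=\s*(?:but\b|however\b|[.,;]|$))",       # scope terminators
    re.IGNORECASE,
)

def negated_concepts(sentence):
    """Return the text spans falling inside a toy negation scope."""
    return [m.group(1) for m in NEGEX_LITE.finditer(sentence)]

print(negated_concepts("Patient denies nausea but reports headache."))  # ['nausea']
```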
        <p>GPT-4o was employed to benchmark assertion detection for medical conditions in the i2b2
dataset. As the disclosure statement of the i2b2 dataset prohibits sharing the data via cloud-based
APIs, we obfuscated the i2b2 dataset both for PHI and medical terms using Healthcare NLP
tools provided by John Snow Labs, and then ran the evaluation. A carefully crafted prompt (see
Figure A2) guided the model to classify assertion statuses for specified medical entities.</p>
        <p>AWS Comprehend Medical is an NLP service offered by Amazon Web Services, designed
to automate the extraction of medical information from unstructured text. Azure AI Text
Analytics is a natural language processing (NLP) service provided by Microsoft, designed to
analyze and extract insights from unstructured text.</p>
        <p>Both AWS and Azure services first extract entities and then annotate them with assertion
labels (e.g., present, absent, hypothetical). We aligned these annotations with the i2b2 dataset
taxonomy via label mapping to ensure consistency in evaluation. Since these services assign
assertion labels only to the entities they themselves extract, the evaluation is run over the
partially or fully overlapping common entities from the i2b2 dataset. The overlap rates can be
seen in Table A1 in the Appendix.</p>
        <p>To maintain consistency, labels from Azure AI and AWS Comprehend were mapped to i2b2
equivalents. Matches were categorized into Full Match, Partial Match, and No Match, focusing
the evaluation on full and partial matches. Statistics for matching outcomes are summarized in
Table A1, with label mapping details available in Table A2 in the Appendix for reference.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset Description</title>
        <p>The evaluation and benchmarking in this study are conducted exclusively on the official 2010
i2b2 dataset (test split) [28], which represents a comprehensive resource for assessing assertion
detection frameworks in real-world clinical scenarios. The results focus on both individual
models and combined pipelines, showcasing their relative strengths and collective impact on
performance.</p>
        <p>The dataset utilized in this study covers all six assertion categories: Absent, Associated
with someone else, Conditional, Hypothetical, Possible, and Present. However, the fine-tuned LLM
excludes the Conditional label due to its ambiguity with the Hypothetical label, which could
complicate fine-tuning. This exclusion simplifies training and sharpens the model’s focus on
the remaining categories. In contrast, other models, including LLMs, retain all six categories to
ensure a thorough evaluation of performance across the full range of assertion types.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Comparative Results</title>
        <p>
          *awse: associated with someone else. **Combined pipeline elements denoted in italics. ***BFSC latest best is
benchmarked only on 3 labels by its authors [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]; hence excluded from comparison.
        </p>
        <p>Our fine-tuned LLM, based on the LLaMA 3.1-8B model and trained using LoRA on the i2b2
dataset, demonstrates superior performance in most categories compared to other models. This
approach aligns with recent research in domain adaptation for clinical NLP tasks [30]. The
results emphasize the efficacy of smaller, domain-specific models, which, when coupled with
carefully engineered prompts, can often outperform much larger, general-purpose models. Our
experimental findings indicate near-perfect performance across most categories, with only
minor underperformance in the possible and hypothetical labels. Notably, our model excels not
only in covering a broader range of categories but also in clearly outperforming commercial
solutions such as GPT-4o, Azure AI Text Analytics, and AWS Comprehend.</p>
        <p>The combined pipeline, which integrates rule-based methods with machine learning
techniques, closely mirrors the performance of the fine-tuned LLM across most categories. This
hybrid approach, which captures the strengths of both deep learning and rule-based systems,
outperforms comparable solutions offered by Azure and AWS in every category except the
conditional label. Unlike Azure AI Text Analytics for Health and AWS Medical Comprehend,
which are API-based black-box solutions, our pipeline offers customization and fine-tuning
options, allowing for potential performance improvements across all categories, including the
conditional label. This flexibility represents a significant advantage in adapting the system to
meet specific healthcare needs and optimizing performance across various clinical NLP tasks.</p>
        <p>For use cases where deploying the full combined pipeline is not feasible, users can still
achieve exceptional results by leveraging its individual components. AssertionDL, in particular,
stands out as a versatile solution, effectively handling all assertion categories with its advanced
deep learning architecture. It performs particularly well in the conditional and associated
with someone else categories, demonstrating superior results in the conditional label. Notably,
AssertionDL outperforms GPT-4o in most categories, making it a robust standalone option for
clinical assertion tasks.</p>
        <p>The FewShotAssertion model can be used both standalone and as part of the pipeline, offering
an ideal solution for rapid training and inference in resource-constrained clinical NLP
environments where efficiency is crucial. It performs comparably to the fine-tuned LLM across most
categories, with the exception of the “conditional” category. However, when integrated into the
Healthcare NLP pipeline, its contribution of absent and hypothetical labels helps mitigate this
limitation.</p>
        <p>The BFSC model highlights the power of domain adaptation in clinical NLP tasks. By
leveraging the domain-specific BioBERT language model and employing a sequence classifier,
this approach demonstrates superior performance due to its fine-tuning on meticulously curated
training data. While the BFSC model slightly underperforms compared to AssertionDL in the
conditional label category, its performance is close to the benchmark, and it holds potential for
further improvement through strategic augmentation of the training dataset.</p>
        <p>Despite its superior performance, LLM-based solutions come with substantial computational
costs, requiring GPUs to run efficiently while still being slower. In our benchmarks, what
takes around 3 seconds using our deep-learning-based approach on a CPU requires around 300
seconds on a GPU-powered LLM, which is 100× slower. Given that GPU instances cost more
than CPU instances (often 10–50× higher per hour), the operational cost of running
LLM-based assertion detection can be thousands of times more expensive for only a
1-2% accuracy gain. This highlights the trade-off between accuracy and feasibility, where our
lightweight, domain-adapted models provide a far more scalable and cost-effective alternative
for real-world clinical NLP applications (see Table A3).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Limitations</title>
      <p>While this study demonstrates notable advancements in clinical assertion detection, several
limitations should be acknowledged. The models were benchmarked exclusively on the i2b2
dataset, which may limit generalizability to diverse clinical contexts. However, beyond this study,
there are numerous models that have been trained for use in various domains such as oncology
and radiology. Their F1 scores are available in Section A.5, Pre-trained Assertion Models in
Healthcare NLP, and Table A4. These models can be accessed via the JohnSnowLabs Model Hub
page [31]. Performance on underrepresented assertion types (e.g., conditional, associated with
someone else) could vary in real-world settings with different label distributions.</p>
      <p>Although the fine-tuned LLM achieved state-of-the-art accuracy, its GPU dependency and
100× slower inference speed compared to CPU-based DL models raise practical scalability
concerns, potentially hindering deployment in resource-constrained healthcare environments.
Despite addressing label skew (e.g., absent and present dominate the dataset), minority classes
like hypothetical (3.5% prevalence) and conditional (1.2%) still showed lower F1 scores,
suggesting residual bias in model predictions.</p>
      <p>Commercial APIs (AWS, Azure) were evaluated only on overlapping entities detected by their
proprietary NER systems, introducing selection bias, and partial matches (28–35% of cases) may
have skewed performance metrics for these systems.</p>
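<p>The exact- vs. partial-match distinction can be made concrete with character offsets (a sketch; the overlap criterion below is an assumption for illustration, not the evaluation code used for the commercial APIs):</p>

```python
def match_type(gold_span, pred_span):
    """Classify a predicted entity span against a gold span.

    Spans are (start, end) character offsets, end exclusive.
    'exact' = identical boundaries; 'partial' = any overlap; 'none' otherwise.
    """
    gs, ge = gold_span
    ps, pe = pred_span
    if (gs, ge) == (ps, pe):
        return "exact"
    if max(gs, ps) < min(ge, pe):  # non-empty intersection
        return "partial"
    return "none"

# Gold "bilateral DVTs" at [10, 24): a prediction covering only "DVTs"
# still overlaps, so it would be scored under a partial-match policy.
print(match_type((10, 24), (10, 24)))  # exact
print(match_type((10, 24), (20, 24)))  # partial
print(match_type((10, 24), (30, 35)))  # none
```

<p>Whether partial matches are counted as hits or misses materially changes the metrics when they make up 28–35% of cases.</p>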
    </sec>
    <sec id="sec-5">
      <title>5. Ethical Considerations</title>
      <p>The development of clinical assertion detection models necessitates ethical scrutiny due to
potential biases, privacy risks, and implications for patient care. The i2b2 dataset may contain
demographic biases, risking inequitable model performance across populations. Future work
should incorporate fairness audits and demographic stratification to mitigate these risks. Privacy
concerns arise from processing sensitive patient data, particularly with cloud-based APIs (e.g.,
GPT-4o, AWS, Azure), necessitating transparent data governance frameworks for compliance
with HIPAA and GDPR. The lack of interpretability in black-box models threatens clinical
trust, underscoring the need for explainability tools to audit model decisions. Over-reliance on
automation may lead to uncritical adoption in healthcare workflows, necessitating
human-in-the-loop validation mechanisms. Additionally, the high computational cost of LLM training
raises sustainability concerns, warranting efficiency-focused approaches. Future research
should prioritize bias mitigation, open fairness benchmarks, ethical model documentation, and
federated learning to enhance privacy and equity.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this study, we present a comprehensive evaluation of JSL’s state-of-the-art assertion
detection models, covering architectures from lightweight deep learning (DL) models to advanced
fine-tuned LLMs. Overall, our fine-tuned LLM achieves the highest overall accuracy (0.962),
outperforming GPT-4o (0.901) and commercial APIs by a notable margin, particularly in Present,
Absent, and Hypothetical assertions. However, this comes at a high computational cost: our
DL-based models run 100× faster on a CPU than the LLM on a GPU, while the LLM is thousands
of times more expensive for just a 1–2% accuracy gain. This highlights the impracticality of
LLM-based assertion detection for real-time, scalable clinical NLP.</p>
      <p>Our AssertionDL and FewShotAssertion models provide strong, efficient alternatives,
excelling in categories like Conditional and Associated with someone else assertions, while BFSC
achieves near-parity with our fine-tuned LLM. The Combined Pipeline outperforms all
commercial solutions and offers a balance of accuracy and efficiency. As part of a scalable,
production-ready Healthcare NLP library, these models seamlessly integrate with other clinical NLP
components, enabling robust, high-performance assertion detection at scale. Our results highlight
that smaller, domain-specific models outperform commercial black-box solutions like
GPT-4o, Azure AI, and AWS Medical Comprehend in both accuracy and scalability.
Integrated within Spark NLP, our pretrained assertion models and model architectures provide
production-ready, cost-effective alternatives for clinical text analysis, filling a critical gap in
extracting accurate medical insights.</p>
      <p>Software Impacts 13 (2022) 100373.</p>
      <p>[16] V. Kocaman, D. Talby, Spark NLP: natural language understanding at scale, Software Impacts 8 (2021) 100058.</p>
      <p>[17] H. U. Haq, V. Kocaman, D. Talby, Deeper clinical document understanding using relation extraction, 2021. URL: https://arxiv.org/abs/2112.13259. arXiv:2112.13259.</p>
      <p>[18] H. U. Haq, V. Kocaman, D. Talby, Mining adverse drug reactions from unstructured mediums at scale, 2022. URL: https://arxiv.org/abs/2201.01405. arXiv:2201.01405.</p>
      <p>[19] F. Fancellu, A. Lopez, B. Webber, Neural networks for negation scope detection, in: K. Erk, N. A. Smith (Eds.), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp. 495–504. URL: https://aclanthology.org/P16-1047/. doi:10.18653/v1/P16-1047.</p>
      <p>[20] A. Grattafiori, A. Dubey, et al., The Llama 3 herd of models, 2024. URL: https://arxiv.org/abs/2407.21783. arXiv:2407.21783.</p>
      <p>[21] X. Wang, L. Aitchison, M. Rudolph, LoRA ensembles for large language model fine-tuning, arXiv preprint arXiv:2310.00035 (2023).</p>
      <p>[22] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.</p>
      <p>[23] E. Alsentzer, J. R. Murphy, W. Boag, W.-H. Weng, D. Jin, T. Naumann, M. B. A. McDermott, Publicly available clinical BERT embeddings, 2019. URL: https://arxiv.org/abs/1904.03323. arXiv:1904.03323.</p>
      <p>[24] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, Bioinformatics 36 (2020) 1234–1240.</p>
      <p>[25] L. Tunstall, N. Reimers, U. E. S. Jo, L. Bates, D. Korat, M. Wasserblat, O. Pereg, Efficient few-shot learning without prompts, 2022. URL: https://arxiv.org/abs/2209.11055. arXiv:2209.11055.</p>
      <p>[26] H. Harkema, J. N. Dowling, T. Thornblade, W. W. Chapman, ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports, Journal of Biomedical Informatics 42 (2009) 839–851.</p>
      <p>[27] W. W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, B. G. Buchanan, A simple algorithm for identifying negated findings and diseases in discharge summaries, Journal of Biomedical Informatics 34 (2001) 301–310. URL: https://api.semanticscholar.org/CorpusID:6315215.</p>
      <p>[28] O. Uzuner, B. R. South, S. Shen, S. L. DuVall, 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text, Journal of the American Medical Informatics Association 18 (2011) 552–556. URL: https://doi.org/10.1136/amiajnl-2011-000203. doi:10.1136/amiajnl-2011-000203.</p>
      <p>[29] S. Wang, L. Tang, A. Majety, J. F. Rousseau, G. Shih, Y. Ding, Y. Peng, Trustworthy assertion classification through prompting, Journal of Biomedical Informatics 132 (2022) 104139.</p>
      <p>[30] J. Zhao, et al., LoRA Land: 310 fine-tuned LLMs that rival GPT-4, a technical report, arXiv preprint arXiv:2405.00732 (2024).</p>
      <p>[31] JohnSnowLabs, JohnSnowLabs Model Hub, https://nlp.johnsnowlabs.com/models, 2025. Accessed: March 5, 2025.</p>
      <sec id="sec-6-1">
        <title>A.1 A Spark NLP pipeline</title>
      </sec>
      <sec id="sec-6-2">
        <title>A.2 Entity Overlapping Rates</title>
        <p>*associated with someone else: Refers to medical conditions related to individuals other than the patient, such as family members.</p>
        <sec id="sec-6-2-1">
          <title>Table A3</title>
          <p>Mean latency per 100 rows, measured in seconds, for various assertion methods: Fine-Tuned LLM, BFSC (BioBERT), AssertionDL, FewShotAssertion, Combined Pipeline, ContextualAssertion, GPT-4o, Azure AI Text Analytics, AWS Comprehend, and NegEx. Experiments were run on Google Colab servers, with CPU tasks performed on a CPU instance (8 vCPU @ 2.2 GHz, 50.99 GB RAM) and GPU tasks executed on an NVIDIA A100 GPU (40 GB HBM2).</p>
        </sec>
      </sec>
      <sec id="sec-6-3">
        <title>A.5 Pretrained Assertion Models in Healthcare NLP</title>
      </sec>
      <sec id="sec-6-4">
        <title>A.6 GPT Prompt</title>
        <sec id="sec-6-4-1">
          <title>GPT-4o Prompt</title>
        </sec>
        <sec id="sec-6-4-2">
          <title>You are a highly experienced medical data expert specializing in patient medical records.</title>
        </sec>
        <sec id="sec-6-4-3">
          <title>In this context, an assertion refers to the sentiment or condition associated with a specific medical entity within the context of a patient’s record. This helps determine whether symptoms or conditions are present, absent, possible, hypothetical, or related to someone else, enhancing the precision of medical documentation and analysis.</title>
          <p>Your task is to detect the assertion status of medical conditions mentioned in notes. The possible assertion
types are:
• **absent**: condition is explicitly negated
• **associated_with_someone_else**: condition refers to someone other than the patient
• **conditional**: condition is mentioned as contingent on another factor
• **hypothetical**: condition is part of a hypothetical scenario
• **possible**: condition is suggested as a possibility but not confirmed
• **present**: condition is clearly present for the patient
### Instructions:
1 Analyze the input TEXT and identify the assertion status of the TARGET condition.
2 Format your answer in valid JSON, using double quotes for both keys and values.</p>
          <p>3 If multiple assertions are required, choose the most confident one.
### EXAMPLE INPUT
{
“TEXT”: “She was then started on Heparin with transition to Coumadin (goal INR of 2-3 secondary to h/o
bilateral DVTs).”,
“TARGET”: “bilateral DVT”
}
### INPUT
{
“TEXT”: “text”,
“TARGET”: “target”
}
### Your Answer in JSON:</p>
        </sec>
        <sec id="sec-6-4-4">
          <title>Provide a JSON object where the text and assertion type are the key-value pairs.</title>
        </sec>
        <sec id="sec-6-4-5">
          <title>Example Output Format:</title>
          <p>{
“TARGET”: “bilateral dvt”,
“ASSERTION_STATUS”: “present”
}
Figure A2: Example of GPT-4o prompt for detecting assertion status in medical records</p>
        </sec>
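<p>Programmatically, the prompt's input block can be filled and the model's JSON reply validated as follows (a minimal sketch; the helper names are illustrative and not from the paper's code):</p>

```python
import json

# Allowed labels, as listed in the prompt above.
ASSERTION_TYPES = {"absent", "associated_with_someone_else", "conditional",
                   "hypothetical", "possible", "present"}

def build_input(text, target):
    """Fill the ### INPUT block of the prompt with double-quoted JSON."""
    return json.dumps({"TEXT": text, "TARGET": target})

def parse_answer(raw):
    """Validate the model's JSON reply against the allowed label set."""
    answer = json.loads(raw)
    status = answer["ASSERTION_STATUS"].lower()
    if status not in ASSERTION_TYPES:
        raise ValueError(f"unexpected assertion status: {status}")
    return answer["TARGET"], status

payload = build_input("She was then started on Heparin with transition to "
                      "Coumadin (goal INR of 2-3 secondary to h/o bilateral DVTs).",
                      "bilateral DVT")
target, status = parse_answer('{"TARGET": "bilateral dvt", "ASSERTION_STATUS": "present"}')
print(target, status)  # bilateral dvt present
```

<p>Rejecting replies outside the fixed label set guards against the free-form outputs that general-purpose LLMs occasionally produce.</p>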
      </sec>
      <sec id="sec-6-5">
        <title>A.7 Fine-tuned LLM Prompt</title>
        <sec id="sec-6-5-1">
          <title>Fine-tuned LLM Prompt</title>
        </sec>
        <sec id="sec-6-5-2">
          <title>You are provided with a document and an extracted entity (chunk).</title>
          <p>Your job is to analyze the document and the chunk, understand the context, and assign one of the following
statuses to the chunk:
• **present**: If the chunk is mentioned in the context of the person. *Example*: “He has a fractured
ankle.”
• **absent**: If the chunk is explicitly negated by the person. *Example*: “He did not sufer from
pain.” (In this case, “pain” is absent/negated.)
• **hypothetical**: If the chunk is mentioned in a hypothetical scenario or as part of guidelines.</p>
          <p>*Example*: “Adults above 70 are at greater risk of cancer.” (Here, “cancer” is hypothetical.)
• **possible**: If the chunk is mentioned in a way that implies possibility. *Example*: “Possible
fracture.”
• **associated_with_someone_else**: If the condition refers to someone other than the patient.</p>
          <p>*Example*: “Her mother has breast cancer.”
### Document:
{
“DOCUMENT”: “doc”
}
### Chunk:
{
“CHUNK”: “chunks”
}</p>
        </sec>
        <sec id="sec-6-5-3">
          <title>Example Output Format:</title>
          <p>{
“CHUNK”: “fractured ankle”,
“ASSERTION_STATUS”: “present”
}
### Your Answer in JSON:</p>
        </sec>
        <sec id="sec-6-5-4">
          <title>Provide a JSON object where the chunk and assertion status are the key-value pairs.</title>
          <p>Figure A3: Example of Fine-tuned LLM prompt for detecting assertion status in medical records
A.8 Healthcare NLP Pipeline
42
43
44
45 few_shot_assertion_classifier = FewShotAssertionClassifierModel()\
46 .pretrained("fewhot_assertion_i2b2_e5_base_v2_i2b2", "en", "clinical/models")\
47 .setInputCols(["assertion_embedding"])\
48 .setOutputCol("assertion_fewshot")
3
4
5
6 contextual_assertion_conditional = ContextualAssertion.pretrained("contextual_assertion_conditional
","en","clinical/models")\
7 .setInputCols("sentence", "token", "ner_chunk") \
8 .setOutputCol("ca_conditional")
9 #Merger
10 assertionMerger_fewshot = AssertionMerger()\
11 .setInputCols("assertion_fewshot")\
12 .setOutputCol("assertion_merger_fewshot")\
13 .setWhiteList(["absent","hypothetical"])
14
15 assertionMerger_dl = AssertionMerger()\
16 .setInputCols("assertionDL")\
17 .setOutputCol("assertion_merger_dl")\
18 .setWhiteList(["associated_with_someone_else","conditional"])
19
20 assertionMerger_all = AssertionMerger()\
21 .setInputCols("assertionDL","assertion_fewshot","ca_possible")\
22 .setOutputCol("assertion_merger_all")\
23 .setMergeOverlapping(True)\
24 .setMajorityVoting(False)\
25 .setOrderingFeatures(["confidence"])\
26 .setWhiteList(["present","possible"])\
27 .setApplyFilterBeforeMerge(True)
28
29 assertionMerger_final = AssertionMerger()\
30 .setInputCols("assertion_merger_fewshot","assertion_merger_dl","assertion_merger_all","
ca_conditional")\
.setOutputCol("assertion_merger")\
.setMergeOverlapping(True)\
.setMajorityVoting(True)\
.setOrderingFeatures(["confidence"])\
#Pipeline
pipeline = Pipeline(stages=[
document_assembler,
tokenizer,
converter,
few_shot_assertion_converter,
e5_embeddings,
few_shot_assertion_classifier,
word_embeddings_100,
clinical_assertion_100,
assertionMerger_fewshot,
contextual_assertion_conditional,
contextual_assertion_possible,
assertionMerger_dl,
assertionMerger_all,
assertionMerger_final</p>
        </sec>
      </sec>
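<p>The final AssertionMerger stage resolves overlapping predictions by majority vote with confidence-based ordering; its behavior can be sketched in plain Python (a simplified illustration under assumed semantics, not the library's implementation):</p>

```python
from collections import Counter

def merge_assertions(candidates):
    """Pick one label from overlapping assertion candidates.

    candidates: list of (label, confidence) tuples for the same entity.
    Majority vote first; ties broken by the highest-confidence candidate.
    """
    votes = Counter(label for label, _ in candidates)
    best_count = votes.most_common(1)[0][1]
    tied = {label for label, count in votes.items() if count == best_count}
    # Among tied labels, keep the one backed by the most confident prediction.
    return max((c for c in candidates if c[0] in tied), key=lambda c: c[1])[0]

# Three sources disagree: majority ("present") wins over one "possible".
print(merge_assertions([("present", 0.91), ("present", 0.85), ("possible", 0.97)]))
# With a 1-1 tie, the higher-confidence "absent" wins.
print(merge_assertions([("absent", 0.95), ("present", 0.60)]))
```

<p>This mirrors the intent of setMajorityVoting(True) with setOrderingFeatures(["confidence"]) in the pipeline above.</p>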
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[1] Office of the National Coordinator for Health Information Technology, Adoption of electronic health records by hospital service type 2019-2021</article-title>
          , health it quick stat 60,
          <year>2022</year>
          . Available at: https://www.healthit.gov/data/quickstats/adoption-electronic-health-records-hospital-service-type-2019-2021.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fu</surname>
          </string-name>
          , S. Liu,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Hanauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kavuluru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Natarajan</surname>
          </string-name>
          , et al.,
          <article-title>A case demonstration of the open health natural language processing toolkit from the national covid-19 cohort collaborative and the researching covid to enhance recovery programs for a natural language processing system for covid-19 or postacute sequelae of sars cov-2 infection: Algorithm development and validation</article-title>
          ,
          <source>JMIR medical informatics 12</source>
          (
          <year>2024</year>
          )
          <article-title>e49997</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Bridewell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. F.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Buchanan</surname>
          </string-name>
          ,
          <article-title>A simple algorithm for identifying negated findings and diseases in discharge summaries</article-title>
          ,
          <source>Journal of biomedical informatics 34</source>
          (
          <year>2001</year>
          )
          <fpage>301</fpage>
          -
          <lpage>310</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dowling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Context:</surname>
          </string-name>
          <article-title>An algorithm for identifying contextual features from clinical text</article-title>
          , in: Biological, translational,
          <source>and clinical language processing</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Mutalik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Deshpande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Nadkarni</surname>
          </string-name>
          ,
          <article-title>Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the umls</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>8</volume>
          (
          <year>2001</year>
          )
          <fpage>598</fpage>
          -
          <lpage>609</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Aronow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fangfang</surname>
          </string-name>
          , W. B.
          <string-name>
            <surname>Croft</surname>
          </string-name>
          ,
          <article-title>Ad hoc classification of radiology reports</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>6</volume>
          (
          <year>1999</year>
          )
          <fpage>393</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cuadros</surname>
          </string-name>
          , G. Rigau,
          <article-title>Negation and speculation processing: A study on cuescope labelling and assertion classification in spanish clinical text</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>145</volume>
          (
          <year>2023</year>
          )
          <article-title>102682</article-title>
          . doi:10.1016/j.artmed.2023.102682.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Uzuner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , T. Sibanda,
          <article-title>Machine learning and rule-based approaches to assertion classification</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>16</volume>
          (
          <year>2009</year>
          )
          <fpage>109</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <article-title>Joint entity extraction and assertion detection for clinical text, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</article-title>
          , Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>954</fpage>
          -
          <lpage>959</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Attention-based deep learning system for negation and assertion detection in clinical notes</article-title>
          ,
          <source>International Journal of Artificial Intelligence and Applications</source>
          (IJAIA)
          <volume>10</volume>
          (
          <year>2019</year>
          ). URL: https://ssrn.com/abstract=3342402.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>B. van Aken</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Trajanovska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Siu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayrdorfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Budde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Loeser</surname>
          </string-name>
          ,
          <article-title>Assertion detection in clinical notes: Medical language models to the rescue?</article-title>
          , in: C. Shivade,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gangadharaiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Konam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          Wallace (Eds.),
          <source>Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>40</lpage>
          . URL: https://aclanthology.org/2021.nlpmc-1.5/. doi:10.18653/v1/2021.nlpmc-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Majety</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Rousseau</surname>
          </string-name>
          , G. Shih,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>Trustworthy assertion classification through prompting</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          <volume>132</volume>
          (
          <year>2022</year>
          )
          <article-title>104139</article-title>
          . URL: https://www.sciencedirect.com/science/article/pii/S1532046422001538. doi:10.1016/j.jbi.2022.104139.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Black-box segmentation of electronic medical records</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2409.19796. arXiv:2409.19796.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Assertion detection large language model in-context learning LoRA fine-tuning</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.17602. arXiv:2401.17602.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kocaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Talby</surname>
          </string-name>
          ,
          <article-title>Accurate clinical and biomedical named entity recognition at scale,</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>1 #Contextual Assertion Models</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>2 contextual_assertion_possible = ContextualAssertion.pretrained("contextual_assertion_possible","en" ,"clinical/models")\ .setInputCols("sentence", "token", "ner_chunk") \ .setOutputCol("ca_possible")</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>