<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>RECAIDSTechTitans at CLEF 2025: Simplifying Scientific Text and Identifying Spurious Sentences using T5</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stergio Eugin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Beulah A</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sathvikha V</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sangamithra V</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Artificial Intelligence and Data Science, Rajalakshmi Engineering College</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Scientific texts in the biomedical domain are often complex and inaccessible to non-experts, posing challenges for comprehension and decision support. The CLEF 2025 SimpleText track aims to evaluate systems that can simplify scientific content and identify hallucinations in generated text. In this paper, we describe the participation of the RECAIDSTechTitans team in four subtasks: Task 1.1 (Sentence Simplification), Task 1.2 (Document Simplification), Task 2.1 (Spurious Sentence Detection - Post-hoc), and Task 2.2 (Spurious Sentence Detection - Sourced). We developed prompt-based fine-tuning pipelines using T5 transformer models, tailored for each subtask using lightweight preprocessing and structured input prompts. Our best performance was observed on Task 1.2 with a SARI score of 33.89, demonstrating the potential of compact, domain-adapted models in biomedical NLP. We also explored hallucination detection challenges, highlighting the need for future work in error-aware generation and model grounding strategies.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Simplifying Text</kwd>
        <kwd>T5 Model</kwd>
        <kwd>Source Classification</kwd>
        <kwd>Topic Identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The CLEF 2025 SimpleText Track focuses on enhancing the accessibility of scientific texts through
simplification and controlled content generation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It introduces multiple subtasks to evaluate
systems on their ability to process, simplify, and classify complex biomedical information. Our team,
RECAIDSTechTitans from Rajalakshmi Engineering College, participated in this track by addressing
tasks over both sourced and unsourced scientific paragraphs: Task 1.1 (sentence simplification), Task 1.2
(document simplification), Task 2.1 (post-hoc spurious sentence detection), and Task 2.2 (sourced
spurious sentence detection), covering a wide range of the challenges defined in the track guidelines.
      </p>
      <p>Our primary goal was to explore how transformer-based architectures, particularly T5, could be
adapted for domain-specific classification problems in scientific and biomedical contexts. These tasks
are essential for improving structured access to knowledge and play a key role in the simplification
pipeline—enabling better content indexing, retrieval, and interpretation. We developed text-to-text
classification pipelines using HuggingFace’s T5 models, which were trained on paragraph-level inputs
with carefully designed prompt formats. Training and inference were conducted using Google Colab
and relied on curated CSV datasets provided by the organizers.</p>
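      <p>As a concrete sketch of this text-to-text interface, the prompt construction and the mapping of generated strings back to task labels can be written as below. The paper fixes only the "classify: &lt;paragraph&gt;" prompt shape; the whitespace normalization and the prefix-matching fallback are our assumptions.</p>

```python
# Sketch of the prompt-based text-to-text setup described above.
# The "classify: " prefix follows the paper; the normalization and
# the prefix-matching fallback in decode_label are assumptions.
def build_prompt(paragraph):
    """Format a paragraph as a T5 text-to-text classification input."""
    return "classify: " + " ".join(paragraph.split())

def decode_label(generated, label_set):
    """Map a generated string onto a known label: exact match first,
    then a case-insensitive prefix match, otherwise None."""
    text = generated.strip()
    if text in label_set:
        return text
    for label in label_set:
        if text.lower().startswith(label.lower()):
            return label
    return None
```

      <p>At inference time, the string produced by the decoder is passed through a mapping like decode_label so that near-miss generations still land on the closed label set.</p>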
      <p>
        The research on text simplification, particularly within scientific and technical domains, highlights
various approaches and challenges in making complex texts more accessible. He et al. emphasize the
importance of alternative representations beyond traditional text simplification, such as graph-based
visualizations, to enhance consumer comprehension of dietary supplement information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Their
study demonstrates that different simplification strategies, including manual and syntactic/lexical
simplification, can influence understanding, suggesting that multimodal approaches may be beneficial.
      </p>
      <p>
        Spring et al. explore multi-level text simplification in German, utilizing source labels and pretraining
to adapt standard language to specific CEFR levels [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Their work illustrates the potential of automatic
text simplification (ATS) to produce graded versions of technical language, facilitating language learning
and comprehension across different proficiency levels. Addressing non-English languages, Aumiller et
al. introduce Klexikon, a dataset for joint summarization and simplification in German, emphasizing
resource scarcity and the need for multilingual solutions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Lu et al. propose NapSS, a two-stage
strategy combining summarization and simplification at the paragraph level, with a focus on preserving
narrative flow and content relevance, which are vital for scientific texts that often contain intricate
information [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Fatima et al. extend the scope to cross-lingual science journalism, aiming to generate summaries
in local languages for non-expert audiences [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Their work frames cross-lingual summarization as a
downstream task of text simplification, highlighting the importance of linguistic and cultural adaptation
in scientific communication. Finally, Agrawal et al. investigate the control of pre-trained language
models for grade-specific text simplification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Their empirical study demonstrates that various control
mechanisms can influence the adequacy and simplicity of simplified texts, underscoring the significance
of adjustable systems to meet diverse user needs. Collectively, these studies illustrate a multifaceted
approach to scientific text simplification, encompassing linguistic, computational, and multimodal
strategies. They highlight the importance of domain-specific resources, evaluation metrics, and control
mechanisms to effectively make scientific information more accessible to non-expert audiences.
      </p>
      <p>
        The identification of hallucination in text generated by multimodal models has garnered significant
research attention, with various approaches addressing the underlying causes and detection methods.
Cui et al. highlight that while models like GPT-4V(ision) can process visual and textual data
simultaneously, their hallucination behaviors remain inadequately understood, prompting the development of
benchmarks such as Bingo to systematically assess bias and interference challenges associated with
hallucinations [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Similarly, Han et al. introduce Correlation QA, a benchmark designed to quantify
hallucination levels in models given spurious images, revealing that mainstream multimodal large
language models (MLLMs) are universally susceptible to biases stemming from spurious visual inputs
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Mitigation strategies are also a focal point. Chen et al. emphasize that enhancing vision annotations
and employing more discriminative vision models can improve the accuracy of responses, thereby
reducing hallucinations [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In another work, Liu et al. propose a latent space steering technique,
Visual and Textual Intervention (VTI), which aims to stabilize vision features during inference by
intervening in the latent representations, thus decreasing hallucination occurrences [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. From a causal
perspective, Li et al. hypothesize that hallucinations result from unintended direct influences of
individual modalities bypassing proper fusion, suggesting that addressing these causal pathways can
mitigate hallucination [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Further insights into the internal mechanisms of hallucination are provided by Yang et al., who find
that hallucinations tend to concentrate in deeper layers of LVLMs, with a strong attention bias toward
text tokens [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This understanding informs targeted interventions to mitigate hallucination at specific
model depths. At the token level, Ogasa et al. develop attention-based features for hallucination
detection, demonstrating improved performance in longer input contexts typical of data-to-text and
summarization tasks [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        In specialized domains such as medical imaging, Khanal et al. introduce hallucination-aware
fine-tuning, which not only detects but also corrects hallucinations, addressing the critical implications
of hallucination in sensitive applications [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Similarly, Shu et al. investigate semantic hallucinations
in scene text understanding, finding that transformer layers with a stronger focus on scene text regions
are less prone to semantic hallucinations, and proposing methods to mitigate these issues effectively
[16]. Overall, these studies collectively advance the understanding of hallucination in text generated
by multimodal models, emphasizing the importance of benchmarks, causal analysis, internal model
mechanisms, and domain-specific mitigation techniques to improve the reliability of text outputs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Experimental Setup</title>
      <p>
        The datasets used in the CLEF 2025 SimpleText Track were designed to evaluate the performance
of NLP models in classifying scientific text at both the source and topic levels. Four subtasks were
addressed: Task 1.1 (Sentence Simplification), Task 1.2 (Document Simplification), Task 2.1 (Spurious
Sentence Detection, Post-hoc), and Task 2.2 (Spurious Sentence Detection, Sourced). Each task
involved predicting labels for biomedical paragraphs that originated from diverse sources such as
Cochrane Reviews and PubMed abstracts. For all tasks, the data was structured in CSV format and
included clearly labeled training, validation, and test splits. Each row in the dataset consisted of a single
paragraph of text and an associated label, either indicating its source (e.g., "Cochrane", "PubMed") or its
primary topic (e.g., "Treatment", "Diagnosis", "Epidemiology"). The sourced versions of the tasks (2.1
and 2.2) also included metadata linking the paragraph back to its originating document, which is useful
for evaluating faithfulness and alignment in simplification scenarios [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Example (Document Simplification - Task 1.2):
"This study analyzed 450 patients undergoing chemotherapy to compare the efficacy of drug A versus
drug B in reducing tumor size over a six-month period. The results suggest drug A had a higher response
rate."</p>
      <p>Label (Topic): Treatment</p>
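      <p>A minimal loader for splits of this shape can be sketched as follows; the column names "text" and "label" are placeholders for illustration, since the organizers' actual CSV headers may differ.</p>

```python
import csv

def load_split(path):
    """Read one CSV split into (paragraph, label) pairs.
    The 'text' and 'label' column names are assumed for illustration."""
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows.append((row["text"].strip(), row["label"].strip()))
    return rows
```

      <p>Each returned pair feeds directly into the prompt-construction step of the pipeline described in Section 3.</p>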
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>Our approach to the CLEF 2025 SimpleText Track was centered around the use of the T5 transformer
model, known for its flexibility in handling text-to-text tasks. We preprocessed the input paragraphs
by formatting them into prompts such as "classify: &lt;paragraph&gt;" to suit the requirements of source
and topic classification. This prompt-driven setup allowed us to train a unified model that could
generalize across both scientific and biomedical content. We used HuggingFace’s implementation of T5
and performed training and validation on Google Colab with efficient memory and compute scaling.
To handle all four subtasks (Task 1.1, Task 1.2, Task 2.1, and Task 2.2), we maintained a consistent
architecture and adjusted only the input encoding and dataset splits. The model training employed
a learning rate of 3e-4, a batch size of 16, and ran for 4 to 10 epochs depending on task complexity.
For sourced tasks (2.1 and 2.2), metadata was included in evaluation to assess model grounding and
consistency with the source document. We ensured that all outputs adhered strictly to CLEF’s
JSON-based submission format with uniquely defined run_ids for tracking. Beyond standard classification, we
also conducted internal experiments using CLEF’s distortion error taxonomy to test model robustness
against simplification inconsistencies [17]. Although formal submission for hallucination detection tasks
was not part of our main track, our system was evaluated informally for issues like overgeneralization
and topic shift. Overall, the pipeline demonstrated that prompt-based T5 models offer a scalable and
effective solution for low-resource scientific text classification. The workflow is shown in Figure 1.</p>
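      <p>The training configuration described above, together with the run_id requirement, can be gathered in one place. This is a sketch under stated assumptions: the learning rate, batch size, and epoch range come from the text, but the per-task epoch values and the run_id naming scheme are hypothetical.</p>

```python
# Hyperparameters stated in the text; the per-task epoch counts are
# illustrative values chosen within the reported 4-10 range.
TRAIN_CONFIG = {
    "learning_rate": 3e-4,
    "batch_size": 16,
    "epochs": {"task_1_1": 4, "task_1_2": 6, "task_2_1": 8, "task_2_2": 10},
}

def run_id(team, task, attempt):
    """Build a unique run_id for the JSON-based submission format
    (the naming scheme itself is a hypothetical example)."""
    return "{}_{}_run{}".format(team, task, attempt)
```

      <p>Keeping the configuration task-keyed makes it easy to vary only the input encoding and epoch count per subtask while the architecture stays fixed.</p>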
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>Our best result was achieved on Task 1.2, with a SARI score of 33.8926. While T5 performed consistently
across subtasks, some submissions showed negative scores due to format errors or model limitations.
We observed that different submission configurations impacted results significantly, pointing to the
need for thorough tuning.</p>
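      <p>For intuition about the SARI metric reported here, the following toy, unigram-only sketch shows its keep/add/delete decomposition. The official metric averages n-gram orders 1 through 4 with weighted match counts, so reported scores such as 33.8926 come from the standard implementation, not from this simplification.</p>

```python
def unigram_sari(source, prediction, references):
    """Toy, unigram-only illustration of SARI: score how well the
    prediction keeps, adds, and deletes words relative to the source
    and the reference simplifications. Not the official metric."""
    src = set(source.lower().split())
    out = set(prediction.lower().split())
    refs = [set(r.lower().split()) for r in references]
    ref_union = set().union(*refs) if refs else set()

    def f1(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    def ratio(hits, total):
        return len(hits) / len(total) if total else 0.0

    # KEEP: words retained from the source, judged against references
    keep_pred = src.intersection(out)
    keep_gold = src.intersection(ref_union)
    keep = f1(ratio(keep_pred.intersection(keep_gold), keep_pred),
              ratio(keep_pred.intersection(keep_gold), keep_gold))

    # ADD: new words, judged against words the references also add
    add_pred = out.difference(src)
    add_gold = ref_union.difference(src)
    add = f1(ratio(add_pred.intersection(add_gold), add_pred),
             ratio(add_pred.intersection(add_gold), add_gold))

    # DELETE: words removed from the source (precision only, as in SARI)
    del_pred = src.difference(out)
    del_gold = src.difference(ref_union)
    delete = ratio(del_pred.intersection(del_gold), del_pred)

    return 100 * (keep + add + delete) / 3
```

      <p>A prediction identical to the reference scores 100 under this sketch, while copying the source verbatim is penalized on the add and delete components.</p>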
      <sec id="sec-4-1">
        <title>4.1. Results for Text Simplification: Simplify scientific text</title>
        <p>In this task, the primary evaluation focuses on 37 abstracts comprising 587 aligned sentences, following
the same alignment format as Cochrane-auto and related datasets. As participants, we note that for
this track overview paper, the organizers chose to evaluate all submissions for both Task 1.1 and Task
1.2 at the document level. This unified approach ensures consistency in ground truth and allows for
comparable scoring across the two tasks. An additional evaluation was also carried out on the larger set
of 217 abstracts, whose 4,293 source sentences were paired with 217 plain language summaries containing
3,641 sentences. Results for Task 1.1 are presented in Table 1; similarly, the results for Task 1.2 are
presented in Table 2. To better illustrate our system’s behavior, we include a few examples of
sentence-level simplification below:</p>
        <p>• Original: “The treatment led to a statistically significant improvement in survival rate.”
Simplified: “The treatment helped patients live longer.”</p>
        <p>• Original: “Participants were excluded if they had prior exposure to the drug in the last 6 months.”
Simplified: “People who used the drug in the last 6 months were not included.”</p>
        <p>• Original: “This systematic review included 25 studies evaluating the impact of physical activity
on blood pressure. Most studies were randomized controlled trials involving adults over 50.
The authors concluded that moderate exercise significantly reduced systolic and diastolic blood
pressure levels.”
Simplified: “This review looked at 25 studies on how exercise affects blood pressure. Most
involved people over 50. The authors found that moderate exercise lowered blood pressure.”</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results for Controlled Creativity: identify and avoid hallucination</title>
        <p>Task 2 is tested with a corpus comprising 2,659 manually annotated sentence–simplification pairs. Each
simplified sentence could exhibit multiple error types, framing the task as a multi-label classification
problem. The provided error taxonomy was hierarchically structured into four main categories (A–D),
each encompassing several fine-grained error types. Evaluation metrics included both micro and macro
F1 scores computed at the group level. Additionally, a "No Error" class was used to indicate cases where
no simplification errors were detected. The results for Task 2.1 and 2.2 are shown in Table 3 and Table 4,
respectively.</p>
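        <p>The group-level micro and macro F1 averaging described above can be sketched from scratch as follows. This is an illustration, not the organizers' scorer; gold and predicted annotations are modeled as sets of error-category labels per sentence, with the "No Error" class treated as just another label.</p>

```python
def multilabel_f1(gold, pred, labels):
    """Micro- and macro-averaged F1 over a fixed label set.
    gold and pred are parallel lists of per-sentence label sets."""
    counts = {lab: [0, 0, 0] for lab in labels}  # tp, fp, fn per label
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in p and lab in g:
                counts[lab][0] += 1   # true positive
            elif lab in p:
                counts[lab][1] += 1   # false positive
            elif lab in g:
                counts[lab][2] += 1   # false negative

    def f1(tp, fp, fn):
        return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

    macro = sum(f1(*counts[lab]) for lab in labels) / len(labels)
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1(*totals)
    return micro, macro
```

        <p>Micro averaging pools counts over all categories before computing F1, so frequent error types dominate; macro averaging weights every category equally, which is why the two scores diverge on imbalanced error taxonomies.</p>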
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This work demonstrates the potential of transformer-based architectures like T5 for text simplification
and classification in scientific domains. Our models showed notable performance, especially in topic
classification tasks. Future work will explore model ensembling, grounding methods, and evaluation
on hallucination-related subtasks to improve system robustness. The results of our T5-based system
for CLEF 2025 SimpleText Track Task 1 demonstrate that careful model selection and fine-tuning
on domain-specific data can outperform larger, general-purpose models. While some teams used
massive models such as Zephyr-7B or Flan-T5, our more modest configuration achieved competitive or
superior performance in both primary (SARI) and secondary metrics. This outcome aligns with findings
from previous ImageCLEF tracks, including MedCLIP, where resource-efficient transformer models
showed strong potential when adapted correctly to biomedical and scientific text. Moreover, our results
underscore the importance of data preprocessing and submission formatting, as initial failures were
caused not by modeling limitations but by submission protocol violations.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank the organizers of the CLEF 2025 SimpleText Track for curating the datasets and organizing the
shared task. We also thank our faculty mentors and the Department of Artificial Intelligence and Data
Science at Rajalakshmi Engineering College for their support and motivation throughout this project.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the authors used ChatGPT and Grammarly for grammar
correction, spelling improvement, and sentence rephrasing. These tools were only used to enhance the
readability and language quality of the paper. All technical content, including model design, experiments,
and analysis, was produced entirely by the authors. The authors reviewed all AI-assisted edits and take
full responsibility for the final content.</p>
      <p>[16] Y. Shu, H. Lin, Y. Liu, Y. Zhang, G. Zeng, Y. Li, Y. Zhou, S.-N. Lim, H. Yang, N. Sebe, When semantics
mislead vision: Mitigating large multimodal models hallucinations in scene text spotting and
understanding, arXiv preprint arXiv:2506.05551 (2025).</p>
      <p>[17] M. Shardlow, R. Nawaz, Neural text simplification of clinical notes with domain-specific pretraining,
in: Proceedings of the 20th Workshop on Biomedical Language Processing (BioNLP 2021), 2021,
pp. 91–100.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ermakova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azarbonyad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bakker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Vendeville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <article-title>CLEF 2025 SimpleText track</article-title>
          , in: C.
          <string-name>
            <surname>Hauff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macdonald</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Jannach</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kazai</surname>
            ,
            <given-names>F. M.</given-names>
          </string-name>
          <string-name>
            <surname>Nardini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Pinelli</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Silvestri</surname>
          </string-name>
          , N. Tonellotto (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          , pp.
          <fpage>425</fpage>
          -
          <lpage>433</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Alpert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raisa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <article-title>When text simplification is not enough: could a graph-based visualization facilitate consumers' comprehension of dietary supplement information?</article-title>
          ,
          <source>JAMIA open 4</source>
          (
          <year>2021</year>
          )
          <article-title>ooab026</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Spring</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ebling</surname>
          </string-name>
          ,
          <article-title>Exploring German multi-level text simplification</article-title>
          (
          <year>2021</year>
          )
          <fpage>1339</fpage>
          -
          <lpage>1349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Aumiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gertz</surname>
          </string-name>
          ,
          <article-title>Klexikon: A German dataset for joint summarization and simplification</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Béchet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Cieri</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Declerck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Goggi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Isahara</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Mazo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Odijk</surname>
          </string-name>
          , S. Piperidis (Eds.),
          <source>Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>2693</fpage>
          -
          <lpage>2701</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.288/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          , G. Pergola,
          <article-title>NapSS: Paragraph-level medical text simplification via narrative prompting and sentence-matching summarization</article-title>
          ,
          <source>in: EACL</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1049</fpage>
          -
          <lpage>1061</lpage>
          . doi:10.18653/v1/2023.findings-eacl.80.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fatima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Strube</surname>
          </string-name>
          ,
          <article-title>Cross-lingual science journalism: Select, simplify and rewrite summaries for non-expert readers</article-title>
          ,
          <source>in: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>1843</fpage>
          -
          <lpage>1861</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carpuat</surname>
          </string-name>
          ,
          <article-title>Controlling pre-trained language models for grade-specific text simplification</article-title>
          ,
          <source>in: Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>12807</fpage>
          -
          <lpage>12819</lpage>
          . doi:10.18653/v1/2023.emnlp-main.790.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Zou,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>Holistic analysis of hallucination in gpt-4v (ision): Bias and interference challenges</article-title>
          ,
          <source>arXiv preprint arXiv:2311.03287</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>The instinctive bias: Spurious images lead to illusion in MLLMs</article-title>
          ,
          <source>in: Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>16163</fpage>
          -
          <lpage>16177</lpage>
          . doi:10.18653/v1/2024.emnlp-main.904.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Mitigating hallucination in visual language models with visual supervision</article-title>
          ,
          <source>arXiv preprint arXiv:2311.16479</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Reducing hallucinations in large vision-language models via latent space steering</article-title>
          ,
          <source>in: The Thirteenth International Conference on Learning Representations</source>
          ,
          <year>2025</year>
          . URL: https://openreview.net/forum?id=LBl7Hez0fF.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Treble counterfactual VLMs: A causal approach to hallucination</article-title>
          ,
          <source>arXiv preprint arXiv:2503.06169</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Understanding and mitigating hallucination in large vision-language models via modular attribution and intervention</article-title>
          ,
          <source>in: The Thirteenth International Conference on Learning Representations</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ogasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Arase</surname>
          </string-name>
          ,
          <article-title>Hallucination detection using multi-view attention features</article-title>
          ,
          <source>arXiv preprint arXiv:2504.04335</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Khanal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pokhrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhandari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Gurung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Linte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. R.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhattarai</surname>
          </string-name>
          ,
          <article-title>Hallucination-aware multimodal benchmark for gastrointestinal image analysis with large vision-language models</article-title>
          ,
          <source>arXiv preprint arXiv:2505.07001</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>