<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Embedding and Prompt-Driven Approaches for Named Entity Recognition, Entity Linking, and Clinical Code Prediction in Greek Discharge Summaries</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Poojan Vachharajani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Netaji Subhas University of Technology</institution>
          ,
          <addr-line>New Delhi</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper describes our participation in the ELCardioCC shared task under the team name pjmathematician, focusing on Named Entity Recognition (NER), Entity Linking (EL), and Multi-Label Classification with Explainable AI (MLC-X) over Greek cardiology discharge letters. For the NER and EL subtasks, we employed various configurations based on large language models, including Qwen2.5-32B-Instruct and a fine-tuned LoRA variant, combined with prompt engineering strategies. Entities were extracted from unstructured Greek clinical text and semantically matched to ICD-10 codes using embeddings from the multilingual-e5-large-instruct model. In the MLC-X task, we leveraged a larger Qwen2.5-72B model, guided by a candidate list of codes generated via multilingual document embeddings. This system explored cross-lingual semantic similarity and prompt-tuned LLMs to perform entity-level and document-level clinical coding in a low-resource language setting.</p>
      </abstract>
      <kwd-group>
        <kwd>Clinical Natural Language Processing</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Entity Linking</kwd>
        <kwd>Large Language Models (LLMs)</kwd>
        <kwd>ICD-10 Coding</kwd>
        <kwd>Greek NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>
        The ELCardioCC task provided a specialized corpus of 1,000 Greek discharge letters from a cardiology
department for training and validation, and 500 letters for testing. These documents were annotated
with mentions (chief complaint, diagnosis, prior medical history, drugs, and cardiac echo) and their
corresponding ICD-10 codes. The dataset reflects the complexities of real-world clinical narratives,
including specialized terminology and abbreviations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A separate file, codes.csv, containing ICD-10
codes and their descriptions, and a labelset.txt file with the target ICD-10 codes for the task were also
provided. For our fine-tuning experiments, the 1,000-instance training set was split into a 700-instance
training set and a 300-instance validation set.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Our approach varied across the subtasks, primarily utilizing LLMs for NER and initial predictions, and
sentence embeddings for EL and supporting MLC-X.</p>
      <sec id="sec-3-1">
        <title>3.1. Named Entity Recognition (NER) - Subtask 1</title>
        <sec id="sec-3-1-1">
          <title>NER Configurations</title>
          <p>For NER, we experimented with three main configurations:</p>
          <p>
            • Config 1 &amp; 2 (Base LLM): These two configurations were identical experimental runs to assess the
stochasticity of the model’s output. Both used the base Qwen/Qwen2.5-32B-Instruct model [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
Inference was performed using LMDeploy [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] with a TurbomindEngineConfig and a generation
configuration set for sampling (top_p=0.8, temperature=0.8). A detailed, zero-shot prompt
(see Appendix A.1) instructed the LLM to translate the Greek text, detect entities,
provide an English translation for each, and explain its relevance in a structured JSON format.
• Config 3 (LoRA Fine-tuned LLM): This configuration used a LoRA [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] fine-tuned version
of Qwen/Qwen2.5-32B-Instruct-AWQ. Fine-tuning was performed with the LLaMA-Factory
framework [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] on the 700-instance training split, using a simpler prompt
focused on direct entity extraction (see Appendix A.2). Key hyperparameters included a learning
rate of 5e-06, 2.0 training epochs, a LoRA rank of 16, and a LoRA alpha of 32. The final submission
used the checkpoint from training step 20, chosen based on preliminary validation.
The JSON output from the LLMs was parsed with a custom function to retrieve the list of entity mentions.
We noted occasional JSON parsing errors, a practical challenge when using LLMs for structured data
generation; these were handled with fallback logic.
          </p>
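          <p>The parsing step described above can be sketched as follows. This is a minimal illustration, not the exact task code: the function name extract_entities, the &lt;JSON&gt;-tag output convention, and the regex fallback are our assumptions about one reasonable implementation.</p>

```python
import json
import re

def extract_entities(raw_output: str) -> list[str]:
    """Parse LLM output into a list of entity mentions, with fallback logic.

    Illustrative sketch: assumes the model was asked to wrap its answer in
    <JSON>...</JSON> tags and emit objects carrying an "entity" field.
    """
    # First try: pull the block between <JSON> tags and parse it strictly.
    match = re.search(r"<JSON>(.*?)</JSON>", raw_output, re.DOTALL)
    candidate = match.group(1) if match else raw_output
    try:
        data = json.loads(candidate)
        return [item["entity"] for item in data if "entity" in item]
    except (json.JSONDecodeError, TypeError):
        pass
    # Fallback: regex-scrape "entity" values out of malformed JSON.
    return re.findall(r'"entity"\s*:\s*"([^"]+)"', raw_output)

good = '<JSON>[{"entity": "στηθάγχη"}, {"entity": "δύσπνοια"}]</JSON>'
bad = 'Sure! Here you go: {"entity": "στηθάγχη"}, {"entity": "δύσπνοια"},'
print(extract_entities(good))  # → ['στηθάγχη', 'δύσπνοια']
print(extract_entities(bad))   # → ['στηθάγχη', 'δύσπνοια']
```

          <p>The fallback keeps the pipeline running on truncated or chatty model outputs at the cost of ignoring any structure beyond the "entity" key.</p>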
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Entity Linking (EL) - Subtask 2</title>
        <p>
          The EL subtask built upon the entities recognized in Subtask 1. For each extracted Greek entity, we
linked it to the most appropriate ICD-10 code from labelset.txt using semantic similarity with the
intfloat/multilingual-e5-large-instruct sentence transformer model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>1. Corpus Preparation: An ICD-10 code corpus was created by combining the 3-character codes
from labelset.txt with their detailed English descriptions from codes.csv. Embeddings were
pre-computed for each code’s description.
2. Query Embedding: For each entity extracted by the LLM, a query embedding was generated
using an instructional format. If a contextual relevance description was available (from Configs 1
&amp; 2), the prompt was: "Instruct: Given an entity, retrieve the related medical Disease\nQuery:
[entity_relevance_english]". If only the Greek entity was available, the prompt was: "Instruct:
Given a Greek entity, retrieve the related medical Disease\nQuery: [greek_entity]". This
dual-prompting strategy aimed to leverage the richer context when available.
3. Similarity Matching: The cosine similarity between the query embedding and all ICD-10 code
embeddings was calculated, and the ICD-10 code with the highest similarity was assigned.
The three EL configurations (config1, config2, config3) directly corresponded to the NER
configurations, using their respective entity outputs.</p>
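        <p>Step 3 reduces to a matrix-vector similarity followed by an argmax. The sketch below uses toy 3-dimensional vectors in place of the real 1024-dimensional e5 embeddings; the function name, example codes, and vector values are illustrative assumptions.</p>

```python
import numpy as np

def link_entity(query_emb: np.ndarray, code_embs: np.ndarray, codes: list[str]) -> str:
    """Assign the ICD-10 code whose description embedding is most
    cosine-similar to the query embedding (one row of code_embs per code)."""
    q = query_emb / np.linalg.norm(query_emb)
    c = code_embs / np.linalg.norm(code_embs, axis=1, keepdims=True)
    sims = c @ q  # cosine similarity, since both sides are L2-normalized
    return codes[int(np.argmax(sims))]

# Toy stand-ins for the pre-computed description embeddings (the real ones
# come from intfloat/multilingual-e5-large-instruct).
codes = ["I20", "I50", "E11"]
code_embs = np.array([[1.0, 0.1, 0.0],
                      [0.0, 1.0, 0.2],
                      [0.1, 0.0, 1.0]])
query = np.array([0.9, 0.2, 0.1])  # e.g. the embedding of an angina mention
print(link_entity(query, code_embs, codes))  # → I20
```

        <p>Because every query is compared against all pre-computed code embeddings, the per-entity cost is a single matrix product over the label set.</p>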
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Multi-Label Classification with Explainable AI (MLC-X) - Subtask 3</title>
        <p>For the MLC-X subtask, we used a larger, more capable model, Qwen/Qwen2.5-72B-Instruct-AWQ,
believing its enhanced reasoning abilities would be beneficial for this complex document-level task. A
two-stage approach was adopted:
1. Candidate Generation: To reduce the search space and guide the LLM, a candidate list of
ICD-10 codes was generated for each document. We created embeddings for the original Greek
text and its two different English translations (produced by our Config 1 and 2 systems) using
multilingual-e5-large-instruct. For each of the three texts, we found the top 20 most similar
ICD-10 codes from our corpus. The union of these three sets formed the final candidate list for
the LLM.
2. LLM-based Classification and Explanation: The 72B model was prompted with the Greek
text and the filtered list of candidate codes and their descriptions. The prompt (see Appendix A.3)
instructed the model to select the relevant codes and extract the exact Greek phrases justifying
each selection.</p>
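        <p>The candidate-generation stage is a top-k retrieval per text followed by a set union. A minimal sketch with toy similarity scores (k reduced from the task’s 20 to 2 so the example stays small; codes and scores are illustrative):</p>

```python
import numpy as np

def top_k_codes(sims: np.ndarray, codes: list[str], k: int) -> set[str]:
    """Return the codes with the k highest similarity scores."""
    idx = np.argsort(sims)[::-1][:k]
    return {codes[i] for i in idx}

def candidate_list(sims_per_text: list[np.ndarray], codes: list[str], k: int) -> set[str]:
    """Union of per-text top-k retrievals: one similarity vector for the Greek
    original and one for each English translation (the task used k=20)."""
    out: set[str] = set()
    for sims in sims_per_text:
        out |= top_k_codes(sims, codes, k)
    return out

codes = ["I20", "I50", "E11", "J45"]
greek = np.array([0.9, 0.8, 0.1, 0.2])    # toy scores, not real model output
trans1 = np.array([0.7, 0.2, 0.9, 0.1])
trans2 = np.array([0.6, 0.1, 0.2, 0.95])
print(sorted(candidate_list([greek, trans1, trans2], codes, k=2)))
# → ['E11', 'I20', 'I50', 'J45']
```

        <p>Taking the union over the original and both translations hedges against a single noisy translation dropping a relevant code from the shortlist.</p>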
        <sec id="sec-3-3-1">
          <title>Submitted Configurations</title>
          <p>Two final configurations were submitted:</p>
          <p>• Config 4 (MLC with Evidence): The direct output of the 72B LLM, including both the predicted
ICD-10 codes and their supporting textual evidence.
• Config 5 (MLC no Evidence): From the same LLM output as Config 4, we extracted only the
predicted ICD-10 codes and submitted them with mention positions set to -1, per the task guidelines
for a classification-only submission.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>The official evaluation was performed on the test set of 500 discharge letters.</p>
      <sec id="sec-4-1">
        <title>4.1. NER and EL Results</title>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. MLC-X Results</title>
        <p>Table 3 presents our official results for Subtask 3a. For MLC-X, Config 5 (codes only) achieved a higher
F1-score than Config 4 (codes with evidence). This is likely because the Subtask 3a evaluation metric
penalizes incorrect mention boundaries; by not providing them, Config 5 avoids this penalty,
resulting in a score that purely reflects code classification accuracy. The stronger performance on
this subtask compared to EL suggests that the two-stage approach, using embeddings for candidate
filtering and a larger LLM for final classification, is a more effective strategy for document-level coding.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Our experiments highlighted several challenges and insights. The primary challenge was the inherent
difficulty of clinical NLP in a low-resource language. The performance of EL was critically dependent
on the upstream NER task; any errors in entity recognition propagated directly, limiting the potential
of the linking module.</p>
      <p>The use of different LLM sizes was a conscious design choice. The 32B model was deemed sufficient
for the more straightforward (though still challenging) NER task, while the larger 72B model was
reserved for the more complex MLC-X reasoning task, which involved selecting from a candidate list
and finding evidence. Our results suggest this was a reasonable approach, as the MLC-X F1-scores were
considerably higher than the EL scores.</p>
      <p>The LoRA fine-tuning experiment (Config 3) yielded interesting results. While it did not outperform
the zero-shot base model, it demonstrated a trade-off between precision and recall. With only two
training epochs and a small dataset (700 examples), the model may have been under-tuned. Further
training or more sophisticated prompt-tuning could potentially improve its performance.</p>
      <p>Finally, a practical challenge was the reliability of LLMs in generating perfectly formatted JSON,
with several parsing errors encountered during our experiments. This underscores the need for robust
post-processing and error-handling when integrating LLMs into structured data extraction pipelines.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We presented our systems for the ELCardioCC shared task, leveraging a combination of large language
models (Qwen2.5-32B and Qwen2.5-72B), LoRA fine-tuning, and multilingual sentence embeddings.
Our approach demonstrated the feasibility of using modern NLP techniques for NER, EL, and MLC-X
on Greek clinical text. The results indicate that while zero-shot prompting of capable LLMs provides a
strong baseline, the performance of downstream tasks like EL is highly sensitive to NER quality. For
document-level classification, a hybrid approach combining semantic retrieval for candidate generation
with a powerful LLM for final classification and explanation proved to be the most effective strategy.
Future work could explore joint NER and EL models to mitigate error propagation, as well as more extensive
fine-tuning to better adapt models to this specific clinical domain.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the organizers of the ELCardioCC shared task for providing the dataset and the evaluation
platform. We also acknowledge the developers of the open-source tools and models used in this work,
including Hugging Face Transformers, Sentence Transformers, LMDeploy, and LLaMA-Factory.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Gemini 2.5 Pro for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-9">
      <title>A. Prompts Used in Experiments</title>
      <sec id="sec-9-1">
        <title>A.1. Prompt for Base LLM (Config 1 &amp; 2)</title>
        <p>This prompt was used for the zero-shot NER and EL tasks with the base Qwen/Qwen2.5-32B-Instruct
model. Its output specification asks for a list of JSON objects wrapped in &lt;JSON&gt; tags, each containing,
among other fields, an "entity" field holding the extracted entity exactly as it appears in the Greek text
(same case).</p>
      </sec>
      <sec id="sec-9-2">
        <title>A.2. Prompt for LoRA Fine-Tuning and Inference (Config 3)</title>
        <p>This simpler system prompt was used for fine-tuning and inference with the LoRA-adapted model,
focusing directly on entity extraction.</p>
      </sec>
      <sec id="sec-9-3">
        <title>A.3. Prompt for MLC-X Task (Config 4 &amp; 5)</title>
        <p>This prompt was used with the Qwen/Qwen2.5-72B-Instruct-AWQ model for the multi-label
classification task. The {} placeholders in the user prompt were populated with the Greek text and the
candidate ICD-10 codes, respectively.</p>
        <p># Medical Coding Task: Greek Text to ICD10 Classification
## Your Task
You will analyze Greek medical texts and identify relevant ICD10 codes from a provided set. For each
relevant code, extract the specific Greek terms/phrases from the text that justify this classification.
2. Analyze each provided ICD10 code and determine if it applies to the medical text
3. For each relevant ICD10 code, identify and extract the exact Greek terms/phrases that support this
code assignment
4. Only include ICD10 codes that are clearly supported by the text
5. Extract entities exactly as they appear in the Greek text (preserve exact spelling, accents, and form)
6. If no codes are relevant, return an empty ICD10 array
## Output Format
Respond in valid JSON format with this exact structure:
&lt;JSON&gt;
{
  "ICD10": [
    {
      "code": "A25",
      "entities": ["exact greek phrase 1", "exact greek phrase 2"]
    }
  ]
}
&lt;/JSON&gt;</p>
        <p>USER_PROMPT = """
## Greek Medical Text
{}
## ICD10 Codes
{}
"""</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannakoulas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samaras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitriadis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patsiou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bekiaridou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>ELCardioCC: Advancing Clinical Coding in Cardiology: A Challenge on Named Entity Recognition, Entity Linking, Multi-label Classification &amp; Explainable AI. Task Overview</article-title>
          . https://elcardiocc.web.auth.gr/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>ELCardioCC</given-names>
            <surname>Shared Task Organizers</surname>
          </string-name>
          . Dataset Description.
          <source>ELCardioCC Website</source>
          . https://elcardiocc. web.auth.gr/#dataset
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Qwen</given-names>
            <surname>Team</surname>
          </string-name>
          .
          <source>Qwen2.5 Technical Report</source>
          . https://qwenlm.github.io/blog/qwen1.5/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>LMDeploy</given-names>
            <surname>Contributors</surname>
          </string-name>
          .
          <article-title>LMDeploy: A High-throughput LLM Inference Engine</article-title>
          .
          <source>GitHub Repository</source>
          . https://github.com/InternLM/lmdeploy
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>E.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wallis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen-Zhu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
          </string-name>
          , W. LoRA:
          <article-title>Low-Rank Adaptation of Large Language Models</article-title>
          .
          <source>In International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>LLaMA-Factory Contributors</surname>
          </string-name>
          .
          <article-title>LLaMA Factory: Unified LLaMA Fine-tuning Framework</article-title>
          .
          <source>GitHub Repository</source>
          . https://github.com/hiyouga/LLaMA-Factory
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          .
          <article-title>Text Embeddings by Weakly-Supervised Contrastive Pre-training</article-title>
          .
          <source>arXiv preprint arXiv:2212.03533</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Dimitriadis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patsiou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoikopoulou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toumpas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kipouros</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papadopoulos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bekiaridou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barmpagiannos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasilopoulou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barmpagiannos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samaras</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannakoulas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tsoumakas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Overview of ElCardioCC Task on Clinical Coding in Cardiology at BioASQ 2025</article-title>
          . In CLEF 2025 Working Notes, edited by Guglielmo Faggioli, Nicola Ferro, Paolo Rosso, and Damiano Spina,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>