<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detection of Health Risks by Textual Analysis of Medical Documents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Martinez-Romo</string-name>
          <email>juaner@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lourdes Araujo</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arantza Casillas Rubio</string-name>
          <email>arantza.casillas@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aitziber Atutxa</string-name>
          <email>aitziber.atucha@ehu.eus</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Early Disease Detection, Clinical Decision Support, Medical Text Mining, Predictive Health Analytics</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>HiTZ Basque Center for Language Technologies - Ixa (UPV/EHU)</institution>
          ,
          <addr-line>Manuel Lardizabal 1, 20018 Donostia</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Instituto Mixto UNED-ISCIII (IMIENS)</institution>
          ,
          <addr-line>Monforte de Lemos no 5 - Pabellón 7 - 2a Pl. 28029, Madrid</addr-line>
          ,
          <country>España</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NLP &amp; IR group - Universidad Nacional de Educación a Distancia (UNED)</institution>
          ,
          <addr-line>C/ Juan del Rosal 16, 28040 Madrid</addr-line>
          ,
          <country>España</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The EDHER-MED project focuses on the early detection of health risks through Natural Language Processing (NLP) and Artificial Intelligence (AI). Led by two research groups - the Natural Language Processing and Information Retrieval (NLP&amp;IR) group at the National University of Distance Education and the Hizkuntza Teknologiako Euskal Zentroa (HiTZ) group at the University of the Basque Country - this initiative develops advanced computational models to extract medical insights from Electronic Health Records (EHRs), scientific literature, and patient narratives. The project introduces novel methodologies, including biomedical domain-specific language models, clinical argument mining, and temporal event detection, to improve risk assessment in mental health, HIV, rare diseases, and cardiovascular conditions. Its main objectives include developing NLP tools, medical ontologies, and predictive models for early disease detection. The anticipated scientific, social, and economic impact includes enhanced clinical decision support, reduced healthcare costs, and improved patient outcomes, positioning EDHER-MED as a transformative AI-driven solution in healthcare research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Scientific Proposal</title>
      <p>The EDHER-MED project is funded by the Ministry of Science and Innovation under the 2022
call for R+D+i projects, within the State Program for Research, Development and Innovation Oriented
to the Challenges of Society. EDHER-MED is a coordinated project between the Natural Language
Processing and Information Retrieval (NLP&amp;IR) group at the National University of Distance Education
(UNED) and the Hizkuntza Teknologiako Euskal Zentroa (HiTZ) group at the University of the Basque Country
(UPV-EHU). The research addresses a set of use cases that aim to discover new
knowledge on health risks and to support the early diagnosis of certain illnesses and disorders, mainly
through Electronic Health Records (EHRs) and other sources (such as scientific literature and texts
written by patients).</p>
      <p>The early detection (ED) of health risks is a rapidly evolving field in healthcare research, aimed
at identifying the earliest signs or symptoms of diseases before they become severe. Traditional
biomedical informatics has focused on diagnosing patients based on symptoms, biological markers, and
comorbidities. However, with the digitalization of medical records, personalized medicine has advanced,
allowing for a broader analysis of multiple pathologies.</p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
      <p>The EDHER-MED project leverages state-of-the-art (SOTA) Natural Language Processing (NLP)
techniques to analyze patient medical histories, identifying specific indicators that can alert healthcare
professionals about a patient’s risk of developing a disease or complication. ED is not designed to
replace physicians but to support their clinical decision-making by providing alerts based on risk
indicators. Physicians can choose to investigate or disregard these alerts based on their expertise. When
combined with timely interventions, ED can help mitigate complications, improve treatment efficacy,
and prevent long-term disease progression. This leads to better patient outcomes and a reduced burden
on healthcare systems by decreasing the need for intensive treatments and long-term management.</p>
      <p>Most patients’ health data is stored within Electronic Health Records (EHRs) in natural language,
making Natural Language Processing techniques crucial for the acquisition and extraction of relevant
information. This project intends to make progress in the application and development of NLP
techniques that enhance the automatic processing of clinical reports and thereby improve the early
detection of health risks in different scenarios. The NLP technology implemented is intended to
generalize to all the scenarios addressed in the project and, by extension, to many others, contributing to
more and better systems for future use cases. The specific scenarios comprise diseases
with a high social impact: mental health, Human Immunodeficiency Virus, rare diseases
and cardiovascular diseases.</p>
      <p>
        The analysis of mental health in the youth population is the use case in which both groups will be
involved, through two types of actions. On the one hand, the developed tools will detect high-risk
suicidal behaviors in children and adolescents by automatically extracting information from structured
(questionnaire answers) and unstructured (free text) sources to identify evidence of these situations
according to the established psychiatric ideation-to-action framework [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. On the other hand, we will
investigate the early detection of mental health risks in the youth population by stratifying young
people (16-25 years) with mental health problems into subgroups or strata according to prognosis or
response to treatment with respect to emotional state, depression and suicide.
      </p>
      <p>The early detection of Human Immunodeficiency Virus (HIV) will also be addressed by exploring
different techniques to extract key indicators of HIV status. Statistical, machine learning
and deep learning techniques will all be explored. We will analyze and classify as indicators
of HIV the different symptoms and diseases given by physicians. This classification will be completed
by extracting relations from medical ontologies and graphs to measure the relevance of the indicators,
which will be compared with HIV indicators from clinical text data. We will extract explicit and implicit
indicators that can be combined with laboratory test values to detect HIV early.</p>
      <p>
        Research will also be carried out to improve the characterization of rare diseases and their effects on
the mental health and well-being of the child population. This is a little-explored aspect in the field of
rare diseases, which is already a field with great limitations in the available information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The application
of different techniques will be explored, including statistical methods and the most recent transformer-based
techniques, to identify clinical and mental signs relevant to the study and their relationship with the
disease under consideration, always considering the interpretability of the results.
      </p>
      <p>Finally, the last use case covered in this project is the automatic discovery of potential
risk factors associated with cardiovascular complications (for example, ischemic stroke, cardiomyopathy,
or recurrence of Atrial Fibrillation (AF)) after a first episode. To that end, we will use patients’ Electronic
Health Records (EHRs), progress notes and electrocardiograms (ECGs), and compare this new knowledge
with the existing scores and guidelines used in hospitals in order to improve them. In addition, by applying
explainability algorithms, we aim to find relevant information to detect signs of AF at early stages.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Research Groups</title>
      <p>The coordinated EDHER-MED project consists of two subprojects, each led by a different research
group:
• ENIGMA (Early detectioN of hIGh iMpact diseAses through natural language processing). UNED
subproject.
• EDHIA (Early Detection and Health rIsk identification with NLP and Argumentation). UPV-EHU
subproject.</p>
      <p>The project, which is multidisciplinary in nature, involves two NLP research groups
and several health-related institutions, with whom the use cases of each subproject will be developed.
Figure 1 shows a scheme of the collaboration among the different participants.</p>
      <p>Both groups will contribute their experience, resources, and knowledge of NLP and of its
application to the medical field. The UNED team has proven experience in massive data crawling of
both scientific publications and tweets, in anonymization and acronym disambiguation in medical texts,
and in working with psychological disorders. The HiTZ team has extensive experience in
annotation and in the development of tools for topic extraction, named entity recognition, and relation
extraction in medical texts, as well as in building multilingual and multimodal Large Language Models (LLMs) and discriminative
Language Models (LMs), in clinical narrative section/topic modeling, and in working in low-data regimes.
Both teams have experience with International Classification of Diseases (ICD) classification, Named
Entity Recognition using different approaches, negation detection, and the adaptation of language
models for information extraction with transformers and deep learning in the clinical domain. The
collaboration is therefore synergistic, as the teams will share their expertise and knowledge of different
techniques and methods to solve the medical-domain NLP tasks required in the project.</p>
    </sec>
    <sec id="sec-3">
      <title>3. General Objectives</title>
      <p>The main objectives of the project are presented below. Figure 2 shows the main elements of the
project and their interactions.</p>
      <sec id="sec-3-1">
        <title>1. Basic NLP resources necessary to achieve the rest of the general objectives.</title>
        <p>• Development and adaptation of data mining tools to identify medical concepts and
relevant relations between them, temporal pattern extraction in patient EHRs, creation and
annotation of different corpora, and development of medical LLMs.
• Medical ontology enrichment.</p>
        <p>2. Development of NLP/multimodal technology to support early detection of health risks.
• Medical Information Extraction (timeline generation, diagnoses encoded as ICD,
phenotyping, ECG-EHR joint processing).
• Pattern discovery for early detection of risks.
• Clinical Argument Mining.</p>
        <p>3. Use of the tools and technologies developed in the previous objectives for their application to
specific use cases focused on health problems with high social impact.
• Early detection of mental health risks.
• Early detection of HIV.
• Early detection of mental health and social problems in children affected by rare diseases.
• Identification of potential risk factors to prevent cardiovascular complications.</p>
        <p>Given that our research project involves the use of electronic health records (EHRs) and is conducted
in collaboration with various healthcare institutions, adherence to ethical and legal standards
concerning patient data has been a fundamental priority from the outset. In compliance with national and
institutional regulations governing the use of health-related data for research purposes, the project
has been submitted for ethical review and approved by the corresponding ethics committees of each
participating healthcare institution. These evaluations were essential not only to authorize access to
the EHRs but also to ensure that the procedures for data handling align with established norms of
biomedical research ethics. Importantly, the data obtained for the study were subject to anonymization
protocols implemented at the institutional level prior to their provision to the research team. This
approach ensures that all personal identifiers are removed at the source, thereby safeguarding patient
confidentiality. Furthermore, the research team has implemented additional data protection measures
throughout the project lifecycle, including secure data storage, restricted access, and compliance with
applicable data privacy regulations. These combined efforts guarantee that the privacy and integrity
of patient information are rigorously protected, in accordance with both ethical obligations and legal
requirements.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Expected impact</title>
      <p>The extraordinary advances in AI and NLP that have revolutionized information processing technology
need to be adapted and evaluated in the different applications and types of data in the medical domain.
In particular, we will explore improvements related to the following points:
• Mental health disorders and NLP are mainly linked through two scientific-technical perspectives:
computational methods for the detection of people with mental health risk, and the generation of
specific data sources that feed those computational methods. Our main contribution will be the
generation of annotated Spanish corpora and the development of a model for the detection of
young people at risk of suicidal behaviors.
• Progress in the knowledge of rare diseases is essential to improve their prognosis. Often the scarce
information available on a Rare Disease (RD) is scattered across different databases, in different
formats, and in many cases unstructured. Moreover, even the information on the same patient is
scattered, since the diagnosis may involve multiple clinical centers, specialities and investigations.
We expect to improve the knowledge about RDs and their relationships with mental health and
social factors, especially in the case of the child population.
• Our main contribution regarding cardiovascular diseases will be to improve existing
scores and clinical guidelines. To do so, information derived from EHRs and ECGs will be annotated
and employed. We will also advance the explainability of model decisions, which is key in medical
decision making.
• With respect to HIV, we expect that the application of the basic NLP tools and the more advanced
tools developed in the project will allow better analysis of the unstructured text of clinical reports
and notes in a way that will improve early diagnosis.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Progress achieved</title>
      <p>Since the start of the project, we have made progress on all objectives.</p>
      <p>
        • NLP resources: We have generated both data collections and fundamental tools for the development
of the project. Among them we have compiled an initial corpus in Spanish for the detection of
suicide attempts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We have also developed tools for the detection of fake news in medicine [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
which allow us to filter information after information retrieval processes. Another important
line for the generation of timelines to analyze the evolution of health problems has been the
development of advanced models for the identification and normalization of dates in medical
texts [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Regarding critical general resources, we have built domain-specific medical LMs and
LLMs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
• Advances in early detection of health risks: We have also made progress in the early detection of
risks in different areas, such as mental health [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], HIV screening [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], ICD prediction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and
early detection of cardiovascular diseases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
• Advances in medical argumentation: We have established a methodology and built a benchmark
to evaluate LLM-generated medical arguments [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. We have also developed tools for
finding the argument supporting a correct medical hypothesis [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Finally, we have explored
cross-lingual transfer and few-shot techniques for Argument Mining [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>We have also participated in several related competitions, and our proposals have achieved relevant
rankings [14, 15, 16, 17, 18].</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>EDHER-MED, with subprojects ENIGMA (PID2022-136522OB-C21) and EDHIA
(PID2022-136522OB-C22), is funded by the Spanish Ministry of Science and Innovation. In addition, this work has been
partially supported by the Spanish Ministry of Science and Innovation within the OBSER-MENH Project
(MCIN/AEI/10.13039 and NextGenerationEU”/PRTR) under Grant TED2021-130398B-C21.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <sec id="sec-7-1">
        <title>The authors have not employed any Generative AI tools.</title>
        <p>[14] X. Larrayoz, N. Lebeña, A. Casillas, A. Pérez, Eating disorders detection by means of deep learning,
in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), 2023.</p>
        <p>[15] J. Martinez-Romo, J. Huesca-Barril, L. Araujo, E. d. L. C. Marin, UNED-UNIOVI at
EmoSPeech-IberLEF2024: Emotion identification in Spanish by combining multimodal textual analysis and
machine learning methods, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF
2024), 2024.</p>
        <p>[16] X. Larrayoz, A. Casillas, M. Oronoz, A. Pérez, Mental disorder detection in Spanish: hands on
skewed class distribution to leverage training, in: IberLEF (Working Notes). CEUR Workshop
Proceedings, 2024.</p>
        <p>[17] M. Sierra-Callau, M. Á. Rodríguez-García, S. Montalvo-Herranz, R. Martínez-Unanue, UNED_MRES
Team at MentalRiskES2024: Exploring hybrid approaches to detect mental disorder risks in social
media, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), 2024.</p>
        <p>[18] H. Fabregat, D. Deniz, A. Duque, L. Araujo, J. Martinez-Romo, NLP-UNED at eRisk 2024:
approximate nearest neighbors with encoding refinement for early detecting signs of anorexia, in:
Working Notes of CLEF, 2024, pp. 9–12.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L. T.</given-names>
            <surname>Bayliss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Christensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamont-Mills</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            du
            <surname>Plessis</surname>
          </string-name>
          ,
          <article-title>Suicide capability within the ideation-to-action framework: A systematic scoping review</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>17</volume>
          (
          <year>2022</year>
          ). doi:10.1371/journal.pone.0276070.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Leonardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roychoudhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Trifillis</surname>
          </string-name>
          ,
          <article-title>Natural history and real-world data in rare diseases: applications, limitations, and future perspectives</article-title>
          ,
          <source>The Journal of Clinical Pharmacology</source>
          <volume>62</volume>
          (
          <year>2022</year>
          )
          <fpage>S38</fpage>
          -
          <lpage>S55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernandez-Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          ,
          <article-title>Generation of social network user profiles and their relationship with suicidal behaviour</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>72</volume>
          (
          <year>2024</year>
          )
          <fpage>87</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Martinez-Rico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          ,
          <article-title>Building a framework for fake news detection in the health domain</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>19</volume>
          (
          <year>2024</year>
          ). doi:10.1371/journal.pone.0305362.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sánchez de Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          ,
          <article-title>Generative LLMs for multilingual temporal expression normalization</article-title>
          , in:
          <source>ECAI 2024</source>
          , IOS Press,
          <year>2024</year>
          , pp.
          <fpage>3789</fpage>
          -
          <lpage>3796</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>García-Ferrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa Salazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cabrio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>de la Iglesia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Molinet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramirez-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rigau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Villa-Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Villata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaninello</surname>
          </string-name>
          ,
          <article-title>MedMT5: An open-source multilingual text-to-text LLM for the medical domain</article-title>
          , in:
          <string-name>
            <given-names>N.</given-names>
            <surname>Calzolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Xue</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</source>
          , ELRA and ICCL, Torino, Italia,
          <year>2024</year>
          , pp.
          <fpage>11165</fpage>
          -
          <lpage>11177</lpage>
          . URL: https://aclanthology.org/2024.lrec-main.974/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Morales-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montalvo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Riaño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Velasco</surname>
          </string-name>
          ,
          <article-title>Early diagnosis of HIV cases by means of text mining and machine learning models on clinical notes</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>179</volume>
          (
          <year>2024</year>
          )
          <fpage>108830</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Lebeña</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Casillas</surname>
          </string-name>
          ,
          <article-title>Quantifying decision support level of explainable automatic classification of diagnoses in spanish medical records</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>182</volume>
          (
          <year>2024</year>
          )
          <fpage>109127</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Olea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Domingo-Aldama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Prado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Galletebeitia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Salazar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Rada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. G.</given-names>
            <surname>Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Cano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Díaz</surname>
          </string-name>
          , et al.,
          <article-title>Rendimiento de las expresiones regulares en el análisis de informes de alta presentes en la historia clínica electrónica exprimiendo los datos secundarios</article-title>
          ,
          <source>Revista Española de Cardiología</source>
          <volume>77</volume>
          (
          <year>2024</year>
          )
          <fpage>33</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I.</given-names>
            <surname>De la Iglesia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Goenaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ramirez-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Villa-Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Goikoetxea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrena</surname>
          </string-name>
          ,
          <article-title>Ranking over scoring: Towards reliable and robust automated evaluation of LLM-Generated medical explanatory arguments</article-title>
          ,
          in:
          <source>Proceedings of the 31st International Conference on Computational Linguistics</source>
          ,
          <year>2025</year>
          , pp.
          <fpage>9456</fpage>
          -
          <lpage>9471</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>I.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oronoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <article-title>MedExpQA: Multilingual benchmarking of large language models for medical question answering</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>155</volume>
          (
          <year>2024</year>
          )
          <fpage>102938</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goenaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gojenola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oronoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <article-title>Explanatory argument extraction of correct answers in resident medical exams</article-title>
          ,
          <source>Artificial Intelligence in Medicine</source>
          <volume>157</volume>
          (
          <year>2024</year>
          )
          <fpage>102985</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Yeginbergen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oronoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <article-title>Argument mining in data scarce settings: Cross-lingual transfer and few-shot techniques</article-title>
          ,
          <source>arXiv preprint arXiv:2407.03748</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>