Overview of MultiCardioNER Task at BioASQ 2024 on Medical Specialty and Language Adaptation of Clinical NER Systems for Spanish, English and Italian

Salvador Lima-López salvador.limalopez@bsc.es Barcelona Supercomputing Center

Plaça Eusebi Güell, 1-3 08034 Barcelona Spain

Eulàlia Farré-Maduell Barcelona Supercomputing Center

Plaça Eusebi Güell, 1-3 08034 Barcelona Spain

Jan Rodríguez-Miret Barcelona Supercomputing Center

Plaça Eusebi Güell, 1-3 08034 Barcelona Spain

Miguel Rodríguez-Ortega Barcelona Supercomputing Center

Plaça Eusebi Güell, 1-3 08034 Barcelona Spain

Livia Lilli livia.lilli@policlinicogemelli.it Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS

00168 Rome Italy

Catholic University of the Sacred Heart

00168 Rome Italy

Jacopo Lenkowicz jacopo.lenkowicz@policlinicogemelli.it Real World Data Facility, Gemelli Generator, Fondazione Policlinico Universitario Agostino Gemelli IRCCS

00168 Rome Italy

Giovanna Ceroni g.ceroni@ucl.ac.uk University College London

University College London Hospitals NHS Foundation Trust

Jonathan Kossoff j.kossoff@nhs.net University College London Hospitals NHS Foundation Trust

Anoop Shah a.shah@ucl.ac.uk University College London

University College London Hospitals NHS Foundation Trust

Anastasios Nentidis National Center for Scientific Research "Demokritos"

Athens Greece

Aristotle University of Thessaloniki

Thessaloniki Greece

Anastasia Krithara akrithara@iit.demokritos.gr University College London Hospitals NHS Foundation Trust

Georgios Katsimpras gkatsibras@iit.demokritos.gr University College London Hospitals NHS Foundation Trust

Georgios Paliouras paliourg@iit.demokritos.gr University College London Hospitals NHS Foundation Trust

Martin Krallinger Barcelona Supercomputing Center

Plaça Eusebi Güell, 1-3 08034 Barcelona Spain

ISSN: 1613-0073

Keywords: named entity recognition, cardiology, subdomain adaptation, multilingual, clinical NLP

ORCID: 0000-0002-7384-1877 (S. Lima-López), 0009-0000-0793-981X (J. Rodríguez-Miret), 0009-0000-0188-079X (M. Rodríguez-Ortega), 0009-0005-3319-7211 (L. Lilli), 0000-0002-8366-1474 (J. Lenkowicz), 0000-0002-8907-5724 (A. Shah), 0000-0002-3782-4412 (A. Nentidis), 0000-0003-0491-4507 (A. Krithara), 0000-0003-3697-941X (G. Katsimpras), 0000-0001-9629-2367 (G. Paliouras), 0000-0002-2646-8782 (M. Krallinger)

Transformers and large language models (LLMs) are increasingly used for clinical data analysis, mostly in English but also in many other languages used within healthcare systems. To comply with clinical standards, it is critical to evaluate their output through benchmarking efforts based on high-quality, manually annotated corpora. To foster the adaptation of general clinical natural language processing (NLP) components to the characteristics of individual medical specialties, and to explore cross-language adaptation techniques, we propose the MultiCardioNER task at BioASQ 2024. MultiCardioNER focuses on adapting named entity recognition (NER) systems trained on multispecialty clinical case reports to cardiology, since cardiovascular diseases are the leading cause of death globally. The task covered two entity types (diseases and medications) in case reports written in three languages (Spanish, English and Italian). To generate a comparable Gold Standard clinical NER corpus across languages, we used neural machine translation, annotation projection and manual annotation correction by domain experts. Top-scoring teams reached F1-scores of 0.8199 for disease mentions and 0.9277 for medication mentions in Spanish, and 0.9223 and 0.8842 for medication mentions in English and Italian, respectively. These results suggest that adapting general clinical NLP components to a specific clinical specialty can improve overall performance, and that cross-language adaptation of clinical NLP components using neural translation and expert-in-the-loop annotation might speed up the implementation of clinical entity extraction systems. The MultiCardioNER corpora, as well as a Silver Standard made up of the participating systems' predictions over the background set, are available at: https://zenodo.org/records/11368861.

Introduction

Cardiovascular diseases (CVDs) are a leading cause of death and morbidity worldwide and are therefore responsible for considerable disability costs every year. Given this high prevalence, cardiology is a medical field whose specific concepts and expressions are of great relevance. Analysis of unstructured medical data, such as clinical notes or medical publications, may help to improve the characterization of cardiac pathologies. The extraction of clinical variables from medical content is key to enabling healthcare data analytics, improving patient care and advancing precision medicine. Information recorded only as free text in Electronic Medical Records currently remains largely unused, because relevant data are difficult to extract from such heterogeneously written sources. Moreover, the distinctive language of each medical specialty calls for more specialized automatic semantic annotation resources, both in English and in every other language used in clinical services.

Given this importance, and the need to improve the extraction, use and, ultimately, exploitation of data from patients with cardiovascular conditions, natural language processing (NLP) solutions have been implemented to classify or extract key variables from cardiology clinical content. Results generated by NLP technologies might help to improve outcomes and the understanding of disease in cardiology. To account for the diversity and heterogeneity of NLP research applied to cardiology, several review articles have systematically characterized the NLP application scenarios adapted to cardiovascular disease documents [1]. These include applications related to heart failure [2], coronary artery disease, general cardiology and valvular heart disease [3,4]. NLP tools have also been used to extract symptoms [5,6], vital signs [7], heart function measurements [8], risk factors [4], cardiovascular comorbidities [9], social risk factors [10] and diagnostic codes of common cardiovascular diseases [11]. Other studies have explored NLP approaches for extracting Framingham criteria [12] or New York Heart Association classifications [13,14,15] from unstructured clinical notes.

Pre-training on general clinical text does not necessarily transfer well to all medical subspecialties, because some of them use highly specialized language, as found in cardiology clinical case reports and clinical notes. Domain adaptation strategies therefore have great potential to improve NLP solutions for practical settings, real-world scenarios and industrial applications [16]. Likewise, clinical NLP solutions need to be adapted to languages other than English, which requires collaboration between researchers to accelerate progress in non-English clinical NLP [17].

In the case of Spanish, general clinical NLP datasets and resources, such as the DisTEMIST, SympTEMIST, PharmaCoNER, and MedProcNER corpora and systems, have been released. However, (a) the interplay and complementarity of multi-label entity extraction approaches were neither targeted nor evaluated, and (b) how such approaches could be adapted to handle multiple languages was not tested.

To address these issues and promote the development of comparable clinical NLP components adapted to a specific clinical domain across several languages, we organized the MultiCardioNER shared task. This paper presents an overview of the data, methodologies and results of MultiCardioNER. It is structured as follows: Section 2 introduces the shared task, including its subtracks and evaluation methods. Section 3 describes the corpora used in MultiCardioNER, namely DisTEMIST, DrugTEMIST and CardioCCC, together with other associated resources, while Section 4 presents the participation results and the proposed methodologies. Finally, Section 5 concludes the paper with a discussion of the most interesting aspects, lessons learned and future work.

Task description

Shared task description

MultiCardioNER participants were asked to implement named entity recognition (NER) systems using a general clinical corpus annotated with disease and medication mentions, and then to adapt these systems to a particular medical specialty, namely cardiology. The task also explored the creation of multilingual clinical NER components, or the cross-language adaptation of these systems, for three languages: Spanish, English and Italian.

MultiCardioNER built on a resource released for a previous shared task, DisTEMIST [18]. The DisTEMIST corpus is a collection of 1,000 clinical case reports covering a wide range of specialties, annotated with diseases by clinical experts. In addition, a previously unreleased corpus, DrugTEMIST, was published as part of the task. DrugTEMIST provides drug (medication) mention annotations for the same collection of clinical case reports used in DisTEMIST.

For the adaptation to cardiology, we constructed the CardioCCC corpus, a new dataset of manually selected cardiology clinical case reports whose characteristics resemble those of cardiology discharge summaries. This resource was provided to participants so that they could explore different clinical subdomain (specialty) adaptation strategies and benchmark the resulting systems. To foster the generation of multilingual clinical NER corpora, DrugTEMIST and CardioCCC were automatically translated from Spanish into English and Italian. The Gold Standard drug mention annotations were then mapped into both target languages and manually validated by clinical experts who are native speakers of English and Italian. The three corpora and the annotation projection process are described in more detail in Section 3.

The evaluation compared the predictions of participating teams against the manual annotations created by the clinical experts. Each team was allowed to submit up to 5 runs for each subtrack and language. The evaluation process and metrics are described in Section 2.3.

Subtracks

MultiCardioNER was structured into two different subtracks:

• Subtrack 1 (CardioDis). This subtrack focuses on adapting disease recognition systems to the cardiology specialty in Spanish. Participants could use the DisTEMIST corpus [18] as a base training set, together with a new collection of cardiology clinical case reports annotated with diseases (CardioCCC) that could be used to fine-tune or adapt their systems to cardiology case reports.
• Subtrack 2 (MultiDrug). This subtrack focuses on the multilingual or cross-language adaptation (Spanish, English and Italian) of medication recognition systems, specifically for cardiology clinical case reports. Participants could use the DrugTEMIST dataset as the NER training resource. This corpus complements the previously released DisTEMIST, ProcTEMIST and SympTEMIST corpora, since it adds medication annotations to the same document collection. To enable adaptation to cardiology, the CardioCCC corpus was annotated with medication mentions and divided into development and test subsets. Both datasets were originally created from Spanish texts; their machine-translated English and Italian versions were revised by hand and annotated by clinical experts.

Evaluation

The task was divided into two phases: training and test set prediction (evaluation). During the training phase, participants were provided with the DisTEMIST and DrugTEMIST datasets, as well as a subset of the CardioCCC corpus made up of 258 documents. The second batch of CardioCCC documents was used as the test set and released together with a larger background set, both to prevent manual post-editing of predictions by the teams and to check that the submitted systems could scale to larger document collections. The test and background sets were released approximately one month after the start of the training phase, and participants were given two weeks to generate predictions for all documents. Systems were evaluated against the Gold Standard annotations of the CardioCCC test set, while the predictions for the background set were reserved to create a participants' Silver Standard (discussed in Section 3.4). Teams were not required to submit results for all three languages.

Both MultiCardioNER subtracks were evaluated using micro-averaged precision, recall and F1-score. These metrics are calculated as follows:

\[ \text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN} \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \]

where TP, FP and FN are the numbers of true positive, false positive and false negative mentions, pooled over the whole test set. An official MultiCardioNER evaluation library was released as part of the task and is available on GitHub. After the results were announced, the test set Gold Standard annotations were shared with participating teams so that they could run additional experiments and analyse the errors of their systems.
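To make the micro-averaging concrete, the sketch below pools true positives, false positives and false negatives over all documents before computing the ratios. It is not the official evaluation library: it assumes strict matching, where a prediction counts as correct only if filename, label and both character offsets coincide with a gold mention, and the `Mention` record and label strings are hypothetical.

```python
from typing import Iterable, NamedTuple


class Mention(NamedTuple):
    """One annotated or predicted mention (hypothetical record layout)."""
    filename: str
    label: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention's last character


def micro_prf(gold: Iterable[Mention], pred: Iterable[Mention]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 under strict (exact-span) matching.

    Counts are pooled over all documents before the ratios are computed,
    which is what makes the average "micro" rather than per-document.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions that exactly match a gold mention
    fp = len(pred_set - gold_set)  # predictions with no exact gold counterpart
    fn = len(gold_set - pred_set)  # gold mentions the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Under this strict criterion, a prediction that is off by a single character at either boundary counts both as a false positive and as a false negative.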

Baseline

As a baseline, we used a simple vocabulary transfer approach: a gazetteer of entity mentions was built from the training sets (the DisTEMIST and DrugTEMIST corpora), and these terms were then looked up in the test set. In other words, the baseline is a lexical lookup system that searches the cardiology test set for every string annotated in the training corpora. The baseline results are shown in Table 1.
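A minimal sketch of such a gazetteer baseline is shown below. The overview does not specify the matching rules, so case-insensitive matching, the requirement that a term not be glued to neighbouring word characters, and the preference for longer terms when matches overlap are all assumptions of this illustration.

```python
import re


def build_gazetteer(training_mentions):
    """Collect the distinct annotated strings from the training sets, lowercased."""
    return {m.strip().lower() for m in training_mentions if m.strip()}


def lookup(text, gazetteer):
    """Return sorted (start, end, surface) spans of gazetteer terms found in text.

    Matching is case-insensitive and requires the term not to be glued to other
    word characters. Longer terms are tried first, and a match is discarded if it
    overlaps a span that has already been accepted.
    """
    taken = [False] * len(text)  # characters already covered by an accepted match
    spans = []
    for term in sorted(gazetteer, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            start, end = match.span()
            if not any(taken[start:end]):
                spans.append((start, end, match.group()))
                taken[start:end] = [True] * (end - start)
    return sorted(spans)
```

For example, with a gazetteer containing both "insuficiencia" and "insuficiencia cardiaca", the longer term wins in "Paciente con HTA e insuficiencia cardiaca crónica.", so only the full mention is returned alongside "HTA".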

Corpus and resources

The MultiCardioNER task leverages an existing corpus, DisTEMIST, as well as two new resources, DrugTEMIST and CardioCCC. DisTEMIST and DrugTEMIST share the same document collection, which consists of clinical case reports from various clinical specialties such as oncology, infectious diseases, urology and psychiatry. This text collection has also been used for the procedures corpus MedProcNER/ProcTEMIST [19] and the signs and symptoms corpus SympTEMIST [20]. These corpora can be considered complementary, since they were annotated by the same clinical experts following the same methodology, which includes the creation of dedicated annotation guidelines. They were released as part of previous shared tasks to promote the development and accessibility of expert-validated annotated resources for clinical information extraction in Spanish. Other resources resulting from this initiative include PharmaCoNER [21], LivingNER [22], MEDDOPROF [23] and MEDDOPLACE [24].

The CardioCCC corpus consists of cardiology clinical case reports annotated with diseases and drugs, following the same guidelines as DisTEMIST and DrugTEMIST. Although all three corpora were originally created in Spanish, the texts and drug annotations of DrugTEMIST and CardioCCC were translated into English and Italian and released for this task. Table 2 provides statistics for the datasets that make up MultiCardioNER, which are described in detail in this section. All datasets described in this section are openly available on Zenodo.

Table 2

Statistics for the datasets provided for MultiCardioNER. "Annot." stands for "annotations", while "Chars" stands for "characters". Unique annotations refer to the number of distinct annotated strings after converting all annotations to lowercase. The number of tokens has been calculated using the following spaCy models: "es_core_news_sm", "en_core_web_sm" and "it_core_news_sm".


DisTEMIST

DisTEMIST is a Gold Standard manually annotated corpus of disease mentions in Spanish clinical case documents normalized or mapped to SNOMED CT concept identifiers. It consists of 1,000 clinical case reports written in Spanish from miscellaneous medical specialties. Figure 1 shows an example of an annotated document.

The texts in the corpus were derived from SciELO (Scientific Electronic Library Online), an electronic library of publications from scientific journals. Clinical experts manually selected the texts so that their structure and content were clinically relevant and representative. The texts were then pre-processed to extract the relevant sections of the clinical cases and to remove embedded figure references and citations, so that they would resemble real medical records as closely as possible. These texts were originally released under the name SpaCCC (Spanish Clinical Case Corpus). As shown in Table 2, the text collection includes a total of 406,137 tokens and 2,335,968 characters. The corpus contains 10,664 annotated entities, of which 6,739 are unique after lowercasing.
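The corpus-level figures reported in Table 2 (documents, tokens, characters, annotations and unique lowercased annotations) can be reproduced along the lines of the following sketch. The `corpus_stats` helper is hypothetical, and `str.split` is only a stand-in tokenizer; the overview counted tokens with spaCy's small pipelines.

```python
def corpus_stats(texts, mentions, tokenize=str.split):
    """Summarise a corpus with the columns used in Table 2.

    texts: mapping filename -> document text
    mentions: iterable of annotated mention strings
    tokenize: any callable returning a sized token sequence; str.split is a
        stand-in for the spaCy tokenization used in the overview.
    """
    mentions = list(mentions)
    return {
        "docs": len(texts),
        "tokens": sum(len(tokenize(t)) for t in texts.values()),
        "chars": sum(len(t) for t in texts.values()),
        "annotations": len(mentions),
        # "unique" = distinct strings after lowercasing, as in the Table 2 caption
        "unique_annotations": len({m.lower() for m in mentions}),
    }
```

With spaCy installed, passing `tokenize=nlp.tokenizer` where `nlp = spacy.load("es_core_news_sm")` would count spaCy tokens instead; whether the overview used the tokenizer alone or the full pipeline is not stated, so token counts may differ slightly.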

The DisTEMIST corpus was annotated and normalized by two clinical experts from a Spanish tertiary hospital, and the annotated mentions and their normalization were afterwards post-processed and revised by a third physician. The annotations were created with the brat tool [25], following annotation and normalization guidelines written specifically for this task. Annotation involved discussions between physicians, particularly regarding complex mentions. Together with multiple rounds of inter-annotator agreement (IAA) assessment through parallel annotation of part of the corpus (around 20%), these discussions led to an iterative refinement of the guidelines. After several rounds, an IAA score of 82.3 (computed as the pairwise agreement between two independent annotators) was achieved for disease mentions. The outcome of this process is the DisTEMIST guidelines, openly available on Zenodo. This 28-page document describes how to annotate diseases in clinical texts through a total of 52 rules of various types, such as general, positive or negative rules. It also includes a set of rules specific to oncology mentions, added because the language of clinical cases from this specialty proved to be more specific and harder to annotate; these rules are partially based on the CANTEMIST corpus [26]. The guidelines also include a discussion of the task's importance, a corpus characterization, basic information about the task and the annotation process, and indications and resources for the annotators. Notably, the DisTEMIST guidelines have been adapted to other domains, such as social media [27].
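The overview does not give the exact formula behind the pairwise agreement score. One common choice for span annotation, assumed here, is 2|A ∩ B| / (|A| + |B|), which equals the F1-score obtained when either annotator's set is taken as the reference and is therefore symmetric. The sketch treats spans as exact (filename, start, end, label) tuples; the `pairwise_agreement` name is hypothetical.

```python
def pairwise_agreement(annotator_a, annotator_b):
    """Pairwise agreement between two annotators' span sets.

    Computed as 2|A & B| / (|A| + |B|), i.e. the F1-score obtained when either
    annotator is taken as the reference; swapping the arguments gives the same value.
    Spans are compared exactly, e.g. as (filename, start, end, label) tuples.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))
```

If, for instance, one annotator marks three spans and the other marks two of those same spans, the agreement is 2 · 2 / (3 + 2) = 0.8.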

The DisTEMIST text documents are plain-text files with UTF-8 encoding. The annotations are distributed in two stand-off formats. The first consists of the original annotation files as exported by brat [25]: one .ann file per text file, in which each line represents an annotation with its label, start and end character offsets, and associated text. The second is a single tab-separated (.tsv) file containing all annotations in the corpus; like the .ann files, it has one annotation per row, with an additional field for the corresponding filename.
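The relation between the two release formats can be illustrated with a short converter from brat .ann files to a single .tsv file. The sketch relies on brat's documented text-bound annotation lines (`T1<TAB>LABEL START END<TAB>text`); the column names written to the .tsv are hypothetical and may not match those of the official release, and discontinuous spans are simplified to their outer bounds.

```python
import csv
from pathlib import Path


def parse_ann(ann_text):
    """Yield (label, start, end, text) for each text-bound annotation in a brat .ann file.

    Text-bound lines look like 'T1<TAB>LABEL START END<TAB>surface text'.
    Other brat line types (relations 'R', notes '#', attributes 'A', ...) are skipped.
    Discontinuous spans ('START END;START END') are collapsed to their outer bounds,
    which is a simplification.
    """
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        _id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        bounds = [int(x) for part in offsets.split(";") for x in part.split()]
        yield label, min(bounds), max(bounds), surface


def ann_dir_to_tsv(ann_dir, out):
    """Write one row per annotation, prefixed by the source filename."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["filename", "label", "start", "end", "text"])  # hypothetical header
    for path in sorted(Path(ann_dir).glob("*.ann")):
        for label, start, end, surface in parse_ann(path.read_text(encoding="utf-8")):
            writer.writerow([path.stem, label, start, end, surface])
```

Usage, assuming a local directory of .ann files: `with open("all_annotations.tsv", "w", encoding="utf-8", newline="") as f: ann_dir_to_tsv("brat/", f)`.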

For MultiCardioNER, all 1,000 documents of the corpus are distributed together. Users who wish to use the original train/test split of the corpus (750 and 250 documents, respectively) should download the original DisTEMIST Gold Standard to retrieve the list of filenames in each split. The original repository also includes the SNOMED CT mappings for the annotated mentions, as well as additional data, such as a background set of related clinical documents and a Silver Standard of the corpus in 6 languages (English, Portuguese, Catalan, Italian, French and Romanian) created through annotation projection. The annotation projection methodology is described in Section 3.2, as well as in the original paper [18].

DrugTEMIST

DrugTEMIST is a collection of 1,000 clinical case reports from various clinical specialties annotated with mentions of medications. Figure 2 shows an excerpt of an annotated document from the corpus.

The corpus uses the same collection of texts as DisTEMIST, which is also shared by MedProcNER/ProcTEMIST [19] and SympTEMIST [20]. Unlike those corpora, DrugTEMIST had not been released before and is one of the novelties of the MultiCardioNER task. The text collection therefore again comprises 406,137 tokens and 2,335,968 characters, with 2,778 annotated entities (925 unique after lowercasing).

As with DisTEMIST, dedicated annotation guidelines were written to define what should be considered a medication and how to annotate it. These guidelines were created and refined using the same methodology as for DisTEMIST, including thorough discussions between physicians and the annotation of a sample of the corpus (around 20%). The final IAA of the corpus is 0.955. The DrugTEMIST annotation guidelines are also available on Zenodo; they span 17 pages, contain a total of 29 rules and are quite similar to the DisTEMIST guidelines. The corpus is released in the same format as DisTEMIST.

The original Gold Standard of the corpus was created in Spanish. For the multilingual part of the task, we created English and Italian versions of the corpus using annotation projection. These two languages were chosen because of their relevance to other related projects and the availability of clinical experts fluent in each language, who manually revised all documents to validate both the annotations and the translation quality. Specifically, the annotation projection methodology consisted of the following steps:

1. The Spanish documents were automatically translated (for the previous DisTEMIST task) using high-quality commercial machine translation systems. In a separate step, the Gold Standard annotations were translated without context (i.e., as a plain list of strings).
2. The translated annotations were then transferred into each document using a look-up system. For each document, only the annotations present in its original Gold Standard were looked up, to avoid introducing false positives. The result of this step is an automatically annotated version of the corpus in each language, which can be considered a Silver Standard.
3. To turn this Silver Standard into a Gold Standard, a manual revision was performed. Experts compared the original Spanish documents with their English and Italian versions using brat's side-by-side comparison mode. They were tasked with correcting existing mentions and adding missing ones where necessary, to make the annotation as close as possible to the original. Additionally, they were asked to provide alternative translations for annotated entities that had been translated incorrectly.
4. A post-processing step incorporated the alternative translations suggested by the annotators, which replaced the original annotated entity both in the text and in the annotation files.
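Steps 1 and 2 of the projection can be sketched as follows for a single document. The `translate` callable is a hypothetical stand-in for the commercial machine translation systems, and the case-insensitive, longest-match-first lookup is an assumption; the overview only states that each document's own Gold Standard mentions were looked up. The sketch also reports which mentions could not be projected, since those are the ones left to the manual revision in step 3.

```python
import re


def project_annotations(translated_text, source_mentions, translate):
    """Project one document's gold mentions onto its machine-translated text.

    source_mentions: the gold mention strings of THIS document only, so that no
        term from other documents can create false positives.
    translate: callable returning the context-free translation of a mention
        (a hypothetical stand-in for the MT system applied to the mention list).
    Returns (spans, missing): the silver (start, end, text) spans, and the source
    mentions whose translation was not found verbatim in the translated text.
    """
    covered = set()  # character offsets already assigned to a projected span
    spans, found_terms = [], set()
    targets = {translate(m) for m in source_mentions}
    for target in sorted(targets, key=len, reverse=True):  # prefer longer matches
        pattern = r"(?<!\w)" + re.escape(target) + r"(?!\w)"
        for match in re.finditer(pattern, translated_text, flags=re.IGNORECASE):
            found_terms.add(target)
            offsets = range(match.start(), match.end())
            if covered.isdisjoint(offsets):
                covered.update(offsets)
                spans.append((match.start(), match.end(), match.group()))
    missing = [m for m in source_mentions if translate(m) not in found_terms]
    return sorted(spans), missing
```

Mentions end up in `missing` when the context-free translation differs from the wording chosen by the MT system inside the document, which is one source of the annotation count differences between language versions.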

Statistics about the English and Italian versions of DrugTEMIST are also provided in Table 2. We should underscore that the different versions of the corpus do not contain the exact same number of annotations. This is mostly due to translation differences and errors introduced by the machine translation system.

CardioCCC

CardioCCC (Cardiology Clinical Case Corpus) is a collection of 508 cardiology clinical case reports retrieved from open-access cardiology journals in Spanish. Within these journals, we manually searched for clinical case reports whose structure resembles that of real clinical health records. The candidates were then extracted and, as with the other two corpora, pre-processed to keep only the relevant article sections and to remove references to figures and tables. Finally, a clinical expert revised the cases to confirm their validity. Figure 3 shows a parallel example of the drug annotations in Spanish, English and Italian.

Table 3

Statistics for the two splits of the CardioCCC corpus. CardioCCC_dev refers to the first batch of the corpus, which participants were allowed to use freely during the training phase. CardioCCC_test refers to the held-out test set used for evaluation. "Annot." stands for "annotations", while "Chars" stands for "characters". Unique annotations refer to the number of distinct annotated strings after converting all annotations to lowercase. The number of tokens has been calculated using the following spaCy models: "es_core_news_sm", "en_core_web_sm" and "it_core_news_sm".


The corpus contains annotations for diseases and drugs, created following the same guidelines as DisTEMIST and DrugTEMIST. The main annotator of CardioCCC was the same clinical expert who performed the final annotation and revision step for the other two corpora, which greatly accelerated the annotation process. As with DrugTEMIST, the texts were translated from Spanish into English and Italian using machine translation, and the Gold Standard drug annotations were transferred into both languages via annotation projection and revised by clinical experts who are native speakers of each language.


As explained in Section 2.3, CardioCCC was released in two batches: one for training/development and another for evaluation. The statistics for these two parts are presented in Table 3, while Table 2 presents those of the complete corpus. As shown in Table 2, the corpus contains 568,297 tokens and 3,215,774 characters. Despite having about half as many documents as the SpaCCC corpus (i.e., the texts of DisTEMIST/DrugTEMIST), CardioCCC contains over 150,000 more tokens and nearly 880,000 more characters, meaning that its documents are considerably longer. This is also reflected in the number of annotations, with CardioCCC having around 8,000 more annotated diseases and 1,500 more drugs. Notably, despite the higher total number of annotations, CardioCCC contains fewer unique drug mentions (counted after lowercasing). This might be because drug mentions in CardioCCC are mostly limited to cardiology-specific medications, whereas DrugTEMIST mentions a wider variety of medications owing to the range of clinical specialties it covers. As for the length of the annotations themselves, all corpora have a similar distribution of character and token lengths. The high standard deviation relative to the mean, especially for diseases, indicates that the datasets contain a number of long annotations.
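The mean and standard deviation of annotation length, in characters and in tokens, can be computed as in the sketch below. Whether the reported figures use the sample or the population standard deviation is not stated; `statistics.stdev` (sample) is an assumption, as is the whitespace tokenizer standing in for spaCy, and the `length_summary` name is hypothetical.

```python
import statistics


def length_summary(mentions, tokenize=str.split):
    """Mean and standard deviation of mention length in characters and tokens."""
    chars = [len(m) for m in mentions]
    tokens = [len(tokenize(m)) for m in mentions]
    return {
        "chars_mean": statistics.mean(chars),
        "chars_std": statistics.stdev(chars),  # sample SD; needs at least 2 mentions
        "tokens_mean": statistics.mean(tokens),
        "tokens_std": statistics.stdev(tokens),
    }
```

A few long disease mentions are enough to push the standard deviation close to, or above, the mean, which is the pattern described above.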

Background set

In addition to the three annotated corpora, a background set was released. It contains 7,625 text documents, both from the cardiology subdomain and from other clinical specialties. While most documents were originally written in Spanish, some were originally in English or Italian. All documents were translated into the other two languages to provide a comparable background set in all three languages. Together with the background set, we released a tab-separated values (.tsv) file specifying the original language of each document and whether it belongs to the cardiology domain.

During the evaluation period, participants were asked to generate disease and drug predictions for these documents with their systems. These predictions were then used to create a Silver Standard, which we release in three versions:

1. All mentions are kept, with the label name reflecting the team and run that produced each prediction. This version inevitably includes many incorrect and redundant annotations.
2. Only predictions that overlap at least partially with a prediction from a different run are used. The overlapping annotations are then merged into a single annotation with a new label name. This version should contain fewer incorrect annotations, although some of the "correct" annotations might have boundary problems, such as being too short or too long.
3. Only predictions that overlap completely with a prediction from a different run are used. In theory, this version should contain the highest proportion of correct annotations.
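Versions 2 and 3 can be approximated with the sketch below (version 1 simply keeps every prediction). The exact merging rule and the new label scheme are not described in the overview, so this is a hypothetical reconstruction: for version 2, spans with the same label are grouped when they overlap transitively, and a group is kept, merged into the union of its offsets, only if it contains predictions from at least two different runs; for version 3, a span is kept only if at least two runs predicted it with identical offsets.

```python
from collections import defaultdict


def silver_versions(predictions):
    """Approximate Silver Standard versions 2 and 3 for one document.

    predictions: iterable of (run_id, label, start, end) tuples.
    Returns (merged, exact), both lists of (label, start, end) tuples.
    """
    preds = sorted(set(predictions), key=lambda p: (p[1], p[2], p[3]))

    # Version 3: identical (label, start, end) proposed by at least two runs.
    runs_per_span = defaultdict(set)
    for run, label, start, end in preds:
        runs_per_span[(label, start, end)].add(run)
    exact = sorted(span for span, runs in runs_per_span.items() if len(runs) >= 2)

    # Version 2: sweep spans sorted by (label, start), merging overlapping groups.
    merged = []
    group = None  # [label, start, end, set of runs in the group]
    for run, label, start, end in preds:
        if group and label == group[0] and start < group[2]:
            group[2] = max(group[2], end)  # extend the group to cover this span
            group[3].add(run)
        else:
            if group and len(group[3]) >= 2:
                merged.append((group[0], group[1], group[2]))
            group = [label, start, end, {run}]
    if group and len(group[3]) >= 2:
        merged.append((group[0], group[1], group[2]))
    return merged, exact
```

Taking the union of overlapping spans is what can make version 2 mentions too long; a stricter alternative would keep the intersection, at the cost of spans that are too short.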

Table 4 shows some basic statistics about the text documents included in the Silver Standard. This new dataset has multiple potential uses, such as bootstrapping manual annotation, training systems with semi-supervised learning techniques, or error and data analysis.

Results

Participation overview

A total of 31 teams registered for the MultiCardioNER task, of which 7 submitted at least one run. The participating teams come from 8 different countries (some are collaborations between institutions in different countries), and all but one, which comes from industry, belong to academia. Table 5 shows the complete list of participating teams, along with their affiliations and the references to their system description papers.

As for participation in each subtrack, 6 teams took part in CardioDis and 5 teams in MultiDrug (one of them only in the Spanish part). Each team could submit up to 5 runs per subtrack and language, and a total of 70 runs were submitted: 20 for CardioDis, 18 for Spanish MultiDrug, 16 for English MultiDrug and 16 for Italian MultiDrug.

System results

The top scores in each subtrack were as follows:

• Subtrack CardioDis. Team BIT.UA attained the top position with an ensemble of RoBERTa-based models (roberta-es-clinical-trials-ner) using a Multi-Head-CRF approach [35]. Their runs integrated the provided datasets in different ways, with the highest scores achieved by the models trained on both the DisTEMIST and CardioCCC data. Their best run achieved an F1-score of 0.8199 and a recall of 0.8243. The team with the next-best F1-score (0.8049) is Enigma, which uses a CLIN-X-ES model also fine-tuned on the DisTEMIST and CardioCCC data. Interestingly, team PICUSLab achieves the best precision (0.8886) by a wide margin by combining the predictions of multiple models trained on different parts of the data (including an augmented version of the CardioCCC corpus) and then using string matching techniques to enhance the final predictions.
• Subtrack MultiDrug. In Spanish, the best F1-score (0.9277) is achieved by team ICUE, which also achieves the best recall (0.9412). In English and Italian, the winning team is Enigma, with F1-scores of 0.9223 and 0.8842, respectively.
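Several top systems combine the outputs of multiple models, such as BIT.UA's ensemble. This overview does not describe how their predictions were combined, so the sketch below shows only one simple, generic option, exact-span majority voting, as an assumption rather than any team's actual method; `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(system_outputs, min_votes=None):
    """Combine span predictions from several NER systems by exact-span voting.

    system_outputs: list of per-system collections of (label, start, end) spans.
    A span is kept if at least min_votes systems predicted it
    (default: a strict majority of the systems).
    """
    n = len(system_outputs)
    if min_votes is None:
        min_votes = n // 2 + 1
    # set(output) prevents a system from voting twice for the same span
    votes = Counter(span for output in system_outputs for span in set(output))
    return sorted(span for span, count in votes.items() if count >= min_votes)
```

Raising `min_votes` trades recall for precision, which is one way an ensemble can reach high precision like that reported for PICUSLab, although their actual combination strategy also involved string matching.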

The results for the CardioDis subtrack are shown in Table 6, while the results for the MultiDrug subtrack are presented in Table 7 for Spanish, Table 8 for English and Table 9 for Italian.

Methodologies

This section describes the methodologies used by each team, which are also summarized in Table 10.

• Team BIT.UA.

For subtrack CardioDis, this team builds on their previous work, namely the Multi-Head-CRF approach [35], which adds a Multi-Head Conditional Random Field (CRF) classifier on top of a multi-class NER system. Starting from the "roberta-es-clinical-trials-ner" pre-trained model, they submitted 5 runs of ensembled models, some consisting of models fine-tuned only on the DisTEMIST dataset and others on DisTEMIST plus CardioCCC. Their best run is an ensemble of 17 systems trained on both corpora, which achieves the highest F1-score of the subtrack (0.8199).

• Team Data Science TUW.

This team uses four main strategies throughout their experiments for both subtracks: pre-training via MLM (Masked Language Modelling), data augmentation, sliding windows with overlap, and additional pre-training on general disease and drug data from other corpora. The pre-trained models they use include the multilingual mDeBERTa [36,37], the Spanish "roberta-es-clinical-trials-ner", the English "biobert_chemical_ner"⁸ and the Italian BioBIT [38].

Table 10

General overview of the approaches presented by participants for the MultiCardioNER task. "*TEMIST corpora" refers to the joint version of the DisTEMIST, SympTEMIST, ProcTEMIST and DrugTEMIST corpora.
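Team Data Science TUW's use of sliding windows with overlap can be illustrated with a short sketch. Transformer encoders accept a bounded number of input tokens, so long clinical cases are split into overlapping windows and the per-token predictions are merged afterwards. This is a hedged illustration, not the team's code: the window size (512), the stride (384) and the merging rule (each token keeps the label from the window in which it has the most context on both sides) are assumptions, and `sliding_windows` and `merge_window_labels` are hypothetical helper names.

```python
def sliding_windows(tokens, window_size=512, stride=384):
    """Split a token sequence into overlapping (start, window) pairs.

    Consecutive windows overlap by window_size - stride tokens, and the
    last window always reaches the end of the sequence.
    """
    if not 0 < stride <= window_size:
        raise ValueError("stride must satisfy 0 < stride <= window_size")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + window_size]))
        if start + window_size >= len(tokens):
            return windows
        start += stride


def merge_window_labels(n_tokens, window_labels):
    """Pick one label per token from overlapping windows.

    window_labels: (start, labels) pairs, one label per token of the window.
    A token keeps the label from the window in which it lies furthest from
    either window edge, i.e. where the model saw the most context around it.
    """
    best_label = [None] * n_tokens
    best_margin = [-1] * n_tokens
    for start, labels in window_labels:
        for offset, label in enumerate(labels):
            margin = min(offset, len(labels) - 1 - offset)
            if margin > best_margin[start + offset]:
                best_margin[start + offset] = margin
                best_label[start + offset] = label
    return best_label


if __name__ == "__main__":
    windows = sliding_windows(list(range(10)), window_size=5, stride=3)
    print([start for start, _ in windows])  # window start offsets
```

In practice the window labels would come from the fine-tuned model; ties between windows are resolved in favour of the earlier window.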

An important note about this team's results is that problems with their submission caused their comparatively low official scores. Their system description addresses this issue by re-evaluating their models, obtaining much better results that are comparable to some of the task's best.

• Team Enigma.

For subtrack CardioDis, team Enigma fine-tuned a CLIN-X-ES model [39] on the DisTEMIST and CardioCCC corpora for different numbers of epochs. One of their runs further pre-trains the model on Spanish Wikipedia pages and datasets from different challenges, earning them a place among the subtrack's top three F1-scores (0.8049). For subtrack MultiDrug, the team uses a combination of different models, including a multilingual XLM-RoBERTa [40] and language-specific models such as a Spanish RoBERTa [41] (which they also use for Italian) and BioLinkBERT for English [42]. Their first run pre-trains the multilingual XLM-RoBERTa on a custom multilingual dataset (including biomedical challenge data, European drug description data and Wikipedia) and then fine-tunes it for token classification on the data of all three languages. For Italian, this approach achieves the highest F1-score of the subtrack (0.8842).

Their second run uses the same system but adds a preceding classifier that determines whether a sentence contains any drugs. For Spanish and English, their best run is the third one, which uses a language-specific model. This is not the case, however, for Italian, where the third run uses a Spanish model pre-trained on Italian data. Another interesting contribution by this team is the combination of neural systems with drug dictionaries obtained from resources such as DrugBank, ATC, DrugCentral or the NIHS. The two runs that use this approach perform well in English (F1-scores of 0.8869 and 0.873) but fall behind the team's other runs, particularly in Spanish and Italian, mainly due to lower precision.
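The way Team Enigma merges dictionary matches with neural predictions is not described in detail here. One simple way to realise such a combination, sketched below under our own assumptions, is a case-insensitive whole-word dictionary lookup whose matches are added only where they do not overlap a model prediction. The helper names, the precedence rule (model spans win) and the toy dictionary are hypothetical; the real resources (DrugBank, ATC, DrugCentral, NIHS) are not loaded.

```python
import re


def dictionary_matches(text, names):
    """Case-insensitive whole-word matches of dictionary names, as (start, end) spans."""
    # Longest names first: Python's regex alternation takes the first alternative
    # that matches, so multi-word entries win over their shorter sub-strings.
    unique = sorted({n.lower() for n in names if n.strip()}, key=len, reverse=True)
    if not unique:
        return []
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, unique)) + r")\b", re.IGNORECASE)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def merge_with_model(model_spans, dict_spans):
    """Add dictionary spans that overlap no model span (model predictions take precedence)."""
    merged = list(model_spans)
    for start, end in dict_spans:
        if all(end <= m_start or start >= m_end for m_start, m_end in model_spans):
            merged.append((start, end))
    return sorted(merged)


if __name__ == "__main__":
    text = ("She was treated with torasemide 5 mg/24h, acetylsalicylic acid "
            "100 mg/24h and atenolol 50 mg/24h.")
    spans = dictionary_matches(text, ["atenolol", "acetylsalicylic acid", "acid", "Torasemide"])
    print([text[s:e] for s, e in spans])
```

Because dictionary matches are added without regard to context, such a union can introduce false positives; Enigma's dictionary-based runs indeed show lower precision than their other runs in Tables 7 to 9.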

• Team ICUE.

For the MultiDrug subtrack, this team compares the effectiveness of multilingual and monolingual BERT models. They also experiment with post-processing rules (specifically for composite drug mentions in Spanish) and with using Large Language Models (LLMs) such as GPT-3.5 [43] to translate the Spanish predictions into the other two languages. Their methodology achieves very good results, especially with monolingual models, and obtains the best F1-score in Spanish (0.9277). Note that some of their runs have identical scores in the results tables, since they submitted the same system with changes for only some of the languages.

Team ICUE also reports additional experiments in their system description paper, such as using GPT-3.5 and LLaMA [44] for entity recognition, with competitive results.

• Team NOVALINCS.

For CardioDis, this team fine-tunes the "bsc-bio-ehr-es" pre-trained RoBERTa⁹ on the DisTEMIST corpus. They prepared two runs: one that uses only the DisTEMIST annotations and another that also incorporates the three other entity types from the complementary corpora (that is, procedures from MedProcNER/ProcTEMIST, symptoms from SympTEMIST and medications from DrugTEMIST). For MultiDrug, they only participated in the Spanish part, using the same methodology with DrugTEMIST in place of DisTEMIST. Their results in both subtracks stand out for their high precision and low recall, which may reflect the difficulty of adapting these systems to the cardiology subdomain using only general clinical domain data.

• Team PICUSLab.

For the CardioDis subtrack, this team employs an ensemble transfer learning strategy. They train different models on DisTEMIST, CardioCCC and an augmented version of CardioCCC (created with the help of sentence similarity techniques and a gazetteer), and then fuse the predictions of the different models. To further improve their predictions, they post-process them with string matching. Their best run earns them a place in the subtrack's top five with an F1-score of 0.791.

• Team Siemens.

This team participated in both CardioDis and MultiDrug with the same methodology. They use general domain BERT models ("bert-spanish-cased-finetuned-ner"¹⁰, "bert-base-NER"¹¹ and "bert-italian-finetuned-ner"¹²) and fine-tune them for multi-label token classification on the different MultiCardioNER datasets. Despite not using clinical models, they obtain quite good results, especially in the MultiDrug subtrack (e.g. an F1-score of 0.8789 in the Italian part). In their system description paper, they also report additional experiments that were not evaluated during the task's evaluation phase.
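Two of the CardioDis approaches above fuse several models: BIT.UA's best run is an ensemble of 17 systems, and PICUSLab fuses the predictions of models trained on different data before a string-matching post-processing step. Neither fusion rule nor the exact string-matching procedure is specified in this overview. As an illustration only, the sketch below fuses span predictions by majority voting and then keeps only spans whose text (nearly) matches a reference list of known disease strings; a filter of this kind is one possible reading of "string matching" that would be consistent with the high-precision, low-recall profile of some PICUSLab runs in Table 6, but that is an assumption. The voting threshold, the `difflib` similarity cutoff (0.9), the `DISEASE` label and all function names are hypothetical.

```python
import difflib
from collections import Counter


def majority_vote(predictions, min_votes=None):
    """Fuse span predictions from several models by voting.

    predictions: one set of (start, end, label) spans per model.
    A span is kept if at least min_votes models predicted it exactly
    (default: a strict majority of the models).
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    counts = Counter(span for model_spans in predictions for span in set(model_spans))
    return {span for span, votes in counts.items() if votes >= min_votes}


def _normalize(s):
    """Lower-case and collapse whitespace."""
    return " ".join(s.lower().split())


def filter_by_gazetteer(text, spans, gazetteer, cutoff=0.9):
    """Keep spans whose surface string exactly or nearly matches a gazetteer entry."""
    entries = [_normalize(g) for g in gazetteer]
    entry_set = set(entries)
    kept = set()
    for start, end, label in spans:
        surface = _normalize(text[start:end])
        if surface in entry_set or difflib.get_close_matches(surface, entries, n=1, cutoff=cutoff):
            kept.add((start, end, label))
    return kept


if __name__ == "__main__":
    text = "Paciente con insuficiencia cardiaca, HTA y diabetes."

    def span(s):
        i = text.index(s)
        return (i, i + len(s), "DISEASE")

    ic, hta, dm = span("insuficiencia cardiaca"), span("HTA"), span("diabetes")
    voted = majority_vote([{ic, hta, dm}, {ic, hta}, {ic}])  # "diabetes" gets 1 of 3 votes
    print(sorted(text[s:e] for s, e, _ in voted))
    kept = filter_by_gazetteer(text, voted, ["Insuficiencia cardíaca", "hipertensión arterial"])
    print(sorted(text[s:e] for s, e, _ in kept))
```

In the toy example the filter discards the abbreviation "HTA", a correct mention that is simply missing from the small gazetteer, which shows how such a step can trade recall for precision.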

Discussion

Comparison with previous tasks.

MultiCardioNER is a novel task built upon the foundation of previous tasks and resources. In recent years, tasks such as DisTEMIST [18], PharmaCoNER [21] or MedProcNER [19] have provided the Spanish NLP community with a variety of corpora for the recognition (and normalization) of named entities in clinical texts. These corpora have progressively become reference resources for benchmarking and model pre-training efforts [39,45,46,47,48,49]. MultiCardioNER differs from these previous tasks in that it uses data from a single clinical specialty rather than a general medical dataset. The CardioCCC corpus could become a reference for cardiology and subdomain adaptation in clinical NLP in Spanish, and it is expected to expand with the addition of more case reports, more entity types (such as procedures and symptoms) and more languages.

Subdomain adaptation is a major goal of MultiCardioNER, and the task's results indicate the importance of using subdomain data to build systems for specific application fields. All top-performing systems incorporate the 258 released documents from the CardioCCC corpus. In contrast, participants that only use the DisTEMIST and DrugTEMIST corpora (which consist of clinical case reports from various specialties) achieve high precision but low recall, and thus a comparatively lower F1-score. This suggests that, while the entities these systems do predict are mostly correct (high precision), they fail to recover many concepts specific to the cardiology subdomain (low recall). Furthermore, the overall results of the CardioDis subtrack are somewhat better than those of the DisTEMIST shared task [18]: DisTEMIST's winning team obtained an F1-score of 0.77, while the winning team of MultiCardioNER obtained an F1-score of 0.8199. This might point to the importance of using specialty-specific data, even within very similar clinical domains.

We should underline that, compared with DisTEMIST, this task offers a larger volume of training data. While there seems to be a positive correlation between the use of subdomain-specific data and performance, it remains an open question whether these improvements can be attributed to subdomain adaptation, to differences between the tasks' test sets, or simply to having more data.

Similarity between the general domain and the cardiology corpora.

Given the task's focus on subdomain adaptation, and in order to further characterise the cardiology and SpaCCC datasets (i.e. the DisTEMIST and DrugTEMIST texts) of the shared task, a comparative analysis was conducted between these clinical case reports and documents belonging to other medical disciplines. These documents consist of a collection of clinical cases categorised into 22 different specialties with varying text structures and content, including oncology, COVID-specific reports, primary health care and neurology, among others. The data for the other specialties were extracted using the same methodology as for the CardioCCC (cardiology) corpus (explained in Section 3.3).

For the analysis, we built a vector representation of the documents of the different specialties and visualised them in a two-dimensional space. To this end, document embeddings were extracted using the pre-trained language model "roberta-base-biomedical-clinical-es" (RoBERTa-based and trained on a large Spanish biomedical corpus from different sources), resulting in an $n \times m$ matrix per document, where $n$ is the number of sentences in the document and $m$ is the hidden size of the language model (768 for this RoBERTa model). Subsequently, these sentence-level embeddings were combined with a vector composition technique, namely the generalised composition function proposed by Amigó et al. [50] and shown in Equation 1. In this expression, the first factor gives the direction of the sum of the two vectors $\vec{v}_1$ and $\vec{v}_2$, while the second factor gives its magnitude, which depends on the norms of the individual vectors and on their inner product. By applying this function to pairs of successive sentence vectors in a document, each document is represented as a single vector (embedding).

In this study we implemented two composition functions derived from Equation 1: the summation $F_{\mathrm{sum}}$, obtained when the constants $\lambda$ and $\mu$ are equal to 1 and $-2$ respectively, and $F_{\mathrm{ind}}$, obtained when $\lambda$ is equal to 1 and $\mu$ to 0.
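Substituting the two parameter settings into the generalised composition function of Equation 1 shows why they behave differently (a short derivation added for illustration). With $\lambda = 1$ and $\mu = -2$, the magnitude term is exactly the norm of the sum:

$$
F_{1,-2}(\vec{v}_1,\vec{v}_2)
= \frac{\vec{v}_1+\vec{v}_2}{\lVert\vec{v}_1+\vec{v}_2\rVert}\sqrt{\lVert\vec{v}_1\rVert^2+\lVert\vec{v}_2\rVert^2+2\langle\vec{v}_1,\vec{v}_2\rangle}
= \frac{\vec{v}_1+\vec{v}_2}{\lVert\vec{v}_1+\vec{v}_2\rVert}\,\lVert\vec{v}_1+\vec{v}_2\rVert
= \vec{v}_1+\vec{v}_2 = F_{\mathrm{sum}}(\vec{v}_1,\vec{v}_2)
$$

With $\lambda = 1$ and $\mu = 0$, the inner product drops out:

$$
\lVert F_{1,0}(\vec{v}_1,\vec{v}_2)\rVert = \sqrt{\lVert\vec{v}_1\rVert^2+\lVert\vec{v}_2\rVert^2}
$$

so $F_{\mathrm{ind}}$ keeps the direction of the sum but assigns it the magnitude the sum would have if the two vectors were orthogonal.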

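$$
F_{\lambda,\mu}(\vec{v}_1,\vec{v}_2) = \frac{\vec{v}_1+\vec{v}_2}{\lVert \vec{v}_1+\vec{v}_2 \rVert} \cdot \sqrt{\lambda\left(\lVert \vec{v}_1 \rVert^2 + \lVert \vec{v}_2 \rVert^2\right) - \mu \langle \vec{v}_1, \vec{v}_2 \rangle} \tag{1}
$$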
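The composition step can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the code used for the analysis: it assumes each document is given as an n-by-m array of sentence embeddings (their extraction with "roberta-base-biomedical-clinical-es" is not reproduced) and that Equation 1 is applied as a left fold, i.e. the running composition is combined with the next sentence vector. `compose`, `document_vector`, `F_SUM` and `F_IND` are hypothetical names.

```python
from functools import reduce

import numpy as np


def compose(v1, v2, lam, mu):
    """Equation 1: direction of v1 + v2, rescaled to sqrt(lam*(|v1|^2 + |v2|^2) - mu*<v1, v2>)."""
    s = v1 + v2
    norm_s = np.linalg.norm(s)
    if norm_s == 0.0:
        # The direction is undefined when v1 + v2 = 0; return the zero vector.
        return np.zeros_like(s)
    magnitude_sq = lam * (v1 @ v1 + v2 @ v2) - mu * (v1 @ v2)
    # Clamp at zero: for some (lam, mu) settings, or through rounding, the expression can be negative.
    return s / norm_s * np.sqrt(max(magnitude_sq, 0.0))


def document_vector(sentence_embeddings, lam, mu):
    """Fold Equation 1 over the rows (sentences) of an n x m array, left to right."""
    rows = np.asarray(sentence_embeddings, dtype=np.float64)
    return reduce(lambda acc, v: compose(acc, v, lam, mu), rows)


F_SUM = dict(lam=1.0, mu=-2.0)  # reduces to plain vector addition
F_IND = dict(lam=1.0, mu=0.0)   # drops the inner-product term

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentences = rng.normal(size=(5, 768))  # toy stand-in for 5 sentence embeddings
    print(np.allclose(document_vector(sentences, **F_SUM), sentences.sum(axis=0)))
```

With $F_{\mathrm{sum}}$ the fold reduces to the sum of the sentence vectors, whereas with $F_{\mathrm{ind}}$ the squared norm of the document vector equals the sum of the squared sentence norms, whatever the angles between the sentences.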

Following the document vector representation and composition function technique, we applied the t-Distributed Stochastic Neighbour Embedding (t-SNE) algorithm, with a perplexity of 30 and a maximum of 800 iterations, to reduce the dimensionality of the document embeddings. This statistical method enables the visualisation of high-dimensional document embeddings in a lower-dimensional space, in this case two dimensions.
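The dimensionality-reduction step can be reproduced with scikit-learn's `TSNE`, as in the minimal sketch below. It assumes the document vectors are stacked into an array with one row per document; the perplexity (30) and the iteration limit (800) come from the text, while `init="pca"`, the random seed and the helper name `project_2d` are our own choices. Recent scikit-learn releases renamed the iteration-limit argument from `n_iter` to `max_iter`, so the sketch passes whichever name the installed version accepts. t-SNE also needs more documents than the perplexity value.

```python
import inspect

import numpy as np
from sklearn.manifold import TSNE


def project_2d(doc_vectors, perplexity=30, iterations=800, seed=0):
    """Reduce document vectors (one row per document) to 2-D coordinates with t-SNE."""
    doc_vectors = np.asarray(doc_vectors, dtype=np.float64)
    if doc_vectors.shape[0] <= perplexity:
        raise ValueError("t-SNE needs more documents than the perplexity value")
    # scikit-learn renamed n_iter to max_iter; use whichever this version accepts.
    params = inspect.signature(TSNE).parameters
    iter_kw = "max_iter" if "max_iter" in params else "n_iter"
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed, **{iter_kw: iterations})
    return tsne.fit_transform(doc_vectors)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for real document embeddings: 60 documents of dimension 768.
    coords = project_2d(rng.normal(size=(60, 768)))
    print(coords.shape)
```

Colouring the resulting points by specialty (for example with matplotlib) yields scatter plots of the kind shown in Figures 4 and 5.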

Figures 4 and 5 show the scatter plots generated with this methodology, using the composition functions $F_{\mathrm{sum}}$ and $F_{\mathrm{ind}}$, respectively. Both figures reveal distinct clustering patterns depending on the specialty. Documents belonging to the same specialty tend to form well-defined clusters (see cardiology, i.e. CardioCCC, in black), suggesting that each specialty has distinctive features in terms of content and structure. In contrast, documents from the SpaCCC corpus (red points) are scattered across the plot, reflecting their diverse nature: they cover a wide range of medical disciplines, such as cardiology, oncology, urology, pneumology or infectious diseases, among many others.

Future work and conclusions.

There is a pressing need to promote the development of annotated datasets for building automatic clinical concept detection tools, not only for a single language but for several languages, following comparable annotation criteria so that results are consistent across languages. Given the complexity and considerable workload of manually constructing clinical corpora, strategies such as neural machine translation combined with annotation projection might provide an alternative to traditional corpus construction. The results of the MultiCardioNER task indicate that it is feasible to create multilingual clinical corpora and use them to train very competitive clinical NER systems with comparable results across several languages.

Moreover, adapting clinical NLP components to specific medical specialties can improve the quality of the resulting systems in real-world scenarios. Clinical NLP application scenarios or use cases typically focus on content related to a particular medical discipline, disease or patient type. In this regard, the MultiCardioNER task also provides useful insights into how to adapt general-purpose clinical NLP systems to the characteristics of a medical specialty of interest.

We foresee that the results, resources and strategies generated through the MultiCardioNER task (by both organizers and participants) might also promote the creation of clinical NLP resources beyond the three languages covered in this track. The MultiCardioNER silver standard corpus of predictions for Spanish, English and Italian could also be a valuable resource for data augmentation, or for corpus construction through manual validation of the system predictions.

The presented annotation projection strategy relies on the quality of the medical machine translation systems used. Therefore, systematic efforts to evaluate the quality of neural medical machine translation systems are critical. Initiatives such as the Workshop on Machine Translation (WMT) Biomedical Translation shared task have provided insights into the quality and potential of neural translation technologies adapted to translating healthcare documents [51,52].

Figure 1: Excerpt from the DisTEMIST corpus with various annotated diseases. Translation with annotated entities in italics: "A 37-year-old woman diagnosed with AML (acute myeloblastic leukemia) in 2003 following a spontaneous right hemopneumothorax that required surgery with evacuation of the hemothorax and resection of bullous dystrophy. She was followed up on an outpatient basis without incident until 2009 when she presented with chylous ascites and a large retroperitoneal cystic lymphangioma was detected on an abdominal computed tomography (CT) scan. In February 2011 she was admitted for exertional dyspnea and extensive right pleural effusion. Pleural fluid showed characteristics of chylothorax: [...]".

Figure 2: Excerpt from the DrugTEMIST corpus with various annotated medications. Translation with annotated entities in italics: "An 82-year-old woman with a history of breast neoplasia treated with surgery and hormone therapy 20 years ago, hypertensive cardiomyopathy in sinus rhythm, hypercholesterolemia and moderate chronic hyponatremia around 133 mmol/L. She was treated with torasemide 5 mg/24h, isosorbide mononitrate 50 mg/24h, acetylsalicylic acid 100 mg/24h, pravastatin 20 mg/24h, candesartan 32 mg/24h, hydrochlorothiazide 12.5 mg/24h, atenolol 50 mg/24h and spironolactone 25 mg/24h".

Figure 3: Excerpt from the CardioCCC drug annotations in all three languages taken from the same document. (a) Example in English. (b) Example in Spanish. (c) Example in Italian.

Table 10 (continued)

Team | Subtrack | Approach
---|---|---
BIT.UA | CardioDis | [...] models with multi-head CRF and differences in the data used for training (only DisTEMIST or DisTEMIST + CardioCCC)
Data Science TUW | CardioDis | Transformer-based models with different pretraining settings, data augmentation and window sliding
Data Science TUW | MultiDrug | Multilingual and language-specific Transformers with different pretraining settings, data augmentation and window sliding
Enigma | CardioDis | CLIN-X-ES model fine-tuned on the entire task data + custom clinical dataset
Enigma | MultiDrug | Multilingual and language-specific Transformers fine-tuned on the entire task data + custom drug dictionary
ICUE | MultiDrug | Multilingual and language-specific BERT models with re-training, post-processing rules + GPT 3.5
NOVALINCS | CardioDis | RoBERTa model fine-tuned on the standalone DisTEMIST corpus vs. joint *TEMIST corpora
NOVALINCS | MultiDrug | RoBERTa model fine-tuned on the standalone DrugTEMIST corpus vs. joint *TEMIST corpora
PICUSLab | CardioDis | Ensemble of Transformer-based models trained on different datasets, including an augmented version of CardioCCC + post-processing via string matching
Siemens | CardioDis | Fine-tuned general domain BERT model
Siemens | MultiDrug | Fine-tuned language-specific general domain BERT models

Figure 4: Document embeddings per discipline after reducing their dimensionality to two dimensions with the t-SNE algorithm, using $F_{\mathrm{sum}}$ as the composition function.

Figure 5: Document embeddings per discipline after reducing their dimensionality to two dimensions with the t-SNE algorithm, using $F_{\mathrm{ind}}$ as the composition function.

Table 11: Results of the baseline system (vocabulary transfer) for the two MultiCardioNER subtracks.

Subtrack | Language | System name | Precision | Recall | F1
---|---|---|---|---|---
CardioDis | Spanish | DisTEMIST vocabulary transfer | 0.5178 | 0.3681 | 0.4303
MultiDrug | Spanish | DrugTEMIST vocabulary transfer | 0.6366 | 0.7148 | 0.6734
MultiDrug | English | DrugTEMIST vocabulary transfer | 0.3317 | 0.7269 | 0.4556
MultiDrug | Italian | DrugTEMIST vocabulary transfer | 0.3320 | 0.6844 | 0.4471

$$\text{Precision } (P) = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

$$\text{Recall } (R) = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

$$\text{F1 score } (F_1) = \frac{2 \cdot P \cdot R}{P + R}$$
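The metric definitions above translate directly into code. The minimal sketch below computes precision, recall and F1 from true positive, false positive and false negative counts, and recomputes F1 from the rounded precision and recall values of Table 11. The function and constant names are hypothetical, and the matching criterion that decides what counts as a true positive is assumed to be handled by the official evaluation library, not by this snippet.

```python
def f1_from_pr(p, r):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from entity-level counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# (precision, recall, reported F1) of the baseline runs in Table 11.
TABLE_11 = {
    ("CardioDis", "Spanish"): (0.5178, 0.3681, 0.4303),
    ("MultiDrug", "Spanish"): (0.6366, 0.7148, 0.6734),
    ("MultiDrug", "English"): (0.3317, 0.7269, 0.4556),
    ("MultiDrug", "Italian"): (0.3320, 0.6844, 0.4471),
}

if __name__ == "__main__":
    for (subtrack, lang), (p, r, reported) in TABLE_11.items():
        print(f"{subtrack:9} {lang:8} recomputed F1 = {f1_from_pr(p, r):.4f} (reported {reported})")
```

Because the tabulated precision and recall are themselves rounded to four decimals, the recomputed F1 can differ from the reported value in the last digit (for instance, 0.4555 instead of 0.4556 for English).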

Corpus | Language | Entity | Docs | Tokens | Chars | Annot. | Unique annot. | Mean annot. tokens | Mean annot. chars
---|---|---|---|---|---|---|---|---|---
DisTEMIST | ES | Diseases | 1,000 | 406,137 | 2,335,968 | 10,664 | 6,739 | 3.20 ± 2.98 | 24.76 ± 18.89
DrugTEMIST | ES | Drugs | 1,000 | 406,137 | 2,335,968 | 2,778 | 925 | 1.19 ± 0.56 | 11.34 ± 4.46
DrugTEMIST | EN | Drugs | 1,000 | 404,194 | 2,230,631 | 2,814 | 875 | 1.25 ± 0.66 | 11.26 ± 0.52
DrugTEMIST | IT | Drugs | 1,000 | 421,251 | 2,393,002 | 2,808 | 893 | 1.25 ± 0.69 | 11.49 ± 4.73
CardioCCC | ES | Diseases | 508 | 568,297 | 3,215,774 | 18,232 | 7,692 | 3.32 ± 2.84 | 26.28 ± 19.06
CardioCCC | ES | Drugs | 508 | 568,297 | 3,215,774 | 4,227 | 755 | 1.19 ± 0.71 | 11.60 ± 5.25
CardioCCC | EN | Drugs | 508 | 576,772 | 3,114,833 | 4,231 | 734 | 1.21 ± 0.64 | 11.37 ± 4.74
CardioCCC | IT | Drugs | 508 | 595,332 | 3,345,466 | 4,385 | 752 | 1.23 ± 0.72 | 11.85 ± 5.25

Table 4: Statistics of the documents in the background set.

Language | Documents | Tokens | Characters
---|---|---|---
Spanish | 7,625 | 3,863,801 | 22,066,533
English | 7,625 | 3,857,831 | 21,130,044
Italian | 7,625 | 4,015,920 | 22,782,246

Table 5: Overview of the teams that participated in MultiCardioNER. In the Affiliation column, A/I stands for academic or industry institution. In the Tasks column, C stands for the CardioDis subtrack and M for the MultiDrug subtrack.

Team name | Affiliation | Tasks | Ref.
---|---|---|---
BIT.UA | IEETA, University of Aveiro, Portugal [A] | C | [28]
DataScienceTUW | Technische Universität Wien, Austria & Spanish National Research Council (CSIC), Spain [A] | C/M | [29]
Enigma | OntoText, Bulgaria & Sofia University, Bulgaria [I/A] | C/M | [30]
ICUE | University of Edinburgh, UK & Imperial College London, UK [A] | M | [31]
NOVALINCS | NOVA School of Science and Technology, Portugal [A] | C/M | [32]
PICUSLab | Università degli Studi di Napoli Federico II, Italy [A] | C | [33]
Siemens | Siemens Advanta, Romania & Transilvania University of Brasov, Romania [I/A] | C/M | [34]

Table 6: Results of the MultiCardioNER CardioDis subtrack, sorted by F1-score. The best result is bolded, and the second-best is underlined.

Team name | Run name | Precision | Recall | F1
---|---|---|---|---
BIT.UA | run1-all-full | 0.8155 | 0.8243 | 0.8199
BIT.UA | run0-top5-full | 0.811 | 0.8181 | 0.8145
Enigma | 3-system-CLIN-X-ES-pretrained | 0.8016 | 0.8082 | 0.8049
Enigma | 2-system-CLIN-X-ES-14 | 0.8052 | 0.8007 | 0.803
PICUSLab | aug_fus_sub2 | 0.7794 | 0.803 | 0.791
BIT.UA | run4-all | 0.7981 | 0.7827 | 0.7903
Enigma | 1-system-CLIN-X-ES-12 | 0.7827 | 0.7938 | 0.7882
PICUSLab | aug_fus_sub1 | 0.7346 | 0.7799 | 0.7566
BIT.UA | run3-all-val | 0.7544 | 0.7588 | 0.7566
BIT.UA | run2-best-val | 0.748 | 0.7542 | 0.7511
DataScienceTUW | run4-roberta-dg | 0.6565 | 0.7376 | 0.6947
DataScienceTUW | run5-roberta-dg-windows | 0.6546 | 0.7244 | 0.6877
Siemens | run1_SDR | 0.6758 | 0.6437 | 0.6593
PICUSLab | aug_fus_sm_sub2 | 0.8919 | 0.4897 | 0.6323
DataScienceTUW | run1_mdeberta-ct-mlm-dg | 0.5928 | 0.6715 | 0.6297
PICUSLab | aug_fus_sm_sub1 | 0.8886 | 0.4744 | 0.6185
DataScienceTUW | run2-mdeberta-ct | 0.5027 | 0.6884 | 0.581
DataScienceTUW | run3_mdeberta-ct-dg | 0.48 | 0.6773 | 0.5618
NOVALINCS | 1_bsc-bio-ehr-es_distemist_4 | 0.8018 | 0.3525 | 0.4897
NOVALINCS | 2_bsc-bio-ehr-es_distemist_1 | 0.8183 | 0.3398 | 0.4802

Table 7: Results of the MultiCardioNER MultiDrug subtrack in Spanish, sorted by F1-score. The best result is bolded, and the second-best is underlined.

Team name | Run name | Precision | Recall | F1
---|---|---|---|---
ICUE | run2_single_pp | 0.9146 | 0.9412 | 0.9277
ICUE | run4_GPT_translation | 0.9146 | 0.9412 | 0.9277
ICUE | run5_GPT_translation_all | 0.9146 | 0.9412 | 0.9277
Enigma | 3-system-SpanishRoBERTa | 0.913 | 0.9348 | 0.9238
Enigma | 1-system-XLMR | 0.904 | 0.9208 | 0.9123
Enigma | 2-system-XLMR-filtering | 0.9148 | 0.9005 | 0.9076
ICUE | run3_single | 0.8777 | 0.9272 | 0.9018
Siemens | run1_SMR | 0.8928 | 0.8778 | 0.8852
ICUE | run1_multilingual_pp | 0.8287 | 0.9348 | 0.8786
Enigma | 5-system-XLMR-filtering-dict2 | 0.7654 | 0.8871 | 0.8218
NOVALINCS | 3_bsc-bio-ehr-es_drugtemist_4 | 0.9242 | 0.4965 | 0.646
NOVALINCS | 4_bsc-bio-ehr-es_drugtemist_1 | 0.9076 | 0.4919 | 0.638
DataScienceTUW | run3_roberta-ct-multilingual | 0.8705 | 0.4342 | 0.5794
Enigma | 4-system-XLMR-filtering-dict1 | 0.4351 | 0.7899 | 0.5611
DataScienceTUW | run5_roberta-ct-mlm | 0.8421 | 0.3912 | 0.5342
DataScienceTUW | run4_mdeberta_ct_mlm_dg | 0.6815 | 0.3836 | 0.4909
DataScienceTUW | run2_mdeberta-ct-multilingual | 0.7647 | 0.3556 | 0.4855
DataScienceTUW | run1_mdeberta-multilingual | 0.3914 | 0.1531 | 0.2201

Table 8: Results of the MultiCardioNER MultiDrug subtrack in English, sorted by F1-score. The best result is bolded, and the second-best is underlined.

Team name | Run name | Precision | Recall | F1
---|---|---|---|---
Enigma | 3-system-BioLinkBERT | 0.8981 | 0.9477 | 0.9223
ICUE | run2_single_pp | 0.9086 | 0.9128 | 0.9107
ICUE | run4_GPT_translation | 0.9086 | 0.9128 | 0.9107
Enigma | 1-system-XLMR | 0.8823 | 0.9233 | 0.9023
Enigma | 2-system-XLMR-filtering | 0.9031 | 0.8989 | 0.901
Enigma | 5-system-XLMR-filtering-dict2 | 0.8698 | 0.9047 | 0.8869
ICUE | run3_single | 0.8734 | 0.8977 | 0.8854
ICUE | run1_multilingual_pp | 0.8314 | 0.9343 | 0.8799
Siemens | run1_EMR | 0.8685 | 0.8791 | 0.8738
Enigma | 4-system-XLMR-filtering-dict1 | 0.8298 | 0.921 | 0.873
ICUE | run5_GPT_translation_all | 0.8767 | 0.8635 | 0.87
DataScienceTUW | run3_roberta-ct-multilingual | 0.8632 | 0.4364 | 0.5797
DataScienceTUW | run4-mdeberta-windows | 0.7955 | 0.4317 | 0.5597
DataScienceTUW | run5-biobert-mlm-windows | 0.6771 | 0.441 | 0.5341
DataScienceTUW | run2_mdeberta-ct-multilingual | 0.8453 | 0.3777 | 0.5221
DataScienceTUW | run1_mdeberta-multilingual | 0.5648 | 0.2481 | 0.3448

Table 9: Results of the MultiCardioNER MultiDrug subtrack in Italian, sorted by F1-score. The best result is bolded, and the second-best is underlined.

Team name | Run name | Precision | Recall | F1
---|---|---|---|---
Enigma | 1-system-XLMR | 0.884 | 0.8844 | 0.8842
Enigma | 3-system-Italian-Spanish-RoBERTa | 0.8723 | 0.8956 | 0.8838
Enigma | 2-system-XLMR-filtering | 0.9016 | 0.8606 | 0.8806
Siemens | run1_IMR | 0.8891 | 0.8689 | 0.8789
ICUE | run4_GPT_translation | 0.9114 | 0.8461 | 0.8776
ICUE | run5_GPT_translation_all | 0.9114 | 0.8461 | 0.8776
ICUE | run2_single_pp | 0.8186 | 0.9 | 0.8574
ICUE | run1_multilingual_pp | 0.8139 | 0.8867 | 0.8487
ICUE | run3_single | 0.7879 | 0.8894 | 0.8356
Enigma | 4-system-XLMR-filtering-dict1 | 0.5693 | 0.8578 | 0.6844
Enigma | 5-system-XLMR-filtering-dict2 | 0.5707 | 0.845 | 0.6813
DataScienceTUW | run3_roberta-ct-multilingual | 0.8264 | 0.4206 | 0.5574
DataScienceTUW | run4-mdeberta | 0.7481 | 0.3928 | 0.5151
DataScienceTUW | run5-biobit-mlm | 0.7922 | 0.3517 | 0.4871
DataScienceTUW | run2_mdeberta-ct-multilingual | 0.7433 | 0.3394 | 0.4661
DataScienceTUW | run1_mdeberta-multilingual | 0.5074 | 0.2094 | 0.2965

1. https://github.com/nlp4bia-bsc/multicardioner_evaluation_library
2. https://zenodo.org/doi/10.5281/zenodo.10948354
3. http://www.scielo.org
4. https://zenodo.org/doi/10.5281/zenodo.6458078
5. https://zenodo.org/doi/10.5281/zenodo.6408476
6. https://zenodo.org/doi/10.5281/zenodo.11065432
7. https://huggingface.co/lcampillos/roberta-es-clinical-trials-ner
8. https://huggingface.co/alvaroalon2/biobert_chemical_ner
9. https://huggingface.co/PlanTL-GOB-ES/bsc-bio-ehr-es
10. https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner
11. https://huggingface.co/dslim/bert-base-NER
12. https://huggingface.co/nickprock/bert-italian-finetuned-ner

Acknowledgments

The MultiCardioNER track was funded by Spanish and European projects such as DataTools4Heart (Grant Agreement No. 101057849), AI4HF (Grant Agreement No. 101080430), BARITONE (Proyectos de Transición Ecológica y Transición Digital 2021, file no. TED2021-129974B-C21) and AI4ProfHealth (PID2020-119266RA-I00 MICIU/AEI/10.13039/501100011033).

Google was a proud sponsor of the BioASQ Challenge in 2023. Ovid, Elsevier and Atypon Systems Inc. are also sponsoring this twelfth edition of BioASQ.

References

[1] A. Tariq, T. Santos, I. Banerjee, Natural language processing for cardiovascular applications, in: Artificial Intelligence in Cardiothoracic Imaging, Springer, 2022.
[2] T. Nagamine, B. Gillette, A. Pakhomov, J. Kahoun, H. Mayer, R. Burghaus, J. Lippert, M. Saxena, Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data, Scientific Reports 10 (2020) 21340.
[3] M. R. Turchioe, A. Volodarskiy, J. Pathak, D. N. Wright, J. E. Tcheng, D. Slotwiner, Systematic review of current natural language processing methods and applications in cardiology, Heart 108 (2022).
[4] M. J. Boonstra, D. Weissenbacher, J. H. Moore, G. Gonzalez-Hernandez, F. W. Asselbergs, Artificial intelligence: revolutionizing cardiology with large language models, European Heart Journal 45 (2024).
[5] R. Vijayakrishnan, S. R. Steinhubl, K. Ng, J. Sun, R. J. Byrd, Z. Daar, B. A. Williams, C. Defilippi, S. Ebadollahi, W. F. Stewart, Prevalence of heart failure signs and symptoms in a large primary care population identified through the use of text and data mining of the electronic health record, Journal of Cardiac Failure 20 (2014).
[6] S. Chae, J. Song, M. Ojo, M. Topaz, Identifying heart failure symptoms and poor self-management in home healthcare: a natural language processing study, in: Nurses and Midwives in the Digital Age, IOS Press, 2021.
[7] S. Khurshid, C. Reeder, L. X. Harrington, P. Singh, G. Sarma, S. F. Friedman, P. Di Achille, N. Diamant, J. W. Cunningham, A. C. Turner, Cohort design and natural language processing to reduce bias in electronic health records research, npj Digital Medicine 5 (2022) 47.
[8] O. V. Patterson, M. S. Freiberg, M. Skanderson, S. J. Fodeh, C. A. Brandt, S. L. Duvall, Unlocking echocardiogram measurements for heart disease research through natural language processing, BMC Cardiovascular Disorders 17 (2017).
[9] A. N. Berman, D. W. Biery, C. Ginder, O. L. Hulme, D. Marcusa, O. Leiva, W. Y. Wu, N. Cardin, J. Hainer, D. L. Bhatt, Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio-canary comorbidity project, Clinical Cardiology 44 (2021).
[10] J. R. Brown, I. M. Ricket, R. M. Reeves, R. U. Shah, C. A. Goodrich, G. Gobbel, M. E. Stabler, A. M. Perkins, F. Minter, K. C. Cox, Information extraction from electronic health records to predict readmission following acute myocardial infarction: does natural language processing using clinical notes improve prediction of readmission?, Journal of the American Heart Association 11 (2022) e024198.
[11] X. Zhan, M. Humbert-Droz, P. Mukherjee, O. Gevaert, Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases, Patterns 2 (2021).
[12] H. A. Alhakimi, T. E. Magzoub, Applications of natural language processing in cardiology using text clinical data: A systematic review, Advances in Clinical and Experimental Medicine 10 (2023).
[13] R. Zhang, S. Ma, L. Shanahan, J. Munroe, S. Horn, S. Speedie, Automatic methods to extract New York Heart Association classification from clinical notes, in: 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), IEEE, 2017.
[14] R. Zhang, S. Ma, L. Shanahan, J. Munroe, S. Horn, S. Speedie, Discovering and identifying New York Heart Association classification from electronic health records, BMC Medical Informatics and Decision Making 18 (2018).
[15] P. Adejumo, P. Thangaraj, L. S. Dhingra, A. Aminorroaya, X. Zhou, C. Brandt, H. Xu, H. M. Krumholz, R. Khera, A deep learning approach for automated extraction of functional status and New York Heart Association class for heart failure patients during clinical encounters, medRxiv (2024).
[16] P. Singhal, R. Walambe, S. Ramanna, K. Kotecha, Domain adaptation: challenges, methods, datasets, and applications, IEEE Access 11 (2023).
[17] E. Laparra, A. Mascio, S. Velupillai, T. Miller, A review of recent work in transfer learning and domain adaptation for natural language processing of electronic health records, Yearbook of Medical Informatics 30 (2021).
[18] A. Miranda-Escalada, L. Gascó, S. Lima-López, E. Farré-Maduell, D. Estrada, A. Nentidis, A. Krithara, G. Katsimpras, G. Paliouras, M. Krallinger, Overview of DisTEMIST at BioASQ: Automatic detection and normalization of diseases from clinical texts: results, methods, evaluation and multilingual resources, 2022.
[19] S. Lima-López, E. Farré-Maduell, L. Gascó, A. Nentidis, A. Krithara, G. Katsimpras, G. Paliouras, M. Krallinger, Overview of MedProcNER task on medical procedure detection and entity linking at BioASQ, in: Working Notes of CLEF 2023, 2023.
[20] S. Lima-López, E. Farré-Maduell, L. Gasco-Sánchez, J. Rodríguez-Miret, M. Krallinger, Overview of SympTEMIST at BioCreative VIII: Corpus, Guidelines and Evaluation of Systems for the Detection and Normalization of Symptoms, Signs and Findings from Text, in: Proceedings of the BioCreative VIII Challenge and Workshop: Curation and Evaluation in the era of Generative Models, 2023.
[21] A. Gonzalez-Agirre, M. Marimon, A. Intxaurrondo, O. Rabal, M. Villegas, M. Krallinger, PharmaCoNER: Pharmacological substances, compounds and proteins named entity recognition track, in: Proceedings of The 5th Workshop on BioNLP Open Shared Tasks, 2019.
[22] A. Miranda-Escalada, E. Farré-Maduell, S. Lima-López, D. Estrada, L. Gascó, M. Krallinger, Mention detection, normalization & classification of species, pathogens, humans and food in clinical documents: Overview of LivingNER shared task and resources, Procesamiento del Lenguaje Natural (2022).
[23] S. Lima-López, E. Farré-Maduell, A. Miranda-Escalada, V. Brivá-Iglesias, M. Krallinger, NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts, Procesamiento del Lenguaje Natural 67 (2021).
[24] S. Lima-López, E. Farré-Maduell, V. Brivá-Escalada, L. Gascó, M. Krallinger, MEDDOPLACE Shared Task overview: recognition, normalization and classification of locations and patient movement in clinical texts, Procesamiento del Lenguaje Natural 71 (2023).
[25] P. Stenetorp, S. Pyysalo, G. Topić, T. Ohta, S. Ananiadou, J. Tsujii, brat: a web-based tool for NLP-assisted text annotation, in: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012.
[26] A. Miranda-Escalada, E. Farré-Maduell, M. Krallinger, Named entity recognition, concept normalization and clinical coding: Overview of the CANTEMIST track for cancer text mining in Spanish, corpus, guidelines, methods and results, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2020), CEUR Workshop Proceedings, 2020.
[27] L. Gasco Sánchez, D. Estrada-Zavala, E. Farré-Maduell, S. Lima-López, A. Miranda-Escalada, M. Krallinger, The SocialDisNER shared task on detection of disease mentions in health-relevant content from social media: methods, evaluation, guidelines and corpora, in: Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task, Association for Computational Linguistics, Gyeongju, Republic of Korea, 2022.
[28] R. Jonker, T. Almeida, S. Matos, BIT.UA at MultiCardioNER: Adapting a Multi-head CRF for Cardiology, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[29] P. Styll, L. Campillos-Llanos, W. Kusa, A. Hanbury, Cross-Linguistic Disease and Drug Detection in Cardiology Clinical Texts: Methods and Outcomes, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[30] A. Aksenova, A. Datseris, S. Vassileva, S. Boytcheva, Transformer-Based Disease and Drug Named Entity Recognition in Multilingual Clinical Texts: MultiCardioNER challenge, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[31] C. Lee, T. I. Simpson, J. M. Posma, A. D. Lain, Comparative Analyses of Multilingual Drug Entity Recognition Systems for Clinical Case Reports In Cardiology, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[32] R. Gonçalves, A. Lamúrias, Team NOVA LINCS @ BIOASQ12 MultiCardioNER Track: Entity Recognition with Additional Entity Types, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[33] A. Romano, G. Riccio, M. Postiglione, V. Moscato, Identifying Cardiological Disorders in Spanish via Data Augmentation and Fine-Tuned Language Models, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[34] M. D. Danu, V. G. Marica, C. Suciu, L. M. Itu, O. Farri, Multilingual Clinical NER for Diseases and Medications Recognition in Cardiology Texts using BERT Embeddings, in: G. Faggioli, N. Ferro, P. Galuščáková, A. García Seco De Herrera (Eds.), CLEF Working Notes, 2024.
[35] R. A. A. Jonker, T. Almeida, R. Antunes, J. R. Almeida, S. Matos, Multi-head CRF classifier for biomedical multi-class named entity recognition on Spanish clinical notes, Database (2024).
[36] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with disentangled attention, in: International Conference on Learning Representations, 2021.
[37] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing, arXiv:2111.09543 (2021).
[38] T. M. Buonocore, C. Crema, A. Redolfi, R. Bellazzi, E. Parimbelli, Localizing in-domain adaptation of transformer-based biomedical language models, Journal of Biomedical Informatics 144 (2023) 104431.
[39] L. Lange, H. Adel, J. Strötgen, D. Klakow, CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain, Bioinformatics 38 (2022).
[40] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, CoRR abs/1911.02116 (2019).
[41] C. P. Carrino, J. Llop, M. Pàmies, A. Gutiérrez-Fandiño, J. Armengol-Estapé, J. Silveira-Ocampo, A. Valencia, A. Gonzalez-Agirre, M. Villegas, Pretrained Biomedical Language Models for Clinical NLP in Spanish, in: Proceedings of the 21st Workshop on Biomedical Language Processing, Association for Computational Linguistics, Dublin, Ireland, 2022. doi:10.18653/v1/2022.bionlp-1.19.
[42] M. Yasunaga, J. Leskovec, P. Liang, LinkBERT: Pretraining Language Models with Document Links, in: Association for Computational Linguistics (ACL), 2022.
[43] OpenAI, GPT-3.5 model, 2023.
[44] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, LLaMA: Open and efficient foundation language models, arXiv:2302.13971 (2023).
[45] G. García Subies, Á. Barbero Jiménez, P. Martínez Fernández, A comparative analysis of Spanish clinical encoder-based models on NER and classification tasks, Journal of the American Medical Informatics Association e054 (2024).
[46] A. J. Tamayo Herrera, D. A. Burgos, A. Gelbukh, Clinical text mining in Spanish enhanced by negation detection and named entity recognition, Computación y Sistemas 27 (2023).
[47] H. Verma, S. Bergler, N. Tahaei, Comparing and combining some popular NER approaches on biomedical tasks, arXiv preprint arXiv:2305.19120 (2023).
[48] A. V. Serrano, G. G. Subies, H. M. Zamorano, N. A. Garcia, D. Samy, D. B. Sanchez, A. M. Sandoval, M. G. Nieto, A. B. Jimenez, RigoBERTa: a state-of-the-art language model for Spanish, arXiv preprint arXiv:2205.10233 (2022).
[49] F. Gallego, G. López-García, L. Gasco-Sánchez, M. Krallinger, F. J. Veredas, ClinLinker: Medical entity linking of clinical concept mentions in Spanish, in: International Conference on Computational Science, Springer, 2024.
[50] E. Amigó, A. Ariza-Casabona, V. Fresno, M. A. Martí, Information Theory-based Compositional Distributional Semantics, Computational Linguistics 48 (2022). doi:10.1162/coli_a_00454.
[51] R. Bawden, K. B. Cohen, C. Grozea, A. J. Yepes, M. Kittner, M. Krallinger, N. Mah, A. Neveol, M. Neves, F. Soares, Findings of the WMT 2019 biomedical translation shared task: Evaluation for MEDLINE abstracts and biomedical terminologies, in: ACL 2019 Fourth Conference on Machine Translation, Association for Computational Linguistics, 2019.
[52] M. Neves, A. J. Yepes, A. Siu, R. Roller, P. Thomas, M. V. Navarro, L. Yeganova, D. Wiemann, G. M. Di Nunzio, F. Vezzani, Findings of the WMT 2022 biomedical translation shared task: Monolingual clinical case reports, in: WMT22 Seventh Conference on Machine Translation, 2022.