<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Salud María Jiménez-Zafra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Montes-y-Gómez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Nacional de Astrofísica</institution>
          ,
          <addr-line>Óptica y Electrónica, Puebla</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SINAI, Computer Science Department, CEATIC, Universidad de Jaén</institution>
          ,
          <addr-line>Jaén</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Symanto Research</institution>
          ,
          <addr-line>Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>IberLEF is a shared evaluation campaign of Natural Language Processing systems in Spanish and other Iberian languages that has been organized since 2019, and is held as part of the annual conference of the Spanish Society for Natural Language Processing. Its goal is to encourage the research community to organize competitive text processing, understanding and generation tasks in order to define new research challenges and set new state-of-the-art results in at least one of the following Iberian languages: Spanish, Portuguese, Catalan, Basque or Galician. This paper summarizes the evaluation activities carried out in IberLEF 2023, which included 14 tasks and 34 subtasks dealing with automatically generated text identification, clinical content, code switch analysis, early risk prediction on the Internet, harmful and inclusive content detection, political ideology and propaganda identification, and sentiment, stance and opinion analysis. Overall, the IberLEF activities represented a remarkable collective effort involving 432 researchers from 35 countries in Europe, Asia, Africa, Australia and the Americas.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Evaluation Challenges</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>IberLEF is a shared evaluation campaign of Natural Language Processing systems in Spanish
and other Iberian languages that has been organized since 2019, and is held as part of the annual
conference of the Spanish Society for Natural Language Processing (SEPLN). It aims to promote
research in text processing, understanding and generation tasks in at least one of the following
Iberian languages: Spanish, Portuguese, Catalan, Basque or Galician.</p>
      <p>In this shared evaluation campaign, the research community defines new research challenges
and proposes tasks to advance the state of the art in Natural Language Processing (NLP). These
tasks are reviewed by the members of the steering and program committees of IberLEF, and
finally evaluated by the IberLEF general chairs. The organizers of the accepted tasks set up the
evaluation according to the proposal submitted, promote the task, and manage the submission
and scientific evaluation of the system description papers submitted by the participants. These
scientific papers are included in this IberLEF proceedings volume published at CEUR-WS.org.
Moreover, the task organizers have to prepare and submit an overview of their task evaluation
exercise. These overviews are reviewed by the IberLEF organizing committee and then published
in the journal Procesamiento del Lenguaje Natural, vol. 71 (September 2023 issue). Finally, the task
organizers report the results of the tasks and the selected participants present the descriptions
of their systems at the IberLEF workshop.</p>
      <p>IberLEF 2023 is held on September 26, 2023 in Jaén (Andalusia, Spain), within the framework
of the XXXIX International Conference of the Spanish Society for Natural Language Processing
(SEPLN 2023). This year 14 shared tasks have been accepted for IberLEF 2023, out of a total
of 22 proposals. They are NLP tasks on automatically generated text identification, clinical
content, code switch analysis, early risk prediction on the Internet, harmful and inclusive
content detection, political ideology and propaganda identification, and sentiment, stance and
opinion analysis.</p>
      <p>In this paper we summarize the tasks organized in IberLEF 2023, analyzing them for a better
understanding of this collective effort.</p>
    </sec>
    <sec id="sec-2">
      <title>2. IberLEF 2023 Tasks</title>
      <sec id="sec-2-1">
        <p>The 14 tasks involved in IberLEF 2023 are presented below, grouped by theme.</p>
        <sec id="sec-2-1-1">
          <title>2.1. Automatically Generated Texts Identification</title>
          <p>
            AuTexTification [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] is a multi-domain machine-generated text detection and attribution task.
It consists of two subtasks: i) Subtask 1, Machine-generated text detection and ii) Subtask 2,
Machine-generated text attribution. Subtask 1 is a binary classification task in which, given a
Spanish or English text, it should be determined whether the text has been automatically generated
or written by a human. Subtask 2 is a multi-class classification task which consists of,
given an automatically generated text in Spanish or English, identifying which text generation
model produced it. The possible classes are A, B, C, D, E or F. Each class represents a text generation
model, and the model label mapping is: "A" - "bloom-1b7", "B" - "bloom-3b", "C" - "bloom-7b1",
"D" - "babbage", "E" - "curie", "F" - "text-davinci-003". The domains covered in this task are
tweets, reviews, news, legal, and how-to articles.
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. Clinical Content</title>
          <p>
            The ClinAIS [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] shared task aims to tackle the identification of seven section types within
unstructured clinical records in the Spanish language. Specifically, the section types are: i) Present
Illness, ii) Past Medical History/Medical History, iii) Family History, iv) Exploration, v) Evolution,
vi) Treatment and, vii) Derived from/to. The dataset used in this task is a subset of the CodiEsp
corpus [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], a collection of Spanish unstructured clinical case reports from different medical
specialties which was used in a Named Entity Recognition task at CLEF eHealth 2020.
          </p>
          <p>
            MEDDOPLACE [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] is a shared task about geographical information extraction and toponym
resolution in the clinical domain. It is structured into four subtasks: i) Location and place-related
entity mention detection, ii) Entity normalization (geocoding to GeoNames, PlusCodes and
SNOMED CT), iii) Location entity classification and, iv) End-to-end evaluation of detection,
normalization and classification. The corpus of this task consists of 1,000 clinical cases in
Spanish, together with location mention normalization (mapping to GeoNames, PlusCodes and
SNOMED-CT concepts), as well as a Silver Standard dataset in multiple languages (including
English, Italian, Portuguese, Dutch or Swedish).
          </p>
          <p>
            TESTLINK [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] focuses on relation extraction from clinical cases in Spanish and Basque. It
consists of identifying textual mentions of both laboratory tests and their results in a clinical
narrative, and then linking tests to their respective results. The task is divided into two tracks
depending on the language: i) Spanish and ii) Basque. The dataset used is based on the Spanish
and Basque parts of E3C, the multilingual European Clinical Case Corpus [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ], which consists of
three sections of clinical cases published in medical journals and other medical resources.
          </p>
        </sec>
        <sec id="sec-2-1-3">
          <title>2.3. Code Switch Analysis</title>
          <p>
            GUA-SPA [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] is a shared task for detecting and analyzing code-switching in Guarani and
Spanish. This challenge consists of three subtasks: i) Language identification in code-switched
data, i.e., identifying the language of each token of a given text; ii) Named entity classification;
and, iii) Spanish code classification, which consists of classifying the way a Spanish span is
used in the code-switched context. The corpus of this task consists of 1,500 texts extracted from
news articles and tweets, totaling around 25 thousand tokens.
          </p>
        </sec>
        <sec id="sec-2-1-4">
          <title>2.4. Early Risk Prediction on the Internet</title>
          <p>
            MentalRiskES [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] aims to promote the early detection of mental risk disorders in Spanish.
This task must be resolved as an online problem, that is, the participants must be able to detect
a potential risk as early as possible in a continuous stream of data. It includes three subtasks: i)
Eating disorders detection, ii) Depression detection and, iii) Detection of a non-defined disorder,
undisclosed during the competition (anxiety), to observe the transfer of knowledge
among the different disorders proposed. Participants were also asked to submit measurements
of carbon emissions for their systems, emphasizing the need for sustainable NLP practices. For
this competition, a set of comments from Telegram users was compiled.
          </p>
        </sec>
        <sec id="sec-2-1-5">
          <title>2.5. Harmful and Inclusive Content</title>
          <p>DA-VINCIS [9] supports research into the development of automatic solutions for detecting
violent events in social networks. It proposes two subtasks: i) A binary classification task aimed
at determining whether a tweet is about a violent incident or not and, ii) A multi-label multi-class
classification task in which the category(ies) of a violent incident must be identified. This shared
task was also organized in IberLEF 2022 [10]. In this edition, instead of only providing textual
data, the participants were provided with a multimodal dataset consisting of Mexican Spanish
tweets associated with at least one image.</p>
          <p>HOMO-MEX [11] encourages the development of NLP systems for detecting and classifying
LGBTQ+ phobic content in Mexican Spanish tweets. This shared task is divided into two tracks:
i) Determining whether a tweet exhibits LGBT+ phobic content or not and, ii) Classifying the
LGBT+ phobic tweets as containing Lesbophobia (L), Gayphobia (G), Biphobia (B), Transphobia
(T), and/or other LGBT+ phobia (O).</p>
          <p>The HOPE [12] shared task is related to the inclusion of vulnerable groups and focuses on the
detection of hope speech, in pursuit of Equality, Diversity and Inclusion (EDI). It
consists of identifying whether a given text, written in Spanish or English, contains hope
speech or not. It is divided into two subtasks, according to the language in which the texts are
written: i) Identifying whether a Spanish tweet contains hope speech or not and, ii) Determining
whether an English YouTube comment contains hope speech or not.</p>
          <p>HUHU [13] focuses on examining the use of humour to express prejudice towards minorities,
specifically analyzing Spanish tweets that are prejudicial towards: women and feminists, the LGBTIQ
community, immigrants and racially discriminated people, and overweight people. This shared
task consists of three subtasks: i) Determining whether a prejudicial tweet is intended to
cause humour, ii) Identifying the targeted groups (women and feminists, the LGBTIQ community,
immigrants and racially discriminated people, and overweight people) in each tweet as a
multilabel classification task and, iii) Predicting how prejudicial a message is on average to
minority groups on a continuous scale from 1 to 5.</p>
        </sec>
        <sec id="sec-2-1-6">
          <title>2.6. Political Ideology and Propaganda</title>
          <p>DIPROMATS [14] is organized with the aim of finding the best techniques to identify and
categorize propagandistic tweets from governmental and diplomatic sources. It presents three
subtasks for each language, Spanish and English: i) A binary classification task to decide
whether a tweet contains propaganda techniques, ii) A multi-class, multi-label classification task,
where systems have to decide, for each tweet, which of the 5 available categories it fits into and,
iii) A fine-grained classification task in which systems have to decide which of the available
techniques the tweet contains.</p>
          <p>The goal of PoliticES [15] is to extract political ideology and other psychographic and demographic
characteristics of users in social networks. For this purpose, a cluster profiling task is proposed.
It focuses on the identification of two demographic traits (self-assigned gender and profession)
and one psychographic trait (political ideology), from a binary and multi-class perspective,
from clusters of Spanish tweets posted by users who share these traits. This shared task
was also organized in IberLEF 2022 [16]. The novelty this year is that instead of profiling
users, participants work with clusters of texts written by different users but with the same
characteristics, in order to avoid legal and ethical issues. In addition, users who are celebrities
have also been included.</p>
        </sec>
        <sec id="sec-2-1-7">
          <title>2.7. Sentiment, Stance and Opinions</title>
          <p>The FinancES [17] shared task aims to extend the challenge of sentiment analysis in Spanish to the
financial domain, in order to extract the sentiment that a piece of financial information can have
for several actors, including the main economic target (i.e., the specific company or asset where
the economic fact applies), other companies (i.e., the entities producing the goods and services
that others consume) and consumers (i.e., households/individuals). It consists of two subtasks:
i) Identifying the main economic target from financial news headlines and determining the
sentiment polarity (positive, neutral or negative) towards such target and, ii) Determining the
sentiment polarity of each news headline towards both companies and consumers.</p>
          <p>Rest-Mex [18] focuses on sentiment analysis and text clustering of tourist texts. It is divided
into two subtasks: i) Sentiment analysis, to predict the polarity of opinions expressed by tourists,
classifying the type of place visited (tourist attraction, hotel, or restaurant), as well as the
country it is located in (Mexico, Cuba or Colombia) and, ii) Text clustering, to classify news
articles related to tourism in Mexico by topic. This shared task was also organized in IberLEF
2022 [19] focusing on texts related to tourist destinations in Mexico, but this edition includes
data from Cuba and Colombia for the first time.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Aggregated Analysis of IberLEF 2023 Tasks</title>
      <sec id="sec-3-1">
        <title>3.1. Tasks characterization</title>
        <p>In terms of languages, the distribution per task (including subtasks) is shown in Figure 1.
Once again, Spanish is the central language of IberLEF (14 tasks), followed by English in a
secondary role (3 tasks), and both Basque and Guarani in third position (1 task each). The main
Spanish variants considered are those from Spain and Mexico, albeit this year, for the first time,
Cuban and Colombian texts have been considered in one task.</p>
        <sec id="sec-3-1-1">
          <p>In terms of abstract task types, the distribution of the different identified subtasks can be seen
in Figure 2. The most popular type of task is multi-class classification (8 tasks), followed by
binary classification (7 tasks). There are also two multi-label classification tasks, and one task
each for regression, clustering, sequence labelling, NER and relation extraction.</p>
          <p>Despite the trend, started last year, towards a smaller but more diverse and complex set
of tasks, binary classification has again been nearly the most popular type of task this year.</p>
          <p>In terms of evaluation metrics, the distribution can be seen in Figure 3, which depicts only
the main metrics used to rank systems in each task. As in previous years, there is a remarkable
predominance of F1 (12 tasks, in five cases together with Precision and Recall). Accuracy is used
in three tasks, while MPA, CFD and RMSE are used in two more tasks. There are eighteen other
metrics used across six tasks, each of them used in only one task; among others, weighted
B2, BPA, ICM, AUC, ERDEx (with different values for the x), Pearson, and P@x (for different x
values). Some of them correspond to complex tasks which embed subtasks (e.g., Sent, Thematic,
or Easiness). Others correspond to different distance
metrics (e.g., Mean Distance Error, Median Distance Error, or A@161km).</p>
          <p>Overall, in IberLEF as in other competitive NLP evaluation challenges, we might still be
relying too much on averages to combine different quality metrics: it has been common this
year to combine F1 measures (which are harmonic means) with other measures using some
other form of averaging. This hides the actual behaviour of systems and usually gives no clues
on how to improve them. Also, again in 2023 the choice of metrics is, in general, barely justified,
particularly in terms of how the system output is going to be used in realistic usage scenarios.</p>
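          <p>The point about averaging can be made concrete with a minimal sketch. The scores below are hypothetical and are not taken from any IberLEF task; the sketch only illustrates why taking a further arithmetic mean of F1 (itself a harmonic mean of precision and recall) with another metric, such as accuracy, can make two very different systems look almost identical.</p>

```python
# Hypothetical illustration: averaging F1 with accuracy hides behaviour.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# System A: balanced behaviour.
f1_a = f1(0.70, 0.70)                 # 0.70
combined_a = (f1_a + 0.70) / 2        # arithmetic mean with accuracy 0.70

# System B: misses roughly two thirds of the positives, but a dominant
# majority class inflates its accuracy.
f1_b = f1(0.90, 0.318)                # ~0.47
combined_b = (f1_b + 0.93) / 2        # arithmetic mean with accuracy 0.93

# Nearly the same combined score, radically different behaviour:
print(round(combined_a, 2), round(combined_b, 2))  # 0.7 0.7
```

          <p>The combined ranking score alone gives no indication that System B's recall has collapsed, which is exactly the diagnostic information that would be needed to improve it.</p>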
          <p>Finally, in terms of novelty/stability, IberLEF 2023 has brought many new problems, with
eleven out of the fourteen primary tasks being new this year (79%). Only DA-VINCIS, PoliticES,
and Rest-Mex had also been run in 2022.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Datasets and results</title>
        <p>In terms of types of textual sources, Figure 4 shows how they are used in IberLEF 2023
tasks. As in 2022, there is more diversity than in previous years, with new sources like YouTube,
Telegram and how-to articles being considered this year. However, Twitter has become the
dominant source again, with more than half of the tasks (8 out of 14) using it. News and Clinical
Case Reports are each used in three tasks, followed by Reviews (in one case coming from
TripAdvisor), used in two tasks. The new sources (YouTube, Telegram and how-to articles) have
been used in one task each.</p>
        <p>In terms of dataset sizes and annotation efforts, it is difficult to establish fair comparisons,
because of the diversity of data sources, text sizes and the wide variance in terms of annotation
difficulty.</p>
        <p>In any case, in the majority of cases (11 tasks), datasets have been manually annotated. In 9 of
these cases, the size of the dataset is below 10,000 instances while in two of the cases, the size is
around 11,000 (HOMO-MEX) and 32,000 (HOPE) respectively. Two tasks provide self-annotated
datasets with around 2,700 (PoliticES) and 360,000 (Rest-Mex) instances respectively. In the case
of AuTexTification, the dataset has been partially self-annotated and partially human-assisted
auto-generated, providing around 160,000 instances.</p>
        <p>As for the reliability of the annotations, one useful indicator is inter-annotator agreement,
which is reported in 6 out of 14 tasks. In the tasks where it is reported, annotator agreement is
high in four cases, mid-high in one, and mid-low in another one. In general, mid-low agreement
may indicate the complexity of the task rather than poor annotation guidelines.</p>
        <p>Overall, the annotation effort in IberLEF 2023 remains a remarkable contribution to enlarging
test collections for Spanish (and, less prominently, other languages). Once again, IberLEF
has been carried out without specific funding sources (other than those obtained individually
by the teams organizing and participating in the tasks). A centralized funding scheme could
certainly help reach larger and better annotations in IberLEF as a whole.</p>
        <p>In terms of progress with respect to the state of the art, it is, as usual, difficult to extract
aggregated conclusions for the whole IberLEF effort, in particular given the diversity of approaches
for providing task baselines: for instance, no baseline was provided in two tasks. Furthermore,
in eleven subtasks, only a trivial baseline was included in the comparisons (e.g., majority class
or random baselines in classification). Ten subtasks used bag-of-words (BOW) approaches as baselines,
while eleven used some variant of transformers (BETO, DeBERTa, RoBERTa, etc.). In some
of the tasks, several baselines have been given and compared to the participants’ approaches,
generally combining majority, BOW, and transformer baselines.</p>
        <p>In the subtasks that used baselines, the baseline was beaten (by a margin larger than 5%) by
the best system in 27 cases, while in 6 cases the baseline obtained better results. Looking at
the obtained results, there are only three subtasks where the best performing system obtained
results higher than 0.9, and only one subtask where the baseline did. This is an indication that,
in at least some of the tasks, there is still room for improvement.</p>
        <p>In Figure 5 we display a pairwise comparison between the best system and the best baseline
for each of the tasks where at least one baseline is provided, with respect to the official
ranking metric used in each task. To avoid confusion, we have restricted the chart to tasks
where the official metric varies between 0 (worst quality) and 1 (perfect output), and in the
case of RMSE, we have normalized the value as 1 - RMSE.</p>
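          <p>The normalization used for the chart can be sketched as follows; the scores are hypothetical and purely illustrative. Metrics bounded in [0, 1] where higher is better are kept as-is, while RMSE (an error, where lower is better) is mapped to 1 - RMSE so that best-system vs. best-baseline pairs from different tasks share the same 0 (worst) to 1 (perfect) scale.</p>

```python
# Hypothetical illustration of mapping official metrics onto a common scale.

def to_unit_scale(value: float, is_error_metric: bool = False) -> float:
    """Map an official metric value onto [0, 1], higher is better."""
    return 1.0 - value if is_error_metric else value

# (best system, best baseline) pairs; values are illustrative only.
task_f1_pair = (0.78, 0.55)          # F1: already in [0, 1], kept as-is
task_rmse_pair = (0.25, 0.40)        # raw RMSE: lower is better, inverted

normalized_rmse_pair = tuple(to_unit_scale(v, is_error_metric=True)
                             for v in task_rmse_pair)
print(normalized_rmse_pair)          # (0.75, 0.6)
```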
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Participation</title>
        <p>Given that IberLEF 2023 was not a funded initiative, participation has again been impressive,
with a large fraction of the current research groups interested in NLP for Spanish and other
Iberian languages organizing and/or participating in one or more tasks. Overall, 432 researchers
representing 211 research groups from 35 countries in Europe, Asia, Africa, Australia and
the Americas were involved in IberLEF tasks. NOTE: Statistics have been compiled from the
submitted working notes, meaning two things: i) Some groups and researchers may be counted
twice if they have participated in more than one task; ii) Real participation may be higher due
to the number of teams who submitted runs but did not submit their working notes afterwards
and thus have not been counted in the statistics.</p>
        <p>Figure 6 shows the distribution of research groups per country. This year, Spain has the
largest representation, with 72 groups, followed by Mexico with 64 groups.</p>
        <p>Figure 7 shows the distribution of researchers (appearing as authors in the working notes)
per country. The top five countries, Spain, Mexico, Chile, Colombia, and the USA, represent roughly
80% of the researchers involved. In addition, a great diversity of non-Spanish-speaking
countries can be observed, such as the USA, Australia, India, Romania, China and Italy in the top ten
positions in terms of participation, which indicates: first, that Spanish attracts the attention
of the NLP community at large; and second, that current NLP technologies enable addressing
different languages without language-specific machinery, other than pre-trained language
models made available to the research community.</p>
        <p>Figure 8 shows the number of teams participating in each of the tasks, considering that they
submitted at least one run. Participation ranges between 3 and 46 teams. The distribution of
research groups per task is shown in Figure 9. In this case, participation ranges between 3 and 43
groups. Regarding the tasks with the highest participation, except probably for the case of AuTexTification,
there does not seem to be a correlation between the number of participating teams and the
number of participating groups. For instance, in Rest-Mex 43 groups collaborated to participate
as 16 teams, while in HUHU only 12 groups participated as 46 teams. NOTE: A team is a group
of researchers coming from the same or different research groups and/or research entities who
join efforts to participate in a shared task. A research group is a group of researchers, often from
the same faculty, specialised in the same subject, working together on the issue or topic in an
official manner and not just for participating in a shared task.</p>
        <p>As in other evaluation initiatives, participation seems to be driven not only by a task’s
intrinsic interest, but also by the cost of entry: as usual, classification tasks (the most basic
machine learning task, for which more plug-and-play software packages exist) receive more
participation than tasks which require more elaborate approaches and more creativity to
assemble algorithmic solutions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In its fifth edition, IberLEF has again been a remarkable collective effort for the advancement
of Natural Language Processing in Spanish and other Iberian languages, comprising 14 main
tasks and involving 432 researchers from institutions in 35 countries in Europe, Asia, Africa,
Australia and the Americas. In comparison to the last edition, there has been a great increase in
participants (a 39% increase, from 310 to 432) and countries (a 46% increase, from 24 to 35), showing the
increasing interest that IberLEF arouses throughout the world.</p>
      <p>IberLEF 2023 has been one of the most diverse editions in terms of types of tasks and application
domains, and has contributed to advancing the field in the areas of automatically generated text
identification, clinical content, code switch analysis, early risk prediction on the Internet,
harmful and inclusive content detection, political ideology and propaganda identification, and
sentiment, stance and opinion analysis.</p>
      <p>In a field where machine learning is the ubiquitous approach to solve challenges, the definition
of research challenges, the development of high-quality test collections that allow for iterative
evaluation and the design of sound evaluation methodologies and metrics are perhaps the most
critical aspects of research, and we believe IberLEF keeps making significant contributions to
all of them.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work of the first author has been partially supported by Project CONSENSO
(PID2021-122263OB-C21), Project MODERATES (TED2021-130145B-I00) and Project SocialTox
(PDC2022-133146-C21) funded by MCIN/AEI/10.13039/501100011033 and by the European Union
NextGenerationEU/PRTR, Project PRECOM (SUBV-00016) funded by the Ministry of Consumer Affairs
of the Spanish Government, Project FedDAP (PID2020-116118GA-I00) and Project Trust-ReDaS
(PID2020-119478GB-I00) supported by MICINN/AEI/10.13039/501100011033, and the WeLee project
(1380939, FEDER Andalucía 2014-2020) funded by the Andalusian Regional Government. Salud
María Jiménez-Zafra has been partially supported by a grant from Fondo Social Europeo and
the Administration of the Junta de Andalucía (DOC_01073). The work of the second author
has been partially funded by the Pro2Haters - Proactive Profiling of Hate Speech Spreaders
(CDTi IDI-20210776), the XAI-DisInfodemics: eXplainable AI for disinformation and conspiracy
detection during infodemics (MICIN PLEC2021-007681), the OBULEX - OBservatorio del Uso de
Lenguaje sEXista en la red (IVACE IMINOD/2022/106), and the ANDHI - ANomalous Diffusion
of Harmful Information (CPP2021-008994) R&amp;D grants.</p>
      <p>Martín-Valvidia, L. A. Ureña-López, A. Montejo-Ráez, Overview of MentalRiskES at
IberLEF 2023: Early Detection of Mental Disorders Risk in Spanish, Procesamiento del
Lenguaje Natural 71 (2023).
[9] H. Jarquín-Vásquez, D. I. Hernández-Farías, L. J. Arellano, H. J. Escalate, L.
VillaseñorPineda, M. Montes-y Gómez, F. Sanchez-Vega, Overview of DA-VINCIS at IberLEF 2023:
Detection of Aggressive and Violent Incidents from Social Media in Spanish, Procesamiento
del Lenguaje Natural 71 (2023).
[10] L. J. Arellano, H. J. Escalante, L. Villaseñor-Pineda, M. Montes-y Gómez, F. Sanchez-Vega,
Overview of DA-VINCIS at IberLEF 2022: Detection of Aggressive and Violent Incidents
from Social Media in Spanish 69 (2022).
[11] G. Bel-Enguix, H. Gómez-Adorno, G. Sierra, J. Vásquez, S. T. Andersen, S. Ojeda-Trueba,
Overview of HOMO-MEX at Iberlef 2023: Hate speech detection in Online Messages
directed tOwards the MEXican Spanish speaking LGBTQ+ population, Procesamiento del
Lenguaje Natural 71 (2023).
[12] S. M. Jiménez-Zafra, M. García-Cumbreras, D. García-Baena, J. A. García-Díaz, B. R.</p>
      <p>Chakravarthi, R. Valencia-García, L. A. Ureña-López, Overview of HOPE at IberLEF 2023:
Multilingual Hope Speech Detection, Procesamiento del Lenguaje Natural 71 (2023).
[13] R. Labadie Tamayo, B. Chulvi, P. Rosso, Everybody Hurts, Sometimes Overview of
HUrtful HUmour at IberLEF 2023: Detection of Humour Spreading Prejudice in Twitter,
Procesamiento del Lenguaje Natural 71 (2023).
[14] P. Moral, G. Marco, J. Gonzalo, J. Carrillo-de Albornoz, I. Gonzalo-Verdugo, Overview of
DIPROMATS 2023: automatic detection and characterization of propaganda techniques in
messages from diplomats and authorities of world powers, Procesamiento del Lenguaje
Natural 71 (2023).
[15] J. A. García-Díaz, S. M. Jiménez-Zafra, M.-T. Martín-Valdivia, F. García-Sánchez, L. A.</p>
      <p>Ureña-López, R. Valencia-García, Overview of PoliticES at IberLEF 2023: Political Ideology
Detection in Spanish Texts, Procesamiento del Lenguaje Natural 71 (2023).
[16] J. A. García-Díaz, S. M. Jiménez-Zafra, M.-T. Martín-Valdivia, F. García-Sánchez, L. A.</p>
      <p>Ureña-López, R. Valencia-García, Overview of PoliticEs 2022: Spanish Author Profiling for
Political Ideology, Procesamiento del Lenguaje Natural 69 (2022).
[17] J. A. García-Díaz, Almela, F. García-Sánchez, G. Alcaraz-Mármol, M. J. Marín,
R. Valencia-García, Overview of FinancES 2023: Financial Targeted Sentiment Analysis in Spanish,
Procesamiento del Lenguaje Natural 71 (2023).
[18] M. A. Álvarez-Carmona, A. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González,
V. Muñiz-Sánchez, A. P. López-Monroy, F. Sánchez-Vega, L. Bustio-Martínez, Overview
of Rest-Mex at IberLEF 2023: Research on Sentiment Analysis Task for Mexican Tourist
Texts, Procesamiento del Lenguaje Natural 71 (2023).
[19] M. A. Álvarez-Carmona, A. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González,
D. Fajardo-Delgado, R. Guerrero-Rodríguez, L. Bustio-Martínez, Overview of Rest-Mex
at IberLEF 2022: Recommendation System, Sentiment Analysis and Covid Semaphore
Prediction for Mexican Tourist Texts, Procesamiento del Lenguaje Natural 69 (2022).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Sarvazyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franco-Salvador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>De la Iglesia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vivó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chocrón</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>de Maeztu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gojenola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa</surname>
          </string-name>
          ,
          <article-title>Overview of ClinAIS at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Armengol-Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Overview of automatic clinical coding: Annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of CLEF eHealth 2020</article-title>
          ,
          <source>CLEF (Working Notes) 2020</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Briva-Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gasco-Sanchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>MEDDOPLACE Shared Task overview: recognition, normalization and classification of locations and patient movement in clinical texts</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Altuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-E.</given-names>
            <surname>Lidia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Saiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Speranza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karunakaran</surname>
          </string-name>
          ,
          <article-title>Overview of TESTLINK at IberLEF 2023: Linking Results to Clinical Laboratory Tests and Measurements</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Altuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lavelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Minard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Speranza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zanoli</surname>
          </string-name>
          ,
          <article-title>European clinical case corpus, in: European Language Grid: A Language Technology Platform for Multilingual Europe</article-title>
          , Springer International Publishing Cham,
          <year>2022</year>
          , pp.
          <fpage>283</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agüero-Torales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Giménez-Lugo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alvarez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Góngora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          ,
          <article-title>Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching Analysis</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>71</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Mármol-Romero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moreno-Muñoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Plaza-del Arco</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-G. M. Dolores</surname>
          </string-name>
          , M. T.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>