<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Procesamiento del Lenguaje Natural</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of IberLEF 2022: Natural Language Processing Challenges for Spanish and other Iberian Languages</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julio Gonzalo</string-name>
          <email>julio@lsi.uned.es</email>
          <uri>https://nlp.uned.es/</uri>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Montes-y-Gómez</string-name>
          <email>mmontesg@inaoep.mx</email>
          <uri>https://ccc.inaoep.mx/~mmontesg/</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <email>kico.rangel@gmail.com</email>
          <uri>https://kicorangel.com/</uri>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Institute of Astrophysics, Optics and Electronics (INAOE)</institution>
          ,
          <addr-line>Puebla</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Symanto Research</institution>
          ,
          <addr-line>Valencia</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>nlp.uned.es, ETSI Informática de la UNED</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>69</volume>
      <abstract>
        <p>IberLEF is a comparative evaluation campaign for Natural Language Processing systems in Spanish and other Iberian languages. Its goal is to encourage the research community to organize competitive text processing, understanding and generation tasks in order to define new research challenges and set new state-of-the-art results in those languages. This paper summarizes the evaluation activities carried out in IberLEF 2022, which included 10 tasks and 19 subtasks dealing with sentiment, stance and opinion analysis, detection and categorization of harmful content, Information Extraction, Paraphrase Identification, and Question Answering. Overall, IberLEF activities were a remarkable collective effort involving 310 researchers from 24 countries in Europe, Asia, Africa, Australia and the Americas.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Evaluation Challenges</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this paper we summarize the activities carried out in IberLEF 2022, extracting some
aggregated figures for a better understanding of this collective effort.</p>
    </sec>
    <sec id="sec-2">
      <title>2. IberLEF 2022 Tasks</title>
      <p>These are the ten tasks successfully run in 2022, grouped thematically:</p>
      <sec id="sec-2-1">
        <title>2.1. Sentiment, Stance and Opinions</title>
        <p>
          ABSAPT [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is an aspect-based sentiment analysis task in Portuguese, which used TripAdvisor
reviews as target texts. It included (i) a subtask on aspect term extraction, devoted to the
identification of aspects in reviews, and (ii) a subtask on identifying the sentiment orientation
(polarity) towards a single aspect mentioned in the review.
        </p>
        <p>
          PoliticES [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is an author profiling task on Twitter accounts of Spanish politicians and
political journalists, where systems must extract the gender, profession and political spectrum of
each profile.
        </p>
        <p>
          Rest-Mex [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is a task that works with Mexican Tourist Texts, and addresses three problems:
(i) a recommendation subtask where, given a TripAdvisor user and a Mexican tourist destination,
the system must predict the degree of satisfaction (1-5) that the user will have when visiting the
destination; (ii) a sentiment analysis task where the system must predict the polarity (1-5) of a
given TripAdvisor review, and also the type of destination (hotel, restaurant, attraction); (iii)
an epidemiological semaphore prediction task where, given COVID-related news from a Mexican
region, systems must predict the semaphore color at weeks 0, 2, 4 and 8 in the future.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Harmful Content</title>
        <p>
          DA-VINCIS [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is a task where systems must detect and classify tweets (in Spanish) that report
violent incidents. It included two subtasks; the first one is a binary classification task in which
users had to determine whether tweets were associated to a violent incident or not, and the
second one is a multi-label classification task in which the category of the violent incident
should be spotted.
        </p>
        <p>
          DETESTS [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a task where systems must detect and classify racial stereotypes in comments
to online news articles written in Spanish. Subtask 1 is stereotype detection, and systems must
identify whether the comment contains at least one stereotype or not. Manual annotations are
handled following the learning with disagreement paradigm, where there is not necessarily
a single correct label for every example in the dataset. Subtask 2 is a multi-label hierarchical
classification problem where systems must detect and classify stereotypes according to this set
of categories: victims of xenophobia, suffering victims, economic resources, migration control,
cultural and religious differences, people who take “benefits” from our social policy, public
health problems, security threat, dehumanization, and other.
        </p>
        <p>
          EXIST [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is a task where systems must detect and classify sexist content in Spanish and
English tweets and Gab posts. Task 1 is about the identification of sexism-related content: a
message is positive if it is sexist itself, describes a sexist situation or criticizes sexist
behavior. Task 2 is about sexism categorization: once a message has been classified as sexist,
systems must classify it into one of the following categories: ideological and inequality,
stereotyping and dominance, objectification, sexual violence, and misogyny and non-sexual violence.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Information Extraction and Paraphrase Identification</title>
        <p>
          LivingNER [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a task on named entity recognition, normalization and classification of species,
pathogens and food. Source texts are medical documents (case reports) annotated by medical
experts using the NCBI taxonomy. In Task 1, the LivingNER-Species NER track, systems must
find all mentions of (human or non-human) species, such as “hepatitis B”, “virus
herpes simple”, “paciente”. In Task 2, the LivingNER-Species Norm track, systems have to retrieve
all species mentions together with their corresponding NCBI taxonomy concept identifiers.
And in Task 3, the LivingNER-Clinical Impact track, for each text systems must (i) detect whether
the text contains information relevant to high-impact real-world clinical use cases; (ii) retrieve
the list of NCBI taxonomy identifiers that support such detections; and (iii) categorize the
documents along the following information axes: pets and farm animals, animals causing injuries,
food species, and nosocomial entities.
        </p>
        <p>
          PAR-MEX [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a paraphrase identification task: systems must perform sentence-level
paraphrase identification in Mexican Spanish food-related texts, which were manually generated
from an original set of texts using literary creation, low paraphrase, high paraphrase and no
paraphrase methods.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Question Answering and Machine Reading</title>
        <p>
          QuALES [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a Question Answering task where answers must be extracted from news articles
written in Spanish. The input is a question and a news article, and the system
must find the shortest span of text in the article (if there is any) that answers the question.
Most questions (but not all) in the dataset deal with COVID-19 issues.
        </p>
        <p>
          ReCoRES [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] is a Reading Comprehension and Reasoning Explanation task for Spanish.
Given a passage and a question about its content, Reading Comprehension systems must (1)
select the correct answer from a given set of candidates (multiple-choice task); and (2) provide
an explanation for why a given candidate was chosen as the answer (reasoning explanation). Texts
in this dataset are based on university entrance examinations, and explanations are evaluated
both with automatic similarity estimates against manual reference explanations
and with manual assessments of their accuracy, fluency and readability.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Aggregated Analysis of IberLEF 2022 Tasks</title>
      <sec id="sec-3-1">
        <title>3.1. Tasks characterization</title>
        <p>In terms of languages, the distribution per task (including subtasks) is shown in Figure 1.
Once again, Spanish is the central language of IberLEF (17 tasks), with Portuguese and
English in secondary roles (2 tasks each). The main Spanish variants considered are those of
Spain, Mexico, Uruguay and Peru.</p>
        <p>In terms of abstract task types, the distribution of tasks can be seen in Figure 2. Out of
a total of 19 tasks (each subtask is counted as a task here), the most popular type of task is
multi-class classification (7 tasks), followed by sequence tagging and binary classification (4
each). There are also two ordinal classification tasks, two regression tasks, two KB linking
tasks (one on entity linking and another one on taxonomy linking), two answer extraction
tasks (one is multiple choice, which is also counted as classification, and the other one is span
selection, which we also count as sequence tagging) and one text generation task. Interestingly,
in 2022 there are four complex tasks which involve solving more than one core task at once (for
instance, sequence tagging plus entity linking).</p>
        <p>Compared with 2021, the trend is towards a smaller (19 vs. 29) but more diverse and
more complex set of tasks, where binary classification is no longer the most popular type of
task and several tasks require solving multiple NLP problems at the same time.</p>
        <p>
          In terms of evaluation metrics, the distribution can be seen in Figure 3, which depicts only
the main metric used to rank systems in each task. As in previous years, there is a remarkable
predominance of F1 (11 tasks), even when it does not perfectly match the problem considered.
Accuracy is used by three tasks, MAE by two regression tasks, and six other metrics are each
used in only one task. Some of these correspond to the complex tasks which embed subtasks
(e.g., the mean of F1 scores over several subtasks in one case, the mean of inverse MAE and F1
scores for different subtasks in another, and a weighted average of F1 measures at different
future time points, with weights according to time distance, in a third). The rest are Average
Exact Match [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for a QA task, BERTScore [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to compare system and gold standard explanations,
and ICM [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] for a hierarchical classification task.
        </p>
        <p>Overall, in IberLEF, as in other competitive NLP evaluation challenges, we might still be
relying too much on averages to combine different quality metrics: it has been common this
year to combine F1 measures (which are themselves harmonic means) with other measures using some
other form of averaging. This hides the actual behaviour of systems and usually gives no clues
on how to improve them. Also, again in 2022 the choice of metrics is, in general, barely justified,
particularly in terms of how the system output is going to be used in realistic usage scenarios.</p>
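        <p>As a minimal illustration of the point above, the following sketch (with hypothetical numbers, not taken from any IberLEF task) shows why arithmetically averaging F1, itself a harmonic mean of precision and recall, with other metrics can hide system behaviour:</p>

```python
# Hypothetical sketch: F1 is the harmonic mean of precision and recall,
# so combining it with other metrics via an arithmetic mean mixes two
# different kinds of average. All numbers below are illustrative only.

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Two systems with the same arithmetic mean of precision and recall (0.5):
unbalanced = f1(0.9, 0.1)  # the harmonic mean punishes imbalance (0.18)
balanced = f1(0.5, 0.5)    # a balanced system scores 0.50

# Arithmetically averaging F1 with another metric (e.g. accuracy on a
# different subtask) obscures which component drives the final score:
other_metric = 0.8
combined = (unbalanced + other_metric) / 2  # roughly 0.49
print(unbalanced, balanced, combined)
```

A single combined figure close to 0.5 gives no hint that one of its components comes from a severely unbalanced system, which is the kind of information participants would need in order to improve.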
        <p>Finally, in terms of novelty/stability, IberLEF 2022 has brought many new problems, with
seven out of the 10 primary tasks being new this year. Only Rest-Mex, EXIST and DETESTS
had also been run in 2021.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Datasets and results</title>
        <p>In terms of types of textual sources, Figure 4 shows how they are used in IberLEF 2022 tasks.
There is more diversity than in previous years, with Twitter being less dominant: TripAdvisor
reviews were used in 5 tasks, Twitter in 4, clinical cases in 3 subtasks (all belonging to the same
task), exams, news comments and Gab posts were used in two subtasks each, and finally news and
gastronomy texts were used in one task each.</p>
        <p>In terms of dataset sizes and annotation efforts, it is difficult to establish fair comparisons,
because of the diversity of text sizes and the wide variance in annotation difficulty.
In any case, in the majority of cases (14 tasks) the manually annotated datasets were below 6,000
instances. Two other tasks provided annotated collections of between 10,000 and
15,000 instances, and one task provided over 40,000 annotated instances.</p>
        <p>As for the reliability of the annotations, one useful indicator is inter-annotator agreement,
which is reported in 9 out of 19 tasks. In the tasks where it is reported, annotator agreement is
high in three cases and mid-low in another six. In general, mid-low agreement indicates the
complexity of the task rather than poor annotation guidelines.</p>
        <p>Overall, the annotation effort in IberLEF 2022 remains a remarkable contribution to enlarging
test collections for Spanish (and, less prominently, other languages). Once again, IberLEF
has been carried out without specific funding sources (other than those obtained individually
by the teams organizing and participating in the tasks). A centralized funding scheme could
certainly help reach larger and better annotations in IberLEF as a whole.</p>
        <p>In terms of progress with respect to the state of the art, it is, as usual, difficult to extract
aggregated conclusions for the whole IberLEF effort, in particular given the diversity of
approaches to providing task baselines: in five tasks, no baseline was provided. In three, only
a trivial baseline was included in the comparisons (e.g. majority-class or random baselines in
classification). Four tasks used an SVM as baseline, and five used some variant of transformers
(BETO in two cases, BERT in another two and T5 in one). Only two used other types of
baselines.</p>
        <p>In the tasks that used baselines, the baseline was beaten (by a margin larger than 5%) by the
best system in eight cases. In two cases, the difference was below 5% (one in favour of the best
system, the other in favour of the baseline), and in the last two tasks, the baseline was better
than any system. This indicates that at least some of the tasks remain challenging.</p>
        <p>In Figure 5 we display a pairwise comparison between the best system and the best baseline
for each of the tasks where at least one baseline is provided, with respect to the official
ranking metric used in each task. To avoid confusion, we have restricted the chart to tasks
where the official metric varies between 0 (worst quality) and 1 (perfect output).</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Participation</title>
        <p>Given that IberLEF 2022 was not a funded initiative, participation has again been impressive,
with a large fraction of the research groups currently interested in NLP for Spanish organizing
and/or participating in one or more tasks. Overall, 310 researchers representing 169 research groups
from 24 countries in Europe, Asia, Africa, Australia and the Americas were involved in IberLEF
tasks. Note that these statistics have been compiled from the submitted working notes, which means
two things: (i) some groups and researchers may be counted twice if they participated in more than
one task; (ii) real participation may be higher, since teams that submitted runs but did not submit
their working notes afterwards are not counted in the statistics.</p>
        <p>Figure 6 shows the distribution of research groups per country. This year, Mexico has the
largest representation, with 54 groups, followed by Spain with 44 groups (note that the figures
reporting participation do not collapse duplicates: a group or a researcher participating in two
tasks is counted twice).</p>
        <p>Figure 7 shows the distribution of researchers (appearing as authors in the working notes)
per country. The numbers are largely consistent with the distribution of groups per country,
with some flips between the USA and Brazil, or China, Chile and Vietnam. The top five, namely
Mexico, Spain, Brazil, the USA and China, represent roughly 80% of the researchers involved.
The fact that there are two non-Spanish-, non-Portuguese-speaking countries in the top five,
China and the USA, as well as others such as Vietnam or Canada in the top positions in terms
of participation, indicates two things: first, that Spanish attracts the attention of the NLP
community at large; and second, that current NLP technologies enable addressing different
languages without language-specific machinery, other than pre-trained language models made
available to the research community.</p>
        <p>The distribution of research groups per task is shown in Figure 8. Participation ranges
between 3 and 36 groups. As in other evaluation initiatives, participation seems to be driven
not only by the intrinsic interest of the task, but also by the cost of entry: as usual,
classification tasks (the most basic machine learning task, for which more plug-and-play software
packages exist) receive more participation than tasks that require more elaborate approaches and
more creativity to assemble algorithmic solutions.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>In its third edition, IberLEF has again been a remarkable collective effort for the advancement of
Natural Language Processing in Spanish and other Iberian languages, comprising 10 main tasks
and involving 310 researchers from institutions in 24 countries in Europe, Asia, Africa, Australia
and the Americas. IberLEF 2022 has been one of the most diverse editions in terms of types of tasks
and application domains, and has contributed to advancing the field in the areas of sentiment, stance
and opinion analysis, detection and categorization of harmful content, Information Extraction,
Answer Extraction, and Paraphrase Identification. In a field where machine learning is the
ubiquitous approach to solving challenges, the definition of research challenges, the development
of high-quality test collections that allow for iterative evaluation, and the design of sound
evaluation methodologies and metrics are perhaps the most critical aspects of research, and we
believe IberLEF keeps making significant contributions to all of them.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Spanish Ministry of Science and Innovation,
Project FairTransNLP (PID2021-124361OB-C32), and by CONACyT-México, Project
CB-201501-257383. The work of the third author has been partially funded by CDTI under grant
IDI-20210776, IVACE under grant IMINOD/2021/72, and grant PLEC2021-007681 funded by
MCIN/AEI/10.13039/501100011033 and by European Union NextGenerationEU/PRTR.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] F. L. V. da Silva, G. d. S. Xavier, H. M. Mensenburg, R. F. Rodrigues, L. P. dos Santos, R. M. Araújo, U. B. Corrêa, L. A. de Freitas,
          <article-title>ABSAPT 2022 at IberLEF: Overview of the Task on Aspect-Based Sentiment Analysis in Portuguese</article-title>,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          , M.-
          <string-name>
            <given-names>T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>García-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Ureña-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          , Overview of PoliticEs 2022:
          <article-title>Spanish Author Profiling for Political Ideology</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] M. Á. Álvarez Carmona, Á. Díaz-Pacheco, R. Aranda, A. Y. Rodríguez-González, D. Fajardo-Delgado, R. Guerrero-Rodríguez, L. Bustio-Martínez,
          <article-title>Overview of Rest-Mex at IberLEF 2022: Recommendation System, Sentiment Analysis and Covid Semaphore Prediction for Mexican Tourist Texts</article-title>,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Arellano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Villaseñor-Pineda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes-y Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sanchez-Vega</surname>
          </string-name>
          ,
          <article-title>Overview of DA-VINCIS at IberLEF 2022: Detection of Aggressive and Violent Incidents from Social Media in Spanish</article-title>,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (<year>2022</year>).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ariza-Casabona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. S.</given-names>
            <surname>Schmeisser-Nieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nofre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , Overview of DETESTS at IberLEF 2022:
          <article-title>DETEction and classification of racial STereotypes in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rodríguez-Sánchez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendieta-Aragón</surname>
          </string-name>
          , G. Marco Remón, M. Makeienko,
          <string-name>
            <given-names>M.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <article-title>Overview of EXIST 2022: sEXism Identification in Social neTworks</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miranda-Escalada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Farré-Maduell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lima-López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Estrada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gascó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krallinger</surname>
          </string-name>
          ,
          <article-title>Mention detection, normalization and classification of species, pathogens, humans and food in clinical documents: Overview of the LivingNER shared task and resources</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bel-Enguix</surname>
          </string-name>
          , G. Sierra,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gómez-Adorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Torres-Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-G.</given-names>
            <surname>Ortiz-Barajas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vásquez</surname>
          </string-name>
          ,
          <article-title>Overview of PAR-MEX at Iberlef 2022: Paraphrase Detection in Spanish Shared Task</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosá</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bouza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dragonetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Etcheverry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Góngora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goycoechea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moncecchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Prada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wonsever</surname>
          </string-name>
          ,
          <article-title>Overview of QuALES at IberLEF 2022: Question Answering Learning from Examples in Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Sobrevilla Cabezudo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Diestra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>López</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gómez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oncevay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alva-Manchego</surname>
          </string-name>
          ,
          <article-title>Overview of ReCoRES at IberLEF 2022: Reading Comprehension and Reasoning Explanation for Spanish</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lopyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <article-title>SQuAD: 100,000+ Questions for Machine Comprehension of Text</article-title>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/abs/1606.05250. doi:10.48550/ARXIV.1606.05250.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kishore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. Q.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Artzi</surname>
          </string-name>
          ,
          <article-title>BERTScore: Evaluating Text Generation with BERT</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1904.09675. doi:10.48550/ARXIV.1904.09675.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Delgado</surname>
          </string-name>
          ,
          <article-title>Evaluating Extreme Hierarchical Multi-label Classification</article-title>
          , in:
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          , pp.
          <fpage>5809</fpage>
          -
          <lpage>5819</lpage>
          . URL: https://aclanthology.org/2022.acl-long.399. doi:10.18653/v1/2022.acl-long.399.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>