<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ELiRF-UPV at TA1C-IberLEF 2025: Clickbait Detection in Spanish</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alberto Picazo Pardo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vicent Ahuir</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>María José Castro-Bleda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Systems and Computation, Universitat Politècnica de València</institution>
          ,
          <addr-line>Camí de Vera s/n, València, 46020</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>VRAIN: Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València</institution>
          ,
          <addr-line>Camí de Vera s/n, València, 46020</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ValgrAI: Valencian Graduate School and Research Network of Artificial Intelligence, Universitat Politècnica de València</institution>
          ,
          <addr-line>Camí de Vera s/n, València, 46020</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This paper presents our participation in the TA1C: Clickbait Detection and Spoiling in Spanish task at IberLEF 2025, which focuses on the automatic identification and mitigation of clickbait in Spanish-language online news shared on social media. The task comprises two subtasks: (1) a classification task, where the system must determine whether a given tweet constitutes clickbait, and (2) a generative task, in which the system, given a teaser (tweet and headline) along with the corresponding news article, must generate a concise spoiler that resolves the information gap and satisfies the curiosity induced by the teaser. The dataset includes Spanish tweets, headlines, and full news articles, annotated to support both classification and generation objectives. Our team participated in the first subtask, addressing the challenge of clickbait detection by developing several systems based on pre-trained transformer-based language models and applying fine-tuning strategies to improve prediction quality. The results confirm the effectiveness of our approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Transformer-based Models</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Clickbait Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Clickbait is a prevalent phenomenon in online news media. It refers to the use of sensational,
ambiguous, or deliberately incomplete headlines and teasers intended to provoke curiosity and drive user
engagement, often at the expense of informativeness. Although its primary function is to increase
traffic, clickbait often contributes to the spread of low-quality or misleading information, eroding trust
in digital journalism.</p>
      <p>The growing use of clickbait is not limited to soft news or dubious sources; increasingly, even
reputable media outlets employ clickbait strategies for high-impact stories. The need for automatic
tools to detect and mitigate the influence of clickbait content has become critical, not only from a
technological standpoint but also from an ethical and communicative perspective.</p>
      <p>Although early efforts in clickbait detection focused primarily on English, recent research has begun
to address this issue in other languages, including Spanish. However, Spanish remains underexplored
in terms of large-scale, annotated datasets, and shared evaluation frameworks.</p>
      <p>
        This work presents our contribution to the shared task TA1C ("Te Ahorré Un Click") at IberLEF
2025 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the first shared evaluation campaign specifically focused on clickbait detection and spoiling
in Spanish [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The goal is twofold: (1) to classify whether a teaser (e.g., tweet + title) constitutes
clickbait, and (2) to generate a spoiler that fills the curiosity gap, providing readers with the missing key
information. These tasks are relevant not only for the Natural Language Processing (NLP) community
but also for researchers in digital communication and journalism studies.
      </p>
      <p>The remainder of the paper is structured as follows. In Sections 2 and 3, we introduce the task and
describe the dataset and evaluation metrics. Section 4 presents our clickbait classification systems, and
Section 5 reports and discusses the experimental results. Finally, Section 6 concludes the paper and
discusses future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Task Description: Clickbait Detection</title>
      <p>The TA1C shared task consists of two subtasks that address the identification and mitigation of clickbait
in Spanish-language online news shared on social networks. The dataset includes tweets, news headlines
and full news articles, annotated to support both classification and generation objectives. We have
experimented with the first subtask, clickbait detection.</p>
      <p>
        The task is a binary classification problem: Given a teaser consisting of a tweet and the corresponding
headline of the news article, the objective is to determine whether the content qualifies as clickbait. The
annotation relies on the definition proposed by Mordecki et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is grounded in Loewenstein’s
information gap theory [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]: clickbait deliberately omits crucial information to provoke curiosity and
entice users to click. Systems must predict a binary label (clickbait / non-clickbait) for each instance.
Performance will be evaluated using standard classification metrics: F1-score (primary metric), Accuracy,
Precision and Recall.
      </p>
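      <p>For reference, these metrics can be computed with scikit-learn; the snippet below is an illustrative sketch with made-up label vectors, not part of the official evaluation code.</p>
      <preformat>
# Illustrative sketch: computing the task's evaluation metrics with scikit-learn.
# y_true and y_pred are hypothetical label vectors (1 = clickbait, 0 = non-clickbait).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)
print(f"F1: {f1:.4f}  Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision:.4f}  Recall: {recall:.4f}")
      </preformat>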
    </sec>
    <sec id="sec-3">
      <title>3. The Dataset</title>
      <p>
        The dataset for both tasks consists of Spanish-language tweets that link to online news articles. Data
were curated for the TA1C shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], with the aim of capturing linguistic diversity across the
Spanish-speaking world.
      </p>
      <p>The dataset for Subtask 1 includes 4200 tweets collected between October 2020 and October 2021
from 18 well-known media outlets in 12 Spanish-speaking countries. Each tweet is accompanied by the
URL and clean HTML of the linked article, the headline, subheadline, article body (cleaned), images and
captions, and embedded external links.</p>
      <p>Each tweet was independently labeled by three human annotators to determine whether it is clickbait.
The 4200 samples of the dataset are split as shown in Table 1. There is a noticeable imbalance towards
non-clickbait samples in the training and development sets. The distribution of clickbait and
non-clickbait samples in the test partition cannot be reported, since participants did not have access to the
test labels. Examples of clickbait and non-clickbait samples are provided in Table 2.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Systems for Clickbait Detection</title>
      <p>
        We developed six systems for the Clickbait Detection task, each exploring different pre-training and
data augmentation strategies based on the Spanish-language RoBERTa model pre-trained with data
from the National Library of Spain (roberta-base-bne) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] publicly available on the HuggingFace
hub [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The objective was to evaluate the impact of these techniques on classification performance
while ensuring comparability across systems.
      </p>
      <p>All systems were trained with the same hyperparameters: 5 epochs, a learning rate of 3 × 10⁻⁵,
batch size of 16, 50 warmup steps, and a weight decay of 0.01. The TA1C training split (2800 examples)
was used as the training data, and the development set (700 examples) was used for validation, unless
otherwise stated. The following configurations were explored.</p>
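      <p>As an illustration, this shared configuration corresponds to the following HuggingFace TrainingArguments; this is a sketch of the setup described above, not the authors' released code, and the output directory name is hypothetical.</p>
      <preformat>
from transformers import TrainingArguments

# Shared hyperparameters for all six systems, as stated above.
training_args = TrainingArguments(
    output_dir="./clickbait-roberta",  # hypothetical path
    num_train_epochs=5,                # 5 epochs
    learning_rate=3e-5,                # 3 × 10⁻⁵
    per_device_train_batch_size=16,    # batch size of 16
    warmup_steps=50,                   # 50 warmup steps
    weight_decay=0.01,                 # weight decay of 0.01
    eval_strategy="epoch",             # named "evaluation_strategy" in older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,       # select the best checkpoint on the development set
)
      </preformat>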
      <sec id="sec-4-1">
        <title>System T1-1: Baseline Fine-tuning</title>
        <p>This system serves as the reference point for evaluating the effectiveness of subsequent strategies. The
roberta-base-bne model was fine-tuned directly on the TA1C training set. The development set
was used to monitor validation performance and select the best checkpoint.</p>
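        <p>A minimal sketch of this baseline follows, assuming the TA1C splits are loaded into HuggingFace Dataset objects; the tiny inline examples and the "text"/"label" column names are illustrative, not the official field names.</p>
        <preformat>
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

MODEL = "PlanTL-GOB-ES/roberta-base-bne"  # hub id of roberta-base-bne
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Toy stand-ins for the TA1C training (2800 examples) and development (700) splits.
train_ds = Dataset.from_dict({"text": ["No creerás lo que pasó después...",
                                       "El IPC subió un 3% en mayo"],
                              "label": [1, 0]})
dev_ds = Dataset.from_dict({"text": ["Esto cambiará tu vida"], "label": [1]})

def tokenize(batch):
    # Tweet and headline are concatenated into a single input sequence.
    return tokenizer(batch["text"], truncation=True, max_length=256)

train_ds = train_ds.map(tokenize, batched=True)
dev_ds = dev_ds.map(tokenize, batched=True)

trainer = Trainer(model=model, args=training_args,  # args as sketched above
                  train_dataset=train_ds, eval_dataset=dev_ds,
                  tokenizer=tokenizer)              # enables padding via the default collator
trainer.train()
      </preformat>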
      </sec>
      <sec id="sec-4-2">
        <title>System T1-2: Two-phase Pre-training with Translated Tweets</title>
        <p>
          To enrich the model with more diverse examples of clickbait-like language, we carried out an
intermediate pre-training phase using 20,000 English tweets that were machine-translated into Spanish.
The 20,000 tweets were randomly sampled from the dataset at
https://huggingface.co/datasets/christinacdl/clickbait_detection_dataset. Translation was performed using
the Helsinki-NLP English-Spanish translation model (https://huggingface.co/Helsinki-NLP/opus-mt-en-es) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This pre-training phase was carried out with the same hyperparameters as the fine-tuning, and was
followed by fine-tuning on the TA1C training set. The development set was used in both phases for early
stopping and evaluation.
        </p>
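        <p>A sketch of the translation step, using the transformers pipeline API with the Helsinki-NLP model named above; the example tweets are invented, and the sampling of the 20,000 tweets and the intermediate pre-training run itself are omitted.</p>
        <preformat>
from transformers import pipeline

# English-to-Spanish machine translation with the model cited in the text.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

english_tweets = [
    "You won't believe what this actor did next!",
    "10 tricks your doctor doesn't want you to know",
]
spanish_tweets = [out["translation_text"]
                  for out in translator(english_tweets, max_length=128)]
print(spanish_tweets)
      </preformat>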
      </sec>
      <sec id="sec-4-3">
        <title>System T1-3: Backtranslation-based Augmentation</title>
        <p>
          A data augmentation method was applied using multilingual backtranslation [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. We generated 1189
synthetic samples from the TA1C training set using a multi-step backtranslation pipeline: Spanish →
German → French → Polish → Spanish, using the Helsinki-NLP translation models [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We retained only
those samples with a word-level distance between 5 and 35 words relative to their original counterparts,
to balance diversity and fidelity. Erroneous or low-quality translations were removed manually. These
synthetic samples were then concatenated with the original training data to form an augmented dataset
of 3989 examples.
        </p>
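        <p>A minimal sketch of the backtranslation chain and the word-level distance filter follows. The opus-mt model identifiers are assumptions following the Helsinki-NLP naming convention and should be verified on the hub, and the distance function is one plausible reading of the filter, since the exact metric is not specified in the text.</p>
        <preformat>
import difflib
from transformers import pipeline

# Spanish → German → French → Polish → Spanish, as described above.
# Model ids assumed from the Helsinki-NLP opus-mt naming scheme.
CHAIN = ["Helsinki-NLP/opus-mt-es-de", "Helsinki-NLP/opus-mt-de-fr",
         "Helsinki-NLP/opus-mt-fr-pl", "Helsinki-NLP/opus-mt-pl-es"]
translators = [pipeline("translation", model=m) for m in CHAIN]

def backtranslate(text):
    for step in translators:
        text = step(text, max_length=128)[0]["translation_text"]
    return text

def word_distance(a, b):
    # Number of non-matching words under a sequence diff (an assumption;
    # the paper does not specify the exact word-level distance used).
    sm = difflib.SequenceMatcher(a=a.split(), b=b.split())
    matched = sum(block.size for block in sm.get_matching_blocks())
    return max(len(a.split()), len(b.split())) - matched

original = "No vas a creer lo que este actor hizo después del partido"
synthetic = backtranslate(original)
if word_distance(original, synthetic) in range(5, 36):  # keep distances of 5-35 words
    print("keep:", synthetic)
      </preformat>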
      </sec>
      <sec id="sec-4-4">
        <title>System T1-4: Random Masking Augmentation</title>
        <p>
          In this system, we applied a random masking strategy to simulate Masked Language Model (MLM)
noise. We generated five augmented versions of the TA1C training set by randomly masking 15% of the
tokens in each sentence. After concatenation and shuffling, the final dataset contained approximately
14,000 augmented samples. This noisy training data was used to fine-tune the roberta-base-bne
model, again validated on the TA1C development set.
        </p>
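        <p>A sketch of the masking step, masking whole whitespace-separated words for simplicity; the exact tokenization granularity used for masking is not specified in the text and is an assumption here.</p>
        <preformat>
import random
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PlanTL-GOB-ES/roberta-base-bne")

def random_mask(text, ratio=0.15):
    # Replace roughly 15% of the words with the model's mask token.
    words = text.split()
    n_mask = max(1, int(len(words) * ratio))
    for i in random.sample(range(len(words)), n_mask):
        words[i] = tokenizer.mask_token
    return " ".join(words)

# Five independently masked copies per training sentence, as described above.
augmented = [random_mask("No vas a creer lo que pasó después en el estadio")
             for _ in range(5)]
print(augmented)
      </preformat>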
      </sec>
      <sec id="sec-4-5">
        <title>System T1-5: NER Masking Augmentation</title>
        <p>
          As an additional strategy to enhance the model’s generalization capabilities, Named Entity Recognition
(NER) masking was applied using spaCy [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] with the es_core_news_lg model. This approach
consists of replacing detected named entities (such as organizations, products, or locations) with their
corresponding generic labels (e.g., &lt;ORG&gt;, &lt;PRODUCT&gt;, &lt;LOC&gt;), aiming to reduce the model’s reliance
on specific entity names and encourage a greater focus on linguistic context. Fine-tuning was then
done on the masked texts.
        </p>
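        <p>A minimal sketch of the entity masking, assuming spaCy and the es_core_news_lg model are installed (python -m spacy download es_core_news_lg); note that the Spanish spaCy models emit labels such as PER, LOC, ORG, and MISC.</p>
        <preformat>
import spacy

nlp = spacy.load("es_core_news_lg")

def mask_entities(text):
    # Replace each detected entity span with its generic label, e.g. &lt;ORG&gt;.
    doc = nlp(text)
    out, last = [], 0
    for ent in doc.ents:
        out.append(text[last:ent.start_char])
        out.append(f"&lt;{ent.label_}&gt;")
        last = ent.end_char
    out.append(text[last:])
    return "".join(out)

print(mask_entities("El Real Madrid sorprendió a todos en Valencia"))
# Expected (roughly): "El &lt;ORG&gt; sorprendió a todos en &lt;LOC&gt;"
      </preformat>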
      </sec>
      <sec id="sec-4-5">
        <title>System T1-6: Fine-tuning Using All Available Data</title>
        <p>This system was trained on the union of the training and development sets (3500 tweets in total),
leaving no held-out data for validation. The roberta-base-bne model was fine-tuned directly on these
3500 tweets. No early stopping or checkpoint selection was applied; the final model corresponds to the
last training epoch. All other training parameters were unchanged.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results and Discussion</title>
      <p>This section summarizes the performance of the systems on both the TA1C development set and the
official blind test set provided for the shared task. All systems were evaluated using the F1-score as the
primary metric.</p>
      <p>Table 3 presents the F1 results. System T1-3, which incorporated backtranslation-based augmentation,
achieved the highest F1-score on the test set (0.81481), ranking third overall on the shared task
leaderboard. Notably, although System T1-1 (baseline fine-tuning) already achieved strong
performance, augmenting the training data with synthetic samples from multilingual backtranslation
in T1-3 offered a meaningful improvement of more than two points in the test F1-score.</p>
      <p>System T1-2 (two-phase pre-training) and T1-4 (random masking) showed moderate performance
compared to the baseline in the development set, and did not surpass T1-3 on the test set. These findings
suggest that while pre-training on loosely aligned translated data or introducing random masking
noise may help generalization, carefully filtered backtranslated samples better preserve task-relevant
semantics and are more effective for low-resource domain adaptation in Spanish clickbait detection.</p>
      <p>System T1-5 (NER masking) showed solid generalization performance, achieving an F1-score of
0.8451 on the validation set and 0.7647 on the test set. Despite a drop from development to test, the
results suggest that abstracting entities can still retain useful semantic patterns and may help reduce
overfitting.</p>
      <p>Finally, System T1-6, trained on all available data, achieved a strong test F1 (0.80342), showing the
benefit of leveraging the full dataset at the expense of validation-based checkpoint selection.</p>
      <p>Fig. 1 visually summarizes the results from Table 3, with systems ordered by test performance.
System T1-3 (Backtranslation Augmentation) achieved the highest F1 on the test set (0.81481),
confirming the effectiveness of multilingual backtranslation in generating useful training samples. System
T1-6, which leveraged all available data (training + validation), also performed well on the test set
(0.80342). The baseline system T1-1 already showed strong performance, suggesting that the pre-trained
roberta-base-bne model is well-suited for the task even without additional augmentation. Systems
T1-4 (Random Masking) and T1-2 (Two-phase Pre-training) yielded slightly lower test scores, indicating
limited benefits from these strategies in this context. Finally, System T1-5 (NER Masking) had the lowest
test F1-score (0.76471), but still demonstrated respectable generalization, highlighting its potential to
reduce overfitting by abstracting entity-specific information. Overall, the figure underscores that not
all augmentation techniques are equally effective, with carefully designed backtranslation yielding the
most significant gains.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Work</title>
      <p>
        This study explored several approaches to clickbait detection in Spanish within the TA1C competition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
We proposed a transformer-based strategy centered on fine-tuning a pre-trained RoBERTa model
adapted to Spanish, using the competition’s dataset, external clickbait-related corpora, and multiple
data augmentation techniques.
      </p>
      <p>Among the techniques evaluated, backtranslation proved particularly effective, significantly
enhancing system robustness and contributing to the highest test F1-score (0.81481) across all our models,
achieving third place in the competition. While other strategies, such as random masking or NER-based
masking, showed more modest gains, they still offer potential for reducing overfitting and warrant
further investigation.</p>
      <p>Future work will focus on experimenting with larger language models and more sophisticated
data augmentation strategies, aiming to further improve performance and generalization in clickbait
detection tasks.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Ethics Statement</title>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work is partially supported by MCIN/AEI/10.13039/501100011033 and "ERDF A way of making
Europe" under grant PID2021-126061OB-C41. It is also partially supported by the Spanish Ministerio de
Universidades under the grant FPU21/05288 for university teacher training.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT for grammar and spelling
checking. After using this tool, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Á</surname>
          </string-name>
          .
          <string-name>
            <surname>González-Barba</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chiruzzo</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <string-name>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <article-title>Overview of IberLEF 2025: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2025), co-located with the 41st Conference of the Spanish Society for Natural Language Processing (SEPLN 2025), CEUR-WS</article-title>
          . org,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mordecki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chiruzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Laguna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosá</surname>
          </string-name>
          , I. Sastre, G. Moncecchi, Overview of TA1C at IberLEF 2025:
          <article-title>Detecting and Spoiling Clickbait in Spanish-Language News</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>75</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mordecki</surname>
          </string-name>
          , G. Moncecchi,
          <string-name>
            <given-names>J.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>Te Ahorré Un Click: A Revised Definition of Clickbait and Detection in Spanish News</article-title>
          ,
          <source>in: Proceedings of Iberamia</source>
          <year>2024</year>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Loewenstein</surname>
          </string-name>
          ,
          <article-title>The Psychology of Curiosity: A Review and Reinterpretation</article-title>
          ,
          <source>Psychological Bulletin</source>
          <volume>116</volume>
          (
          <year>1994</year>
          )
          <fpage>75</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Fandiño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Estapé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pàmies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Palao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Ocampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Carrino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Oller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Penagos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Villegas</surname>
          </string-name>
          ,
          <article-title>Maria: Spanish language models</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>68</volume>
          (
          <year>2022</year>
          ). URL: https://upcommons.upc.edu/handle/2117/367156#.YyMTB4X9A-0. mendeley. doi:
          <volume>10</volume>
          .26342/2022-68-3.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          , et al.,
          <article-title>Huggingface's transformers: State-of-the-art natural language processing</article-title>
          , arXiv preprint arXiv:
          <year>1910</year>
          .
          <volume>03771</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          ,
          <article-title>The Tatoeba Translation Challenge - Realistic Data Sets for Low Resource and Multilingual MT</article-title>
          ,
          <source>in: Proc. of the 5th Conference on Machine Translation, ACL</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1174</fpage>
          -
          <lpage>1182</lpage>
          . URL: https://aclanthology.org/
          <year>2020</year>
          .wmt-
          <volume>1</volume>
          .
          <fpage>139</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zou</surname>
          </string-name>
          , EDA:
          <article-title>Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>6382</fpage>
          -
          <lpage>6388</lpage>
          . URL: https://aclanthology.org/D19-1670. doi:
          <volume>10</volume>
          .18653/v1/
          <fpage>D19</fpage>
          -1670.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Al-Azzawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kovács</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Adewumi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Liwicki</surname>
          </string-name>
          , NLP-LTU at SemEval-2023
          <source>Task</source>
          <volume>10</volume>
          :
          <article-title>The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset (</article-title>
          <year>2023</year>
          ). arXiv:
          <volume>2304</volume>
          .
          <fpage>12847</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Montani</surname>
          </string-name>
          , spaCy 2:
          <article-title>Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          , https://explosion.ai/blog/spacy-2
          <string-name>
            <surname>-</surname>
          </string-name>
          nlp-updates,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>