<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking Automatic Tools for Neologisms Extraction: Issues and Challenges</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information Engineering, University of Padova</institution>
          ,
          <addr-line>Via Gradenigo 6/b, 35131 Padova</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Human language is constantly evolving, driven by societal, technological, and cultural shifts, which lead to the creation of new terms and expressions. The rise of digital platforms, including social media and academic publications, has accelerated the introduction and spread of these neologisms. This paper explores current advancements and challenges in benchmarking automated and semi-automated tools for extracting neologisms. In particular, we will discuss challenges in dataset creation and evaluation procedures, such as defining neologisms, ensuring diverse text sources, managing annotation variability, and evaluating these tools.</p>
      </abstract>
      <kwd-group>
        <kwd>neologisms extraction</kwd>
        <kwd>dataset creation</kwd>
        <kwd>evaluation methodology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Human language is by nature always evolving: it generates newly coined terms and expressions
that emerge in response to societal, technological, and cultural changes. Research on the detection
and understanding of neologisms has grown rapidly in recent years owing to the proliferation
of digital communication platforms, including social media, academic publications, and technical
documents. This range of platforms has accelerated the introduction and dissemination
of novel linguistic elements, which offers a unique opportunity for developing (semi-)automatic
techniques for neologism extraction [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        Neologism extraction is a critical task for various fields, including computational linguistics,
lexicography, and natural language processing (NLP). Traditional manual approaches to tracking
linguistic evolution are labor-intensive and time-consuming, highlighting the need for automated or
semi-automated methodologies. Identifying emerging terms can support the update of lexical resources,
improve machine translation systems, and provide insights into societal trends [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Fully automated approaches typically leverage large-scale corpora and machine learning techniques
to identify candidate neologisms based on statistical analysis, linguistic patterns, or contextual novelty.
These systems often incorporate dictionary comparisons, word frequency analysis, and morphological
evaluation. Advances in deep learning and pretrained language models [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5, 6, 7</xref>
        ] have further enhanced
the precision of such techniques by enabling context-aware evaluations of word novelty [
        <xref ref-type="bibr" rid="ref3 ref8 ref9">3, 8, 9</xref>
        ].
      </p>
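      <p>The dictionary comparison and word frequency analysis mentioned above can be illustrated with a minimal sketch. It assumes a small in-memory reference lexicon and a simple regular-expression tokenizer; the function name <monospace>candidate_neologisms</monospace>, the threshold <monospace>min_freq</monospace>, and the toy documents are hypothetical and are not taken from any of the cited systems.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter


def candidate_neologisms(texts, known_words, min_freq=2):
    """Return {token: count} for tokens missing from the reference lexicon.

    A token is a candidate when it is not in `known_words` and occurs at
    least `min_freq` times across all texts (a crude frequency filter).
    """
    tokens = []
    for text in texts:
        # Lowercase alphabetic tokens, allowing internal hyphens.
        tokens.extend(re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()))
    counts = Counter(tokens)
    return {
        word: count
        for word, count in counts.items()
        if word not in known_words and count >= min_freq
    }


# Toy reference lexicon and documents (illustrative only).
known = {"the", "new", "app", "is", "a", "for", "people", "who", "love", "to"}
docs = [
    "The new app is a doomscroll for people who love to doomscroll.",
    "People love to doomscroll.",
]
print(candidate_neologisms(docs, known))  # prints {'doomscroll': 3}
```
]]></preformat>
      <p>In practice, the reference lexicon would be a full dictionary or a large historical corpus, and the output would still contain proper names, typographical errors, and rare but established words that require further filtering.</p>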
      <p>
        Semi-automated methods, on the other hand, combine computational efficiency with human
expertise [
        <xref ref-type="bibr" rid="ref1 ref10">1, 10</xref>
        ]. These approaches may flag potential neologisms for manual validation, allowing domain
experts to assess their linguistic legitimacy and relevance. By integrating human judgment,
semi-automated systems balance scalability and accuracy, making them particularly useful for specialized
domains such as scientific literature or emerging technologies.
      </p>
      <p>Multilinguality also plays a crucial role in automatic neologism extraction, as lexical innovation does
not occur in isolation within a single language. Many neologisms emerge through cross-linguistic
influence, such as borrowings from dominant languages or calques that adapt foreign terms into native
structures. Moreover, different languages exhibit distinct morphological and syntactic processes for
word formation, necessitating language-specific adaptation in extraction methodologies. A multilingual
approach would enable comparative studies on neologism diffusion, tracking how new terms spread
across linguistic and cultural boundaries. Developing resources that support multiple languages
enhances the generalizability and applicability of neologism extraction tools, making them more robust
for global linguistic research and practical applications in translation, lexicography, and information
retrieval [
        <xref ref-type="bibr" rid="ref10 ref11 ref3 ref4 ref8">3, 4, 8, 10, 11</xref>
        ].
      </p>
      <p>1st International Workshop on Terminological Neologism Management (NeoTerm 2025), June 18, 2025, Thessaloniki, Greece.
Email: giorgiomaria.dinunzio@unipd.it (G. M. Di Nunzio).
ORCID: 0000-0001-7116-9338 (G. M. Di Nunzio).
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Despite recent progress, several challenges remain in the development of effective neologism
extraction systems. These include distinguishing genuine neologisms from typographical errors, handling
polysemy, and detecting subtle shifts in meaning for existing terms. Moreover, the rapid evolution of
language in social media environments demands adaptive models capable of processing informal and
creative language variations.</p>
      <p>In this paper, we aim to provide a preliminary overview of the state-of-the-art techniques for
benchmarking (semi-)automated tools for the extraction of neologisms, highlight their strengths and
limitations, and suggest directions for future research. In particular, we will focus on the required
specialized datasets that capture the dynamic nature of language in order to evaluate neologism
extraction tools.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Current Datasets for Evaluating Neologism Extraction</title>
      <p>When selecting a dataset for evaluation, it is crucial to consider the specific goals of the neologism
extraction tool and to choose resources that align with the target language and domain. The availability
of well-structured datasets is essential for the evaluation and advancement of techniques designed to
extract neologisms [12]. These datasets provide benchmarks for assessing the effectiveness of different
methodologies and offer valuable insights into the linguistic characteristics of newly coined terms.</p>
      <p>
        One notable resource is the Adjective-Noun Neologism Dataset [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which accompanies research
on identifying adjective-noun neologisms using pretrained language models. This dataset contains
positive examples of adjective-noun neologisms alongside negative examples, making it suitable for
supervised learning and evaluation tasks.
      </p>
      <p>Another important dataset is the New York Times Word Innovation Types (NYTWIT), which includes
over 2,500 novel English words published in the New York Times between November 2017 and March
2019 [13]. The entries in this dataset are manually annotated according to different lexical innovation
processes, such as derivation, blending, and compounding. This resource is valuable for tracking
linguistic innovation in media discourse and evaluating automated extraction systems.</p>
      <p>Additionally, the NEO-BENCH benchmark [14] provides a comprehensive evaluation framework for
assessing how well NLP models handle neologisms across various language understanding tasks. The
benchmark highlights the robustness and adaptability of systems when encountering unfamiliar lexical
items.</p>
      <p>These datasets collectively address different aspects of neologism extraction, from structural and
morphological innovation to semantic novelty and media-driven trends. They offer diverse testing
environments for fully automated and semi-automated approaches, fostering the development of more
accurate and context-aware systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Challenges for Datasets for Neologism Extraction</title>
      <p>The creation of a dataset for the automatic extraction of neologisms presents multiple challenges related
to the dynamic nature of linguistic innovation, the variability of textual sources, and the complexity of
evaluation. One of the primary difficulties lies in defining what constitutes a neologism. Given that
new words emerge and evolve over time, establishing temporal boundaries is essential but remains
problematic, as some words gain acceptance while others disappear. Additionally, neologisms are highly
domain-dependent, with technical fields generating specialized vocabulary that may not be perceived
as new outside their respective disciplines. Their formation mechanisms, including affixation, blending,
borrowing, and semantic shifts, add further complexity to their identification.</p>
      <p>The choice of data sources significantly impacts the quality of a neologism dataset. While informal
digital texts such as social media and blogs are rich sources of emerging words, they are also noisy,
featuring spelling errors and non-standard language. On the other hand, curated sources like news
articles or academic papers may offer greater linguistic stability but risk omitting the more ephemeral
or subcultural neologisms. Ensuring a balanced representation across multiple text types is necessary
but difficult to achieve. Ethical concerns also arise, particularly when mining from online communities
where privacy regulations must be respected.</p>
      <p>Annotation represents another major challenge. Human annotators must determine whether a term
is genuinely new, rare, or simply a re-emergence of an older word. This process requires external
validation, such as dictionary cross-referencing or frequency-based corpus comparison. Disagreements
among annotators introduce variability in the dataset, reducing its reliability. Moreover, given that
neologisms evolve, a static dataset may fail to capture their long-term usage trends. Therefore, an
effective resource should support longitudinal tracking (repeated observation of the same variables over time),
enabling researchers to study word stabilization, meaning shifts, and eventual obsolescence.</p>
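      <p>Annotator disagreement can be quantified before a dataset is released. The following sketch computes Cohen's kappa for two annotators who label the same candidate terms; the two-way label scheme ("neo" versus "old") and the example labels are assumptions made for illustration, and kappa is only one of several agreement coefficients that could be used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgments on six candidate terms.
annotator_1 = ["neo", "neo", "old", "old", "neo", "old"]
annotator_2 = ["neo", "old", "old", "old", "neo", "neo"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.333
```
]]></preformat>
      <p>Here the annotators agree on four of six items, but half of that agreement is expected by chance, so the resulting kappa is low. A value in this range would suggest that the annotation guidelines need refinement before the labels are used as a gold standard.</p>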
    </sec>
    <sec id="sec-4">
      <title>4. Challenges for the Evaluation of Neologism Extraction Tools</title>
      <p>Beyond data collection and annotation, evaluation poses additional obstacles. Unlike traditional NLP
tasks, neologism extraction lacks standardized benchmarks, making performance assessment difficult.</p>
      <p>Evaluating automatic tools for neologism extraction presents several challenges that must be
addressed to improve their effectiveness and adaptability. One of the primary difficulties lies in defining
the criteria for what constitutes a neologism across different domains. As new terms may emerge from
slang, technical jargon, or creative word formations, establishing a universal benchmark for evaluation
remains elusive.</p>
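      <p>Once a gold list of neologisms has been agreed upon, a tool's output can be scored with set-based precision, recall, and F1. The sketch below assumes exact string matching between extracted candidates and gold entries; the function name and the example terms are hypothetical, and matching on lemmas or spelling variants would require additional normalization.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 with exact string matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted.intersection(gold))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical tool output and gold standard.
extracted = ["doomscroll", "finfluencer", "rizz", "teh"]
gold_standard = ["doomscroll", "finfluencer", "rizz", "sharenting"]
print(precision_recall_f1(extracted, gold_standard))  # prints (0.75, 0.75, 0.75)
```
]]></preformat>
      <p>The scores depend entirely on how the gold list was defined, which is why the definitional issues discussed above directly affect the comparability of results across tools.</p>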
      <p>Another significant challenge is the dynamic nature of language evolution, particularly on social
media platforms. Rapid linguistic shifts, cultural memes, and ephemeral terms complicate the process
of maintaining up-to-date evaluation datasets. Tools designed for neologism detection must therefore
be adaptable and capable of processing large volumes of informal text while distinguishing between
fleeting trends and enduring linguistic innovations.</p>
      <p>Handling multilingual data adds a further layer of complexity. Many neologisms emerge in
one language and later diffuse into others, often undergoing transformations in spelling, morphology,
or meaning. Evaluation frameworks must account for these cross-linguistic influences to assess the
robustness of extraction tools in diverse linguistic environments.</p>
      <p>Furthermore, distinguishing genuine neologisms from typographical errors, spelling variations, and
non-standard word forms remains an issue. Automated systems require sophisticated mechanisms for
contextual analysis to accurately filter out noise and identify meaningful linguistic innovations.</p>
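      <p>A simple first-pass filter for typographical errors compares each candidate with the known vocabulary and sets aside those that are very close to an existing word. The sketch below uses the standard-library function <monospace>difflib.get_close_matches</monospace>; the function name <monospace>split_typos</monospace>, the similarity cutoff of 0.8, and the toy vocabulary are assumptions, and this surface-level heuristic is not a substitute for the contextual analysis discussed above.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import difflib


def split_typos(candidates, known_words, cutoff=0.8):
    """Separate likely misspellings from the remaining candidates.

    Returns (likely_typos, remaining): `likely_typos` maps a candidate to
    the closest known word; `remaining` keeps the other candidates in order.
    """
    vocabulary = sorted(known_words)
    likely_typos, remaining = {}, []
    for word in candidates:
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if close:
            likely_typos[word] = close[0]
        else:
            remaining.append(word)
    return likely_typos, remaining


# Toy vocabulary and candidates (illustrative only).
vocabulary = {"receive", "scroll", "doom"}
typos, kept = split_typos(["recieve", "doomscroll"], vocabulary)
print(typos)  # prints {'recieve': 'receive'}
print(kept)   # prints ['doomscroll']
```
]]></preformat>
      <p>The cutoff is a trade-off: a low value discards genuine neologisms formed by affixation, which are naturally similar to their base word, while a high value lets more misspellings through. Flagged items are therefore better routed to manual review than silently removed.</p>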
      <p>Semantic evaluation poses another challenge. Some neologisms involve new meanings for existing
words rather than entirely novel forms. Automatic tools must therefore go beyond surface-level text
analysis and incorporate semantic modeling techniques to capture these subtler shifts in usage.</p>
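      <p>One basic way to look for new meanings of existing words is to compare the contexts in which a word appears in two time periods. The sketch below builds bag-of-words context vectors and compares them with cosine similarity; the window size, the whitespace tokenization, and the example sentences are assumptions, and systems of the kind cited in this paper would more likely rely on learned embeddings than on raw co-occurrence counts.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count the words found within `window` tokens of each `target` occurrence."""
    vector = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                start = max(0, i - window)
                vector.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return vector


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical usage of "stream" in an earlier and a later period.
earlier = ["the stream flows down the hill", "a cold stream runs here"]
later = ["we stream the show online", "they stream music online"]
similarity = cosine_similarity(
    context_vector(earlier, "stream"), context_vector(later, "stream")
)
print(round(similarity, 3))
```
]]></preformat>
      <p>A low similarity between periods only signals that the contexts differ. Deciding whether this reflects a genuine semantic neologism, a change of topic in the corpus, or sparse data still calls for larger samples and expert judgment.</p>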
      <p>Lastly, the human-in-the-loop approach remains critical for the evaluation process. While fully
automated systems are efficient, expert validation is often necessary to ensure the linguistic validity and
relevance of detected neologisms. Developing user-friendly interfaces and hybrid evaluation models
that seamlessly integrate human expertise with machine efficiency is essential.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>The continued creation and curation of high-quality datasets remain crucial for advancing research in
the area of neologisms extraction. Future datasets should aim to capture neologisms from emerging
domains, including social media and scientific literature, while incorporating multilingual perspectives
to better understand global linguistic trends.</p>
      <p>
        While there are limited datasets specifically dedicated to neologism extraction, several related
resources can be utilized for this purpose. For example, tools like the NeoCrawler have been developed
for semi-automatic neologism identification [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While not a dataset per se, it represents a methodological
approach to neologism extraction. Sketch Engine also offers a feature called Trends, which is a diachronic
analysis tool designed to study changes in word usage over time.<sup>1</sup> The NOW corpus (News on the Web)
is another resource, created from web-based newspapers and magazines from 2010 to the
present.<sup>2</sup> Google Trends<sup>3</sup> offers an alternative perspective, showing how users search the web
rather than analyzing the content of the pages. In all these cases, a methodology for the evaluation of
neologism extraction has yet to be developed.
      </p>
      <p>Addressing these challenges will open the possibility for more accurate, scalable, and context-aware
systems for neologism extraction. Future research should prioritize adaptive evaluation methodologies,
cross-linguistic analysis, and enhanced semantic modeling to advance the state of the art in this domain.
In particular, the representation of neologisms in Linked Open Data (LOD) is crucial for ensuring their
integration into digital knowledge systems, enhancing both interoperability and accessibility across
languages and domains [15].</p>
      <p>Another possible line of research involves interdisciplinary collaboration between digital
humanities and the study of neologisms, which is pivotal for understanding and analyzing the evolution of language in
the digital era. By integrating computational tools with linguistic research, scholars can effectively track,
analyze, and interpret the emergence and usage of new words. For example, in [16] researchers employed
computational methods to identify new elements of the hybrid language Surzhyk. In another work [17],
the authors tackled the inaccuracies introduced by Optical Character Recognition (OCR) software when
digitizing historical newspapers. This issue is crucial for accurately studying the evolution of language
and the emergence of neologisms over time. The fusion of digital humanities and linguistic studies
may offer a robust framework for exploring neologisms. Computational tools enable researchers to
process vast textual datasets, identify new linguistic patterns, and understand the socio-cultural factors
influencing language change. This interdisciplinary approach not only enriches our comprehension of
language evolution but also enhances the methodologies employed in linguistic research.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is partially supported by the HEREDITARY Project, as part of the European Union’s Horizon
Europe research and innovation programme under grant agreement No GA 101137074, and it is part of
the initiatives of the Center for Studies in Computational Terminology (CENTRICO) of the University
of Padua and in the research directions of the Italian Common Language Resources and Technology
Infrastructure CLARIN-IT. This work is also partially supported by the “National Biodiversity Future
Center - NBFC” project funded under the National Recovery and Resilience Plan (NRRP), Mission 4
Component 2 Investment 1.4 - Call for tender No. 3138 of 16 December 2021, rectified by Decree n. 3175
of 18 December 2021 of Italian Ministry of University and Research funded by the European Union –
NextGenerationEU. Project code CN_00000033, Concession Decree No. 1034 of 17 June 2022 adopted by
the Italian Ministry of University and Research, CUP F87G22000290001.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used Chat-GPT-4 for grammar and spelling
checks. After using this tool, the author reviewed and edited the content as needed and takes full
responsibility for the publication’s content.</p>
      <p>1 https://www.sketchengine.eu/english-trends-corpus/
2 https://www.english-corpora.org/now/
3 https://trends.google.com/home</p>
      <p>[12] T.-J. Liu, S.-K. Hsieh, L. Prevot, Observing Features of PTT Neologisms: A Corpus-driven Study
with N-gram Model, in: H.-D. Yang, W.-L. Hsu, C.-P. Chen (Eds.), Proceedings of the 25th
Conference on Computational Linguistics and Speech Processing (ROCLING 2013), The Association
for Computational Linguistics and Chinese Language Processing (ACLCLP), Kaohsiung, Taiwan,
2013, pp. 250–259. URL: https://aclanthology.org/O13-1025/.</p>
      <p>[13] Y. Pinter, C. L. Jacobs, M. Bittker, NYTWIT: A Dataset of Novel Words in the New York Times, 2020.
URL: http://arxiv.org/abs/2003.03444. doi:10.48550/arXiv.2003.03444, arXiv:2003.03444 [cs].</p>
      <p>[14] J. Zheng, A. Ritter, W. Xu, NEO-BENCH: Evaluating Robustness of Large Language Models with
Neologisms, 2024. URL: http://arxiv.org/abs/2402.12261. doi:10.48550/arXiv.2402.12261,
arXiv:2402.12261 [cs].</p>
      <p>[15] J. P. McCrae, I. Wood, A. Hicks, The Colloquial WordNet: Extending Princeton WordNet with
Neologisms, in: J. Gracia, F. Bond, J. P. McCrae, P. Buitelaar, C. Chiarcos, S. Hellmann (Eds.),
Language, Data, and Knowledge, Springer International Publishing, Cham, 2017, pp. 194–202.
doi:10.1007/978-3-319-59888-8_17.</p>
      <p>[16] N. Sira, G. M. Di Nunzio, V. Nosilia, Towards an Automatic Recognition of Mixed Languages: The
Case of Ukrainian-Russian Hybrid Language Surzhyk, Umanistica Digitale (2020) 97–116. URL:
https://umanisticadigitale.unibo.it/article/view/10740. doi:10.6092/issn.2532-8816/10740,
number: 9.</p>
      <p>[17] D. Del Fante, G. M. Di Nunzio, Correzione dell’OCR per Corpus-assisted Discourse Studies: un caso
di studio su vecchi quotidiani, Umanistica Digitale (2021) 99–124. URL:
https://umanisticadigitale.unibo.it/article/view/13689. doi:10.6092/issn.2532-8816/13689, number: 11.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kerremans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prokić</surname>
          </string-name>
          ,
          <article-title>Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler</article-title>
          ,
          <source>Anglia</source>
          <volume>136</volume>
          (
          <year>2018</year>
          )
          <fpage>239</fpage>
          -
          <lpage>268</lpage>
          . URL: https://www.degruyter.com/document/doi/10.1515/ang-2018-0032/html. doi:10.1515/ang-2018-0032, publisher: De Gruyter.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurgens</surname>
          </string-name>
          ,
          <article-title>The structure of online social networks modulates the rate of lexical change</article-title>
          , in: K. Toutanova, A. Rumshisky, L. Zettlemoyer, D. Hakkani-Tur, I. Beltagy, S. Bethard, R. Cotterell, T. Chakraborty, Y. Zhou (Eds.),
          <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>2201</fpage>
          -
          <lpage>2218</lpage>
          . URL: https://aclanthology.org/2021.naacl-main.178/. doi:10.18653/v1/2021.naacl-main.178.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Falk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bernhard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gérard</surname>
          </string-name>
          ,
          <article-title>From Non Word to New Word: Automatically Identifying Neologisms in French Newspapers</article-title>
          , in: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.),
          <source>Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          , European Language Resources Association (ELRA), Reykjavik, Iceland,
          <year>2014</year>
          , pp.
          <fpage>4337</fpage>
          -
          <lpage>4344</lpage>
          . URL: https://aclanthology.org/L14-1260/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Babych</surname>
          </string-name>
          ,
          <article-title>Unsupervised Induction of Ukrainian Morphological Paradigms for the New Lexicon: Extending Coverage for Named Entities and Neologisms using Inflection Tables and Unannotated Corpora</article-title>
          , in: T. Erjavec,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marcińczuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Piskorski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pivovarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Šnajder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          , R. Yangarber (Eds.),
          <source>Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing</source>
          , Association for Computational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          . URL: https://aclanthology.org/W19-3701/. doi:10.18653/v1/W19-3701.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Webson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eickhoff</surname>
          </string-name>
          , E. Pavlick,
          <article-title>Are “Undocumented Workers” the Same as “Illegal Aliens”? Disentangling Denotation and Connotation in Vector Spaces</article-title>
          , in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.),
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , Association for Computational Linguistics, Online,
          <year>2020</year>
          , pp.
          <fpage>4090</fpage>
          -
          <lpage>4105</lpage>
          . URL: https://aclanthology.org/2020.emnlp-main.335/. doi:10.18653/v1/2020.emnlp-main.335.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ryskina</surname>
          </string-name>
          , E. Rabinovich,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berg-Kirkpatrick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mortensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tsvetkov</surname>
          </string-name>
          ,
          <article-title>Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods</article-title>
          , in: A. Ettinger, G. Jarosz, J. Pater (Eds.),
          <source>Proceedings of the Society for Computation in Linguistics 2020</source>
          , Association for Computational Linguistics, New York, New York,
          <year>2020</year>
          , pp.
          <fpage>367</fpage>
          -
          <lpage>376</lpage>
          . URL: https://aclanthology.org/2020.scil-1.43/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <article-title>Identification of Adjective-Noun Neologisms using Pretrained Language Models</article-title>
          , in: A. Savary, C. P. Escartín, F. Bond, J. Mitrović, V. B. Mititelu (Eds.),
          <source>Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)</source>
          , Association for Computational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>135</fpage>
          -
          <lpage>141</lpage>
          . URL: https://aclanthology.org/W19-5116/. doi:10.18653/v1/W19-5116.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mizrahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Yardeni</given-names>
            <surname>Seelig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shahaf</surname>
          </string-name>
          ,
          <article-title>Coming to Terms: Automatic Formation of Neologisms in Hebrew</article-title>
          , in:
          <string-name>
            <given-names>T.</given-names>
            <surname>Cohn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          (Eds.),
          <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2020</year>
          , pp.
          <fpage>4918</fpage>
          -
          <lpage>4929</lpage>
          . URL: https://aclanthology.org/2020.findings-emnlp.442/. doi:10.18653/v1/2020.findings-emnlp.442.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <article-title>NEDetector: Automatically extracting cybersecurity neologisms from hacker forums</article-title>
          ,
          <source>Journal of Information Security and Applications</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <elocation-id>102784</elocation-id>
          . URL: https://www.sciencedirect.com/science/article/pii/S2214212621000302. doi:10.1016/j.jisa.2021.102784.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lerner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yvon</surname>
          </string-name>
          ,
          <article-title>Towards the Machine Translation of Scientific Neologisms</article-title>
          ,
          <source>Technical Report Rapport D2-3.1</source>
          ,
          <institution>ISIR, Université Pierre et Marie Curie UMR CNRS 7222</institution>
          ,
          <year>2025</year>
          . URL: https://hal.science/hal-04852293.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Camacho</surname>
          </string-name>
          ,
          <article-title>A primer on getting neologisms from foreign languages to under-resourced languages</article-title>
          ,
          <year>2023</year>
          . URL: http://arxiv.org/abs/2304.10495. doi:10.48550/arXiv.2304.10495, arXiv:2304.10495 [cs].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>