<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Spatial Entities Involved in the Description of a Movement Action Using Deep Learning Methods: A Comparative Study of Three Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abdelkrim Tafer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Gaio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pau and the Adour Region, Laboratory of Mathematics and Their Applications</institution>
          ,
          <addr-line>Pau</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Zaragoza, Aragón Institute for Engineering Research, Advanced Information Systems Laboratory</institution>
          ,
          <addr-line>Zaragoza</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper proposes a methodology to automatically extract spatial information from itinerary descriptions in French. We compare three models: BiLSTM-CRF, CamemBERT, and GLiNER, focusing on the recognition of nested spatial entities, motion verbs, expressions of spatial relation or condition, and measures. Preliminary results demonstrate the potential of these models in accurately identifying and classifying the spatial elements necessary for the annotation of movement actions evoked in textual descriptions.</p>
      </abstract>
      <kwd-group>
<kwd>automatic annotation</kwd>
        <kwd>classification</kwd>
        <kwd>deep learning</kwd>
        <kwd>nested spatial named entities</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Probabilistic models such as Conditional Random Fields (CRF) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have been widely used for structured
sequence prediction tasks. When combined with recurrent neural networks such as Long Short-Term
Memory (LSTM) networks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], these models effectively capture local and contextual dependencies while
improving the accuracy of named entity recognition (NER) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Transformer-based language models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have significantly advanced the modeling of linguistic
structures through large-scale pre-training on extensive text corpora. Among these, BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] introduced a
bidirectional transformer architecture that substantially improved performance across various NLP
tasks and can be further specialized for NER through targeted fine-tuning. Additionally, newer
approaches such as GLiNER [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] exploit pre-trained language models as a backbone, such as DeBERTa v3
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in the original paper, to develop low-resource NER systems that require minimal or no fine-tuning and achieve state-of-the-art performance in zero-shot NER. Transformers have also been adapted for
domain-specific applications, such as place name extraction from unstructured text [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Although recurrent neural networks offer moderate computational efficiency, their inherently sequential training and inference can limit parallelization and make it difficult to capture long-range dependencies. In contrast, transformer-based architectures leverage self-attention to process entire sequences in parallel, facilitating more effective modeling of distant context and exploiting modern GPU resources efficiently. However, transformers can become computationally demanding for very long inputs, as the self-attention mechanism scales quadratically with sequence length.</p>
      <p>
        For these models, a labeled training corpus is required. Texts are first tokenized into word or subword units, which are then transformed into numerical vector representations
[
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]. A classification layer is then applied to predict the token labels.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>In the location category, a strong named entity, hereafter simply called Named Entity (NE), is built from a toponym (i.e. a proper name, such as in Figure 1: "Saint-Ybars", "Porte de Mazet"). A weak spatial named entity is built from a noun phrase describing the feature of the object to be referenced, such as a building, river, or path (e.g., "medieval street", "church tower"); for ease of reference, it is henceforth termed nominal entity (NoE). As mentioned earlier, the combination of the first two categories of spatial entities makes up the category of spatial Nested Named Entity (NNE). For instance, in Figure 1, the phrase "hôtel de ville de Saint-Ybars" ('Saint-Ybars city hall') exemplifies an NNE, where the NoE "hôtel de ville" functions as the feature and the NE "Saint-Ybars" as the reference; the same applies to the NNE "le clocher de l'église" ('the church bell tower'), where the first NoE "clocher" acts as a feature for the second NoE "église".</p>
      <p>In addition to NNE, movement verbs or movement verbal phrases, such as "traversez" ('cross') or "tourner pour descendre" ('turn down') in Figure 1, delineate a moving action. Finally, expressions like "à gauche" ('left'), "au bout" ('at the end of'), "à côté" ('next to'), and/or "200 m" provide fine-grained spatial context; these expressions will hereafter be called Offsets or Measures.</p>
      <p>(1) [. . . ] Traversez la route en diagonale et montez dans la rue de la Porte de Lezat. Après 200 m, tournez immédiatement à gauche vers le clocher de l'église au bout de cette rue, la rue de Dessous. Admirer l'imposante façade de l'hôtel de ville de Saint-Ybars sur la place, puis tourner pour descendre la rue Porte de Mazet à côté de la pharmacie. [. . . ]
Translation: [. . . ] Cross the road diagonally and go up Porte de Lezat street. After 200 m, turn immediately left towards the church bell tower at the end of this street, de Dessous street. Admire the imposing facade of Saint-Ybars city hall in the square, then turn down Porte de Mazet street next to the pharmacy. [. . . ]
[Figure 1 legend: Named Entity (NE), Nominal Entity (NoE), Offset, Measure, Verb of Movement (Motion), Nested Named Entity (NNE)]</p>
      <p>Rule-based approaches such as the Perdido system [14] have traditionally been employed for structured spatial tagging by combining morpho-syntactic and semantic constraints. Although effective for predefined structures, these methods are inherently limited in adaptability, often failing to detect variations in nominal entities and their relationships. This rigidity underscores the necessity for more flexible methodologies capable of dynamically learning entity representations and dependencies.</p>
      <p>To address these challenges, deep-learning-based approaches offer a promising alternative. These
models, trained on annotated corpora, exhibit strong generalization capabilities, allowing them to
classify and extract spatial entities even in previously unseen contexts. Unlike rule-based systems,
deep learning models learn implicit representations of spatial languages and capture hierarchical
dependencies and context-aware entity relationships.</p>
      <p>The aim of this study is to evaluate three models for recognizing NNE and their contextual references.
These models were selected based on their significance in Named Entity Recognition (NER) research,
each representing a distinct approach to structured sequence prediction:
1. Bidirectional Long Short-Term Memory with a Conditional Random Field Layer
(BiLSTM-CRF): A well-established standard in NER using recurrent neural networks (RNNs).
2. Pre-Trained Bidirectional Transformer (CamemBERT): A transformer-based bidirectional
language model (BiLM) with a classification head for token labeling.
3. Generalist Named Entity Recognition Using Bidirectional Transformers (GLiNER): An
innovative zero-shot and few-shot learning model introducing a new paradigm for NER.</p>
      <p>Each selected model represents a different paradigm in NER, providing a comparative analysis of
their performance on structured sequence prediction tasks.</p>
      <p>
        BiLSTM-CRF This model [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">6, 5, 4</xref>
        ] is a widely adopted architecture for Named Entity Recognition
(NER) and structured sequence labeling. It integrates a BiLSTM network with a CRF layer to efficiently
capture contextual dependencies while enforcing valid label transitions.
      </p>
      <p>The BiLSTM component processes input sequences in both forward and backward directions. Given a sequence of tokens $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$, two LSTM networks generate a forward hidden state $\overrightarrow{h_t}$ and a backward hidden state $\overleftarrow{h_t}$ for each token. The final representation is obtained by concatenating these states, yielding $h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}]$ with a total hidden state dimension $d$. This bidirectional encoding allows the model to incorporate context from both past and future tokens.</p>
      <p>A dense layer projects each hidden representation $h_t$ into a score vector $e_t(\cdot) \in \mathbb{R}^K$, where $K$ is the number of possible labels. Instead of predicting labels independently, the CRF layer models dependencies between adjacent labels. The probability of a label sequence $\mathbf{y} = \{y_1, \ldots, y_n\}$ is defined as:
$$P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{n} \exp\big(T_{y_{t-1}, y_t} + e_t(y_t)\big), \qquad (1)$$
where $T_{y_{t-1}, y_t}$ is the transition score from label $y_{t-1}$ to $y_t$, and $e_t(y_t)$ is the BiLSTM emission score at position $t$. The partition function $Z(\mathbf{x})$ normalizes over all possible label sequences:
$$Z(\mathbf{x}) = \sum_{\mathbf{y}' \in \mathcal{Y}(\mathbf{x})} \prod_{t=1}^{n} \exp\big(T_{y'_{t-1}, y'_t} + e_t(y'_t)\big). \qquad (2)$$</p>
      <p>The model is optimized by minimizing the negative log-likelihood loss:
$$\mathcal{L} = -\sum_{t=1}^{n} \big(T_{y_{t-1}, y_t} + e_t(y_t)\big) + \log Z(\mathbf{x}). \qquad (3)$$</p>
      <p>During inference, the CRF layer selects the most probable label sequence by considering both emission
scores from the BiLSTM and transition scores from the CRF. Figure 2 presents an overview of the model
architecture.</p>
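      <p>As an illustration of this decoding step, the following minimal NumPy sketch computes the unnormalized sequence score of equation (1) and recovers the most probable label sequence with the Viterbi algorithm; the emission and transition matrices are random stand-ins for the BiLSTM outputs and the learned CRF parameters.</p>
      <preformat>
# Minimal NumPy sketch of CRF scoring and Viterbi decoding.
import numpy as np

rng = np.random.default_rng(0)
n, K = 6, 4                           # sequence length, number of labels
emissions = rng.normal(size=(n, K))   # stand-in for BiLSTM scores e_t(y)
T = rng.normal(size=(K, K))           # T[i, j]: transition score i to j

def sequence_score(y):
    """Unnormalized log-score of a label sequence (the sum inside exp)."""
    s = emissions[0, y[0]]
    for t in range(1, n):
        s += T[y[t - 1], y[t]] + emissions[t, y[t]]
    return s

def viterbi():
    """Most probable label sequence under the CRF."""
    delta = emissions[0].copy()       # best score ending in each label
    back = np.zeros((n, K), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + T + emissions[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0)
    y = [int(delta.argmax())]
    for t in range(n - 1, 0, -1):     # follow back-pointers
        y.append(int(back[t][y[-1]]))
    return y[::-1]

best = viterbi()
print(best, sequence_score(best))
      </preformat>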
      <p>CamemBERT This model [15] is a transformer-based model designed specifically for the French
language. Unlike BiLSTM-CRF, which processes sequences token by token, CamemBERT employs
self-attention mechanisms that allow all tokens in a sequence to be processed in parallel, capturing long-range dependencies more efficiently.</p>
      <p>Given an input sequence $\mathbf{x} = \{x_1, x_2, \ldots, x_n\}$, CamemBERT encodes each token using multiple transformer layers. At the core of its architecture is the self-attention mechanism, which computes contextualized representations by attending to all tokens in the sequence. The attention score between token $i$ and token $j$ is computed as in (4), and the output representation is then obtained as in (5):
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}, \qquad e_{ij} = \frac{\mathbf{q}_i \cdot \mathbf{k}_j}{\sqrt{d_k}}, \qquad (4)$$
$$h'_i = \sum_{j=1}^{n} \alpha_{ij} \mathbf{v}_j, \qquad \mathbf{v}_j = W_V h_j, \qquad (5)$$
where $\mathbf{q}_i = W_Q h_i$, $\mathbf{k}_j = W_K h_j$, $d_k$ is the head dimension, and $W_Q$, $W_K$, and $W_V$ are learnable projection matrices.</p>
      <p>Unlike BiLSTM, which encodes sequential dependencies using recurrence, self-attention allows each
token to directly incorporate information from all other tokens in a single operation.</p>
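      <p>The following minimal NumPy sketch reproduces equations (4) and (5) for a single attention head; the projection matrices are random stand-ins rather than trained CamemBERT weights.</p>
      <preformat>
# Minimal NumPy sketch of single-head scaled dot-product self-attention.
import numpy as np

rng = np.random.default_rng(0)
n, d, d_k = 5, 16, 8                  # tokens, model dim, head dim
H = rng.normal(size=(n, d))           # token representations h_1 ... h_n
W_Q, W_K, W_V = (rng.normal(size=(d, d_k)) for _ in range(3))

Q, K, V = H @ W_Q, H @ W_K, H @ W_V   # q_i, k_j, v_j
E = Q @ K.T / np.sqrt(d_k)            # e_ij = q_i . k_j / sqrt(d_k)
A = np.exp(E) / np.exp(E).sum(axis=1, keepdims=True)   # alpha_ij (softmax)
H_out = A @ V                         # h'_i = sum_j alpha_ij v_j
print(H_out.shape)                    # (5, 8): one contextual vector per token
      </preformat>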
      <p>For Named Entity Recognition (NER), CamemBERT employs a classification head that assigns labels
to tokens. A dense layer maps the final hidden representation into logits $\mathbf{z} \in \mathbb{R}^K$, where $K$ is the number of entity labels.</p>
      <p>The model is trained using the cross-entropy loss. Compared to BiLSTM-CRF, which explicitly models
label dependencies via a CRF layer, CamemBERT implicitly learns contextual relationships through
self-attention. Figure 3 presents an overview of the model architecture.</p>
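      <p>As a usage sketch, the Hugging Face transformers library exposes this token-classification setup for the camembert-base checkpoint; the label set below follows Figure 1 plus an outside tag O, and the freshly initialized classification head would of course need fine-tuning before its predictions are meaningful.</p>
      <preformat>
# Sketch of the token-labeling setup with camembert-base (transformers).
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "NE", "NoE", "NNE", "Motion", "Offset", "Measure"]
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "camembert-base", num_labels=len(labels)
)

enc = tokenizer("Traversez la route en diagonale", return_tensors="pt")
logits = model(**enc).logits          # (1, n_subwords, len(labels))
pred = logits.argmax(-1)[0]           # one label id per subword token
print([labels[int(i)] for i in pred])
      </preformat>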
      <p>
        GLiNER This third and last model [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a transformer-based NER model that introduces span-based classification with zero-shot learning capabilities. By modeling spans instead of tokens, it allows more flexible boundary detection and can better handle nested structures. The token encoder processes a unified input consisting of both entity type tokens and the input text, generating contextualized representations. Let $\mathbf{p} = \{p_i\}_{i=0}^{M-1} \in \mathbb{R}^{M \times D}$ denote the entity type representations, where $M$ is the number of entity types and $D$ is the dimensionality of each representation. Similarly, let $\mathbf{h} = \{h_i\}_{i=0}^{N-1} \in \mathbb{R}^{N \times D}$ represent the contextual embeddings for each token in the input text, with $N$ being the number of tokens. The entity type representations are refined through a two-layer feedforward network, producing $\mathbf{q} = \{q_i\}_{i=0}^{M-1} \in \mathbb{R}^{M \times D}$.
      </p>
      <p>The representation of a span from position $i$ to $j$ is computed as $S_{ij} = \mathrm{FFN}(h_i \otimes h_j)$, where $\otimes$ denotes concatenation. To determine whether a span $(i, j)$ corresponds to entity type $c$, a matching score is computed as:
$$\phi(i, j, c) = \sigma\big(S_{ij}^{\top} q_c\big), \qquad (6)$$
where $\sigma$ is the sigmoid activation function. This score represents the probability that the span $(i, j)$ belongs to entity type $c$.</p>
      <p>During training, the model distinguishes between positive pairs (spans correctly labeled with type $c$) and negative pairs (incorrect associations) using a binary cross-entropy loss:
$$\mathcal{L} = -\sum_{s \in S \times C} \Big[ \mathbb{I}_{s \in P} \log \phi(s) + \mathbb{I}_{s \in N} \log\big(1 - \phi(s)\big) \Big], \qquad (7)$$
where $\mathbb{I}$ is the indicator function. This loss encourages high matching scores for correct span-type pairs while penalizing incorrect associations.</p>
      <p>GLiNER differs fundamentally from BiLSTM-CRF, which explicitly models sequence dependencies
via a CRF layer, and CamemBERT, which performs token-level classification. By employing span-based
prediction and textual entailment-style classification, GLiNER enhances generalization across domains
and under certain conditions, it enables entity recognition in low-resource and zero-shot settings.
Figure 4 presents an overview of the model architecture.</p>
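      <p>For reference, the released gliner Python package exposes this zero-shot interface directly; in the sketch below, the multilingual checkpoint name, the label phrasing, and the threshold are illustrative choices rather than the exact setup of this study.</p>
      <preformat>
# Zero-shot usage sketch with the gliner package.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi")
text = "Traversez la route et montez dans la rue de la Porte de Lezat."
labels = ["named entity", "nominal entity", "motion verb", "offset", "measure"]

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(ent["text"], "->", ent["label"])
      </preformat>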
      <p>By comparing these approaches, this study provides insights into the effectiveness of different NER
paradigms in extracting spatial movement actions from descriptive texts.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>An initial pilot study aimed to assess the performance of these three models¹ in accurately annotating text segments with six predefined labels.</p>
      <sec id="sec-4-1">
        <title>Training Dataset and Annotation Process</title>
        <p>The dataset² consists of 1,897 French hiking descriptions, totaling 27,083 sentences and 569,214 tokens.
Spatial expressions are categorized using the annotation labels given in Figure 1: strong named entities Named Entities (NE), weak named entities Nominal Entities (NoE), motion verbs or verbal phrases (Motion), expressions evoking a spatial relation or condition (Offset), numerical expressions followed by a unit of measurement (Measure), and finally Nested Named Entities (NNE). These labels are inspired by previous rule-based approaches [16].</p>
        <p>It is well known that producing an annotated dataset is a cumbersome and time-consuming task. It was therefore decided to use Perdido [14] as the annotator for this first study. However, Perdido was not designed to annotate nominal entities directly, and integrating this capability would be a real challenge. It was decided that this annotation would proceed in two stages (Figure 5). First, following the annotation carried out by Perdido, all the words or phrases involved in the annotation of a spatial named entity and having received the part-of-speech label "Noun" were extracted. A dictionary was created from these words or phrases, which then enabled all occurrences of the lexical entries in this dictionary to be labelled in the dataset as NoE.</p>
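        <p>The second stage can be pictured with the following minimal sketch, in which a dictionary of nouns extracted from Perdido's output is projected back onto the text as NoE labels; the dictionary contents and the matching policy shown here are illustrative.</p>
        <preformat>
# Sketch of the second annotation stage: label every occurrence of a
# dictionary entry as NoE (dictionary contents are illustrative).
import re

noe_dictionary = {"rue", "clocher", "église", "pharmacie", "hôtel de ville"}
# Longest entries first, so "hôtel de ville" wins over shorter overlaps.
alternation = "|".join(
    map(re.escape, sorted(noe_dictionary, key=len, reverse=True))
)
pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)

sentence = "tournez immédiatement à gauche vers le clocher de l'église"
annotations = [(m.start(), m.end(), "NoE") for m in pattern.finditer(sentence)]
print(annotations)   # character spans labelled NoE
        </preformat>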
        <p>The result is a silver-standard corpus—potentially containing errors due to fully automated annotation.</p>
        <p>All models were trained and tested on an identical dataset extracted from the silver-standard corpus.
Evaluation metrics include Precision, Recall, and micro F1-score, as summarized in Table 1.
Tokenization Tokenization is a crucial preprocessing step that can significantly affect model
performance. In our experiments, each model uses a distinct strategy. The BiLSTM-CRF model employs
rule-based, word-level tokenization with TreeTagger [17], configured for French. Camembert-base uses
subword tokenization based on Byte Pair Encoding (BPE) [18, 19] as implemented by SentencePiece [20]
to decompose rare and compound words. GLiNER, which leverages a multilingual DeBERTa backbone,
adopts a unigram-based subword tokenization strategy [21] via SentencePiece.</p>
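        <p>The difference between the two subword strategies can be inspected directly with the corresponding Hugging Face tokenizers; in this sketch, microsoft/mdeberta-v3-base stands in for GLiNER's multilingual DeBERTa backbone, and the actual splits depend on the learned vocabularies.</p>
        <preformat>
# Sketch comparing the two subword tokenizers on a French word;
# TreeTagger's rule-based word tokenization is run separately.
from transformers import AutoTokenizer

word = "incontournable"
for name in ["camembert-base", "microsoft/mdeberta-v3-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize(word))   # BPE vs. unigram subword splits
        </preformat>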
      </sec>
      <sec id="sec-4-2">
        <title>Models Parameters</title>
        <p>For each model, the following parameter settings were used, without applying additional hyperparameter tuning techniques.
¹ Model implementation: https://git.univ-pau.fr/atafer/sner
² Dataset: https://git.univ-pau.fr/atafer/hiking-dataset
BiLSTM-CRF The BiLSTM-CRF model employs two LSTM cells (one for the forward and one for the backward direction) with an embedding size of 300 and a hidden dimension of 512 (256 per cell). The model is trained using a learning rate of 0.001.</p>
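        <p>A minimal PyTorch skeleton with these dimensions might look as follows; the vocabulary size, the label count, and the choice of CRF implementation are placeholders rather than the exact configuration used here.</p>
        <preformat>
# PyTorch skeleton matching the stated BiLSTM sizes (emission part only).
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size=20_000, num_labels=7,
                 embed_dim=300, hidden_per_dir=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_per_dir,
                            batch_first=True, bidirectional=True)
        # projects the 512-dim state to emission scores e_t(y) for the CRF
        self.emit = nn.Linear(2 * hidden_per_dir, num_labels)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))   # (batch, n, 512)
        return self.emit(h)                       # (batch, n, num_labels)

model = BiLSTMEncoder()
emissions = model(torch.randint(0, 20_000, (1, 12)))
print(emissions.shape)   # torch.Size([1, 12, 7])
# trained with a learning rate of 0.001, plus a CRF loss on top
# (e.g. the pytorch-crf package's CRF layer)
        </preformat>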
        <p>Camembert-base Camembert-base is configured with an embedding/hidden size of 768, utilizes 12 transformer layers, and is trained with a learning rate of 2 × 10⁻⁵.</p>
        <p>GLiNER GLiNER utilizes the mDeBERTa-v3-large backbone (a multilingual variant of DeBERTaV3) with an embedding/hidden size of 1024 and 12 transformer layers. The model is optimized using a learning rate of 5 × 10⁻⁶.</p>
        <p>Overall Analysis
Camembert-base achieved the highest overall performance with an F1-score of 0.9534, followed closely by GLiNER with an F1-score of 0.9355. The superior performance of Camembert-base could perhaps be explained by its pre-training on French-language data [15], which enhances its ability to capture fine linguistic nuances inherent in the corpus. In contrast, GLiNER employs a backbone pretrained on CC100, a multilingual corpus [22] in which French comprises only about 3% of the tokens; this may partially explain its slightly lower performance on French-language data.</p>
        <p>Interestingly, despite the absence of a dedicated pre-trained language model, the BiLSTM-CRF model
effectively captured the specific characteristics of the hiking description corpus, achieving an F1-score
of 0.9269 while maintaining a moderate number of parameters and lower computational cost.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Model Memory Footprint, Parameter Count, and Efficiency</title>
        <p>Despite its compact architecture of approximately 8.73 million parameters and a minimal GPU memory
allocation of 53.82 MB along with only 8.98 MB CPU memory during inference, the BiLSTM-CRF model
demonstrates competitive performance relative to more complex transformer-based models. In contrast,
CamemBERT-base, with 110.05 million parameters, requires substantially greater computational
resources (430.07 MB allocated on the GPU and 324.73 MB on the CPU), achieving enhanced performance
through richer language representations. The GLiNER model, which leverages a large multilingual
DeBERTa backbone, comprises approximately 288.95 million parameters and incurs the highest memory
demands (2206.75 MB allocated on the GPU and 1709.67 MB on the CPU).</p>
        <p>These results highlight that small, specialized architectures such as BiLSTM-CRF can yield near-comparable performance with significantly lower memory and parameter footprints, making them particularly advantageous for deployment in resource-constrained settings, while the choice of a larger model backbone in GLiNER underlines the trade-off between resource investment and the potential for improved cross-lingual generalization. In addition, although our evaluation does not formally assess cross-lingual transfer performance, preliminary examples in English suggest that GLiNER's multilingual pretraining enables effective transfer of representations learned on French to other languages. Moreover, the GLiNER framework is inherently modular, allowing for the replacement of its resource-intensive multilingual DeBERTa backbone with alternatives such as CamemBERT, which offers a lower memory footprint. This flexibility provides a promising avenue for optimizing the balance between computational efficiency and performance in Named Entity Recognition tasks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Perspectives</title>
      <p>In this study, we compared three deep learning models—our specialized BiLSTM-CRF model,
CamemBERT-base, and GLiNER—for the extraction of spatial entities (nested or not, strong or weak)
and movement actions from French itinerary descriptions. The experimental results indicate that
transformer-based models, such as CamemBERT, effectively capture complex spatial patterns, while our specialized BiLSTM-CRF model, designed specifically for this task, offers a competitive alternative with substantially lower computational requirements. The efficiency of the BiLSTM-CRF model makes it well
suited for resource-constrained environments, and incorporating subword tokenization could further
enhance its ability to handle out-of-vocabulary terms—an issue highlighted by the misclassification of
certain named entities.</p>
      <p>The GLiNER model, which utilizes a large multilingual DeBERTa backbone, was not subjected to
a detailed cross-lingual transfer analysis; however, its design suggests that multilingual pretraining
may support transferring representations learned on French data to other languages. Moreover, its
modular architecture permits the substitution of its resource-intensive backbone with alternatives such
as CamemBERT, potentially reducing memory usage while maintaining good performance.</p>
      <p>Future work will focus on several key directions. First, the development of a gold-standard corpus
(especially for the test dataset) with manually corrected annotations is essential to overcome the
limitations of our current silver-standard dataset and to provide a more reliable benchmark. Second,
integrating higher-level structural annotations—particularly syntax-semantic dependencies linking
spatial entities with their contextual elements—could refine the extraction process. Lastly, we will
continue to investigate and refine model architectures to optimize the automated extraction and
categorization of spatial entities and movement actions from descriptive texts.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4 for grammar and spelling checking. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] R. Grishman, B. Sundheim, Message Understanding Conference-6: a brief history, in: Proceedings of the 16th Conference on Computational Linguistics - Volume 1, COLING '96, Association for Computational Linguistics, USA, 1996, pp. 466–471. URL: https://doi.org/10.3115/992628.992709. doi:10.3115/992628.992709.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] M. R. Vicente, La glose comme outil de désambiguïsation référentielle des noms propres purs, Corela. Cognition, représentation, langage (2005). URL: http://journals.openedition.org/corela/1212. doi:10.4000/corela.1212.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. R. Finkel, C. D. Manning, Nested named entity recognition, in: P. Koehn, R. Mihalcea (Eds.), Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Singapore, 2009, pp. 141–150. URL: https://aclanthology.org/D09-1015/.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] N. Patil, A. Patil, B. Pawar, Named entity recognition using conditional random fields, Procedia Computer Science 167 (2020) 1181–1188. URL: https://www.sciencedirect.com/science/article/pii/S1877050920308978. doi:10.1016/j.procs.2020.03.431. International Conference on Computational Intelligence and Data Science.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780. URL: https://doi.org/10.1162/neco.1997.9.8.1735. doi:10.1162/neco.1997.9.8.1735.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, in: K. Knight, A. Nenkova, O. Rambow (Eds.), Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, San Diego, California, 2016, pp. 260–270. URL: https://aclanthology.org/N16-1030/. doi:10.18653/v1/N16-1030.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Curran Associates Inc., Red Hook, NY, USA, 2017, pp. 6000–6010.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] C. Berragan, A. Singleton, A. Calafiore, J. Morley, Transformer based named entity recognition for place name extraction from unstructured text, International Journal of Geographical Information Science 37 (2023) 747–766. URL: https://doi.org/10.1080/13658816.2022.2133125. doi:10.1080/13658816.2022.2133125.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] U. Zaratiana, N. Tomeh, P. Holat, T. Charnois, GLiNER: Generalist model for named entity recognition using bidirectional transformer, in: K. Duh, H. Gomez, S. Bethard (Eds.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Association for Computational Linguistics, Mexico City, Mexico, 2024, pp. 5364–5376. URL: https://aclanthology.org/2024.naacl-long.300/. doi:10.18653/v1/2024.naacl-long.300.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] P. He, J. Gao, W. Chen, DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing, 2023. URL: http://arxiv.org/abs/2111.09543. doi:10.48550/arXiv.2111.09543. arXiv:2111.09543 [cs].</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in: R. Caruana, S. Lawrence, C. Giles (Eds.), Advances in Neural Information Processing Systems, volume 26, Curran Associates, Inc., 2013, pp. 3111–3119. Introduced the Skip-gram model and Negative Sampling, foundational for word embeddings.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146. URL: https://aclanthology.org/Q17-1010/. doi:10.1162/tacl_a_00051.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] L. Moncla, M. Gaio, Perdido: Python library for geoparsing and geocoding French texts, in: First International Workshop on Geographic Information Extraction from Texts (GeoExT), Dublin, Ireland, 2023. URL: https://hal.science/hal-04049794.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] L. Martin, B. Muller, P. J. Ortiz Suárez, Y. Dupont, L. Romary, É. de la Clergerie, D. Seddah, B. Sagot, CamemBERT: a tasty French language model, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 7203–7219. URL: https://aclanthology.org/2020.acl-main.645/. doi:10.18653/v1/2020.acl-main.645.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] M. Gaio, L. Moncla, Extended Named Entity Recognition Using Finite-State Transducers: An Application To Place Names, in: The Ninth International Conference on Advanced Geographic Information Systems, Applications, and Services (GEOProcessing 2017), Nice, France, 2017. URL: https://hal.science/hal-01492994.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] H. Schmid, Probabilistic part-of-speech tagging using decision trees, in: Proceedings of the International Conference on New Methods in Language Processing (NeMLaP 1994), 1994, pp. 44–49.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] P. Gage, A new algorithm for data compression, C Users Journal 12 (1994) 23–38.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] R. Sennrich, B. Haddow, A. Birch, Neural machine translation of rare words with subword units, in: K. Erk, N. A. Smith (Eds.), Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, 2016, pp. 1715–1725. URL: https://aclanthology.org/P16-1162/. doi:10.18653/v1/P16-1162.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] T. Kudo, J. Richardson, SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, in: E. Blanco, W. Lu (Eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Brussels, Belgium, 2018, pp. 66–71. URL: https://aclanthology.org/D18-2012/. doi:10.18653/v1/D18-2012.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] T. Kudo, Subword regularization: Improving neural network translation models with multiple subword candidates, in: I. Gurevych, Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 66–75. URL: https://aclanthology.org/P18-1007/. doi:10.18653/v1/P18-1007.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale, in: D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 8440–8451. URL: https://aclanthology.org/2020.acl-main.747/. doi:10.18653/v1/2020.acl-main.747.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>