<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A BERT-based Approach for Part-of-Speech Tagging in the Low-Resource Context of Sardinian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Salvatore Mario Carta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filippo Concas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianni Fenu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Giuliani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Manolo Manca</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Marras</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Piergiorgio Mura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Pisano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Humanities, University for Foreigners of Siena</institution>
          ,
          <addr-line>Piazza Carlo Rosselli 27/28, 53100 Siena -</addr-line>
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <addr-line>Via Ospedale 72, 09124 Cagliari -</addr-line>
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>VisioScientiae S.r.l.</institution>
          ,
          <addr-line>Via Francesco Ciusa 46, 09131 Cagliari -</addr-line>
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Natural language processing (NLP) has made significant improvements in recent years, primarily driven by the latest advancements in deep learning technologies and the increasing availability of large-scale linguistic resources. Nevertheless, such advancements have mostly benefited high-resource languages, leaving many minority and underrepresented languages at the margins of computational linguistics research. Sardinian, the native language of the island of Sardinia, exemplifies this disparity. Indeed, despite its cultural and linguistic value, there is a lack of proper resources, annotated corpora, and NLP tools. This work proposes a Part-of-Speech tagging system for Sardinian characterized by methods consistent with its morphological specificity. The system integrates a BERT-based token classifier capable of assigning a grammatical category to each input word in a sentence. The classifier was trained on a balanced, manually annotated corpus, and its performance was evaluated using standard machine-learning-oriented performance metrics (Accuracy, F1-score, Recall, and Precision). Experiments show that pre-trained architectures such as BERT remain effective even for languages with limited data availability.</p>
      </abstract>
      <kwd-group>
        <kwd>Low-resource languages</kwd>
        <kwd>Part-of-speech tagging</kwd>
        <kwd>Language models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Recent scientific advances in language models (LMs) and natural language processing (NLP) have contributed to the development of sophisticated technologies for generating, analyzing, and interpreting the world’s major languages. In such a context, large language models (LLMs), such as GPT-4 [1], Llama-3 [2], and Phi-4 [3], have shown strong proficiency across a wide range of language-related tasks [4], including sentiment analysis [5, 6], text classification [7, 8], text summarization, and part-of-speech (PoS) tagging [9].</p>
      <p>However, despite their increasing effectiveness, LLMs still present limitations in performing several NLP tasks [10]. In particular, they struggle when the task concerns minority and/or low-resource languages, which often exhibit distinctive linguistic features that make them a subject of special interest for linguists. Linguists, however, rarely have access to automated tools and resources that facilitate in-depth studies, as these minority and/or low-resource languages are often underrepresented in the digital domain and thus inadequately known, or even entirely unknown, to most models. Indeed, in this scenario, tools that support linguistic analysis, such as PoS taggers, remain scarce or nonexistent, limiting the ability of linguists to study the features of such languages at scale. More specifically, PoS tagging aims to assign a grammatical label to every word in a sentence to facilitate the study of its grammatical structure. This task is crucial for analyzing the multifaceted nature of a given language.</p>
      <p>Sardinian, a Romance language spoken primarily on the island of Sardinia (Italy), stands out as a notable case study of a low-resource language. Indeed, its rich morphological structure and its classification as an endangered language have attracted increasing attention in linguistic preservation and digital humanities [11]. In this direction, the present work describes the creation and the evaluation of an automatic Sardinian PoS tagging model. The methodology relies on fine-tuning a BERT-based language model [12] using a corpus manually annotated by linguists specializing in Sardinian. The experimental phase includes the analysis of the hyperparameters and the monitoring of machine-learning-oriented performance metrics. The proposed approach provides a foundational methodology that can be adapted to develop similar tools for other low-resource languages.</p>
      <p>The remainder of this paper is structured as follows: Section 2 describes the state of the art; Section 3 provides a mathematical formulation of the problem and a description of the proposed approach; Section 4 illustrates the results; and finally, Section 5 concludes the work.</p>
      <p>CLiC-it 2025: Eleventh Italian Conference on Computational Linguistics, September 24-26, 2025, Cagliari, Italy.</p>
      <p>*Corresponding author. salvatorem.carta@unica.it (S. M. Carta); filippo.concas2@unica.it (F. Concas); fenu@unica.it (G. Fenu); alessandro.giuliani@unica.it (A. Giuliani); marcom.manca@unica.it (M. M. Manca); mirko.marras@unica.it (M. Marras); piergiorgio.mura@unistrasi.it (P. Mura); simone.pisano@unistrasi.it (S. Pisano)</p>
      <p>© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section provides an overview of the state of the art in PoS tagging for low-resource languages, followed by a description of the work carried out for the Sardinian language in the context of NLP. A PoS tagger is an NLP tool that assigns a grammatical label to each word in a sentence, thus enabling the identification of the function of each word in that sentence. This tool facilitates syntactic analysis and provides fundamental support for developing any low-resource language, including Sardinian, by automating linguistic analysis in contexts where structured linguistic resources are lacking.</p>
      <p>In recent years, numerous approaches have been extensively investigated, with the aim of developing automatic tagging systems or augmenting training corpora to enable high-accuracy, high-efficiency grammatical annotation at the sentence level. In the context of low-resource languages, where little data is typically publicly available, data from more widely known languages similar to the target language is usually employed; one approach following this direction involves the use of Hidden Markov Models (HMMs), in which the PoS tagging task is modelled as a sequence-to-sequence problem [13, 14]. HMMs are first trained on a language with large amounts of annotated data, followed by a model that transfers the learned information to the target language of interest. Different approaches that fill the gap in labeled data are based on adopting unsupervised learning techniques to group words within sentences, annotate them, and then assign a label [15, 16]. Moreover, the problem of PoS tagging is sometimes interpreted as a classification problem. For example, several works proposed to first train fully-connected neural networks (FNNs) and long short-term memory (LSTM) models on annotations projected into English and, subsequently, adapt them to the tags of the target low-resource language [<xref ref-type="bibr" rid="ref1 ref2 ref3">17, 18, 19</xref>].</p>
      <p>The aforementioned works build upon resources from other languages to create the PoS taggers; alternative methods focus on optimizing the limited availability of data for the target language to achieve equally good results. An example is provided by a model that utilizes translations of parts of the Bible to train PoS taggers by aggregating tags from multiple annotated languages and spreading them through word alignment within the text [<xref ref-type="bibr" rid="ref4">20</xref>]. Furthermore, different deep learning models have been evaluated to build a PoS tagger for the Albanian language [<xref ref-type="bibr" rid="ref5">21</xref>], which is a low-resource language as well.</p>
      <p>To the best of our knowledge, no prior studies describe a PoS tagger for the Sardinian language. Recent work has introduced a linguistic resource designed to identify semantic relationships between Sardinian words through manual mapping of existing WordNet entries to Sardinian word meanings [<xref ref-type="bibr" rid="ref6">22</xref>]. However, this resource does not include any tools for automatic linguistic annotation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section describes the methodology followed to build and evaluate the PoS tagger for the Sardinian language. The section is organized as follows: first, the problem is formulated mathematically; subsequently, an overview of the entire methodology is provided; then, an analysis of the data used to build the PoS tagger is conducted; finally, the fine-tuning technique employed is presented.</p>
      <sec>
        <title>3.1. Problem Formulation</title>
        <p>Mathematically, let <inline-formula><tex-math>\mathbf{s} \in \mathcal{S}</tex-math></inline-formula> be a sentence belonging to a set of sentences; then <inline-formula><tex-math>\mathbf{s}</tex-math></inline-formula> can be identified as a vector whose entries represent the words included in the sentence, <inline-formula><tex-math>\mathbf{s} = [w_1, \dots, w_n]</tex-math></inline-formula>, with <inline-formula><tex-math>n \in \mathbb{N}^{+}</tex-math></inline-formula>. Therefore, a PoS tagger can be defined as a function <inline-formula><tex-math>f</tex-math></inline-formula> expressed as:</p>
        <disp-formula><tex-math>f : \mathcal{S} \longrightarrow \mathcal{T}, \qquad \mathbf{s} \longmapsto f(\mathbf{s}) = \mathbf{t} = [t_1, \dots, t_n]</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>t_i \in T</tex-math></inline-formula> identifies the tag, i.e., a grammatical label, of the <inline-formula><tex-math>i</tex-math></inline-formula>-th word and is chosen from a specific tagset <inline-formula><tex-math>T</tex-math></inline-formula>, and <inline-formula><tex-math>\mathcal{T}</tex-math></inline-formula> is the set of vectors whose entries contain the tag of each word in a sentence.</p>
        <p>In this work, from an application point of view, the problem of estimating the function <inline-formula><tex-math>f</tex-math></inline-formula> defined above is interpreted as a classification problem, and therefore, it is solved by training a specific classifier. Given a dataset <inline-formula><tex-math>\mathcal{D} = \{(\mathbf{s}, \mathbf{t}) \mid \mathbf{s} \in \mathcal{S},\ \mathbf{t} \in \mathcal{T}\}</tex-math></inline-formula> that includes sentences and their respective tags, the objective is to optimize the parameters of a classifier so that it accurately assigns the correct grammatical tag to each word in a sentence.</p>
      </sec>
      <sec>
        <title>3.2. Methodology Overview</title>
        <p>Figure 1 illustrates the workflow followed to develop the Sardinian PoS tagger proposed in this study.</p>
        <p>Figure 1: Workflow of the Sardinian PoS tagger.</p>
        <p>The process consists of three main steps. In the first step (pre-processing), the available tagged data is transformed and formatted adequately for use in the subsequent steps. Once transformed, the data is split into two parts: one part is used for training the model, and the other for evaluating it. In the second step (fine-tuning), the model learns to accurately assign grammatical tags to each word in a sentence based on the training data. Finally, in the third step (testing), the fine-tuned model automatically annotates the test data, and standard machine learning metrics are computed to evaluate how well it has learned to assign tags to each word.</p>
        <p>Let us note that, as mentioned in the previous section, tags must be chosen from a specific set <inline-formula><tex-math>T</tex-math></inline-formula>. In this work, two different state-of-the-art tag sets will be considered, i.e., the Universal Tags [<xref ref-type="bibr" rid="ref7">23</xref>] (denoted as tag), and the tagset, conceived for the Italian language, adopted in the work of Palmero Aprosio &amp; Moretti [<xref ref-type="bibr" rid="ref8">24</xref>] (denoted as fineTag). The latter tagset is compliant with the EAGLES standards [<xref ref-type="bibr" rid="ref9">25</xref>] and also more fine-grained than the former. Consequently, the pipeline depicted in Figure 1 is executed for each tagset separately.</p>
      </sec>
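      <p>The problem formulation of Section 3.1 can be made concrete with a minimal sketch: a sentence is a list of words, a tagger is a function returning one tag per word, and the dataset is a list of (sentence, tags) pairs. The toy sentences below are purely illustrative and are not taken from the corpus; the most-frequent-tag lookup is a deliberately simple stand-in baseline, not the BERT-based classifier used in this work, and the names <monospace>fit_most_frequent_tag</monospace> and <monospace>DATASET</monospace> are hypothetical.</p>
      <preformat>
```python
from collections import Counter, defaultdict

# Toy dataset D of (sentence, tags) pairs. Illustrative only: these
# examples are assumptions, not entries of the annotated corpus.
DATASET = [
    (["su", "cane", "."], ["DET", "NOUN", "PUNCT"]),
    (["sa", "domo", "."], ["DET", "NOUN", "PUNCT"]),
]


def fit_most_frequent_tag(dataset):
    """Estimate a tagger f from D with a most-frequent-tag lookup.

    This is a baseline stand-in for the classifier, not the BERT model.
    """
    counts = defaultdict(Counter)
    for words, tags in dataset:
        assert len(words) == len(tags)  # exactly one tag per word
        for word, tag in zip(words, tags):
            counts[word][tag] += 1
    lookup = {w: c.most_common(1)[0][0] for w, c in counts.items()}

    def tagger(sentence, default="X"):
        # f(s) = t: the output has the same length as the input sentence.
        return [lookup.get(word, default) for word in sentence]

    return tagger


tagger = fit_most_frequent_tag(DATASET)
prediction = tagger(["su", "domo", "."])
```
      </preformat>
      <p>Whatever model replaces the lookup table, the contract stays the same: the predicted tag vector has exactly one entry per input word, each drawn from the chosen tagset.</p>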
      <sec id="sec-3-1">
        <title>3.3. Data Pre-Processing</title>
        <sec id="sec-3-1-1">
          <p>
            In the context of minority languages, particularly the Sardinian language, it is challenging to find or utilize data that enables the training of specific models. In our scenario, to the best of our knowledge, the only available dataset for the Sardinian language that allows us to address a PoS tagging task is the one proposed by Mura et al. [
            <xref ref-type="bibr" rid="ref10">26</xref>
            ].
          </p>
          <p>
            The dataset consists of 1,472 sentences in which each
word is annotated with both tag sets described in the
previous section. The sentences were extracted from
transcripts of interviews conducted with 21 native
Sardinian emigrants, each speaking a different variety of
Sardinian, as part of the Mannigos project [
            <xref ref-type="bibr" rid="ref11">27</xref>
            ].
          </p>
          <p>Figure 2 illustrates the distribution of the number of
words per sentence in the dataset. It is worth pointing out
that the term word in this context refers to any part of the
sentence, including punctuation. It can be observed that
most sentences contain a limited number of words, with
a significant portion not exceeding 100 words. Another
key aspect is the distribution of tags within the dataset.</p>
          <p>Ensuring a balanced representation of grammatical
categories allows the model to effectively learn each tag
from the two defined tag sets. Figure 3 illustrates this
distribution and highlights the overall balance level. Even
though the dataset appears to be heavily imbalanced due
to the natural linguistic structures that are common in
any language, it is noteworthy that all tag labels in the
considered tag sets are represented in the dataset.</p>
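          <p>A tag distribution of this kind, together with the check that every label of a tagset occurs at least once, can be computed with a few lines of code. The sketch below is a minimal illustration on toy tag sequences; the function names and the sample data are assumptions, not part of the original pipeline.</p>
          <preformat>
```python
from collections import Counter


def tag_distribution(tag_sequences):
    """Return the relative frequency of each tag over all sentences."""
    counts = Counter(tag for tags in tag_sequences for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def missing_tags(tag_sequences, tagset):
    """Return the labels of the tagset that never occur in the data."""
    seen = {tag for tags in tag_sequences for tag in tags}
    return set(tagset) - seen


# Toy annotations (illustrative, not from the corpus).
toy = [["DET", "NOUN", "PUNCT"], ["DET", "NOUN", "VERB", "PUNCT"]]
dist = tag_distribution(toy)
absent = missing_tags(toy, ["DET", "NOUN", "VERB", "PUNCT", "ADJ"])
```
          </preformat>
          <p>An empty result from <monospace>missing_tags</monospace> corresponds to the situation described above, in which the data is imbalanced but every label is still represented.</p>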
          <p>The development of the PoS tagger in this work is based on fine-tuning the BERT language model. This choice requires a careful data pre-processing phase, in which the text is appropriately tokenized (i.e., divided into smaller units called tokens, which may consist of words, sub-words, or characters). For consistency, this process was performed using the BERT tokenizer, which employs the WordPiece technique. The latter breaks unknown words into more common sub-word units, ensuring that each token aligns with an entry in the BERT vocabulary. Finally, each token is transformed into a numerical identifier that BERT can process. To streamline processing, each sentence was standardized to a length of 512 tokens by appending padding tokens as needed.</p>
          <p>Following tokenization, the dataset was divided into two training sets and two test sets, one pair for each tagset, selecting 80% of the sentences for training and 20% for testing. These pre-processing steps, along with the removal of sentences containing missing or incorrect tags, led to the data splits described in Table 1.</p>
          <table-wrap id="tab-1">
            <label>Table 1</label>
            <caption>
              <p>Data splits (number of sentences) for each tagset.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Tagset</th><th>Training set</th><th>Test set</th></tr>
              </thead>
              <tbody>
                <tr><td>tag [<xref ref-type="bibr" rid="ref7">23</xref>]</td><td>1,172</td><td>293</td></tr>
                <tr><td>fineTag [<xref ref-type="bibr" rid="ref8">24</xref>]</td><td>1,177</td><td>295</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.4. Model Fine-Tuning</title>
        <sec id="sec-3-2-1">
          <p>The next step is to choose the appropriate model for the fine-tuning phase. As a result of extensive preliminary empirical evaluations, the pre-trained BERT model in its large-cased version was selected [12]. In more detail, BERT is a deep learning model based on the Transformer architecture developed by Google. Its special feature is its ability to process context bidirectionally, i.e., by simultaneously considering both the context to the left and the right of a word, significantly improving performance in the context of this work. It should be noted that, in this study, BERT was implemented for token classification, and the same architecture is used for both tagsets. For token classification, BERT follows this structure:</p>
          <list list-type="bullet">
            <list-item><p>Input Embedding: Each token is transformed into a vector representation that combines token embeddings (the token representation), segment embeddings (the sentence the token belongs to), and positional embeddings (the position of the token in the sentence).</p></list-item>
            <list-item><p>Transformer Layers: The network comprises 24 layers of this type, each using multi-head attention mechanisms to model the relationships between tokens.</p></list-item>
            <list-item><p>Output Layer: BERT returns a probability distribution over all possible classes for each token. The final output is a sequence of logits, with one prediction for each token.</p></list-item>
          </list>
          <p>Even though the BERT model is multilingual, it does not recognize minority languages like Sardinian. However, the pre-trained BERT model has learned the morphosyntactic behaviors of languages similar to Sardinian, such as Italian or Spanish. Consequently, a fine-tuning phase in which the BERT model identifies the primary characteristics of the Sardinian language can lead to a high-performance PoS tagger for the Sardinian language.</p>
          <p>Given that the PoS tagging problem can be interpreted as a classification problem, the tuning phase of a token classification model can be interpreted as a supervised training phase, in which the model sees which tag is assigned to each word. In this phase, it is therefore essential to choose the appropriate loss function to minimize and the hyperparameters to be input to the trainer to allow optimal learning. As for the former, the Cross-Entropy Loss function was chosen, which, with the padding approach, takes the form:</p>
          <disp-formula id="eq-1"><label>(1)</label><tex-math>\mathcal{L} = -\frac{1}{\sum_{i=1}^{N} m_i} \sum_{i=1}^{N} m_i \cdot \log\left(p_{i, y_i}\right)</tex-math></disp-formula>
          <p>where:</p>
          <list list-type="bullet">
            <list-item><p><inline-formula><tex-math>N</tex-math></inline-formula> is the total number of tokens;</p></list-item>
            <list-item><p><inline-formula><tex-math>m_i \in \{0, 1\}</tex-math></inline-formula> is the mask that is 1 if the token <inline-formula><tex-math>i</tex-math></inline-formula> is valid (not padding), 0 otherwise;</p></list-item>
            <list-item><p><inline-formula><tex-math>y_i \in T</tex-math></inline-formula> is the true class of the token <inline-formula><tex-math>i</tex-math></inline-formula>;</p></list-item>
            <list-item><p><inline-formula><tex-math>p_{i, y_i}</tex-math></inline-formula> is the probability the model predicts for the correct class <inline-formula><tex-math>y_i</tex-math></inline-formula>.</p></list-item>
          </list>
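          <p>The interaction between padding and the loss in Formula 1 can be illustrated with a minimal, framework-free sketch. It pads a sequence of token identifiers to a fixed length, builds the corresponding mask, and averages the negative log-probability of the correct class over non-padding tokens only. The helper names, the padding identifier, and the toy probabilities are assumptions for illustration; an actual fine-tuning run would rely on the loss implementation of the training framework.</p>
          <preformat>
```python
import math

PAD_ID = 0  # assumed padding id; the real value depends on the tokenizer


def pad(ids, length, pad_id=PAD_ID):
    """Right-pad token ids to a fixed length and return (ids, mask)."""
    if len(ids) > length:
        raise ValueError("sequence longer than the target length")
    n_pad = length - len(ids)
    padded = ids + [pad_id] * n_pad
    mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return padded, mask


def masked_cross_entropy(probs, labels, mask):
    """Mean negative log-probability of the true class over valid tokens.

    probs[i][k] is the predicted probability of class k for token i.
    """
    valid = [(p, y) for p, y, m in zip(probs, labels, mask) if m == 1]
    return -sum(math.log(p[y]) for p, y in valid) / len(valid)


# Two real tokens padded to length 3; the last position is ignored.
padded_ids, mask = pad([101, 102], 3)
probs = [[0.5, 0.5], [0.25, 0.75], [0.9, 0.1]]
labels = [0, 1, 0]
loss = masked_cross_entropy(probs, labels, mask)
```
          </preformat>
          <p>Because the third position is masked out, its prediction has no effect on the loss: only the first two tokens contribute, and the sum is divided by the number of valid tokens rather than by the padded length.</p>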
        </sec>
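        <sec id="sec-3-2-2">
          <p>Note that the same loss function was used for the models trained on both the tag and fineTag sets.</p>
          <p>Figure 4 shows the evolution of training loss, validation loss, and validation F1 score over epochs for both models during the tuning phase. These graphs were instrumental in determining the optimal number of fine-tuning epochs and in choosing other hyperparameters. Although all three metrics were considered, particular attention was paid to the validation F1 score, as it most directly reflects the model’s ability to generalize on the classification task.</p>
        </sec>
      </sec>
      <sec>
        <title>3.5. Model Testing</title>
        <p>Several metrics were used to evaluate model performance. Given the classification nature of the task, the four performance metrics used in this study are Accuracy, Recall, Precision, and F1 score. Note that the last three metrics were calculated in their macro version, considering the presence of more than two classes to be evaluated.</p>
        <p>These metrics allow us to assess how accurately the PoS tagging models classify the various words in the sentence. In particular, they allow us to analyze both the model’s ability to identify all relevant classes (Recall) and its accuracy in avoiding false assignments (Precision), providing an overall measure of the balance between these two properties (F1 score). The following formulas define the metrics in detail.</p>
        <disp-formula><tex-math>\mathrm{Accuracy} = \frac{\sum_{i=1}^{N} m_i \cdot \mathbf{1}(y_i = \hat{y}_i)}{\sum_{i=1}^{N} m_i}</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{Precision}_{\mathrm{macro}} = \frac{1}{K} \sum_{k=0}^{K-1} \frac{TP_k}{TP_k + FP_k}</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{Recall}_{\mathrm{macro}} = \frac{1}{K} \sum_{k=0}^{K-1} \frac{TP_k}{TP_k + FN_k}</tex-math></disp-formula>
        <disp-formula><tex-math>\mathrm{F1}_{\mathrm{macro}} = \frac{1}{K} \sum_{k=0}^{K-1} \frac{2 \cdot \mathrm{Precision}_k \cdot \mathrm{Recall}_k}{\mathrm{Precision}_k + \mathrm{Recall}_k}</tex-math></disp-formula>
        <p>in which <inline-formula><tex-math>N</tex-math></inline-formula>, <inline-formula><tex-math>m_i</tex-math></inline-formula>, and <inline-formula><tex-math>y_i</tex-math></inline-formula> are the same as defined in Formula 1; while <inline-formula><tex-math>\hat{y}_i</tex-math></inline-formula> is the tag predicted by the model, <inline-formula><tex-math>K</tex-math></inline-formula> is the size of the set <inline-formula><tex-math>T</tex-math></inline-formula> (i.e., the number of all possible tags), and <inline-formula><tex-math>\mathbf{1}(c)</tex-math></inline-formula> is the indicator function, equal to 1 if condition <inline-formula><tex-math>c</tex-math></inline-formula> is true, 0 otherwise. For each class <inline-formula><tex-math>k</tex-math></inline-formula>, <inline-formula><tex-math>TP_k</tex-math></inline-formula>, <inline-formula><tex-math>FP_k</tex-math></inline-formula>, and <inline-formula><tex-math>FN_k</tex-math></inline-formula> denote the numbers of true positives, false positives, and false negatives, and <inline-formula><tex-math>\mathrm{Precision}_k</tex-math></inline-formula> and <inline-formula><tex-math>\mathrm{Recall}_k</tex-math></inline-formula> are the corresponding per-class values.</p>
        <p>It is important to note that all metrics introduced vary within a range between 0 and 1, with values closer to 1 indicating better performance.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>This section is organized into two main parts. In the first part, we present the quantitative analysis of the models, reporting and comparing their performance on the test sets. These results allow us to evaluate the overall effectiveness of each model in a rigorous and reproducible manner. The second part is dedicated to a brief qualitative analysis, in which we examine selected examples unobserved during the fine-tuning and testing phases. This analysis aims to illustrate the models’ predictions in practice, thus complementing the information obtained from the quantitative evaluation.</p>
      <sec id="sec-4-1">
        <title>4.1. Quantitative Analysis</title>
        <sec id="sec-4-1-1">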
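          <p>The evaluation metrics (accuracy and macro-averaged precision, recall, and F1 computed over non-padding tokens) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: classes are encoded as integers from 0 to K-1, and a class with an empty denominator contributes 0 to the macro average, a convention the text does not specify. The function name and the toy labels are hypothetical.</p>
          <preformat>
```python
def evaluate(y_true, y_pred, mask, num_classes):
    """Accuracy and macro precision/recall/F1 over unmasked tokens."""
    pairs = [(t, p) for t, p, m in zip(y_true, y_pred, mask) if m == 1]
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for k in range(num_classes):
        tp = sum(t == k and p == k for t, p in pairs)
        fp = sum(t != k and p == k for t, p in pairs)
        fn = sum(t == k and p != k for t, p in pairs)
        # Assumed convention: an undefined ratio counts as 0.
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy labels for three classes; the last position is padding.
scores = evaluate(
    y_true=[0, 1, 1, 2, 0],
    y_pred=[0, 1, 2, 2, 1],
    mask=[1, 1, 1, 1, 0],
    num_classes=3,
)
```
          </preformat>
          <p>Macro averaging gives every tag the same weight regardless of its frequency, so rare tags influence the score as much as frequent ones, which matters when the tag distribution is imbalanced.</p>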
          <p>Table 3 shows the performance of the two fine-tuned BERT-based models on the test sets<fn id="fn-1"><label>1</label><p>While per-tag evaluation metrics could in principle offer additional insights, given also the large size of the tagset, we chose to focus on overall metrics to maintain a clear and coherent narrative aligned with the primary research questions. We consider a detailed per-tag analysis an important direction for future work, particularly in application-specific settings where tag-level behavior is critical.</p></fn>. The first model, fine-tuned on the coarser-granularity tagset (tag), achieves an accuracy of 0.9418 and a macro F1 score of 0.9298, with recall and precision scores of 0.9347 and 0.9250, respectively. The second model, fine-tuned on the more detailed tagset (fineTag), produces slightly lower but still good results, with an accuracy of 0.9362, a macro F1 of 0.9291, a recall of 0.9308, and a precision of 0.9274. These results indicate that both models generalize well to the test data. It is important to note that high performance is still achieved even in the fineTag setting, which involves a classification task with 36 PoS classes (the tag set includes 15 PoS classes). This observation highlights the robustness of the fine-tuned models, demonstrating their ability to handle more complex and fine-grained label distributions without substantial performance loss.</p>
          <p>Notably, these results are achieved despite the linguistic variability within the dataset, which includes multiple Sardinian language varieties with differing morphological features. The models nevertheless capture the core structural patterns of each variety, demonstrating strong generalization across intra-language variation.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Qualitative Analysis</title>
        <p>could be the creation of an accessible user interface that
would make the PoS tagger usable by linguists,
scholars, and citizens who are not experts in computer science. Such a
tool could be integrated into digital platforms for
teaching, documentation, and linguistic research on Sardinian,
contributing to greater digitization and visibility of the
language.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <sec id="sec-5-1">
        <p>We acknowledge financial support under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.5 - Call for tender No. 3277 published on December 30, 2021 by the Italian Ministry of University and Research (MUR) funded by the European Union – NextGenerationEU. Project Code ECS0000038 – Project Title eINS Ecosystem of Innovation for Next Generation Sardinia – CUP F53C22000430001 - Grant Assignment Decree No. 1056 adopted on June 23, 2022 by the Italian Ministry of University and Research (MUR).</p>
        <p>Declaration on Generative AI</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[17] <string-name><given-names>L.</given-names> <surname>Duong</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Verspoor</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bird</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Cook</surname></string-name>, <article-title>What can we get from 1000 tokens? A case study of multilingual POS tagging for resource-poor languages</article-title>, in: <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>, <year>2014</year>, pp. <fpage>886</fpage>-<lpage>897</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[18] <string-name><given-names>M.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>, <article-title>Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection</article-title>, <source>arXiv preprint arXiv:1607.01133</source> (<year>2016</year>).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[19] <string-name><given-names>M.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Cohn</surname></string-name>, <article-title>Model transfer for tagging low-resource languages using a bilingual dictionary</article-title>, <source>arXiv preprint arXiv:1705.00424</source> (<year>2017</year>).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[20] <string-name><given-names>Ž.</given-names> <surname>Agić</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hovy</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Søgaard</surname></string-name>, <article-title>If all you have is a bit of the Bible: Learning POS taggers for truly low-resource languages</article-title>, in: C. Zong, M. Strube (Eds.), <source>Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)</source>, Association for Computational Linguistics, Beijing, China, <year>2015</year>, pp. <fpage>268</fpage>-<lpage>272</lpage>.</mixed-citation>
      </ref>
      <ref>
        <mixed-citation>dda, <string-name><given-names>G.</given-names> <surname>Fenu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Frigau</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Giuliani</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Grassi</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Manca</surname></string-name>, et al., <article-title>Limba: An open-source framework for the preservation and valorization of low-resource languages using generative models</article-title>, <source>arXiv preprint arXiv:2411.13453</source> (<year>2024</year>).</mixed-citation>
      </ref>
      <ref>
        <mixed-citation>[29] <string-name><given-names>A. S.</given-names> <surname>Podda</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Balia</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Manca</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Martellucci</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Pompianu</surname></string-name>, <article-title>A deep learning strategy for the 3D segmentation of colorectal tumors from ultrasound imaging</article-title>, <source>Image and Vision Computing</source> (<year>2025</year>) <fpage>105668</fpage>.</mixed-citation>
      </ref>
      <ref>
        <mixed-citation>[30] <string-name><given-names>R.</given-names> <surname>Saia</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Carta</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Fenu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Pompianu</surname></string-name>, <article-title>Influencing brain waves by evoked potentials as biometric approach: taking stock of the last six years of research</article-title>, <source>Neural Computing and Applications</source> <volume>35</volume> (<year>2023</year>) <fpage>11625</fpage>-<lpage>11651</lpage>.</mixed-citation>
      </ref>
      <ref>
        <mixed-citation>[31] <string-name><given-names>A.</given-names> <surname>Giuliani</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Savona</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Carta</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Addari</surname></string-name>, <string-name><given-names>A. S.</given-names> <surname>Podda</surname></string-name>, <article-title>Corporate risk stratification through an interpretable autoencoder-based model</article-title>, <source>Computers &amp; Operations Research</source> <volume>174</volume> (<year>2025</year>) <fpage>106884</fpage>.</mixed-citation>
      </ref>
      <ref>
        <mixed-citation>[32] <string-name><given-names>M.</given-names> <surname>Nallakaruppan</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Balusamy</surname></string-name>, <string-name><given-names>M. L.</given-names> <surname>Shri</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Malathi</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bhattacharyya</surname></string-name>, <article-title>An explainable AI framework for credit evaluation and analysis</article-title>, Ap-</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>E.</given-names>
            <surname>Fetahi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hamiti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Susuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Selimi</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. I. Saiti</surname>
          </string-name>
          ,
          <source>plied Soft Computing</source>
          <volume>153</volume>
          (
          <year>2024</year>
          )
          <fpage>111307</fpage>
          .
          <article-title>Neural network and transformer-based pos tag</article-title>
          - [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Carta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Podda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>M. ger for low resource languages, in: 2024 Inter- Stanciu, Explainable ai for financial forecasting</article-title>
          ,
          <source>national Conference on Information Technologies in: International Conference on Machine Learning</source>
          ,
          <source>(InfoTech)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          . Optimization, and Data Science, Springer,
          <year>2021</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Angioni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tuveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Virdis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Lai</surname>
          </string-name>
          , M. E.
          <volume>51</volume>
          -
          <fpage>69</fpage>
          . Maltesi,
          <article-title>Sardanet: A linguistic resource for sar-</article-title>
          [34]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pisu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Elia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pompianu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barchi</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Acquadinian language</article-title>
          ,
          <source>in: Proceedings of the 9th Global viva, S. Carta, Enhancing workplace safety: A flexWordnet Conference</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>412</fpage>
          -
          <lpage>419</lpage>
          .
          <article-title>ible approach for personal protective equipment</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Universal</surname>
            <given-names>pos tags</given-names>
          </string-name>
          ,
          <year>2014</year>
          -
          <fpage>2024</fpage>
          . URL: https:// monitoring,
          <source>Expert Systems with Applications</source>
          <volume>238</volume>
          universaldependencies.org/u/pos/. (
          <year>2024</year>
          )
          <fpage>122285</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Palmero Aprosio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          ,
          <article-title>Tint 2.0: an all-inclusive suite for nlp in italian</article-title>
          , in:
          <source>Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [25]
          <article-title>Eagles part-of-speech (pos) tag set</article-title>
          ,
          <year>2014</year>
          -2024. URL: https://www.ilc.cnr.it/EAGLES96/home.html.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Mura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pisano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giuliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Manca</surname>
          </string-name>
          ,
          <article-title>The corpus of Sardinian emigrants: a tool for a quantitative approach to contact phenomena</article-title>
          ,
          <source>MiLES: Minority Languages in European Societies - International Conference-Turin / Bard - BOOK OF ABSTRACTS</source>
          , July 3-6,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pisano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Piunno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ganfi</surname>
          </string-name>
          ,
          <article-title>Appunti per un corpus di sardo multimediale</article-title>
          , in: M. V. D. Marzo, S. Pisano (Ed.),
          <source>Per una pianificazione del plurilinguismo in Sardegna</source>
          , Condaghes,
          <year>2022</year>
          , pp.
          <fpage>147</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Carta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chessa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Contu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Corriga</surname>
          </string-name>
          , A. Dei-
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [35]
          <string-name>
            <given-names>G.</given-names>
            <surname>Armano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giuliani</surname>
          </string-name>
          ,
          <article-title>A two-tiered 2d visual tool for assessing classifier performance</article-title>
          ,
          <source>Information Sciences</source>
          <volume>463-464</volume>
          (
          <year>2018</year>
          )
          <fpage>323</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Podda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Balia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pompianu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Carta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fenu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Saia</surname>
          </string-name>
          ,
          <article-title>Cargram: Cnn-based accident recognition from road sounds through intensity-projected spectrogram analysis</article-title>
          ,
          <source>Digital Signal Processing</source>
          <volume>147</volume>
          (
          <year>2024</year>
          )
          <fpage>104431</fpage>
          .
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Allouhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hamrani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rehman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>El Jamaoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jayachandran</surname>
          </string-name>
          ,
          <article-title>Sustainable ai-based production agriculture: Exploring ai applications and implications in agricultural practices</article-title>
          ,
          <source>Smart Agricultural Technology</source>
          <volume>7</volume>
          (
          <year>2024</year>
          )
          <fpage>100416</fpage>
          .
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Du-bus: a realtime bus waiting time estimation system based on multi-source data</article-title>
          ,
          <source>IEEE Transactions on Intelligent Transportation Systems</source>
          <volume>23</volume>
          (
          <year>2022</year>
          )
          <fpage>24524</fpage>
          -
          <lpage>24539</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>