<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Profiling Anonymous Authors in the Corsican Autonomist Press of the Interwar Period</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vincent Sarbach-Pulicani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Université Côte d'Azur, Centre de la Méditerranée Moderne et Contemporaine</institution>
          ,
          <addr-line>Campus Carlone, 06100 Nice</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>78</fpage>
      <lpage>99</lpage>
      <abstract>
        <p>With the emergence of nationalism in the 19th century came regionalist movements to assert and claim cultural particularities. Corsica fitted very well within this dynamic and even presented itself as a favourable location for the development of such ideas. The centralization of the state around a strong capital and the policies of assimilation of the indigenous populations on the border with France led certain players to defend these particularisms. It was in this context that the Corsican autonomist newspaper A Muvra was born in May 1920 in Paris, under the impetus of Petru and Matteu Rocca. For almost 19 years, hundreds of authors participated in the writing of this massive dialectal work. This paper presents the results of research that aimed to carry out author profiling, i.e., to determine the style and subjects covered by an author. The goals of this study were to determine the identity behind certain authors and also to highlight the role pseudonyms played in the newspaper's propaganda. We conducted authorship attribution to achieve the first objective before completing these analyses with topic modelling in order to meet the second one.</p>
      </abstract>
      <kwd-group>
        <kwd>stylometry</kwd>
        <kwd>topic modelling</kwd>
        <kwd>Corsican studies</kwd>
        <kwd>under-resourced languages</kwd>
        <kwd>computational history</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        a desire in Corsica to structure and study the evolution of the use of the Corsican language. We can notably mention the work of the linguist Marie-José Dalbera-Stefanaggi with her Nouvel atlas linguistique et ethnographique de la Corse. In the republications of this major work in the 2000s, the author incorporated her work on the creation of the Banque de Données Langue Corse (BDLC).1 This is the first initiative to lemmatise the Corsican language in its diachrony and diatopy.2 Since the second half of the 2010s, scholars have paid significantly more attention to equipping regional languages with NLP tools [16]. Our approach is fully in line with this state of the art. This paper is the continuation of a master’s thesis written as part of a double degree programme between the École nationale des chartes in Paris and the Università di Pisa [2<xref ref-type="bibr" rid="ref4">4</xref>]. It follows a first thesis that highlighted the major ideological differences between the corsists and the irredentists, despite their obvious proximity [25]. It resulted in the creation of a database named Autonomists/Irredentists Database (A/I database).3 This work establishes that although the corsists admitted to being part of a common cultural and linguistic entity with Italy, they did not share the same desire for political unification, even if some autonomists came closer to Fascist ideas just before the beginning of the Second World War.
      </p>
      <p>Like any political press, A Muvra has a large number of anonymous authors writing under pseudonyms. While some pseudonyms may correspond to distinct individual authors, there is also a good chance that others belong to recurring authors of the journal who also publish under their real names. This raises a number of questions about the identity of these anonymous authors as well as the role that a corsist gives to one or more of his pseudonyms. Several preliminary hypotheses can be proposed at this stage, including the deliberate exaggeration of the number of activists, the need for protection against censorship, or the desire to express varying viewpoints. To address these questions, we employ two distinct analytical methods. First, we use stylometry to unveil the identities of anonymous authors; second, we apply topic modelling to gain insights into the themes associated with these pseudonyms. We then engage in an interpretive phase to discern the purpose and characterization an author assigns to their pseudonym. Together, these two layers of analysis make up the concept of author profiling, as previously discussed. The analyses and results are all available on a GitHub repository dedicated to this research [26].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Datasets construction: starting from scratch</title>
      <sec id="sec-2-1">
        <title>2.1. The OCR processing</title>
        <p>The main issue surrounding the analysis of such a review is the accessibility of the data. In order to carry out the analyses, the data had to be acquired from the digitised images of the newspapers. Segmentation and OCR presented significant challenges, as did postprocessing and normalisation (see Figure 12 for an example of a front page). We were able to locate two online platforms where our documents are available for download. The images come from two sources: the Bibliothèque nationale de France (BnF) and the Archives départementales de Corse du Sud (ADC). We therefore used Gallica, the digitization platform of the BnF, and THOT, the platform of the ADC.4 Because these are public institutions, the digitizations are in the public domain and freely reusable. After the web-scraping phase, we obtained a collection of 375 issues of the Muvra, i.e., approximately 1500 pages from 1921 to 1931.</p>
        <p>1 https://bdlc.univ-corse.fr/bdlc/corse.php
2 This database, which includes a wide range of possibilities, was created on the basis of a vast and particularly impressive field survey.
3 https://heurist.huma-num.fr/heurist/?db=vsp_presse_corsiste_irredentiste</p>
        <p>
          One of the problems with having images from two different sources is the uneven quality of the images. This raises the question of whether or not it is appropriate to normalise and clean images in order to facilitate OCR processing. The original idea of our research was to clean the documents using binarization with the Otsu method [1<xref ref-type="bibr" rid="ref9">9</xref>] followed by a despeckling phase. “Speckling” is a type of noise that corresponds to random clusters of black pixels that impair the intrinsic quality of a binarized image [12]. However, the quality of the digitizations, especially from the Archives départementales, varies greatly. Sharpness is not the main problem; the issue is rather stains on the paper or pages damaged by time. This is an inherent problem in the conservation of old newspapers: the paper is cheap and not made to last. The conservation of these documents is therefore difficult, and this is reflected in the quality of the digitization. Standardising all the images at the same time requires an initial sorting by identical layouts, for a gain in OCR quality that is not guaranteed. We therefore decided to favour quantity over quality, even if this meant that normalisation would be applied to the raw data.
        </p>
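        <p>To make the cleaning step that was initially considered more concrete, the following is a minimal sketch of Otsu binarization followed by a very simple despeckling pass. It is not the code used in this study: the function names are hypothetical, images are represented as plain lists of grey levels (0 to 255), and the despeckling rule only removes single isolated ink pixels, which is a simplification of real speckle removal.</p>
        <preformat>
```python
from collections import Counter


def otsu_threshold(pixels):
    """Return the grey level t in 0..255 that maximises between-class variance."""
    hist = Counter(pixels)
    total = len(pixels)
    sum_all = sum(level * count for level, count in hist.items())
    best_t, best_var = 0, -1.0
    n_dark, sum_dark = 0, 0
    for t in range(256):
        n_dark += hist.get(t, 0)            # pixels with value up to t
        sum_dark += t * hist.get(t, 0)
        n_light = total - n_dark            # pixels with value above t
        if n_dark == 0 or n_light == 0:
            continue
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(grey, t):
    """Map a 2D grey-level grid to ink (1) for values up to t, else paper (0)."""
    return [[0 if value > t else 1 for value in row] for row in grey]


def despeckle(grid):
    """Remove ink pixels that have no ink among their 8 neighbours (simplified rule)."""
    rows, cols = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            neighbours = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if neighbours == 0:
                cleaned[r][c] = 0
    return cleaned
```
        </preformat>
        <p>On real scans, stains and damaged areas often form large dark regions that a global threshold cannot separate from text, which is consistent with the decision described above not to rely on image cleaning.</p>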
        <p>
          One of the major challenges in the world of automatic character recognition today is the segmentation of newspapers. Their complex layout requires the training of complex models that are often specific to a type of newspaper. We decided to train a Kraken segmentation model from the XML files in ALTO format available on Gallica, with the help of the eScriptorium platform [1<xref ref-type="bibr" rid="ref8">8</xref>] and the module ketos. Once the ALTO files had been converted to the appropriate format, we could train the model to segment the images coming from the Archives départementales de Corse du Sud. In order to improve the model, it was necessary to use the tool YALTAi [8] developed by Thibault Clérice, which allows YOLOv5 [13], an Ultralytics object detection model, to be adapted for training segmentation models with Kraken. For the text recognition phase, we chose Tesseract-OCR, which includes a Corsican model. We needed to create UZN files readable by this engine so that it would follow the zone coordinates on the image (Figure 1).
        </p>
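        <p>As an illustration of the last step, the sketch below converts segmentation bounding boxes into the content of a UZN zone file. It assumes the commonly described UZN layout, in which each line holds the left and top coordinates, the width, the height, and a label; the function name and the input format (corner coordinates) are hypothetical and do not reproduce the scripts of this study.</p>
        <preformat>
```python
def boxes_to_uzn(boxes, label="Text"):
    """Convert (x0, y0, x1, y1) pixel boxes into UZN lines: left top width height label."""
    lines = []
    for x0, y0, x1, y1 in boxes:
        width, height = x1 - x0, y1 - y0
        if width > 0 and height > 0:          # skip empty or inverted boxes
            lines.append(f"{x0} {y0} {width} {height} {label}")
    return "\n".join(lines) + "\n"


# One zone per detected text region; the result would be saved next to the
# image under the same base name with a .uzn extension.
print(boxes_to_uzn([(10, 20, 110, 220), (120, 20, 220, 220)]), end="")
```
        </preformat>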
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data standardisation</title>
        <p>Once we had obtained the raw textual data, we classified them according to their language, typology, and author. We could then clean the textual data, which was carried out in four main stages:
• The removal of punctuation
• Case reduction
• Normalisation of syntax
• Elimination of accents</p>
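        <p>The four stages can be sketched as a small function. This is a hedged illustration rather than the cleaning script of the study: the elision table is hypothetical and contains only one example taken from this section, and the order of operations is adapted so that elisions are expanded before the apostrophes are removed with the rest of the punctuation.</p>
        <preformat>
```python
import re
import unicodedata

# Hypothetical expansion table; the single entry is the example discussed in the text.
ELISION_EXPANSIONS = {"s’ell’è": "s’è ellu hè"}


def strip_accents(text):
    """Drop combining marks after canonical decomposition (è becomes e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def clean(text):
    text = unicodedata.normalize("NFC", text.lower())   # case reduction
    text = text.replace("'", "’")                       # unify apostrophes
    for elided, full in ELISION_EXPANSIONS.items():     # syntax normalisation
        text = text.replace(elided, full)
    text = re.sub(r"[^\w\s]", " ", text)                # punctuation removal
    text = strip_accents(text)                          # accent elimination
    return " ".join(text.split())


print(clean("S'ell'è, A MUVRA!"))
```
        </preformat>
        <p>A lookup table of this kind cannot resolve context-dependent cases, which is precisely why the normalisation of syntax is the most delicate stage.</p>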
        <p>
          The most delicate phase in our methodology is the normalisation of the syntax. It is important because, for euphonic reasons, contractions appear in writing as elisions, which reflect the discourse practices of speakers of Corsican. For example, the expression s’è ellu hè (“if he is”) becomes s’ell’è in writing. Conversely, restoring the original form of the elision requires taking into account the context of gender and number: ell’ can give ellu, ella, elli, or elle. There is also the question of the normalisation rule: should we base ourselves on the syntax of the 20th century or on the current one? Moreover, a certain number of ambiguities can creep into such a correction, such as the word e, which, depending on the context, can mean either “the” or “and”. We should not forget that Corsican is a “langue par élaboration” or Ausbau language and that, consequently, its syntax is made more complex by the distinct instantiations found from one author to another. In sociolinguistics, this type of language is a variant of a structured language (such as Italian) that has been set up as a distinct elaborated language [<xref ref-type="bibr" rid="ref25">29</xref>].
        </p>
        <p>4 https://gallica.bnf.fr/accueil/en/content/accueil-?emnode=desktop | http://archives.isula.corsica/Internet_THOT/FrmSommaireFrame.asp</p>
        <p>The issue of data normalisation is particularly delicate due to the very nature of our methodology. While topic modelling excludes function words from the analysis because they carry no meaning of their own, stylometry relies mainly on the most frequent words of all types. To what extent, then, should we normalise the data? Do we lose information if we normalise the syntax of certain terms, or do we gain information? The choices that have been made are recorded in the Python file dedicated to data cleaning. This is nevertheless an important bias for our analysis. Fortunately, the regiolectal diversity of the Corsican language means that the idiomatic features of the authors are characterised by the great variety of the function words used. A thorough normalisation should therefore not alter our analysis too much, even though it remains an avenue for improving our study.</p>
        <p>In the end, we obtained a total of 3 corpora of different sizes with a total of almost 1.5 million words (Table 1), with approximately 56.7% of the words in Corsican, 27% in French, and 16.3% in Italian. The main point of improvement in this method of extracting textual data is the balancing of the corpus. While we were able to obtain almost all the articles in the issues on Gallica, the issues on THOT were selected according to our needs, given the variation in the quality of the images. However, this is still quite sufficient for the type of analysis we are carrying out, focusing on a certain number of authors. Details of the samples selected for this study can be found in the appendix (Table 4).</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method proposed: two layers of analysis</title>
      <p>
        The latest advances in stylometry have been made with the use of machine learning algorithms. Recent examples include the work of Jean-Baptiste Camps and Florian Cafiero, who used SVM classifier algorithms to identify the authors of the American conspiracy forum QAnon [<xref ref-type="bibr" rid="ref6">6</xref>]. This means that we can now tackle the question of the statistical units to be analysed with our algorithms, whether using machine learning techniques or distance metrics. The two French researchers chose to work on character 3-grams to “increase robustness”, as they are “known to reduce sparsity and perform well in attribution studies”. In reality, the features to be analysed vary according to the nature of the corpus and the quality of the data. One example is the measurement of verses in poetic works to measure an author’s style [3] and even the rhymes in mediaeval texts, as Mike Kestemont did in 2012 [1<xref ref-type="bibr" rid="ref5">5</xref>]. For the previous thesis, we compared the results obtained with the SVM with a distance metric, the Delta score as defined by John Burrows in 2002 [4], in order to confirm them given the limited length of the corpora. The objective of this double layer of analysis was to confirm the results and determine the best possible approach for our corpus. This paper focuses on the machine learning approach, but the results obtained with Burrows’s Delta, which confirmed those of the SVM methods, are available on the GitHub repository [2<xref ref-type="bibr" rid="ref6">6</xref>]. The script used is SuperStyl, developed by Jean-Baptiste Camps in 2021 [<xref ref-type="bibr" rid="ref7">7</xref>]. Whatever the authors and pseudonyms tested, we excluded poetic texts, in prose or verse, from the stylometric part because of their specificities.
      </p>
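      <p>For readers unfamiliar with the distance metric mentioned above, here is a minimal sketch of a Delta-style comparison. It is not the implementation used for the thesis: the function names are hypothetical, candidate profiles are built from one token list per candidate, and the standard deviation is the population one computed across the candidate profiles, which is a simplifying assumption.</p>
      <preformat>
```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]


def burrows_delta(candidates, unknown, n_mfw=3):
    """candidates: dict name -> tokens; unknown: tokens. Returns dict name -> Delta."""
    pooled = Counter(tok for toks in candidates.values() for tok in toks)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]   # most frequent words
    profiles = {name: relative_freqs(toks, vocab) for name, toks in candidates.items()}
    columns = list(zip(*profiles.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]            # guard against zero spread

    def zscores(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, means, stds)]

    z_unknown = zscores(relative_freqs(unknown, vocab))
    # Delta is the mean absolute difference between z-scores: lower means closer.
    return {
        name: mean(abs(a - b) for a, b in zip(zscores(profile), z_unknown))
        for name, profile in profiles.items()
    }


toy = {"A": "e e e di di u".split(), "B": "e di di di u u".split()}
print(burrows_delta(toy, "e di e u e di".split()))
```
      </preformat>
      <p>The candidate with the lowest Delta is the closest stylistic match; with short texts, as here, such scores are best read as a cross-check on the classifier rather than as evidence on their own.</p>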
      <p>It is very important to vary the hyperparameters available to us in order to optimise machine learning. To do this, the SuperStyl algorithms allow us great flexibility in the options to be taken into account. After various tests presented in the benchmark (Table 6), we chose the following parameters: the statistical units are the most frequent words; we apply PCA (Principal Component Analysis) for dimensionality reduction; the cross-validation is carried out with the “Leave-One-Out” method; and we balance the dataset with “upsampling”. This technique consists of randomly resampling examples from the minority classes until they match the number of examples in the majority class, as explained by Joseph Barr in 2022 [2]. Once the model has been trained, we apply it to the unseen data. In view of the large number of candidates for the second experiment, we initially subdivided them into two groups in order to obtain more precise results before carrying out an analysis on the whole corpus.</p>
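      <p>The balancing and cross-validation choices can be illustrated with a short sketch. It assumes the usual meaning of upsampling, i.e. random oversampling of the smaller classes with replacement, and it does not reproduce SuperStyl's own code; the function names and the toy data are hypothetical.</p>
      <preformat>
```python
import random
from collections import defaultdict


def upsample(samples, labels, seed=0):
    """Randomly duplicate samples of the smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def leave_one_out(n):
    """Yield (train_indices, test_index) pairs, holding out one sample at a time."""
    for held_out in range(n):
        yield [i for i in range(n) if i != held_out], held_out


texts = ["a1", "a2", "a3", "b1"]
authors = ["A", "A", "A", "B"]
print(upsample(texts, authors))
print(list(leave_one_out(3)))
```
      </preformat>
      <p>When the two are combined, a common precaution is to upsample only the training indices of each fold, so that a duplicate of the held-out sample never ends up in the training set.</p>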
      <p>
        Concerning topic modelling, LDA (Latent Dirichlet Allocation) is a method based on a term-document matrix. It rests on the assumption that “documents are represented as random mixtures of latent topics, where each topic is characterised by a distribution of words”. LSI (Latent Semantic Indexing), on the other hand, consists of creating a semantic space based on a corpus in which similarities between words or documents are calculated on a statistical scale. Each of these methods has its own advantages and disadvantages that need to be taken into account, hence the importance of the notion of comparability inherent in our study [<xref ref-type="bibr" rid="ref9">9</xref>]. In 2020, a group of researchers set out to compare the two methods by training them on a corpus of BBC articles [1<xref ref-type="bibr" rid="ref4">4</xref>]. The results of their research revealed that LSI is more effective when dealing with a large amount of data and fewer iterations than LDA, while the latter is more suitable for smaller corpora. Here we present the most interesting results, relying on empirical observation of the output as a form of intrinsic evaluation. In the long term, implementing more effective evaluation metrics such as coherence would be very relevant, even if it is not necessary in our case, given that we are modelling general themes rather than assigning a label to each article. To do so, we used the Gensim package for Python, which offers wide possibilities for performing both LSI and LDA techniques. The different experiments presented in the appendix, along with the hyperparameters and methods used, are detailed in the summary table (Table 5). Table 8 serves as a glossary containing pertinent words that were modelled in the course of the experiments.
      </p>
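      <p>Both techniques start from counts of terms per document. The following sketch builds such a term-document matrix with the standard library only; it is an illustration under simple assumptions (whitespace tokenisation, a hypothetical function name, toy documents) and not the Gensim pipeline used in the experiments.</p>
      <preformat>
```python
from collections import Counter


def term_document_matrix(documents, stopwords=frozenset()):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    counts = [
        Counter(token for token in doc.split() if token not in stopwords)
        for doc in documents
    ]
    vocab = sorted({term for doc_counts in counts for term in doc_counts})
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocab]
    return vocab, matrix


docs = ["corsica e lingua corsica", "lingua e guerra"]
vocab, matrix = term_document_matrix(docs, stopwords={"e"})
for term, row in zip(vocab, matrix):
    print(term, row)
```
      </preformat>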
      <p>
        The vocabulary plays an essential role in topic modelling. The words taken into account must not be too numerous, as training the model can be extremely time-consuming. The number of documents and the vocabulary chosen will therefore play a central role among the various biases to be applied. Unlike in stylometry, function words are of no interest here because they are considered to be empty words, i.e., words that carry no significant meaning of their own but serve to add detail to the sentence [1]. We had to create a specific list of stopwords for our Corsican corpus (Table 7) due to the absence of a basic language toolkit [1<xref ref-type="bibr" rid="ref7">7</xref>]. The list was created in two phases. First, we compared it with an Italian list that contained stopwords overlapping with Corsican. Second, we examined various corpora, including the Muvra dataset, to identify the most frequent words, among which the stopwords were then selected. The idea is therefore to remove them in order to reduce the vocabulary. But there is also the case of hapax or infrequent words, as well as frequent words that are not stopwords, such as “corsica” in this case. One solution is to include the notion of statistical entropy in the choice of vocabulary as presented by Susan Dumais in a 1992 article [10] with the following formula:
      </p>
      <p>In this equation, ndocs represents the number of documents, tf is the frequency of the term i in the document j, and gf is the overall frequency of the term i. The idea is to calculate the entropy of each word in the corpus and to select vocabulary within a defined interval.</p>
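      <p>As a hedged illustration of this selection, the sketch below assumes the common normalised form of the entropy weight, 1 + sum over j of p_ij log(p_ij) / log(ndocs), with p_ij = tf_ij / gf_i; sign conventions vary between presentations, and this form is an assumption rather than a quotation of the formula used here. With it, a term spread evenly over all documents scores 0 and a term concentrated in a single document scores 1. The function names, the interval bounds, and the toy counts are hypothetical.</p>
      <preformat>
```python
import math


def entropy_weight(term_counts):
    """term_counts holds tf_ij for one term i across all documents j."""
    ndocs = len(term_counts)
    gf = sum(term_counts)                    # overall frequency of the term
    if gf == 0 or 2 > ndocs:
        return 0.0                           # convention for degenerate input
    total = 0.0
    for tf in term_counts:
        if tf > 0:
            p = tf / gf
            total += p * math.log(p)
    return 1.0 + total / math.log(ndocs)


def select_vocabulary(tf_by_term, low, high):
    """Keep the terms whose entropy weight falls within [low, high]."""
    return sorted(
        term for term, counts in tf_by_term.items()
        if high >= entropy_weight(counts) >= low
    )


toy = {"corsica": [1, 1, 1, 1], "scena": [4, 0, 0, 0], "guerra": [2, 2, 0, 0]}
print({term: round(entropy_weight(counts), 3) for term, counts in toy.items()})
print(select_vocabulary(toy, 0.25, 0.75))
```
      </preformat>
      <p>In this toy example, an evenly spread word such as “corsica” falls below the interval and a word confined to one document falls above it, so only the intermediate term is kept.</p>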
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. The two pseudonyms chosen</title>
        <p>The aim is to test our methodology on two different pseudonyms. The first, P. di B., allows us to check the reliability of our tools on a relatively small corpus in Corsican by confirming the identity of the author. The second, Altore, gives us the opportunity to test these tools on a completely unknown author, leaving us free to interpret and choose the candidates.</p>
        <p>The pseudonym P. di B. is a name that appears fairly regularly in the writings of the Muvra. A number of articles were published under this pseudonym, and it is generally accepted that it is actually Petru Rocca, as mentioned by Carmine Starace in the pages of his Bibliografia della Corsica [28]. This pseudonym is believed to come from the initials of his mother’s surname, Maria Saveria Rocca-Pozzo di Borgo. The latter had remained very close to her sons Petru and Matteu, even publishing drawings in the Muvra. Confirming the writings of contemporary actors from this period also makes it possible to verify the rigour of their anthological work. It is also an excellent way of testing our methodology in a more or less reliable setting.</p>
        <p>The other pseudonym seen in this paper is Altore. It is directly inspired by the lake of the same name in the Asco valley, in the old Caccia pieve within the region of the same name. Altore is the author of Lettere aiaccine, the letters from Ajaccio, which often appeared on the front page of the newspaper. In these letters, he covers all the subjects of society and politics in general in an open, family-friendly style. Our corpus contains 62 of these letters, all written in the Corsican language. The difficulty with this part of our study is that we have no information or clues about the real author behind this pseudonym. Nevertheless, its presence on the front pages of many issues at least testifies to the importance attached to this particular section and therefore to its author.</p>
        <p>Concerning the candidates, apart from Petru Rocca, who is an obvious choice given the information provided earlier, we decided to choose two other potential authors. The first is Martinu Appinzapalu, a pseudonym of the Corsican priest Dumenicu Carlotti and symbol of the religious aspect of the islanders’ autonomist struggle at the time, who published numerous articles throughout the paper’s existence and was part of the Partitu Corsu d’Azione, the political party attached to the Muvra. The second is Marcellu Alessandri di Chidazzu, one of the authors most involved in the writing and a fervent defender of the irredentist cause.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. First experiment: P. di B.</title>
        <p>The evaluation of the trained model is presented in Table 2: we obtained an accuracy of 0.95. We then obtain a file with the predictions of the author of the articles and the results of the decision function that “tells us how close each sample is to the hyperplane separating each class” [5]. A negative value means that the sample falls outside the candidate’s class; a positive value means it falls inside. The higher the score, the greater the probability that this sample has been written by the candidate. Applying this function to our study gives Figure 2. We have also added the identifiers of articles written by P. di B. whose authorship has not been attributed to Petru Rocca. On the whole, however, almost all the articles were attributed to Petru Rocca. Of the 34 articles in the test corpus, 26 are attributed to the director of the Muvra, i.e., 76% of them. But what is even more interesting to study is the behaviour of the curves on the decision function graph. On average, the decision function scores are much higher for Petru Rocca’s texts.</p>
        <p>Petru Rocca is an expert in the use of pseudonyms, as nearly five different identities are attributed to him in the various anthologies and studies carried out on him. We find his signature, Petru Rocca or Pierre Rocca, and the pseudonyms Pasquale Manfredi, P. di B., and P. di C. In view of the stylometric results, we can assume that these various identities attributed to him are indeed his own. In order to optimise the performance of our topic models, several parameters need to be taken into account, such as the number k of topics, iterations, words, and passes. Petru Rocca writes mainly in Corsican, although he does leave an important place for French. He also writes a little in Italian, but there are too few texts to be relevant. Although we can reference 139 articles written by Rocca in total, we performed the LDA on sub-corpora according to language and pseudonym (Figures 4, 5, 6, 7, 8). It is important to note that, for reasons of data quantity, we have grouped together in the same sub-corpus the texts signed by Petru Rocca and Pierre Rocca as well as the texts signed by P. di B. and P. di C. We assume that these serve the same purpose, but this is obviously a point to be improved in further analyses of the question.</p>
        <p>The pseudonyms seem to allow Petru Rocca to evoke a wider spectrum of specific subjects that remain around political and cultural current affairs. Similarly, the choice of language doesn’t seem to be part of any attempt to separate themes, with French and Corsican acting more as a complement to each other, even if the local dialect seems to be used more to address cultural notions. How then to explain the use of several pseudonyms to express himself in his own newspaper? Let’s not forget that he is in fact the director of the Muvra. This can be attributed to propaganda objectives. Indeed, even though there are a large number of contributors, there are very few who are really involved in the corsist struggle over the long term. For Rocca, it would be a question of inflating the number of contributors a little in order to make a more substantial core of regular authors appear. It’s not all ideology, and there are sometimes simpler justifications to understand the muvrists’ approach. The same logic can also be seen in the public events organised by the autonomists. Thus, in 1934, a number of participants are mentioned in the sixth edition of the merendelle d’i pueti còrsi.5 The list includes Dumenicu Carlotti, Eugeniu Grimaldi, Petru Rocca, and a certain Pasquale Manfredi.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Second experiment: Altore</title>
        <p>In the same way as we confirmed Petru Rocca’s authorship of the texts of P. di B., we carried out the stylometric analysis of those of Altore using the SVM classifier. For the candidates, we chose a wide range of possible authors among the most important ones in the Muvra.</p>
        <sec id="sec-4-3-1">
          <p>[Table 3. Classification report for the eight candidates (VINCIGUERRA, PIAZZOLI, ALESSANDRI, VERSINI, CARLOTTI, ROCCA, NOTINI, GIANVITI), with macro and weighted averages.]</p>
          <p>For this experiment, the first sub-group mentioned above was made up of Ghjanettu Notini, Victor Gianviti, Dumenicu Antone Versini and Marcellu Alessandri. The second was made up of Simon’Ghjuvanni Vinciguerra, Orsu Francescu Piazzoli, Petru Rocca and Dumenicu Carlotti.</p>
          <p>We thus obtain an accuracy of about 0.86 and a reasonably good model, as shown in Table 3. This test bears witness to another important aspect of stylometry that has not yet really been addressed in this paper: the notion of corpus size as a function of the number of candidates. This echoes the article by Maciej Eder published in 2015 at Oxford University [11], where he stated that “the effectiveness of attribution depends on corpus size and particularly on the number of authors tested”.</p>
          <p>The results of the decision function (Figure 3) show us that Ghjanettu Notini is the most likely candidate among the panel of candidates. But stylometry, like any computational method used in the field of digital humanities, also requires more in-depth research with “close reading”. Numbers are not proof. Ghjanettu Notini was born on December 4, 1890, in San Petru di Venacu, in the old pieve of Venacu in Corsica’s Curtinese region. Interestingly enough, this region of central Corsica is relatively close to Lake Altore. He was a Corsican poet and writer who contributed for many years to the Muvra under the pseudonym U Sampetracciu. Nicknamed the “Corsican Molière”, according to Ghjacumu Thiers, he was the founder of the Teatru corsu di A Muvra in the early years of the newspaper and a loyal contributor.</p>
          <p>We can notice certain terms that come up frequently on the wordcloud that visualises the results of topic modelling on Altore (Figure 9), such as “corsu” or “corsica”. This brings us face-to-face with our vocabulary selection methodology. These words are very frequent but remain essential in the context of a Corsican autonomist newspaper. Nevertheless, certain trends stand out, with political issues omnipresent in the lettere aiaccine. In particular, there is mention of the French politician and industrialist Paul Lederlin, who was elected Senator for Corsica in 1930. For U Sampetracciu (Figures 10, 11), we see that the plays written by Ghjanettu Notini are particularly dominant in the detection of topics. This can be seen thanks to the large number of first names, typical of the theatrical style, which incorporates a lot of dialogue. Other elements highlight this, such as the presence of the onomatopoeia “Ah” or the term “scena” (scene). We can also observe the poetic dimension of Notini’s work with Topic 3 of the LDA: we find there the lexical field typical of Corsican poems with the importance of the “mamma” (mother).</p>
          <p>It seems fairly clear that the Corsican author is more inclined to evoke political and topical themes with the pseudonym. He does this in a very particular literary style, that of the open letter, which corresponds quite well to Notini’s great talent for writing. However, Notini did not hesitate to raise these intrinsically political issues in his plays. Likewise, his poetry does not appear to be a simple ode to the beauty of Corsica but a complete reworking of the island’s poetic traditions through the prism of the lamentu, a poetic style cherished by the muvrists.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Further research</title>
      <p>
        While this research shows promise, it is important to acknowledge its limitations, which are closely intertwined with its strengths. In the long run, it would be pertinent to develop a dedicated OCR model for recognising printed Corsican text. Additionally, exploring the possibility of fine-tuning the segmentation model to enhance its effectiveness holds significant potential. This article has highlighted the constraints of using topic modelling techniques, which may not be the most suitable approach for detecting word characteristics. Considering this, alternative methods like frequency-based analysis could be more appropriate, given our knowledge of the specific vocabulary found in the Muvra dataset. Moreover, the time invested in removing stopwords might have been unnecessary, as demonstrated by the experiments conducted by Alexandra Schofield and her colleagues [2<xref ref-type="bibr" rid="ref7">7</xref>]. Lastly, in terms of stylometric analysis, it is essential to conduct it on the entire newspaper corpus to validate the obtained results, and this should coincide with a more careful selection of candidates for analysis.
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The dual nature of pseudonym usage can also be clarified by considering how it is employed. An identity can be used to evoke more sensitive subjects that we wouldn’t discuss without it. Ghjanettu Notini makes no secret of the fact that he is U Sampetracciu when he writes his plays and poetry. Even if he tackles specific political themes, he never goes too far and effectively protects himself from criticism behind his dramatic work. But it’s thanks to his hypothetical identity as Altore that Notini can really express his intentions, with more assertive political discourse and fewer filters. Conversely, the use of a pseudonym may not have a purely ideological role but a more propagandist one, as in the case of P. di B. for Petru Rocca.</p>
      <p>Studying a weekly newspaper spanning almost 20 years represents a real technical challenge that forces us to make choices. Confronted with the intricate nature of the numerous metadata within our dataset, we had to make choices and apply biases in order to obtain an overview of what computational methods can offer in the study of such a corpus. It would be possible to perform a stylometric analysis on all anonymous authors or topic modelling on every combination of articles; this would be time-consuming, but it represents a possible improvement to this research. In addition to determining the authorship of certain pseudonyms and the role of others, the question was also to work on an under-resourced language. The aim is to encourage this type of study in areas other than pure linguistics, as can be done at the Università di Corsica. While the complexity of the subject is a fact, it does not prevent us from obtaining coherent and promising results for the future. With better preparation of the data, as part of a broader project that would include more resources to allocate to the research, this subject has a lot of potential.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>I would like to thank Jean-Baptiste Camps and Alessandro Lenci for their supervision of this
research. Although this paper is the conclusion of a two-year dissertation, it is also the fruit of
cooperation with several researchers, including Angelo Mario Del Grosso and Federico Boschetti,
members of the CNR Pisa.</p>
      <sec id="sec-7-1">
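        <title>Word list</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Word</th><th>Language</th></tr>
            </thead>
            <tbody>
              <tr><td>affaire</td><td>french</td></tr>
              <tr><td>affare</td><td>corsican</td></tr>
              <tr><td>ami</td><td>french</td></tr>
              <tr><td>amore</td><td>corsican</td></tr>
              <tr><td>article</td><td>french</td></tr>
              <tr><td>babbu</td><td>corsican</td></tr>
              <tr><td>barbare</td><td>french</td></tr>
              <tr><td>bien</td><td>french</td></tr>
              <tr><td>canta</td><td>corsican</td></tr>
              <tr><td>centrale</td><td>corsican</td></tr>
              <tr><td>chemin</td><td>french</td></tr>
              <tr><td>concours</td><td>french</td></tr>
              <tr><td>confrere</td><td>french</td></tr>
              <tr><td>contre</td><td>french</td></tr>
              <tr><td>core</td><td>corsican</td></tr>
              <tr><td>corse</td><td>french</td></tr>
              <tr><td>corsu/a/e/i</td><td>corsican</td></tr>
              <tr><td>croce</td><td>corsican</td></tr>
              <tr><td>cumitatu</td><td>corsican</td></tr>
              <tr><td>cummissione</td><td>corsican</td></tr>
              <tr><td>cumpagnu</td><td>corsican</td></tr>
              <tr><td>cumpare</td><td>corsican</td></tr>
              <tr><td>directeur</td><td>french</td></tr>
              <tr><td>droit</td><td>french</td></tr>
              <tr><td>elettori</td><td>corsican</td></tr>
              <tr><td>esprit</td><td>french</td></tr>
              <tr><td>fede</td><td>corsican</td></tr>
              <tr><td>federazione</td><td>corsican</td></tr>
              <tr><td>fonctionnaire</td><td>french</td></tr>
              <tr><td>français</td><td>french</td></tr>
              <tr><td>francese</td><td>corsican</td></tr>
              <tr><td>franchi</td><td>corsican</td></tr>
              <tr><td>francia</td><td>corsican</td></tr>
              <tr><td>fuir</td><td>french</td></tr>
              <tr><td>gauche</td><td>french</td></tr>
              <tr><td>giurnale</td><td>corsican</td></tr>
              <tr><td>gouvernement</td><td>french</td></tr>
              <tr><td>guerra</td><td>corsican</td></tr>
              <tr><td>guvernu</td><td>corsican</td></tr>
              <tr><td>histoire</td><td>french</td></tr>
              <tr><td>honneur</td><td>french</td></tr>
              <tr><td>ile</td><td>french</td></tr>
              <tr><td>isula</td><td>corsican</td></tr>
              <tr><td>italie</td><td>french</td></tr>
              <tr><td>italien/ne</td><td>french</td></tr>
              <tr><td>jente</td><td>corsican</td></tr>
              <tr><td>jeune</td><td>french</td></tr>
              <tr><td>jornu</td><td>corsican</td></tr>
              <tr><td>jour</td><td>french</td></tr>
              <tr><td>legge</td><td>corsican</td></tr>
              <tr><td>liberta</td><td>corsican</td></tr>
              <tr><td>lingua</td><td>corsican</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Words whose language label was not preserved: manu; marseglia; matrimoniu; megliu; merre; ministru; minuranze; moda; mondu; monsieur; nasitortu; naziunale; oghie; omu; paese; parigi; parti; passager; patrie; pays; poetes; politique; populu; postal/aux; presse; prete; prima; primavera; prisidente; prix; projet; prova; pueti; pulitica; raghione; razza; sangue; santu/a; scena; separatisti; sgio, scio; sicondu; stampa; statu; surete; teatru; temps; varghiolu; vergogna; vita; vitesse; vole.</p>
        <p>Sampling labels (row keys not preserved): Downsampling; Downsampling; Downsampling; Upsampling; Downsampling; Downsampling; Upsampling; Downsampling; Upsampling; Upsampling; Upsampling; Upsampling; Upsampling; Downsampling; Upsampling; Upsampling; Upsampling.</p>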
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Arun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Suresh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          . “
          <article-title>Stopword graphs and authorship attribution in text corpora</article-title>
          ”. In:
          <source>2009 IEEE international conference on semantic computing</source>
          . IEEE.
          <year>2009</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>196</lpage>
          . doi: 10.1109/icsc.2009.101.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sobel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Thatcher</surname>
          </string-name>
          . “
          <article-title>Upsampling, a comparative study with new ideas</article-title>
          ”. In:
          <source>2022 IEEE 16th International Conference on Semantic Computing (ICSC)</source>
          .
          <year>2022</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>321</lpage>
          . doi: 10.1109/icsc52841.2022.00059.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Beaudouin</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Yvon</surname>
          </string-name>
          . “
          <article-title>Contribution de la métrique à la stylométrie</article-title>
          ”. In:
          <source>Actes des 7èmes Journées Internationales d'Analyse Statistique des données textuelles (JADT)</source>
          . Vol.
          <volume>1</volume>
          .
          <year>2004</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>118</lpage>
          . url: https://imt.hal.science/file/index/docid/741596/filename/JADT%5C%5F133%5C%5FBeaudouinYvonDef20030116.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrows</surname>
          </string-name>
          . “
          <article-title>'Delta': a measure of stylistic difference and a guide to likely authorship</article-title>
          ”. In:
          <source>Literary and linguistic computing</source>
          <volume>17</volume>-<issue>3</issue> (
          <year>2002</year>
          ). doi: 10.1093/llc/17.3.267. url: https://academic.oup.com/dsh/article-abstract/17/3/267/929277.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cafiero</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          . “
          <article-title>'Psyché' as a Rosetta Stone? Assessing Collaborative Authorship in the French 17th Century Theatre</article-title>
          ”. In:
          <source>Proceedings of the Conference on Computational Humanities Research 2021</source>
          . Vol.
          <volume>2989</volume>
          . CEUR-WS.
          <year>2021</year>
          , pp.
          <fpage>377</fpage>
          -
          <lpage>381</lpage>
          . url: http://star.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-2989/long%5C%5Fpaper51.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cafiero</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          . “
          <article-title>Who could be behind QAnon? Authorship attribution with supervised machine-learning</article-title>
          ”. In:
          <source>arXiv Cornell University</source>
          abs/2303.02078 (
          <year>2023</year>
          ). doi: 10.48550/arXiv.2303.02078.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          .
          <source>SUPERvised STYLometry (SuperStyl)</source>
          . Version 0.9.0.
          <year>2021</year>
          . url: https://github.com/SupervisedStylometry/SuperStyl/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          .
          <source>YALTAi, You Actually Look Twice At it</source>
          . Version v0.0.1rc4.
          <year>2022</year>
          . url: https://github.com/PonteIneptique/YALTAi.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Cvitanic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. I.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Rosen</surname>
          </string-name>
          . “
          <article-title>LDA v. LSA: A comparison of two computational text analysis tools for the functional categorization of patents</article-title>
          ”. In:
          <source>International Conference on Case-Based Reasoning</source>
          .
          <year>2016</year>
          . url: https://par.nsf.gov/biblio/10055536.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          . “
          <article-title>Enhancing performance in latent semantic indexing (LSI) retrieval</article-title>
          ”.
          <year>1992</year>
          . url: http://www2.denizyuret.com/ref/dumais/Enhancing%5C%5FLSI%5C%5F%5C%5F%5C%5FDumais%5C%5F1991.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eder</surname>
          </string-name>
          . “
          <article-title>Does size matter? Authorship attribution, small samples, big problem</article-title>
          ”. In:
          <source>Digital Scholarship in the Humanities</source>
          <volume>30</volume>.<issue>2</issue> (
          <year>2015</year>
          ), pp.
          <fpage>167</fpage>
          -
          <lpage>182</lpage>
          . doi: 10.1093/llc/fqt066.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>url: https://academic.oup.com/dsh/article-abstract/30/2/167/390738.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Fracastoro</surname>
          </string-name>
          , E. Magli, G. Poggi,
          <string-name>
            <given-names>G.</given-names>
            <surname>Scarpa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Valsesia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Verdoliva</surname>
          </string-name>
          . “
          <article-title>Deep learning methods for synthetic aperture radar image despeckling: An overview of trends and perspectives</article-title>
          ”. In:
          <source>IEEE Geoscience and Remote Sensing Magazine</source>
          <volume>9</volume>.<issue>2</issue> (
          <year>2021</year>
          ), pp.
          <fpage>29</fpage>
          -
          <lpage>51</lpage>
          . doi: 10.1109/mgrs.2021.3070956. url: https://ieeexplore.ieee.org/document/9416740.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jocher</surname>
          </string-name>
          .
          <source>YOLOv5 by Ultralytics</source>
          . Version 7.0.
          <year>2020</year>
          . doi: 10.5281/zenodo.3908559. url: https://github.com/ultralytics/yolov5.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kalepalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tasneem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D. P.</given-names>
            <surname>Teja</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Manne</surname>
          </string-name>
          . “
          <article-title>Effective comparison of LDA with LSA for topic modelling</article-title>
          ”. In:
          <source>2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS)</source>
          . IEEE.
          <year>2020</year>
          , pp.
          <fpage>1245</fpage>
          -
          <lpage>1250</lpage>
          . doi: 10.1109/iciccs48265.2020.9120888. url: https://ieeexplore.ieee.org/abstract/document/9120888.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kestemont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Sandra</surname>
          </string-name>
          . “
          <article-title>Robust rhymes? The stability of authorial style in medieval narratives</article-title>
          ”. In:
          <source>Journal of Quantitative Linguistics</source>
          <volume>19</volume>-<issue>1</issue> (
          <year>2012</year>
          ), pp.
          <fpage>54</fpage>
          -
          <lpage>76</lpage>
          . doi: 10.1080/09296174.2012.638796. url: https://www.tandfonline.com/doi/full/10.1080/09296174.2012.638796.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kevers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gueniot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Tognotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Medori</surname>
          </string-name>
          . “
          <article-title>Outiller une langue peu dotée grâce au TALN: l'exemple du corse et BDLC</article-title>
          ”. In:
          <source>26e Conférence sur le Traitement Automatique des Langues Naturelles</source>
          . Atala.
          <year>2019</year>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>380</lpage>
          . url: https://hal.science/hal02452276/.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kevers</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Medori</surname>
          </string-name>
          . “
          <article-title>Towards a Corsican Basic Language Resource Kit</article-title>
          ”. In:
          <source>12th Language Resources and Evaluation Conference (LREC 2020)</source>
          .
          <year>2020</year>
          . url: https://hal.science/hal-02865699/.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kiessling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tissot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stokes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. S. B.</given-names>
            <surname>Ezra</surname>
          </string-name>
          . “
          <article-title>eScriptorium: an open source platform for historical document analysis</article-title>
          ”. In:
          <source>International Conference on Document Analysis and Recognition Workshops (ICDARW)</source>
          . Vol.
          <volume>2</volume>
          . IEEE.
          <year>2019</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          . doi: 10.1109/icdarw.2019.10032. url: https://ieeexplore.ieee.org/abstract/document/8893029.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [19]
          <string-name>
            <given-names>N.</given-names>
            <surname>Otsu</surname>
          </string-name>
          . “
          <article-title>A threshold selection method from gray-level histograms</article-title>
          ”. In:
          <source>IEEE transactions on systems, man, and cybernetics</source>
          <volume>9</volume>-<issue>1</issue> (
          <year>1979</year>
          ), pp.
          <fpage>62</fpage>
          -
          <lpage>66</lpage>
          . url: https://cw.fel.cvut.cz/b201/%5C%5Fmedia/courses/a6m33bio/otsu.pdf.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Paci</surname>
          </string-name>
          . “
          <article-title>Il mito del Risorgimento mediterraneo: Corsica e Malta tra politica e cultura nel ventennio fascista</article-title>
          ”. PhD thesis. Université de Nice Sophia-Antipolis,
          <year>2013</year>
          . url: https://www.theses.fr/2013NICE2012.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Pellegrinetti</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Rovere</surname>
          </string-name>
          .
          <source>La Corse et la République. La vie politique, de la fin du second Empire au début du XXIe siècle</source>
          . Paris, Média Diffusion,
          <year>2013</year>
          , 688 p.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>A.-T.</given-names>
            <surname>Pietrera</surname>
          </string-name>
          . “
          <article-title>Imaginaires nationaux et mythes fondateurs; la construction des multiples socles identitaires de la Corse française à la geste nationaliste</article-title>
          ”. PhD thesis. Université de Corse Pascal Paoli,
          <year>2015</year>
          . url: https://www.theses.fr/2015CORT0008.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Rogé</surname>
          </string-name>
          . “
          <article-title>Le corsisme et l'irrédentisme 1920-1946: histoire du premier mouvement autonomiste corse et de sa compromission par l'Italie fasciste</article-title>
          ”. PhD thesis. Paris 10,
          <year>2008</year>
          , 1 vol. (882 p.). url: http://www.theses.fr/2008PA100048.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sarbach-Pulicani</surname>
          </string-name>
          . “
          <article-title>Authors profiling in Corsican autonomist press during the interwar period. Stylometric analysis and topic modeling on ‘A Muvra’</article-title>
          ”. MA thesis. École nationale des chartes (PSL) and Università di Pisa,
          <year>2023</year>
          . doi: 10.5281/zenodo.8381161.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sarbach-Pulicani</surname>
          </string-name>
          . “
          <article-title>La presse corsiste et irrédentiste des années 1930 : étude comparative et quantitative des revues A Muvra et Corsica antica e moderna entre 1932 et 1939</article-title>
          ”. MA thesis. Université de Strasbourg,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sarbach-Pulicani</surname>
          </string-name>
          .
          <source>Stylometry and topic modelling in Corsican language</source>
          . Version 2.0.4.
          <year>2022</year>
          . url: https://github.com/vincentsarbachpulicani/Corsican-Stylometry.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Schofield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnusson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mimno</surname>
          </string-name>
          . “
          <article-title>Pulling Out the Stops: Rethinking Stopword Removal for Topic Models</article-title>
          ”. In:
          <source>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . Valencia, Spain: Association for Computational Linguistics,
          <year>2017</year>
          , pp.
          <fpage>432</fpage>
          -
          <lpage>436</lpage>
          . url: https://aclanthology.org/E17-2069.
        </mixed-citation>
      </ref>
      <ref>
        <mixed-citation>
          [28]
          <string-name>
            <given-names>C.</given-names>
            <surname>Starace</surname>
          </string-name>
          .
          <source>Bibliografia della Corsica</source>
          . Centro di studi per la Corsica. Milano, Istituto per gli studi di politica internazionale: Istituto per gli studi di politica internazionale,
          <year>1943</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>A.</given-names>
            <surname>Viaut</surname>
          </string-name>
          . “
          <article-title>Marge linguistique territoriale et langues minoritaires</article-title>
          ”. In:
          <source>Lengas. Revue de sociolinguistique</source>
          . 71. Presses universitaires de la Méditerranée,
          <year>2012</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>28</lpage>
          . url: https://journals.openedition.org/lengas/301.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>