<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Modal Sense Classifier for the French Modal Verb Pouvoir</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Colli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Rossini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Delphine Battistelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Modyco laboratory, Paris Nanterre University</institution>
          ,
          <addr-line>200 Av. de la République, 92000 Nanterre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Paris Nanterre University</institution>
          ,
          <addr-line>200 Av. de la République, 92000 Nanterre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In this paper we address the problem of modal sense classification for the French modal verb pouvoir in a transcribed spoken corpus. To the best of our knowledge, no studies have focused on this task in French. We fine-tuned various BERT-based models for French in order to determine which one performed best. It was found that the flaubert-base-cased model was the most effective (F1-score of 0.94) and that the most frequent categories in our corpus were material possibility and ability, which are both part of the more global alethic category.</p>
      </abstract>
      <kwd-group>
<kwd>pouvoir</kwd>
        <kwd>modal verbs</kwd>
        <kwd>Modal Sense Classification</kwd>
        <kwd>BERT</kwd>
        <kwd>modality</kwd>
        <kwd>French</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this paper, we present our research into the automatic disambiguation of the French modal verb pouvoir (in English, this verb can be translated by can, could, may or might) in a corpus of semi-structured interviews (the code and the annotated corpus are available on GitHub: https://github.com/DiegoRossini/Modal-verbs-modality-detector; the fine-tuned model is available at https://huggingface.co/DiegoRossini/flaubert-pouvoir-modality-detector). This problem statement is part of a broader quantitative and qualitative analysis currently underway on modal markers, whose goal is to better understand which kinds of modal categories are prevalent in this kind of corpus. As an NLP task, the problem of the automatic disambiguation of modal markers relies on what is generally called “modal sense classification” (MSC). As far as we know, no studies have focused on disambiguating modal verbs using a machine learning approach in French. Our aim is to fill this gap by finding the best fine-tuned BERT model to classify the semantic values of the French modal verb pouvoir in a transcribed spoken corpus. The article is organized as follows. In section 2 we review related work on the task of modal sense classification. Section 3 describes our corpus and our linguistic model. Section 4 presents the annotation of the corpus with an annotation scheme. Section 5 presents our experiments in fine-tuning different BERT models in order to choose the most effective one. Finally, in section 6 we discuss our results and in section 7 we close our contribution with conclusions and suggestions for future research.</p>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy. © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>The first study to focus exclusively on modal sense classification was [1], who proposed logistic regression models for each modal verb in English, based on an ensemble of hand-crafted syntactic and lexical features. It was also the first study to present an annotation scheme and an annotated news domain corpus. Further studies pointed out the problem of the biased distribution and sparsity of the data used in [1]. For example, two of these studies, [2] and [3], suggested creating a larger and balanced dataset using a paraphrase projection approach from German data (an English-German parallel corpus of film subtitles and proceedings from the EU Parliament). More specifically, [2] updated the original feature set with semantic features. [3] also updated the original features of [1] with lexical and discourse features to improve the performance of the classifiers; in addition, they explored the influence of genre on the classification of modal verbs. Lastly, [4] proposed the most accurate and flexible alternative to classifiers based on manually engineered features. Their model is based on a CNN architecture and is able to automatically extract features that are relevant for classification (word embeddings). By adapting the model to German, they demonstrated the model’s ability to generalize across different languages. [5] introduced another model architecture in which a simple classifier is fed with a combination of three sets of hand-crafted features and a concatenation of pre-trained embeddings of context words. This representation of the modal context was obtained by testing various weighting schemes. More recent studies have attempted to solve the problem as a classical modal sense classification task by probing the BERT architecture [6]. BERT-based models do not need a hand-crafted feature set and they are claimed to be better at capturing contextual information than previous models. [7] showed that BERT does not have a unique representation for each modal sense but, given the same semantic value, encodes it differently for each modal verb. For this reason, individual classifiers for each verb perform better than a classifier for each modal sense. Finally, [8] used BERT’s last hidden layer representations of the English modal verbs and their context to feed a k-NN and a logistic regression model. In addition, they tried to train a single common model for all the modal verbs, but they showed that for some of them, including can and could, this does not improve the results. [8] used the [1] and [2] datasets and also introduced a new and richer dataset from COCA (https://www.english-corpora.org/coca/), characterized by 5 genres including the spoken genre. In general, BERT-based models outperform the frequency baseline and previous models for almost all modal verbs. Regarding French, as far as we know, no research has yet focused on the disambiguation of modal verbs using a machine learning approach. The only NLP approach is [9], which studied the notion of “possible” and adopted a symbolic approach with a set of rules to semantically annotate epistemic possibility. The present paper aims to fill this void by using a BERT architecture to solve the MSC task in a transcribed spoken French corpus. We present here the work carried out for the disambiguation of the modal verb pouvoir.</p>
      <sec id="sec-2-1">
        <title>3.2. Linguistic model for analysing the semantic values of pouvoir</title>
        <p>In French, several studies have focused on elucidating the various contextual meanings of the modal verb pouvoir, e.g. ([13]; [14]; [12]). In order to build our annotation scheme (see section 4.1), we rely on the analysis presented in [12]. This is the model that was used in the ModalE tool for extracting modal markers [10]. As mentioned in section 3.1, this tool assigns 3 possible global modal categories to pouvoir: alethic, epistemic and deontic. A deeper analysis of pouvoir, based on [12], led us to consider that this modal verb can have 6 possible refined modal categories (see table 6): 4 belong to the alethic category (descriptive judgements on a reality independent of the subject), 1 is part of the epistemic category (descriptive judgements referring to a subjective evaluation of the reality by the subject) and 1 belongs to the deontic one (prescriptive judgements based on institutions or systems of conventions). In [12], the values of “possibilité matérielle” (material possibility) and “capacité” (ability) are first presented [12, p. 442] as two distinct values, and later [12, p. 448] as part of a single one. Since this ambiguity is not resolved in Gosselin’s typology, we decided to treat them as two distinct values.</p>
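<p>The resulting inventory of refined categories and their global categories can be written as a simple mapping. This is an illustrative sketch, not code from the paper; the alignment of refined to global categories follows the annotation scheme of section 4.1 and the counts given above (4 alethic, 1 epistemic, 1 deontic), with the “undetermined” catch-all added during annotation:</p>

```python
# Refined modal categories of pouvoir and their global modal categories,
# as described in section 3.2 (illustrative mapping, not the authors' code).
GLOBAL_CATEGORY = {
    "sporadicity": "alethic",
    "material possibility": "alethic",
    "ability": "alethic",
    "logical possibility": "alethic",
    "eventuality": "epistemic",
    "permission": "deontic",
    "undetermined": "undetermined",  # annotation-time catch-all
}

# The 4 alethic refined values mentioned in the text
alethic = [c for c, g in GLOBAL_CATEGORY.items() if g == "alethic"]
print(len(alethic))  # → 4
```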
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Corpus and linguistic model</title>
      <sec id="sec-3-1">
        <p>This section presents our corpus (3.1) and the linguistic model (3.2) on which the annotation scheme is based.</p>
        <sec id="sec-3-1-1">
          <title>3.1. The ES_CF corpus</title>
          <p>Our corpus – named here corpus ES_CF – is composed of 221 semi-structured interviews extracted from two different corpora (among the different types of interviews and recordings present in these two corpora, we extracted only the semi-structured interviews between an interviewer and an interviewee). In the first corpus, named Eslo (https://www.ortolang.fr/market/corpora/eslo; 700 recordings in total), we selected 207 interviews featuring questions to the citizens of Orléans about their habits and feelings regarding their city. In the second one, named CFPP (https://www.ortolang.fr/market/corpora/cfpp2000; 60 recordings in total), we selected 14 interviews containing similar questions but focusing on the city of Paris. An automatic tool, named ModalE, described in ([10]; [11]), was employed to count the different modal categories present in these two corpora. The tool is built on the typology proposed by [12]. Each French modal marker is associated with one or more modal categories depending on its more or less polysemous nature. The results indicate that the verb pouvoir is among the four most frequent modal markers in the ES_CF corpus, which contains globally 150,000 modal markers (the others being “bien” (well), 7.3% of the total modal markers; “dire” (to say), 6.9%; and “savoir” (to know), 5.6%; “pouvoir” accounts for 4.94%). The marker pouvoir is a “highly polysemous” marker as it can potentially be part of three categories: alethic, epistemic and deontic (see section 3.2 for their examination in detail). In order to determine the semantic value of each instance of such polysemous modal markers, we propose an NLP approach for disambiguating the modal verb pouvoir in its context. Our approach is based on the linguistic model of [12].</p>
          <p>In order to follow a supervised learning procedure, it is necessary to have a manually annotated corpus. Sections 4.1 and 4.2 describe the process of manual annotation and the constitution of 4 different versions of our annotated corpus that we used for the experiments detailed in section 5.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4.1. Annotation procedure</title>
          <p>Table 2 presents the elements of our annotation scheme, based on [12]’s typology summarized in table 6 (for a fuller version with examples and definitions, see appendix A). Table 2 shows the 7 possible modal categories of pouvoir: the alethic global category groups sporadicity, material possibility, ability and logical possibility; the epistemic category corresponds to eventuality; the deontic category corresponds to permission; and an “undetermined” category completes the scheme. The logical possibility category is included in the annotation scheme even though we did not find any examples of it in our corpus. The “undetermined” category covers the occurrences of pouvoir for which an annotator hesitated between two or more values, as well as the ones that we were unable to annotate due to a lack of context. We annotated 24 interviews from the ES_CF corpus (17 from the Eslo corpus and 7 from the CFPP corpus), with an average length of 15,000 tokens. The annotation was carried out by three annotators (the first author and two linguistics masters students) using Glozz [15]. We then calculated two inter-annotator agreements using Fleiss’ Kappa. The first one, called “strict”, includes the 6 values (excluding logical possibility). For the second one, denominated “broad”, we decided to merge “ability” and “material possibility” into a single category called “material possibility and ability”, because of the ambiguity that persists in Gosselin [12]’s typology (see section 3.2), confirmed also by the frequent disagreement between annotators on these two categories. We obtained 0.6 for the strict inter-annotator agreement and 0.66 for the broad one. Since the broad agreement was better, we adopted this version of the annotated corpus for training. The model was trained on all the categories except logical possibility and the “undetermined” category. The total number of occurrences of pouvoir manually annotated in the corpus is 879 (sporadicity: 71 occurrences; material possibility or ability: 448; eventuality: 131; permission: 229). The annotated corpus is available on GitHub: https://github.com/DiegoRossini/Modal-verbs-modality-detector.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Corpus preparation</title>
        <p>In order to effectively train and evaluate our classifier for detecting the semantic value of the French verb pouvoir, we prepared 4 distinct datasets, each crafted to address specific challenges and enhance performance (see examples in appendix C).</p>
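<p>The inter-annotator agreement computation described in section 4.1 uses Fleiss’ Kappa, which generalizes chance-corrected agreement to three or more annotators. A minimal pure-Python sketch (toy labels, not our annotation data):</p>

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of category labels
    (one label per annotator)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})
    # counts[i][j]: how many raters assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # P_i: observed agreement on item i
    p_i = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    # p_j: overall proportion of assignments to category j
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)  # expected agreement by chance
    return (p_bar - p_e) / (1 - p_e)

# Three annotators labelling four occurrences of "pouvoir" (toy data)
toy = [
    ["permission", "permission", "permission"],
    ["ability", "material possibility", "ability"],
    ["eventuality", "eventuality", "eventuality"],
    ["sporadicity", "sporadicity", "eventuality"],
]
print(round(fleiss_kappa(toy), 3))  # → 0.564
```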
        <p>• Corpus Base: this dataset contains 776 sentences
with at least one occurrence of pouvoir. Serving
as our foundational dataset, it sufers from an
imbalance in the distribution of modality categories.</p>
        <p>This imbalance could bias the classifier toward more common categories, making it essential to address this issue in subsequent datasets.</p>
        <p>• Corpus Base Augmented: to rectify the imbalance observed in the Corpus Base, we created this augmented dataset containing 1716 sentences. We employed data augmentation using the cc.fr.300.bin model and the gensim library for lexical substitution. This process balanced the distribution of modality categories, resulting in a more evenly distributed training set for our classifier.</p>
        <p>• Corpus Context: considering the significant influence of the surrounding context on the meaning of the modal verb pouvoir, we constructed a third dataset (776 sentences with context). This dataset includes sentences with pouvoir along with one speaker’s phrase before and after, offering a broader contextual framework to help the classifier better understand the modal sense of pouvoir and make more accurate predictions.</p>
        <p>• Corpus Context Augmented: this fourth and final dataset combines the benefits of both data augmentation and expanded contextual framing (1716 sentences with context).</p>
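<p>The lexical-substitution step behind the augmented datasets can be illustrated with a small sketch. The actual pipeline used Spacy for verb detection and the FastText cc.fr.300.bin vectors (via gensim) for nearest-neighbour lookup; here both are injected as stand-in functions so the logic is self-contained, and pouvoir forms, which carry the label, are never replaced:</p>

```python
# Toy sketch of the lexical-substitution augmentation (not the authors'
# exact code). `is_verb` stands in for Spacy POS tagging and `most_similar`
# for a FastText nearest-neighbour query; the pouvoir form list is
# illustrative, not exhaustive.
POUVOIR_FORMS = {"pouvoir", "peux", "peut", "pouvez", "peuvent",
                 "pouvait", "pourrait", "pu"}

def augment(tokens, is_verb, most_similar):
    """Return a new token list where each verb except pouvoir itself is
    replaced by its nearest neighbour, preserving the pouvoir label."""
    return [most_similar(tok)
            if is_verb(tok) and tok.lower() not in POUVOIR_FORMS
            else tok
            for tok in tokens]

# stand-ins: one known verb substitution, identity otherwise
substitutions = {"répéter": "redire"}
result = augment(
    ["tu", "peux", "répéter"],
    is_verb=lambda t: t in {"peux", "répéter"},
    most_similar=lambda t: substitutions.get(t, t),
)
print(result)  # ['tu', 'peux', 'redire']
```

<p>Note that "peux" is detected as a verb but kept intact, since replacing the target verb itself would invalidate the annotation.</p>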
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Experiments and results</title>
      <sec id="sec-4-1">
        <title>5.1. Training Data selection</title>
        <p>In our experiments, the primary objective was to identify the most effective configurations regarding training data and model selection for the token classification of the French modal verb pouvoir. We chose to perform token classification to isolate occurrences of pouvoir, enabling us to label them with the specific categories we developed.</p>
        <p>The primary evaluation metric used across these tests was the F1-score, which harmonically combines precision and recall. This metric is particularly crucial in scenarios such as ours where class imbalance is significant: over 97% of the dataset constituted the non-pouvoir class, labeled "O". This label was used to mark all tokens that did not correspond to instances of pouvoir, allowing the model to focus specifically on identifying and classifying the modality of pouvoir’s occurrences.</p>
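<p>As a concrete illustration of this token-classification framing, the sketch below labels every token "O" except occurrences of pouvoir (the list of forms is illustrative, not exhaustive) and computes a per-class F1, so that the "O" class can be excluded from evaluation as described above:</p>

```python
# Minimal sketch of the labelling scheme and per-class F1 (assumed helper
# names, not the authors' code): every token is "O" except pouvoir forms,
# which receive the annotated modal category.
POUVOIR_FORMS = {"pouvoir", "peux", "peut", "pouvez", "peuvent",
                 "pouvait", "pourrait", "pu"}  # illustrative subset

def label_tokens(tokens, category):
    """Assign `category` to pouvoir tokens and "O" to all others."""
    return [category if t.lower() in POUVOIR_FORMS else "O" for t in tokens]

def f1_per_class(gold, pred, label):
    """F1-score of one label over aligned gold/predicted sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

tokens = "tu peux répéter".split()
print(label_tokens(tokens, "permission"))  # ['O', 'permission', 'O']
```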
        <p>Initially, the corpora listed in section 4.2 were experimented upon using the camembert-base model with a stratified train-validation-test split of 80-10-10 over seven epochs in order to determine the most effective training data. This split allowed us to monitor model performance on a small validation set during training, and the augmented context corpus (corpus_context_augmented) proved to be superior, achieving an F1-score of 0.90 in evaluation and of 0.88 when the "O" class was excluded. These results indicated that data balancing coupled with contextual enhancements significantly benefits model performance. After identifying the corpus_context_augmented dataset as the optimal choice, we applied a 5-fold cross-validation strategy to evaluate the model’s robustness. This cross-validation process was conducted on the 80% training portion of the dataset, while the 20% test set remained untouched. Cross-validation yielded further improvements in model performance, solidifying the combination of the corpus_context_augmented dataset and the camembert-base model as our most reliable setup.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Model performance comparison</title>
        <p>After determining the optimal training data setup, we tested various pre-trained models to assess their effectiveness in the modal classification of the French verb pouvoir. Throughout this phase, we maintained the stratified 80-20 split for training and testing, ensuring that the 20% test set remained unseen for final evaluations. For all models tested, the training set was subjected to 5-fold cross-validation during training to leverage its demonstrated benefits. As shown in table 3, the best performing model was flaubert-base-cased, which achieved an F1-score of 0.94, and of 0.92 when the "O" class was excluded (the fine-tuned model is available at https://huggingface.co/DiegoRossini/flaubert-pouvoir-modality-detector). One possible reason for its superior performance could be the extensive and diverse pretraining corpus it was trained on, which is specifically designed to capture various nuances of the French language. Given that our dataset is based on oral corpora, the flaubert-base-cased model may be particularly well suited for this type of data, as the other models have been trained on less diversified data forms. In the final evaluations, the flaubert-base-cased model demonstrated strong performance in identifying non-modal occurrences and in distinguishing specific modalities such as "eventuality" and "permission" (see the confusion matrix and the results per category in appendix B). However, it encountered some challenges with the "material possibility or ability" category, indicating slight semantic overlaps. The confusion matrix corroborates these findings, showing minimal misclassifications, concentrated around the "material possibility or ability" category. This final analysis highlights that holistic advancements in both model selection and detailed category definition refinement are crucial. By leveraging models optimized for the French language such as FlauBERT, alongside meticulously curated and balanced training data, the task of modality classification for pouvoir can be approached with increasingly nuanced understanding and precision, promising further enhancements and consistency in future NLP applications of the same kind.</p>
      </sec>
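<p>The stratified splitting used throughout these experiments can be sketched as follows. In practice one would typically use scikit-learn’s StratifiedKFold; this is a minimal pure-Python stand-in that preserves per-class proportions across folds:</p>

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds, preserving per-class
    proportions (a minimal stand-in for sklearn's StratifiedKFold)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # deal each class's examples round-robin across the folds
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

# toy labels mimicking the imbalanced modal categories
labels = (["material possibility"] * 50
          + ["permission"] * 25
          + ["eventuality"] * 10)
folds = stratified_folds(labels, k=5)
print([len(f) for f in folds])  # each fold: 10 + 5 + 2 = 17 examples
```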
      <sec id="sec-4-3">
        <p>For RoBERTa see https://huggingface.co/FacebookAI; for DistilBERT see https://huggingface.co/distilbert; for CamemBERT see https://huggingface.co/almanach; for FlauBERT see https://huggingface.co/flaubert; for BERT-base-multilingual see https://huggingface.co/google-bert.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <sec id="sec-5-1">
        <p>The semantic substitution process was particularly challenging due to the resource-intensive nature of available models such as FastText (https://fasttext.cc/) and the complexity of handling text derived from spoken language. Our approach involved using Spacy to capture verbs, determining the most semantically similar verbs with FastText, and then conjugating them to match the form of the original verbs. This sequence of operations proved extremely resource-demanding and difficult to implement. Additionally, Spacy and FastText both demonstrated significant difficulties with the French language, leading to several inconsistencies during lexical substitution. These findings underscore the need for more robust, language-specific tools to improve the accuracy and efficiency of semantic substitution in NLP tasks involving French, particularly with spoken text.</p>
        <p>If we take a closer look at the model’s results, we notice that “permission” is the second best classified category, with an F1-score of 0.95. However, a qualitative analysis of the classified sentences revealed some incongruences. Among the various uses of pouvoir with the value of permission, there are two that are very frequent (40% of the permission annotations) and have a typical structure. These are the “pouvoir of politeness” (see Ex. 1.), a question that allows the subject to express a request politely, and the expression “je/nous/on” (I/we/impersonal pronoun “on”) + “pouvoir” + “dire” (to say), called “pouvoir_dire” (see Ex. 2.).</p>
        <p>(1) Euh attends j’ai un train de retard tu peux répéter ? (Uh, wait, I’m a bit behind, can you repeat that?) (ESLO2_ENTJEUN_1235)</p>
        <p>(2) Enfin j’ai fait essentiellement des mesures on peut dire (Well, I mostly took measurements, one could say [...]) (ESLO2_ENT_1014)</p>
        <p>Our model is biased by the fact that most instances of permission pouvoir follow one of these two patterns, which are characterized by a fixed structure: the model is not able to identify as pouvoir of permission any use that differs from 1. or 2.</p>
        <p>(3) Je suis nommé par le siège qui peut du jour au lendemain si je ne fais pas le travail me me basculer. (I am appointed by headquarters, which can, from one day to the next, if I don’t do the job, toss me out.) (ESLO1_INTPERS_438)</p>
        <p>For example, the model classifies Example 3. as “possibilité matérielle et capacité” even though the institution (i.e., “headquarters”) granting permission to the subject is clearly mentioned. To address this problem, it would be necessary to enrich and to vary, in terms of structures, the examples in the deontic category. Finally, we tested our model on all the 221 interviews in the ES_CF corpus. The results show that most instances of pouvoir belong to the category of material possibility or ability (51% of pouvoir instances), followed by permission (35%), eventuality (9%) and sporadicity (5%). In general, the most representative modal category is the alethic one (values of material possibility and ability plus sporadicity: 56%). These results are consistent with those we obtained in the manually annotated portion of the ES_CF corpus presented in section 4.1.</p>
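<p>The qualitative error analysis above, like the confusion matrix in appendix B, boils down to counting (gold, predicted) category pairs over aligned label sequences. A minimal sketch with toy labels (not our evaluation data):</p>

```python
from collections import Counter

def confusion_matrix(gold, pred):
    """Count (gold, predicted) category pairs over aligned label lists."""
    return Counter(zip(gold, pred))

# toy labels: one permission instance misclassified, as in Example 3.
gold = ["permission", "permission",
        "material possibility or ability", "eventuality"]
pred = ["permission", "material possibility or ability",
        "material possibility or ability", "eventuality"]
cm = confusion_matrix(gold, pred)
print(cm[("permission", "material possibility or ability")])  # → 1
```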
      </sec>
      <sec id="sec-5-2">
        <title>7. Conclusion</title>
        <p>This study demonstrates significant first progress in the automatic classification of the French verb pouvoir by finding the best fine-tuned BERT model. Moderate to substantial inter-annotator agreement led to merging some subcategories for more streamlined annotations.</p>
        <p>The flaubert-base-cased model, with contextual data augmentation, achieved an impressive F1-score of 0.94 with cross-validation, highlighting the importance of context (see section 4.2, “Corpus Context”). However, challenges persist, such as limited training data and the need for better annotation tools and more powerful computational resources. The model struggles with certain deontic usages that humans easily identify. Intentional ambiguity by the speaker also poses a challenge for both annotators and the model. Future work should expand and enrich the dataset and consider training on full texts instead of isolated sentences to capture context better. [8] propose a similar approach, emphasizing the importance of taking a large context around the target token and advocating for the use of full texts as context. In the future, we will also experiment with an augmented context window of 10 lines before and after the target token. These enhancements will improve model robustness and set the stage for further advancements in natural language processing, particularly for classifying the semantic values of French modal verbs. This is the first step in a larger project that will soon include the verb devoir (must). More globally, the ultimate goal of our approach is to be able to identify which modal categories are prevalent in any given corpus [16]. Indeed, given that the verb pouvoir is present in all types of texts, the ability to identify its modality becomes a necessary tool for refining the overall analysis of modality in different tasks such as sentiment analysis [17] or hedge detection [18].</p>
        <p>A. Annexe A: Extended version of annotation examples of the 7 semantic values of pouvoir</p>
        <p>Parfois dramatique comme les les romans qui peuvent rappeler des situations plus ou moins pénibles. (Sometimes dramatic, like novels that can evoke more or less painful situations) (ESLO1_ENT_003_C)</p>
        <p>C’est un un personnage donc il y a des choses que vous ne pouvez pas faire uniquement avec du verre et du plomb par exemple ces cheveux-là le nez la bouche oui. (It is a character, so there are things you cannot do with just glass and lead, for example, the hair, the nose, the mouth, yes.) (ESLO1_ENT_002_C)</p>
        <p>À l’intérieur on a une galette on a un gâteau on le partage en X morceaux on peut pas le le faire grandir par le le un coup de baguette magique. (Inside, we have a cake, we share it into X pieces, we cannot make it grow with a wave of a magic wand.) (ESLO1_INTPERS_421_C)</p>
        <p>ø</p>
        <p>Les payer pour qu’ils euh fassent leur boulot et euh qu’on donne un un prix euh au meilleur grapheur money price et on prend cinq mille euros ça pourrait être pas mal. (Pay them so they, uh, do their job and, uh, give a, uh, prize, uh, to the best graffiti artist, money prize, and we take five thousand euros, that could be nice) (ESLO2_ENTJEUN_1228_C)</p>
        <p>Euh les gens sont libres de venir consulter quelque médecin que ce soit et ils peuvent en changer à tout moment et que donc euh après être venus me consulter euh si je ne leur plais pas. (Uh, people are free to consult any doctor they choose and they can change at any time, and so, uh, after coming to see me, uh, if they don’t like me.) (ESLO1_ENT_003_C)</p>
        <p>C’est ça ? justement je me dis comment est-ce que je vais pouvoir utiliser mes capacités informatiques ? (That’s it? Exactly, I’m wondering how I will be able to use my computer skills?) (ESLO2_ENTJEUN_1235_C)</p>
        <p>Parce que sinon on aurait pu ... (Otherwise, we could have...) (CFPP, Catherine_Lecuyer)</p>
        <p>B. Annexe B: confusion matrix of the best model’s results</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>C. Annexe C: Datasets</title>
      <p>Corpus_Base (1 example = 1 oral speech turn); Corpus_Base_Augmented (from a Corpus Base example, another is created by performing lexical substitution); Corpus_Context (1 example = 1 oral speech turn + the oral speech turn before and the oral speech turn after); Corpus_Context_Augmented (from a Corpus Context example, another is created by performing lexical substitution).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>