<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Computational Humanities Research Conference, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Beyond Idiolectometry? On Racine's Stylometric Signature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon Gabay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Geneva Rue des Battoirs 7</institution>
          ,
          <addr-line>CH-1205 Genève -</addr-line>
          <country>Suisse/ Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <issue>4</issue>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>While stylometry has proven useful for literary history, especially for distant reading approaches to texts, it has yet to show its efficiency for close reading. Taking the example of the famous French playwright Jean Racine, we propose a double analysis of his plays, both distant and close, with the double objective of verifying his recently alleged authorship of Campistron's plays (which proves to be wrong using standard methods in stylometry), and interpreting the stylometric markers used for this attribution procedure. Since 17th c. French has a relatively unstable spelling system, we also propose a new method for denoising, based on full linguistic annotation rather than simple lemmatisation.</p>
      </abstract>
      <kwd-group>
        <kwd>stylometry</kwd>
        <kwd>serial stylistics</kwd>
        <kwd>Jean Racine</kwd>
        <kwd>Authorship attribution</kwd>
        <kwd>French classical theatre</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>try, with different means, to characterise the writing of an author. In other words, do
computationally detected markers have a literary value? It is indeed not clear if a stylome is only
composed of idiolectal traits, or, to an extent that needs to be determined, of stylistic features
with an interpretative yield.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Racine’s case</title>
      <p>
        Jean Racine (La Ferté-Milon, 1639 – Paris, 1699) is one of the most prominent French writers,
so important that he has been considered a “zero point of the critical object” (degré zéro de
l’objet critique) by critics such as Barthes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. He is the author of twelve plays: eleven tragedies,
which are considered the quintessence of the genre, and one comedy, Les Plaideurs.
This production is relatively small compared to that of his famous contemporaries
Molière and Pierre Corneille, who each wrote more than thirty plays.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Problem A</title>
        <p>
          New theories have recently emerged regarding the work of Racine: Dominique Labbé has
postulated that he should be attributed fourteen other tragedies signed by another playwright,
Jean Galbert de Campistron (Toulouse, 1656 - ibid., 1723) [
          <xref ref-type="bibr" rid="ref26 ref4">32, 4</xref>
          ]. Such a claim has to be put
in the broader context of D. Labbé’s research on classical French theatre and the théorie des
prête-noms (“figurehead theory”), according to which more than half of the plays (90% of the
comedies) published and played in 17th c. France were signed by intermediaries rather than by
real authors [
          <xref ref-type="bibr" rid="ref25">31</xref>
          ] – the most famous of these figureheads being Molière [
          <xref ref-type="bibr" rid="ref27">33</xref>
          ].
        </p>
        <p>
          D. Labbé’s theories have to be treated with great caution, since they have already been firmly
rejected by both solid traditional [20] and computational [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] cross-checking, but we still think
that such ideas deserve a scientific answer – at the possible cost of a Streisand effect – for two
reasons. First, editions of those tragedies supposedly written by Racine have already been
published under his name [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which might create confusion among readers. Second, in an age
of credulity [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], it is important not to let pass, without meticulous verification, hypotheses that
have already been considered as conspiracy theories by some scholars [21].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Problem B</title>
        <p>
          Investigating the attribution of new plays to Racine will lead to the identification of stylometric
markers, which should differentiate him from other writers. It is therefore a perfect opportunity
to explore the possible meaning(s) of these markers, and assess their literary value, following
an approach inspired by serial stylistics. This evaluation will obviously benefit from previous
works, such as Leo Spitzer’s article on Racine’s style [
          <xref ref-type="bibr" rid="ref46">51</xref>
          ] 1 and his idea of a klassische Dämpfung
(“muting effect”), defined as follows:
        </p>
        <p>
          das oft Nüchtern-Gedämpfte, Verstandesmäßigkühle, fast Formelhafte an diesem
Stil, das dann oft plötzlich und unvermutet für Augenblicke in poetisches Singen
und erlebte Form übergeht, worauf aber wieder rasch ein Löschhütchen von
Verstandeskühle das sich schüchtern hervorwagende lyrische Sich-Ausschwelgen des
Lesers niederdämpft. [
          <xref ref-type="bibr" rid="ref47">52</xref>
          ]
        </p>
        <p>
          the frequently sober, muted quality of this style, rational, cool and formulistic, which
then often, suddenly and unexpectedly, makes a transition for some moments into
poetic song and form realised in experience, after which, however, an extinguisher
of rational coolness quenches the shy beginnings of the reader’s lyrical
expansiveness. [
          <xref ref-type="bibr" rid="ref48">53</xref>
          ]
        </p>
        <p>
          1 The article has been fully translated into French [
          <xref ref-type="bibr" rid="ref47">52</xref>
          ] and partially into English [
          <xref ref-type="bibr" rid="ref48">53</xref>
          ].
        </p>
        <p>Racine’s style would be characterised by intensity variations, and especially attenuation,
the trace of which can be found, according to Spitzer, in a long list of examples, such as die
Entindividualisierung durch den unbestimmten Artikel (“the de-individualisation by means of
the indefinite article”):</p>
        <p>Je révoque des lois dont j’ai plaint la rigueur. (Phèdre II.2)
(I revoke laws whose rigour I have blamed.)
or der distanzierende Gebrauch des Demonstrativs (“the distancing use of the demonstrative”):
Mais j’ai vu près de vous ce superbe Hippolyte. (Phèdre II.1)
(But I have seen next to you this superb Hippolytus.)</p>
        <p>Such stylemes (i.e. textual units characterising the discourse as literary in
traditional stylistics2) are particularly interesting because they are based on stop words (articles,
prepositions, adverbs, etc.), and are therefore possible stylometric markers. By analysing the
overlap between stylemes and markers, we should be able to evaluate the stylistic nature of the
stylometric analysis.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <p>
        For this experiment, we follow an approach focused on data extracted directly from the sources,
without the mediation of editions, which presents significant engineering challenges. Stylometric
analysis requires a large amount of data that is hard to gather, especially for historical
documents such as 17th c. French texts. While most research has so far been carried out on
already existing corpora [
        <xref ref-type="bibr" rid="ref44">49</xref>
        ], we have to prepare the ground for further data acquisition
directly from sources that are sometimes barely legible (e.g. scans of old prints or manuscripts,
cf. figure 1) and require additional processing to neutralise inconsistencies (e.g. spelling
variation or ancient glyphs such as ‹ſ›).
      </p>
      <p>
        To cope with these problems, we follow a dedicated pipeline partially inspired by those
designed for other states of language to process and clean data [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Cleaning the data is particularly
crucial in our case, because not only do we want to interpret the classification itself, but also
the stylometric markers used to produce the different clusters. Is it the personal pronoun je
(“I”)? The negative adverb pas (“not”)? Or the verb étoit (“was”)? The answer is particularly
complicated for the 17th c. because:
1. Letters can be elided in some cases: je → j’.
2. Some words are homographs: negative adverb pas (“not”) vs noun pas (“step”).
3. Spelling varies from one occurrence to the other: étoit (“was”) = eſtoit.
      </p>
      <p>2 We paraphrase here Georges Molinié, who uses the Jakobsonian concept of literaturnost (“literariness”) to
define the styleme as un caractérisème de littérarité (“a characteriseme of literariness”) [38].</p>
      <p>It is more than likely that minor variations such as these have a limited impact on the
classification [16] but affect the stylometric signature: je and j’ are counted as two different words,
and the two homographs pas as a single one.</p>
      <p>
        The tools used for our pipeline have already been described in detail [
        <xref ref-type="bibr" rid="ref17">22</xref>
        ]. The text is
extracted from old prints (cf. figure 1) with a text recognition engine, then both linguistically
normalised (i.e. aligned with contemporary French modulo some exceptions, mainly for metric
reasons) and annotated with lemma, POS and full morphology (cf. table 1). Normalisation
provides a simple denoising and deals mainly with simple phenomena such as unstable spellings
(étoit vs eſtoit → était). Linguistic annotation offers a more efficient reduction of noise: it
differentiates homographs such as the noun pas (annotated pas NOMcom sing. masc.) and
the adverb pas (pas ADVgen), or reconciles the elided j’ (je PROper P1) with its full form je
(also je PROper P1).
      </p>
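      <p>The effect of these levels of denoising on token counts can be illustrated with a minimal Python sketch. The token list is invented for the illustration, the tuple layout (surface form, normalised form, lemma, POS) is an assumption and not the actual output format of the pipeline, and the verb tag VERB is a placeholder; only the tags PROper, ADVgen and NOMcom come from the examples above.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical annotated tokens: (surface form, normalised form, lemma, POS).
# The layout and the tag "VERB" are assumptions made for this illustration.
TOKENS = [
    ("j'", "j'", "je", "PROper"),
    ("eſtoit", "était", "être", "VERB"),
    ("je", "je", "je", "PROper"),
    ("pas", "pas", "pas", "ADVgen"),
    ("pas", "pas", "pas", "NOMcom"),
    ("étoit", "était", "être", "VERB"),
]


def count_by(tokens, key):
    """Count tokens after mapping each of them to a feature with `key`."""
    return Counter(key(token) for token in tokens)


# Raw text: every spelling is its own feature.
surface = count_by(TOKENS, lambda t: t[0])
# Normalised text: spelling variants are merged, elision and homographs remain.
normalised = count_by(TOKENS, lambda t: t[1])
# Annotated text: one feature per (lemma, POS) pair.
annotated = count_by(TOKENS, lambda t: (t[2], t[3]))
```
]]></preformat>
      <p>In this toy example, counting by surface form separates j’ from je and merges the two pas, whereas counting by (lemma, POS) pair does the opposite.</p>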
      <p>The corpus (cf. table 2) has been deliberately designed as heterogeneous to allow a precise
exploration of stylometric markers used for the classification of our texts. While they are all plays
dating from the last third of the 17th c., they belong to two different major genres (tragedies,
comedies) and an additional minor one (heroic comedy). They are written either in
prose or in verse. Prints are produced by different marchands-libraires (i.e. publishers) and
printers, from Paris and abroad (Bruxelles), to maximise the variation of spelling choices. We
have prepared two plays for each playwright (Pradon and Campistron), three when there are
two genres for one writer (Molière and Racine).</p>
      <p>
        Texts have been corrected before being normalised and annotated. Because they have all
been encoded in XML-TEI (cf. figure 2), it has been possible to keep only the speeches and to remove
stage directions, notes, numbering of scenes and acts, etc., because we aim to study the text
and not the paratext [
        <xref ref-type="bibr" rid="ref49">54</xref>
        ]. The names of places and characters, which could introduce biases3,
have also been removed.
      </p>
      <p>
        Such a corpus being too small to provide robust results and the process to create additional
data being extremely time consuming4, we have decided to fall back on modernised versions of
plays available online [
        <xref ref-type="bibr" rid="ref16">19</xref>
        ] to increase the amount of texts studied (cf. tab. 6, in the appendix).
However, merging the primary and this secondary corpus remains possible at two different
levels: using the normalised version of the original texts automatically produced, but also via
the linguistic annotation, the model providing it being trained on both original and normalised
transcriptions [
        <xref ref-type="bibr" rid="ref18">23</xref>
        ].
      </p>
      <p>3 Two plays about the same event would be artificially overcorrelated because of similar rare words.</p>
      <p>4 The very poor quality of many prints forces editors to correct the entire transcription produced by the
OCR engine.</p>
      <p>
        A control corpus, with a composition symmetrical to that of the primary corpus, but composed
of 18th c. French plays, has been prepared for benchmarking purposes (cf. table 5).
Reproducing our experiments with similar results on another corpus is meant as an
additional safety net, on top of the careful use of previous methodological studies on stylometric
evaluation [
        <xref ref-type="bibr" rid="ref15">16, 18</xref>
        ] and similar experiments [
        <xref ref-type="bibr" rid="ref10 ref43">10, 48</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Problem A: Authorship attribution</title>
      <p>
        4.1. Set up
We have computed an agglomerative (ascendant) hierarchical clustering (henceforth AHC), using mainly two R
packages, FactoMineR [
        <xref ref-type="bibr" rid="ref28">34</xref>
        ] and Stylo [17], with the following parameters:
• Distance is calculated with Burrows’s delta (i.e. computing a Manhattan distance
between two z-scored vectors) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] combined with vector-length Euclidean normalisation,
following here the conclusions of previous stylometric studies on French literature [
        <xref ref-type="bibr" rid="ref10 ref43">10,
48</xref>
        ].
• Linkage criterion follows Ward’s minimum variance method (i.e. the pair of clusters to
merge at each step is based on the optimal value of an objective function). [
        <xref ref-type="bibr" rid="ref51">56</xref>
        ]
In order to evaluate the results, two evaluation measures have been used:
• The agglomerative coefficient (henceforth AC) measures the strength of the clustering
structure by calculating the mean similarity of each object with the first cluster it is
merged with, normalised on the total height of the plot (i.e., the height of the merger in
the last step of the classification algorithm [
        <xref ref-type="bibr" rid="ref40">45</xref>
        ]). Let H be the vector of the heights at
which each node i is merged with its first cluster:
        <disp-formula><tex-math>AC = 1 - \frac{1}{n} \sum_{i=1}^{n} \frac{H_i}{\max(H)}</tex-math></disp-formula>
It is expressed by a number between 0 and 1, the closer to one the better.
      </p>
      <p>
• Cluster purity (henceforth CP) is the average percentage of the dominant class label
(the putative author) in each cluster [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A result below 1 (=100%) indicates that some
objects (texts) were not classified correctly regarding the class label. Because our corpus
is made of texts written by four authors and belonging to two different genres, we expect
six clusters (two authors with a single genre and two authors with two genres each), none of
which contains two different authors or two different genres.
Four different methods to select the most relevant features have been tested:
• Using the 100 most frequent words (henceforth MFW). We are well aware that such a
number is well below the recommended average [15], but it allows us to minimise the
importance of thematic/generic words and does not affect the clustering (100, 1,000 and
3,000 MFW have been tested with equivalent results). A bootstrap consensus tree [14]
(cf. figure 3) confirms this stability, no matter the number of tokens used (between 100
and 5000).
• Using function words, i.e. a selection of tokens excluding nouns, verbs (except auxiliary
verbs), adjectives, and including prepositions, articles and determiners.
Pronouns have not been kept because previous studies on the indice pronominal (“pronominal
index”) in French have shown that they are a generic rather than a stylistic feature [
        <xref ref-type="bibr" rid="ref35 ref45">41,
50</xref>
        ].
• Using (pseudo-)affixes (suffixes and prefixes), i.e. the first / last three characters of
each word, and the first / last two characters of each word and the space preceding /
following [
        <xref ref-type="bibr" rid="ref41">46</xref>
        ].
• For stop words and affixes, we have additionally applied Moisl’s selection method [
        <xref ref-type="bibr" rid="ref30">36</xref>
        ]
adapted for stylometry by Camps and Cafiero [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a 1.645 critical value (i.e. 95%
confidence interval).
      </p>
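      <p>The distance and the agglomerative coefficient described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the code of the R packages used here: all function names are hypothetical, z-scores are computed with the population standard deviation, the vector-length normalisation is read as a division of each z-scored vector by its Euclidean norm, and the total height of the tree defaults to max(H) as in the formula above.</p>
      <preformat><![CDATA[
```python
from statistics import mean, pstdev


def z_scores(table):
    """Column-wise z-scores of a texts x features table of relative frequencies."""
    columns = list(zip(*table))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population std. deviation
    return [
        [(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)]
        for row in table
    ]


def l2_normalise(vector):
    """Divide a vector by its Euclidean length (left unchanged if the length is 0)."""
    norm = sum(x * x for x in vector) ** 0.5
    return [x / norm for x in vector] if norm else list(vector)


def burrows_delta(u, v):
    """Manhattan distance between two z-scored vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def agglomerative_coefficient(first_merge_heights, total_height=None):
    """AC = 1 - (1/n) * sum(H_i / total height), with total height = max(H) by default."""
    if total_height is None:
        total_height = max(first_merge_heights)
    n = len(first_merge_heights)
    return 1 - sum(h / total_height for h in first_merge_heights) / n


# Toy table: three texts described by two features (invented frequencies).
toy = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
scored = [l2_normalise(row) for row in z_scores(toy)]
distance = burrows_delta(scored[0], scored[2])
```
]]></preformat>
      <p>The heights passed to agglomerative_coefficient would be read from the dendrogram, one per text; the closer the first mergers are to the bottom of the tree, the closer the coefficient is to 1.</p>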
      <p>No matter what the scenario is, the main split is made according to the genre (with comedies
in the upper part of the dendrogram and tragedies in the lower part) and all the possible
pairs are correctly classified (except for affixes without Moisl’s selection). These results being
relatively clear, it has not been thought relevant to run further tests with other features
(e.g. character N-grams) or other methods (e.g. support vector machine), whose results are
expected to be the same. Regarding CP, we observe minor misclassifications (cf. figure 4): only
stop words (with and without Moisl’s selection) and affixes (with Moisl’s selection) offer 100%
purity (cf. figure 5).</p>
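      <p>Cluster purity as defined above (the average share of the dominant class label in each cluster) can be sketched as follows. The function name and the toy clusters are hypothetical; note that this unweighted average over clusters follows the wording used here, whereas other definitions weight each cluster by its size.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def cluster_purity(clusters):
    """Average, over clusters, of the share held by the most frequent label."""
    shares = []
    for labels in clusters:
        dominant_count = Counter(labels).most_common(1)[0][1]
        shares.append(dominant_count / len(labels))
    return sum(shares) / len(shares)


# Toy example: labels combine putative author and genre (invented assignment).
perfect = [
    [("Racine", "tragedy"), ("Racine", "tragedy")],
    [("Campistron", "tragedy"), ("Campistron", "tragedy")],
]
mixed = [
    [("Racine", "tragedy"), ("Racine", "tragedy"), ("Campistron", "tragedy")],
    [("Campistron", "tragedy")],
]
```
]]></preformat>
      <p>With these toy clusters, the first partition is pure, while the second is penalised by the Campistron play grouped with Racine's tragedies.</p>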
      <sec id="sec-4-1">
        <title>4.2. Stylometric Results</title>
        <p>The same experiments have been repeated on three different versions of our corpus:
• The original version, with maximal spelling variation.
• A normalised version, the spelling of which has been aligned with contemporary French.
• An annotated version, with lemma, POS and full morphology for each token (for which
we do not offer a clustering based on affixes, since annotated tokens are no longer word forms).</p>
        <p>A detailed breakdown of the results (cf. table 3) shows that a perfect CP is achieved in many
different ways, no matter the version of the corpus (cf. figure 6), and that normalisation has an
ambivalent (but marginal) impact. Moisl’s selection always improves the CP (if it is not at its
maximum) no matter what it is combined with (stop words or affixes). MFW offer a slightly
lower CP.</p>
        <p>
          These results show that we are able to disentangle the authorial and the generic signal
from one another, despite an unstable spelling, with maximal denoising of data via a complete
linguistic annotation. Because it provides a unique ID for each type, impermeable to spelling
variation, inflection or elision, this strategy offers, for pre-orthographic states of language, an
excellent alternative to character N-grams [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. However, with the use of a tagger, linguistic
annotation introduces an additional step in the workflow, which inevitably increases noise in
the data, especially when performed on unclean transcriptions. Full annotation should however
be preferred to a simple lemmatisation [
          <xref ref-type="bibr" rid="ref24">30</xref>
          ], which is not precise enough and too dependent on
annotation choices behind the lemmatisation model (e.g. nominalisation, etc.).
        </p>
        <p>Regarding authorship attribution, despite variations in the results, no scenario suggests that
Racine’s and Campistron’s plays would be written by the same playwright. When extending
the size of the corpus by merging the primary and the secondary corpus, the AHC given with
the best configuration produces the same classification, confirming our first results (cf. figure 7).
        </p>
        <p>Table 3: results for each version of the corpus and each method. Original and normalised versions: 100 MFW, stop words, stop words + Moisl, affixes, affixes + Moisl. Linguistic annotation: 100 MFW, stop words, stop words + Moisl.</p>
        <p>Labbé’s hypothesis clearly proves to be, once again,
wrong when using standard methods in stylometry.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3. Additional experiments</title>
        <p>As previously explained, because our model for linguistic annotation has been trained on more
than 17th c. prints, whether normalised or not, it has been possible to tag the control corpus and
merge it with our primary corpus. Interestingly, the results not only validate the previous ones,
but extend them. Using the same configuration (Moisl+stop words on linguistic annotation), we
are now able to disentangle genres, authors but also centuries. Looking at the AHC
(cf. figure 9), we see a first split according to the genre (comedies are in the upper part,
tragedies in the lower part of the tree), then centuries (with 17th c. texts in the upper
sub-parts and 18th c. texts in the lower sub-parts), and finally authors.</p>
        <p>Figure 7: AHC with the best configuration (primary + secondary corpora)</p>
        <p>A fairly reliable PCA (cf. figure 8, 45% of the total
information is retained) shows similar results with comedies on the left and tragedies on the
right, 18th c. texts in the upper part and 17th c. texts in the lower part, but emphasises
some limits of our clustering, especially for three texts which are loosely attributed to a cluster
(Marivaux’s tragedy as a green square, Molière’s heroic comedy as a turquoise circle, and, to
a lesser extent, Racine’s comedy as a purple square). It is extremely interesting to note that
each of these three texts belongs to a genre other than its author’s speciality: this tension
between personal and generic traits could be interpreted as a limited ability to mimic the
characteristics of a genre other than one’s speciality, literary “cross-dressing” showing here its
limits. Tragedies of a comic writer would be, in a way, less tragic, and comedies of a tragedian
less comic.</p>
        <p>
          This capacity to cross-dress seems however to vary from one author to another according to
the PCA: some playwrights show a better homogeneity of the authorial signal despite genre
variation, such as Racine, whose works are less spread out on the graph than Voltaire’s. A
t-SNE (cf. figure 10) – i.e. a visualisation of high-dimensional data in a two-dimensional space
prioritising short distances rather than long ones [
          <xref ref-type="bibr" rid="ref29">35</xref>
          ] – can highlight such a phenomenon by
clustering together all the plays of a single author no matter the genre, revealing the inner
homogeneity of apparently scattered works. Thus, if Marivaux’s or Voltaire’s plays are clearly
divided by genre, it is not the case for Molière and Racine, whose stylometric signature seems
denser.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Problem B: On style</title>
      <sec id="sec-5-1">
        <title>5.1. Stylistic interpretation of stylometric markers</title>
        <p>
          Now, another question arises: what are the tokens behind these clusters? The splits we have
presented in our AHC are indeed based on tokens in texts, and we can hypothesise that those
tokens reflect a specific trait, a stylome, of the author. To test this, we need to compute the link
between a token and a cluster, which can be done with a v test (a test value).
          <disp-formula><tex-math>v\text{-test} = \frac{\bar{x}_q - \bar{x}}{\sigma}</tex-math></disp-formula>
In this equation, x̄q is the average of variable X for the individuals (tokens) for a
category q (clusters), x̄ the average of the variable X across all categories, and σ is the square
root of the variance (i.e. the standard deviation) [
          <xref ref-type="bibr" rid="ref23">28</xref>
          ].
        </p>
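        <p>A minimal Python sketch of this test value, following the simplified formula given above. The function name, the toy frequencies and the cluster labels are hypothetical, and σ is taken as the population standard deviation over all texts; statistical packages may use a different denominator, for instance one that takes the size of the category into account.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev


def v_test(values, clusters, target):
    """(mean of `values` inside cluster `target` - overall mean) / overall std. dev."""
    in_cluster = [v for v, c in zip(values, clusters) if c == target]
    sigma = pstdev(values)  # assumption: population standard deviation
    return (mean(in_cluster) - mean(values)) / sigma


# Invented relative frequencies of one token in six texts, with their cluster labels.
frequencies = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0]
labels = ["A", "A", "B", "B", "B", "B"]
score_a = v_test(frequencies, labels, "A")
score_b = v_test(frequencies, labels, "B")
```
]]></preformat>
        <p>A positive value indicates a token that is over-represented in the cluster compared with the whole corpus, a negative value a token that is under-represented.</p>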
        <p>Table 4: tokens tested with the v test, with one column for the 17th c. corpus without Moisl’s selection and one with Moisl’s selection: même (indefinite adjective, singular), tant (adverb), encore (adverb), après (preposition), enfin (adverb), si (adverb), quel (relative determiner, masculine singular), contre (preposition), jusque (preposition), où (relative pronoun).</p>
        <p>With our best configuration on the primary corpus (cf. the second column of table 4), the v
test defines six tokens as typical of Racine’s tragedies, but these results are not absolute: test
values are computed in contrast to the rest of the corpus, and change with the latter. However,
the results of the v test for the same plays, but with another configuration (without Moisl’s
selection) or a larger corpus (e.g. the 17th c. and 18th c. texts merged), are similar, proving a
relative stability, which would deserve further research.</p>
        <p>The interpretation of these markers is partially facilitated by Spitzer’s study: the intensive
adverb tant (“so much”), analysed together with si (“so”), is identified as characteristic of
Racine’s muting effect.</p>
        <p>
          An das distanzierende Demonstrativ können wir das beteuernde si und tant
anschließen. Ein si […] ruft ja den Gesprächspartner zum Zeugen an, man sollte also
auf eine besonders ‘warme’ Wirkung schließen. Die gegenteilige Wirkung scheint
nun bei Racine herauszukommen: das si hat etwas Kühl-Abgeschwächtes [
          <xref ref-type="bibr" rid="ref47">52</xref>
          ]
To the distancing demonstrative we can add the affirming si and tant. A si […] calls
the interlocutor to witness, so we should conclude that it has a particularly ‘warm’
effect. The opposite effect now seems to come out of Racine: the si has something
cool and weakened [our translation]
A typical example with tant would be the following:
        </p>
        <p>Astyanax, d’Hector jeune et malheureux fils,
Reste de tant de rois sous Troie ensevelis. (Andromaque I.1)
(Astyanax, Hector’s young and unfortunate son,
Remainder of so many kings buried under Troy.)</p>
        <p>The notion of plenty introduced by tant is immediately counterbalanced by the idea that this
profusion has disappeared: so many kings are dead.</p>
        <p>Stylometric markers seem to point in another direction: the unfolding of the narration,
altering the flow of the story with a similar “muting effect”. In that sense, the best example is
encore (“again”):</p>
        <p>Où suis-je ? Qu’ai-je fait ? Que dois-je faire encore ? (Andromaque V.1)
(Where am I? What have I done? What do I still have to do?)
Rather than adding new peripeteias, it conveys a sensation of lingering, of endless repetition
without clear direction highlighted in our example by the interrogation.</p>
        <p>The indefinite adjective même, used in pronominal locutions such as lui-même
(“himself”) or nous-mêmes (“ourselves”), creates a similar effect of circularity, but within the
sentence itself, with a redundancy of pronouns provoking a loop in the narration.</p>
        <p>Mais moi-même, seigneur, que faut-il que je croie ? (Bérénice III.2)
(As for myself, my lord, what must I believe?)
The polyptoton (moi-même/“myself”-je/“I”) does strengthen the affirmation of the self (me,
myself and I) but is used to emphasise doubt, accentuated in this very example by an
interrogation.</p>
        <p>Finally, the two tokens après (“after”) and enfin (“finally”) both have this “cooling” effect
Spitzer talks about: the analepsis offers a reversed perspective on the story, looking at it from
its end and therefore preventing any potential suspense:</p>
        <p>Enfin, après un siège aussi cruel que lent,
Il dompta les mutins, reste pâle et sanglant
Des flammes, de la faim, des fureurs intestines, (Bérénice I.4)
(Finally, after a siege as cruel as slow,
He tamed the rebels, pale and bloody remainders
of flames, hunger and intestine feuds,)
The suddenness with which Racine wraps up the story (enfin/“finally”) contrasts with its
supposed length (lent/“slow”), and all the adventures seem to be concealed. In the light of this
last example and the previous ones, we can conclude that if stylometric markers are stylistically
relevant, their interpretation is far from being straightforward, and a careful examination of
occurrences remains compulsory to avoid misinterpretations.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Stylometry, style and idiolect</title>
        <p>
          For a few decades, it has been accepted that among all the available criteria, statistical
repetition and deviation are not sufficient to identify a styleme [
          <xref ref-type="bibr" rid="ref33">39</xref>
          ], because this latter is not
hyletic [
          <xref ref-type="bibr" rid="ref38">43</xref>
          ] but has a fully heuristic status [
          <xref ref-type="bibr" rid="ref31">37</xref>
          ]: a word is not per se poetic, but used
poetically, and it is this usage that defines its literariness. Stylometric markers are nothing more
than idiolectal traits with a potential aesthetic value that awaits to be deciphered.
        </p>
        <p>
          The too loose definition of style proposed by Herrmann, Schöch &amp; van Dalen-Oskam, “a
property of texts constituted by an ensemble of formal features which can be observed quantitatively
or qualitatively” [
          <xref ref-type="bibr" rid="ref22">27</xref>
          ], is therefore unsatisfactory because it does not disentangle idiolectometry
from stylistics – two related, yet substantially different approaches to the text, and potentially
any other work of art. As G. Philippe explains: l’idiolecte c’est le style sans la signification,
et le style, l’idiolecte en tant qu’il peut faire l’objet d’une interprétation (“the idiolect is style
without signification, and style is the idiolect in so far as it can be interpreted”) [
          <xref ref-type="bibr" rid="ref36">42</xref>
          ].
        </p>
        <p>
          The example of Racine demonstrates that with a minimal definition of style as “a property
of texts constituted by an ensemble of formal features with an interpretative yield which can
be observed quantitatively or qualitatively”, stylometry, if carefully used, can potentially
contribute to the identification of stylistic signatures. This identification would however not be
complete by this means alone and needs to be combined with other approaches, such as
textometry, to be fully captured. Textual motifs, which combine several words in a (semi-)rigid
order [
          <xref ref-type="bibr" rid="ref20">25</xref>
          ], remain for instance a blind spot of a purely stylometric research despite their
importance to describe Racine’s style (cf. figure 11). It is the same for more syntactic studies,
looking at sentences [
          <xref ref-type="bibr" rid="ref19">24</xref>
          ].
        </p>
        <p>
          If stylometric markers are not perfect and constitute only a portion of the stylistic features
of a writer, they also have their virtues. They could be of great help for an old, important
and complex challenge of stylistics: the contextualisation of stylemes. What, in a given
author, belongs to his time? Or to his school? Studying markers carefully, we can observe that
the genre, as previously mentioned and exposed elsewhere [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], but also the date of writing do
play an important role in the clustering, and even that the generic and the diachronic signals
prevail over the authorial one. In that sense, because a sub-cluster (the author) inherits
characteristics from the previous ones (the genre and the period), tant, encore or enfin are not
only Racinian stylemes, but also classical5 and tragic features.
        </p>
        <p>5 Used in the French sense to designate 17th c. literature.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>We can now remove, with certainty, any doubt about Racine’s alleged authorship of Campistron’s
plays: the latter is not a figurehead for the former, despite Labbé’s claims. This result is
supported by a battery of tests that all confirm our classification, and a careful study of
their respective accuracy highlights the efficiency of performing an HCA on a linguistically
annotated corpus rather than on the raw text. Such a method not only offers a perfect CP,
but also disambiguates homographs and corrects polymorphism due to spelling variations or
elisions, which is of great interest when one needs to interpret both the classification itself and
the words behind this classification.</p>
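      <p>The denoising effect of annotation can be illustrated with a minimal sketch. It assumes that each token is available as a (form, lemma, part of speech) triple; the five tokens, the tag names and the two helper functions below are hypothetical and are not taken from the corpus or scripts of this study.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical annotated tokens: (surface form, lemma, part of speech).
# Forms and tags are illustrative only, not drawn from the paper's corpus.
tokens = [
    ("luy", "lui", "PRO"),  # older spelling of "lui"
    ("lui", "lui", "PRO"),
    ("la", "le", "DET"),    # "la" as a determiner
    ("la", "le", "PRO"),    # "la" as a pronoun (homograph)
    ("l'", "le", "DET"),    # elided determiner
]


def raw_features(annotated):
    """Count surface forms, as a raw-text analysis would."""
    return Counter(form for form, _lemma, _pos in annotated)


def annotated_features(annotated):
    """Count lemma and part-of-speech pairs instead of surface forms."""
    return Counter(f"{lemma}_{pos}" for _form, lemma, pos in annotated)


print(raw_features(tokens))
print(annotated_features(tokens))
```
      </preformat>
      <p>In this toy example the two spellings of lui and the elided determiner are merged into single features, while the two uses of la are kept apart, which is the behaviour described above for spelling variation, elision and homographs.</p>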
      <p>These words can be identified with a standard v test, which computes the link between
a given token and a cluster. Despite a certain volatility of the results, which depend to some
extent on the corpus used, stylometric markers do have a clear interpretative yield, which confirms
Spitzer’s idea of a muting effect in Racine’s plays, and also extends this idea with new examples.
Although traditional and stylometric stylemes concur, they are of a slightly different nature
because the latter characterise (mathematically) Racine’s tragedies and cannot be transverse,
i.e. shared to a significant extent with another writer, genre or period.</p>
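      <p>As a minimal sketch of such a v test, assuming the common formulation for a quantitative variable (the gap between the cluster mean and the overall mean, divided by the standard error of a mean drawn without replacement), the statistic for one token could be computed as follows. The function name v_test, the use of the sample variance and the toy frequencies are illustrative assumptions, not the scripts used for this study.</p>
      <preformat>
```python
import math
from statistics import mean, variance


def v_test(values, in_cluster):
    """Return the v-test of one feature (e.g. a token's relative frequency)
    for one cluster.

    values     -- one observation per text
    in_cluster -- parallel booleans, True when the text belongs to the cluster
    """
    n_total = len(values)
    cluster_values = [v for v, inside in zip(values, in_cluster) if inside]
    n_cluster = len(cluster_values)
    if n_cluster == 0 or n_cluster == n_total:
        raise ValueError("the cluster must be a non-empty strict subset of the corpus")

    overall_variance = variance(values)  # sample variance (an assumption)
    if overall_variance == 0:
        raise ValueError("the feature is constant across the corpus")

    # Standard error of the cluster mean when n_cluster texts are drawn
    # without replacement from the n_total texts of the corpus.
    standard_error = math.sqrt(
        overall_variance / n_cluster * (n_total - n_cluster) / (n_total - 1)
    )
    return (mean(cluster_values) - mean(values)) / standard_error


# Hypothetical relative frequencies of one token in six plays,
# the first three belonging to the cluster under study.
frequencies = [4.0, 5.0, 6.0, 1.0, 2.0, 0.0]
membership = [True, True, True, False, False, False]
score = v_test(frequencies, membership)
print(round(score, 2))
```
      </preformat>
      <p>A positive value marks a token that is over-represented in the cluster and a negative value one that is under-represented; under a normal approximation, absolute values above about 1.96 are usually read as significant at the 5% level.</p>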
      <p>
        Such results show that stylometry does not recognise only idiolects, and can contribute to
stylistic surveys at various levels, starting with the close reading of the text by identifying the
stylemes that make it special. However, because of its comparative nature, stylometry is not
limited to authors and also identifies broader clusters related to the genre or the period.
In doing so, it answers Barthes’ wish to dépasser la notion d’idiolecte (primitivement retenue
comme point de départ) et à voir dans toute écriture, fût-elle apparemment très individuelle, le
fragment d’un sociolecte ou langage de groupe (“to go beyond the notion of idiolect (originally
retained as a starting point) and to see in all writing, however apparently very individual, the
fragment of a sociolect or group language”) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Stylometry naturally links individual
traits to global ones and contradicts S. Vaudrey-Luigi’s claim that ce n’est
peut-être pas tant un style d’auteur que l’on reconnait qu’un style d’époque (“it is perhaps not so much
the style of an author that we recognise as the style of a period”) [
        <xref ref-type="bibr" rid="ref50">55</xref>
        ]: it might very well be both
of them, one hidden under the other. Just as J. Scherer showed for the construction of
plays [
        <xref ref-type="bibr" rid="ref42">47</xref>
        ], behind a frame made of strict rules, we see individual traits emerging in
the background, probably until the advent of Romanticism and le sacre de l’écrivain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The identification of these traits is however still problematic, because stylometric results
remain specific to the primary corpus, whereas a proper “stylome”, like the “genome” it has
been named after, should be absolute. Indeed, while a stylistic signature defined
relative to a given context is sufficient for authorship attribution, it remains of limited interest
for stylistic studies. The solution to this problem is still unclear to us, but it clearly requires a
different approach to corpora, which need to be less homogeneous and more representative of
the production of the time in order to offer more precise results.</p>
    </sec>
    <sec id="sec-7">
      <title>Data and scripts</title>
      <p>Supplementary materials (doc+code) are available on Zenodo: https://doi.org/10.5281/zenodo.5526586.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The final version of this article would not have been possible without the help of J.-B. Camps,
Fl. Cafiero, Th. Clérice, K. Abiven, G. Forestier and our reviewers. Thanks also to
Éléonore, director of the “projet suisse”.</p>
      <p>M. Eder. “Computational stylistics and Biblical translation: how reliable can a
dendrogram be?” In: The Translator and the Computer. Wrocław: Wyższa Szkoła Filologiczna
we Wrocławiu, 2012, pp. 155–170. url:
http://docplayer.pl/949875-The-translator-andthe-computer.html.</p>
      <p>M. Eder. “Does size matter? Authorship attribution, small samples, big problem”. In:
Digital Scholarship in the Humanities 30.2 (2015), pp. 167–182. doi: 10.1093/llc/fqt066.</p>
      <p>M. Eder. “Mind your corpus: systematic errors in authorship attribution”. In: Literary
and Linguistic Computing 28.4 (2013), pp. 603–614. doi: 10.1093/llc/fqt039.</p>
      <p>M. Eder, J. Rybicki, and M. Kestemont. “Stylometry with R: A Package for
Computational Text Analysis”. In: The R Journal 8.1 (2016), pp. 107–121. url:
https://journal.r-project.org/archive/2016/RJ-2016-007/index.html.</p>
      <p>G. Forestier. Molière auteur des oeuvres de Molière. 2011. url: http://moliere-corneille.huma-num.fr.</p>
      <p>G. Forestier. “Révéler la vérité cachée : le cas Molière comme symptôme du
fonctionnement et des enjeux de la pensée hypercritique de la Renaissance à aujourd’hui”. In:
La Vérité. Congrès annuel de l’IUF. Toulouse, France, 2013. url:
https://hal.archives-ouvertes.fr/hal-01888357.</p>
      <p>M. Kestemont. “Function Words in Authorship Attribution. From Black Magic to
Theory?” In: Proceedings of the 3rd Workshop on Computational Linguistics for Literature
(CLFL). Gothenburg, Sweden: Association for Computational Linguistics, 2014, pp. 59–66.
doi: 10.3115/v1/W14-0908.</p>
      <p>[Table residue (corpus metadata), not recoverable in full. Column headers: Author, Title, Place, Date. Surviving values: Campistron; Acis et Galatée, Adrien, Aétius, Alcibiade, Andronic, Juba, roi de Mauritanie, Phocion, Pompéia, Catilina, Électre, Inés de Castro; Paris; 1690; Tragedy.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , J. Han,
          <string-name>
            <given-names>J</given-names>
            .
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>“A framework for projected clustering of high dimensional data streams”</article-title>
          .
          <source>In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30</source>
          . VLDB '04
          . Toronto, Canada: VLDB Endowment,
          <year>2004</year>
          , pp.
          <fpage>852</fpage>
          -
          <lpage>863</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barthes</surname>
          </string-name>
          . Œuvres complètes: 1968-1971
          . Paris: Éditions du Seuil,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Barthes</surname>
          </string-name>
          . Sur Racine. Pierres vives. Paris: Éditions du Seuil,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Basson</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Labbé</surname>
          </string-name>
          .
          <article-title>“De précieux manuscrits”</article-title>
          .
          <source>In: Actes des 15es Journées internationales d'Analyse statistique des Données Textuelles. 15es Journées internationales d'Analyse statistique des Données Textuelles (JADT 2020)</source>
          . Toulouse, France,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Basson</surname>
          </string-name>
          and D. Labbé, eds. Jean Racine. Aétius, Juba, Tachmas.
          <article-title>Tragédies inédites transcrites et présentées par Jean-Charles Basson et Dominique Labbé</article-title>
          . Montréal: Monière-Wollank Editeurs,
          <year>2015</year>
          . url: https://hal.archives-ouvertes.fr/hal-01165969.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Benichou</surname>
          </string-name>
          .
          <source>Le Sacre de l'Écrivain, 1750-1780. Essai sur l'avènement d'un pouvoir spirituel laïque dans la France moderne</source>
          . Paris: Joseph Corti,
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bronner</surname>
          </string-name>
          .
          <source>La Démocratie des crédules</source>
          . Paris: Presses Universitaires de France,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrows</surname>
          </string-name>
          .
          <article-title>“'Delta': a Measure of Stylistic Difference and a Guide to Likely Authorship”</article-title>
          .
          <source>In: Literary and Linguistic Computing</source>
          <volume>17</volume>.<issue>3</issue>
          (
          <year>2002</year>
          ), pp.
          <fpage>267</fpage>
          -
          <lpage>287</lpage>
          . doi: 10.1093/llc/17.3.267.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cafiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Puren</surname>
          </string-name>
          .
          <article-title>“La naissance du style: auteur vs genre aux XVIIe et XIXe siècles”</article-title>
          . In:
          <source>Humanistica 2020 - Archives du colloque</source>
          . Bordeaux, France: Humanistica,
          <year>2020</year>
          . url: https://hal.archives-ouvertes.fr/hal-02577853.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Cafiero</surname>
          </string-name>
          .
          <article-title>“Why Molière most likely did write his plays”</article-title>
          .
          <source>In: Science Advances</source>
          <volume>5</volume>.<issue>11</issue>
          (
          <year>2019</year>
          ). url: https://advances.sciencemag.org/content/5/11/eaax5489.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinche</surname>
          </string-name>
          .
          <article-title>“Stylometry for Noisy Medieval Data: Evaluating Paul Meyer's Hagiographic Hypothesis”</article-title>
          .
          <source>In: Digital Scholarship in the Humanities</source>
          <volume>36</volume>
          (
          <year>2021</year>
          ). url: http://arxiv.org/abs/2012.03845.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Choiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Rybicki</surname>
          </string-name>
          .
          <article-title>“Harper Lee and Other People: A Stylometric Diagnosis”</article-title>
          .
          <source>In: Mississippi Quarterly</source>
          <volume>70</volume>.<issue>3</issue>
          (
          <year>2017</year>
          ), pp.
          <fpage>355</fpage>
          -
          <lpage>374</lpage>
          . doi: 10.1353/mss.2017.0022. url: https://muse.jhu.edu/article/747862.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Daelemans</surname>
          </string-name>
          .
          <article-title>“Explanation in Computational Stylometry”</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Text Processing. Ed. by A. Gelbukh. Lecture Notes in Computer Science</source>
          . Berlin, Heidelberg: Springer,
          <year>2013</year>
          , pp.
          <fpage>451</fpage>
          -
          <lpage>462</lpage>
          . doi: 10.1007/978-3-642-37256-8_37.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Evert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Proisl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jannidis</surname>
          </string-name>
          , I. Reger,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pielström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schöch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Vitt</surname>
          </string-name>
          . “
          <article-title>Understanding and explaining Delta measures for authorship attribution”</article-title>
          .
          <source>In: Digital Scholarship in the Humanities</source>
          <volume>32</volume>.<issue>suppl_2</issue>
          (
          <year>2017</year>
          ), pp.
          <fpage>ii4</fpage>
          -
          <lpage>ii16</lpage>
          . doi: 10.1093/llc/fqx023.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fièvre</surname>
          </string-name>
          . Théâtre classique.
          <year>2007</year>
          . url: http://www.theatre-classique.fr.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bartz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deguin</surname>
          </string-name>
          .
          <article-title>“CORPUS17: a philological corpus for 17th c. French”</article-title>
          .
          <source>In: Proceedings of the 2nd International Digital Tools &amp; Uses Congress (DTUC '20)</source>
          . Hammamet, Tunisia,
          <year>2020</year>
          . doi: 10.1145/3423603.3424002.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gabay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Clérice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Camps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Tanguy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Gille-Levenson</surname>
          </string-name>
          .
          <article-title>“Standardizing linguistic data: method and tools for annotating (pre-orthographic) French”</article-title>
          .
          <source>In: Proceedings of the 2nd International Digital Tools &amp; Uses Congress (DTUC '20)</source>
          . Hammamet, Tunisia,
          <year>2020</year>
          . doi: 10.1145/3423603.3423996.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>R.</given-names>
            <surname>Garrette</surname>
          </string-name>
          .
          <source>La Phrase de Racine: étude stylistique et stylométrique</source>
          . Champs du signe: sémantique, rhétorique, poétique
          . Toulouse: Presses universitaires du Mirail,
          <year>1995</year>
          . 331 p. url: http://data.rero.ch/01-2209268/html.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gonon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Goossens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kraif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Novakova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sorba</surname>
          </string-name>
          .
          <article-title>“Motifs textuels spécifiques au genre policier et à la littérature 'blanche'”</article-title>
          .
          <source>In: SHS Web of Conferences</source>
          <volume>46</volume>
          (
          <year>2018</year>
          ), p.
          <fpage>06007</fpage>
          . doi: 10.1051/shsconf/20184606007.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H. v.</given-names>
            <surname>Halteren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Baayen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tweedie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haverkort</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Neijt</surname>
          </string-name>
          .
          <article-title>“New Machine Learning Methods Demonstrate the Existence of a Human Stylome”</article-title>
          .
          <source>In: Journal of Quantitative Linguistics</source>
          <volume>12</volume>.<issue>1</issue>
          (
          <year>2005</year>
          ), pp.
          <fpage>65</fpage>
          -
          <lpage>77</lpage>
          . doi: 10.1080/09296170500055350.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Herrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schöch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>van Dalen-Oskam</surname>
          </string-name>
          .
          <article-title>“Revisiting Style, a Key Concept in Literary Studies”</article-title>
          .
          <source>In: Journal of Literary Theory</source>
          <volume>9</volume>.<issue>1</issue>
          (
          <year>2015</year>
          ). doi: 10.1515/jlt-2015-0003.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>F.</given-names>
            <surname>Husson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lê</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pagès</surname>
          </string-name>
          .
          <source>Exploratory Multivariate Analysis by Example Using R</source>
          . Computer Science and Data Analysis Series
          . Boca Raton London New York: Chapman and Hall/CRC,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>C.</given-names>
            <surname>Labbé</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Labbé</surname>
          </string-name>
          .
          <article-title>“Inter-Textual Distance and Authorship Attribution. Corneille and Molière”</article-title>
          .
          <source>In: Journal of Quantitative Linguistics</source>
          <volume>8</volume>.<issue>3</issue>
          (
          <year>2001</year>
          ), pp.
          <fpage>213</fpage>
          -
          <lpage>231</lpage>
          . url: https://halshs.archives-ouvertes.fr/halshs-00139671.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>D.</given-names>
            <surname>Labbé</surname>
          </string-name>
          .
          <article-title>“Comédiens et écrivains au XVIIe siècle. À la redécouverte des frères Corneille”</article-title>
          . In: Séminaire de stylistique française. Cologne, Germany,
          <year>2011</year>
          . url: https://halshs.archives-ouvertes.fr/halshs-00657083.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>D.</given-names>
            <surname>Labbé</surname>
          </string-name>
          .
          <article-title>“Jean Racine, plume de l'ombre ?”</article-title>
          In: Séminaire Linguistique du français moderne. Neuchâtel, Switzerland,
          <year>2017</year>
          . url: https://hal.archives-ouvertes.fr/hal-01480917.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D.</given-names>
            <surname>Labbé</surname>
          </string-name>
          .
          <article-title>Si deux et deux sont quatre, Molière n'a pas écrit Dom Juan</article-title>
          ...: Essais - documents. Paris: Max Milo Editions,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lê</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Josse</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Husson</surname>
          </string-name>
          . “
          <article-title>FactoMineR: A Package for Multivariate Analysis”</article-title>
          .
          <source>In: Journal of Statistical Software</source>
          <volume>25</volume>.<issue>1</issue>
          (
          <year>2008</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          . doi: 10.18637/jss.v025.i01.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>L. v. d.</given-names>
            <surname>Maaten</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>“Visualizing Data using t-SNE”</article-title>
          .
          <source>In: Journal of Machine Learning Research</source>
          <volume>9</volume>.<issue>86</issue>
          (
          <year>2008</year>
          ), pp.
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          . url: http://jmlr.org/papers/v9/vandermaaten08a.html.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>H.</given-names>
            <surname>Moisl</surname>
          </string-name>
          .
          <article-title>“Finding the Minimum Document Length for Reliable Clustering of Multi-Document Natural Language Corpora”</article-title>
          .
          <source>In: Journal of Quantitative Linguistics</source>
          <volume>18</volume>.<issue>1</issue>
          (
          <year>2011</year>
          ), pp.
          <fpage>23</fpage>
          -
          <lpage>52</lpage>
          . doi: 10.1080/09296174.2011.533588.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>G.</given-names>
            <surname>Molinié</surname>
          </string-name>
          .
          <article-title>“Sémiostylistique : à propos de Proust”</article-title>
          .
          <source>In: Versants: revue suisse des littératures romanes</source>
          <volume>18</volume>
          (
          <year>1990</year>
          ), pp.
          <fpage>21</fpage>
          -
          <lpage>30</lpage>
          . doi: 10.5169/seals-259858.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>G.</given-names>
            <surname>Molinié</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Viala</surname>
          </string-name>
          .
          <source>Approches de la réception</source>
          . Perspectives littéraires. Paris: Presses Universitaires de France,
          <year>1993</year>
          . doi: 10.3917/puf.molin.1993.01.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>J.</given-names>
            <surname>Molino</surname>
          </string-name>
          . “
          <article-title>Pour une théorie sémiologique du style”</article-title>
          . In:
          <source>Qu'est-ce que le style ?</source>
          Paris: Presses Universitaires de France,
          <year>1994</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mosteller</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Wallace</surname>
          </string-name>
          . “
          <article-title>Inference in an Authorship Problem”</article-title>
          .
          <source>In: Journal of the American Statistical Association</source>
          <volume>58</volume>
          .302 (
          <year>1963</year>
          ), pp.
          <fpage>275</fpage>
          -
          <lpage>309</lpage>
          . doi: 10.2307/2283270.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>C.</given-names>
            <surname>Muller</surname>
          </string-name>
          .
          <article-title>“Les 'pronoms de dialogue': interprétation stylistique d'une statistique de mots grammaticaux”</article-title>
          . In:
          <source>Langue française et linguistique quantitative</source>
          . Travaux de linguistique quantitative. Genève: Slatkine
          ,
          <year>1979</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>G.</given-names>
            <surname>Philippe</surname>
          </string-name>
          .
          <article-title>“Traitement stylistique et traitement idiolectal des singularités langagières”</article-title>
          .
          <source>In: Cahiers de praxématique</source>
          <volume>44</volume>
          (
          <year>2005</year>
          ), pp.
          <fpage>77</fpage>
          -
          <lpage>92</lpage>
          . doi: 10.4000/praxematique.1659.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>F.</given-names>
            <surname>Rastier</surname>
          </string-name>
          .
          <source>Sémantique interprétative</source>
          .
          <series>Formes sémiotiques</series>
          . Paris: Presses Universitaires de France,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rebora</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Salgaro</surname>
          </string-name>
          . “
          <article-title>Is 'Late Style' measurable? A stylometric analysis of Johann Wolfgang Goethe's, Robert Musil's, and Franz Kafka's late works</article-title>
          ”. In:
          <source>Elephant &amp; Castle: laboratorio dell'immaginario</source>
          <volume>18</volume>
          (
          <year>2018</year>
          ), pp.
          <fpage>4</fpage>
          -
          <lpage>39</lpage>
          . url:
          <uri>https://www.dlls.univr.it/?ent=pubbdip%5C&amp;id=988359</uri>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          . “
          <article-title>A visual display for hierarchical classification</article-title>
          ”. In:
          <source>Data Analysis and Informatics</source>
          <volume>4</volume>
          (
          <year>1986</year>
          ), pp.
          <fpage>743</fpage>
          -
          <lpage>748</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>U.</given-names>
            <surname>Sapkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Solorio</surname>
          </string-name>
          . “
          <article-title>Not All Character N-grams Are Created Equal: A Study in Authorship Attribution</article-title>
          ”. In:
          <source>Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          .
          <year>2015</year>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>102</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3115/v1/N15-1010</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>J.</given-names>
            <surname>Scherer</surname>
          </string-name>
          .
          <source>La Dramaturgie classique en France</source>
          . 1 vol. Paris: Nizet,
          <year>1950</year>
          . 488 pp.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schöch</surname>
          </string-name>
          . “
          <article-title>Fine-tuning Stylometric Tools: Investigating Authorship and Genre in French Classical Theater</article-title>
          ”. In:
          <source>DH2013 conference - Book of abstracts</source>
          . Lincoln (NE),
          <year>2013</year>
          . url:
          <uri>http://dh2013.unl.edu/schedule-and-events/program/</uri>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>C.</given-names>
            <surname>Schöch</surname>
          </string-name>
          . “
          <article-title>Zeta für die kontrastive Analyse literarischer Texte: Theorie, Implementierung, Fallstudie</article-title>
          ”. In:
          <source>Quantitative Ansätze in den Literatur- und Geisteswissenschaften: Systematische und historische Perspektiven</source>
          . Berlin: de Gruyter,
          <year>2018</year>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>94</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1515/9783110523300</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Sjöblom</surname>
          </string-name>
          . “
          <article-title>L'indice pronominal est-il encore d'actualité ?</article-title>
          ” In:
          <source>Lexicometrica</source>
          <volume>5</volume>
          (
          <year>2004</year>
          ). url:
          <uri>http://lexicometrica.univ-paris3.fr/article/numero5.htm</uri>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>L.</given-names>
            <surname>Spitzer</surname>
          </string-name>
          . “
          <article-title>Die klassische Dämpfung in Racines Stil</article-title>
          ”. In:
          <source>Archivum romanicum</source>
          <volume>12</volume>
          (
          <year>1928</year>
          ), pp.
          <fpage>361</fpage>
          -
          <lpage>472</lpage>
          . url:
          <uri>http://digitale.bnc.roma.sbn.it/tecadigitale/giornale/TO00176940/1928/unico/00000379</uri>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>L.</given-names>
            <surname>Spitzer</surname>
          </string-name>
          .
          <source>Etudes de style</source>
          . Trans. by
          <string-name>
            <given-names>E.</given-names>
            <surname>Kaufholz</surname>
          </string-name>
          .
          <series>Bibliothèque des idées Gallimard</series>
          . Paris: Gallimard,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>L.</given-names>
            <surname>Spitzer</surname>
          </string-name>
          . “
          <article-title>The Muting Effect of Classical Style in Racine (1928)</article-title>
          ”. In:
          <source>Racine: Modern Judgements</source>
          . Trans. by
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Knight</surname>
          </string-name>
          .
          <series>Modern Judgements</series>
          . London: Macmillan Education UK,
          <year>1969</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>131</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-1-349-15297-1_9</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Thomasseau</surname>
          </string-name>
          . “
          <article-title>Pour une analyse du para-texte théâtral : quelques éléments du para-texte hugolien</article-title>
          ”. In:
          <source>Littérature</source>
          <volume>53</volume>
          .
          <issue>1</issue>
          (
          <year>1984</year>
          ), pp.
          <fpage>79</fpage>
          -
          <lpage>103</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.3406/litt.1984.2218</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>S.</given-names>
            <surname>Vaudrey-Luigi</surname>
          </string-name>
          . “
          <article-title>De la signature stylistique à la reconnaissance d'un style d'auteur</article-title>
          ”. In:
          <source>Le français aujourd'hui</source>
          n°
          <volume>175</volume>
          .
          <issue>4</issue>
          (
          <year>2011</year>
          ), pp.
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          . url:
          <uri>https://www.cairn.info/revue-le-francais-aujourd-hui-2011-4-page-37.htm</uri>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Ward</surname>
          </string-name>
          . “
          <article-title>Hierarchical Grouping to Optimize an Objective Function</article-title>
          ”. In:
          <source>Journal of the American Statistical Association</source>
          <volume>58</volume>
          .
          <issue>301</issue>
          (
          <year>1963</year>
          ), pp.
          <fpage>236</fpage>
          -
          <lpage>244</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1080/01621459.1963.10500845</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>