=Paper=
{{Paper
|id=Vol-3290/long_paper988
|storemode=property
|title=The Process of Imitatio Through Stylometric Analysis: the Case of Terence’s Eunuchus
|pdfUrl=https://ceur-ws.org/Vol-3290/long_paper988.pdf
|volume=Vol-3290
|authors=Andrea Peverelli,Marieke van Erp,Jan Bloemendal
|dblpUrl=https://dblp.org/rec/conf/chr/PeverelliEB22
}}
==The Process of Imitatio Through Stylometric Analysis: the Case of Terence’s Eunuchus==
Andrea Peverelli (1, 2, *), Marieke van Erp (2) and Jan Bloemendal (1)

(1) Huygens Institute, Oudezijds Achterburgwal 185, 1012 DK Amsterdam, the Netherlands

(2) KNAW Humanities Cluster, DHLab, Oudezijds Achterburgwal 185, 1012 DK Amsterdam, the Netherlands

* Corresponding author.

Email: andrea.peverelli@huygens.knaw.nl (A. Peverelli); marieke.van.erp@dh.huc.knaw.nl (M. v. Erp); jan.bloemendal@huygens.knaw.nl (J. Bloemendal)

Web: https://andreapeverelli.com/ (A. Peverelli); https://mariekevanerp.com (M. v. Erp); https://www.huygens.knaw.nl/en/medewerkers/jan-bloemendal-2/ (J. Bloemendal)

ORCID: 0000-000-0000-0000 (A. Peverelli); 0000-0001-9195-8203 (M. v. Erp); 0000-0002-5768-9932 (J. Bloemendal)

CHR 2022: Computational Humanities Research Conference, December 12-14, 2022, Antwerp, Belgium

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

===Abstract===

The Early Modern Era is at the forefront of a widespread enthusiasm for Latin works: texts from classical antiquity are given new life, widely reprinted, studied and, in the case of dramas, even repeatedly staged throughout Europe. New Latin comedies, too, are written in quantities never seen before (at least 10,000 works published between 1500 and 1800 are known). Within the game of literary imitation (the process of imitatio), authors start to mimic the style of ancient authors, and Terence's dramas in particular were considered the prime sources of reuse for many decades. Via a case study, "the reception of Terence's Eunuchus in Early Modern literature", we take a deep dive into the mechanisms of literary imitation. Our analysis is based on four comedy corpora in Latin, Italian, French and English, spanning roughly three centuries (1400-1700). To address the problem of language shift and multilingual inter-corpus analysis, we base our experiments on translations of the Eunuchus, one for each sub-corpus. Using tools drawn from the field of Stylometry, we address text reuse and textual similarities between Terence's text and the Early Modern corpora, to get a better grasp of the internal fluctuations of the imitation game between Early Modern and classical authors.

Keywords: Neo-Latin, text reuse, textual similarity, computational literary studies, stylometry

===1. Introduction===

In the last few decades, Stylometry has been used with great effectiveness to track authorial signals and stylistic similarities between authors ([9] and [11]). Stylometric tools can help clarify issues of style, relationship and network construction, and Stylometry has become a paramount methodology in the field of Computational Literary Studies and Stylistics. While it is eminently a distant reading environment, it can account for general and specific features of correlation and possible connection between sets of corpora, often with high-precision results (cf. [23, 32]). Literary scholars can therefore be presented with new perspectives and definitive evidence; as stated by [25] on the usefulness of Stylometry in literary studies: "literary interpretations can be focused, with computational precision, on the relevant passages. In using such methods, [...] [Stylometry] uses computer-assisted criticism to shed new light on a pre-existing concern". Our paper gives an account of the precise spots where the two drama corpora (Terence and Early Modern comedies) overlap and where Terence's features are most prominent in relation to each textual turning point.
The primary aim of our paper is thus to demonstrate the influence of Terence's most renowned piece, the Eunuchus, on Early Modern drama writing, by analysing the fluctuations in preference and similarity that different authors might display. Our goal is to account for this deep and complex imitation game between Early Modern and classical authors from a distant reading point of view. Through the use of stylometric methodologies applied to a case study, we lay the groundwork for a wider inquiry into the intricate phenomena behind the choice of classical models and how these operate in the background of modern writing. Our main contribution is a methodology: gathering a suitable corpus, analysing the texts and building a stable network of interconnected plays. This new set of correlated analyses of similarity and dissimilarity between corpora can then be replicated on wider cases.

The remainder of this paper is structured as follows. Section 2 gives a brief overview of related research. Section 3 presents the data and the process of structuring our corpus. Section 4 sketches our experimental setup. Section 5 gives a detailed analysis of the results and a discussion at the level of literary history. Section 6 concludes the paper with an overview of future work.

===2. Related Work===

Our processing line builds on very stable and well-trodden ground. Stylometric tools of different origin and application have been gaining popularity since the work of Burrows, Holmes, and Craig in the early '90s [4, 16, 7]. For a comprehensive overview of the set of tools offered by Stylometry, the main references are [23] and [12] (which offer a complete rundown of the usage of the main R library for stylometry, stylo). A number of stylometric studies have been devoted to tasks such as authorship attribution and detection (cf.
[5, 17, 20, 27, 22, 8]). This task differs from our goal, but it can serve as a template for a more general analysis of style and similarities between authors; where needed, we point out the differences in approach and scope throughout our paper. More akin to our topic are style variation analysis and the study of stylistic similarities (cf. [32] for an overview): [25, 6, 30] apply stylometric analysis tools to the study of contemporary (19th-20th century) authors' style, while [9, 14, 26] take different perspectives on text reuse for ancient historical languages, also implementing network analysis through Stylometry. The core statistics and algorithms we use in this paper are Burrows' Delta and Sequential Stylometric Analysis (SSA), both already known for their efficiency in the field of stylistic analysis. For an in-depth overview of Burrows' Delta and, in general, of Delta analysis for textual similarity tasks, see [18, 3, 1], while for a more hands-on application of Burrows' Delta in a literary case study we suggest [24]. As for Sequential Stylometric Analysis (SSA), [10] is paramount, while for a more in-depth explanation of the usage of the NSC (Nearest Shrunken Centroids) algorithm in SSA, see [22, 30, 29, 28].

===3. Data and Corpus Construction===

We collected a corpus of 85 comedies from the DraCor Project¹ database (English, French, and Italian) and the Translatin repository (Latin).² The original classical Latin text of the Eunuchus is taken from the LASLA corpus.³ The corpus statistics are shown in Table 1.

Table 1: Corpus statistics for the different language corpora in terms of number of texts, period covered and number of tokens.
{| class="wikitable"
! Language !! No. of texts !! No. of tokens !! Time span
|-
| Latin || 15 || 160,741 || 1510-1639
|-
| French || 34 || 456,493 || 1634-1668
|-
| Italian || 21 || 409,000 || 1496-1761
|-
| English || 15 || 250,646 || 1592-1611
|-
| Total || 85 || 1,276,879 || 1496-1761
|}

As the time spans show, the covered period varies, but it fits the general boundaries of the Modern Era (roughly, from the early 15th to the late 18th century). This diversified selection reflects both what DraCor offers and literary history: the two most famous Italian comedy writers of the Modern Era, Ariosto and Goldoni, are two centuries apart, while the vast majority of the most important drama writers of France's Modern Era (Molière, Racine, Corneille) are found in the middle of the 17th century. For English, DraCor only offered the complete Shakespeare corpus, from which we selected 15 comedies. Finally, we added a random selection of Neo-Latin works from a wide variety of authors, nationalities and years of production, drawn from our project repository; these texts were produced by OCR, then automatically cleaned and manually checked.⁴

For this whole experiment, we wanted the translations to stay in the background and cause as little noise as possible, in order to keep the focus on Terence's original work. The translations were thus gathered according to the following criteria:

# As close as possible (time-wise) to the period covered by the related sub-corpus;
# Freely available and downloadable;
# A philological translation, as close as possible (style-wise) to Terence's original.

The first and third criteria helped bridge the linguistic and stylistic divide. Had modern-day translations been used, the experiment would have been invalidated from the start, since their language would be too distant from that of the corresponding Early Modern texts. The third criterion raises another subtle but important point: the translation must be "philological", i.e. as close as possible to the unadulterated original.
This rules out, for example, an "artistic" translation of Terence's works by Niccolò Machiavelli, supposedly of very little rigour as far as adherence to the original is concerned.⁵ Spelling variants were unified with a layer of pre-processing from the CLTK/NLTK pipeline.⁶ The translations meeting all criteria at the time of our experiment are:

* French: H. Clouard, 1937
* Italian: L. Perelli, 1869
* English: G. Colman, 1768

The Italian and French translations, while rather late, are still sufficiently suitable, as French has changed very little since the end of the 17th century, while Perelli's 1869 translation still belongs, roughly, to the Italian in use before the language was standardised.

¹ https://dracor.org/ (last visited 29 August 2022)

² https://www.translatin.nl

³ https://www.lasla.uliege.be/cms/c_8508894/fr/lasla (last visited 29 August 2022)

⁴ The whole corpus, together with the parameter specifications for the Stylo code, can be found in our GitHub repository: https://github.com/AndrewPeverells/The-Imitation-Game

===4. Experiments===

In this section, we give an overview of our experiments. We start with a preliminary exploration of our dataset, using a clustering algorithm to identify general groupings within our corpus. Then we analyse our texts in their sequential development, with the aim of identifying overlaps and distances between each modern drama sub-corpus and its corresponding translation of the Eunuchus, so as to identify modern authorial fingerprints and takeovers of Terence's piece throughout the succession of acts and scenes of the original.

====4.1. Experimental Setup====

As a pre-processing step, we divided our corpus into two distinct sets: the primary set (test set), composed of the 81 modern texts, subdivided into four language corpora; and the secondary set (training or reference set), composed of the four versions of the Eunuchus, again subdivided per language.
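The spelling unification step mentioned above relies on the CLTK/NLTK pipeline, whose exact configuration is not given here. Purely as an illustration, the sketch below shows the kind of normalisation such a step performs on Latin text, assuming the common conventions of stripping accents, expanding the ligatures æ and œ, and folding j into i and v into u before tokenising. The function names are hypothetical and this is not the CLTK or NLTK API.

```python
import re
import unicodedata


def normalise_latin(text: str) -> str:
    """Hypothetical Latin spelling normaliser (illustration only, not CLTK)."""
    # Decompose accented letters and drop the combining marks ("à" -> "a").
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Expand the ligatures common in early modern prints.
    text = text.replace("æ", "ae").replace("œ", "oe")
    # Fold the consonantal spellings j/v into the vocalic i/u.
    return text.replace("j", "i").replace("v", "u")


def tokenise(text: str) -> list[str]:
    """Split into lowercase alphabetic tokens; punctuation and digits are dropped."""
    return re.findall(r"[a-z]+", text)


print(tokenise(normalise_latin("Juvenis, quæso, VENI huc!")))
# ['iuuenis', 'quaeso', 'ueni', 'huc']
```

The j/v folding only makes sense for the Latin sub-corpus; applied to the English, French or Italian texts it would corrupt ordinary words.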
The training or reference set is the relevant Eunuchus translation, i.e. the "known" text against which our test is conducted to identify authorial signals and similarities. For the first experiment, the clustering analysis, this subdivision is irrelevant, since all the texts are distributed together into clusters and the Eunuchus needs to be visible within them.

The general parameters kept for both sets of experiments are the following, based on our previous experiment [26]:

* To overcome issues related to an arbitrary selection of MFWs (Most Frequent Words), we ran several trials on every setting from 100 to 1500 MFWs.⁷ We found 300 MFWs to be the most suitable for our experiment, as the texts are not too long (they rarely exceed 15,000 words) and we noticed that, beyond this number, the clustering began to gradually merge every branch together. This was already observed in our recent experiment [26] and is confirmed by [18]. This relatively low setting of 300 MFWs makes the set of words on which our analysis is conducted almost entirely composed of function words and a handful of high-frequency, semantically meaningful words, mainly related to dramatic language (e.g. for English "lord, father, lady, pray, brother, true", or for Latin "quaeso, mehercle, dico, amor, senex");
* Contractions were not removed;
* No filtering of function words was introduced, thus keeping the texts as they originally appear. Interjections and personal pronouns in particular were deliberately kept, as they are a vital part of dramatic language;
* Maximum culling was set to 20%, meaning that a word needs to appear in at least 20% of the texts to be retained, which eliminates some background noise.

⁵ Cf. the article on Terence in the Italian Enciclopedia Machiavelliana.

⁶ NLTK and CLTK (last visited 22 August 2022)

⁷ A practice already well established in the field of Stylometry: [11]

Figure 1: Cluster analysis of the English sub-corpus, Classic Delta distance, 300 Most Frequent Words. Plays are clustered together in branches based on their similarity, with ancestors reuniting them at the root. The score on the x-axis at the bottom indicates the level of Delta (distance) between branches: the lower this score (i.e. the shorter the distance between the texts), the more similar the plays are to each other.

====4.2. Clustering====

The first set of experiments is a cluster analysis produced via a Delta algorithm. The selected Delta measure is Burrows's [3], which is widely used in stylistic analysis as a reliable similarity measure between two candidates (cf. [33]). Burrows' Delta has also been shown to be better suited to corpora with shorter vectors [26, 21]. Although Cosine measures have been shown to outscore others, several studies stress that this depends on many factors, such as corpus selection, choice of MFWs, language and vector length [21, 13]. As evidence is still lacking that Cosine measures outscore other traditional measures on shorter literary texts (10,000-15,000 words) with a lower number of MFWs, for tasks other than authorship attribution, we decided to use Burrows' Classic Delta. Burrows' Delta is calculated as the geometric distance between "two standard-deviation-normalised mean word frequencies" [1].

Figure 2: Cluster analysis of the Latin sub-corpus, Classic Delta distance, 300 Most Frequent Words. Plays are clustered together in branches based on their similarity, with ancestors reuniting them at the root. The score on the x-axis at the bottom indicates the level of Delta (distance) between branches: the lower this score (i.e. the shorter the distance between the texts), the more similar the plays are to each other.
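The standard-deviation normalisation just quoted, together with the 300-MFW and 20% culling settings listed in Subsection 4.1, can be sketched in plain Python as follows. This is a minimal illustration assuming already tokenised texts, not stylo's implementation: ranking the MFWs by raw counts over the whole corpus, breaking ties alphabetically and using the sample standard deviation are our own assumptions, and `zscore_table` is a hypothetical name.

```python
from collections import Counter
from statistics import mean, stdev


def zscore_table(texts: dict[str, list[str]], n_mfw: int = 300, culling: float = 0.2):
    """Return (mfw, table): table[name][i] is the z-score of mfw[i] in that text.

    Needs at least two texts, since a sample standard deviation is used.
    """
    # Relative frequency of every word in every text.
    rel = {}
    for name, tokens in texts.items():
        counts = Counter(tokens)
        rel[name] = {w: c / len(tokens) for w, c in counts.items()}

    # Culling: keep only words that occur in at least `culling` of the texts.
    doc_freq = Counter(w for freqs in rel.values() for w in freqs)
    kept = [w for w, df in doc_freq.items() if df / len(texts) >= culling]

    # Rank the surviving words by raw count over the whole corpus.
    overall = Counter()
    for tokens in texts.values():
        overall.update(tokens)
    mfw = sorted(kept, key=lambda w: (-overall[w], w))[:n_mfw]

    # Standardise each word's relative frequency across the texts (z-scores).
    table = {name: [] for name in texts}
    for w in mfw:
        values = [rel[name].get(w, 0.0) for name in texts]
        mu, sigma = mean(values), stdev(values)
        for name, v in zip(texts, values):
            table[name].append((v - mu) / sigma if sigma > 0 else 0.0)
    return mfw, table


toy = {"A": "a a a b c".split(), "B": "a b b c c".split(), "C": "a d d d d".split()}
mfw, table = zscore_table(toy, n_mfw=2, culling=0.5)
print(mfw)  # ['a', 'b']
```

In the toy run, "d" is the second most frequent word overall, but it occurs in only one of the three texts and is therefore removed by culling before the MFW list is cut.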
In summary, "Delta may be viewed as an axis-weighted form of 'nearest neighbor' classification, where a test document is classified the same as the known document at the smallest 'distance'" [1]. A simplified formula of Burrows's Delta, which enriches Burrows's original by taking into account the standard deviations of the word frequencies, is given in Equation 1:

<math>\Delta_B^{(n)}(D, D') = \sum_{i=1}^{n} \frac{1}{\sigma_i} \left| f_i(D) - f_i(D') \right| \qquad (1)</math>

where <math>n</math> is the number of MFWs, the subscript <math>B</math> indicates the relation to Burrows's original Delta, <math>f_i(D)</math> is the relative frequency of the <math>i</math>-th most frequent word in a text <math>D</math>, <math>|f_i(D) - f_i(D')|</math> is the difference between its frequencies in the texts <math>D</math> and <math>D'</math>, and <math>\sigma_i</math> is the standard deviation of that word's frequency, by which the difference is normalised.

Using the parameters introduced in Subsection 4.1 (300 MFWs, maximum culling of 20%), we calculated and produced clusters for each sub-corpus, as shown in Figures 1-4.

We then produced a slightly modified version of this Delta analysis by implementing the Rolling Delta procedure initially proposed by [8] and implemented in the R package stylo.⁸ Instead of inspecting the whole corpus as one batch and calculating the Delta distance from the overall frequencies of two texts, this methodology subdivides every text into equal-sized windows, following the evolution (in words) of the reference-set text, and then compares each of these windows for every text in the corpus. This is particularly useful for tracking stylistic shifts along the length of a text.

⁸ https://www.rdocumentation.org/packages/stylo/versions/0.7.4/topics/rolling.delta

Figure 3: Cluster analysis of the French sub-corpus, Classic Delta distance, 300 Most Frequent Words. Plays are clustered together in branches based on their similarity, with ancestors reuniting them at the root. The score on the x-axis at the bottom indicates the level of Delta (distance) between branches: the lower this score (i.e. the shorter the distance between the texts), the more similar the plays are to each other.
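Before turning to the windowed variant, Equation 1 and the nearest-neighbour reading of Delta can be sketched as follows. The relative frequencies of the n MFWs are assumed to be already computed, σ_i is taken as the sample standard deviation across all texts, and `corpus_sigma`, `classic_delta` and `nearest_known` are hypothetical names; the toy frequencies are invented for illustration. Burrows's original definition averages over the n words instead of summing, which rescales every distance by the same factor and leaves the ranking of candidates unchanged.

```python
from statistics import stdev


def corpus_sigma(freqs: dict[str, list[float]]) -> list[float]:
    """sigma_i: standard deviation of each word's relative frequency across all texts."""
    return [stdev(column) for column in zip(*freqs.values())]


def classic_delta(f_d: list[float], f_d2: list[float], sigma: list[float]) -> float:
    """Equation 1: sum over the n MFWs of |f_i(D) - f_i(D')| / sigma_i."""
    # Words with zero spread carry no information and are skipped.
    return sum(abs(a - b) / s for a, b, s in zip(f_d, f_d2, sigma) if s > 0)


def nearest_known(test: list[float], known: dict[str, list[float]], sigma: list[float]) -> str:
    """Nearest-neighbour reading of Delta: the known text at the smallest distance."""
    return min(known, key=lambda name: classic_delta(test, known[name], sigma))


# Invented relative frequencies of three MFWs in three texts.
freqs = {
    "Eunuchus": [0.050, 0.020, 0.010],
    "PlayA": [0.048, 0.022, 0.011],
    "PlayB": [0.030, 0.040, 0.002],
}
sigma = corpus_sigma(freqs)
candidates = {name: f for name, f in freqs.items() if name != "Eunuchus"}
print(nearest_known(freqs["Eunuchus"], candidates, sigma))  # PlayA
```

A full Delta matrix over all text pairs, computed with `classic_delta`, is the kind of input a hierarchical clustering routine would then turn into dendrograms like those in Figures 1-4.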
First, each text is divided into samples, or "windows". Then the centroid C is computed from the relative frequencies of the n most frequent words in each window: for each of these words, C records the mean <math>\mu_i(C)</math> of its relative frequencies across the window samples, enriched with the standard deviation <math>\sigma_i(C)</math> of those frequencies. Finally, the standard Delta is computed between each window W and its reference centroid C, as shown by Equation 2:

<math>\Delta(C, W) = \sum_{i=1}^{n} \frac{1}{\sigma_i(C)} \left| \mu_i(C) - f_i(W) \right| \qquad (2)</math>

After having "rolled" through each window, the resulting Delta values are plotted in an x,y space, where the x-axis corresponds to the evolution (in words) of the texts and their window samples against the reference set (the Eunuchus), and the y-axis to the Delta distance. Thus, the closer a point lies to the x-axis, the higher the similarity; the further away from the x-axis (i.e. the higher the Delta), the lower the similarity. This methodology was originally developed for authorship attribution and detection [5, 20, 8, 19], but its underlying tenets can be adapted to our case.

Figure 4: Cluster analysis of the Italian sub-corpus, Classic Delta distance, 300 Most Frequent Words. Plays are clustered together in branches based on their similarity, with ancestors reuniting them at the root. The score on the x-axis at the bottom indicates the level of Delta (distance) between branches: the lower this score (i.e. the shorter the distance between the texts), the more similar the plays are to each other.
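The windowing and Equation 2 can be sketched as follows. In this minimal sketch, which is our assumption rather than the stylo implementation, the centroid C is built from the windows of one text and the windows W are slid along a second text, whose progression gives the x-axis. Window size and step are free parameters (the defaults below are illustrative, not the values used in the paper), and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def slide(tokens: list[str], size: int, step: int) -> list[list[str]]:
    """Equal-sized, possibly overlapping windows; a trailing partial window is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, step)]


def rel_freqs(window: list[str], mfw: list[str]) -> list[float]:
    counts = Counter(window)
    return [counts[w] / len(window) for w in mfw]


def centroid(tokens, mfw, size, step):
    """mu_i(C) and sigma_i(C): mean and spread of each MFW over the windows (needs >= 2)."""
    columns = list(zip(*(rel_freqs(w, mfw) for w in slide(tokens, size, step))))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def rolling_delta(rolled, reference, mfw, size=5000, step=500):
    """Equation 2 for every window W of `rolled`: a list of (start offset, Delta)."""
    mu, sigma = centroid(reference, mfw, size, step)
    curve = []
    for start in range(0, len(rolled) - size + 1, step):
        f = rel_freqs(rolled[start:start + size], mfw)
        delta = sum(abs(m - x) / s for m, x, s in zip(mu, f, sigma) if s > 0)
        curve.append((start, delta))
    return curve


reference = ["a", "b"] * 10 + ["a", "c"] * 10   # toy text whose style shifts halfway
rolled = ["a", "b", "a", "b", "a", "b", "a", "c"]
print(rolling_delta(rolled, reference, ["a", "b", "c"], size=4, step=2))
# [(0, 2.0), (2, 2.0), (4, 0.0)]
```

In the toy run the curve drops to zero in the last window, where the word frequencies of the rolled text coincide with the reference centroid; this is the kind of sudden drop interpreted below as a stylistic takeover.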
Normally, "if the curve for a text shows a sudden drop, this may indicate a stylistic change in the test text, caused, for instance, by one author taking over from another" [27]. In our case, a drop indicates a takeover in an author's style, that is, loci where an author is closer to Terence and actually re-uses parts of his style (and, conversely, high spikes on the y-axis mark loci where the author departs from it).

====4.3. SSA - Sequential Stylometric Analysis====

The second experiment focuses on the use of Sequential Stylometric Analysis [10], which takes the previous method of calculating distances between texts with regard to their evolution in words (hence "sequential") and combines it with a machine-learning classifier, such as k-Nearest Neighbor, Support Vector Machine, Naive Bayes, or Nearest Shrunken Centroid (NSC). We chose the NSC algorithm as it is already widely and successfully used in the field of Stylometry [30, 22, 29, 28]. As this classification methodology is also primarily used for authorship attribution and detection tasks, we start here from Burrows' assumption of a 'closed game' [3], i.e. the situation in which we know for certain that the true author of the test text (or one of its authors) is among the candidates evaluated in an authorship detection task. Our setting then shifts from identifying one sample among others to analysing an author's stylistic imprint on the other candidates.

Figure 5: Rolling Delta Analysis of the Latin sub-corpus. A different visualisation of Figure 2: the Delta here is plotted in an x,y space, where the x-axis corresponds to the evolution (in words) of the texts and their window samples against the reference set, and the y-axis to the Delta distance.
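The NSC classifier chosen for the SSA, whose principle is described in the next paragraph, can be sketched as follows, assuming the usual formulation of the method: class centroids are soft-thresholded towards the overall centroid, differences are standardised by the pooled within-class standard deviation plus its median, and a sample is assigned to the nearest shrunken centroid. Equal class priors, the threshold value and the names `fit_nsc` and `predict_nsc` are illustrative assumptions; this is not stylo's code. In an SSA setting, the frequency vector of each window of the rolled text would be passed to `predict_nsc` in turn, yielding one candidate label per window.

```python
from math import sqrt
from statistics import median


def fit_nsc(samples: dict[str, list[list[float]]], threshold: float):
    """Fit shrunken centroids; samples[label] holds that class's feature vectors.

    Needs more samples than classes (for the pooled standard deviation).
    """
    rows = [r for rs in samples.values() for r in rs]
    n, k, p = len(rows), len(samples), len(rows[0])
    overall = [sum(r[i] for r in rows) / n for i in range(p)]
    cents = {c: [sum(r[i] for r in rs) / len(rs) for i in range(p)] for c, rs in samples.items()}

    # Pooled within-class standard deviation s_i, regularised by its median s0.
    s = [sqrt(sum((r[i] - cents[c][i]) ** 2 for c, rs in samples.items() for r in rs) / (n - k))
         for i in range(p)]
    s0 = median(s)
    spread = [si + s0 for si in s]

    shrunk = {}
    for c, rs in samples.items():
        m = sqrt(1 / len(rs) - 1 / n)
        centre = []
        for i in range(p):
            scale = m * spread[i]
            d = (cents[c][i] - overall[i]) / scale if scale > 0 else 0.0
            # Soft thresholding: shrink |d| by `threshold`, stopping at zero.
            d = max(abs(d) - threshold, 0.0) * (1.0 if d >= 0 else -1.0)
            centre.append(overall[i] + scale * d)
        shrunk[c] = centre
    return shrunk, spread


def predict_nsc(x: list[float], shrunk: dict[str, list[float]], spread: list[float]) -> str:
    """Label of the shrunken centroid nearest to x in standardised squared distance."""
    def dist(c: str) -> float:
        return sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, shrunk[c], spread) if si > 0)
    return min(shrunk, key=dist)


train = {
    "A": [[1.0, 5.0], [1.2, 5.1], [0.8, 4.9]],
    "B": [[3.0, 5.0], [3.2, 4.9], [2.8, 5.1]],
}
shrunk, spread = fit_nsc(train, threshold=1.0)
print(predict_nsc([1.1, 5.0], shrunk, spread), predict_nsc([2.9, 5.2], shrunk, spread))  # A B
```

In the toy data the second feature does not separate the two classes, so both class centroids are shrunk onto the overall mean for that feature and it no longer influences the prediction; with a large enough threshold every feature is removed in this way.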
The NSC algorithm is a form of feature selection and evaluation. It computes a centroid for each class (interpreted as the geometric centre of that class's data distribution and calculated as the mean value of each feature) and then shrinks these class centroids towards the overall centroid of the data, discarding as noise the features whose contribution shrinks to zero, until only a few features have an actual impact on the classification. Each class group is thus assigned a model label based on its centroid; after the shrinkage, the remaining centroids compose the general model on which the distance is calculated. For an in-depth mathematical analysis of the NSC classifier applied to stylometry, see [29].

The NSC algorithm, as implemented in the R package stylo [12] for its "rolling" function, produced four different visualisations of our analysis, in which the bottom (bold) horizontal line indicates the first set of most probable candidates (i.e. the highest-scoring "closest" authors to the Eunuchus), while the second (lighter) one corresponds to the second set of most probable overlapping authors, based on the different class calculations of the algorithm. The thickness of the line also contributes visually to the analysis: a thicker line indicates more overlapping sets of features (thus closer stylistic similarity), and a thinner line marks a decreasing degree of similarity.

Figure 6: Sequential Stylometric Analysis ("Rolling Stylometry" methodology) of the English sub-corpus. Here each text from the test set is chunked into windows and analysed sequentially against the reference set (the Eunuchus).
The x-axis corresponds to the evolution of the latter, the y-axis accounts for the usual Delta distance (expressed in the thickness of the horizontal line), while the different colours correspond to the different texts that are given as the most probable candidates for similarities.

===5. In-depth Analysis===

In this section, we present and discuss the results of our experiments and combine them into a general framework of the authors' preferences towards a model, thus gaining more insight into the general process of imitatio. First we discuss each language sub-corpus separately, followed by a summary of the general tendencies that stand out from our analysis.

====5.1. Italian====

From the clustering part of the experiment (Figure 4) we note a clear closeness between the earlier works (16th century) and the Eunuchus, despite its translation being closer in time to Goldoni. This similarity is especially evident for the works of Machiavelli (Mandragola and Clizia) and Ariosto: both were renowned connoisseurs of Terence, the former even producing one of the first Italian translations of some of his works, while the latter was responsible for numerous re-enactments of Terence's comedies (especially the Eunuchus), both in Latin and Italian, when he was at the Este court in Ferrara.⁹ We can therefore assume that Terence's style was deeply rooted in their own, while the Eunuchus had little to no influence on the later stages of Italian comedy production.
Also striking is the clear-cut distance between the Eunuchus and two texts closely related to Terence and to the Este court, which was at the forefront of the revitalisation of Latin drama (especially Terence) at the end of the 15th century.¹⁰ Both the Comedia di Timon Greco by Galeotto del Carretto (who dedicated his work to Beatrice d'Este) and the Comedia di Danae by Baldassarre Taccone (who was patronised by the Este in Mantova), while very close to each other, are kept strictly separate from Terence. Therefore, the style of the Eunuchus does not seem to have had a particular influence on their works, where one would expect it.

⁹ For an overview of Ariosto's life and production and of the literary role of the Este court in the Italian late Renaissance, see [15].

¹⁰ For a more in-depth analysis of the matter, see [31].

Figure 7: Sequential Stylometric Analysis of the French sub-corpus. Here each text from the test set is chunked into windows and analysed sequentially against the reference set (the Eunuchus). The x-axis corresponds to the evolution of the latter, the y-axis accounts for the usual Delta distance (expressed in the thickness of the horizontal line), while the different colours correspond to the different texts that are given as the most probable candidates for similarities.

From the SSA experiment (Figure 8), the closest relative to the Eunuchus for the largest part of the play appears to be Ariosto's I Suppositi, which is closest to Terence's play at the start and at the end. Both are comedies of switched identities, and in both many of the characters go undercover. We interpret this in light of the fact that the start and the ending of such a comedy are especially important for setting up the disguise and for the final resolution of the misunderstanding at its very heart, while the central plot of scheming and deception is left to the author's invention (the central part, up to the very end of the piece, being the most interpolated section).
The turning point between the end of act IV and the start of act V, right before the final resolution, is instead taken by Ariosto's Cassaria, a Plautine-inspired comedy.

====5.2. Latin====

The clustering algorithm automatically drew two very distinct clusters (see Figure 2), separating the 16th-century works from the 17th-century ones, and the Eunuchus is the clear dominant model in the first, 16th-century cluster. This is confirmed in, for example, [26] and [2]. One exception is the 1615 text by Jacob Bidermann. A possible explanation is that Bidermann is the only Jesuit in our 17th-century cluster: our previous paper [26] confirmed the general tendency of 16th-century Catholic authors, such as Macropedius, Crocus, Diether, Simonides and Schonaeus, to heavily favour Terence as a literary model (before switching to Seneca).

Figure 8: Sequential Stylometric Analysis of the Italian sub-corpus. Here each text from the test set is chunked into windows and analysed sequentially against the reference set (the Eunuchus). The x-axis corresponds to the evolution of the latter, the y-axis accounts for the usual Delta distance (expressed in the thickness of the horizontal line), while the different colours correspond to the different texts that are given as the most probable candidates for similarities.

From the sequential analysis (Figure 9), the closest overlapping relatives to the Eunuchus are Crocus and Macropedius, which take up more than half of the work's body. This is an already well-established parallel, in line with the widespread passion for Terentian drama at the start of the 16th century. As a general note, the Latin sub-corpus appears to be the one with the heaviest and most varied influences from the Eunuchus, with the least branching in the clustering and the most numerous switches in author similarity in the SSA.
This confirms the heavy, direct reuse of textual material from their models by Neo-Latin authors, rather than a mere underlying echo of influence: Neo-Latin authors read, copied, transcribed, imitated, staged, and taught ancient authors on a daily basis [2].

====5.3. French====

From the cluster analysis (Figure 3) we can observe the almost complete preponderance of Molière, the very minor presence of Corneille and the complete absence of Racine, La Fontaine and other minor authors. Furthermore, the complete distance of La Fontaine's Eunuque from the original model on which it is based stands out. Even though La Fontaine's work is based upon Terence's, his style is apparently very different (a consideration perfectly in line with the chosen methodology, stylometry, which is in most cases considered independent of content and semantics).

From the sequential analysis (Figure 7), the absolute winners of this imitation game with Terence are Les Fourberies de Scapin by Molière and Les Fausses Vérités by Antoine d'Ouville. Both are comedies of love intrigue, built on a man's infatuation with a young girl, and both are stylistically influenced by earlier Italian comedy, which is in turn heavily Terentian.

Figure 9: Sequential Stylometric Analysis of the Latin sub-corpus. Here each text from the test set is chunked into windows and analysed sequentially against the reference set (the Eunuchus). The x-axis corresponds to the evolution of the latter, the y-axis accounts for the usual Delta distance (expressed in the thickness of the horizontal line), while the different colours correspond to the different texts that are given as the most probable candidates for similarities.
As with Ariosto, one particular work (Les Fausses Vérités) plays the part of I Suppositi, showing heavy influences from the very start and end of the Eunuchus; for the remaining part of the play, Les Fausses Vérités and Les Fourberies de Scapin compete for predominance, continuously alternating as the closest match along the Eunuchus. Again, the most interpolated part seems to be the second half of Terence's work, with act IV displaying the heaviest influences.

====5.4. English====

The English sub-corpus presents a different situation, because all the results are negative. From the clustering experiment (Figure 1) we note a clear-cut distance between the Eunuchus and the Shakespearean comedy corpus. This raises the following question: could this distance be due to the translation not being contemporary with Shakespeare, even though it is the oldest and the closest to its test corpus out of all the languages examined? It also raises a question about style and language that deserves further investigation: is it the diachronic variation of language or Shakespeare's style that sets them apart? Is the issue due to the language of Colman's translation, or is it entirely to be attributed to Shakespeare's notoriously idiosyncratic style, so that even a contemporary translation would not be sufficient to track similarities between his corpus and Terence's works? Answering this requires a digitisation of Webbe's translation (1638), which is, to our knowledge, the closest to Shakespeare's times done by a "professional"¹¹ and was not freely available at the time of our experiment.

¹¹ In our experiment, we first used William Heming's 1602 translation, but the results were even worse: the Delta distance between Heming and Shakespeare's comedies more than tripled relative to the Colman translation we currently use.
Finally, from the sequential analysis (Figure 6), the closest candidate is The Comedy of Errors, but this result cannot be trusted: the overlapping line is almost negligibly thin, so the Delta distance is comparatively high.

====5.5. General considerations====

From our experiments, and from the results of the Rolling Delta visualisation for the primary sub-case study of Neo-Latin (Figure 5), we can draw some conclusions about the influence of Terence's Eunuchus on Modern Era drama production, which give us some insight into the underlying process of imitation of ancient authors.

* Act IV seems to be the most interpolated and reused, as shown by the high number of coinciding overlaps in the SSA. This indicates a preference, by modern authors, for taking inspiration from a specific topos of classical theatre writing, as the fourth act always corresponds to an escalation in the web of intrigues and a turning point in the general plot before the final settlement: we can then assume that a particularly animated style of narration is at work and that modern authors tend to stay close to it;
* Acts I and V, the opening and closing of the comedy, are always taken by a specific work, from the modern perspective. Acts I and V are inextricably tied to the specific play's plot and are usually made up almost exclusively of fast-paced spoken dialogue (monologues and reported speech, typically by servants, take up the middle of the play): the new characters are presented (Act I) and their misadventures are resolved by a turn of events (Act V), so it makes sense that only works with a similar story could tie in with their specific style, mimicking in particular the new characters' exchange of gags and blows.
From the Rolling Delta analysis, applied to the sub-case study of Neo-Latin against the original Eunuchus, we can note some clear patterns:

• The imitation game follows a rough up-and-down pattern, alternating two areas of low delta and two of high delta: the end of act II (low), act III scene 5 (high), the end of act IV (low), and the ending of the play (high);
• The first low delta (= high similarity and overlap with the original) corresponds to the end of the initial story set-up and character presentation, where the characters usually start to get into the thick of the machinations. This confirms again that the original play stops being useful once the plot outgrows the possible use of its styloms (that is, when the stories of the modern plays become too distant from the Eunuchus to justify the reuse of its style);
• The first high delta (= low similarity) perfectly overlaps with the most famous of Terence's scenes: the rape scene of act III scene 5. Although the rape is only reported, the scene describes it in detail, and it is surprising that the Neo-Latin authors do not make use of the seduction and emotional violence styloms of Terence's scene;
• The second low delta corresponds to the end of act IV, and it ties in with the aforementioned turning point in the comedy's structure;
• The second, and last, high variation section coincides with the near-end of the play, but it shows a very interesting pattern: on the one hand, plays that turned out to be very close to the Eunuchus in the other parts of our experiment plunge even further down at that point, reaching a new low delta and confirming their similarity in the important ending section of the play; on the other hand, works that turned out to be distant follow the opposite trend, climbing ever higher in their delta distance and reaching the second highest point of dissimilarity.

¹¹ (continued) This is probably due to Heming's very loose translation: he was a playwright and a poet, not a translation expert, grammarian and language teacher as Webbe was.

6. Conclusion and Future Work

In this paper, we described a case study for the application of computational methods to the issue of assessing the process of imitatio between authors from the Early Modern Period and their classical models. We started by gathering a corpus consisting of one of Terence's works as a case study, the Eunuchus, its translations into three other languages (English, Italian and French), and four sub-corpora of drama from the Early Modern Period in the four respective languages, which served as the actual test set. We then explained the methodology we employed for our experiment and the two different and complementary analyses it enabled us to perform: Cluster Analysis through Delta measures, and Sequential Stylometric Analysis. We then proceeded to the in-depth analysis of the results for each of the four languages, evaluating the peculiarities of each sub-corpus and the broader patterns that stood out. Finally, we drew some general conclusions on the process of modern authors' imitation of the classics within theatrical writing.
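The low- and high-delta areas discussed in Section 5.5 come from a Rolling Delta analysis [10, 27]. As a rough, hypothetical sketch of the idea (not the implementation behind Figure 5): a window slides along the target text, each window's MFW profile is z-scored with the mean and standard deviation of a set of reference texts, and its Delta to each reference is recorded, so that dips mark passages close to a reference and peaks mark passages far from it. The window size, the step, the z-scoring scheme and the function name `rolling_delta` are all assumptions, and the token lists are invented.

```python
from collections import Counter
from statistics import mean, pstdev


def profile(tokens, features):
    """Relative frequencies of the chosen features in a token list."""
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in features]


def rolling_delta(target, references, features, window, step):
    """Delta between successive windows of `target` and each reference text.

    z-scores use the mean and population standard deviation of the reference
    profiles (an assumption of this sketch). Returns (window_start, {name: delta}).
    """
    ref_profiles = {name: profile(toks, features) for name, toks in references.items()}
    columns = list(zip(*ref_profiles.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread

    def z(vec):
        return [(x - m) / s for x, m, s in zip(vec, mus, sigmas)]

    ref_z = {name: z(p) for name, p in ref_profiles.items()}
    results = []
    for start in range(0, len(target) - window + 1, step):
        win_z = z(profile(target[start:start + window], features))
        deltas = {
            name: mean(abs(a - b) for a, b in zip(win_z, rz))
            for name, rz in ref_z.items()
        }
        results.append((start, deltas))
    return results


# Invented toy data: the target drifts from "a"-heavy to "b"-heavy.
references = {"ref_1": list("aaab"), "ref_2": list("abbb")}
target = list("aaaabbbb")
for start, deltas in rolling_delta(target, references, ["a", "b"], window=4, step=4):
    print(start, deltas)
# prints:
# 0 {'ref_1': 1.0, 'ref_2': 3.0}
# 4 {'ref_1': 3.0, 'ref_2': 1.0}
```

Plotting each reference's Delta against the window start yields the kind of up-and-down curve read above; overlapping windows (a step smaller than the window) give a smoother curve at the cost of more computation.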
In doing so, we achieved our initial, more general aim of describing a methodology for multi-language literary studies that can be used for other case studies beyond Terence (for example, 20th-century authors reusing Renaissance authors). Furthermore, provided the correct parameters (such as the act-scene subdivision for drama or the verse-stanza structure for poetry), our methodology can be used to investigate not only drama but any other genre. The possible combinations offered by such a methodology are numerous, as many other studies have shown in the past (Section 2).

However, the chosen methodology has limitations. Stylometry only tackles issues of style in a purely “formal” way, that is, by considering only the most frequent words (MFWs);¹² it is by no means a methodology for semantic analysis, and it only provides tools for a distant reading environment. Furthermore, it showed its internal limitations in the analysis of the English sub-corpus (Subsection 5.4), where it yielded poor results in both the Cluster Analysis and the SSA. On the other hand, Stylometry can often catch hidden patterns, especially thanks to its core features (a distant reading environment and the analysis of function words): it has proven successful and useful for analysing literary corpora with the aim of building networks of common and dissimilar features, often handling quite large sets of these features at the same time. To us, therefore, Stylometry is not to be taken on its own, but is to be combined with other methodologies that can complement its structural flaws and, in turn, be enriched by Stylometry's unique perspective. It is with these caveats in mind that we plan to expand our methodology with implementations from other methods, primarily semantic analysis.

¹² MFWs are often function words in a traditional literary work, but in other cases this depends on the type of input text. There is also no consensus on the exact definition of function words.
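The caveat in footnote 12 can be made concrete: an MFW list is a purely frequency-based selection, and whether its members count as function words depends on a word list that someone has to choose. The sketch below is hypothetical: it pools word counts over a few texts, takes the top n, and measures what share of them falls into a small, assumed Latin function-word set. The word set, the function names and the sentences are invented for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word set; there is no agreed definition.
FUNCTION_WORDS = {"et", "in", "est", "non", "ut", "ad", "cum", "sed", "quod", "nam"}


def top_mfws(texts, n):
    """The n most frequent (lower-cased, whitespace-split) words over all texts."""
    pooled = Counter()
    for text in texts:
        pooled.update(text.lower().split())
    return [word for word, _ in pooled.most_common(n)]


def function_word_share(mfws, function_words):
    """Fraction of the MFW list that belongs to the given function-word set."""
    return sum(word in function_words for word in mfws) / len(mfws)


texts = ["Et tu in agro es et ego in urbe", "Non est ut dicis et in domo est"]
mfws = top_mfws(texts, 4)
print(mfws, function_word_share(mfws, FUNCTION_WORDS))
# prints ['et', 'in', 'est', 'tu'] 0.75
```

Whether a pronoun such as tu should count as a function word is exactly the kind of choice on which there is no consensus; adding it to the set would change the share without changing the MFW list itself.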
Internally, one critical step to cement our findings would be to access proper contemporary (to the Modern Era) translations of the Eunuchus, to rule out any imprecision due to the distance between the author's language and the translation's own, while an entire sub-project could be devoted to comparing different translations and how they perform. Furthermore, we consider it a necessary step to expand our reference corpus of classical drama writers to include every Latin play and to replicate our methodology on each one of them. Finally, enriching our corpus with new data from the copious drama production of the Modern Era would be a natural next step. This would bring us closer to our ultimate goal of accounting for the complex issue of imitatio within Modern Era drama writing.

Acknowledgements

This research is conducted within the framework of the TransLatin project funded by the Dutch Research Council (NWO).

References

[1] S. Argamon. “Interpreting Burrows's Delta: Geometric and Probabilistic Foundations”. In: Literary and Linguistic Computing 23, Issue 2 (2007), pp. 131–147.
[2] J. Bloemendal and H. Nordland. “Neo-Latin Drama in Early Modern Europe”. Brill, 2013.
[3] J. F. Burrows. “‘Delta’: a measure of stylistic difference and a guide to likely authorship”. In: Literary and Linguistic Computing 17, Issue 3 (2002), pp. 267–287.
[4] J. F. Burrows. “Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method”. Clarendon Pr, 1987.
[5] J. F. Burrows. “Never say always again: Reflections on the numbers game”. In: McCarty, W. (ed.), Text and Genre in Reconstruction: Effects of Digitalization on Ideas, Behaviors, Products and Institutions (2010), pp. 13–36.
[6] M. Choiński, M. Eder, and J. Rybicki. “Harper Lee and other people: a stylometric diagnosis”. In: Mississippi Quarterly 70/71, Issue 3 (2019), pp. 355–374.
[7] H. Craig. “Authorial attribution and computational stylistics: if you can tell authors apart, have you learned anything about them?”. In: Literary and Linguistic Computing 14, Issue 1 (1999), pp. 103–113.
[8] K. van Dalen-Oskam and J. van Zundert. “Delta for Middle Dutch Author and Copyist Distinction in Walewein”. In: Literary and Linguistic Computing 22, Issue 3 (2007), pp. 345–362.
[9] M. Eder. “A bird's-eye view of early modern Latin: Distant reading, network analysis, and style variation”. In: (2017).
[10] M. Eder. “Rolling Stylometry”. In: Digital Scholarship in the Humanities 31, Issue 3 (2016), pp. 457–469.
[11] M. Eder and J. Rybicki. “Do birds of a feather really flock together, or how to choose training samples for authorship attribution”. In: Literary and Linguistic Computing 28, Issue 2 (2012), pp. 229–236.
[12] M. Eder, J. Rybicki, and M. Kestemont. “Stylometry with R: a package for computational text analysis”. In: R Journal (2016). url: https://journal.r-project.org/archive/2016/RJ-2016-007/index.html.
[13] S. Evert, T. Proisl, T. Vitt, C. Schöch, F. Jannidis, and S. Pielström. “Towards a better understanding of Burrows's Delta in literary authorship attribution”. In: Proceedings of the Fourth Workshop on Computational Linguistics for Literature. Denver, Colorado, USA, 2015, pp. 79–88.
[14] V. B. Gorman and R. J. Gorman. “Approaching questions of text reuse in ancient greek using computational syntactic stylometry”. In: Open Linguistics 2, Issue 1 (2016), pp. 500–510.
[15] P. Hainsworth and D. Robey. “The Oxford Companion to Italian Literature”. Oxford University Press, 2002.
[16] D. I. Holmes. “The Evolution of Stylometry in Humanities Scholarship”. In: Literary and Linguistic Computing 13, Issue 3 (1998), pp. 111–11.
[17] D. Hoover. “Statistical Stylistics and Authorship Attribution: an Empirical Investigation”. In: Literary and Linguistic Computing 16, Issue 4 (2001), pp. 421–444.
[18] D. Hoover. “Testing Burrows's Delta”. In: Literary and Linguistic Computing 19, Issue 4 (2004), pp. 453–475.
[19] D. Hoover. “The Full-Spectrum Text-Analysis Spreadsheet”. In: DH2013 (2013), pp. 226–228.
[20] D. Hoover. “The Tutor's Story: A Case Study of Mixed Authorship”. In: English Studies 93, Issue 3 (2012), pp. 324–339.
[21] F. Jannidis, S. Pielström, C. Schöch, and T. Vitt. “Improving Burrows' Delta – An empirical evaluation of text distance measures”. In: Book of Abstracts of the Digital Humanities Conference 2015. ADHO, UWS. Sydney, Australia, 2015.
[22] M. Jockers, D. M. Witten, and C. S. Criddle. “A comparative study of machine learning methods for authorship attribution”. In: Literary and Linguistic Computing 25, Issue 2 (2008), pp. 215–223.
[23] K. Lagutina, N. Lagutina, E. Boychuk, I. Vorontsova, E. Shliakhtina, O. Belyaeva, I. Paramonov, and P. Demidov. “A Survey on Stylometric Text Features”. In: 2019 25th Conference of Open Innovations Association (FRUCT). IEEE, 2019, pp. 184–195.
[24] G. Lauer and F. Jannidis. “Burrows's Delta and Its Use in German Literary History”. In: Distant Readings. Topologies of German Culture in the Long Nineteenth Century. Camden House, 2014, pp. 29–54.
[25] J. O'Sullivan, K. Bazarnik, M. Eder, and J. Rybicki. “Measuring Joycean Influences on Flann O'Brien”. In: Digital Studies/le Champ Numérique 8(1), 6 (2018).
[26] A. Peverelli, M. van Erp, and J. Bloemendal. “Tracking Textual Similarities in Neo-Latin Drama Networks”. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022). European Language Resources Association (ELRA), 2022, pp. 5295–5303.
[27] J. Rybicki, D. Hoover, and M. Kestemont. “Collaborative authorship: Conrad, Ford and Rolling Delta”. In: Literary and Linguistic Computing 29, Issue 3 (2014), pp. 422–431.
[28] G. B. Schaalje, P. J. Fields, and M. Roper. “Stylometric Analyses of the Book of Mormon: A Short History”. In: Journal of Book of Mormon Studies 21 (2012).
[29] G. B. Schaalje, P. J. Fields, M. Roper, and G. L. Snow. “Extended nearest shrunken centroid classification: A new method for open-set authorship attribution of texts of varying sizes”. In: Literary and Linguistic Computing 26, Issue 1 (2011), pp. 71–88.
[30] S. Schöberlein. “Poe or not Poe? A stylometric analysis of Edgar Allan Poe's disputed writings”. In: Digital Scholarship in the Humanities 32, Issue 3 (2016), pp. 643–659.
[31] G. Torello-Hill. “The Revival of Classical Roman Comedy in Renaissance Ferrara: From the Scriptorium to the Stage”. In: Terence between Late Antiquity and the Age of Printing. Brill Academic Pub, 2015.
[32] E. Zangerle, M. Mayerl, M. Potthast, and B. Stein. “Overview of the Style Change Detection Task at PAN 2021”. In: Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum. Bucharest, Romania, 2021.
[33] H. Þorgeirsson. “How similar are Heimskringla and Egils saga? An application of Burrows' delta to Icelandic texts”. In: European Journal of Scandinavian Studies 48, Issue 1 (2018).