=Paper= {{Paper |id=Vol-3834/paper86 |storemode=property |title=Univariate Statistical Analysis of a Non-Canonical Literary Genre. Quantifying German-Language One-Act Plays (1740–1850) |pdfUrl=https://ceur-ws.org/Vol-3834/paper86.pdf |volume=Vol-3834 |authors=Viktor J. Illmer,Dîlan Canan Çakir,Frank Fischer,Lilly Welz,Carsten Milling |dblpUrl=https://dblp.org/rec/conf/chr/IllmerC0WM24 }} ==Univariate Statistical Analysis of a Non-Canonical Literary Genre. Quantifying German-Language One-Act Plays (1740–1850)== https://ceur-ws.org/Vol-3834/paper86.pdf
                                Univariate Statistical Analysis of a Non-Canonical
                                Literary Genre
                                Quantifying German-Language One-Act Plays (1740–1850)

                                Viktor J. Illmer¹,∗, Dîlan Canan Çakir¹,∗, Frank Fischer¹,∗, Carsten Milling² and
                                Lilly Welz¹
                                ¹ EXC 2020 Temporal Communities, Freie Universität Berlin, Germany
                                ² CLS INFRA, University of Potsdam, Germany


                                              Abstract
                                              This article explores the use of metadata to analyse German-language one-act plays from 1740 to 1850,
                                              addressing the need to expand beyond canonical texts in literary studies. Utilising the Database of
                                              German-Language One-Act Plays, we examine aspects such as the number of scenes and characters as
                                              well as the role of different original languages on which the translated plays in the corpus are based.
                                              We find that one-act plays exhibit strong genre signals that set them apart from multi-act plays of the
                                              time. Our metadata-driven approach provides a comprehensive and statistically grounded understanding
                                              of the genre, demonstrating the potential of digital methods to enhance genre studies and overcome
                                              traditional limitations in literary scholarship.

                                              Keywords
                                              literary studies, drama, genre theory, univariate statistics, metadata




                                1. Introduction
                                It is an early promise of the digital humanities “to look beyond the canon” [19]. The specifics
                                of how this is to be achieved often remain unclear, whether due to obstacles in obtaining
                                appropriate material, unfamiliarity with non-canonical sources, or simply resource constraints,
                                but the fact that this must happen has been emphasised time and again:

                                          “The literary scholar of the twenty-first century can no longer be content with
                                          anecdotal evidence, with random ‘things’ gathered from a few, even ‘representative,’
                                          texts. We must strive to understand these things we find interesting in the
                                          context of everything else, including a mass of possibly ‘uninteresting’ texts.” [15]

                                  At the time, Matthew Jockers, from whom the quote is taken, was aiming at full-text corpora,
                                which he examined using digital humanities methods such as stylometry and topic modelling.

                                CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
                                ∗ Corresponding author.
                                Email: v.illmer@fu-berlin.de (V. J. Illmer); dilan.cakir@fu-berlin.de (D. C. Çakir); fr.fischer@fu-berlin.de (F. Fischer);
                                milling@uni-potsdam.de (C. Milling); l.welz@fu-berlin.de (L. Welz)
                                ORCID: 0000-0002-7334-781X (V. J. Illmer); 0009-0001-0013-7205 (D. C. Çakir); 0000-0003-2419-6629 (F. Fischer);
                                0000-0003-0553-7512 (C. Milling); 0009-0008-0404-8147 (L. Welz)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
It was clear to him and his readers that he would never have all the 19th-century English-language
novels that could potentially belong to his working corpus available in full-text versions
suitable for research.
   Recognising this problem, the editors of the European Literary Text Collection (ELTeC) proposed
a different kind of approach. Rather than aiming to collect as many full texts as possible, or a
representative sample, they built several corpora, each containing only 100 novels from a given
language and balanced according to various criteria:

          “In the absence of exhaustive bibliographic records of novelistic production for
          most of the languages covered in ELTeC, no attempt at a randomly sampled,
          statistically representative corpus can reasonably be made. Instead, the corpus-
          composition criteria aim to ensure that the breadth and variety of novels produced
          during the period covered by ELTeC are well represented, while at the same time
          ensuring rough comparability across collections.” [21]

   The two corpus-based projects mentioned above have identified their own shortcomings,
shortcomings that we would like to partly overcome in this article. We refrain from analysing
full-text corpora and focus our attention on metadata, i.e. we use metadata to describe aspects
of literary history. In doing so, we refer to two types of metadata, “the kind of descriptive,
bibliographic metadata found in repositories such as library catalogues” [10] for one, but also
other metadata that can be systematically collected beyond bibliographic records.
   With the help of the Database of German-Language One-Act Plays 1740–1850 (hereafter Einakter
Database), located at einakter.dracor.org, we analyse the German-language one-act plays that
were written, performed and/or printed in this time span, and we do so on the basis of the
aforementioned “exhaustive bibliographic records”.
   The assumption that we are actually operating on such exhaustive bibliographic records rests
on our consultation of all researchable sources available to us in our field (encyclopaedias,
theatre programmes, bibliographies, library catalogues): whatever is demonstrably documented is
included in our database. We are therefore dealing with an extensive, representative sample of
German-language one-act plays from 1740 to 1850, and we can describe core aspects of the genre¹
by statistical means. Our analysis is guided by the following five research questions:

     Q1 What is the proportion of subtitle-based categories among one-act plays?
     Q2 Is there a difference in the number of scenes per act between German one-act plays and
        other German plays of the same period?
     Q3 Is there a difference in the number of characters between German one-act plays and other
        German plays of the same period?
     Q4 What is the proportion of original languages among translated plays?
     Q5 Does an estimation of the mean number of characters based on Pazarkaya’s sample match
        the estimation based on the Einakter Database’s sample?


¹ We regard one-act plays as a genre in their own right; we follow the reasoning in [3].




2. Genre theories in literary studies
Categorisation is one of the core tasks in literary studies; genre theories are almost as old
as literary studies themselves [2]. Yet analyses of characteristics are often only heuristic
approximations of a genre definition. Genre characteristics can essentially be summarised in
three points: a genre has both necessary and alternative characteristics; its text features
are recognised as a convention by a temporal community at a certain time; and these features
are marked by simple genre signals [13].
   Generally, these genre signals are formulated on the basis of a small group of (canonical)
texts and many paratexts and are rarely fundamentally reviewed or supplemented, usually for
pragmatic reasons.
   Efficient scholarly communication requires defined terms and genre characteristics, but the
classical methods of literary studies rarely offer the tools or resources needed to describe a
genre extensively, let alone exhaustively. Although methodological alternatives are few in
practice (especially in the pre-digital era), researchers are often criticised for the numerical
discrepancy between the number of works known to exist and the number of works actually used
for analysis [28]. Treating a few selected works as representative of the whole is a
methodological shortcoming; without it, however, no genre theory can be developed in literary
studies, at least none that a single person can manage with the traditional tools of the field.
   With the help of the Einakter Database, questions of genre can be negotiated on a new,
broader and more inclusive digital and statistical basis [4]. Our aim is to analyse one-act plays
from the given time frame in their entirety in order to make statements about the characteristics
of the genre as a whole. Without access to full texts for the entire corpus, many questions will
have to remain unexplored for now. We will therefore focus on demonstrating that essential
literary-historical insights can also be derived from metadata.
   One of the last comprehensive works on one-act plays of the 18th and early 19th centuries was
published about 50 years ago [20],² and there, too, an attempt was made to describe the genre
with statistics and counts. However, only around 200 to 300 one-act plays were considered,
whereas the Einakter Database has so far assembled 2,586 one-act plays for the same period,
a whole order of magnitude more. As the database offers its content in a machine-readable
format, we can carry out statistical analyses to describe the characteristics of the genre.


3. Einakter Database
The Einakter Database recognises one-act plays as a distinct dramatic genre in the 18th and
early 19th centuries. Currently, it contains metadata on 2,586 one-act plays from 1740 to 1850
(Figure 1). Explicit labelling of a play as a play “in one act” (in the epitext or peritext) serves
as a criterion for inclusion in the corpus. This type of marking one-act plays first emerged in
the middle of the 18th century and, up to the middle of the 19th century, was associated with
very specific features, which will be discussed below.

² According to its title, Pazarkaya’s work refers only to the 18th century, but he does include
plays from the 19th century in his corpus.




   By focusing on the subtitle, the corpus for the present database is precisely defined, and not
merely for pragmatic reasons (for more information on inclusion criteria, see [4]). This method
precludes further classification issues, as the works themselves bear the subtitle, thereby
actively aligning themselves with a genre convention through their authors, editors, printers or
theatre directors. This does not exclude the possibility of structural differences between
works with the same subtitle, which may lead to some being considered formally or thematically
atypical. Following the concept of family resemblance, it is assumed that the works share
several features, though never all simultaneously. Nor does it preclude the existence of plays
that exhibit exactly the same typical features as those marked as one-act plays but do not
carry such a subtitle. Such works are not considered in this study.
   As most of the plays in the corpus are non-canonical [3], only very few (114 plays, to be
precise³) are available as full-text versions. We therefore rely exclusively on metadata to
analyse the genre.




Figure 1: Screenshot of the einakter.dracor.org frontend. At the time of writing, the database contained
2,586 one-act plays.


  The database contains, among other things, information on the period of origin, the first
performance, the number of scenes, bibliographical details, links to digital copies, names and
gender of the characters, information on the setting, links to encyclopaedias and keywords on
the content of the plays (Figure 2).


4. Analysis
4.1. Subtitle categorisation
     Q1 What is the proportion of subtitle-based categories among one-act plays?

  Little is known about the genre of the one-act play in the period covered by this study.
According to what is stated about them, the plays are mostly simple comedies [20]. This assertion


³ We are referring to the plays in the German Drama Corpus at https://github.com/dracor-org/gerdracor
(revision 94678457159dbfd8961dd954c24ee5d3ce8a6e35). In contrast to the Einakter Database, however,
this corpus does not contain all plays from the period under investigation, but a selection of
mostly canonical texts [11].




Figure 2: Detailed view of a single play in the database.


can be verified with the help of the subtitles of the one-act plays in the database. The subtitles
not only contain information on the number of acts (“in one act”), but often also on the
subgenre (“comedy”, “tragedy”, etc.). They were grouped into categories using simple regular
expressions to account for common spelling variations (“Komödie”, “Comoedie”, “Komoedie”,
etc.).
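The regular-expression grouping can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors’ code: the paper does not publish its expressions, so the patterns, the category order (first match wins) and the decision to fold the Komödie variants into the comedy category (Lustspiel) are hypothetical illustrations.

```python
import re
import unicodedata

# Hypothetical patterns: the paper does not publish its regular expressions.
# Order matters, because the first matching category wins. Komödie spelling
# variants (Komödie, Komoedie, Comödie, Comoedie) are ASSUMED to count as Lustspiel.
CATEGORY_PATTERNS = [
    ("Lustspiel", re.compile(r"lustspiel|[kc]om(?:ö|oe)die", re.IGNORECASE)),
    ("Schauspiel", re.compile(r"schauspiel", re.IGNORECASE)),
    ("Posse", re.compile(r"posse", re.IGNORECASE)),
    ("Nachspiel", re.compile(r"nachspiel", re.IGNORECASE)),
    ("Trauerspiel", re.compile(r"trauerspiel", re.IGNORECASE)),
    ("Schwank", re.compile(r"schwank", re.IGNORECASE)),
    ("Vorspiel", re.compile(r"vorspiel", re.IGNORECASE)),
    ("Drama", re.compile(r"\bdrama\b", re.IGNORECASE)),  # \b keeps "Melodrama" out
]


def categorise(subtitle: str) -> str:
    """Map a play's subtitle to the first matching subgenre category, else 'Other'."""
    # Normalise to composed characters so that "o" + combining diaeresis matches "ö".
    text = unicodedata.normalize("NFC", subtitle)
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(text):
            return category
    return "Other"
```

Counting `categorise(s)` over all subtitles with `collections.Counter` would then give per-category frequencies of the kind reported in Table 1. Because a subtitle can name more than one subgenre, the order of the pattern list is itself a modelling decision.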




Table 1
Distribution of categories in the dataset with binomial confidence intervals.
                      Category (German / English)        n   Percent       95% CI
                      Lustspiel      Comedy      1433      55.41   53.47   57.34
                      Schauspiel                  249       9.63    8.52   10.83
                      Posse          Farce        223       8.62    7.57    9.77
                      Nachspiel      Postlude      65       2.51    1.95    3.19
                      Drama                        61       2.36    1.81    3.02
                      Trauerspiel    Tragedy       61       2.36    1.81    3.02
                      Schwank                      51       1.97    1.47    2.58
                      Vorspiel       Prelude       46       1.78    1.31    2.37
                      Other          Other        397      15.35   13.98   16.80


  Binomial proportion confidence intervals were calculated for these categories in order to
generalise the results to one-act plays of this period that are not in the sample. The intervals
were computed with the Clopper-Pearson (“exact”) method [5]. Results show comedy
(Komödie) to be the most common category by a wide margin (CI = [53.5%, 57.3%]), followed
by Schauspiel (CI = [8.5%, 10.8%]) and farce (Posse, CI = [7.6%, 9.8%]), whose order cannot
be determined due to overlapping CIs (Table 1). All categories that follow (postlude, tragedy,
Drama, Schwank and prelude) exhibit no meaningful rank differences among them.
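The Clopper-Pearson interval is obtained by inverting two one-sided binomial tail probabilities. The following is a minimal standard-library Python sketch of that idea, not the authors’ implementation: each bound is found by bisection on the binomial CDF, and the function names are hypothetical. Library routines such as statsmodels’ `proportion_confint` with `method="beta"` compute the same interval.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    log_terms = [
        log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q
        for i in range(k + 1)
    ]
    peak = max(log_terms)  # factor out the largest term to avoid underflow
    return min(1.0, math.exp(peak) * math.fsum(math.exp(t - peak) for t in log_terms))


def _solve_increasing(g: Callable[[float], float], target: float, iterations: int = 60) -> float:
    """Bisection for p in (0, 1) with g(p) == target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) 1 - alpha confidence interval for the proportion x / n."""
    # Lower bound: the p at which P(X >= x) = alpha / 2 (0 when x == 0).
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha / 2 (1 when x == n).
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2)
    return lower, upper
```

For example, `clopper_pearson(1433, 2586)` uses the Lustspiel count and the total of 2,586 plays from Table 1. Its bounds, multiplied by 100, should come out close to the 53.47 and 57.34 reported there.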
  In terms of literary history, this is an important finding: since this result is not merely an
observation based on anecdotal knowledge from a selection of well-known (canonical) plays,
but on a probability calculation on all identifiable works of a genre, we have an entirely new
basis for argumentation in literary studies. Even without this statistical analysis, it has been
generally assumed in 18th-century theatre research that comedies were more commonly writ-
ten than tragedies [18]. This hypothesis can now be adequately supported, at least for one-act
plays. It is important to emphasise that the objective was not merely to pursue surprising
outcomes, but rather to uncover empirically verifiable metrics pertaining to a specific genre.
This approach goes beyond the scope of conventional literary studies, which often relies on
heuristic methods.
  The various subgenres not only appear with varying frequency, but also at different times;
the one subgenre that is relatively constantly represented is comedy (Figure 3). For the further
analysis of the one-act genre, these moments of transition (such as a decrease in Nachspiel
around 1740 or an increase in Schwank after 1800) would have to be considered.

4.2. Number of scenes
  Q2 Is there a difference in the number of scenes per act between German one-act plays and
     other German plays of the same period?

  To answer this question, the Drama Corpora Project’s GerDraCor corpus metadata was used
for comparison. In contrast to the database of one-act plays, however, this database does not
contain all plays from the period under investigation, but a selection of mostly canonical texts.
This data was filtered to the time period 1740–1850 and plays with 2–5 acts, both to avoid




Figure 3: Histogram with kernel density estimation line of number of plays per subtitle category by
normalised year (bin width: 2 years). Bar heights are normalised per category. For details on the
normalised year variable, see [7].


duplicates between the Einakter and GerDraCor datasets and to avoid the inclusion of one-act
plays that do not fall under the Einakter definition (cf. [3]). GerDraCor contains only a few
plays with six or more acts, which were therefore excluded; this number of acts is also rather
atypical for the period and less relevant to our genre questions.⁴ Because GerDraCor’s scene
counts are derived dynamically from the underlying TEI encoding, the relevant variable is named
numOfSegments. For our purposes, however, segments may be interpreted as scenes.⁵
   From these data, we calculated the number of scenes per act for each play and aggregated the
results by number of acts, which gives the mean number of scenes per act (Table 2). To calculate
confidence intervals for this mean, we first checked whether the number of scenes followed a
normal distribution. A Shapiro-Wilk test for normality [22] on both the Einakter and GerDraCor
datasets yielded 𝑝 < 0.001 in each case, indicating that the data are not normally distributed.
We therefore determined the confidence intervals via bootstrapping [8]. For this purpose, we
generated 10,000 resamples of size 10,000 per category, calculating the mean number of scenes
per act for each. The 95% confidence intervals were calculated using the bias-corrected and
accelerated (BCa) method [9]. Additionally, we conducted the Mann-Whitney 𝑈 test [16] for a pairwise
⁴ Mentions of GerDraCor for the purpose of comparison will henceforth refer to this filtered subset.
⁵ We also excluded Johann Nestroy’s plays Zu ebener Erde und erster Stock and Das Haus der
Temperamente due to idiosyncrasies in their encoding, which lead to incorrect segment counts
for the purposes of this analysis.




Table 2
Mean number of scenes per act across Einakter Database and GerDraCor datasets, grouped by number
of acts with bootstrapped confidence intervals.
        Dataset              Number of acts      n      Mean scenes per act       95% CI
        Einakter Database                 1   2306                     14.20   13.95   14.46
        GerDraCor                         2     33                     13.36   11.23   15.71
        GerDraCor                         3     70                     10.46    9.20   11.82
        GerDraCor                         4     28                      9.04    7.41   10.77
        GerDraCor                         5    140                      6.97    6.35    7.68
        GerDraCor                       2–5    271                      8.86    8.25    9.52


Table 3
Mann-Whitney U test results comparing the number of scenes per act between the Einakter Database
and GerDraCor datasets. Significance levels: * 𝑝 < 0.05, ** 𝑝 < 0.01, *** 𝑝 < 0.001, ns 𝑝 ≥ 0.05.
                         Dataset      Number of acts              𝑈
                         GerDraCor                  2      40,942.50   ns
                         GerDraCor                  3     108,192.00   ***
                         GerDraCor                  4      47,753.00   ***
                         GerDraCor                  5     269,931.50   ***
                         GerDraCor                2–5     466,819.00   ***


comparison of the Einakter Database against GerDraCor’s 2–5-act plays individually and in the
aggregate (Table 3).
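The bias-corrected and accelerated (BCa) bootstrap for a mean can be sketched with the Python standard library alone. This is a minimal illustration of Efron’s BCa method, not the authors’ code. The function names and the toy data in the test are hypothetical, and each resample here has the same size as the input sample, as in the textbook bootstrap. SciPy’s `scipy.stats.bootstrap` with `method="BCa"` provides a library implementation.

```python
import random
from statistics import NormalDist, fmean


def _quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bca_mean_ci(data, n_resamples=10_000, confidence=0.95, seed=0):
    """Return (mean, lower, upper): a BCa bootstrap confidence interval for the mean."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    theta_hat = fmean(data)

    # Bootstrap distribution of the mean; each resample has the size of the input sample.
    boot = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))

    # Bias correction z0: share of resampled means below the observed mean
    # (ties count half), clamped so that inv_cdf never receives 0 or 1.
    below = sum(b < theta_hat for b in boot) + 0.5 * sum(b == theta_hat for b in boot)
    share = min(max(below / n_resamples, 1 / (n_resamples + 1)), n_resamples / (n_resamples + 1))
    std_normal = NormalDist()
    z0 = std_normal.inv_cdf(share)

    # Acceleration a from the jackknife (leave-one-out) means.
    total = sum(data)
    jack = [(total - x) / (n - 1) for x in data]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    # Adjusted percentile levels, read off the sorted bootstrap distribution.
    alpha = 1 - confidence
    bounds = []
    for z_alpha in (std_normal.inv_cdf(alpha / 2), std_normal.inv_cdf(1 - alpha / 2)):
        level = std_normal.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))
        bounds.append(_quantile(boot, level))
    return theta_hat, bounds[0], bounds[1]
```

`bca_mean_ci(values)` returns the sample mean together with the lower and upper 95% bounds. Applied to the scenes-per-act values of one group of plays, it yields an interval of the kind shown in one row of Table 2.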
   For the atypical two-act and four-act plays, sample sizes are small and confidence intervals
consequently wide. For two-act plays in particular, the confidence interval for the mean number
of scenes per act is wide (CI = [11.23, 15.71]) and overlaps with that of one-act plays, and the
Mann-Whitney 𝑈 test is not statistically significant (𝑝 = 0.45). We therefore detect no
significant difference in the number of scenes per act between one-act and two-act plays.
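The computation of 𝑈 can be sketched as follows: pool both samples, assign midranks to tied values, and subtract the smallest possible rank sum from the first sample’s rank sum. In this minimal sketch the p-value uses the tie-corrected normal approximation with a continuity correction. The paper does not state which p-value method was used, and SciPy’s `scipy.stats.mannwhitneyu` offers exact as well as asymptotic variants. The function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) for sample x against sample y.

    U refers to the first sample; the p-value uses the normal approximation
    with tie correction and a 0.5 continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of t tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(z))
    return u1, min(p, 1.0)
```

Swapping the two samples turns 𝑈 into 𝑛₁𝑛₂ − 𝑈, so the orientation of a comparison should be reported together with the statistic.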
   All other comparisons indicate a highly significant (𝑝 < 0.001) difference between one-act
and multi-act plays. Each of these groups exhibits a lower number of scenes per act than one-act
plays, though their ranking relative to one another cannot be fully determined. As the
distributions of the number of scenes per act also show (Figure 4), one-act plays exhibit a much
stronger genre signal for this variable than other German-language plays of the period.
   This observation captures the distinctive dramaturgy of one-act plays in the 18th and early
19th centuries. It suggests, for example, fast-paced, even hectic action on stage: every change
of scene entails a change of configuration, i.e. a change of the characters on stage, which
means that dialogue partners in a one-act play change more frequently than in an average act of
a five-act play.




Figure 4: Distribution of number of scenes per act in the Einakter Database and GerDraCor. Bar
heights sum to 100% per dataset.


4.3. Number of characters
  Q3 Is there a difference in the number of characters between German one-act plays and other
     German plays of the same period?

   The reduced number of characters is often referenced in attempts to define the one-act play.
Some lexicons even claim that the one-act play “has only two or three characters” [6]. The
significance of the number of characters for describing this genre was already observed by
Pazarkaya, who began manually counting the number of characters in about 200 one-act plays
in 1973. We have replicated this simple count using our corpus, which, with 2,586 one-act plays,
is an order of magnitude larger than Pazarkaya’s (the database contains around 150 one-act plays
for which the number of characters is unknown, as no full-text version or cast list is
available).
   As for Q2, we conducted a Shapiro-Wilk test for normality on the number of characters in both
the Einakter and GerDraCor datasets, which again yielded 𝑝 < 0.001 for both. Using bootstrapping
with the same parameters as for Q2, we determined 95% confidence intervals for the mean number
of characters (Table 4). Similarly, we calculated pairwise Mann-Whitney 𝑈 values for the
Einakter Database against each group of 𝑛-act plays in our filtered GerDraCor dataset (Table 5).
   We find that there is a significant difference in the number of characters between one-act
plays and each group of multi-act plays. Though confidence intervals overlap among multi-
act plays, we can infer that the mean number of characters in multi-act plays is at least twice
that of one-act plays. A comparison of the number of characters in one-act and five-act plays




Table 4
Mean number of characters per play across Einakter Database and GerDraCor datasets with boot-
strapped confidence intervals.
     Dataset             Number of acts      n   Mean number of characters           95% CI
     Einakter Database                1   2416                             7.06    6.94    7.19
     GerDraCor                        2     34                            19.47   15.35   24.59
     GerDraCor                        3     71                            19.46   16.52   24.00
     GerDraCor                        4     28                            21.43   17.36   33.46
     GerDraCor                        5    140                            28.49   24.34   35.25


Table 5
Mann-Whitney U test results comparing the number of characters between the Einakter Database and
GerDraCor datasets. Significance levels: * 𝑝 < 0.05, ** 𝑝 < 0.01, *** 𝑝 < 0.001, ns 𝑝 ≥ 0.05.
                         Dataset      Number of acts           𝑈
                         GerDraCor                  2   15,040.50   ***
                         GerDraCor                  3   18,586.50   ***
                         GerDraCor                  4    3,343.50   ***
                         GerDraCor                  5   28,585.50   ***


reveals a poetic economy [1]: one-act plays have a measurably more compact structure. The
reduced number of characters is often related to their shorter and less complex plot [3]. For
example, the one-act plays in the period under investigation rarely contain changes of setting
and, even more relevant in the context of the number of characters, have fewer subplots than
the larger five-act play.
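The “at least twice” inference can be checked against Table 4 using the most conservative pairing: the smallest lower bound among the multi-act groups (two-act plays) divided by the upper bound for one-act plays.

```latex
\frac{\min(\text{lower CI bounds, 2--5 acts})}{\text{upper CI bound, 1 act}}
  = \frac{15.35}{7.19} \approx 2.13 > 2
```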
   The observation made in subsection 4.1 is also reflected in the number of characters: While
the one-act play is often able to develop a plot with only two, three or four characters due
to the typical comedy plot (in which, for example, a marriage with obstacles is at issue), the
five-act play of the time seems to present plots in which there are never fewer than three and
only rarely fewer than four characters (Figure 5).
   With findings of this type, we can, for example, take a canonical one-act play such as
Goethe’s Der Bürgergeneral from 1793 and compare some of its structural metadata with hard
numbers “in the context of everything else”, to quote Matthew Jockers. Goethe’s play was both
negatively received by his contemporaries and largely disregarded by scholars [27]. The play,
which was based on the characters of another popular one-act play, was written to be
practically implementable on stage. Structurally, it lies almost exactly at the corpus means
(14.20 scenes per act and 7.06 characters; see Tables 2 and 4): 14 scenes, 7 characters.

4.4. Translations
  Q4 What is the proportion of original languages among translated plays?

  We deal with the question of translations of foreign-language plays into German from the
point of view of influence. A simple quantitative presentation of the data clearly shows that




Figure 5: Distribution of number of characters in the Einakter Database and GerDraCor. Bar heights
sum to 100% per dataset.


German-language one-act plays were mainly influenced by French theatre, whose one-act plays
were translated en masse [3]. Numerous attempts have been made to quantify and visualise the
influence of foreign-language literature on German literature. Flaischlen’s Graphische
Litteratur-Tafel (Figure 6) from 1890, a giant graph (58 × 86.5 cm) visualising German literature
since its early beginnings as a stream that absorbs various other currents, is one of many such
positivist attempts. It shows the influence of foreign literature (epochs, movements, authors,
works) on German literature [12, 14].
   The red tributaries to the main stream denote the influences of French literature. One-act
plays are just one of many genres, but one name stands out: that of Eugène Scribe, around the
year 1830. Of all French translations contained in the Einakter Database, 58 go back to originals
(co-)authored by the prolific Scribe, who is the only author of French one-act plays mentioned
in the Litteratur-Tafel and who stands pars pro toto for all the others.
   As early as the 18th century, there were complaints that most authors of one-act plays merely
translated from French into German [3]. But how many one-act plays are demonstrably translations,
and from which languages were they translated? As in subsection 4.1, we aggregated all entries
with a known original by the original’s language and calculated binomial proportion confidence
intervals for each language using the Clopper-Pearson method (Table 6). The overwhelming
majority of originals are indeed in French (CI = [89.6%, 93.9%]). The confidence intervals for
all other languages are too wide to establish a reliable ranking among them.
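The aggregation step described here (counting translated plays by the language of their original and converting the counts into percentages) can be sketched in a few lines of Python. The function name and the input format are hypothetical: the sketch assumes one original-language label per translated play, with `None` or an empty string for unknown originals. Each resulting count can then be passed to a Clopper-Pearson routine to obtain intervals like those in Table 6.

```python
from collections import Counter


def language_shares(original_languages):
    """Count translated plays by original language.

    Returns (language, n, percent) rows sorted by frequency; entries without a
    known original (None or empty string) are skipped before computing shares.
    """
    counts = Counter(lang for lang in original_languages if lang)
    total = sum(counts.values())
    return [(lang, n, round(100 * n / total, 2)) for lang, n in counts.most_common()]
```

With the counts from Table 6 (681 plays with a known original) as input, the first row is French with 626 plays and 91.92%, which matches the table.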




Figure 6: Detail from Cäsar Flaischlen’s Graphische Litteratur-Tafel (1890) for the period 1740 to 1850.


4.5. Comparison with Pazarkaya’s 1973 sample
  Q5 Does an estimation of the mean number of characters based on Pazarkaya’s sample match
     the estimation based on the Einakter Database’s sample?

   Some researchers argue that literary history may still contain undiscovered material, as only
a small proportion of historical texts have been analysed to date. Others believe that, according
to specific studies, certain tendencies in literature can be observed regardless of the size of the
corpus and that the call to consult more non-canonical literature and larger corpora may be
superfluous [23]. For a statistical analysis, however, it is sometimes necessary to use a larger
corpus, as we would like to outline briefly here. We compare the analysis of the number of




Table 6
Proportion of original languages of translated plays with binomial confidence intervals.
                             Language        n    Percent      95% CI
                             French         626     91.92   89.60   93.90
                             Italian         16      2.35    1.35    3.79
                             Danish          15      2.20    1.24    3.61
                             English          9      1.32    0.61    2.49
                             Spanish          6      0.88    0.32    1.91
                             Dutch            3      0.44    0.09    1.28
                             Latin            3      0.44    0.09    1.28
                             Russian          2      0.29    0.04    1.06
                             Czech            1      0.15    0.00    0.81


characters in Pazarkaya’s corpus with that in the Einakter Database.
   Applying the Mann-Whitney U test to compare the Einakter Database with Pazarkaya’s sample, we obtained U = 313403.5, p < 0.001, indicating a statistically highly significant difference between the two samples. Although we know that Pazarkaya’s sample is drawn from the same population of plays, the test finds strong evidence against the null hypothesis that the two samples have the same distribution. The mean numbers of characters, together with their confidence intervals and distributions (Table 7 and Figure 7), show no overlap between the confidence intervals of the two means. If both datasets indeed originate from the same population, this suggests that at least one of them is not a true random sample.
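The Mann-Whitney U test compares two independent samples by their ranks and does not assume normality, which makes it a common choice for count data such as numbers of characters. A minimal sketch with SciPy's `mannwhitneyu` follows, assuming a two-sided alternative; the text does not state which alternative was used or which sample was passed first. The helper `compare_samples` and the toy numbers are hypothetical, not the paper's code or data. The helper also returns U/(n1·n2), an effect size added here for illustration only (it is not reported in the paper).

```python
from scipy.stats import mannwhitneyu


def compare_samples(a, b):
    """Two-sided Mann-Whitney U test of sample a against sample b.

    Returns (U, p, prob_superiority). U is SciPy's statistic for the first
    sample; prob_superiority = U / (len(a) * len(b)) estimates
    P(A > B) + 0.5 * P(A = B) for one random draw from each sample.
    """
    res = mannwhitneyu(a, b, alternative="two-sided")
    u = float(res.statistic)
    return u, float(res.pvalue), u / (len(a) * len(b))


# Hypothetical toy data (characters per play), NOT the paper's data.
sample_a = [4, 5, 6, 7, 8, 9]
sample_b = [2, 3, 4, 5]
u, p, ps = compare_samples(sample_a, sample_b)
print(f"U = {u}, p = {p:.3f}, U/(n1*n2) = {ps:.3f}")
```

Swapping the arguments yields n1·n2 − U, so a reported U is only interpretable once the order of the samples is known.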

Table 7
Mean number of characters per play in the Einakter Database and Pazarkaya’s sample, with bootstrapped 95% confidence intervals.
Dataset                 n    Mean number of characters   95% CI
Einakter Database    2416                         7.06   6.94–7.19
Pazarkaya             221                         6.12   5.85–6.41
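The intervals in Table 7 were obtained by bootstrapping. The reference list cites Efron's work on the bootstrap and on improved bootstrap intervals [8, 9], and `scipy.stats.bootstrap` defaults to the bias-corrected and accelerated (BCa) interval, so the sketch below assumes that method; the paper does not spell out its exact settings. It also shows the overlap check discussed above, applied to the bounds in Table 7. The helper names and the toy sample are hypothetical. A seed can be passed for reproducible resampling, but the keyword differs between SciPy versions, so it is omitted here.

```python
import numpy as np
from scipy.stats import bootstrap


def bootstrap_mean_ci(values, confidence=0.95, n_resamples=9999):
    """Return (mean, lower, upper), with a BCa bootstrap interval for the mean."""
    data = np.asarray(values, dtype=float)
    res = bootstrap((data,), np.mean, confidence_level=confidence,
                    n_resamples=n_resamples, method="BCa")
    ci = res.confidence_interval
    return float(data.mean()), float(ci.low), float(ci.high)


def intervals_overlap(ci_a, ci_b):
    """True if the closed intervals (low, high) share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


# Bounds as reported in Table 7.
einakter_ci = (6.94, 7.19)
pazarkaya_ci = (5.85, 6.41)
print(intervals_overlap(einakter_ci, pazarkaya_ci))  # False

# Hypothetical toy sample of characters per play, NOT the paper's data.
toy = [3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 9, 10, 12]
mean, low, high = bootstrap_mean_ci(toy)
print(f"mean = {mean:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Non-overlapping 95% intervals imply a significant difference in means, but the converse does not hold: overlapping intervals alone do not establish that two means are equal.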

   There is some indication that Pazarkaya’s dataset is closer to a convenience sample, consisting of the one-act plays that were available to him at the time. Pazarkaya’s work, now 50 years old, was shaped by the circumstances of its time: without digital tools and databases, his options were more limited, and his corpus was considerably smaller, comprising about 200–300 one-act plays. These often included particularly canonical and well-preserved works that were accessible in libraries. A similar criticism could be levelled against the Einakter Database; nonetheless, thanks to the many publicly accessible digitised texts, it allows us to consider numerous first editions, including many preserved only as single copies or manuscripts.

Figure 7: Distribution of the number of characters per play in the Einakter Database and Pazarkaya’s sample. Solid lines represent the means, dashed lines the 95% confidence intervals of the means.


5. Conclusion
In this paper, we demonstrate how statistical results based on the analysis of a database of German-language one-act plays can be used for literary historiography. The aim was not primarily to find unexpected results, but to establish empirically grounded metrics for a particular genre. While it is evident that one-act plays are “short”, the question is how this shortness can be measured. The difficulty arises because brevity is a relative concept: something is only considered short in comparison to something else, and this relational aspect poses a significant challenge in the study of one-act plays and other short literary forms. Addressing this issue is crucial for advancing our understanding of the genre’s formal characteristics.
   A subtitle analysis has drawn attention to the development of the individual subgenres of the one-act play between 1740 and 1850 (Q1). We also show that the genre signal is strongest for the mean number of characters, where one-act plays differ significantly from their German multi-act counterparts of the period. Similarly, a significant difference in the number of characters is found between one-act plays and plays with three, four and five acts (Q2 and Q3). Our empirical data confirm, by a wide margin, the dominance of French plays as models for German translations, but also show the distribution of the other original languages from which translations were produced (Q4). Finally, we showed that the data from an older analysis, based on a much smaller sample of one-act plays, are likely not representative of the wider data, and that the accuracy of statistical findings can be increased under the conditions of the current digital landscape (Q5).
   In conclusion, this paper’s metadata-driven approach demonstrates the potential of digital methods to provide a statistically grounded contribution to genre studies beyond the limitations of traditional literary scholarship.

Reproducibility
To enhance transparency, the code and data for this paper are available on GitHub at https://github.com/v-ji/einakter-chr2024-stats. We used the Nix package manager (nixos.org) to ensure the reproducibility of our computational environment. Nix locks the exact versions of all dependencies, including Python, the libraries needed to execute the Jupyter notebook, and the Einakter Database dataset. With Nix, we can precisely recreate the Python environment, fetch the dataset from a stable URL, and verify its integrity against a hash to ensure that it matches the version used in this contribution.
   The main libraries used are Python Polars 1.7.1 [24] for data manipulation, SciPy 1.14.0 [25]
for statistical analysis, as well as Matplotlib 3.9.1 [17] and seaborn 0.13.2 [26] for plotting.
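Nix performs the integrity check itself when it fetches a pinned source. To illustrate the underlying idea outside Nix, here is a minimal Python sketch that hashes a downloaded file and compares it with a pinned SHA-256 digest. The function names, the file path and the hexadecimal digest format are assumptions for illustration; Nix may record hashes in a different encoding, and the actual pinned value is kept in the repository's Nix files, which are not reproduced here.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 16):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_dataset(path, expected_sha256):
    """Return True if the file matches the pinned digest, else raise ValueError."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"hash mismatch for {path}: expected {expected_sha256}, got {actual}")
    return True


# Usage (hypothetical file name and digest):
# verify_dataset("einakter.csv", "<pinned sha256 hex digest>")
```

Reading in chunks keeps memory use constant regardless of the size of the dataset file.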


Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy in the context of the Cluster of Excellence Temporal Communities: Doing Literature in a Global Perspective – EXC 2020 – Project ID 390608380.


References
[1] M. Bauer. “Poetic Economy: ellipsis and redundancy in literature”. In: Connotations. A Journal for Critical Debate 21.2-3 (2011). url: https://www.connotations.de/article/matthias-bauer-poetic-economy-ellipsis-and-redundancy-in-literature/.
[2] M. Bies, M. Gamper, and I. Kleeberg. “Einleitung”. In: Gattungs-Wissen: Wissenspoetologie und literarische Form. Göttingen: Wallstein, 2013, pp. 7–18.
[3] D. C. Çakir. Poetische Ökonomie im Drama: Einakter im 18. und frühen 19. Jahrhundert. De Gruyter, 2024. doi: 10.1515/9783111334059. url: https://www.degruyter.com/document/doi/10.1515/9783111334059/html.
[4] D. C. Çakir and F. Fischer. “Dramatische Metadaten: Die Datenbank deutschsprachiger Einakter 1740–1850”. In: DHd2022: »Kulturen des digitalen Gedächtnisses« (2022). doi: 10.5281/zenodo.6327977.
[5] C. J. Clopper and E. S. Pearson. “The use of confidence or fiducial limits illustrated in the case of the binomial”. In: Biometrika 26.4 (1934), pp. 404–413. doi: 10.2307/2331986. url: https://www.jstor.org/stable/2331986.
[6] J. A. Cuddon. A dictionary of literary terms. Penguin reference books. London: Penguin Books, 1979.
[7] Drama Corpora Project. DraCor FAQ. 2024. url: https://dracor.org/doc/faq.
[8] B. Efron. “Bootstrap methods: Another look at the jackknife”. In: The Annals of Statistics 7.1 (1979), pp. 1–26. doi: 10.1214/aos/1176344552. url: https://projecteuclid.org/journals/annals-of-statistics/volume-7/issue-1/Bootstrap-Methods-Another-Look-at-the-Jackknife/10.1214/aos/1176344552.full.
[9] B. Efron. “Better bootstrap confidence intervals”. In: Journal of the American Statistical Association 82.397 (1987), pp. 171–185. doi: 10.1080/01621459.1987.10478410. url: https://www.tandfonline.com/doi/abs/10.1080/01621459.1987.10478410.
[10] M. Erlin. “From literature to metadata”. In: Goethe Yearbook 27.1 (2020), pp. 189–196. doi: 10.1353/gyr.2020.0005. url: https://muse.jhu.edu/article/762245.
[11] F. Fischer, I. Börner, M. Göbel, A. Hechtl, C. Kittel, C. Milling, and P. Trilcke. “Programmable Corpora: Introducing DraCor, an infrastructure for the research on European drama”. In: DH2019: »Complexities« (2019). doi: 10.5281/zenodo.4284002. url: https://zenodo.org/record/4284002.
[12] C. Flaischlen. Graphische Litteratur-Tafel: Die deutsche Litteratur und der Einfluß fremder Litteraturen auf ihren Verlauf von Beginn einer schriftlichen Überlieferung an bis heute in graphischer Darstellung. Stuttgart: Göschen, 1890. url: http://resolver.sub.uni-goettingen.de/purl?PPN860488233.
[13] H. Fricke. “Invarianz und Variabilität von Gattung”. In: Handbuch Gattungstheorie. Ed. by R. Zymner. Stuttgart: J.B. Metzler, 2010, pp. 19–21. doi: 10.1007/978-3-476-00509-0_2.
[14] A. Hechtl, I. Börner, F. Fischer, and P. Trilcke. “Cäsar Flaischlen’s ›Graphische Litteratur-Tafel‹: digitising a giant historical flowchart of foreign influences on German literature”. In: DH2017: »Access/Accès« (2017). url: https://dh2017.adho.org/abstracts/506/506.pdf.
[15] M. L. Jockers. Macroanalysis: digital methods and literary history. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2013.
[16] H. B. Mann and D. R. Whitney. “On a test of whether one of two random variables is stochastically larger than the other”. In: The Annals of Mathematical Statistics 18.1 (1947), pp. 50–60. url: https://www.jstor.org/stable/2236101.
[17] Matplotlib Development Team. Matplotlib: visualization with Python. 2024. doi: 10.5281/zenodo.10916799. url: https://zenodo.org/doi/10.5281/zenodo.10916799.
[18] R. Meyer. Das deutsche Trauerspiel des 18. Jahrhunderts: eine Bibliographie. mit ca. 1250 Titeln, einer Einleitung sowie Verfasser- und Stichwortregister. München: Fink, 1977.
[19] F. Moretti. “Conjectures on world literature”. In: New Left Review 1 (2000), pp. 54–68. url: https://newleftreview.org/issues/ii1/articles/franco-moretti-conjectures-on-world-literature.
[20] Y. Pazarkaya. Die Dramaturgie des Einakters: der Einakter als eine besondere Erscheinungsform im deutschen Drama des 18. Jahrhunderts. Göppinger Arbeiten zur Germanistik; Nr. 69. Göppingen: Kümmerle, 1973.
[21] C. Schöch, R. Patras, T. Erjavec, and D. Santos. “Creating the European Literary Text Collection (ELTeC): Challenges and perspectives”. In: Modern Languages Open 1 (2021), p. 25. doi: 10.3828/mlo.v0i0.364. url: http://www.modernlanguagesopen.org/articles/10.3828/mlo.v0i0.364/.
[22] S. S. Shapiro and M. B. Wilk. “An analysis of variance test for normality (complete samples)”. In: Biometrika 52.3/4 (1965), pp. 591–611. doi: 10.2307/2333709. url: https://www.jstor.org/stable/2333709.
[23] T. Underwood. Distant horizons: digital evidence and literary change. Chicago: The University of Chicago Press, 2019.
[24] R. Vink, S. d. Gooijer, A. Beedie, M. E. Gorelli, W. Guo, O. Peters, J. v. Zundert, nameexhaustion, G. Hulselmans, C. Grinstead, G. Burghoorn, Marshall, chielP, L. Mitchell, I. Turner-Trauring, M. Santamaria, D. Heres, J. Magarick, H. Harbeck, ibENPC, K. Genockey, M. Wilksch, deanm0000, J. Leitao, M. v. Gelderen, P. Barbagiannis, I. Koutsouris, J. Haag, and O. Borchert. Python Polars 1.7.1. 2024. doi: 10.5281/zenodo.13754515. url: https://zenodo.org/doi/10.5281/zenodo.13754515.
[25] P. Virtanen et al. “SciPy 1.0: fundamental algorithms for scientific computing in Python”. In: Nature Methods 17.3 (2020), pp. 261–272. doi: 10.1038/s41592-019-0686-2. url: https://www.nature.com/articles/s41592-019-0686-2.
[26] M. Waskom. “seaborn: statistical data visualization”. In: Journal of Open Source Software 6.60 (2021), p. 3021. doi: 10.21105/joss.03021. url: https://joss.theoj.org/papers/10.21105/joss.03021.
[27] W. D. Wilson. “Dramen zum Thema der Französischen Revolution”. In: Goethe Handbuch. Ed. by T. Buck. Stuttgart: J.B. Metzler, 1996, pp. 258–287. doi: 10.1007/978-3-476-03653-7_14. url: http://link.springer.com/10.1007/978-3-476-03653-7_14.
[28] T. Ziolkowski. “Review of Der moderne Einakter. Eine poetologische Untersuchung”. In: The Journal of English and Germanic Philology 67.3 (1968), pp. 505–507. url: https://www.jstor.org/stable/27705581.