<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analyses of Literary Texts by Using Statistical Inference Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehmet Can Yavuz</string-name>
          <email>mehmetyavuz@sabanciuniv.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science and Engineering Department, Sabancı University, Tuzla; Management Information Systems Department, Kadir Has University, Cibali; Physics Department, Boğaziçi University</institution>
          ,
          <addr-line>Bebek</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>If a road map were to be drawn for Computational Criticism and the Artificial Literature that follows from it, it would certainly have to pass through Shakespeare's plays. Demonstrating the structures of these plays through text analysis can be seen as both a naive effort and a scientific view of the characteristics of the texts. In this study, a textual analysis of Shakespeare's plays is carried out for this purpose. Methodologically, we apply Latent Dirichlet Allocation (LDA) and then Singular Value Decomposition (SVD), first to extract topics and then to reduce the topic distribution over documents to a two-dimensional space. The first question is whether there is a genre called Romance between the Comedies and the Tragedies. The second question is whether, if each character's speech is taken as a text, the dramatic relationships between characters can be revealed. We find relationships between genres that are also supported by literary theory, and the main characters follow the antagonisms within the play more closely as the length of their speech increases. Although the classification of the side characters is not always what one would expect from a reading of the plays, there are observations on dramatic fiction that are also supported by literary theory: tragedies and revenge dramas have different character groupings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        If a road map were to be drawn for Computational Criticism
        <xref ref-type="bibr" rid="ref18">(Moretti, 2013)</xref>
        and the Artificial Literature that follows from it,
it would certainly have to pass through
Elizabethan drama. In particular, Shakespearean
texts are among the most outstanding examples of
dramatic fiction. Demonstrating the structures of these texts
through text analysis can be seen as both a naive
effort and a scientific view of the characteristics
of the texts. In this study, a textual analysis of
Shakespeare's plays is carried out for this
purpose.
      </p>
      <p>To begin with, the First Folio is the printed
volume in which Shakespeare's plays were
brought together for the first time (Snyder, 2001).
The edition of 1623 was prepared by two actors
from the King's Men, the company of which
Shakespeare was also a member. Half of the
36 plays in the collection had never
been published before. Plays of the period were
also printed in quarto form; folio and quarto take
their names from the way the printed sheets were folded.
About 800 copies of the First Folio are known to have been printed,
233 of which survive today. In the First
Folio, Shakespeare's plays are divided into
three groups: Comedies, Tragedies, and Histories.
Romance is the genre that hybridizes Comedy and
Tragedy; it developed at the beginning of the 17th
century. At the end of his career, Shakespeare wrote four
romances: Pericles, Cymbeline, The Winter's Tale
and The Tempest. The First Folio groups
Cymbeline with the Tragedies, and The Winter's Tale and
The Tempest with the Comedies. The
reason may be that The Winter's Tale and
The Tempest begin as tragedies and then turn
into comedies, whereas Cymbeline starts as a comedy
and ends as a tragedy.</p>
      <p>Two of Shakespeare's tragedies, Macbeth and Othello,
are very good examples of a true tragedy and
a revenge tragedy, respectively. Tragedies are built on the
struggle between the main character and the
opposing characters who create obstacles for the main
character. The protagonist is generally the main
character with whom the audience sympathizes.
Although he is not sympathetic, Macbeth is the protagonist,
and the opposing characters, Duncan and Banquo,
are the antagonists. Antagonism is also
present in revenge drama, where the main theme is
revenge: the antagonist or protagonist seeks
revenge for an imaginary or real injury. Iago, the
antagonist, takes his revenge by provoking Othello, the
protagonist, against his wife.</p>
      <p>
        Computerized analysis of literary texts, in other
words computational criticism, is a new and
promising field
        <xref ref-type="bibr" rid="ref20">(Ramsay, 2011)</xref>
        . Pioneering
works aim to answer critical questions by using
Natural Language Processing (NLP) methods. Alongside
these studies, the emerging field of artificial literature
is interested in creating fictional texts with the help
of a computer. In this study, we carry out
a computational analysis of Shakespearean texts.
There are two basic questions we try to
answer. The first is whether the genres of Shakespeare's
plays can be classified by computer.
The second is whether antagonisms can be
revealed if the lines spoken by each character
are taken as a text. We address both questions with the
same unsupervised learning technique.
      </p>
      <p>In recent years, NLP methods have been
developing rapidly and text analysis methods have
become more advanced. Topic modeling articles are
among the most cited. An unsupervised
topic modeling algorithm is used in this study.
It infers latent topics such that each
document is a mixture of them. Given the latent topic
distribution, a dimension reduction
algorithm maps each document onto
two-dimensional coordinates while preserving as much of its intrinsic
characteristics as possible.</p>
      <p><bold>Related Works</bold></p>
      <p>
        The Digital Humanities field lets researchers discuss
quantitative methods in literary and cultural
studies
        <xref ref-type="bibr" rid="ref5 ref6">(Clement et al., 2008; Crane, 2006)</xref>
        .
"Drametrics" is the field that deals with the quantitative analysis
of the literary genre of drama
        <xref ref-type="bibr" rid="ref21">(Romanska, 2015)</xref>
        .
Digital Shakespeare studies have also received
attention since the 2000s
        <xref ref-type="bibr" rid="ref17">(Hirsch, 2017; Mueller,
2008)</xref>
        . These studies cover issues ranging from digital
archives to authorship analysis
        <xref ref-type="bibr" rid="ref27 ref8">(Vickers, 2011;
Evert, 2017)</xref>
        . In addition, machine-learning-based
text analyses have been carried out for genre
classification
        <xref ref-type="bibr" rid="ref10 ref26 ref30">(Ardanuy, 2014; Hope, 2010; Schoch,
2016; Underwood, 2013; Yu, 2008)</xref>
        .
Information-theoretic approaches have also been applied
successfully
        <xref ref-type="bibr" rid="ref19">(Rosso, 2009)</xref>
        . In the literature, structural
elements such as the dramatis
personae and scene structures have been quantified, and applications
have been developed to support further analysis
        <xref ref-type="bibr" rid="ref14 ref23 ref25 ref29 ref31 ref7">(Dennerlein, 2015; Krautter, 2018; Schmidt, 2019;
Trilcke, 2015; Wilhelm, 2013; Xanthos, 2016)</xref>
        .
To analyze literary texts, we
use unsupervised topic modeling. Although
there are linear-algebraic models such as
Non-Negative Matrix Factorization
        <xref ref-type="bibr" rid="ref15">(Lee, 1999)</xref>
        ,
probabilistic models are preferred here because they represent
each document as an explicit probability distribution over topics.
Probabilistic Latent Semantic Analysis (Hofmann, 1999)
and Latent Dirichlet Allocation
        <xref ref-type="bibr" rid="ref2">(Blei, 2003)</xref>
        are
the two major unsupervised probabilistic topic modeling
algorithms. Although both allow us to group texts
according to their topic distributions, Latent Dirichlet
Allocation is a fully generative model of documents
and is the one used in this study. Principal
Component Analysis
        <xref ref-type="bibr" rid="ref13">(Jolliffe, 2002)</xref>
        , Linear Discriminant
Analysis
        <xref ref-type="bibr" rid="ref3">(Brown, 2000)</xref>
        and Non-Negative Matrix
Factorization (NMF) are all dimension
reduction techniques, alongside Singular Value
Decomposition
        <xref ref-type="bibr" rid="ref12">(Golub, 1970)</xref>
        . The last algorithm
we use is K-Means, a well-known
clustering algorithm that minimizes the
variance within clusters (Lloyd, 1982).
      </p>
    </sec>
    <sec id="sec-2">
      <title>Theory</title>
      <p>In this study, we use text analysis to
investigate genres and antagonisms in Shakespearean
plays. Latent Dirichlet Allocation
(LDA) generates the distribution of each document over
topics. First, the optimum number of topics for LDA
is obtained by grid search;
then a dimension reduction algorithm, truncated
Singular Value Decomposition (tSVD), maps
the documents onto a two-dimensional plane, where they are
plotted.</p>
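      <p>The steps above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes the scikit-learn library, and the four-line toy corpus, the number of topics and the random seeds are hypothetical stand-ins for the plays and the tuned settings.</p>
      <preformat>
```python
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in corpus (hypothetical); the study uses whole plays or character speeches.
docs = [
    "love marriage jest fool love wedding",
    "blood murder crown dagger blood king",
    "jest fool wedding love song",
    "king crown murder revenge blood",
]

# Bag-of-words counts: the document-term matrix.
doc_term = CountVectorizer().fit_transform(docs)

# LDA turns word counts into a document-topic distribution (each row sums to 1).
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(doc_term)

# Truncated SVD maps each document-topic row to two coordinates for plotting.
svd = TruncatedSVD(n_components=2, random_state=0)
coords = svd.fit_transform(doc_topic)

print(coords.shape)  # (4, 2): one (x, y) pair per document
```
      </preformat>
      <p>Each row of coords is one point in the two-dimensional plots; with real data the number of topics would come from the grid search rather than being fixed at three.</p>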
      <p>In the following sections, topic generation with the
LDA algorithm and dimension reduction with the tSVD
algorithm are explained. The aim of using tSVD
is to express each text with two floating-point
numbers while preserving the latent topic
properties. Classification can then be based
on the distances between texts in the new
two-dimensional feature space. In the last step, we
cluster the texts using Euclidean distance. The theoretical
section is kept brief and explanatory because
the main focus is on the experimental results.
</p>
      <p>If the data have a large number of features, they can be reduced
to a smaller set of features that are the most relevant
to the prediction problem. SVD factorizes any matrix A
into a product of three matrices,
        <disp-formula><tex-math>A = U S V^{\prime}, \qquad U^{\prime} U = I, \quad V^{\prime} V = I,</tex-math></disp-formula>
where the prime denotes the transpose and S is a diagonal matrix that holds the r
singular values, r being the rank of A. Truncated SVD is
a reduced-rank approximation: all singular
values except the k largest are set to zero,
so that only the first k columns
of U and V are retained. The dimensions of the truncated factors
are [u × k] · [k × k] · [k × v]. Since A is
approximated with k dimensions, the factorization amounts to a dimension
reduction. A
descriptive subset of the data, called T, is a
dense summary of the matrix A:
        <disp-formula><tex-math>T = U_k S_k</tex-math></disp-formula>
      </p>
      <p>
LDA is a generative statistical model that explains
why certain parts of the data are similar on the basis of
a set of observations. LDA assumes that the
observations are generated by latent variables, the latent
topics. Each document is a mixture of
topics, each topic is a distribution over words, and
each word is drawn from that mixture. The
observations are the word frequency statistics of each document,
the so-called document-term matrix. This
is the bag-of-words approach, in which a document
is represented by how often each word occurs in it.
Topics are identified on the basis of term
co-occurrence (the topic-term matrix), and each
document is assumed to be characterized by a
particular set of topics (the document-topic matrix).
Topics, mixtures and other variables are all hidden and
need to be inferred from the observed data, the
document-term matrix. Figure 1 shows the plate notation
of LDA. In the plate notation, there
are N × D variables that represent the
observations, where N is the number of words in a document. There are K topics and D
documents in total.</p>
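      <p>As an illustration of the document-term matrix, the following sketch builds one from three toy sentences. It assumes scikit-learn's CountVectorizer; the sentences and the five-word stop-word list are hypothetical and are not the data or the stop-word list of the study.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import CountVectorizer

# Three toy "documents" (hypothetical).
docs = ["thou art the king", "the king is dead", "thou art a fool"]

# Hypothetical stop-word list mixing modern and Elizabethan function words.
stop_words = ["the", "is", "a", "thou", "art"]

vectorizer = CountVectorizer(stop_words=stop_words)
doc_term = vectorizer.fit_transform(docs)

# Columns follow the alphabetically sorted vocabulary.
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)          # ['dead', 'fool', 'king']
print(doc_term.toarray())  # rows: [0 0 1], [1 0 1], [0 1 0]
```
      </preformat>
      <p>Each row is a fixed-length vector of word counts for one document; word order is discarded, which is what the bag-of-words assumption means in practice.</p>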
      <p>Here, α and η are the parameters of the prior
distributions over θ and β, respectively. θ<sub>d</sub> is the
distribution of topics for document d (a real vector of
length K). β<sub>k</sub> is the distribution of words for topic
k (a real vector of length V). z<sub>d,n</sub> is the topic of
the nth word in the dth document, and w<sub>d,n</sub> is the nth
word of the dth document. Only the gray shaded
circles are observed variables. The
white circles are inferred by Variational
Inference. The topic of each word, the
distribution over topics for each document, and the
distribution over words for each topic are all latent variables in
this model. With this formulation, similarities
between documents can be expressed.</p>
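      <p>The generative story behind these variables can be simulated directly. The sketch below uses NumPy only; the sizes K, V and N and the prior values for α and η are assumed toy values chosen for illustration, not values from the study.</p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 3, 6, 8        # topics, vocabulary size, words in one document (toy sizes)
alpha, eta = 0.5, 0.1    # symmetric Dirichlet prior parameters (assumed values)

# beta[k] is the distribution of words for topic k (length V).
beta = rng.dirichlet(np.full(V, eta), size=K)

# theta_d is the distribution of topics for document d (length K).
theta_d = rng.dirichlet(np.full(K, alpha))

# z_d[n] is the topic of the nth word; w_d[n] is the word drawn from that topic.
z_d = rng.choice(K, size=N, p=theta_d)
w_d = np.array([rng.choice(V, p=beta[k]) for k in z_d])

print(z_d)
print(w_d)
```
      </preformat>
      <p>Inference runs this story in reverse: only the words w_d are observed, and the topic assignments, the topic mixture and the topic-word distributions have to be estimated from them.</p>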
      <p>The model contains both continuous and discrete
variables. θ<sub>d</sub> and β<sub>k</sub> are vectors of probabilities.
z<sub>d,n</sub> is an integer in {1, ..., K} that indicates the
topic of the nth word in the dth document. w<sub>d,n</sub> is
an integer in {1, ..., V} that indexes all
possible words.</p>
      <p>In the truncated SVD, S<sub>k</sub> denotes the k largest singular values, where k is the
number of reduced features. Each reduced feature
accounts for a percentage of the variance, which is why
only the most significant
ones are kept.</p>
      <p>We included two evaluations in our experiments.
The first is whether the genre of Romance
can be distinguished
computationally. To carry out this experiment, each
tragedy, comedy and romance is treated as a
separate document and processed by LDA.
Afterwards, the document-topic distribution matrix
is reduced to two dimensions by means of the
dimension reduction algorithm, tSVD. Similarly,
in the second evaluation, the lines of each
character are treated as a text, and the document-topic
matrix is reduced to two dimensions after processing with
LDA. Two different types of tragedy are
considered: Macbeth and Othello. Thus, three different
experiments and optimizations were conducted for
these two evaluations. The Scikit-learn library in Python was used for the LDA, tSVD and
grid search functions.</p>
      <p>Two preprocessing steps were performed for each set of
documents. First, stop-words were removed
from the dictionary. The stop-word list covers
both modern English and Elizabethan
English and contains 1144 words. The
characteristic of these words is that they appear
frequently in every text. The second step is the
representation of the texts by word frequencies and the
creation of the document-term matrix. Each
text is thereby expressed as a fixed-length vector of dictionary
size, and stacking these vectors
creates the document-term matrix.</p>
      <p>To find a suitable number of topics, we need an
optimization. Since the topics are latent
variables, there is no single correct number of topics.
A grid search over topic numbers is carried
out, and the setting with the highest log-likelihood is taken as
optimal. In all three experiments, the values
between 6 and 12 were each tried three times, and the results are plotted
in Figure 2. For Macbeth, for example, three
runs were conducted for each topic
number. The LDA function called in each
run iterates up to 10 times before
returning its result, so the LDA algorithm
was iterated up to 30 times in total for each
topic number.</p>
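      <p>A grid search of this kind can be sketched with scikit-learn's GridSearchCV. The sketch rests on several assumptions that the text does not confirm: the three tries per topic number are read here as three-fold cross-validation, the score is the approximate log-likelihood returned by the LDA estimator, and the six-line corpus is a hypothetical stand-in for the plays.</p>
      <preformat>
```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Toy stand-in corpus (hypothetical).
docs = [
    "love marriage jest fool wedding",
    "blood murder crown dagger king",
    "jest fool wedding love song",
    "king crown murder revenge blood",
    "storm island magic daughter love",
    "ghost revenge poison sword king",
]
doc_term = CountVectorizer().fit_transform(docs)

# Candidate topic numbers 6..12, as in the text; max_iter=10 caps the LDA iterations.
search = GridSearchCV(
    LatentDirichletAllocation(max_iter=10, random_state=0),
    param_grid={"n_components": list(range(6, 13))},
    cv=3,  # assumption: "tried three times" read as three folds
)
search.fit(doc_term)  # with no explicit scoring, LDA's own log-likelihood score is used

print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```
      </preformat>
      <p>The mean scores per topic number are the kind of values that a plot such as Figure 2 would display; on a corpus this small they carry no meaning beyond showing the mechanics.</p>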
      <p>
        We observe that the log-likelihood increases as the number of topics
decreases. However, we
chose not to try fewer than 6 latent topics because, in
the literature, the number of themes or topics in
Shakespearean plays is generally at least 6
        <xref ref-type="bibr" rid="ref28">(”William
Shakespeare”, 2015)</xref>
        .
      </p>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>Figure 3 shows the documents of the
Tragedy-Comedy-Romance experiment. The
document-topic distribution matrix is reduced to
two dimensions and plotted. More than half of
the variance is explained by these two components.
Even in three dimensions, the clustering does not
change. The plays shown in red are
Comedies, those in blue are Tragedies and those in green are
Romances according to the First Folio.</p>
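      <p>The share of variance explained by two components can be read off a fitted truncated SVD. The sketch below assumes scikit-learn and uses a randomly generated stand-in for the document-topic matrix (30 documents, 8 topics), so the printed ratio says nothing about the plays themselves.</p>
      <preformat>
```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)

# Stand-in document-topic matrix (hypothetical): 30 documents, 8 topics, rows sum to 1.
doc_topic = rng.dirichlet(np.full(8, 0.3), size=30)

svd = TruncatedSVD(n_components=2, random_state=0)
coords = svd.fit_transform(doc_topic)

# Fraction of the total variance captured by the two retained components.
explained = svd.explained_variance_ratio_.sum()
print(coords.shape)
print(round(float(explained), 3))
```
      </preformat>
      <p>Refitting with n_components=3 and comparing the resulting groupings is one way to check whether the clustering is stable when a third dimension is added.</p>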
      <p>
        In the upper left corner, the majority of the
Comedies are clustered, and likewise the Tragedies are clustered in the lower right
corner. Between
these two clusters lie three plays, "All's Well That
Ends Well", "Measure for Measure" and "Troilus
and Cressida", which are known as problem plays.
Some critics also include "Timon of Athens",
which is a neighbor of the other problem plays
        <xref ref-type="bibr" rid="ref24">(Snyder, 2001)</xref>
        . Thus, between the two
clusters there is a gray zone in which the problem plays
are placed. Interestingly, although "All's
Well That Ends Well" and "Measure for Measure"
are grouped with the Comedies in the First Folio, they
lie much closer to the Tragedies. An unexplained result
is that Coriolanus and Othello are also placed in
this gray zone. Another open question in this grouping
is "Romeo and Juliet": a tragedy with
comic elements, it is placed thematically very close to
the Comedies cluster.
      </p>
      <p>Another important observation is that the three
Romances are clustered within the Tragedies.
According to this analysis, the genre of Romance is
not distinguishable from Tragedy.</p>
      <p>The characters of Macbeth
clearly exhibit antagonist/protagonist
relations, as plotted in Figure 4. There are two basic
clusters in the tragedy of Macbeth. The first is the
protagonists, led by Macbeth and Lady Macbeth.
The second is the antagonists, among them the
murdered king and Macduff, who suspects foul play.
In the plot, protagonists are shown in blue and
antagonists in red. Lady Macbeth stands at the
bottom left corner, since she has little
to say except to Macbeth. Macbeth
himself is closer to the red cluster, because as the new king
he interacts with its members. Macduff, who
is suspicious and kills Macbeth in the last scene,
is at the center of the red cluster, as are Lady Macduff
and the murdered King
Duncan. There is also a misclassification: Siward is in the
blue cluster, although Siward and Macbeth
clash and Siward is killed. In addition,
the witches, who act as oracles, are in the cluster opposite
to Macbeth. Other characters cannot be fully
explained because of their small and ambiguous roles.
Apart from these two clusters, there is a
green cluster at the top left. The main character of this cluster
is Banquo, Macbeth's brave and
noble companion, who has no idea about
Macbeth's machinations until he is killed.</p>
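      <p>Cluster assignments of this kind come from running K-Means on the two-dimensional coordinates. The following sketch assumes scikit-learn's KMeans; the nine points are hypothetical coordinates chosen to form three well-separated groups and are not the positions of the characters in Figure 4.</p>
      <preformat>
```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D coordinates of nine "characters" after tSVD.
coords = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.1, 0.2],   # first group
    [2.0, 2.1], [2.1, 1.9], [1.9, 2.0],   # second group
    [0.0, 3.0], [0.2, 3.1], [0.1, 2.9],   # third group
])

# K-Means minimizes the within-cluster variance using Euclidean distance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(coords)
labels = kmeans.labels_

print(labels)                   # one cluster index per point
print(kmeans.cluster_centers_)  # the three centroids
```
      </preformat>
      <p>The label numbers themselves are arbitrary; only the grouping matters, and the distances between centroids give a rough measure of how far apart the clusters lie.</p>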
      <p>The tragedy of Macbeth has a very clear separation
between clusters. The distances between clusters are
also meaningful. The red cluster lies between the green and
blue ones, and the green cluster is closer to the red one
than to Macbeth's evil cluster.</p>
      <p>The characters of Othello are shown in
Figure 5 in accordance with the analysis. We take
Othello as an example of a revenge tragedy.
Unlike the true tragedy Macbeth, Othello does
not show antagonist/protagonist clusters in
Figure 5. Iago is a single character who sets traps
to take revenge on Othello. Throughout the play,
Iago misleads Othello for reasons and purposes
that only he and the reader know. Othello kills his
beloved wife in a fit of jealousy.</p>
      <p>Three differently colored clusters are shown.
The red cluster consists of the main characters of the play.
The blue and green clusters belong to side characters,
and antagonisms are computationally ambiguous.
The main characters of the red cluster at the
bottom right, Othello, Emilia, Iago and Cassio,
speak about almost the same topics because of the
frequency of their dialogue with each other.
Therefore, no conflict between them is visible.
Iago appears in the lower right corner because he
reveals his true intentions only in his monologues.
Othello is therefore a negative example for the
methodology we developed. Characters such as the Duke
of Venice and the Senator appear in the
top left corner and are in fact far outside
the plot. Bianca, in the green cluster, is
likewise outside the plot as Cassio's lover.</p>
      <p>Othello offers interesting observations on
revenge tragedies. In Shakespeare's revenge tragedies,
a lonely character presents himself or herself
falsely, and his or her true intentions remain hidden.
The clear difference from true tragedies thus lies in their
dramatic structure.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The classification of genres shows that the
method provides useful quantitative
information for differentiating genres. The
length of the texts is one of the
reasons for this success. The positioning of plays
between Tragedy and Comedy is much discussed
in literary theory. The Romance genre
hybridizes elements of Tragedy and Comedy. Instead
of mapping the Romances in between, the
algorithm mapped four "problem plays" to a region
between the Tragedies and the Comedies. Another
interesting finding is that Romances cannot be
distinguished from Tragedies: the method
classifies the Romances within the Tragedies. The method thus shows
that some literary debates also have a quantitative
basis. In the light
of theoretical discussions, there may well be
a genre called Romance, but we have not yet been able
to quantify the difference.</p>
      <p>There are also some results from our experiments
on the two tragedies we chose. We
intentionally chose a true tragedy and a revenge play.
Macbeth clearly shows antagonisms, which
is mainly due to the frequency of conversations
within the clusters. For example, Macbeth and
Lady Macbeth are always aware of each other's
true intentions. Dialogues within a cluster
are always compatible with each other, and this is why
the cluster forms. There is a group subjectivity,
which is also verified computationally. The battle at the
end of Macbeth can be clearly observed in the
clusters: the two clusters that eventually clash are formed throughout
the play, which is quantifiable. By contrast,
Iago, who hides his true intentions from everyone,
apparently always agrees with Othello.
Iago never shares his intentions with
anyone in the play; they are revealed only in
monologues. Thus, he does not form a cluster of his own. He
is a lonely character. That is why the algorithm fails
to find an antagonism. From this point of view,
we can say that the method forms clusters of
characters that agree with each other. The dramatic
structure of revenge plays cannot be revealed by
the method we propose; the method is
successful at finding such clusters. We carried out a
similar analysis for Hamlet, another
revenge play. Hamlet falls into
a separate cluster, as a lonely character, together with Lord
Polonius, who is responsible for spying on Hamlet.
Lord Polonius is similar to Iago in
that he hides his true intentions.</p>
      <p>The dramatic fiction in Shakespeare's texts is
revealed to a certain extent. The advantage of the
proposed pipeline is that it combines a non-linear step with a
linear layer. Instead of directly clustering the
document-term matrix, LDA generates a powerful representation
of each document in a feature space.
After the document-topic matrix is generated,
a linear dimension reduction layer, tSVD,
extracts the principal directions, or principal axes, along
which the document-topic matrix has the largest
variance.</p>
      <p>We think that these naive efforts on the way to
Artificial Literature also have a positive effect. For humans,
and even for Shakespeare, the production of a play is possible only with
authoring knowledge.
By authoring knowledge we mean, for example,
how to write a play from a dramatic perspective. Such knowledge
was first set out by Aristotle and still sheds light on
present-day methods. It would be possible to
reverse engineer it for artificial literature,
going from a quantitative analysis back to plays.
Therefore, analyzing literary pieces,
especially texts in dialogue form, can help us test
critical questions and theories. From these
analyses, going back to the generation of literary texts
becomes possible.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by grant 12B03P4 of
Boğaziçi University.</p>
      <p>The author would like to thank Muhittin Mungan
for suggesting the Master of Science thesis as his
advisor and Meltem Gürle Mungan for her kind
opinion. The author would also like to thank the actor
Güneş Yakın for talks together on Shakespeare.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ardanuy</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sporleder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2014</year>
          , April).
          <article-title>Structure-based clustering of novels</article-title>
          .
          <source>In Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL)</source>
          (pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M. I.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>3</volume>
          ,
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          . doi: http://dx.doi.org/10.1162/jmlr.2003.3.4-5.993
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Brown</surname>
          </string-name>
          , M. T., &amp;
          <string-name>
            <surname>Wicker</surname>
            ,
            <given-names>L. R.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Discriminant analysis</article-title>
          . In H. E. A.
          <string-name>
            <surname>Tinsley</surname>
          </string-name>
          &amp; S. D. Brown (Eds.),
          <source>Handbook of applied multivariate statistics and mathematical modeling</source>
          (pp.
          <fpage>209</fpage>
          -
          <lpage>235</lpage>
          ). San Diego, CA, US: Academic Press. http://dx.doi.org/10.1016/B978-012691360-
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Clement</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unsworth</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Uszkalo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>How not to read a million books</article-title>
          . Available online at http://people.brandeis.edu/ unsworth/hownot2read.html
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Crane</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>What do you do with a million books? D-Lib Magazine</article-title>
          . Available online at http://www.dlib.org/dlib/march06/crane/03crane.html
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Dennerlein</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Measuring the average population densities of plays. A case study of Andreas Gryphius, Christian Weise and Gotthold Ephraim Lessing</article-title>
          . Semicerchio. Rivista di poesia comparata LIII:
          <fpage>80</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Evert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., &amp; Vitt, T.
          (
          <year>2017</year>
          ).
          <article-title>Understanding and explaining Delta measures for authorship attribution</article-title>
          .
          <source>Digital Scholarship in the Humanities</source>
          , 32, 4-16. doi: 10.1093/llc/fqx023.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Probabilistic latent semantic analysis</article-title>
          .
          <source>Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence</source>
          (pp.
          <fpage>289</fpage>
          -
          <lpage>296</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Hope</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Witmore</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>The Hundredth Psalm to the Tune of ”Green Sleeves”: Digital Approaches to Shakespeare's Language of Genre</article-title>
          .
          <source>Shakespeare Quarterly</source>
          ,
          <volume>61</volume>
          (
          <issue>3</issue>
          ),
          <fpage>357</fpage>
          -
          <lpage>390</lpage>
          . Retrieved from http://www.jstor.org/stable/40985589
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Hirsch</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Craig</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>“Mingled Yarn”: The State of Computing in Shakespeare 2.0</article-title>
          . In T. Bishop &amp; A. Huang (Eds.),
          <source>The Shakespearean International Yearbook</source>
          (Vol.
          <volume>14</volume>
          : Special Section, Digital Shakespeares, pp.
          <fpage>3</fpage>
          -
          <lpage>35</lpage>
          ). United Kingdom: Ashgate Publishing Limited.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Golub</surname>
            ,
            <given-names>G. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Reinsch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>1970</year>
          ).
          <article-title>Singular value decomposition and least squares solutions</article-title>
          .
          <source>Numerische Mathematik</source>
          .
          <volume>14</volume>
          (
          <issue>5</issue>
          ):
          <fpage>403</fpage>
          -
          <lpage>420</lpage>
          . doi:10.1007/BF02163027. MR 1553974.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Jolliffe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Principal component analysis</article-title>
          . New York: Springer Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Krautter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Quantitative microanalysis? Different methods of digital drama analysis in comparison</article-title>
          .
          <source>Book of Abstracts, DH 2018</source>
          . Mexico City, Mexico, pp.
          <fpage>225</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Daniel</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Seung</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Learning the Parts of Objects by Non-Negative Matrix Factorization</article-title>
          .
          <source>Nature</source>
          .
          <volume>401</volume>
          .
          <fpage>788</fpage>
          -
          <lpage>791</lpage>
          . doi:10.1038/44565.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Lloyd</surname>
            ,
            <given-names>S.P.</given-names>
          </string-name>
          (
          <year>1982</year>
          ).
          <article-title>Least squares quantization in PCM</article-title>
          .
          <source>IEEE Trans. Information Theory</source>
          ,
          <volume>28</volume>
          ,
          <fpage>129</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Mueller</surname>
            ,
            <given-names>Martin.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>Digital Shakespeare, or towards a literary informatics</article-title>
          .
          <source>Shakespeare</source>
          .
          <volume>4</volume>
          .
          <fpage>284</fpage>
          -
          <lpage>301</lpage>
          .
          doi:10.1080/17450910802295179.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Moretti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Distant reading</article-title>
          . Verso Books.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
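          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>Osvaldo</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craig</surname>
            ,
            <given-names>Hugh</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Moscato</surname>
            ,
            <given-names>Pablo</given-names>
          </string-name>
          (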
          <year>2009</year>
          ).
          <article-title>Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          .
          <volume>388</volume>
          .
          <fpage>916</fpage>
          -
          <lpage>926</lpage>
          .
          doi:10.1016/j.physa.2008.11.018.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Ramsay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Reading Machines: Toward an Algorithmic Criticism</article-title>
          . University of Illinois Press.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Romanska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Drametrics: what dramaturgs should learn from mathematicians</article-title>
          . In Romanska, M. (ed.), The Routledge Companion to Dramaturgy. Routledge, pp.
          <fpage>472</fpage>
          -
          <lpage>481</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Schöch</surname>
            ,
            <given-names>Christof</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama</article-title>
          .
          <source>Digital Humanities Quarterly</source>
          . http://doi.org/10.5281/zenodo.166356
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burghardt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dennerlein</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wolff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Katharsis - A Tool for Computational Drametrics</article-title>
          . In
          <source>Book of Abstracts, DH 2019</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Snyder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>The genres of Shakespeare's plays</article-title>
          . In M. de Grazia &amp; S. Wells (Eds.), The Cambridge Companion to Shakespeare (Cambridge Companions to Literature, pp.
          <fpage>83</fpage>
          -
          <lpage>98</lpage>
          ). Cambridge: Cambridge University Press. doi:10.1017/CCOL0521650941.006
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Trilcke</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kampkaspar</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Digital Network Analysis of Dramatic Texts</article-title>
          .
          <source>Book of Abstracts, DH 2015</source>
          . Sydney, Australia.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Underwood</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auvil</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Capitanu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Mapping mutable genres in structurally complex volumes</article-title>
          .
          <source>2013 IEEE International Conference on Big Data</source>
          ,
          <fpage>95</fpage>
          -
          <lpage>103</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Vickers</surname>
            ,
            <given-names>Brian.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Shakespeare and Authorship Studies in the Twenty-First Century</article-title>
          .
          <source>Shakespeare Quarterly</source>
          .
          <volume>62</volume>
          .
          <fpage>106</fpage>
          -
          <lpage>142</lpage>
          .
          doi:10.1353/shq.2011.0004.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <article-title>William Shakespeare</article-title>
          . (
          <year>2015</year>
          , August 21).
          <source>New World Encyclopedia</source>
          . Retrieved 12:11, September 16, 2019 from //www.newworldencyclopedia.org/p/index.php?title=William_Shakespeare&amp;oldid=990237.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Wilhelm</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burghardt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wolff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>“To See or Not to See” - An Interactive Tool for the Visualization and Analysis of Shakespeare Plays</article-title>
          . In R. Franken-Wendelstorf, E. Lindinger, and J. Sieck (Eds.),
          <source>Kultur und Informatik: Visual Worlds &amp; Interactive Spaces</source>
          (pp.
          <fpage>175</fpage>
          -
          <lpage>185</lpage>
          ). Glückstadt: Verlag Werner Hülsbusch.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2008</year>
          ).
          <article-title>An evaluation of text classification methods for literary study</article-title>
          .
          <source>Literary and Linguistic Computing</source>
          <volume>23</volume>
          (
          <issue>3</issue>
          ):
          <fpage>327</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <surname>Xanthos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pante</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rochat</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Grandjean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Visualising the dynamics of character networks</article-title>
          .
          <source>Book of Abstracts, DH 2016</source>
          . Kraków, Poland, pp.
          <fpage>417</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>