Analyses of Literary Texts by Using Statistical Inference Methods
                                   Mehmet Can Yavuz
           Computer Science and Engineering Department, Sabancı University, Tuzla
           Management Information Systems Department, Kadir Has University, Cibali
                      Physics Department, Boğaziçi University, Bebek
                                     İstanbul, Türkiye
                          mehmetyavuz@sabanciuniv.edu


Abstract

If a road map had to be drawn for Computational Criticism and subsequent Artificial Literature, it would certainly have considered Shakespearean plays. Demonstrating the structures of dramatic fiction through text analysis can be seen as both a naive effort and a scientific view of the characteristics of the texts. In this study, the textual analysis of Shakespeare plays was carried out for this purpose.

Methodologically, we consecutively use Latent Dirichlet Allocation (LDA) and Singular Value Decomposition (SVD) in order to extract topics and then reduce the topic distribution over documents to a two-dimensional space. The first question asks whether there is a genre called Romance between Comedy and Tragedy plays. The second question is whether, if each character's speech is taken as a text, the dramatic relationship between the characters can be revealed.

Consequently, we find relationships between genres that are also verified by literary theory, and the main characters follow the antagonisms within the play as the length of speech increases. Although the results of the classification of the side characters in the plays are not always what one would have expected based on a reading of the plays, there are observations on dramatic fiction that are also verified by literary theory. Tragedies and revenge dramas have different character groupings.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction

If a road map had to be drawn for Computational Criticism (Moretti, 2013) and subsequent Artificial Literature, it would certainly have considered Elizabethan drama. In particular, Shakespearean texts are the most outstanding examples of dramatic fiction. Demonstration of these structures through text analysis can be seen as both a naive effort and a scientific view of the characteristics of the texts. In this study, the textual analysis of Shakespeare plays was carried out for this purpose.

To begin with, the "First Folio" is the printed volume in which all of Shakespeare's works were brought together for the first time (Snyder, 2001). The edition of 1623 was directed by two actors from the company called the King's Men, the ensemble of which Shakespeare was also a member. Half of the 36-play collection had never been published anywhere before. The Folio was also printed in Quarto form. These prints took their names from the way the books were folded. It is known that the First Folio had 800 prints, 233 of which have survived to the present day. In the First Folio, Shakespearean plays are typically divided into three groups: Comedies, Tragedies, and Histories. Romance is the genre that hybridizes Comedy and Tragedy, developed at the beginning of the 17th century. At the end of his career, Shakespeare wrote four romances: Pericles, Cymbeline, The Winter's Tale and The Tempest. The First Folio groups Cymbeline with the Tragedies, and The Winter's Tale and The Tempest with the Comedies. The reason for this may be that The Winter's Tale and The Tempest begin as tragedies and then turn into comedies, whereas Cymbeline starts as a comedy and ends as a tragedy.

Shakespeare's two tragedies Macbeth and Othello are very good examples of a true tragedy and a revenge tragedy, respectively. Tragedies are designed as the struggle between the main character and the opposing characters who create obstacles for the main character. The protagonist is generally the main
character that the audience sympathizes with. Although not sympathetic, Macbeth is a protagonist, and the opposing characters, Duncan and Banquo, are antagonists. Similarly, there is also antagonism in revenge drama, where the main theme is revenge. The antagonist or protagonist seeks revenge for an imaginary or real injury. Iago, the antagonist, gets his revenge by provoking Othello, the protagonist, against his wife.

Computerized analysis of literary texts, in other words computational criticism, is a new and promising field (Ramsay, 2011). Pioneering works aim to answer critical questions by using Natural Language Processing (NLP) methods. Alongside these studies, creating fictional texts with the help of computers is of interest in the developing field of artificial literature. In this study, we make a computational analysis of Shakespearean texts. There are basically two questions we are trying to answer. The first is whether the genres of Shakespeare's theater texts can be classified by computer. The second is whether antagonisms can be revealed if the lines spoken by each character are taken as texts. We tried to find answers to both with the same unsupervised learning technique.

In recent years, NLP methods have been developing rapidly and text analysis methods are getting more advanced. Topic modeling articles are among the most cited articles. An unsupervised topic modeling algorithm is used in this study. It generates latent topics, of which each document is a mixture. Given the latent topic distribution, a dimension reduction algorithm maps each document onto two-dimensional coordinates without losing intrinsic characteristics.

1.1   Related Works

The Digital Humanities field lets researchers discuss quantitative methods in literary and cultural studies (Clement et al., 2008; Crane, 2006). "Drametrics" is a field that deals with quantitative analysis of the literary genre of drama (Romanska, 2015). Digital Shakespeare studies have also received attention since the 2000s (Hirsch, 2017; Mueller, 2008). These studies cover issues from digital archives to authorship analysis (Vickers, 2011; Evert, 2017). Besides, machine learning based text analyses are also carried out for genre classification (Ardanuy, 2014; Hope, 2010; Schöch, 2016; Underwood, 2013; Yu, 2008). Information theoretical approaches are also successfully applied (Rosso, 2009). In literature, structural elements such as the dramatis personae and scene structures are quantified, and applications are developed to further the analysis (Dennerlein, 2015; Krautter, 2018; Schmidt, 2019; Trilcke, 2015; Wilhelm, 2013; Xanthos, 2016).

In order to analyze a literary text, we would like to use unsupervised topic modeling. Although there are linear-algebraic models such as Non-Negative Matrix Factorization (Lee, 1999), probabilistic models are more reliable and capable of representing true distributions of topics. Probabilistic Latent Semantic Analysis (Hofmann, 1999) and Latent Dirichlet Allocation (Blei, 2003) are the two major unsupervised topic modeling algorithms. Although both allow us to classify texts according to topic distribution, Latent Dirichlet Allocation as a generative model has a proven superiority over competitors. Principal Component Analysis (Jolliffe, 2002), Linear Discriminant Analysis (Brown, 2000) and Non-Negative Matrix Factorization (NMF) are all dimension reduction algorithms, alongside Singular Value Decomposition (Golub, 1970). The last algorithm we use is K-Means clustering, a well-known clustering algorithm that minimizes the variance within clusters (Lloyd, 1982).

2   Theory

In this study, we use text analysis to investigate genres and antagonisms in Shakespearean plays. By using Latent Dirichlet Allocation (LDA), document distributions over topics are generated. First, the optimum number of topics for LDA is obtained by grid search optimization; then a dimension reduction algorithm, truncated Singular Value Decomposition (tSVD), maps the documents onto a two-dimensional plane, where they are plotted.

In the following sections, topic generation with the LDA algorithm and dimension reduction with the tSVD algorithm are explained. The aim of using the tSVD algorithm is to express each text with two floating-point numbers while preserving the latent topic properties. Thus, classification can be made depending on the distances between texts in the new two-dimensional feature space. In the last step, we apply clustering with Euclidean distance. The theoretical section is kept brief and explanatory because the main focus is on the experimental results.
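The steps outlined in this section (grid search over the number of topics, LDA, tSVD to two dimensions, clustering by Euclidean distance) can be sketched with Scikit-learn as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the eight-document toy corpus, the candidate topic numbers, `cv=2`, `max_iter=20` and the choice of two clusters are all illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Toy stand-in corpus (hypothetical; the study uses whole plays or the
# collected speeches of a character as documents).
docs = [
    "love marriage wedding joy feast",
    "wedding feast joy love song",
    "joy song love marriage dance",
    "dance feast wedding song joy",
    "murder blood crown dagger night",
    "blood night dagger murder ghost",
    "crown murder ghost blood sword",
    "sword dagger night crown ghost",
]

# Bag-of-words counts: the document-term matrix.
dtm = CountVectorizer().fit_transform(docs)

# Grid search over the number of topics. With no scorer given, GridSearchCV
# uses LatentDirichletAllocation.score, an approximate log-likelihood.
search = GridSearchCV(
    LatentDirichletAllocation(max_iter=20, random_state=0),
    param_grid={"n_components": [3, 4, 5]},
    cv=2,
)
search.fit(dtm)
lda = search.best_estimator_

# Document-topic matrix: one row per document, each row sums to 1.
doc_topic = lda.transform(dtm)

# Truncated SVD maps every document to two coordinates.
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(doc_topic)

# K-Means groups the documents by Euclidean distance in the 2-D space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
print(search.best_params_, coords.shape, labels)
```

On real data the candidate topic numbers would follow the range reported in the experiments, and the two columns of `coords` would be the plotted coordinates.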
2.1   Latent Dirichlet Allocation (Blei, 2003)

LDA is a generative statistical model that explains why certain parts of the data are similar, based on a set of observations. LDA assumes that observations are generated by latent variables, or latent topics. Thus, each document is a mixture of topics, each topic is a distribution over words, and each word is drawn from the mixture. The observations are the word frequency statistics of each document, the so-called document-term matrix. This is called the bag-of-words approach and is intended to reflect how important a word is in a document. Thus, topics are identified on the basis of term co-occurrence (the topic-term matrix), and each document is assumed to be characterized by a particular set of topics (the document-topic matrix). Topics, mixtures and other variables are all hidden and need to be predicted from the observed data, the document-term matrix. Figure 1 shows the plate notation of LDA. In the plate notation, there are N × D different variables that represent observations. There are K topics and D documents in total.

α and η are the parameters of the prior distributions over θ and β, respectively. θ_d is the distribution of topics for document d (a real vector of length K). β_k is the distribution of words for topic k (a real vector of length V). z_{d,n} is the topic for the nth word in the dth document. w_{d,n} is the nth word of the dth document. Only the gray shaded circles are observed variables. The remaining white circles are inferred by using Variational Inference. The topic for each word, the distribution over topics for each document, and the distribution of words per topic are all latent variables in this model. With this formulation, similarities between documents can be introduced.

The model contains both continuous and discrete variables. θ_d and β_k are vectors of probabilities. z_{d,n} is an integer in {1, ..., K} that indicates the topic of the nth word in the dth document. w_{d,n} is an integer in {1, ..., V} that indexes over all possible words.

Figure 1: Plate notation representing the LDA model.

2.2   SVD (Golub, 1970)

If the data has a large number of features, it can be reduced to a subset of features that are the most relevant to the prediction problem. SVD decomposes any matrix A into a product of three matrices such that

    A = U S V^T, where                      (1)
    U^T U = I and V^T V = I                 (2)

S is a diagonal matrix that consists of r singular values, where r is the rank of A. Truncated SVD is a reduced-rank approximation. All singular values are set to zero except for the largest k, and the largest singular values correspond to the first k columns of U and V. The dimensions of the truncated SVD factors are [u × k] · [k × k] · [k × v]. Since the matrix A is approximated with k dimensions, there is a dimension reduction within the matrix product. A descriptive subset of the data is called T, which is a dense summary of the matrix A,

    T = U_k S_k                             (3)

where U_k consists of the first k columns of U and S_k holds the k largest singular values; k is the number of reduced features. Each feature accounts for a percentage of the variance, which is the reason for keeping only the most significant ones.

2.3   K-means Clustering (Lloyd, 1982)

The K-Means clustering algorithm separates the samples into n groups of equal variance by minimizing the within-cluster sum of squares. The number of clusters needs to be determined in advance.

3   Experiments [2]

[2] In Python, the Scikit-learn library was used for the LDA, tSVD and GridSearch functions.

We included two evaluations in our experiments. The first is whether or not the genre of Romance can be distinguished computationally. In order to carry out this experiment, each tragedy, comedy and romance is treated as a separate document and processed by LDA. Afterwards, the document-topic distribution matrix is reduced from the number of topics to two dimensions by means of the dimension reduction algorithm, tSVD. Similarly, in the second evaluation, the lines of each character were treated as a text, and the document-topic matrix was reduced to two dimensions after processing with LDA. Two different types of tragedy are considered: Macbeth and Othello. Thus, three different
experiments and optimizations were conducted for these two evaluations.

3.1   Dataset and Preprocess

Two preprocessing steps were performed for each set of documents. First, stop-words were removed from the dictionary. The stop-word list was created for both modern English and Elizabethan English and contains 1144 words. The characteristic of these words is that they appear frequently in every text. The second step is the expression of the texts as word frequencies and the creation of the document-term matrix. Thus, each text is expressed as a fixed-length vector of dictionary size. Concatenating these vectors creates the document-term matrix.

3.2   Optimization

In order to find the right number of topics, we need an optimization. Since the topics are latent variables, there is no single correct number of topics. A grid search over the number of topics is carried out, and the setting with the highest log-likelihood is taken as optimal. In all three experiments, the values between 6 and 12 were tried three times each and plotted in Figure 2. Thus, for example, for Macbeth, 3 experiments were conducted for a given number of topics. The LDA function that we called for each experiment was repeated up to 10 times before giving results. Thus, for example, the LDA algorithm was repeated up to 30 times in total for a given number of topics.

As an observation, the log-likelihood increases as the number of topics decreases. However, we prefer not to try fewer than 6 latent topics because, in the literature, the number of themes/topics for Shakespearean plays is generally at least 6 ("William Shakespeare", 2015).

Figure 2: Optimization. Log-likelihood with respect to the number of topics for Tragedies-Comedies, Macbeth and Othello, respectively.

4   Discussion

4.1   Tragedy-Comedy-Romance

Figure 3 shows the documents consisting of the Tragedy, Comedy and Romance plays. The document-topic distribution matrix is reduced to two dimensions and plotted. More than half of the variance is explained by these two components. Even in three dimensions, the clustering does not change. The plays shown in red are Comedies, those in blue are Tragedies and those in green are Romances, according to the First Folio.

The majority of the Comedies are clustered in the upper left corner, and likewise the Tragedies are clustered in the lower right corner. Between these two clusters lie three plays known as problem plays: "All's Well That Ends Well", "Measure for Measure" and "Troilus and Cressida". Some critics also include "Timon of Athens", which is a neighbor of the other problem plays (Snyder, 2001). Thus, between the two clusters there is a gray zone in which the problem plays are placed. An interesting fact is that, although "All's Well That Ends Well" and "Measure for Measure" are grouped as Comedies in the First Folio, they are much closer to the Tragedies. An unexplained fact is that Coriolanus and Othello are also placed in this gray zone. Another open question in this grouping is "Romeo and Juliet". As a tragedy that has comedy elements, it is placed thematically very close to the Comedies cluster.

Another important observation is that the three Romances are clustered within the Tragedies. According to this analysis, the genre of Romance is not different from Tragedy.

Figure 3: Genre classification of Tragedies, Comedies and Romance
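The two checks mentioned in Section 4.1 (how much variance the two components explain, and whether the clustering changes in three dimensions) could be reproduced along the following lines. This is only a sketch: the document-topic matrix is synthetic, the two Dirichlet profiles and the use of two clusters are assumptions, and the adjusted Rand index is one possible way to compare the two clusterings rather than the measure used in the study. Note also that Scikit-learn's `TruncatedSVD` does not center the data before decomposition.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import adjusted_rand_score

# Synthetic document-topic matrix (an assumption, not the study's data):
# 20 documents over 6 topics, drawn from two different topic profiles.
rng = np.random.default_rng(0)
doc_topic = np.vstack([
    rng.dirichlet([8, 8, 1, 1, 1, 1], size=10),
    rng.dirichlet([1, 1, 1, 1, 8, 8], size=10),
])

def reduce_and_cluster(matrix, n_dims, n_clusters=2):
    """Project onto n_dims SVD components, then cluster with K-Means."""
    svd = TruncatedSVD(n_components=n_dims, random_state=0)
    coords = svd.fit_transform(matrix)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = kmeans.fit_predict(coords)
    return svd.explained_variance_ratio_.sum(), labels

ratio_2d, labels_2d = reduce_and_cluster(doc_topic, 2)
ratio_3d, labels_3d = reduce_and_cluster(doc_topic, 3)

# Agreement of the two clusterings (1.0 means identical up to relabeling).
agreement = adjusted_rand_score(labels_2d, labels_3d)
print(f"variance explained in 2-D: {ratio_2d:.2f}, in 3-D: {ratio_3d:.2f}")
print(f"adjusted Rand index between 2-D and 3-D clusterings: {agreement:.2f}")
```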
4.2   Macbeth

After the analysis, the characters of Macbeth clearly demonstrate antagonist/protagonist relations, as plotted in Figure 4. There are two basic clusters in the tragedy of Macbeth. The first is the protagonists, led by Macbeth and Lady Macbeth. The second is the antagonists, namely the murdered king and Macduff, who suspects foul play. In the plot, protagonists are shown in blue and antagonists in red. Lady Macbeth stands at the bottom left corner, since she does not talk much to anyone except Macbeth. Macbeth himself is closer to the red cluster, as he has relations with the red cluster as the new King. Macduff, who is suspicious and kills Macbeth in the last scene, is in the center of the red cluster. Lady Macduff is also in this cluster. The murdered King Duncan is also at the center of this cluster. However, there is also a misclassification: Siward is in the blue cluster, although Siward and Macbeth have a clash in which Siward is killed. Other than that, the witches, who act as oracles, are in the cluster opposite to Macbeth. Other characters may not be fully explained due to their small and ambiguous roles.

Apart from these two clusters, there is a green cluster at the top left. The main character of this cluster is Banquo. This character is Macbeth's brave and noble companion, but he has no idea about Macbeth's machinations until he is killed.

The tragedy of Macbeth has a very clear separation between clusters. The distance between clusters is also meaningful. The reds lie between the greens and the blues. The greens are actually closer to the reds than to Macbeth's evil cluster.

Figure 4: Characters of the play Macbeth are represented.

4.3   Othello

The characters of the play Othello are shown in Figure 5 in accordance with the analysis. We give Othello as an example of revenge tragedies. Unlike the true tragedy Macbeth, Othello does not have antagonist/protagonist clusters in Figure 5. Iago is a single character who sets traps to get revenge on Othello. Throughout the play, Iago misleads Othello for reasons and purposes that only he and the reader know. Othello kills his beloved wife in a crisis of jealousy.

Three clusters are shown in different colors. The red set consists of the main characters of the play. The blue and green clusters belong to side characters, and the antagonisms are computationally ambiguous. The main characters of the red cluster at the bottom right, Othello, Emilia, Iago and Cassio, speak about almost the same subjects because of the frequency of their dialogue with each other. Therefore, a conflict between them is not visible. Iago, however, is shown in the lower right corner because he reveals his true intention in his monologues. Therefore, Othello is a negative example for the methodology we developed. Characters such as the Duke of Venice and the Senator are placed in the top left corner and are in fact far outside the plot. Bianca, shown in the green cluster, is again outside the plot as Cassio's lover.

In Othello, there are interesting observations on revenge tragedies. In the revenge tragedies of Shakespeare, a lonely character presents him/herself differently and his/her true intentions remain hidden. Thus, the clear difference from tragedies is their dramatic structure.

Figure 5: Characters of the play Othello are represented.
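The character-level analysis in Sections 4.2 and 4.3 treats all lines spoken by one character as a single text. A minimal sketch of that grouping step is given below; the `(speaker, line)` input format and the helper name `character_documents` are hypothetical, and the four quoted lines are only an illustration, since the way the plays were actually parsed is not described here.

```python
from collections import defaultdict

# Hypothetical (speaker, line) pairs; a real run would parse them from the
# full text of a play.
speeches = [
    ("MACBETH", "Is this a dagger which I see before me"),
    ("LADY MACBETH", "Out, damned spot"),
    ("MACBETH", "Tomorrow, and tomorrow, and tomorrow"),
    ("BANQUO", "Thou hast it now"),
]

def character_documents(pairs):
    """Concatenate every speaker's lines into one document per character."""
    grouped = defaultdict(list)
    for speaker, line in pairs:
        grouped[speaker].append(line)
    return {speaker: " ".join(lines) for speaker, lines in grouped.items()}

docs = character_documents(speeches)
for speaker, text in docs.items():
    print(speaker, "->", text)
```

The resulting texts, one per character, are the documents that would be passed to the document-term matrix and LDA steps.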
5   Conclusion

The classification of genres shows us that the method we use provides successful quantitative information for the differentiation of genres. The length of the texts can be mentioned among the reasons for this success. Positioning the plays between Tragedy and Comedy is much discussed in literary theory. The Romance genre hybridizes Tragedy and Comedy elements. Instead of mapping the Romance genre in between, the algorithm mapped four "Problem Plays" to a region between Tragedies and Comedies. Another interesting finding is that Romance cannot be distinguished from Tragedies. The method used shows that the reason for some literary discussion is at the same time quantitative. The method classifies Romances within the Tragedies. In the light of theoretical discussions, of course, there may be a genre called Romance, but we have not been able to quantify this difference yet.

There are also some results from our experiments on the two tragedies we have chosen. We intentionally chose a tragedy and a revenge play; of the two, Macbeth clearly shows antagonisms. This is mainly due to the frequency of conversations within these clusters. For example, Macbeth and Lady Macbeth are always aware of each other's true intentions. Dialogues within these clusters are always compatible with each other. Therefore, the cluster forms. There is a group subjectivity, which is also verified computationally. The war scene at the end of Macbeth can clearly be observed from the clusters. Two clusters that will clash are formed throughout the play, which is quantifiable. In contrast, Iago, who hides his true intention from everyone, apparently always agrees with Othello. He never shares his intentions with anyone in the play; they are revealed only through monologues. Thus, he could not form a cluster. He is a lonely character. That is why the algorithm fails to find an antagonism. From this point of view, we can say that the method forms clusters of characters that agree with each other. The dramatic structure of revenge plays cannot be revealed by the method we proposed. Our method succeeds at finding such clusters. We carried out a similar analysis for the play Hamlet, another type of revenge play. Hamlet stood out in a different cluster, as a lonely character, together with Lord Polonius, who is responsible for spying on Hamlet. Lord Polonius is similar to Iago in terms of hiding his true intentions.

The dramatic fiction in Shakespeare's texts is shown to a certain extent. The advantage of the proposed pipeline is that it applies a non-linear model before a linear layer. Instead of directly clustering the document-term matrix, a powerful representation of each document in a feature space is generated by LDA. After the document-topic matrix is generated, a linear dimension reduction layer, tSVD, extracts the principal directions (principal axes) along which the document-topic matrix has the largest variance.

We think that these naive efforts on the way to Artificial Literature also have a positive effect. The production of a play is possible with knowledge of authorship, for humans in general and even for Shakespeare. By authorship knowledge we mean, for example, how to write a play from a dramatic perspective. Such knowledge was first introduced by Aristotle and still sheds light on present-day methods. It would be possible to reverse engineer it for artificial literature, going from a quantitative analysis back to plays. Therefore, analyzing literary pieces, especially texts in dialogue form, can help us verify critical questions and theories. From these analyses, going back to literary text generation becomes possible.

Acknowledgments

This work was supported by grant 12B03P4 of Boğaziçi University.
The author would like to thank Muhittin Mungan for suggesting the Master of Science thesis as his advisor and Meltem Gürle Mungan for her kind opinion. The author would also like to thank actor Güneş Yakın for talks together on Shakespeare.

References

Ardanuy, M. C., & Sporleder, C. (2014, April). Structure-based clustering of novels. In Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL) (pp. 31-39).

Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003). Latent dirichlet allocation. J. Mach. Learn. Res., 3, 993–1022. doi: http://dx.doi.org/10.1162/jmlr.2003.3.4-5.993

Brown, M. T., & Wicker, L. R. (2000). Discriminant analysis. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 209-235). San Diego, CA, US: Academic
Press. http://dx.doi.org/10.1016/B978-012691360-6/50009-4

Clement, T., Steger, S., Unsworth, J. and Uszkalo, K. (2008). How not to read a million books. Available online at http://people.brandeis.edu/~unsworth/hownot2read.html

Crane, G. (2006). What do you do with a million books? D-Lib Magazine. Available online at http://www.dlib.org/dlib/march06/crane/03crane.html

Dennerlein, K. (2015). Measuring the average population densities of plays. A case study of Andreas Gryphius, Christian Weise and Gotthold Ephraim Lessing. Semicerchio. Rivista di poesia comparata LIII: 80–88.

Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C. & Vitt, T. (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 32, 4-16. doi:10.1093/llc/fqx023

Hofmann, T. (1999). Probabilistic latent semantic analysis. Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence (pp. 289–296).

Hope, J., & Witmore, M. (2010). The Hundredth Psalm to the Tune of "Green Sleeves": Digital Approaches to Shakespeare's Language of Genre. Shakespeare Quarterly, 61(3), 357-390. Retrieved from http://www.jstor.org/stable/40985589

Moretti, F. (2013). Distant reading. Verso Books.

Rosso, O., Craig, H. & Moscato, P. (2009). Shakespeare and other English Renaissance authors as characterized by Information Theory complexity quantifiers. Physica A: Statistical Mechanics and its Applications, 388, 916-926. doi:10.1016/j.physa.2008.11.018

Ramsay, S. (2011). Reading Machines: Toward an Algorithmic Criticism. University of Illinois Press.

Romanska, M. (2015). Drametrics: what dramaturgs should learn from mathematicians. In Romanska, M. (ed.), The Routledge Companion to Dramaturgy. Routledge, pp. 472-481.

Schöch, C. (2016). Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama. Digital Humanities Quarterly. http://doi.org/10.5281/zenodo.166356

Schmidt, T., Burghardt, M., Dennerlein, K. & Wolff, C. (2019). Katharsis – A Tool for Computational Drametrics. In Book of Abstracts, DH 2019.

Snyder, S. (2001). The genres of Shakespeare's plays. In M. De Grazia & S. Wells (Eds.), The Cambridge Companion to Shakespeare (Cambridge Companions to Literature, pp. 83-98). Cambridge: Cambridge University Press. doi:10.1017/CCOL0521650941.006

Trilcke, P., Fischer, F. and Kampkaspar, D. (2015). Digital Network Analysis of Dramatic Texts. Book of Abstracts, DH 2015. Sydney, Australia.

Underwood, T., Black, M.L., Auvil, L., & Capitanu,
Hirsch, B., & Craig, H. (2014). ”Mingled Yarn”:              B. (2013). Mapping mutable genres in structurally
  The State of Computing in Shakespeare 2.0. In T.           complex volumes. 2013 IEEE International Confer-
  Bishop, & A. Huang (Eds.), The Shakespearean In-           ence on Big Data, 95-103.
  ternational Yearbook (Vol. 14: Special Section, Dig-
  ital Shakespeares, pp. 3-35). United Kingdom: Ash-       Vickers, Brian. (2011). Shakespeare and Authorship
  gate Publishing Limited.                                   Studies in the Twenty-First Century. Shakespeare
                                                             Quarterly. 62. 106-142. 10.1353/shq.2011.0004.
Golub, G. H.; Reinsch, C. (1970). ”Singular
  value decomposition and least squares solu-              William     Shakespeare.     (2015,        August
  tions”. Numerische Mathematik. 14 (5): 403–420.            21). New World Encyclopedia,            . Re-
  doi:10.1007/BF02163027. MR 1553974.                        trieved 12:11, September 16, 2019 from
                                                             //www.newworldencyclopedia.org/p/index.php?title=
Jolliffe, I. (2002). Principal component analysis. New       William Shakespeareoldid=990237.
   York: Springer Verlag.
                                                           Wilhelm, T., Burghardt, M., and Wolff, C. (2013). “To
Krautter, B. (2018). Quantitative microanalysis? Dif-        See or Not to See” - An Interactive Tool for the
  ferent methods of digital drama analysis in com-           Visualization and Analysis of Shakespeare Plays.
  parison. Book of Abstracts, DH 2018. Mexico-City,          In R. Franken-Wendelstorf, E. Lindinger, and J.
  Mexico, pp. 225-228.                                       Sieck (Eds.), Kultur und Informatik: Visual Worlds
                                                             & Interactive Spaces. Glückstadt: Verlag Werner
Lee, Daniel Seung, H.. (1999). Learning the Parts of         Hülsbusch, pp. 175–185.
  Objects by Non-Negative Matrix Factorization. Na-
  ture. 401. 788-91. 10.1038/44565.                        Yu, B. (2008). An evaluation of text classification
                                                             methods for literary study. Literary and Linguistic
Lloyd, S.P. (1982). Least squares quantization in PCM.       Computing 23(3): 327-343.
  IEEE Trans. Information Theory, 28, 129-136.             Xanthos, A., Pante, I., Rochat, Y and Grandjean,
Mueller, Martin. (2008). Digital Shakespeare, or to-         M. (2016). Visualising the dynamics of character
 wards a literary informatics. Shakespeare. 4. 284-          networks. Book of Abstracts, DH 2016. Kraków,
 301. 10.1080/17450910802295179.                             Poland, pp. 417-419.