Deep Learning meets Post-modern Poetry

                           Timo Baumann (1) [ORCID 0000-0003-2203-1783] and Burkhard Meyer-Sickendiek (2)
                           (1) Department of Informatics, Universität Hamburg, Germany
                               baumann@informatik.uni-hamburg.de
                           (2) Department of Literary Studies, Freie Universität Berlin, Germany
                               bumesi@zedat.fu-berlin.de
                           http://www.rhythmicalizer.net


                             Abstract. We summarize our project Rhythmicalizer, in which we
                             analyze a corpus of post-modern poetry using a combination of qualitative
                             hermeneutical and computational methods. We have run the project
                             over the course of the past three years (and prepared it for some time
                             before that). Interdisciplinary work is always challenging, and we here
                             focus on some of the highlights of our collaboration.

                             Keywords: Literary Studies · Machine Learning · Meta-Research.


                  1        Introduction

                  At least 80 % of modern and post-modern poems exhibit neither rhyme nor
                  metrical schemes like iamb or trochee. Does this mean that they are
                  free of any rhythmical features? Of course not: US-American research
                  on free verse prosody claims the opposite. Modern poets like Whitman, the
                  Imagists, the Beat poets as well as contemporary Slam poets have developed a
                  post-metrical idea of prosody, using rhythmical features of everyday language,
                  prose, and musical styles like Jazz or Hip Hop, yielding a large and complex
                  variety in their poetic prosodies which, however, appear to be much harder to
                  quantify and regularize than traditional patterns.
                      In our joint project, we examine Lyrikline,¹ the largest portal for spoken
                  poetry, and analyze and classify such rhythmical patterns in a human-in-the-loop
                  approach in which we interleave manual annotation with computational modelling
                  and data-based analysis [5].
                      The remainder of this paper is structured as follows: in the next section,
                  we describe the research questions that we set out to address in our project;
                     Copyright 2020 for this paper by its authors. Use permitted under Creative Commons
                     License Attribution 4.0 International (CC BY 4.0).
                     This work is funded by the Volkswagen Foundation in the programme ‘Mixed Methods
                     in the Humanities? Funding possibilities for the combination and the interaction of
                     qualitative hermeneutical and digital methods’ (funding codes 91926 and 93255). We
                     wish to thank Hussein Hussein for his valuable contributions to the project.
                   1
                     http://lyrikline.org


Twin Talks 2 and 3, 2020             Understanding and Facilitating Collaboration in Digital Humanities    30/143

                  in Sections 3 and 4, we describe the methods used and the results obtained,
                  respectively, and in Section 5 we describe – as much as is adequate in a public
                  setting – the lessons taken during the course of the project. (It is still unclear
                  whether all of these lessons were also lessons learned.)


                  2        Research Questions

                  Automating the analysis and differentiating the prosodies of free verse post-
                  modern poetry comes with a multitude of challenges, including the manual
                  analysis and classification of sufficiently many poems as training material for
                  an automated method. Yet, the amount of data that can be acquired (even in
                  the order of hundreds of poems) is still very little for modern machine learning
                  methods. However, by far the largest challenge of this endeavour lies in the nature
                  of free verse prosody, which is an almost purely spoken aspect of a poem and
                  is not readily observable from the purely textual form, unlike traditional metric
                  schemes (see, e.g., [6]). We thus base our analyses on the audible form of the
                  poem, spoken aloud by the original author, as a gold standard for the intended
                  prosodic realization of the poem (instead of, e.g., attempting to derive such
                  features in a generic way from the written form). The use of speech data greatly
                  improves our modelling ability, yet it also increases the complexity of the data
                  and the need for automated analyses.
                      Our primary research questions are hence: (a) can we relate theories on the
                  classification of the prosodies found in post-modern poetry to poems that we
                  find in our collection of spoken poems, (b) can we pre-process the poems in our
                  collection in such a way that they can be automatically analyzed, (c) can we
                  build an automatic classification system that uses quantitative features derived
                  from the poems to yield the classification despite the significant data sparsity,
                  and (d) can we gain additional, new insight for the humanities from the
                  automatic classification?


                  3        Method

                  We here describe the setup of our project, in particular how we went about
                  defining and implementing the philological classification, as well as the processes
                  used to prepare and automatically classify our data. Over the course of the
                  project, we have extended and sometimes changed our methods, as is reflected in
                  our publications, and we here only describe a subset of the methods used.


                  3.1      Theoretical Foundation: Grammetrical Ranking and Rhythmic
                           Phrasing

                  Our classification of poetry is based on two theoretical approaches: 1) the idea
                  of grammetrical ranking and 2) the idea of rhythmic phrasing. 1) The term
                  grammetrics, coined by Donald Wesling, is a hybridization of grammar and



                  metrics: the key hypothesis is that the interplay of sentence-structure and line-
                  structure can be accounted for more economically by simultaneous than by
                  successive analysis [10]. In poetry as a kind of versified language, the singular
                  sentence interacts with verse periods (syllable, foot, part-line, line, rhymed pair
                  or stanza, whole poem), a process for which Wesling finds ‘scissoring’ an apt
                  metaphor: “Grammetrics assumes that meter and grammar can be scissored by
                  each other, that the cutting places can be graphed with some precision. One
                  blade of the shears is meter, the other grammar. When they work against each
                  other, they divide the poem. It is their purpose and necessity to work against
                  each other.” [10]
                      2) The concept of rhythmic phrasing was developed by Richard Cureton. For
                  Cureton, rhythm embraces what has traditionally been regarded as very different
                  kinds of perceptual phenomena. Cureton divided rhythm (a global term covering
                  all relations of strength and weakness) into three distinct components, which he
                  terms meter, grouping, and prolongation. Meter involves the perception of beats in
                  regular patterns; grouping involves the apprehension of linguistic units organized
                  around a single peak of prominence; and prolongation involves the experience of
                  anticipation and arrival. Cureton claimed a hierarchical, multi-dimensional, and
                  preferential treatment of poetic rhythm, going back to Lerdahl and Jackendoff’s
                  treatment of rhythm in tonal music [9]. Lerdahl and Jackendoff claimed four
                  different rhythmic dimensions: a) the grouping structure in terms of a hierarchical
                  segmentation, b) the metrical structure in terms of a regular alternation of strong
                  and weak beats at a number of hierarchical levels, c) the time-span reduction in
                  terms of an organization uniting time-spans at all temporal levels of a work, and
                  d) the prolongational
                  reduction in terms of a ‘psychological’ awareness of tensing and relaxing patterns
                  in a given piece.
                      As a result, the prosodic hierarchy to be considered for free verse is consider-
                  ably more complex than for metric poetry and our working hypothesis of this
                  hierarchy is depicted in Figure 1 (left). As can be seen, all levels of the linguistic
                  hierarchy can carry poetic prosodic meaning, from the segment up to the periodic
                  sentence. Based on this prosodic hierarchy, Figure 1 (right) depicts a categoriza-
                  tion of some poets' works along two axes, the governing prosodic unit (x-axis)
                  and the degree of iso/heterochronicity, or regularity of temporal arrangement
                  (y-axis). The respective rhythm derives from the combined prosodic units and the
                  degree of its isochronic (or heterochronic) succession. The green lines are meant
                  as a localization of poets according to the interplay of grammetrical ranking and
                  prosodic succession. The blue brackets mark the time-span reduction (Rubato)
                  as well as the prolongational reduction (Phrasing). According to the idea of
                  time-span reduction, the rubato is caused by a deviation from the isochronous
                  rhythm. According to the idea of prolongational reduction, the phrasing is
                  divided into three different articulation techniques (legato, portato, staccato).

                  3.2      Manual Analysis and Annotation
                  Based on our theoretical foundation, we devised a number of prosodic classes of
                  poems and built a manually curated collection of poems for each class. Over the




                  Fig. 1. Working hypothesis for a prosodic hierarchy for free verse poetry, as well as a
                  placement of poetic styles along the two axes governing unit and regularity of temporal
                  arrangement.


                  course of our project, we extended the kinds of poems to differentiate, starting
                  with only various kinds of sound poetry (letristic vs. syllabic decompositions) via
                  kinds of poems that differ by the way that enjambments are realized (variable foot
                  poems vs. unemphasized enjambments vs. gestic rhythms) to a large collection
                  of 18 classes. Presently, we are working on identifying and specifying the relations
                  of the classes to each other, which will help inform the automatic classification
                  methods described below.

                  3.3      Data Extraction and Preparation
                  We work with the website Lyrikline which hosts a large collection of modern and
                  post-modern readout poetry. Lyrikline was created by the Literaturwerkstatt
                  Berlin and hosts contemporary international poetry as audio files (read by the
                  authors) and texts (original versions & translations), so it offers the melodies,
                  sounds, and rhythms of international poetry, recited by the authors themselves.
                  It covers more than 10,000 poems from about 1,000 international poets from more
                  than 60 different countries. Nearly 80 % of these are postmetrical poems. We
                  focus our analysis on the roughly 2,400 German-language poems on the site.²
                      To enable our analyses, we perform forced alignment [8] (which we refine
                  manually as necessary) in order to be able to relate speech and text line-by-line.
                   2
                       A program for extracting poems (text, audio, and meta-data) is available
                       on request from the first author.




                  [Figure 2 (diagram): for each poem line, RNN encoders with attention process the
                  characters of the poem line, the acoustic features of the poem line, and the
                  features of the following pause; the encoder outputs are concatenated into a line
                  representation. The line representations are passed to a poem-level encoder,
                  followed by a decision layer with softmax over the classes.]

                  Fig. 2. Full model for poetry style detection: each line is encoded character-by-character
                  by a recurrent neural network (using GRU cells) with attention. Acoustic features of
                  each line, as well as of the pause following up to the next line, are encoded similarly.
                  Per-line representations are concatenated and passed to a poem-level encoder. The final
                  decision layer optimizes for the poem’s class.


                  In addition, we use various tools to extract syntactic information from the text
                  as required.
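
To illustrate what relating speech and text line-by-line amounts to, the following is a minimal sketch and not the project's actual tooling. It assumes that a forced aligner has already produced one time-stamped entry per token, in text order; the names `AlignedWord` and `line_spans` are hypothetical. It derives the start and end time of each poem line and the length of the pause that follows it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """One token with its time span in the recording (hypothetical format)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def line_spans(lines: List[List[str]],
               words: List[AlignedWord]) -> List[Tuple[float, float, float]]:
    """Return (start, end, following_pause) for every non-empty poem line.

    Assumption: `words` holds exactly one aligned entry per token of `lines`,
    in the same order. The last line gets a following pause of 0.0.
    """
    spans = []
    pos = 0
    for tokens in lines:
        if not tokens:  # skip empty lines such as stanza breaks
            continue
        chunk = words[pos:pos + len(tokens)]
        if len(chunk) != len(tokens):
            raise ValueError("fewer aligned words than tokens in the text")
        spans.append((chunk[0].start, chunk[-1].end))
        pos += len(tokens)

    result = []
    for i, (start, end) in enumerate(spans):
        # The pause lasts until the first word of the next line begins.
        next_start = spans[i + 1][0] if i + 1 < len(spans) else end
        result.append((start, end, next_start - end))
    return result
```

In practice, alignment errors (which the text notes must sometimes be corrected manually) would show up here as implausible spans or negative pauses.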

                  3.4      Classification Methods
                  Our primary concerns in devising the project were data sparsity (i.e., the limitation
                  that there is too little data to fully train advanced machine learning algorithms)
                  and interpretability (i.e., the ability of a method to explain what aspects of
                  the data determine its behaviour).
                      Thus, we first focused our efforts on building interpretable classifiers (e.g.,
                  decision trees) on specific interpretable features (e.g., the presence or absence of a
                  verb in a line). However, based on the (relatively obvious) fact that the prosodic
                  structure of a poem is reflected in most of its lines, and that our interpretable
                  features need to be aggregated across the poem’s lines anyway, we also built a
                  hierarchical model for our poems, as depicted in Figure 2.
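
The aggregation of an interpretable per-line feature across a poem can be sketched as follows. This is an illustration under stated assumptions rather than the feature set actually used: it assumes each line is available as a list of (token, part-of-speech tag) pairs and that verb tags start with "V", as they do in the STTS tagset for German. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

TaggedLine = List[Tuple[str, str]]  # (token, part-of-speech tag)


def line_has_verb(line: TaggedLine) -> bool:
    """Per-line feature: does the line contain a verb?

    Assumption: verb tags start with "V" (e.g. VVFIN, VAFIN in STTS).
    """
    return any(tag.startswith("V") for _, tag in line)


def poem_features(poem: List[TaggedLine]) -> Dict[str, float]:
    """Aggregate per-line features into one poem-level feature vector."""
    if not poem:
        raise ValueError("poem has no lines")
    verb_lines = sum(line_has_verb(line) for line in poem)
    return {
        "frac_lines_with_verb": verb_lines / len(poem),
        "num_lines": float(len(poem)),
    }
```

Poem-level values such as `frac_lines_with_verb` could then be passed to an interpretable classifier such as a decision tree, whose splits remain readable in terms of the original features.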
                      Our deep-learning-based hierarchical attention model [11] uses multiple
                  recurrent neural-network-based encoders for each line, focusing on the text, the
                  speech, and the quality of the pause following the line, respectively, each using
                  inner attention. We aggregate across the lines with another recurrent layer. The
                  architecture is more thoroughly described in [3]. Given the fact that most lines
                  exhibit the structural properties that yield a poem’s prosodic classification, we
                  use a two-stage training procedure, in which we first train the line-by-line classi-
                  fication in isolation and only afterwards train the full network including the final
                  decision layer.
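
The attention step inside each encoder can be illustrated in isolation. The sketch below is a deliberately simplified, framework-free version: it scores each recurrent state by its dot product with a context vector and returns the softmax-weighted sum of the states. In a trained model the context vector would be learned, and the formulation in [11] additionally passes each state through a learned projection before scoring; both are omitted here. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states: List[Sequence[float]],
                   context: Sequence[float]) -> List[float]:
    """Collapse a sequence of state vectors into one vector.

    Each state is scored by its dot product with `context`; the result is
    the sum of the states weighted by the softmax of these scores.
    """
    if not states:
        raise ValueError("need at least one state")
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in states]
    weights = softmax(scores)
    dim = len(context)
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
```

Because the weights sum to one, they can be inspected per character or per line to see which positions contributed most to a decision, which is what makes this kind of pooling attractive when interpretability matters.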



                  4        Results
                  Over the course of our project, we have performed a number of experiments; we
                  here only give a very rough overview of the results and refer the reader to the
                  corresponding publications for further detail.
                      In [3], we described our first experiments on classifying poetry using a hier-
                  archical attention network. We found that free verse prosodies can be classified
                  along a fluency continuum and that the classifier’s mis-classifications cluster along
                  this fluency continuum. Via ablation studies (i.e., leaving out certain features to
                  find their relative importance for the final results), we found that, depending on
                  the classes analyzed, not only is the acoustic realization of the speech itself highly
                  relevant for classification but also the realization of the pause following each
                  line of the poem. In fact, we found that our classifier is highly reliable in
                  determining different kinds of enjambments [2] and that an additional annotation
                  of enjambments in the poems is unnecessary (unlike what we had originally
                  hypothesized). We also tried traditional classification approaches using engineered
                  features [7] and, in a comparison with our deep-learning-based method, found these
                  to be inferior [1], although they have direct access to theoretically grounded
                  features. Currently, we are working on integrating such symbolic features into our
                  deep-learning-based approach.
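
The ablation procedure mentioned above follows a generic pattern that can be sketched independently of any particular model. The sketch assumes a caller-supplied function `evaluate` (hypothetical) that trains and scores a classifier on a given set of feature groups; it reports how much the score drops when each group is left out.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def ablation_study(groups: Iterable[str],
                   evaluate: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Return the drop in score caused by removing each feature group.

    `evaluate` is assumed to train and score a model using only the given
    feature groups; a larger drop indicates a more important group.
    """
    all_groups = frozenset(groups)
    baseline = evaluate(all_groups)
    return {
        group: baseline - evaluate(all_groups - {group})
        for group in sorted(all_groups)
    }
```

For the model described here, the groups would plausibly be the text, the speech, and the pause following each line, but the helper itself makes no assumption about what the groups are.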
                      We find that pre-training and data augmentation are crucial for the success
                  of neural learning techniques on the (relatively) small data sets typically found
                  in digital humanities. Another key to success on difficult-to-describe material
                  like sound poetry is the granularity of modelling: instead of using words as the
                  textual unit of processing, our recurrent networks use the sequence of individual
                  characters found in the poem. This has (at least) the following two advantages.
                  First, there is a fixed set of characters (as compared to the endless possibilities for
                  words), which means that our models have far fewer parameters to train and
                  cannot be confronted with ‘out-of-vocabulary words’ when applied to a new
                  poem. Second, the concept of ‘word’ does not necessarily reflect what we find
                  in sound poetry, and the prosodic features are actually very well reflected by the
                  stream of characters (e.g., the recurrence of consonant-vowel pairs) as compared
                  to words.
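
Character-level input can be made concrete with a short sketch. It builds a character inventory from a set of lines and maps a line to a sequence of integer indices, which is the form in which a recurrent network would typically consume it. The reserved index for unseen characters is an added safeguard of this sketch, not something described in the text, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

UNK = "<unk>"  # placeholder for characters not seen when building the vocabulary


def build_char_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map every character occurring in `lines` to an integer index.

    Index 0 is reserved for unseen characters (an assumption of this sketch).
    """
    chars = sorted({ch for line in lines for ch in line})
    vocab = {UNK: 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode_line(line: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a poem line as a sequence of character indices."""
    return [vocab.get(ch, vocab[UNK]) for ch in line]
```

A made-up line such as "ba be bi" needs only five vocabulary entries (space, a, b, e, i), and the recurring consonant-vowel pairs remain directly visible in the index sequence, whereas a word-level model would treat "ba", "be", and "bi" as three unrelated symbols.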
                      Finally, and coming back to our original ideas [5], we have implemented an
                  interface for humanist-in-the-loop classification and analysis of our corpus [4] and
                  have gradually applied our method to eventually cover the full corpus.

                  5        Lessons (Learned?)
                  Over the course of our interdisciplinary project, the common research question
                  has been a strong uniting factor given the sometimes differing research interests
                  involved (creating classification systems that work despite data sparsity vs. the
                  interest in finding evidence in data for humanistic theories). A strong enabling
                  factor in the joint work was the ability to openly ask about the other field’s
                  theory, applications, limitations, and to not be shy to ask again if the given
                  answer was unclear. It often helped to agree to disagree and to move on despite
                  this, or to compromise on what should be done.



                      The project was sometimes hindered by technical limitations, such as forced
                  alignment of text to speech (or syntactic analysis) breaking down for sound poetry,
                  and often being too inaccurate (or having too low coverage) for large amounts of
                  the material. This meant that a great deal of manual annotation of the base material
                  had to be performed, even though the later stages were successfully automated. It
                  also means that the classifiers produced in the project cannot always be applied
                  to new data in a fully automatic way, and the work required to prepare data
                  (although relatively straightforward) can be more than would be required for a
                  manual prosodic analysis itself.

                  References
                   1. Baumann, T., Hussein, H., Meyer-Sickendiek, B.: Analysis of rhythmic phrasing:
                      Feature engineering vs. representation learning for classifying readout poetry.
                      In: Proceedings of the Joint LaTeCH&CLfL Workshop. pp. 44–49. Association for
                      Computational Linguistics, Santa Fe, USA (Sep 2018),
                      https://aclanthology.info/papers/W18-4505/w18-4505
                   2. Baumann, T., Hussein, H., Meyer-Sickendiek, B.: Analysing the focus of a hi-
                      erarchical attention network: The importance of enjambments when classifying
                      post-modern poetry. In: Proceedings of Interspeech. pp. 2162–2166. Hyderabad,
                      India (Sep 2018). https://doi.org/10.21437/Interspeech.2018-2533
                   3. Baumann, T., Hussein, H., Meyer-Sickendiek, B.: Style detection for free verse
                      poetry from text and speech. In: Proceedings of the 27th International Conference
                      on Computational Linguistics (COLING 2018). pp. 1929–1940. Santa Fe, USA (Aug
                      2018), https://aclanthology.info/papers/C18-1164/c18-1164
                   4. Baumann, T., Hussein, H., Meyer-Sickendiek, B., Elbeshausen, J.: A tool for
                      human-in-the-loop analysis and exploration of (not only) prosodic classifications
                      for post-modern poetry. In: Proceedings of INF-DH. pp. 151–156. Gesellschaft für
                      Informatik, Kassel, Germany (Sep 2019). https://doi.org/10.18420/inf2019_ws15
                   5. Baumann, T., Meyer-Sickendiek, B.: Large-scale analysis of spoken free-verse poetry.
                      In: Proceedings of Language Technology Resources and Tools for Digital Human-
                      ities (LT4DH). Osaka, Japan (Dec 2016), https://www.aclweb.org/anthology/
                      W16-4017
                   6. Bobenhausen, K.: The metricalizer2 – automated metrical markup of German poetry.
                      In: Current Trends in Metrical Analysis. pp. 119–131. Peter Lang, Bern (2011)
                   7. Hussein, H., Meyer-Sickendiek, B., Baumann, T.: Automatic detection of
                      enjambment in German readout poetry. In: Proceedings of Speech Prosody. Poznań,
                      Poland (Jun 2018). https://doi.org/10.21437/SpeechProsody.2018-67
                   8. Katsamanis, A., Black, M., Georgiou, P.G., Goldstein, L., Narayanan, S.: SailAlign:
                      Robust long speech-text alignment. In: Proc. of Workshop on New Tools and
                      Methods for Very-Large Scale Phonetics Research (2011)
                   9. Lerdahl, F., Jackendoff, R.: A Generative Theory of Tonal Music. MIT Press series
                      on cognitive theory and mental representation, MIT Press (1983)
                  10. Wesling, D.: The Scissors of Meter: Grammetrics and Reading. University of
                      Michigan Press (1996)
                  11. Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention
                      networks for document classification. In: Proceedings of the 2016 Conference of the
                      North American Chapter of the Association for Computational Linguistics: Human
                      Language Technologies. pp. 1480–1489 (2016)

