=Paper=
{{Paper
|id=Vol-2552/Paper13
|storemode=property
|title=Internal Dynamics of Text: Parts of Speech Distribution in Verse 
|pdfUrl=https://ceur-ws.org/Vol-2552/Paper13.pdf
|volume=Vol-2552
|authors=Vadim Andreev,Larisa Beliaeva
}}
==Internal Dynamics of Text: Parts of Speech Distribution in Verse ==
<pdf width="1500px">https://ceur-ws.org/Vol-2552/Paper13.pdf</pdf>
<pre>
                Internal Dynamics of Text:
           Parts of Speech Distribution in Verse∗
                 Vadim Andreev¹                        Larisa Beliaeva²
             vadim.andreev@ymail.com                  lauranbel@gmail.com
                      ¹ Smolensk State University, Smolensk
      ² Herzen State Pedagogical University of Russia, Russian Federation


                                                Abstract

            The research is aimed at the study of the degree of regularity in the relationship
        between the frequencies of different parts of speech in a verse text, in particular between
        verbs and nouns. The data-base for the analysis includes 20 sonnets of famous Russian
        poets of the Silver Age of Russian poetry. The results demonstrate regularity in the
        distribution of parts of speech frequencies. The exponential function provides a good fit.
            Keywords: parts of speech, exponential function, static and dynamic description.


1       Introduction
Parts of speech (PoS) are often used in quantitative linguistics, stylometry and related
fields [Best, 1994; Stamou, 2008]. Analysis of the frequencies of different PoS in a text and
of the proportions between them makes it possible to solve important problems in the “linguistics
of verse”, which developed intensively in Russia in the 20th century [Gasparov, 2012].
    Depending on the peculiarities of individual styles, the frequencies of PoS vary to some
extent, and sometimes rather considerably [Čech, Altmann, 2013]. Nevertheless, one may ask
what the limits of this variation are, whether there is a tendency to keep certain proportions
between PoS, and whether their frequencies show any general regularity common to all speakers
of the same language. Some studies have demonstrated a certain order in the distribution of
PoS in speech [Andreev, Popescu, Altmann, 2017]. The present research explores the possibility
of such general tendencies in the distribution of parts of speech in verse texts written by
authors who differ in style and creative manner.
    ∗ Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).

2     Data-base
Verse is subject to many more restrictions and rules in the choice of words than prose, and
the sonnet is a poetic genre with one of the most formalized structures and a large number of
strict schemes.
    The data-base includes 20 sonnets by the famous Russian authors V. Brusov, K. Balmont, V.
Ivanov and M. Voloshin, written in the early period of their creative activity. All these
poets belong to the period known as the Silver Age of Russian poetry (the beginning of the
20th century), when much attention was paid to the strictly structured genre of the sonnet.
The titles of these sonnets are given below with their text numbers in the data-base.

Valery Brusov
T1    Teny proshlogo
T2    K portretu R. D. Balmonta
T3    Zhenshchine
T4    Kleopatra
T5    Sonet o poete

Konstantin Balmont
T6    Bretan’
T7    Propovednikam
T8    Proklyatiye gluposty
T9    Razluka
T10    Put’ pravdy

Maksimilyan Voloshin
T11    “Starinnym zolotom i zhelchyu napital. . . ”
T12    “Zdes’ byl svyaschenny les. Bozhestvenny gonets. . . ”
T13    “Ravnina vod kolishitsa shiroko. . . ”
T14    “Nad zibkoy ryab’u vod vsayet iz glubiny. . . ”
T15    “Mare internum”

Vyacheslav Ivanov
T16    Na mig (“Den’ purpur tsarstvenny dayet. . . ”)
T17    Polyet
T18    La superba
T19    La pineta
T20    Nostal’giya

3    Methods and feature set
The following parts of speech were counted in the sonnets: nouns (N), adjectives (A), verbs
(V), adverbs (ADV), personal pronouns (PRNP), other types of pronouns which can be used in
attributive function (PRNA), participles (PTL), adjectivized participles (PTLA), and category
of state words, i.e. adjectives used as predicates in impersonal sentences (STW).
    Counting the PoS in each sonnet yielded their frequencies. For example, in the sonnet
Zhenshchine by Brusov (T3) the following numbers of PoS were obtained: 27 nouns, 15 personal
pronouns, 9 verbs, 5 adjectives, 4 adjectivized participles, 3 participles and 2 adverbs
(category of state words and attributive pronouns were not registered).
    The frequencies of all morphological classes in each sample were ranked in decreasing order,
so that the most frequent PoS receives Rank 1, the second most frequent PoS receives Rank 2,
etc. To fit the distribution of these ranked PoS frequencies we used the exponential function
suggested for such purposes in [Andreev, Mistecky, Altmann, 2018]:
                                        y = a · exp(−b · x),
where x is the rank, y is the frequency, and a and b are parameters.
    If some PoS class was not found in the sample it was omitted (no zero classes were used).
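
The ranking step can be illustrated with a short Python sketch. It is not the authors' own procedure: the function name rank_pos and the dictionary layout are assumptions made here for illustration, and tied classes simply keep their input order. The counts are those reported in the text for T3.

```python
def rank_pos(counts):
    """Return (rank, pos, frequency) triples in decreasing order of frequency.

    Classes with zero frequency are dropped, since no zero classes are used.
    The function name and the input format are illustrative assumptions.
    """
    observed = [(pos, f) for pos, f in counts.items() if f > 0]
    # list.sort() is stable, so tied classes keep their input order.
    observed.sort(key=lambda item: item[1], reverse=True)
    return [(rank, pos, f) for rank, (pos, f) in enumerate(observed, start=1)]


# PoS counts reported in the text for sonnet T3 (Zhenshchine by Brusov).
t3_counts = {"N": 27, "A": 5, "V": 9, "ADV": 2, "PRNP": 15,
             "PRNA": 0, "PTL": 3, "PTLA": 4, "STW": 0}
t3_ranked = rank_pos(t3_counts)
for rank, pos, freq in t3_ranked:
    print(rank, pos, freq)
```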
    Consider, for example, the above-mentioned sonnet T3. The PoS counts for it are presented
in Table 1. The first column of this table shows the rank numbers, the second the PoS classes,
the third their observed frequencies in the sample, and the fourth the theoretically expected
frequencies according to the formula. At the bottom of the table the values of the parameters
a and b and of the determination coefficient R² are given. The coefficient of determination is
a measure of goodness of fit: it shows how well a statistical model fitted to empirical data
describes them. R² ranges between 0 and 1; a value of R² > 0.8 is taken to indicate a good fit.
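
A minimal sketch of such a fit is given here in Python, using only the standard library. The text does not say which software or fitting procedure was used, so this is an assumption: ordinary least squares on y = a · exp(−b · x), with a computed in closed form for each b and b found by a grid scan followed by a golden-section refinement. The function name fit_exponential is hypothetical, and the resulting values of a, b and R² need not coincide exactly with those reported in the tables.

```python
import math


def fit_exponential(freqs):
    """Least-squares fit of y = a * exp(-b * x) to ranked frequencies.

    Returns (a, b, r2). Illustrative sketch, not the authors' procedure.
    """
    ys = sorted(freqs, reverse=True)        # rank 1 = most frequent class
    xs = range(1, len(ys) + 1)

    def best_a(b):
        # For a fixed b the optimal a has a closed form (linear least squares).
        num = sum(y * math.exp(-b * x) for x, y in zip(xs, ys))
        den = sum(math.exp(-2 * b * x) for x in xs)
        return num / den

    def sse(b):
        a = best_a(b)
        return sum((y - a * math.exp(-b * x)) ** 2 for x, y in zip(xs, ys))

    # Coarse grid over b in (0, 5], then golden-section search in the best cell.
    step = 0.01
    b0 = min((i * step for i in range(1, 501)), key=sse)
    lo, hi = max(b0 - step, 0.0), b0 + step
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(60):
        c = hi - phi * (hi - lo)
        d = lo + phi * (hi - lo)
        if sse(c) < sse(d):
            hi = d
        else:
            lo = c
    b = (lo + hi) / 2
    a = best_a(b)

    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
    mean = sum(ys) / len(ys)
    ss_tot = sum((y - mean) ** 2 for y in ys)
    r2 = 1 - sse(b) / ss_tot
    return a, b, r2


# Observed frequencies reported in the text for sonnet T3.
a, b, r2 = fit_exponential([27, 15, 9, 5, 4, 3, 2])
print(f"a = {a:.3f}, b = {b:.3f}, R2 = {r2:.4f}")
```

Profiling out a keeps the search one-dimensional, which avoids any dependence on an external optimization library.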


   In our case the value of R² is over 0.99, which implies a very good fit. This is shown
graphically in Figure 1.

    The x-axis shows the ranks and the y-axis the values of the observed (empirical) and
theoretically expected frequencies. As shown in Figure 1, the observed frequencies (OBSF)
are very close to those expected on the curve (EXPF).
    The results of fitting for all the sonnets are shown in Table 2.
    All the values of the determination coefficient R² are high. Even the lowest R² values,
obtained for the distribution of PoS in T4 (R² = 0.8897) and in T7 (R² = 0.8924), exceed 0.8
and thus indicate a good fit. Since the sonnets were written by different poets whose style
and manner of writing as well as topics were different, this suggests that the distribution of
PoS does not depend on such individual matters but displays some kind of regularity.
    From the point of view of how description takes place in sonnets, one can group different PoS
into two classes. One of these classes actualizes a static vision of the poetic world [Naumann,
Popescu, Altmann, 2012]. In this case the author depicts the world by attributing to the themes,
expressed by nouns, some features which are viewed as more or less permanent qualities. This
is achieved by using such PoS as A, PRNA, PTLA and STW. The other class, on the contrary,
gives a description which can be called dynamic, because the features ascribed to the themes
in the sonnet are represented as a process or action. This class includes V and PTL. Below
we analyze how these two classes of PoS interrelate with one another [Martynenko, 2004].


    Replacing each PoS of the two classes by the name of the class to which it belongs (S for the
static description and D for the dynamic one) and omitting all other PoS, one obtains sequences
which characterize the level of homogeneity of description.
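
This relabelling can be sketched in Python as follows. The class membership is taken from the text (A, PRNA, PTLA, STW are static; V and PTL are dynamic); the function name to_description_sequence and the sample tag sequence are hypothetical and serve only as an illustration.

```python
STATIC = {"A", "PRNA", "PTLA", "STW"}   # static description (S)
DYNAMIC = {"V", "PTL"}                  # dynamic description (D)


def to_description_sequence(tags):
    """Map PoS tags to 'S'/'D' labels in text order, skipping all other PoS."""
    labels = []
    for tag in tags:
        if tag in STATIC:
            labels.append("S")
        elif tag in DYNAMIC:
            labels.append("D")
        # Nouns, adverbs and personal pronouns are omitted.
    return labels


# Hypothetical tag sequence, invented for illustration (not from the data-base).
example = ["N", "V", "PRNP", "A", "N", "PTL", "ADV", "STW"]
print(to_description_sequence(example))
```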
    Let us consider T3 again as an example. After marking up the two above-mentioned classes
we get the following sequence:

   D − D − S − S − S − S − S − D − D − D − D − D − S − D − S − S − D − S − D − D − D.

This sequence consists of a number of strings formed by repeated elements which further on will be
called “runs” [Andreev, Mistecky, Altmann, 2018, p. 50–52]. Here the following runs can be singled
out:

[D − D] − [S − S − S − S − S] − [D − D − D − D − D] − [S] − [D] − [S − S] − [D] − [S] − [D − D − D].

A small number of runs consisting of long chains of identical elements indicates intensified
monotony of description; a large number of runs with few elements in them, on the contrary,
suggests variability in depicting the poetic world. The index of homogeneity is the number of
elements divided by the number of runs, i.e. the mean run length. In this example the total
number of elements in all the runs equals 21 and the number of runs equals 9. Thus the index of
the homogeneity of description is Itotal = 21/9 = 2.33. Measuring the homogeneity of static and
dynamic descriptions separately we get the following:

(1) static Ic = 9/4 = 2.25;
(2) dynamic Iv = 12/5 = 2.4.
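
The computation of the three indices for T3 can be checked with a short Python sketch. The function name homogeneity is an assumption, as is the convention of returning 0.0 when a class does not occur at all; the text does not discuss that case.

```python
from itertools import groupby


def homogeneity(sequence):
    """Return (I_total, I_static, I_dynamic) as mean run lengths.

    Each index is the number of elements divided by the number of runs,
    taken over all runs, over S runs only and over D runs only.
    """
    # groupby() splits the sequence into maximal runs of equal labels.
    runs = [(label, len(list(group))) for label, group in groupby(sequence)]

    def index(labels):
        lengths = [n for label, n in runs if label in labels]
        # Assumption: an absent class gets index 0.0.
        return sum(lengths) / len(lengths) if lengths else 0.0

    return index("SD"), index("S"), index("D")


# The S/D sequence given in the text for sonnet T3.
t3 = "D D S S S S S D D D D D S D S S D S D D D".split()
i_total, i_static, i_dynamic = homogeneity(t3)
print(round(i_total, 2), i_static, i_dynamic)   # 2.33 2.25 2.4
```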


   Table 3 contains indices of homogeneity of different types of description in all 20 sonnets. With the
exception of T20, the homogeneity of description in the samples shows a comparatively limited range
of variability. It should be noted that sonnets are rather brief, limited to 14 lines, and nevertheless
in a number of cases the differences are apparent. This can be shown graphically. The scatterplot in
Figure 2 shows the relation between the two indices, Ic and Iv, in the 20 sonnets. The x-axis shows
the values of Ic and the y-axis the values of Iv.


4     Conclusion & Discussion
The scatterplot shows that priority should be given to the y-axis coordinate. The x-axis does not
provide a basis for classification: except for one outlier (T20), all the texts form a rather dense
group. On the other hand, the index Iv, which shows the dynamic homogeneity, divides the sonnets into
two groups. The first one consists of T2, T3, T5, T9 and T17. It should be noted that T2 and T9
completely overlap and are marked by a common dot. All the other sonnets form the second group. At the
level of Iv = 1.5 it is split into two subgroups with an equal number of texts. The value Iv = 1.5
itself is observed in three texts (T15, T18, T20), which form the borderline; above and below this
borderline there are 6 texts in each subgroup.
    On the whole, it is possible to conclude that this research has demonstrated some order in the
distribution of PoS in sonnets and in the relations between words which provide static and dynamic
descriptions of the poetic world.


References
[Best, 1994] Best K.-H. (1994). Word class frequencies in contemporary German short prose texts
     // Journal of Quantitative Linguistics. 1994. Vol. 1. P. 144–147.

[Stamou, 2008] Stamou C. (2008). Stylochronometry: Stylistic Development, Sequence of Composition,
    and Relative Dating // Literary and Linguistic Computing. 2008. Vol. 23 (2). P. 181–199.

[Gasparov, 2012] Gasparov M. L. (2012). Exact methods of grammar analysis in verse // M. L. Gasparov.
    Selected works. Vol. 4. Moscow: Languages of Slavic Culture, 2012. P. 23–35. (In Russian)
    = Tochniye metody analiza grammatiki v styhe // M. L. Gasparov. Izbranniye trudy. T. 4. 2012.
    S. 23–35.

[Čech, Altmann, 2013] Čech R., Altmann G. (2013). Descriptivity in Slovak lyrics // Glottotheory.
     2013. Vol. 4 (1). P. 92–104.

[Andreev, Popescu, Altmann, 2017] Andreev S., Popescu I.-I., Altmann G. (2017). Skinner’s hypothesis
    applied to Russian adnominals // Glottometrics. 2017. Vol. 36. RAM-Verlag. P. 32–69.

[Andreev, Mistecky, Altmann, 2018] Andreev S., Mistecky M., Altmann G. (2018). Studies in quantitative
    linguistics – 29. Lüdenscheid: RAM-Verlag, 2018. – 130 p.

[Naumann, Popescu, Altmann, 2012] Naumann S., Popescu I.-I., Altmann G. (2012). Aspects of
    nominal style // Glottometrics. 2012. Vol. 23. P. 23–55.

[Martynenko, 2004] Martynenko G. Ya. (2004) Rhythmic-semantic dynamics of the Russian classical
    sonnet. Saint-Petersburg: SPb University, 2004. – 30 p. (In Russian) = Ritmiko-smyslovaya
    dynamika russkogo klassicheskogo soneta. SPb: SPbGU, 2004. – 30 s.



</pre>