=Paper= {{Paper |id=Vol-2667/paper9 |storemode=property |title=Improvement of the algorithm of automated definition of rhyme |pdfUrl=https://ceur-ws.org/Vol-2667/paper9.pdf |volume=Vol-2667 |authors=Vladimir Barakhnin,Olga Kozhemyakina,Ilya Pastushkov,Irina Kuznetsova,Yulia Borzilova }} ==Improvement of the algorithm of automated definition of rhyme == https://ceur-ws.org/Vol-2667/paper9.pdf
            Improvement of the algorithm of automated
                      definition of rhyme
Vladimir Barakhnin, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia, bar@ict.nsc.ru
Olga Kozhemyakina, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia, ORCID: 0000-0003-3619-1120
Ilya Pastushkov, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia, ORCID: 0000-0002-0341-7931
Irina Kuznetsova, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia, ORCID: 0000-0002-6890-1636
Yulia Borzilova, Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia, ORCID: 0000-0002-8265-9356

Abstract: The paper considers approaches to improving one step of the algorithm used for the automated determination of rhyme in poetic texts. The automated rhyme detection tool is one of the modules of a system for the complex analysis of poetic texts. In the current implementation of the module, the rhyme search and definition subtask is solved by finding words with consonant endings using A. A. Zaliznyak's Grammar Dictionary of the Russian Language and the basic rules of phonetic analysis. Alternative solutions to the problem of searching the dictionary for words with consonant endings are proposed. The results obtained allow us to conclude that the current implementation is promising and that the methods used can be refined to solve the problem of determining the rhyme of a poetic text.

Keywords: analysis of poetic texts, metrorhythmic analysis, rhyme identification

I. INTRODUCTION

One of the tasks of the automated complex analysis of poetic texts [1] is to determine the characteristics related to the metrorhythmics of a poem. Among the works in which statistical information extracted from poetic text was used to solve philological problems, we can mention the study [2], which, although compiled manually, presents a rather comprehensive statistical picture of the metrorhythmics of Pushkin's works and allows its authors to find the patterns inherent in Pushkin's rhyme. Modern information technologies make it possible to conduct such studies, if not completely automatically, then with minimal involvement of expert philologists.

The automated analysis of poetic texts has to be adapted to each language. The differences in approach are caused both by the specifics of the language (in particular, the way poetic texts are constructed) and by the tools used by researchers. The toolkit, in turn, depends on the goals set by the researchers (for example, to confirm some regularity in the structure of the poetic text) and, to some extent, on the technologies that were available at the time of the study.

As for the linguistic versatility of such instruments, it is impossible to develop a system for the automated analysis of meter and rhythm designed for a wide range of languages. Moreover, even developing a metrorhythmic analysis system suitable for a group of related languages is an insoluble task: each language requires its own approaches that take its structure into account. The authors of the study [3] conducted such an experiment for languages similar in structure, and it ended unsuccessfully due to the specifics of each of the languages considered.

The problem of analyzing the metrorhythm of poetic texts is solved differently for each language (or group of similar languages). Next, we consider some of the projects that solve this problem for different languages.

D. Fusi in studies [4, 5] introduced the Chiron system, which supports the analysis of several languages (Latin and Greek). The system is built at a level of abstraction that makes it possible to work with several different languages, meters and texts. The system has a modular structure: each module passes data to the next one in a predetermined format. The higher the level in the hierarchical chain, the more abstract the analysis performed by the component. The hierarchy levels in the system are:

    • phonetics and prosody;
    • appositives and clitics;
    • metric scan.

The author of [4, 5] does not report accuracy statistics for each level (phonetics, clitics, metrics), but it can be assumed that the accuracy is not perfect. The author emphasizes that the developed system (like similar ones) is not meant to replace the expert completely; its main task is to provide researchers with data whose preparation would otherwise consume a significant share of human effort.

B. Navarro proposed a tool that analyzes the meter of Spanish sonnets and performs semantic analysis of poems [6, 7]. Currently, this system is applied to a corpus of 5078 sonnets of the 16th and 17th centuries. The corpus is converted to the TEI format¹; the input to the system is the sequence of characters of one poem without additional markup. A rule-based module splits the text into syllables; an external grammatical markup system is used for this. If the syllabic partition produces 10 metric syllables, the system considers the scan complete. For non-standard situations, a number of rules are applied (a detailed description of the rules is not given by the authors of the project).

¹ TEI: Text Encoding Initiative: https://tei-c.org/


Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

The system [3], mentioned previously, is a tool for the complete analysis of poetic texts in Portuguese. The system takes poems in XML format as input and scans each poem independently of the other poems in the corpus. The analysis includes the following steps:

    • text preprocessing (conversion to XML format);
    • extraction of words from the poem;
    • finding the stressed syllable;
    • division into syllables;
    • forming the phonetic transcription (using an independent dictionary);
    • selection of transcription options for each poem (determination of the rhythmic scheme);
    • an attempt to determine the meter of the poem;
    • search for matching meters based on the most appropriate rhythmic scheme;
    • splitting the work into syllables according to the meter.

The results of the analysis [3] showed high accuracy (95–98%); however, for other languages (similar in structure), the experiment on the analysis of poetic texts ended unsuccessfully due to the specifics of the individual languages.

M. Agirrezabal et al. [8] developed the ZeuScansion system, which performs scansion of English poems. The system uses dictionaries to determine the stressed syllable in a word. While combining words to form the stress pattern of the whole poem, the system also performs syntax analysis, followed by a series of rules. If a word is not found in the dictionary, the program heuristically searches for the closest word and uses it.

R. Ibrahim and P. Plecháč [9, 10] developed the KVĚTA system for the analysis of Czech poems. The system takes as input a poem whose words must carry morphological markup. KVĚTA applies a series of rules that transform the poetic text into a phonetic transcription; if the rules cannot be applied, a dictionary is used. The system compares the patterns found in the poems with the generated variations. Initially, the idea of a metric index was used [11]. Later, the authors used a metric coefficient that takes some other parameters into account, which increased the accuracy from 94.88% to 95.94%.

A number of works are devoted to the analysis of versification for Arabic and similar languages. A. Kurt and M. Kara [12] proposed an algorithm for recognizing and analyzing poems written in “arud”, a versification system typical of eastern (Arabic, Persian, Turkish) poetry. M. A. Alnagdawi described a method for finding poetic meters using context-free grammars [13]. A. Almuhareb et al. [14] described some methods for detecting poetic patterns in Arabic in order to extract verses.

For the Russian language, a number of solutions to the problem of metrorhythmic analysis have also been proposed. In study [15], automatic procedures for specifying a poetic text (metrorhythmic marking and identification of the verse meter) were considered. The automation of metrorhythmic marking is achieved by the following procedures [16]:

    • line numbering of the poem;
    • tokenization of words;
    • accentuation of the poem;
    • selection of rhymed lines;
    • syllabic determination.

The authors² developed an open network resource consisting of two components: the problem-oriented “Poetology Thesaurus” and the “Block of Analysis and Specification” of text objects. In the “analysis and specification” block, two sets of tasks are identified [15]: the specification of the terminological articles of the thesaurus and the specification of a poem. The complex includes solutions to the following groups of problems:

    • metrorhythmic marking of the text;
    • filling in the fields of the specification of the poem;
    • meter identification.

Among the tools that perform metrorhythmic analysis, two web resources³,⁴ are of interest. The first of them, Rifmoved.ru, is positioned as a supporting tool for the analysis of poetic text that determines the stanza and the form (sonnet, sestina). The algorithms were developed on the basis of V. Onufriev's own concept of programmatic poetry analysis; however, no theoretical description of these algorithms was found in scientific sources. The authors of the resource indicate that the algorithms are designed to analyze poetic texts written in traditional forms, classical stanzas and meters. This greatly limits the use of the tool for large corpora of texts by poets who do not belong to classical literature.

The second resource, Neogranka.ru, is evidently an amateur web portal for determining the poetic meter, generating new poems and selecting rhymes. When a user tries to determine the meter of a verse, the service asks the user to resolve every ambiguous case (accentuation options), which takes a lot of the user's time. No theoretical description of the algorithms used is available either.

It is important to note that almost all the algorithms mentioned above are aimed at studying relatively small text corpora covering the work of one or several authors; therefore, the speed of rhyme determination algorithms is not a critical parameter. However, the research conducted at the Federal Research Center for Information and Computational Technologies (FRC ICT) plans to study the interdependence of the phonometric and lexical-thematic levels of poetic texts, with the aim of identifying and measuring the relationships between semantic associations, described on the basis of semantic fields, and poetic meters: the so-called textures that take into account the construction and metrorhythmics (a detailed statement of the problem is given in [17]). One of the main difficulties in solving this problem is the need to analyze large corpora of poetic texts, which makes it necessary to optimize the running time of rhyme search algorithms. Typically, these algorithms use multiple queries to databases containing

² Wikipoetics: http://wikipoetics.ru/
³ Rifmoved.ru. http://rifmoved.ru/analiz_stihov.htm
⁴ Neogranka.ru. http://neogranka.ru/razmer_stiha.html



VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                    37

phonetic transcriptions of words, so we are faced with the task of optimizing such SQL queries. Note that this task is becoming relevant in all areas of scientific research that work with big data, from business analytics to the analysis of Earth remote sensing data (for example, [18, 19]).

II. THE PROBLEM STATEMENT

A web application⁵ has been developed at FRC ICT for analyzing the structural level of Russian-language poetic texts. Its algorithms are described in [20]; they do not handle complex cases of analysis of poetic texts, so [21] proposed implementing the algorithms from [15], which include a more rigorous classification of poems by meter. However, the rhyme determination algorithm from [15] relies on the web-based application “Big Rhyme Dictionary”⁶: the application receives a word and returns the full set of words rhyming with it (out of context). This approach takes a lot of time and resources; therefore, the rhyme search algorithm of [21] is based on testing whether a rhyme is possible: two lines rhyme if their last words have the same position of the stressed syllable and their endings match phonetically.

To identify phonetically matching endings, the data about endings from article [22] are used. The algorithm looks up a word in a table of words aggregated from A. A. Zaliznyak's dictionary [23]; the lookup is implemented in a standard way.

The purpose of the study is to find alternative approaches to searching for rhyming lines in the database and to conduct a series of experiments to determine the most effective method for use in the algorithm. The proposed solution:

    1) Build a table with inverted (reversed) strings sorted lexicographically.
    2) Split all words into sections by ranges of endings; in other words, store the (inverted) endings separately from the word, as metadata.
    3) Add trigram character indexes to the original table with all the words.
    4) Perform an experiment on finding rhyming words using sections (searching only the endings) and trigram indexes.
    5) Compare the performance of the section search, the index search and a combination of these options.

To test the hypothesis about the effectiveness of trigram character indexes, it was decided to conduct an additional experiment measuring the performance of SQL queries that use these indexes in the search module of the system for the complex analysis of poetic texts. This module searches by low-level characteristics (for example, metrorhythmic statistics) and high-level characteristics (for example, genre and style affiliation). When a search is performed, SQL queries to the database are generated, some of which include a search by values of the varchar and text types. The execution time of such queries can be reduced by using character indexes.

III. THE RESEARCH PROCESS

A. Data preparation

One option for implementing the search is to partition the source table with words into sections. The version of the PostgreSQL database deployed on the FRC ICT server supports simple partitioning: the splitting of one large logical table into several smaller physical sections⁷. The benefits of using sections:

    • When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of a sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
    • Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
    • Seldom-used data can be migrated to cheaper and slower storage media.

PostgreSQL supports range partitioning (for example, one might partition by date ranges) and list partitioning, in which the table is partitioned by explicitly listing which key values appear in each partition. In this study, list partitioning is used, with the endings of the dictionary words as key values.

To create the list of sections in the form of tables, a Python script is used that operates by the following algorithm:

    1) Query the table with words.
    2) Select the last N characters of each word.
    3) Form an array of all dictionary endings.
    4) Count the occurrences of each ending and sort the endings in descending order of frequency.
    5) Take the first M endings from the array; the sectioning is performed on the basis of these endings.

The last N characters are taken with N = 4; this value can be changed in the future. To build the sorted dictionary, we use the Counter class of the collections module⁸. The result is a dictionary of the following structure:

    Counter({'НОГО': 86077, 'ЕЙСЯ': 76978, 'ВШЕЙ': 76400, 'ИМСЯ': 62934, 'ШЕГО': 61719, 'ГОСЯ': 57630, 'ИХСЯ': 57617, 'НОМУ': 57354, 'ВШИМ': 57282 ...})

In the resulting dictionary, the key is the ending (the last N characters), and the value is the number of occurrences of this ending. It was decided to select from the created dictionary the endings whose occurrence counts fall within the 90th percentile. These endings were used to create the sections.

The process of creating the partitioned tables includes the following steps:

⁵ Analysis of poetic texts online. http://poem.ict.nsc.ru/
⁶ Big Rhyme Dictionary. http://rifmovnik.ru/docs.htm
⁷ PostgreSQL: Documentation 9.4: Partitioning. https://www.postgresql.org/docs/9.4/ddl-partitioning.html
⁸ Collections - Container datatypes. https://docs.python.org/3.7/library/collections.html
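The ending-selection script outlined in the data preparation subsection (take the last N characters of each word, count them, keep the most frequent ones) might look roughly like the Python sketch below. It is an illustration only, not the authors' script: the function name select_section_endings, the parameters n_last and top_m, the toy word list, and the decision to skip words shorter than N characters are all assumptions. The study selects endings by the 90th percentile of counts, whereas this sketch uses a fixed number of endings for simplicity.

```python
from collections import Counter


def select_section_endings(words, n_last=4, top_m=3):
    """Return the top_m most frequent n_last-character endings with their counts."""
    # Steps 2-3: collect the last n_last characters of every word.
    # Assumption: words shorter than n_last characters are skipped.
    endings = [w[-n_last:].upper() for w in words if len(w) >= n_last]
    # Step 4: count the occurrences of each ending.
    counts = Counter(endings)
    # Step 5: most_common() returns endings in descending order of frequency.
    return counts.most_common(top_m)


# Toy word list standing in for the dictionary table (hypothetical data).
sample_words = ["красного", "чёрного", "важного", "синему", "красному", "дом"]
top_endings = select_section_endings(sample_words, n_last=4, top_m=2)
```

In a real run, the word list would come from the query in step 1, and the selected endings would become the key values of the list partitions.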




    1) Create a parent table whose properties are inherited by all the child tables (sections). The parent table is a table with words from the dictionary of A. A. Zaliznyak [23] with the following structure:
        a) identifier;
        b) word;
        c) accentuation (the number of the syllable on which the stress falls);
        d) word type.
    As an additional column, the ending (the last N characters) of each word is added in inverted order.
    2) Create the child tables, which inherit the structure of the parent table. These tables have no columns other than the inherited ones. All child tables are called sections.
    3) Add constraints to the section tables that define the valid key values for each section. The constraints do not overlap: no key value belongs to several sections at once.
    4) Create an index on the key column of each section. In this study, the indexes were created on the “word” column.
    5) Define a trigger that redirects data added to the main table to the corresponding section. The trigger fires when the SQL command INSERT is run.

The trigger launches a function that adds the values to the corresponding section (table). Fragment of the function:

    CREATE OR REPLACE FUNCTION words_with_reversed_endings_insert_function()
    RETURNS TRIGGER AS $$
    BEGIN
        -- Route the new row to the section that matches its ending;
        -- the ending is stored in reversed form.
        IF (NEW.ending = 'нии') THEN
            INSERT INTO words_with_endings_nii (id, word_form, ending, accent, word_type)
            VALUES (NEW.id, NEW.word_form, reverse(NEW.ending), NEW.accent, NEW.word_type);
        ELSIF (NEW.ending = 'ний') THEN
            INSERT INTO words_with_endings_niy (id, word_form, ending, accent, word_type)
            VALUES (NEW.id, NEW.word_form, reverse(NEW.ending), NEW.accent, NEW.word_type);
        -- ... (the fragment ends here; the remaining branches are not shown)

The tables, indexes, trigger and function are created automatically by a Python script. Table and index names are built from transliterated endings (the transliterate⁹ package is used), so some cases required manual adjustment of the names, for example, when the endings “ЕМСЯ” and “ЁМСЯ” produce the same name after transliteration.

In the context of a PostgreSQL database, a trigram is a group of three consecutive characters. We can measure the similarity of two strings by counting the number of matching trigrams. This simple idea turns out to be very effective for measuring the similarity of words in many natural languages, as well as for solving related problems, such as fuzzy search (search by similarity).

PostgreSQL supports two types of text indexes¹⁰: GIN (Generalized Inverted Index) and GIST (Generalized Search Tree). Both can work with character trigrams, which is a prerequisite for using GIN here, since by default it operates on lexemes. Although GIN, by its description, is very similar to the experiment with inverted strings, GIST is also worth applying: its tree-like structure increases the completeness of search results by including inexact hits, which suits the rhyme search task well, since the table from the work of V. M. Zhirmunsky [22] contains, among others, pairs of endings that do not coincide in spelling.

In the search module of the system for the complex analysis of poetic texts, text indexes can be added to solve the following problems:

    • search by accentuation mask;
    • search by words from the title and the text.

The corpus of Pushkin's works was loaded into the database; the main tables with the texts of the works and their metadata contain more than 700 rows. The search query not only solves the above problems directly but is also adapted for the user: the response includes additional fields, such as the author's full name and the title of the poem, i.e. the query is composite. In the experiment, the time spent on processing these additional parts of the query is not taken into account.

B. Experiments

The following experiments on searching for rhymes in the PostgreSQL database were planned:

    1) Search for the desired ending among the section names: SELECT * from pg_catalog.pg_tables where %section name conditions%. Note that this is the only experiment that uses the inverted strings described above.
    2) Search for endings by partial match with LIKE on a table without indexes.
    3) Search for endings by partial match with LIKE on a table with a GIN index built on character trigrams: CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);
    4) Search for endings by partial match with LIKE on a table with a GIST index built on character trigrams: CREATE INDEX trgm_idx ON test_trgm USING GIST (t gist_trgm_ops).

For the experiment, the smallest possible sample of 100 endings was taken: 80% of the sample was randomly selected from the most frequently used endings (the first 500), and the remaining 20% was randomly selected from the next 100 endings by frequency. The time spent was measured for each of the options (1)–(4). In each experiment, the following characteristics were obtained (the abbreviations are given in brackets):

⁹ Transliterate – PyPi. https://pypi.org/project/transliterate/
¹⁰ Postgres Pro Standard. https://postgrespro.ru/docs/postgrespro/9.5/textsearch-indexes



VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                      39
Data Science

    • average SQL query runtime (avg);
    • 50th percentile (median);
    • 90th percentile (90 perc);
    • 95th percentile (95 perc);
    • 98th percentile (98 perc).

    The results are shown in table I.

            TABLE I.  THE RESULTS OF THE EXPERIMENT
    Search option                  Avg     Median   90 perc   95 perc   98 perc
    Search among section names     0.798   0.831    1.704     1.942     2.381
    Without indexes                2.497   2.364    3.450     4.109     4.193
    GIN                            2.258   2.130    2.974     3.343     3.645
    GIST                           2.407   2.392    3.179     3.330     3.607

    A number of conclusions can be drawn:

    • The least time-consuming option was (1), the search among section names. This result is partly explained by the endings for which no section had been created: in such cases the cost of executing the SQL query was negligible.
    • Searches without indexes and searches using the GIST index differ only slightly from each other, which indicates that the GIST index is not appropriate for the research problem.
    • The GIN index for incomplete-match search (3) showed satisfactory results.

    An additional experiment measured the time spent searching by accentuation mask or by words from the poems; it consisted of constructing search queries and comparing their performance. GIST and GIN character-trigram indexes were added in turn to the text fields used in the query, namely “Mask of accentuation” and “Text of the poem”; at the first stage of the experiment, the query runtime without indexes was measured. Types of executed queries:

    • without indexes;
    • using the GIST index (the operator class gist_trgm_ops was used);
    • using the GIN index (the operator class gin_trgm_ops was used).

    A fragment of a typical SQL query for which the runtime was measured:
      SELECT
           a."ID" as AUTHOR_ID,
           a."LASTNAME", a."FIRSTNAME", a."MIDDLENAME",
           p."ID" as POEM_ID, p."NAME" as POEM_NAME,
           m."ACCENTUATION_MASK"
      FROM "AUTHOR" a
      LEFT JOIN
           "POEM" p ON p."AUTHOR_ID" = a."ID"
      LEFT JOIN
           "MRSTATISTICS" m ON m."POEM_ID" = p."ID"
      WHERE
           m."ACCENTUATION_MASK" LIKE 'cC cC cC c'

    The query runtime was measured for different variants of the search conditions (for example, searching for different numbers of words); each result is the average over 10 queries with GIST and GIN indexes. The results of the experiment are shown in table II.

    TABLE II.  THE RESULTS OF AN EXPERIMENT TO ADD INDEXES TO A SEARCH MODULE
                                              Avg query runtime, msec
    Search type                               Without indexes   GIST   GIN
    Search by accentuation mask               99                117    116
    Search by words from the name and text    129               146    144

    The additional experiment showed that adding the indexes increased the processing time of queries on the text values used in the SQL query. This increase may be caused either by insufficient test data or by the inefficiency of the applied indexes for the problem being solved.

                              IV. CONCLUSIONS
    The use of PostgreSQL built-in tools has long been limited to search engines in the modern sense, where the DBMS (Database Management System) returns results in response to a natural-language query. For the task of rhyme search, program performance is not a determining factor. The present work showed the most promising approaches, together with examples of how to speed up the algorithm significantly in a few simple steps, which allows other researchers to apply these approaches in their own work without expert knowledge of the PostgreSQL database. In addition, the interface for accessing the DBMS does not change (except for the manual construction of a table with inverted rows), which is convenient for developers who integrate text analysis systems with the PostgreSQL database.

                              ACKNOWLEDGMENT
    The study was carried out with the support of the Russian Science Foundation (project No. 19-18-00466).

                              REFERENCES
[1] V. Barakhnin and O. Kozhemyakina, “About the automation of the complex analysis of Russian poetic text,” CEUR Workshop Proceedings, vol. 934, pp. 167-171, 2012.
[2] N.V. Lapshina, I.K. Romanovich and V.I. Yarkho, “Metrical Handbook for Pushkin’s poems,” M., L.: Academia, 1934.
[3] A. Mittmann, “Escansão automático de versos em português,” Universidade Federal de Santa Catarina, 2016.
[4] D. Fusi, “An Expert System for the Classical Languages: Metrical Analysis Components,” Lexis, vol. 27, pp. 25-45, 2008.
[5] D. Fusi, “A Multilanguage, Modular Framework for Metrical Analysis: IT Patterns and Theorical Issues,” Langages, vol. 3, no. 199, pp. 41-66, 2015.
[6] B. Navarro, “A Computational Linguistic Approach to Spanish Golden Age Sonnets: Metrical and Semantic Aspects,” Proceedings of the Fourth Workshop on Computational Linguistics for Literature, pp. 105-113, 2015.
[7] B. Navarro, M.R. Lafoz and N. Sánchez, “Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and





     Evaluation,” Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC), pp. 4360-4364, 2016.
[8] M. Agirrezabal, “ZeuScansion: a Tool for Scansion of English Poetry,” Journal of Language Modelling, vol. 4, no. 1, pp. 3-28, 2016.
[9] R. Ibrahim and P. Plecháč, “Towards the Automatic Analysis of Czech Verse,” Formal Methods in Poetics, Lüdenscheid: RAM-Verlag, pp. 295-305, 2011.
[10] P. Plecháč, “Czech Verse Processing System KVĚTA — Phonetic and Metrical Components,” Glottotheory, vol. 7, no. 2, 2016.
[11] I. Pilshchikov and A. Starostin, “The problems of automation of basic procedures rhythmic parsing accentual-syllabic texts,” Russian National Corpus: 2006-2008: New results and perspective, pp. 298-315, 2009.
[12] A. Kurt and M. Kara, “An algorithm for the detection and analysis of arud meter in Diwan poetry,” Turkish journal of electrical engineering & computer sciences, vol. 20, no. 6, pp. 948-963, 2012.
[13] M.A. Alnagdawi, H. Rashideh and A.F. Aburumman, “Finding Arabic Poem Meter using Context Free Grammar,” Journal of Communications and Computer Engineering, vol. 3, no. 1, pp. 52-59, 2013.
[14] A. Almuhareb, “Recognition of Classical Arabic Poems,” Proceedings of the Workshop on Computational Linguistics for Literature, pp. 9-16, 2013.
[15] V.N. Boikov, M.S. Karyaeva, V.A. Sokolov and I.A. Pilshchikov, “On an Automatic Procedure for the Specification of a Poetic Text for an Open Information-Analytical System,” CEUR Workshop Proceedings, vol. 1536, pp. 144-151, 2015.
[16] I. Pilshchikov and A. Starostin, “Reconnaissance automatique des mètres des vers russes: Une approche statistique sur corpus,” Langages, vol. 3, no. 199, pp. 89-106, 2015.
[17] K. Taranovsky, “About the relationship between poetic rhythm and topic,” About poetry and poetics, Moscow: Languages of Russian culture, pp. 372-403, 2000.
[18] N.I. Golov and L. Ronnback, “SQL query optimization for highly normalized Big Data,” Business Informatics, vol. 33, no. 3, pp. 7-14, 2015.
[19] L.I. Lebedev, Yu.V. Yasakov, T.Sh. Utesheva, V.P. Gromov, A.V. Borusjak and V.E. Turlapov, “Complex analysis and monitoring of the environment based on Earth sensing data,” Computer Optics, vol. 43, no. 2, pp. 282-295, 2019. DOI: 10.18287/2412-6179-2019-43-2-282-295.
[20] V.B. Barakhnin, O.Y. Kozhemyakina and A.V. Zabaykin, “The Algorithms of Complex Analysis of Russian Poetic Texts for the Purpose of Automation of the Process of Creation of Metric Reference Books and Concordances,” CEUR Workshop Proceedings, vol. 1536, pp. 138-143, 2015.
[21] V.B. Barakhnin, O.Yu. Kozhemyakina and I.V. Kuznetsova, “Development and Implementation of the Algorithm for Automatic Analysis of Metrorhythmic Characteristics of Russian Poetic Texts,” CEUR Workshop Proceedings, vol. 2523, pp. 290-298, 2019.
[22] V.M. Zhirmunsky, “Rhyme, its history and theory,” Petrograd: Academia, 1923.
[23] A.A. Zaliznyak, “Grammatical dictionary of the Russian language. The changing word forms: about 10,000 words,” M.: Russian language, 1980.
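
    As an illustration of the trigram comparison that underlies search options (3) and (4), the following Python sketch counts shared character trigrams for two endings. It is not the authors' implementation and does not call PostgreSQL: the function names trigrams and similarity are hypothetical, and the padding rule (two leading spaces, one trailing space, lowercasing) is an assumption modeled on the documented behaviour of the pg_trgm extension for a single word.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a padded, lowercased string."""
    # Assumption: pad with two leading spaces and one trailing space,
    # following the convention documented for pg_trgm on a single word.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Share of common trigrams among all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    # The padded string always has at least three characters,
    # so the union below is never empty.
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # Endings mentioned in the paper; they differ only in the first letter.
    print(similarity("ЕМСЯ", "ЁМСЯ"))
    # The paper stores endings reversed; reversal here is plain string slicing.
    print(similarity("ЕМСЯ"[::-1], "ЁМСЯ"[::-1]))
```

    In this sketch two endings that differ in one letter still share the trigrams that do not contain that letter, which is the property a trigram index relies on when it narrows down candidates for an incomplete LIKE match.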



