=Paper=
{{Paper
|id=Vol-2870/paper52
|storemode=property
|title=Marking up Dramatic Text: a Case Study of “7 Stories” by Morris Panych
|pdfUrl=https://ceur-ws.org/Vol-2870/paper52.pdf
|volume=Vol-2870
|authors=Ivan Bekhta,Nataliia Hrytsiv,Anastasiia Matviychuk
|dblpUrl=https://dblp.org/rec/conf/colins/BekhtaHM21
}}
==Marking up Dramatic Text: a Case Study of “7 Stories” by Morris Panych==
Ivan Bekhta (a), Nataliia Hrytsiv (b) and Anastasiia Matviychuk (b)

(a) Lviv Franko National University, Universytetska Street, 1, Lviv, 79000, Ukraine

(b) Lviv Polytechnic National University, Stepana Bandery Street, 12, Lviv, 79000, Ukraine

Abstract. The paper elucidates the process, challenges and results of using natural language processing (NLP) tools and a pre-computer technique (TEI-based tagging of characters' utterances) to process a dramatic text. The material for analysis is the modern play “7 Stories” by the Canadian playwright Morris Panych, examined from the viewpoint of statistical indicators and textual coefficients. Special attention is paid to the statistical parameters of the main characters of the play. The results comprise the following numeric characteristics: number of meanings (N); maximal meaning (max); minimal meaning (min); range (R); mode (Mo); median (Md); mean (Ẋ); standard deviation (Ϭ); coefficient of variation (ν); standard error (Sẋ); measurement error (ε).

Keywords: translation, NLP, quantitative analysis, text mark-up, applied linguistics, drama text, tagging.

===1. Introduction===

Modern mark-up practices and NLP tools have proved their efficiency in the systematic processing and analysis of texts, and they have led to novel systems and well-elaborated tools for language processing. They are a powerful and promising technology for reconstructing arguments objectively, exemplifying findings amply and formulating well-grounded hypotheses. This study is part of a larger research project on the creative work of Morris Panych and the reception of his writings in translation. The paper presents a preliminary algorithm for identifying the reasons behind deviations between the statistical parameters of a source text (ST) and a target text (TT). Further elaborations are forthcoming.
The focus is on Morris Panych's drama “7 Stories” [1] and its Ukrainian translation [2]. We briefly discuss the key characteristics of marking up a dramatic text, illustrate the results obtained and demonstrate the main advantages of the approach. Based on the evidence acquired, we make certain considerations regarding the use of quantitative comparative analysis for further comparison of ST and TT statistics and ratios. Combining mathematical methods and tools with computer technology gives linguistic research an effective approach to the study of text and opens broad new perspectives for linguistic analysis. A complete and comprehensive description of language and speech requires careful insight into both the qualitative and the quantitative features of linguistic objects.

COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22–23, 2021, Kharkiv, Ukraine EMAIL: ivan.bekhta@gmail.com (I. Bekhta); nataliia.m.hrytsiv@lpnu.ua (N. Hrytsiv); anastasija838@gmail.com (A. Matviychuk) ORCID: 0000-0002-9848-1505 (I. Bekhta); 0000-0001-6660-7161 (N. Hrytsiv); 0000-0001-6056-1799 (A. Matviychuk) 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

In addition, the detailed study of dramatic texts is a separate challenge for current research, because drama is a unique literary genre with special requirements for the application of NLP tools and text mark-up. This makes the study of Morris Panych's play “7 Stories” relevant. Modern Canadian drama has been little studied from the philological, translatological and rhetorical viewpoints, and it has been studied least of all from the angle of mathematical linguistics and statistics.
To understand the specifics of dramatic works, we additionally considered the concept of authorial style and postmodern literature, to which the work under study belongs. We also considered the life paths of the author and the translator. Morris Panych's play “7 Stories”, translated by Ivan Krychfalushiy, is an example of postmodern literature, which arose as a challenge to, and an opposition against, the laws of modernism.

===2. Method and preparation characteristics===

Vast quantities of ST and TT data are available for analysis today, as discussed in [3, 4, 5, 6], and Natural Language Processing is among the most interesting and promising areas of data science [7, 8, 9, 10, 11, 12, 13]. The text data of an original work are difficult to process by default [14, 15, 16, 17, 18]. When the original has to be compared with and contrasted to a translated drama text [19, 20, 21, 22, 23, 24], the task becomes more complicated [25, 26, 27, 28], though highly appealing [29, 30, 31, 32, 33]. In this project we explore how NLP techniques, especially mark-up, can advance the processing of performance (drama) texts for the statistical profiling of the ST and the TT. Specifically, the project explores the distribution of the number of words in a sentence and other numeric characteristics. These are analyzed both for the text as a whole and for each character of the play, in contrast with the Ukrainian translation.

====2.1. Stages of working with the text document “7 Stories” by Morris Panych====

A number of actions were performed for the statistical analysis.
The analysis took place in the following stages:
# the printed books of the original text and of the translation were scanned with ABBYY FineReader for further processing;
# the scans were converted from PDF to .docx so that the text could be marked up;
# the formatting was checked, discrepancies between the scanned PDF files and the text documents were detected, and the text was normalized in the MS Word editor;
# a text mark-up system was selected according to its features;
# appropriate tags were applied to the original work;
# appropriate tags were applied to the translated version;
# the marked-up texts were processed with the Python programming language;
# the resulting statistical parameters N, max, min, R, Mo, Md, Ẋ, Ϭ, ν, Sẋ and ε were analyzed and described.

The original text and its translation were marked up using the same rules. The mark-up language was XML (eXtensible Markup Language), which allowed the research to be implemented at the structural level. XML was preferred because it fully determines the logical structure of a document. Data such as images, texts and other parts of a Web document can be defined and structured in XML regardless of the platform used to render them. Since this paper deals with a dramatic work, the mark-up and tag patterns were selected and adjusted for the analysis of this type of text. We now turn to the mark-up system specific to drama text.

====2.2. Mark-up pattern====

=====2.2.1. Pattern=====

The following text markings were chosen according to the features of the dramatic work:
* ... – paired marking used to indicate a self-contained part of the text belonging to a particular character;
* ... – paired marking used to indicate the name of the character together with the colon;
* ... – paired marking used to denote a sentence in the character's speech;
* ... – paired marking used to mark all the author's remarks (stage directions) throughout the text.

=====2.2.2. Example=====

The action of the play takes place outside an apartment building, on the ledge, outside various windows of the seventh storey. As the play progresses, the lights emphasize the time elapsed between early evening and late night. As the play opens, we hear a party in progress from one of the windows. MAN stands on the ledge, in a state of perplexity, contemplating the depths below. He seems disturbed, confused. Then he comes to what seems to be a resolution. He prepares to jump. When he is about to leap, the window next to him flies open. CHARLOTTE appears. She holds a man's wallet, which she attempts to throw out the window. RODNEY, charging up from behind, grabs her hand. A window-ledge struggle ensues.

CHARLOTTE Let GO of me!!! Let GO!!

RODNEY (threatening) So-help-me-GOD, CHARLOTTE.

CHARLOTTE (daring him) What?? WHAT??!!

RODNEY Give me back my wallet!

She tries to throw it again. They struggle.

RODNEY What’s WRONG with you? Are you CRAZY?!

CHARLOTTE YES! YES, I AM!!!

===3. Results===

This section presents statistics calculated from the number of words in a sentence, so the unit of measurement is the word. The findings contrast the ST and TT values of the statistical parameters N, max, min, R, Mo, Md, Ẋ, Ϭ, ν, Sẋ and ε. They are presented for each character of the play in turn.

====3.1. Analysis of the part of the text that belongs to the character "Charlotte"====

We analysed the distribution of the number of words in a sentence by absolute and relative frequency and obtained the following data. Each entry gives the sentence length in words, its absolute frequency and, in brackets, its relative frequency.

Charlotte: the whole ST data: 1 — 58 (90,62%); 2 — 4 (6,25%); 3 — 1 (1,56%); 4 — 1 (1,56%).
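Distributions of this kind can be derived from the marked-up text. The sketch below is a minimal illustration under stated assumptions and is not the study's actual code. The tag names of Section 2.2.1 are not given here, so it uses hypothetical TEI-style elements: `<sp who="...">` for a character's part, `<speaker>` for the name, `<s>` for a sentence and `<stage>` for stage directions. It also counts as a word any run of letters or digits, optionally joined by hyphens or apostrophes. The function names `sentence_lengths` and `frequency_table` are illustrative. The sketch collects the word count of every `<s>` element per character and converts the counts into absolute and relative frequencies. Stage directions are skipped because they are not `<s>` elements.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical mark-up (element names are assumptions, TEI-style):
# <sp who="..."> one character's turn, <speaker> the name,
# <s> one sentence, <stage> a stage direction (not counted).
SAMPLE = """<text>
  <stage>A window-ledge struggle ensues.</stage>
  <sp who="CHARLOTTE"><speaker>CHARLOTTE</speaker>
    <s>Let GO of me!!!</s><s>Let GO!!</s></sp>
  <sp who="RODNEY"><speaker>RODNEY</speaker><stage>(threatening)</stage>
    <s>So-help-me-GOD, CHARLOTTE.</s></sp>
  <sp who="CHARLOTTE"><speaker>CHARLOTTE</speaker><stage>(daring him)</stage>
    <s>What??</s><s>WHAT??!!</s></sp>
</text>"""

# Assumed word rule: letters/digits, optionally joined by "-" or an apostrophe,
# so "So-help-me-GOD" and "What’s" each count as one word. \w also matches
# Cyrillic letters, so the same rule can be applied to the Ukrainian TT.
WORD = re.compile(r"\w+(?:['’-]\w+)*")


def sentence_lengths(xml_text: str) -> dict[str, list[int]]:
    """Map each character to the word counts of their <s> sentences, in text order."""
    root = ET.fromstring(xml_text)
    lengths: dict[str, list[int]] = defaultdict(list)
    for sp in root.iter("sp"):
        who = sp.get("who", "UNKNOWN")
        for s in sp.iter("s"):  # <speaker> and <stage> are never counted
            lengths[who].append(len(WORD.findall("".join(s.itertext()))))
    return dict(lengths)


def frequency_table(lengths: list[int]) -> list[tuple[int, int, float]]:
    """Rows of (words per sentence, absolute frequency, relative frequency in %),
    most frequent first, ties broken by the shorter length."""
    counts = Counter(lengths)
    n = len(lengths)
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(length, count, 100 * count / n) for length, count in rows]


if __name__ == "__main__":
    for who, lens in sentence_lengths(SAMPLE).items():
        print(who, lens, frequency_table(lens))
```

Applied to the excerpt from Section 2.2.2, this counting rule gives Charlotte four sentences of 4, 2, 1 and 1 words. The sketch illustrates the procedure only. It is not expected to reproduce the figures reported in this section, which depend on the study's own segmentation and counting rules.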
Thus, in the ST Charlotte has 58 one-word sentences, 4 two-word sentences, and one sentence each of three and four words. In the translation, one-word sentences are likewise the most frequent:

Charlotte: the whole TT data: 1 — 35 (30,97%); 4 — 17 (15,04%); 5 — 13 (11,50%); 2 — 12 (10,62%); 6 — 12 (10,62%); 3 — 10 (8,85%); 7 — 5 (4,42%); 11 — 3 (2,65%); 9 — 2 (1,77%); 10 — 2 (1,77%); 8 — 1 (0,88%); 12 — 1 (0,88%).

The last two lengths are the least frequent. From these data we calculated the number of meanings, maximal meaning, minimal meaning, range, mode, median, mean, standard deviation, coefficient of variation, standard error and measurement error. The results are presented in Table 1.

{| class="wikitable"
|+ Table 1: CHARLOTTE. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 64 || 113
|-
| max || 4 || 12
|-
| min || 1 || 1
|-
| R || 3 || 11
|-
| Mo || 1 || 1
|-
| Md || 2,5 || 6,5
|-
| Ẋ || 1,14 || 3,72
|-
| Ϭ || 0,50 || 2,69
|-
| ν || 0,4347 || 0,7243
|-
| Sẋ || 0,062 || 0,2532
|-
| ε || 0,1065 || 0,1335
|}

Table 1 shows the following results.

ST data numeric characteristics: number of meanings (N) — 64; maximal meaning (max) — 4; minimal meaning (min) — 1; range (R) — 3; mode (Mo) — 1; median (Md) — 2,5; mean (Ẋ) — 1,14; standard deviation (Ϭ) — 0,50; coefficient of variation (ν) — 0,4347; standard error (Sẋ) — 0,0620; measurement error (ε) — 0,1065.

TT data numeric characteristics: number of meanings (N) — 113; maximal meaning (max) — 12; minimal meaning (min) — 1; range (R) — 11; mode (Mo) — 1; median (Md) — 6,5; mean (Ẋ) — 3,72; standard deviation (Ϭ) — 2,69; coefficient of variation (ν) — 0,7243; standard error (Sẋ) — 0,2532; measurement error (ε) — 0,1335.

====3.2. Analysis of the part of the text that belongs to the character "Rodney"====

We analysed the distribution of the number of words in a sentence by absolute and relative frequency and obtained the following data:

Rodney: the whole ST data: 1 — 37 (90,24%); 2 — 3 (7,32%); 3 — 1 (2,44%).

Thus, in the ST Rodney has 37 one-word sentences, 3 two-word sentences and 1 three-word sentence.

Rodney: the whole TT data: 1 — 16 (21,05%); 2 — 13 (17,11%); 3 — 11 (14,47%); 4 — 9 (11,84%); 5 — 9 (11,84%); 6 — 7 (9,21%); 7 — 5 (6,58%); 9 — 3 (3,95%); 8 — 2 (2,63%); 10 — 1 (1,32%).

The calculations based on these data are presented in Table 2.

{| class="wikitable"
|+ Table 2: RODNEY. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 41 || 76
|-
| max || 3 || 10
|-
| min || 1 || 1
|-
| R || 2 || 9
|-
| Mo || 1 || 1
|-
| Md || 2 || 5,5
|-
| Ẋ || 1,12 || 3,76
|-
| Ϭ || 0,39 || 2,37
|-
| ν || 0,3519 || 0,6304
|-
| Sẋ || 0,0617 || 0,2721
|-
| ε || 0,1077 || 0,1417
|}

Table 2 shows the following results.

ST data numeric characteristics: number of meanings (N) — 41; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,12; standard deviation (Ϭ) — 0,39; coefficient of variation (ν) — 0,3519; standard error (Sẋ) — 0,0617; measurement error (ε) — 0,1077.

TT data numeric characteristics: number of meanings (N) — 76; maximal meaning (max) — 10; minimal meaning (min) — 1; range (R) — 9; mode (Mo) — 1; median (Md) — 5,5; mean (Ẋ) — 3,76; standard deviation (Ϭ) — 2,37; coefficient of variation (ν) — 0,6304; standard error (Sẋ) — 0,2721; measurement error (ε) — 0,1417.

====3.3. Analysis of the part of the text that belongs to the character "Man"====

By analogy with the previous characters (Charlotte and Rodney), we obtain the results for the remaining characters, beginning with Man.

Man: the whole ST data: 1 — 228 (87,36%); 2 — 27 (10,34%); 3 — 6 (2,30%).
Thus, in the ST Man has 228 one-word sentences, 27 two-word sentences and 6 three-word sentences.

Man: the whole TT data: 1 — 99 (18,50%); 3 — 90 (16,82%); 4 — 78 (14,58%); 2 — 61 (11,40%); 5 — 48 (8,97%); 6 — 47 (8,79%); 7 — 37 (6,92%); 8 — 18 (3,36%); 9 — 16 (2,99%); 10 — 9 (1,68%); 11 — 8 (1,50%); 12 — 7 (1,31%); 15 — 4 (0,75%); 13 — 3 (0,56%); 16 — 3 (0,56%); 18 — 2 (0,37%); 14 — 1 (0,19%); 17 — 1 (0,19%); 19 — 1 (0,19%); 23 — 1 (0,19%); 27 — 1 (0,19%).

Next, we calculated the number of meanings, maximal meaning, minimal meaning, range, mode, median, mean, standard deviation, coefficient of variation, standard error and measurement error. The results are shown in Table 3.

{| class="wikitable"
|+ Table 3: MAN. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 261 || 535
|-
| max || 3 || 27
|-
| min || 1 || 1
|-
| R || 2 || 26
|-
| Mo || 1 || 1
|-
| Md || 2 || 11
|-
| Ẋ || 1,15 || 4,52
|-
| Ϭ || 0,42 || 3,47
|-
| ν || 0,3619 || 0,7678
|-
| Sẋ || 0,0258 || 0,15
|-
| ε || 0,0439 || 0,0651
|}

ST data numeric characteristics: number of meanings (N) — 261; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,15; standard deviation (Ϭ) — 0,42; coefficient of variation (ν) — 0,3619; standard error (Sẋ) — 0,0258; measurement error (ε) — 0,0439.

TT data numeric characteristics: number of meanings (N) — 535; maximal meaning (max) — 27; minimal meaning (min) — 1; range (R) — 26; mode (Mo) — 1; median (Md) — 11,0; mean (Ẋ) — 4,52; standard deviation (Ϭ) — 3,47; coefficient of variation (ν) — 0,7678; standard error (Sẋ) — 0,1500; measurement error (ε) — 0,0651.

====3.4. Analysis of the part of the text that belongs to the character "Leonard"====

By analogy with the previous characters, we obtain the results for Leonard.

Leonard: the whole ST data: 1 — 92 (86,79%); 2 — 12 (11,32%); 3 — 2 (1,89%).
Leonard: the whole TT data: 1 — 30 (14,49%); 5 — 28 (13,53%); 2 — 27 (13,04%); 3 — 27 (13,04%); 4 — 24 (11,59%); 6 — 23 (11,11%); 8 — 15 (7,25%); 7 — 8 (3,86%); 9 — 5 (2,42%); 10 — 5 (2,42%); 12 — 4 (1,93%); 14 — 3 (1,45%); 13 — 2 (0,97%); 16 — 2 (0,97%); 17 — 2 (0,97%); 11 — 1 (0,48%); 19 — 1 (0,48%).

{| class="wikitable"
|+ Table 4: LEONARD. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 106 || 207
|-
| max || 3 || 19
|-
| min || 1 || 1
|-
| R || 2 || 18
|-
| Mo || 1 || 1
|-
| Md || 2 || 9
|-
| Ẋ || 1,15 || 4,94
|-
| Ϭ || 0,41 || 3,53
|-
| ν || 0,3539 || 0,7148
|-
| Sẋ || 0,0396 || 0,2453
|-
| ε || 0,0674 || 0,0974
|}

ST data numeric characteristics: number of meanings (N) — 106; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,15; standard deviation (Ϭ) — 0,41; coefficient of variation (ν) — 0,3539; standard error (Sẋ) — 0,0396; measurement error (ε) — 0,0674.

TT data numeric characteristics: number of meanings (N) — 207; maximal meaning (max) — 19; minimal meaning (min) — 1; range (R) — 18; mode (Mo) — 1; median (Md) — 9,0; mean (Ẋ) — 4,94; standard deviation (Ϭ) — 3,53; coefficient of variation (ν) — 0,7148; standard error (Sẋ) — 0,2453; measurement error (ε) — 0,0974.

====3.5. Analysis of the part of the text that belongs to the character "Jennifer"====

Jennifer: the whole ST data: 1 — 21 (84,00%); 2 — 3 (12,00%); 6 — 1 (4,00%).

Jennifer: the whole TT data: 6 — 5 (19,23%); 4 — 4 (15,38%); 2 — 3 (11,54%); 3 — 3 (11,54%); 5 — 2 (7,69%); 9 — 2 (7,69%); 1 — 1 (3,85%); 7 — 1 (3,85%); 8 — 1 (3,85%); 10 — 1 (3,85%); 11 — 1 (3,85%); 14 — 1 (3,85%); 15 — 1 (3,85%).
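The numeric characteristics reported in Tables 1–17 can be recomputed from a frequency list such as Jennifer's above. The text does not spell out the formulas. The sketch below therefore assumes the definitions that reproduce the tabulated values we checked:
* a population standard deviation (divisor N);
* Sẋ = Ϭ/√N;
* ε = 1.96·Sẋ/Ẋ, i.e. the relative half-width of a 95% confidence interval for the mean.

The function name `describe` and its result keys are illustrative. The function returns two medians: the conventional median of all sentence lengths (`Md_all`) and the median of the distinct lengths observed (`Md_distinct`). The Md values in the tables appear to match the latter. For Jennifer's TT, for instance, `Md_distinct` is 7 as in Table 5, whereas the median of all 26 sentences is 5,5.

```python
import math
from statistics import median


def describe(freq: dict[int, int]) -> dict[str, float]:
    """Numeric characteristics of a words-per-sentence distribution.

    `freq` maps a sentence length (in words) to its absolute frequency,
    e.g. {1: 58, 2: 4, 3: 1, 4: 1} for Charlotte's ST.
    """
    values = sorted(x for x, f in freq.items() for _ in range(f))  # raw data
    n = len(values)
    mean = sum(values) / n                                          # Ẋ
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)        # Ϭ, divisor N
    se = sd / math.sqrt(n)                                          # Sẋ
    top = max(freq.values())
    return {
        "N": n,
        "max": values[-1],
        "min": values[0],
        "R": values[-1] - values[0],
        "Mo": min(x for x, f in freq.items() if f == top),  # shorter length on ties
        "Md_all": median(values),                            # conventional median
        "Md_distinct": median(sorted(x for x, f in freq.items() if f)),  # distinct lengths only
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,            # ν
        "se": se,
        "eps": 1.96 * se / mean,    # ε, assumed 95% relative error
    }


if __name__ == "__main__":
    jennifer_tt = {6: 5, 4: 4, 2: 3, 3: 3, 5: 2, 9: 2, 1: 1,
                   7: 1, 8: 1, 10: 1, 11: 1, 14: 1, 15: 1}
    for key, value in describe(jennifer_tt).items():
        print(f"{key:12} {value:.4f}")
```

Ties for the mode are resolved towards the shorter length. This agrees with Joan's TT: lengths 3 and 4 both occur 16 times, and Table 7 gives Mo = 3. For a one-sentence sample such as One's ST, the function returns Ϭ = ν = Sẋ = ε = 0, as in Table 14.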
{| class="wikitable"
|+ Table 5: JENNIFER. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 25 || 26
|-
| max || 6 || 15
|-
| min || 1 || 1
|-
| R || 5 || 14
|-
| Mo || 1 || 6
|-
| Md || 2 || 7
|-
| Ẋ || 1,32 || 5,96
|-
| Ϭ || 1,01 || 3,55
|-
| ν || 0,7642 || 0,5948
|-
| Sẋ || 0,2018 || 0,6955
|-
| ε || 0,2996 || 0,2287
|}

ST data numeric characteristics: number of meanings (N) — 25; maximal meaning (max) — 6; minimal meaning (min) — 1; range (R) — 5; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,32; standard deviation (Ϭ) — 1,01; coefficient of variation (ν) — 0,7642; standard error (Sẋ) — 0,2018; measurement error (ε) — 0,2996.

TT data numeric characteristics: number of meanings (N) — 26; maximal meaning (max) — 15; minimal meaning (min) — 1; range (R) — 14; mode (Mo) — 6; median (Md) — 7,0; mean (Ẋ) — 5,96; standard deviation (Ϭ) — 3,55; coefficient of variation (ν) — 0,5948; standard error (Sẋ) — 0,6955; measurement error (ε) — 0,2287.

====3.6. Analysis of the part of the text that belongs to the character "Marshall"====

Marshall: the whole ST data: 1 — 94 (85,45%); 2 — 15 (13,64%); 4 — 1 (0,91%).

Marshall: the whole TT data: 2 — 31 (15,74%); 4 — 27 (13,71%); 3 — 26 (13,20%); 5 — 25 (12,69%); 6 — 21 (10,66%); 8 — 16 (8,12%); 7 — 11 (5,58%); 9 — 11 (5,58%); 1 — 9 (4,57%); 10 — 7 (3,55%); 11 — 6 (3,05%); 12 — 2 (1,02%); 16 — 2 (1,02%); 17 — 2 (1,02%); 23 — 1 (0,51%).

{| class="wikitable"
|+ Table 6: MARSHALL. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 110 || 197
|-
| max || 4 || 23
|-
| min || 1 || 1
|-
| R || 3 || 22
|-
| Mo || 1 || 2
|-
| Md || 2 || 8
|-
| Ẋ || 1,16 || 5,39
|-
| Ϭ || 0,44 || 3,38
|-
| ν || 0,376 || 0,6279
|-
| Sẋ || 0,0417 || 0,2409
|-
| ε || 0,0703 || 0,0877
|}

ST data numeric characteristics: number of meanings (N) — 110; maximal meaning (max) — 4; minimal meaning (min) — 1; range (R) — 3; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,16; standard deviation (Ϭ) — 0,44; coefficient of variation (ν) — 0,3760; standard error (Sẋ) — 0,0417; measurement error (ε) — 0,0703.
TT data numeric characteristics: number of meanings (N) — 197; maximal meaning (max) — 23; minimal meaning (min) — 1; range (R) — 22; mode (Mo) — 2; median (Md) — 8,0; mean (Ẋ) — 5,39; standard deviation (Ϭ) — 3,38; coefficient of variation (ν) — 0,6279; standard error (Sẋ) — 0,2409; measurement error (ε) — 0,0877.

====3.7. Analysis of the part of the text that belongs to the character "Joan"====

Joan: the whole ST data: 1 — 43 (84,31%); 2 — 7 (13,73%); 3 — 1 (1,96%).

Joan: the whole TT data: 3 — 16 (16,49%); 4 — 16 (16,49%); 1 — 13 (13,40%); 5 — 12 (12,37%); 2 — 10 (10,31%); 7 — 10 (10,31%); 6 — 6 (6,19%); 9 — 4 (4,12%); 8 — 3 (3,09%); 12 — 3 (3,09%); 11 — 1 (1,03%); 14 — 1 (1,03%); 17 — 1 (1,03%); 18 — 1 (1,03%).

{| class="wikitable"
|+ Table 7: JOAN. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 51 || 97
|-
| max || 3 || 18
|-
| min || 1 || 1
|-
| R || 2 || 17
|-
| Mo || 1 || 3
|-
| Md || 2 || 7,5
|-
| Ẋ || 1,18 || 4,81
|-
| Ϭ || 0,43 || 3,35
|-
| ν || 0,3651 || 0,6958
|-
| Sẋ || 0,0602 || 0,3402
|-
| ε || 0,1002 || 0,1385
|}

ST data numeric characteristics: number of meanings (N) — 51; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,18; standard deviation (Ϭ) — 0,43; coefficient of variation (ν) — 0,3651; standard error (Sẋ) — 0,0602; measurement error (ε) — 0,1002.

TT data numeric characteristics: number of meanings (N) — 97; maximal meaning (max) — 18; minimal meaning (min) — 1; range (R) — 17; mode (Mo) — 3; median (Md) — 7,5; mean (Ẋ) — 4,81; standard deviation (Ϭ) — 3,35; coefficient of variation (ν) — 0,6958; standard error (Sẋ) — 0,3402; measurement error (ε) — 0,1385.

====3.8. Analysis of the part of the text that belongs to the character "Michael"====

Michael: the whole ST data: 1 — 34 (91,89%); 2 — 3 (8,11%).

Michael: the whole TT data: 4 — 11 (20,00%); 5 — 10 (18,18%); 3 — 7 (12,73%); 7 — 7 (12,73%); 6 — 6 (10,91%); 2 — 5 (9,09%); 12 — 3 (5,45%); 1 — 2 (3,64%); 8 — 2 (3,64%); 10 — 2 (3,64%).
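As a worked check, the ST column of Table 8 can be recomputed by hand from Michael's ST distribution (34 one-word and 3 two-word sentences). The formulas are an assumption, chosen because they reproduce the tabulated values. In them, f_i is the absolute frequency of sentence length x_i, \bar{X} stands for Ẋ, σ for Ϭ (population form) and S_{\bar{X}} for Sẋ.

```latex
\begin{aligned}
N &= 34 + 3 = 37, \qquad \max = 2, \quad \min = 1, \quad R = 2 - 1 = 1,\\
\bar{X} &= \frac{\sum_i f_i x_i}{N} = \frac{34 \cdot 1 + 3 \cdot 2}{37} = \frac{40}{37} \approx 1{,}08,\\
\sigma &= \sqrt{\frac{\sum_i f_i x_i^2}{N} - \bar{X}^2}
        = \sqrt{\frac{46}{37} - \left(\frac{40}{37}\right)^2} = \frac{\sqrt{102}}{37} \approx 0{,}27,\\
\nu &= \frac{\sigma}{\bar{X}} = \frac{\sqrt{102}}{40} \approx 0{,}2525,\\
S_{\bar{X}} &= \frac{\sigma}{\sqrt{N}} = \frac{\sqrt{102}}{37\sqrt{37}} \approx 0{,}0449,\\
\varepsilon &= \frac{1{,}96\, S_{\bar{X}}}{\bar{X}} = \frac{1{,}96\, \nu}{\sqrt{N}} \approx 0{,}0814.
\end{aligned}
```

The last line shows that, under this definition, ε depends only on ν and N. This is why the small secondary parts in Section 3.14 tend to have large measurement errors.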
{| class="wikitable"
|+ Table 8: MICHAEL. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 37 || 55
|-
| max || 2 || 12
|-
| min || 1 || 1
|-
| R || 1 || 11
|-
| Mo || 1 || 4
|-
| Md || 1,5 || 5,5
|-
| Ẋ || 1,08 || 5,16
|-
| Ϭ || 0,27 || 2,57
|-
| ν || 0,2525 || 0,4979
|-
| Sẋ || 0,0449 || 0,3467
|-
| ε || 0,0814 || 0,1316
|}

ST data numeric characteristics: number of meanings (N) — 37; maximal meaning (max) — 2; minimal meaning (min) — 1; range (R) — 1; mode (Mo) — 1; median (Md) — 1,5; mean (Ẋ) — 1,08; standard deviation (Ϭ) — 0,27; coefficient of variation (ν) — 0,2525; standard error (Sẋ) — 0,0449; measurement error (ε) — 0,0814.

TT data numeric characteristics: number of meanings (N) — 55; maximal meaning (max) — 12; minimal meaning (min) — 1; range (R) — 11; mode (Mo) — 4; median (Md) — 5,5; mean (Ẋ) — 5,16; standard deviation (Ϭ) — 2,57; coefficient of variation (ν) — 0,4979; standard error (Sẋ) — 0,3467; measurement error (ε) — 0,1316.

====3.9. Analysis of the part of the text that belongs to the character "Rachel"====

Rachel: the whole ST data: 1 — 53 (91,38%); 2 — 5 (8,62%).

Rachel: the whole TT data: 4 — 18 (15,00%); 5 — 14 (11,67%); 7 — 14 (11,67%); 3 — 12 (10,00%); 2 — 11 (9,17%); 6 — 11 (9,17%); 1 — 10 (8,33%); 8 — 5 (4,17%); 9 — 4 (3,33%); 11 — 4 (3,33%); 10 — 3 (2,50%); 12 — 3 (2,50%); 13 — 3 (2,50%); 14 — 3 (2,50%); 16 — 3 (2,50%); 15 — 1 (0,83%); 20 — 1 (0,83%).

{| class="wikitable"
|+ Table 9: RACHEL. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 58 || 120
|-
| max || 2 || 20
|-
| min || 1 || 1
|-
| R || 1 || 19
|-
| Mo || 1 || 4
|-
| Md || 1,5 || 9
|-
| Ẋ || 1,09 || 6,03
|-
| Ϭ || 0,28 || 3,94
|-
| ν || 0,2584 || 0,6529
|-
| Sẋ || 0,0369 || 0,3596
|-
| ε || 0,0665 || 0,1168
|}

ST data numeric characteristics: number of meanings (N) — 58; maximal meaning (max) — 2; minimal meaning (min) — 1; range (R) — 1; mode (Mo) — 1; median (Md) — 1,5; mean (Ẋ) — 1,09; standard deviation (Ϭ) — 0,28; coefficient of variation (ν) — 0,2584; standard error (Sẋ) — 0,0369; measurement error (ε) — 0,0665.
TT data numeric characteristics: number of meanings (N) — 120; maximal meaning (max) — 20; minimal meaning (min) — 1; range (R) — 19; mode (Mo) — 4; median (Md) — 9,0; mean (Ẋ) — 6,03; standard deviation (Ϭ) — 3,94; coefficient of variation (ν) — 0,6529; standard error (Sẋ) — 0,3596; measurement error (ε) — 0,1168.

====3.10. Analysis of the part of the text that belongs to the character "Percy"====

Percy: the whole ST data: 1 — 34 (80,95%); 2 — 7 (16,67%); 3 — 1 (2,38%).

Percy: the whole TT data: 6 — 12 (16,44%); 3 — 11 (15,07%); 4 — 10 (13,70%); 5 — 7 (9,59%); 1 — 6 (8,22%); 2 — 5 (6,85%); 7 — 4 (5,48%); 8 — 4 (5,48%); 9 — 3 (4,11%); 11 — 3 (4,11%); 14 — 3 (4,11%); 10 — 2 (2,74%); 12 — 1 (1,37%); 18 — 1 (1,37%); 23 — 1 (1,37%).

{| class="wikitable"
|+ Table 10: PERCY. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 42 || 73
|-
| max || 3 || 23
|-
| min || 1 || 1
|-
| R || 2 || 22
|-
| Mo || 1 || 6
|-
| Md || 2 || 8
|-
| Ẋ || 1,21 || 5,9
|-
| Ϭ || 0,46 || 4,04
|-
| ν || 0,3827 || 0,6839
|-
| Sẋ || 0,0717 || 0,4726
|-
| ε || 0,1158 || 0,1569
|}

ST data numeric characteristics: number of meanings (N) — 42; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,21; standard deviation (Ϭ) — 0,46; coefficient of variation (ν) — 0,3827; standard error (Sẋ) — 0,0717; measurement error (ε) — 0,1158.

TT data numeric characteristics: number of meanings (N) — 73; maximal meaning (max) — 23; minimal meaning (min) — 1; range (R) — 22; mode (Mo) — 6; median (Md) — 8,0; mean (Ẋ) — 5,90; standard deviation (Ϭ) — 4,04; coefficient of variation (ν) — 0,6839; standard error (Sẋ) — 0,4726; measurement error (ε) — 0,1569.

====3.11. Analysis of the part of the text that belongs to the character "Al"====

Al: the whole ST data: 1 — 23 (74,19%); 2 — 6 (19,35%); 3 — 2 (6,45%).

Al: the whole TT data: 6 — 11 (18,97%); 3 — 8 (13,79%); 4 — 7 (12,07%); 5 — 7 (12,07%); 1 — 6 (10,34%); 2 — 5 (8,62%); 7 — 3 (5,17%); 10 — 3 (5,17%); 9 — 2 (3,45%); 12 — 2 (3,45%); 8 — 1 (1,72%); 13 — 1 (1,72%); 14 — 1 (1,72%); 16 — 1 (1,72%).
{| class="wikitable"
|+ Table 11: AL. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 31 || 58
|-
| max || 3 || 16
|-
| min || 1 || 1
|-
| R || 2 || 15
|-
| Mo || 1 || 6
|-
| Md || 2 || 7,5
|-
| Ẋ || 1,32 || 5,4
|-
| Ϭ || 0,59 || 3,41
|-
| ν || 0,4457 || 0,6316
|-
| Sẋ || 0,1059 || 0,4476
|-
| ε || 0,1569 || 0,1626
|}

ST data numeric characteristics: number of meanings (N) — 31; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,32; standard deviation (Ϭ) — 0,59; coefficient of variation (ν) — 0,4457; standard error (Sẋ) — 0,1059; measurement error (ε) — 0,1569.

TT data numeric characteristics: number of meanings (N) — 58; maximal meaning (max) — 16; minimal meaning (min) — 1; range (R) — 15; mode (Mo) — 6; median (Md) — 7,5; mean (Ẋ) — 5,40; standard deviation (Ϭ) — 3,41; coefficient of variation (ν) — 0,6316; standard error (Sẋ) — 0,4476; measurement error (ε) — 0,1626.

====3.12. Analysis of the part of the text that belongs to the character "Nurse Wilson"====

Nurse Wilson: the whole ST data: 1 — 42 (87,50%); 2 — 5 (10,42%); 3 — 1 (2,08%).

Nurse Wilson: the whole TT data: 3 — 10 (13,16%); 4 — 10 (13,16%); 1 — 9 (11,84%); 5 — 9 (11,84%); 2 — 7 (9,21%); 6 — 6 (7,89%); 7 — 6 (7,89%); 12 — 4 (5,26%); 8 — 3 (3,95%); 9 — 3 (3,95%); 11 — 2 (2,63%); 13 — 2 (2,63%); 18 — 2 (2,63%); 10 — 1 (1,32%); 17 — 1 (1,32%); 23 — 1 (1,32%).

{| class="wikitable"
|+ Table 12: NURSE WILSON. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 48 || 76
|-
| max || 3 || 23
|-
| min || 1 || 1
|-
| R || 2 || 22
|-
| Mo || 1 || 3
|-
| Md || 2 || 8,5
|-
| Ẋ || 1,15 || 5,91
|-
| Ϭ || 0,41 || 4,48
|-
| ν || 0,3558 || 0,7586
|-
| Sẋ || 0,0588 || 0,5141
|-
| ε || 0,1007 || 0,1705
|}

ST data numeric characteristics: number of meanings (N) — 48; maximal meaning (max) — 3; minimal meaning (min) — 1; range (R) — 2; mode (Mo) — 1; median (Md) — 2,0; mean (Ẋ) — 1,15; standard deviation (Ϭ) — 0,41; coefficient of variation (ν) — 0,3558; standard error (Sẋ) — 0,0588; measurement error (ε) — 0,1007.
TT data numeric characteristics: number of meanings (N) — 76; maximal meaning (max) — 23; minimal meaning (min) — 1; range (R) — 22; mode (Mo) — 3; median (Md) — 8,5; mean (Ẋ) — 5,91; standard deviation (Ϭ) — 4,48; coefficient of variation (ν) — 0,7586; standard error (Sẋ) — 0,5141; measurement error (ε) — 0,1705.

====3.13. Analysis of the part of the text that belongs to the character "Lilian"====

Lilian: the whole ST data: 1 — 68 (91,89%); 2 — 6 (8,11%).

Lilian: the whole TT data: 2 — 23 (14,94%); 4 — 19 (12,34%); 3 — 18 (11,69%); 5 — 17 (11,04%); 1 — 14 (9,09%); 6 — 13 (8,44%); 10 — 12 (7,79%); 8 — 10 (6,49%); 7 — 9 (5,84%); 9 — 7 (4,55%); 11 — 3 (1,95%); 13 — 2 (1,30%); 16 — 2 (1,30%); 17 — 2 (1,30%); 12 — 1 (0,65%); 14 — 1 (0,65%); 18 — 1 (0,65%).

{| class="wikitable"
|+ Table 13: LILIAN. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 74 || 154
|-
| max || 2 || 18
|-
| min || 1 || 1
|-
| R || 1 || 17
|-
| Mo || 1 || 2
|-
| Md || 1,5 || 9
|-
| Ẋ || 1,08 || 5,51
|-
| Ϭ || 0,27 || 3,69
|-
| ν || 0,2525 || 0,6704
|-
| Sẋ || 0,0317 || 0,2975
|-
| ε || 0,0575 || 0,1059
|}

ST data numeric characteristics: number of meanings (N) — 74; maximal meaning (max) — 2; minimal meaning (min) — 1; range (R) — 1; mode (Mo) — 1; median (Md) — 1,5; mean (Ẋ) — 1,08; standard deviation (Ϭ) — 0,27; coefficient of variation (ν) — 0,2525; standard error (Sẋ) — 0,0317; measurement error (ε) — 0,0575.

TT data numeric characteristics: number of meanings (N) — 154; maximal meaning (max) — 18; minimal meaning (min) — 1; range (R) — 17; mode (Mo) — 2; median (Md) — 9,0; mean (Ẋ) — 5,51; standard deviation (Ϭ) — 3,69; coefficient of variation (ν) — 0,6704; standard error (Sẋ) — 0,2975; measurement error (ε) — 0,1059.

====3.14. Analysis of the part of the text that belongs to the secondary characters====

=====3.14.1. Character "One"=====

One: the whole ST data: 1 — 1 (100,00%).

One: the whole TT data: 4 — 2 (40,00%); 3 — 1 (20,00%); 5 — 1 (20,00%); 6 — 1 (20,00%).
{| class="wikitable"
|+ Table 14: ONE. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 1 || 5
|-
| max || 1 || 6
|-
| min || 1 || 3
|-
| R || 0 || 3
|-
| Mo || 1 || 4
|-
| Md || 1 || 4,5
|-
| Ẋ || 1 || 4,4
|-
| Ϭ || 0 || 1,02
|-
| ν || 0 || 0,2318
|-
| Sẋ || 0 || 0,4561
|-
| ε || 0 || 0,2032
|}

ST data numeric characteristics: number of meanings (N) — 1; maximal meaning (max) — 1; minimal meaning (min) — 1; range (R) — 0; mode (Mo) — 1; median (Md) — 1,0; mean (Ẋ) — 1,00; standard deviation (Ϭ) — 0,00; coefficient of variation (ν) — 0,0000; standard error (Sẋ) — 0,0000; measurement error (ε) — 0,0000.

TT data numeric characteristics: number of meanings (N) — 5; maximal meaning (max) — 6; minimal meaning (min) — 3; range (R) — 3; mode (Mo) — 4; median (Md) — 4,5; mean (Ẋ) — 4,40; standard deviation (Ϭ) — 1,02; coefficient of variation (ν) — 0,2318; standard error (Sẋ) — 0,4561; measurement error (ε) — 0,2032.

=====3.14.2. Character "Two"=====

Two: the whole ST data: 1 — 2 (66,67%); 2 — 1 (33,33%).

Two: the whole TT data: 4 — 2 (33,33%); 8 — 2 (33,33%); 5 — 1 (16,67%); 10 — 1 (16,67%).

{| class="wikitable"
|+ Table 15: TWO. Numeric characteristics of word distribution within the sentence of the drama
! Parameter !! ST !! TT
|-
| N || 3 || 6
|-
| max || 2 || 10
|-
| min || 1 || 4
|-
| R || 1 || 6
|-
| Mo || 1 || 4
|-
| Md || 1,5 || 6,5
|-
| Ẋ || 1,33 || 6,5
|-
| Ϭ || 0,47 || 2,29
|-
| ν || 0,3536 || 0,3525
|-
| Sẋ || 0,2722 || 0,9354
|-
| ε || 0,4001 || 0,2821
|}

ST data numeric characteristics: number of meanings (N) — 3; maximal meaning (max) — 2; minimal meaning (min) — 1; range (R) — 1; mode (Mo) — 1; median (Md) — 1,5; mean (Ẋ) — 1,33; standard deviation (Ϭ) — 0,47; coefficient of variation (ν) — 0,3536; standard error (Sẋ) — 0,2722; measurement error (ε) — 0,4001.

TT data numeric characteristics: number of meanings (N) — 6; maximal meaning (max) — 10; minimal meaning (min) — 4; range (R) — 6; mode (Mo) — 4; median (Md) — 6,5; mean (Ẋ) — 6,50; standard deviation (Ϭ) — 2,29; coefficient of variation (ν) — 0,3525; standard error (Sẋ) — 0,9354; measurement error (ε) — 0,2821.

=====3.14.3. Character "Three"=====

Three: the whole ST data: 1 — 4 (80,00%); 2 — 1 (20,00%).
Three: the whole TT data: 4 – 2 (40,00%); 3 – 1 (20,00%); 6 – 1 (20,00%); 8 – 1 (20,00%).

Table 16
THREE. Numeric characteristics of word distribution within the sentence of the drama

Unit   ST     TT
N      5      5
max    2      8
min    1      3
R      1      5
Mo     1      4
Md     1,5    5
Ẋ      1,2    5
Ϭ      0,4    1,79
ν      0,3333 0,3578
Sẋ     0,1789 0,8
ε      0,2922 0,3136

ST data, numeric characteristics: Number of meanings (N) = 5; maximal meaning (max) = 2; minimal meaning (min) = 1; range (R) = 1; mode (Mo) = 1; median (Md) = 1,5; mean (Ẋ) = 1,20; standard deviation (Ϭ) = 0,40; coefficient of variation (ν) = 0,3333; standard error (Sẋ) = 0,1789; measurement error (ε) = 0,2922.
TT data, numeric characteristics: Number of meanings (N) = 5; maximal meaning (max) = 8; minimal meaning (min) = 3; range (R) = 5; mode (Mo) = 4; median (Md) = 5,0; mean (Ẋ) = 5,00; standard deviation (Ϭ) = 1,79; coefficient of variation (ν) = 0,3578; standard error (Sẋ) = 0,8000; measurement error (ε) = 0,3136.

3.14.4. Character "Four"
Four: the whole ST data: 1 – 2 (100,00%).
Four: the whole TT data: 4 – 2 (50,00%); 1 – 1 (25,00%); 2 – 1 (25,00%).

Table 17
FOUR. Numeric characteristics of word distribution within the sentence of the drama

Unit   ST     TT
N      2      4
max    1      4
min    1      1
R      0      3
Mo     1      4
Md     1      2
Ẋ      1      2,75
Ϭ      0      1,3
ν      0      0,4724
Sẋ     0      0,6495
ε      0      0,4629

ST data, numeric characteristics: Number of meanings (N) = 2; maximal meaning (max) = 1; minimal meaning (min) = 1; range (R) = 0; mode (Mo) = 1; median (Md) = 1,0; mean (Ẋ) = 1,00; standard deviation (Ϭ) = 0,00; coefficient of variation (ν) = 0,0000; standard error (Sẋ) = 0,0000; measurement error (ε) = 0,0000.
TT data, numeric characteristics: Number of meanings (N) = 4; maximal meaning (max) = 4; minimal meaning (min) = 1; range (R) = 3; mode (Mo) = 4; median (Md) = 2,0; mean (Ẋ) = 2,75; standard deviation (Ϭ) = 1,30; coefficient of variation (ν) = 0,4724; standard error (Sẋ) = 0,6495; measurement error (ε) = 0,4629.

4. Comparative analysis of word distribution in sentences

4.1. Difference in the Number of meanings (N) in ST and TT
Given the results above, which show that the statistical parameters of the translation exceed those of the original drama in the majority of cases, we now turn to one parameter, the Number of meanings (N). We compare the ST and TT data and determine the difference, if any. Our assumption 1 is that the TT is considerably longer in terms of word usage within the sentence.

Character's name   ST    TT    Difference
Charlotte          64    113   +49
Rodney             41    76    +35
Man                261   535   +274
Leonard            106   207   +101
Jennifer           25    26    +1
Marshal            110   197   +87
Joan               51    97    +46
Michael            37    55    +18
Rachel             58    120   +62
Percy              42    73    +31
Al                 31    58    +27
Nurse Wilson       48    76    +28
Lilian             74    154   +80
One                1     5     +4
Two                3     6     +3
Three              5     5     0
Four               2     4     +2
Figure 1: Comparative statistics of Number of meanings (N) in ST and TT

To recall, the character "Man" is the protagonist and main character of the play: a well-dressed gentleman who, feeling lost, stands on the seventh story of the building, ready to jump off, and has a series of conversations with the residents of the building. Taking into account the results in Figure 1, our assumption 2 is that for "Man" the translator produces a considerably larger number of sentences (274 more), roughly doubling the ST quantity (261 vs 535), for a number of reasons: to explain the original; to compensate for losses of literary imagery; to add something of the translator's own, to recreate, so to say, the original; and because of the structural and lexico-grammatical allomorphic features of the language pair. Whatever reason stands behind the translator's decision-making, it is fertile ground for further Translation Studies analysis.

4.2. Analysis of the whole text
Here we focus on statistical parameters with the word as the defined unit of measurement.
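The numeric characteristics reported in Tables 14–17, and the frequency distribution that follows, can be sketched in Python, one of the tools the authors name. The script below is a minimal illustration under stated assumptions, not the authors' software. The helper names are hypothetical, and splitting sentences on ".", "!" and "?" with whitespace word counting is a simplifying assumption. The formulas are inferred because they reproduce the tabulated values from the distributions in Section 3.14: population standard deviation Ϭ, ν = Ϭ/Ẋ, Sẋ = Ϭ/√N and ε = 1.96·Sẋ/Ẋ, with ties in the mode resolved to the smallest value. The tabulated Md appears to be the median of the distinct word counts. For example, the TT sentences of character Four (1, 2, 4 and 4 words) give a conventional median of 3, while Table 17 reports 2. The sketch therefore returns both medians.

```python
import math
import re
from collections import Counter
from statistics import median


def words_per_sentence(utterance: str) -> list[int]:
    """Count whitespace-separated words in each sentence of an utterance.

    Assumption: a sentence ends at '.', '!' or '?'; the paper does not
    specify its segmentation rules.
    """
    sentences = [s for s in re.split(r"[.!?]+", utterance) if s.strip()]
    return [len(s.split()) for s in sentences]


def frequency_table(counts: list[int]) -> list[tuple[int, int, float]]:
    """Rows of (words per sentence, absolute frequency, relative frequency in %),
    most frequent first; ties are ordered by sentence length."""
    total = len(counts)
    freq = Counter(counts)
    rows = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(length, n, 100 * n / total) for length, n in rows]


def characteristics(counts: list[int]) -> dict[str, float]:
    """Numeric characteristics of a words-per-sentence sample."""
    n = len(counts)
    mean = sum(counts) / n
    # Population standard deviation (divides by N); this matches the tables.
    sd = math.sqrt(sum((x - mean) ** 2 for x in counts) / n)
    se = sd / math.sqrt(n)  # standard error of the mean
    freq = Counter(counts)
    top = max(freq.values())
    return {
        "N": n,
        "max": max(counts),
        "min": min(counts),
        "R": max(counts) - min(counts),
        "Mo": min(v for v, c in freq.items() if c == top),  # tie -> smallest value
        "Md": median(counts),  # conventional median over all sentences
        "Md_distinct": median(sorted(freq)),  # median of the distinct counts
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation
        "se": se,
        "eps": 1.96 * se / mean,  # relative error at about 95% confidence
    }


if __name__ == "__main__":
    # TT sentence lengths of character Two, expanded from the distribution
    # 4 - 2, 8 - 2, 5 - 1, 10 - 1 listed in Section 3.14.2.
    two_tt = [4, 4, 8, 8, 5, 10]
    print({k: round(v, 4) for k, v in characteristics(two_tt).items()})
    print(frequency_table(two_tt))
    # Hypothetical utterance, used only to show the sentence splitting.
    print(words_per_sentence("Hello there. How are you? Fine!"))
```

For character Two, the first printed dictionary matches the TT column of Table 15 up to rounding: N = 6, Mo = 4, Md = 6,5, Ẋ = 6,5, Ϭ ≈ 2,29, ν ≈ 0,3525, Sẋ ≈ 0,9354 and ε ≈ 0,2821. Passing the TT data of Four, [4, 4, 1, 2], shows the two medians diverging (3,0 vs 2).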
The number of words in the utterances of a drama text matters for several reasons: it bears on the length of the lines of the written script, on the chronometry and metrics of the whole act, and on the pithiness and iconicity of each phrase. Below is the distribution of the number of words per TT sentence by absolute and relative frequency, from the most to the least frequent:

Words per sentence   Sentences   Share, %
4                    259         14,2
1                    255         13,98
3                    255         13,98
2                    219         12,01
5                    205         11,24
6                    182         9,98
7                    121         6,63
8                    84          4,61
9                    62          3,4
10                   49          2,69
11                   32          1,75
12                   31          1,7
14                   14          0,77
13                   13          0,71
16                   13          0,71
17                   9           0,49
18                   7           0,38
15                   6           0,33
23                   4           0,22
19                   2           0,11
20                   1           0,05
27                   1           0,05

The least frequent are sentences of 20 and 27 words (one each). Figure 2 compares the number of words in the sentences of the whole TT. The x-axis shows the number of sentences, and the y-axis the number of words in a sentence.
Figure 2: Number of words in sentences of TT

5. Conclusions
The article draws on the main achievements of statistical linguistics. The original Canadian play has been compared with its translation in terms of statistical parameters, which, to our knowledge, has not been done before. The paper is of practical and applied value; its scientific value lies in the fact that the suggested approach and methods should eventually allow a plausible hypothesis to be formulated and substantiated in the realm of statistical linguistics and translation studies. At this stage, the results indicate that bilingual drama texts are well suited to NLP processing and yield promising outcomes. We have computed the absolute and relative distributions, probability measures, and the parameters N, max, min, R, Mo, Md, Ẋ, Ϭ, ν, Sẋ and ε for the sentences of both texts.
The calculations were performed with purpose-built software combining XML markup, Microsoft Excel spreadsheets and the Python programming language. The results of the statistical calculations for the drama “7 Stories” by Morris Panych, with the word as the unit of measurement, are presented in Tables 1–17. Structural recognition provides useful information about the characters of the play in both the original and the translation, namely sentence length in words, which will support further comparisons of ST and TT. Linguistic statistical analysis has clarified the quantitative characteristics of the original play and its Ukrainian translation at the lexical level: the Number of meanings (N) in the translated text considerably exceeds that of the original, which calls for further analysis. The discrepancy is most pronounced for several characters (Man, Leonard, Marshal, Lilian). The coefficients are compared in tables and figures to illustrate the material under research. Further research will explore the translator's meaningful choices that produced the data reported above.

6. Acknowledgement
The project has been carried out within the complex academic topic “Application of modern technologies for optimization of information processes in natural language” at Lviv Polytechnic National University. At the initial stage the project benefited from the consultancy of Ihor Kulchytskyy, to whom we express our gratitude.

7. References
[1] M. Panych, Seven Stories, Vancouver: Talonbooks, 2013.
[2] M. Panych, 7 istorii [per. z anhliiskoi Ivana Krychfalushiia], Brusturiv: Dyskursus, 2014.
[3] S. Laviosa (Ed.), Corpus-based Translation Studies: Theory, Findings, Applications, Rodopi, 2002.
[4] K. H. Chen, H. H. Chen, Aligning bilingual corpora especially for language pairs from different families, Information Sciences Applications, 1995, 42, pp. 57–81.
[5] J. Munday, A Computer-assisted Approach to the Analysis of Translation Shifts, Meta, 1998, XLIII, 4.
[6] F. Zanettin, Parallel corpora in translation studies: Issues in corpus design and analysis. In: M. Olohan (Ed.), Intercultural Faultlines. Research Models in Translation Studies I: Textual and Cognitive Aspects, Manchester: St. Jerome, 2000, pp. 105–118.
[7] J. Allen, Natural Language Understanding, Benjamin/Cummings Publishing Company, Redwood City, 1995.
[8] D. Barnard, et al., SGML-Based Markup for Literary Texts: Two Problems and Some Solutions, Computers and the Humanities, vol. 22, no. 4, 1988, pp. 265–276. JSTOR. URL: www.jstor.org/stable/30200136. Accessed 28 Feb. 2021.
[9] P. Blackburn, J. Bos, M. Kohlhase, H. De Nivelle, Inference and computational semantics. In: Computing Meaning, Springer Netherlands, 2001, pp. 11–28.
[10] I. Dagan, O. Glickman, Probabilistic textual entailment: generic applied modeling of language variability. In: Proceedings of the PASCAL Workshop on Learning Methods for Text Understanding and Mining, Grenoble, France, 2004, pp. 26–29.
[11] R. Dale, H. Moisl, H. Somers (Eds.), Handbook of Natural Language Processing, CRC Press, 2000.
[12] M. Dilai, O. Levchenko, Discourses Surrounding Feminism in Ukraine: A Sentiment Analysis of Twitter Data. In: 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2018), Proceedings, 2018. doi:10.1109/STC-CSIT.2018.8526694.
[13] A. Hogan, The Web of Data, Springer, 2020.
[14] V. Lytvyn, V. Vysotska, T. Hamon, N. Grabar, N. Sharonova, O. Cherednichenko, O. Kanishcheva (Eds.), Computational Linguistics and Intelligent Systems. Proc. 4th Int. Conf. COLINS 2020. Volume I: Workshop. Lviv, Ukraine, April 23–24, 2020, CEUR-WS.org, online.
[15] M. Marcus, B. Santorini, M. Marcinkiewicz, Building a Large Annotated Corpus of English: The Penn Treebank, Computational Linguistics: Special Issue on Using Large Corpora, 1993, 19(2), pp. 313–330.
[16] C. Matthews, An Introduction to Natural Language Processing Through Prolog, London and New York: Routledge, 2014.
[17] M. Oakes, Sentence and word alignment in the CARTER project. In: J. Thomas, M. Short (Eds.), Using Corpora for Language Research, London: Longman, 1996, pp. 211–233.
[18] P. Pavis, Theatre at the Crossroads of Culture, Routledge, 1992.
[19] S. Bassnett, Translating for the Theatre: The Case Against Performability, TTR: traduction, terminologie, rédaction, 1991, 4(1), pp. 99–111. URL: https://doi.org/10.7202/037084ar.
[20] S. Bassnett, Still Trapped in the Labyrinth: Further Reflections on Translation and Theatre. In: Constructing Cultures: Essays on Literary Translation, Multilingual Matters, 1998, pp. 90–108.
[21] T. H. Howard-Hill, Modern Textual Theories and the Editing of Plays, The Library, 6th ser., 1989, 11, pp. 89–115.
[22] M. Issacharoff, F. Robin Jones (Eds.), Performing Texts, Philadelphia: University of Pennsylvania Press, 1988.
[23] J. Lavagnino, E. Mylonas, The show must go on: Problems of tagging performance texts, Computers and the Humanities, 1995, pp. 113–121. URL: https://doi.org/10.1007/BF01830705.
[24] T. McEnery, R. Xiao, Y. Tono (Eds.), Corpus-based Language Studies: An Advanced Resource Book, Routledge, 2006.
[25] N. Dershowitz, E. Nissan (Eds.), Language, Culture, Computation: Computing for the Humanities, Law and Narratives, Springer, 2014.
[26] O. Levchenko, O. Tyshchenko, M. Dilai, Associative Verbal Network of the Conceptual Domain БІДА (MISERY) in Ukrainian. In: Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Systems (COLINS 2020). Volume I: Main Conference, CEUR-WS.org, Vol-2604. URL: http://ceur-ws.org/Vol-2604/.
[27] N. Shakhovska, M. Medykovskyy (Eds.), Advances in Intelligent Systems and Computing III: Selected papers from the International Conference on Computer Science and Information Technologies, CSIT 2018, September 11–14, Lviv, Ukraine, Springer Nature Switzerland, 2019.
[28] C. M. Sperberg-McQueen, Text in the Electronic Age: Textual Study and Text Encoding, with Examples from Medieval Texts, Literary and Linguistic Computing, 6 (1991), pp. 34–46.
[29] C. M. Sperberg-McQueen, L. Burnard (Eds.), Guidelines for Electronic Text Encoding and Interchange (TEI P3), Chicago and Oxford: Text Encoding Initiative, 1994.
[30] S. Syed, M. Spruit, Full-Text or Abstract? Examining Topic Coherence Scores Using Latent Dirichlet Allocation. In: 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2017. doi:10.1109/dsaa.2017.61.
[31] Topic Modeling in Python with Gensim, Machine Learning Plus, 16 Apr. 2020. URL: www.machinelearningplus.com/nlp/topic-modeling-gensim-python.
[32] K. Aguilar, NLP Techniques with Shakespeare's Plays: Cleaning and Classifying Text with the Bard, 2020. URL: https://medium.com/analytics-vidhya/nlp-techniques-with-shakespeares-plays-d8843ba26a4f.
[33] O. Levchenko, M. Dilai, Attitudes Toward Feminism in Ukraine: A Sentiment Analysis of Tweets. In: N. Shakhovska, M. Medykovskyy (Eds.), Advances in Intelligent Systems and Computing III. CSIT 2018. Advances in Intelligent Systems and Computing, vol. 871, Springer, Cham, 2019. doi:10.1007/978-3-030-01069-0_9.