<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The chi-square test and the Student's t-test used for authorial style characterization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vasyl Teslyuk</string-name>
          <email>vasyl.m.teslyuk@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Khomytska</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iryna Bazylevych</string-name>
          <email>i_bazylevych@yahoo.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentyna Holtvian</string-name>
          <email>valentyna.i.holtvian@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Durytska</string-name>
          <email>olena.durytska@lnu.edu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ivan Franko National University of Lviv</institution>
          ,
          <addr-line>Lviv, 79000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this research, we combine two classical statistical tests for author identification: the chi-square test and the Student's t-test. The application of these statistical tests to the analysis of the distribution of parts of speech is the novelty of the research. The research was conducted on the material of the belles-lettres and scientific styles. It has shown that the chosen statistical tests give good results for determining the specificity of the distribution of parts of speech and of phonemes. The results allow us to identify the style-differentiating capability of each part of speech. Authors and styles are differentiated by the parts of speech that ensure statistically significant results. The calculations were carried out in Java. The structure of the developed software is based on the modular principle. The test validity of the obtained results is 95%. The results can be applied in authorship attribution.</p>
      </abstract>
      <kwd-group>
        <kwd>Chi-square test</kwd>
        <kwd>Student's t-test</kwd>
        <kwd>Distribution of parts of speech</kwd>
        <kwd>Phoneme distribution</kwd>
        <kwd>Belles-lettres style</kwd>
        <kwd>Scientific style</kwd>
        <kwd>Authorship attribution</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>the patterns the author follows in the manner of writing. In most cases, researchers analyze
the author’s word stock, that is, the distribution of the most and the least frequently
used words. Here, however, we deal with the syntactic and the phonological levels and
analyze the distribution of parts of speech and phonemes in the researched
text. The difference between authorial styles is the difference between the individual
patterns used by the authors. The difference is established by various methods and
techniques. The most efficient are those that ensure a high level of test validity (95% – 99%);
the 95% level is considered classical and is applied in most cases. Powerful
classical statistical tests (the Student’s t-test, the chi-square test, the Lehmann-Rosenblatt
test, the Wilcoxon test) allow us to obtain results with high accuracy. Data
clustering and discriminant analysis also give good results. The statistical tests can be
checked for efficiency on the phonological, lexical and syntactic levels. The reliability of the
results can be enhanced by the use of several tests. The purpose of this research is to prove
that the chi-square test and the Student’s t-test are efficient statistical tests for differentiating
texts by the distribution of parts of speech and the distribution of phonemes. Text differentiation by
the distribution of parts of speech is the novel approach of this research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The analysis of recent research has shown that machine learning and classical methods
are often applied for authorship attribution. In most cases, the content of the researched
texts is emotionally colored [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Thus, an attempt was made to detect aggression in social
media using deep learning models. The models were tested on the Cyber-Troll dataset
and reached an F1 score of 97% [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Convolutional neural networks (CNN) gave good
results for author identification. The algorithm applied in that research was classical [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For
fake news detection, the use of feature stacking gave a result of 93.39%. In that research,
random forest and extra tree models were used for bagging [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. The textual semantic
analysis of Reddit statements was conducted with the help of the software toolbox
LIWC-22 (Linguistic Inquiry and Word Count). On the basis of the analysis, two cognitive
sub-models with linguistic psychological and social apprehension were developed [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
individual authorial conceptualization was characterised by quantitative markers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
An intellectual analysis system aimed at determining the text authorship attribution
probability for Ukrainian-language artistic works was developed [
        <xref ref-type="bibr" rid="ref8">8 – 10</xref>
        ]. For the analysis of Ukrainian
tweets, algorithms based on the Levenshtein distance (fuzz sort and fuzz set) gave
good results. The best result is a fingerprint similarity of 70% [11]. The research
presented in this paper has shown that the chi-square test and the Student’s t-test are
powerful statistical tests for text differentiation by the distribution of parts of speech and
the distribution of phonemes. Statistically significant results have been obtained with a high level
of test validity (95%). Consequently, the results are reliable and may be used for further
research or applied in practice for author identification.
      </p>
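      <p>1. Choose the texts from J. K. Rowling’s creation.</p>
      <p>2. Choose the texts from K. Ashley’s creation.</p>
      <p>3. Determine the most frequently used parts of speech for each author.</p>
      <p>4. Let the sample size be equal for the texts compared.</p>
      <p>5. Calculate the absolute, mean and relative frequencies of occurrence of parts of speech
and phonemes for the two samples.</p>
      <p>6. Use the Pearson’s normality test for two samples:
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\hat{\chi}^2 = \sum_{i=0}^{k-1} \frac{(n_i - n p_i)^2}{n p_i},</tex-math></disp-formula>
where <inline-formula><tex-math>k</tex-math></inline-formula> is the number of intervals, <inline-formula><tex-math>n_i</tex-math></inline-formula> is the observed frequency in the <inline-formula><tex-math>i</tex-math></inline-formula>-th interval and <inline-formula><tex-math>n p_i</tex-math></inline-formula> is the expected one [14 – 16].</p>
      <p>7. Use the Student’s t-test:</p>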
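      <p>The Pearson statistic used in step 6 could be computed along the following lines. This is a minimal sketch rather than the authors' implementation: the class name <monospace>PearsonStatisticSketch</monospace>, the method name and the way the expected probabilities are passed in are assumptions made for illustration.</p>
      <preformat><![CDATA[
```java
// Sketch only: Pearson goodness-of-fit statistic for grouped data.
// Class and method names are hypothetical, not taken from the described software.
final class PearsonStatisticSketch {

    /**
     * Sums (n_i - n * p_i)^2 / (n * p_i) over all intervals, where n_i are the
     * observed interval counts and p_i the interval probabilities under the
     * hypothesised distribution.
     */
    static double statistic(long[] observed, double[] probabilities) {
        if (observed.length != probabilities.length) {
            throw new IllegalArgumentException("one probability per interval is required");
        }
        long n = 0;
        for (long count : observed) {
            n += count;
        }
        double stat = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (probabilities[i] <= 0.0) {
                throw new IllegalArgumentException("interval probabilities must be positive");
            }
            double expected = n * probabilities[i];
            double diff = observed[i] - expected;
            stat += diff * diff / expected;
        }
        return stat;
    }

    public static void main(String[] args) {
        // Toy data: four equally likely intervals.
        long[] observed = {10, 20, 30, 40};
        double[] probabilities = {0.25, 0.25, 0.25, 0.25};
        System.out.println("Pearson statistic = " + statistic(observed, probabilities));
    }
}
```
]]></preformat>
      <p>The resulting value is then compared with the chi-square critical value for the chosen level and number of degrees of freedom.</p>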
    </sec>
    <sec id="sec-3">
      <title>3. Methods and software</title>
      <sec id="sec-3-1">
        <title>3.1. The proposed combination of methods</title>
        <p>In this research, we combine the chi-square test and the Student’s t-test. The two tests were
used in our previous research in different combinations: with the Lehmann-Rosenblatt test,
the Wilcoxon test, data clustering and discriminant analysis [12, 13]. The tests were
efficient in each combination. The algorithm of text differentiation used in this research is
given below.</p>
        <p>2. Module of forming samples of parts of speech.</p>
        <p>3. Module of determining the most frequently used parts of speech.</p>
        <p>4. Module of calculating the relative frequencies of occurrence of parts of speech.</p>
        <p>5. Module of forming samples of English phonemes.</p>
        <p>6. Module of calculating the mean frequencies of occurrence of phonemes.</p>
        <p>7. Module of carrying out the Pearson’s test.</p>
        <p>8. Module of carrying out the Student’s t-test.</p>
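        <p>
          <disp-formula id="eq-2"><label>(2)</label><tex-math>t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}} \ge t_{\alpha;(n_x + n_y - 2)},</tex-math></disp-formula>
where <inline-formula><tex-math>\bar{x}</tex-math></inline-formula> and <inline-formula><tex-math>\bar{y}</tex-math></inline-formula> are the values of mean frequencies of occurrence of parts of speech and
phoneme groups for the two samples <inline-formula><tex-math>x</tex-math></inline-formula> and <inline-formula><tex-math>y</tex-math></inline-formula>, <inline-formula><tex-math>s_x^2</tex-math></inline-formula> and <inline-formula><tex-math>s_y^2</tex-math></inline-formula> are the sample dispersions, and <inline-formula><tex-math>n_x</tex-math></inline-formula> and <inline-formula><tex-math>n_y</tex-math></inline-formula> are the sample sizes [17 – 19].</p>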
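        <p>A two-sample Student's statistic of this kind could be sketched in Java as shown here. The sketch assumes the unpooled standard error (the sum of each sample dispersion divided by its sample size) and <monospace>n_x + n_y - 2</monospace> degrees of freedom; the class name <monospace>StudentTSketch</monospace> and its methods are hypothetical and are not taken from the described software.</p>
        <preformat><![CDATA[
```java
// Sketch only: two-sample Student's t statistic on mean frequencies.
// Assumption: the standard error is sqrt(s_x^2 / n_x + s_y^2 / n_y).
final class StudentTSketch {

    static double mean(double[] sample) {
        double sum = 0.0;
        for (double value : sample) {
            sum += value;
        }
        return sum / sample.length;
    }

    /** Unbiased sample dispersion (divides by n - 1). */
    static double sampleVariance(double[] sample) {
        if (sample.length < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double m = mean(sample);
        double sumOfSquares = 0.0;
        for (double value : sample) {
            sumOfSquares += (value - m) * (value - m);
        }
        return sumOfSquares / (sample.length - 1);
    }

    static double statistic(double[] x, double[] y) {
        double standardError =
                Math.sqrt(sampleVariance(x) / x.length + sampleVariance(y) / y.length);
        return (mean(x) - mean(y)) / standardError;
    }

    static int degreesOfFreedom(double[] x, double[] y) {
        return x.length + y.length - 2;
    }

    public static void main(String[] args) {
        // Toy frequencies, not data from the paper.
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 3, 4, 5, 6};
        System.out.println("t = " + statistic(x, y) + ", df = " + degreesOfFreedom(x, y));
    }
}
```
]]></preformat>
        <p>The absolute value of the statistic is then compared with the critical value of the Student's distribution for the chosen level and number of degrees of freedom.</p>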
        <p>8. Use the chi-square test:</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. The developed software</title>
        <p>The text differentiation program is developed in the Java programming language [20]. The
structure of the program is based on the modular principle and consists of the following
modules:</p>
        <p>1. Module of data input.</p>
        <p>9. Module of carrying out the chi-square test.</p>
        <p>10. Module of data output.</p>
        <p>The software has the following structure of classes: Main, SampleProcessor,
PartsOfSpeechProcessor, PhonemeProcessor, PartsOfSpeechUtils, PhonemeUtils,
StatisticProcessor.</p>
        <p>The researched text files are loaded in the class Main.</p>
        <p>The texts are transcribed in the class SampleProcessor.</p>
        <p>The samples of parts of speech are formed in the class PartsOfSpeechProcessor.</p>
        <p>The samples of phonemes are formed in the class PhonemeProcessor.</p>
        <p>The relative frequencies of occurrence of parts of speech are calculated in the class
PartsOfSpeechUtils.</p>
        <p>The mean frequencies of occurrence of phonemes are calculated in the class
PhonemeUtils.</p>
        <p>The Pearson’s test, the Student’s t-test and the chi-square test are carried out in the class
StatisticProcessor.</p>
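        <p>As an illustration of the frequency calculation step, relative frequencies can be derived from absolute counts as sketched below. The class <monospace>RelativeFrequencySketch</monospace> and its method are hypothetical stand-ins; the source does not show the actual code of PartsOfSpeechUtils.</p>
        <preformat><![CDATA[
```java
// Sketch only: relative frequencies from absolute counts.
// Names are illustrative and do not reproduce the authors' classes.
final class RelativeFrequencySketch {

    /** Divides every absolute count by the total size of the sample. */
    static double[] relativeFrequencies(long[] counts) {
        long total = 0;
        for (long count : counts) {
            if (count < 0) {
                throw new IllegalArgumentException("counts must be non-negative");
            }
            total += count;
        }
        if (total == 0) {
            throw new IllegalArgumentException("the sample is empty");
        }
        double[] relative = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            relative[i] = (double) counts[i] / total;
        }
        return relative;
    }

    public static void main(String[] args) {
        // Toy counts for three categories.
        long[] counts = {2, 1, 1};
        for (double value : relativeFrequencies(counts)) {
            System.out.printf("%.4f%n", value);
        }
    }
}
```
]]></preformat>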
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results of the study</title>
      <preformat>"RBR", # Adverb, comparative
"RBS", # Adverb, superlative
"RP", # Particle
"SYM", # Symbol
"TO", # to
"UH", # Interjection
"VB", # Verb, base form
"VBD", # Verb, past tense
"VBG", # Verb, gerund or present participle
"VBN", # Verb, past participle
"VBP", # Verb, non-3rd person singular present
"VBZ", # Verb, 3rd person singular present
"WDT", # Wh-determiner
"WP", # Wh-pronoun
"WP$", # Possessive wh-pronoun
"WRB" # Wh-adverb</preformat>
      <p>In Figure 1, we present a fragment of the tagged text “Harry Potter and the Philosopher’s
Stone” by J. K. Rowling.</p>
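      <p>Counting how often each tag occurs in a tagged text could look like the sketch below. The <monospace>word/TAG</monospace> token format, the class name <monospace>TagCountSketch</monospace> and the order of the tags are assumptions made for illustration only; the source does not specify the format of the tagged text.</p>
      <preformat><![CDATA[
```java
// Sketch only: counts part-of-speech tags in a text tagged as "word/TAG" tokens.
// The token format and all names here are assumptions, not the authors' code.
final class TagCountSketch {

    /** Returns one count per entry of tagOrder; tags not listed are ignored. */
    static int[] count(String[] tagOrder, String taggedText) {
        int[] counts = new int[tagOrder.length];
        for (String token : taggedText.trim().split("\\s+")) {
            int slash = token.lastIndexOf('/');
            if (slash < 0) {
                continue; // not a word/TAG token
            }
            String tag = token.substring(slash + 1);
            for (int i = 0; i < tagOrder.length; i++) {
                if (tagOrder[i].equals(tag)) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tagOrder = {"VB", "VBD", "WP"};
        String tagged = "He/PRP saw/VBD what/WP happened/VBD";
        System.out.println(java.util.Arrays.toString(count(tagOrder, tagged)));
    }
}
```
]]></preformat>
      <p>The resulting array of counts has the same shape as the samples used for the chi-square test.</p>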
      <p>For the calculations, two samples were used.</p>
      <p>For “Harry Potter and the Philosopher’s Stone” by J. K. Rowling:
111, 1182, 22, 0, 1350, 599, 34, 15, 9, 272, 1302, 577, 355, 5, 15, 78, 1282, 302, 849, 4, 1,
120, 0, 260, 5, 532, 942, 157, 279, 233, 113, 53, 44, 0, 82.</p>
      <p>For “Sebring” by K. Ashley:
145, 979, 15, 0, 1159, 422, 35, 8, 0, 351, 1202, 548, 407, 1, 11, 6, 1710, 448, 759, 4, 2, 99,
0, 517, 0, 847, 971, 132, 249, 268, 146, 63, 61, 0, 95.</p>
      <p>
        The application of the chi-square test has shown that the homogeneity hypothesis is
rejected and the differences between the compared texts are statistically significant. The
critical value at the 0.95 level with 34 degrees of freedom was obtained in R:
      </p>
      <preformat>&gt; por_zn = qchisq(0.95, 34)
&gt; por_zn
[1] 48.60237</preformat>
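      <p>The comparison of the two samples listed above can be reproduced with a short Java sketch of the chi-square homogeneity statistic. The class name <monospace>ChiSquareHomogeneitySketch</monospace> is hypothetical, and skipping the categories that are empty in both texts is an assumption of this sketch; the critical value 48.60237 is the one quoted in the text for 34 degrees of freedom.</p>
      <preformat><![CDATA[
```java
// Sketch only: chi-square homogeneity statistic for two samples of tag counts.
// Not the authors' StatisticProcessor; names and zero handling are assumptions.
final class ChiSquareHomogeneitySketch {

    // Part-of-speech counts as listed in the text.
    static final long[] ROWLING = {111, 1182, 22, 0, 1350, 599, 34, 15, 9, 272, 1302, 577,
        355, 5, 15, 78, 1282, 302, 849, 4, 1, 120, 0, 260, 5, 532, 942, 157, 279, 233, 113,
        53, 44, 0, 82};
    static final long[] ASHLEY = {145, 979, 15, 0, 1159, 422, 35, 8, 0, 351, 1202, 548,
        407, 1, 11, 6, 1710, 448, 759, 4, 2, 99, 0, 517, 0, 847, 971, 132, 249, 268, 146,
        63, 61, 0, 95};

    /** Pearson chi-square statistic for a 2 x k table of counts. */
    static double statistic(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("samples must cover the same categories");
        }
        long totalA = 0;
        long totalB = 0;
        for (int i = 0; i < a.length; i++) {
            totalA += a[i];
            totalB += b[i];
        }
        double total = totalA + totalB;
        double stat = 0.0;
        for (int i = 0; i < a.length; i++) {
            long column = a[i] + b[i];
            if (column == 0) {
                continue; // category absent from both texts, no contribution
            }
            double expectedA = column * (totalA / total);
            double expectedB = column * (totalB / total);
            stat += Math.pow(a[i] - expectedA, 2) / expectedA
                  + Math.pow(b[i] - expectedB, 2) / expectedB;
        }
        return stat;
    }

    public static void main(String[] args) {
        double critical = 48.60237; // qchisq(0.95, 34), as quoted in the text
        double stat = statistic(ROWLING, ASHLEY);
        System.out.println("chi-square statistic = " + stat);
        System.out.println("homogeneity rejected: " + (stat > critical));
    }
}
```
]]></preformat>
      <p>Homogeneity is rejected when the statistic exceeds the critical value.</p>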
      <p>The style differentiation has been carried out by the Student’s t-test on the material of
Shaw’s drama and the scientific style (classical mechanics). Three cases of style
differentiation were considered: 1 – any position in the word; 2 – the beginning of the word;
3 – the end of the word. Statistically significant differences were obtained in position 1 for
all except two groups of phonemes and in positions 2 and 3 for all except one group of
phonemes. The results prove the Student’s t-test efficiency. The data are given in Tables 1 –
3.</p>
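      <p>In Tables 1 – 6, we use the following designations: GP – the group of phonemes; SD – Shaw’s
drama; SC – the scientific style (classical mechanics); L – labials; D – dorsals; C – coronals; V
– velars; N – nasals; S – sonorous; F – fricatives; T – stops; <inline-formula><tex-math>s^2</tex-math></inline-formula> is the value of dispersion; <inline-formula><tex-math>t</tex-math></inline-formula> is
the Student’s statistic; <inline-formula><tex-math>2\alpha</tex-math></inline-formula> is the level of significance; <inline-formula><tex-math>\bar{x}</tex-math></inline-formula> is the mean value of frequencies of
phoneme groups; <inline-formula><tex-math>\sum (x_i - \bar{x})^2</tex-math></inline-formula> is the sum of squares of the difference between the value of the middle of the
interval and the mean value of frequencies of phoneme groups; <inline-formula><tex-math>\bar{x}_1 - \bar{x}_2</tex-math></inline-formula> is the value of the
difference between the researched samples.</p>
      <p>Table 1. The results of the calculations for the comparison between Shaw’s drama and the scientific
style in an unidentified position.</p>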
      <p>In Table 1 (continuation), we see the style differentiating capability of groups of
phonemes. In the groups of dorsals, coronals, velars, sonorous and fricatives, the differences
between the researched texts are statistically significant.</p>
      <p>In Table 2, you can see the sum of squares of the difference between the value of the middle
of the interval and the mean value of frequencies of phoneme groups for Shaw’s drama and
the scientific style in the position at the beginning of a word.</p>
      <p>In Table 2 (continuation), we can see the essential differences revealed in the position at
the beginning of a word for the groups of labials, dorsals, coronals, velars, nasals, sonorous
and stops.</p>
      <p>In Table 3, we give the sum of squares of the difference between the value of the middle of the
interval and the mean value of frequencies of phoneme groups for Shaw’s drama and the
scientific style in the position at the end of a word.</p>
      <p>In Table 3 (continuation), we see the style differentiating capability of the groups of
labials, dorsals, velars, sonorous, fricatives and stops for the comparison of Shaw’s drama
and the scientific style in the position at the end of a word.</p>
      <p>Table 2. The results of the calculations for the comparison between Shaw’s drama and the scientific
style at the beginning of a word.</p>
      <p>Table 2 (continuation). The essential differences between Shaw’s drama and the scientific style at the beginning of
a word.</p>
      <p>Table 3. The results of the calculations for the comparison between Shaw’s drama and the scientific
style at the end of a word.</p>
      <p>Table 3 (continuation). The essential differences between Shaw’s drama and the scientific style at the end of a word.</p>
      <p>Table values that survived extraction (column assignment not recoverable): 18,5; 125,8; –; 10,1; 37,0; 54,2; 56,5; 43,3. 142,5; –; 15,4; 35,2; 60,7; 49,3; 74,4. 4,70; 7,85; 4,22; 6,71; 9,20; 7,47; 7,40.</p>
      <p>The results obtained for the comparison of Shaw’s drama and the scientific style have
shown that in all three cases of the phoneme’s position in a word the differences between the
compared texts are statistically significant for almost all groups of phonemes. Consequently,
the Student’s t-test is efficient for solving a text differentiation task. In another comparison,
we have obtained statistically significant differences between Byron’s emotive prose and
the scientific style. In Tables 4 – 6, we see the data for the three cases of the phoneme’s position in
a word.</p>
      <p>Byron’s emotive prose differs essentially from the scientific style in an unidentified
position for the groups of labials, dorsals, nasals, sonorous and fricatives (Table 4
(continuation)).</p>
      <p>Table 4. The results of the calculations for the comparison between Byron’s emotive prose and the
scientific style in an unidentified position.</p>
      <p>Table 4 (continuation). The essential differences between Byron’s emotive prose and the scientific style in an
unidentified position.</p>
      <p>Table values that survived extraction (column assignment not recoverable): 194,1; 10770,24; 12960,31; 186,8; 211,1.</p>
      <p>In Table 5, you can see the sum of squares of the difference between the value of the middle
of the interval and the mean value of frequencies of phoneme groups for Byron’s emotive
prose and the scientific style in the position at the beginning of a word.</p>
      <p>Table 5. The results of the calculations for the comparison between Byron’s emotive prose and the
scientific style at the beginning of a word.</p>
      <p>Table values that survived extraction (column assignment not recoverable): 12,5; 17,79; 3,50; 8,69; 10,11; 14,10; 15,05.</p>
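      <p>At the beginning of a word, statistically significant differences have been obtained for the
groups of labials, dorsals, velars, nasals, sonorous and fricatives (Table 5 (continuation)).</p>
      <p>Table 5 (continuation). The essential differences between Byron’s emotive prose and the scientific style at the
beginning of a word.</p>
      <p>Table values that survived extraction (column assignment not recoverable): 5,29; 4,96; 1,91; 0,00; 3,08; 2,79; 4,54. Column headed BE <inline-formula><tex-math>\sum (x_i - \bar{x})^2</tex-math></inline-formula>: 1058,00; 13225,44; 49,99; 2169,56; 382,39; 2865,56; 10265,51; 1875,44.</p>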
      <p>In Table 6, we present the sum of squares of the difference between the value of the middle of
the interval and the mean value of frequencies of phoneme groups for Byron’s emotive
prose and the scientific style in the position at the end of a word.</p>
      <p>Table 6. The results of the calculations for the comparison between Byron’s emotive prose and the
scientific style at the end of a word.</p>
      <p>Byron’s emotive prose differs essentially from the scientific style in the case of the end
of a word for the groups of labials, dorsals, nasals, sonorous, fricatives and stops (Table 6
(continuation)).</p>
      <p>Table 6 (continuation). The essential differences between Byron’s emotive prose and the scientific style at the end
of a word.</p>
      <p>Table values that survived extraction (column assignment not recoverable): 1,34; 7,60; 3,04; 7,69; 14,38; 9,17; 0,59; 4,97; 4,93; 9,21; 3,81; 0,82. 7,41; 3,39; 1,58; 3,56; 2,62; 2,90.</p>
      <p>In this research, the Student’s t-test is efficient for style differentiation. Statistically
significant differences have been revealed in comparisons of the belles-lettres style (Shaw’s
drama; Byron’s emotive prose) and the scientific style (classical mechanics) for the three
cases of the phoneme’s position in a word.</p>
      <p>The analysis of the results obtained by the chi-square test in this research has shown
that this test is efficient for authorship attribution on the syntactic level. The Student’s t-test
has given good results on the phonological level for style differentiation. The results have
been obtained with the test validity of 95%.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussions</title>
      <p>The chi-square test in this research has been used on the syntactic level for author
identification. In our previous research, we used the test on the phonological and
lexical-semantic levels [12, 13]. The test was efficient on these levels. In this paper, we have shown
the efficiency of the chi-square test on the syntactic level. Consequently, the chi-square test
ensures reliable data (the level of test validity is 95%) on the phonological, lexical-semantic
and syntactic levels.</p>
      <p>The Student’s t-test in this research has been used for style differentiation. The results
of testing have shown statistically significant differences between the belles-lettres style
(Shaw’s drama, Byron’s emotive prose) and the scientific style (classical mechanics). The
level of test validity is 95%.</p>
      <p>
        According to the analysis of similar research, the authorial style was identified by deep
learning models in an attempt to detect aggression in social media. The models were tested
on the Cyber-Troll dataset and reached an F1 score of 97% [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In other research,
random forest and extra tree models were used for fake news detection. The use of
feature stacking gave a result of 93.39% [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. The algorithms using the Levenshtein
distance for the analysis of Ukrainian tweets ensured reliable results. The best result is a
fingerprint similarity of 70% [11].
      </p>
      <p>Having analyzed the results obtained in our research with the help of the chi-square test
and the Student’s t-test, we can state that this combination of tests is efficient for style
differentiation and author identification on three language levels: the phonological,
lexical-semantic and syntactic. As the test validity of the results is high (95%), it is recommended
to apply this combination of tests for solving the tasks of authorship attribution.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>It is topical in modern research to propose a new approach to authorial style
identification. The novelty of the research is the application of the chi-square test to the
analysis of the distribution of parts of speech on the material of American emotive prose.</p>
      <p>The chi-square test was performed on the material of the belles-lettres style (“Harry
Potter and the Philosopher’s Stone” by J. K. Rowling and “Sebring” by K. Ashley). For text
differentiation, the two texts were tagged by parts of speech (POS) in natural language
processing (NLP). The task of an authorial style differentiation has been solved with a level
of test validity of 95%.</p>
      <p>The Student’s t-test was performed on the material of the belles-lettres style (Shaw’s
drama, Byron’s emotive prose) and the scientific style (classical mechanics). Statistically
significant differences were obtained in three cases of style differentiation: 1 – any position
in the word; 2 – the beginning of the word; 3 – the end of the word. The style differentiating
capability of phoneme groups (labials, dorsals, coronals, velars, nasals, sonorous, fricatives
and stops) was revealed in position 1 for all except two groups of phonemes and in
positions 2 and 3 for all except one group of phonemes. The results prove the Student’s t-test
efficiency. The calculations were carried out in Java. The structure of the developed
software is based on the modular principle. The test validity of the obtained results is 95%.</p>
      <p>The goal of this research has been attained. The research has proved that the chi-square
test and the Student’s t-test are efficient statistical tests to differentiate texts by parts of
speech distribution and phoneme distribution.</p>
      <p>The practical application of this research involves author identification and style
differentiation. In our future research, we will choose some other syntactic features for
authorial style differentiation.</p>
      <p>[8] R. Romanchuk, V. Vysotska, V. Andrunyk, L. Chyrun, S. Chyrun, O. Brodyak, Intellectual Analysis System Project for Ukrainian-language Artistic Works to Determine the Text Authorship Attribution Probability, in: Proceedings of the 18th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2023, Lviv, Ukraine, 19-21 October, 2023. doi:10.1109/CSIT61576.2023.10324012.</p>
      <p>[9] M. Kestemont, M. Tschuggnall, E. Stamatatos, W. Daelemans, G. Specht, B. Stein, M. Potthast, Overview of the author identification task at PAN-2018: cross-domain authorship attribution and style change detection, in: Working Notes Papers of the CLEF 2018 Evaluation Labs, CEUR Workshop Proceedings, vol. 2125, 2018, pp. 1–25.</p>
      <p>[10] R. Hou, C.-R. Huang, Robust stylometric analysis and author attribution based on tones and rimes, Natural Language Engineering 26(1) (2020) 49–71. doi:10.1017/S135132491900010X.</p>
      <p>[11] O. Prokipchuk, V. Vysotska, Ukrainian Language Tweets Analysis Technology for Public Opinion Dynamics Change Prediction Based on Machine Learning, Radio Electronics, Computer Science, Control 2 (2023) 103. doi:10.15588/1607-3274-2023-2-11.</p>
      <p>[12] I. Khomytska, V. Teslyuk, K. Prysyazhnyk, N. Hrytsiv, The Lehmann-Rosenblatt test applied for determination of statistical parameters of Charles Dickens's authorial style, in: Proceedings of IEEE XVIth Scientific and Technical Conference on Computer Science and Information Technologies, CSIT 2021, Lviv, Ukraine, 22–25 September, vol. 2, 2021, pp. 64–67. doi:10.1109/CSIT52700.2021.9648789.</p>
      <p>[13] I. Khomytska, V. Teslyuk, I. Bazylevych, Yu. Kordiiaka, Machine learning and classical methods combined for text differentiation, in: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Systems, Vol. I: Main Conference, Gliwice, Poland, May 12-13, CEUR Workshop Proceedings, vol. 3171, 2022, pp. 1107-1116.</p>
      <p>[14] S. Th. Gries, Statistics for Linguistics with R: A Practical Introduction (Trends in Linguistics: Studies &amp; Monographs), Mouton de Gruyter, 2009, p. 348.</p>
      <p>[15] R. Bhattacharya, E. C. Waymire, A Basic Course in Probability Theory, 2nd ed., Springer, 2016.</p>
      <p>[16] V. S. Perebyjnis, Statystychni metody dlia lingvistiv, Nova Knyha, Vinnytsia, Ukraine, 2013.</p>
      <p>[17] P. C. Gomez, Statistical Methods in Language and Linguistic Research, University of Murcia, Spain, 2013.</p>
      <p>[18] A. Kornai, Mathematical Linguistics, Springer, 2008.</p>
      <p>[19] V. M. Turchyn, Matematychna statystyka, Navch. Posib., Vydavnychyj tsentr “Akademia”, Kyiv, Ukraine, 1999.</p>
      <p>[20] A. Batyuk, V. Voityshyn, V. Verhun, Software Architecture Design of the Real-Time Processes Monitoring Platform, in: Proceedings of the IEEE Second International Conference on Data Stream Mining &amp; Processing, DSMP 2018, Lviv, Ukraine, 2018, pp. 98-101. doi:10.1109/DSMP.2018.8478589.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hajibabaee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Malekzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Esmaeilzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Abdolazimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H. J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Offensive language detection on social media based on text classification</article-title>
          ,
          <source>in: Proceedings of the IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC)</source>
          ,
          Las Vegas, NV, USA,
          <year>2022</year>
          , pp.
          <fpage>0092</fpage>
          -
          <lpage>0098</lpage>
          , doi: 10.1109/CCWC54503.
          <year>2022</year>
          .
          <volume>9720804</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>U.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rizwan</surname>
          </string-name>
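          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Atteia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Jamjoom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Samee</surname>
          </string-name>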
          ,
          <article-title>Aggression detection in social media from textual data using deep learning models</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>12</volume>
          (
          <issue>10</issue>
          )
          <fpage>5083</fpage>
          (
          <year>2022</year>
          ). doi:
          <volume>10</volume>
          .3390/app12105083.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Mohades Delami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sadr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nazari</surname>
          </string-name>
          ,
          <article-title>Using machine learning-based models for personality recognition</article-title>
          ,
          <source>Big Data and Computing Visions</source>
          <volume>1</volume>
          (
          <issue>3</issue>
          ) (
          <year>2022</year>
          )
          <fpage>128</fpage>
          -
          <lpage>139</lpage>
          . doi:
          <volume>10</volume>
          .22105/bdcv.
          <year>2021</year>
          .
          <volume>142588</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>Fake news detection in the Urdu language using CharCNN-RoBERTa</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2826</volume>
          /T3-2,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shoaib Farooq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Naseem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rustam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ashraf</surname>
          </string-name>
          ,
          <article-title>Fake news detection in the Urdu language using machine learning</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>9</volume>
          :
          <issue>e1353</issue>
          (
          <year>2023</year>
          ). doi:
          <volume>10</volume>
          .7717/peerj-cs.
          <volume>1353</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Albota</surname>
          </string-name>
          ,
          <article-title>Creating a model of war and pandemic apprehension: textual semantic analysis</article-title>
          ,
          <source>in proceedings of the 7th International conference on computational linguistics and intelligent systems</source>
          . Vol.
          <article-title>II: Computational linguistics workshop</article-title>
          . Kharkiv, Ukraine,
          <source>April 20-21</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>228</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Levchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dilai</surname>
          </string-name>
          ,
          <article-title>Qualitative and Quantitative Markers of Individual Authorial Conceptualization</article-title>
          ,
          <source>in proceedings of the 7th International conference on computational linguistics and intelligent systems</source>
          . Vol.
          <article-title>II: Computational linguistics workshop</article-title>
          . Kharkiv, Ukraine,
          <source>April 20-21</source>
          ,
          <issue>3396</issue>
          ,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Romanchuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Andrunyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chyrun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Brodyak</surname>
          </string-name>
          ,
          <article-title>Intellectual Analysis System Project for Ukrainian-language Artistic Works to Determine the Text</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>