<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Information Technology and Interactions, December</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Language-Independent Features for Authorship Attribution on Ukrainian Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuliia Hlavcheva</string-name>
          <email>yuliia.hlavcheva@khpi.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Glavchev</string-name>
          <email>maksym.glavchev@khpi.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victoria Bobicev</string-name>
          <email>victoria.bobicev@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Kanishcheva</string-name>
          <email>kanichshevaolga@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University “KhPI”</institution>
          ,
          <addr-line>2 Kyrpychova str., Kharkiv, 61002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Moldova</institution>
          ,
          <addr-line>Bd. Ştefan cel Mare, 168, Chișinău, MD-2004</addr-line>
          ,
          <country>Republic of Moldova</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>0</volume>
      <fpage>2</fpage>
      <lpage>03</lpage>
      <abstract>
        <p>Authorship attribution is the natural language processing task of identifying the author of an input text. The main goal of this task is to define the salient characteristics of documents that capture the author's writing style. In this paper, we analyze language-independent features for authorship attribution. All experiments were carried out on a corpus of Ukrainian scientific papers. For the experiments we used Bayes-based algorithms (Naive Bayes Multinomial), a Support Vector Machine (SMO) and Decision Trees (LMT, J48). The experimental results of scientific text classification demonstrated that the Decision Trees method in most cases outperforms the other machine learning methods, and that the language-independent features proposed in the paper are appropriate for authorship attribution of Ukrainian scientific documents.</p>
      </abstract>
      <kwd-group>
        <kwd>Writing Style</kwd>
        <kwd>Language-Independent Features</kwd>
        <kwd>Authorship Attribution</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Machine Learning Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The task of authorship identification is not new, and the results of authorship detection studies are
actively used in various spheres of human life. Authorship research can be divided into three main
areas: authorship identification (identifying the author by analyzing the writing style of the author's
other works), authorship characterization (determining the author's characteristics, such as gender,
education, culture and language skills, and generating the author's profile), and similarity detection
(comparing several texts to determine whether they were created by one author, in effect establishing
the identity of the author) [1, 2, 3, 4]. Similarity detection is most often used to identify potential
academic plagiarism. An advantage of this method is that it requires only a small amount of input text.</p>
      <p>The relevance of the research topic is confirmed by the dynamics of publications and citations in
scientific databases, in particular by data from the Web of Science Core Collection. We selected
documents by the keyword “Writing Style” and retrieved 6,679 documents for the previous 10 years
(2010-2019). During this time, the number of publications almost tripled, and the number of citations
on the topic increased significantly (Figure 1).</p>
      <p>
        Methods and approaches to author identification and to the definition of writing style differ
across the studies of various authors [
        <xref ref-type="bibr" rid="ref13 ref5 ref8">4, 5, 6, 7, 8, 9, 10</xref>
        ].
      </p>
      <p>In [7] the authors proposed detecting differences between writings on the same topic
provided by a set of users and tested whether these differences are sufficient for use in an authentication
system. They observed 74% accuracy in detecting the actual authors and concluded that with
additional features the accuracy can be pushed above 90%. They also analyzed how data cleaning
steps, such as removing stop words and punctuation marks, affected the final results.</p>
      <p>2020 Copyright for this paper by its authors.</p>
      <p>The paper [6] proposed a new methodology for authorship attribution based on a profile of
indices related to the generalized coupon collector problem, called coupon-collector-type indices.</p>
      <p>The authors of [5] proposed an alternative authorship verification (AV) approach that considers only
topic-agnostic features in its classification decision. In addition, they presented a post-hoc interpretation
method that shows which particular features contributed to the prediction of the
proposed AV method.</p>
      <p>
        Within the authorship verification task, the authors of [
        <xref ref-type="bibr" rid="ref13 ref5 ref8">8, 10</xref>
        ] also single out the subtask of author obfuscation,
which arises when an author deliberately changes their writing style.
      </p>
      <p>Author obfuscation is the adversarial task of preventing a successful verification by altering a
text's style so that it no longer resembles that of its original author. The paper [8] introduces
new algorithms for both tasks: the authors model writing style difference as the
Jensen-Shannon distance between the character n-gram distributions of texts, and manipulate an
author's writing style in a sophisticated manner using heuristic search.</p>
      <p>Text length also influences authorship identification. In [9] the authors tried to
identify the authors of tweets, which are limited to 280 characters.</p>
      <p>This paper focuses on the third direction, similarity detection. Therefore, by identification of
the author we mean determining the potential author of a text from a given set of candidates.
The decision is based on measuring and comparing a group of properties that reflect the author's
style. By the author's style we mean the author's own manner of writing, which is used
unconsciously when writing texts. Text properties that reflect the author's style are called stylometric.
The stylometric properties by which the author's style can be identified make up a list of style markers.</p>
      <p>The main stylometric properties include the following features: lexical, symbolic, syntactic,
semantic, and application-specific [1, 2]. The author of [1] highlights the following application-specific
properties that can reflect a writing style: structural, content-specific and language-specific.</p>
      <p>The search for symbolic properties is based on the analysis of text as a sequence of symbols. This
type of information is easily identifiable for any natural language. We can easily count the total number
of characters, the number of alphabetic characters, the number of upper and lower case characters, the
number of punctuation marks, etc. Practical experiments have shown that symbolic features, in
combination with other markers, are useful for determining the style of writing [11].</p>
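      <p>The symbolic counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the paper: the function name symbolic_features and the choice to treat every Unicode "P" category character as punctuation are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import unicodedata


def symbolic_features(text: str) -> dict:
    """Count character-level (symbolic) features of one text fragment."""
    return {
        "total_chars": len(text),
        "alphabetic": sum(ch.isalpha() for ch in text),
        "uppercase": sum(ch.isupper() for ch in text),
        "lowercase": sum(ch.islower() for ch in text),
        "digits": sum(ch.isdigit() for ch in text),
        "spaces": sum(ch.isspace() for ch in text),
        # Assumption: Unicode general categories starting with "P" are punctuation.
        "punctuation": sum(
            unicodedata.category(ch).startswith("P") for ch in text
        ),
    }


if __name__ == "__main__":
    print(symbolic_features("Це тест, 2020!"))
```
]]></preformat>
      <p>Because the counts rely only on Unicode character classes, the same function applies unchanged to Ukrainian, Russian or English fragments.</p>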
      <p>All investigated structural units of the text depend on the language and the general content, and have a
probabilistic nature. Each language has certain lexical, syntactic, semantic and stylistic features.
Therefore, the same approaches to attribution give results of different quality
(accuracy) for different languages. This is confirmed by previous studies of the stylometric properties of
scientific texts in Ukrainian and Russian carried out by the authors [12, 13].</p>
      <p>Also, not all style markers are equally effective when used for different languages [11]. Based on this,
we divide the stylometric characteristics into two groups: language-dependent features and
language-independent features.</p>
      <p>The paper [11] analyzes in detail the following language-independent and language-dependent
stylometric features: average sentence length in the text, percentage of capital letters
in relation to the number of lowercase letters, percentage of lowercase letters in relation to the total number
of characters in the text, percentage of punctuation signs in relation to the total number of spaces in the
text, percentage of numeric characters in relation to the total number of letter characters in the text,
average word length in a sentence in the text, frequency of the most frequent stop word in the text, the
most frequent starting word of a sentence in the text, the frequency of the most frequent starting letter
of the starting word, the frequency of the most frequent starting letter of the stop word, the number of
words occurring just once in the text, the number of words occurring twice in the text, and the
number of words of a given word length in the text.</p>
      <p>Some of them were used in experiments by the authors of this publication. This paper focuses on
the research of the statistical characteristics of scientific texts in the Ukrainian language, which can be
attributed to language-independent stylometric properties.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data Description</title>
      <p>For the experiments we used our own preprocessed text corpus. The data sources are the repository
of the National Technical University "Kharkiv Polytechnic Institute"
(http://repository.kpi.kharkov.ua) and the portal of scientific publications of the National Technical
University "Lviv Polytechnic" (http://science.lpnu.ua/uk). The text corpus consists of individual
scientific publications in Ukrainian. For the stylometric properties we used only the main text of each paper,
which best reflects the author’s writing style. For each author, a collection of paper fragments is
formed. Thus, the text corpus is a data set built from uniform fragments (instances), each of which
is considered individually. Two approaches have been used so far [1, 14, 15]: profile-based and
instance-based. In this paper we implemented the instance-based approach to the study of authorial style,
because profile-based approaches define the overall writing style of the author
and do not account for style changes in individual documents, whereas instance-based approaches
determine the writing style for each instance of the text and thus accommodate changes in the
author's writing style. In our case, stylometric properties are determined for each fragment of a
paper separately. Text data statistics are shown in Table 1. However, the distribution of text volume
among the authors and the files is highly unbalanced. The smallest files contained only 150-160
words. Therefore, we cut the files into fragments, so that the shortest scientific texts in our collection are
around 150 words long. We created two subsets for our experiments: one consists of 8 classes
(authors), the other of 32 classes (authors). The classes were selected randomly, but the sets are balanced.</p>
      <p>Statistical characteristics were determined using our own software, "Determination of statistical
characteristics of text fragments". This program extracts different kinds of features from text
and is adapted to processing a wide range of character sets.</p>
      <p>The program processes text data according to its own algorithm in order to
produce a wide range of statistical parameters of the text. The general interface consists of two areas: working
with data and calculation results (Fig. 2). Calculations are carried out on tabs for the following
elements: sentences, words, symbols, and a user-selectable set.</p>
      <p>The program implements the functions of calculating and saving the obtained experimental data
for all elements.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Authorship Attribution Based on Language-Independent Features</title>
      <p>In this section, we describe our language-independent features for text classification in Ukrainian
and present the main results of our authorship attribution experiments.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Identification of Language-Independent Features</title>
      <p>Groups of properties that refer to the statistical parameters of a text and make it possible to determine the
author's style with high accuracy are described in [2, 11]. The authors of this paper created their own
list of statistical properties of Ukrainian text, divided into 5 groups:</p>
      <p>Group 1: average number of words in a sentence, average word length, average word frequency,
punctuation (5 indicators).</p>
      <p>Group 2: the number of words with length from 1 to 20 characters, the number of words with a
word frequency from 1 to 8 times (28 indicators).</p>
      <p>The range of up to 20 letters was selected because the texts contain long words such as the Ukrainian
equivalents of "multifunctional" (19 letters) and "competitiveness" (22 letters). Statistics on words with a
length of 1 to 20 characters and on the number of words with a frequency of 1 to 8 are shown in Fig. 3 and Fig. 4.</p>
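      <p>The 28 indicators of Group 2 can be illustrated with the following Python sketch. It is an assumption-laden example rather than the authors' program: the tokenizer (a regular expression that keeps apostrophes inside words), the lowercasing step, and the helper names tokenize, length_histogram, frequency_spectrum and group2_features are all hypothetical. Words longer than 20 letters and words occurring more than 8 times are simply not counted.</p>
      <preformat><![CDATA[
```python
import re
from collections import Counter

# Assumption: a word is a run of letters, optionally joined by apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def length_histogram(words, max_len=20):
    """Number of word tokens with 1..max_len letters."""
    counts = Counter(sum(ch.isalpha() for ch in w) for w in words)
    return [counts.get(n, 0) for n in range(1, max_len + 1)]


def frequency_spectrum(words, max_freq=8):
    """Number of distinct words occurring exactly 1..max_freq times."""
    spectrum = Counter(Counter(words).values())
    return [spectrum.get(k, 0) for k in range(1, max_freq + 1)]


def group2_features(text):
    """20 word-length counts followed by 8 word-frequency counts."""
    words = tokenize(text)
    return length_histogram(words) + frequency_spectrum(words)


if __name__ == "__main__":
    print(group2_features("Кіт спить. Кіт їсть рибу."))
```
]]></preformat>
      <p>Each fragment of about 150 words would yield one such 28-element vector, which can then be concatenated with the indicators of the other groups.</p>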
      <p>Group 3: frequency of using letters of the Ukrainian alphabet (33 indicators).</p>
      <p>The least frequently used letter is "ґ"/"g", and the most frequently used letters are "о"/"o", "н"/"n" and "а"/"a". Full
information about these letters is presented in Table 2.</p>
      <p>Group 4: stop-words and pronouns [...]
words with a frequency of 100 or more for the classification process. The stop-words and pronouns
are shown in Table 3 and Table 4.</p>
      <p>Group 5: coefficients of language diversity (5 indicators).</p>
      <p>The author's unique vocabulary consists of specialized terms, functional words and certain linguistic
constructions, and it influences the variety of the author’s text. This variety is determined using the following
coefficients [13, 16]:</p>
      <p>• coefficient of text lexical diversity = <inline-formula><tex-math><![CDATA[\frac{W}{N}]]></tex-math></inline-formula>, where W is the number of unique words, N is the [...];</p>
      <p>• syntactic complexity coefficient = <inline-formula><tex-math><![CDATA[1 - \frac{P}{N}]]></tex-math></inline-formula>, where P is the number of sentences, N is the [...];</p>
      <p>• [...] = <inline-formula><tex-math><![CDATA[\frac{W_1}{W}]]></tex-math></inline-formula>, where W1 is the number of words with a frequency of 1, W is the [...];</p>
      <p>• [...] W10 [...], where W10 is the number of words with a frequency of 1[...];</p>
      <p>• speech connectivity coefficient = [...], where Z is the number of prepositions, S is the [...].</p>
      <p>Stopwords and pronouns with a frequency of 20 or more (for 8 classes): однак/however, оскільки/because, при/at, проте/but, під/under, та/and, так/so, таким чином/thus, також/also, те/that, ти/you, тим/by that, тих/those, то/then, того/that.</p>
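      <p>As an illustration of the text diversity coefficients, the Python sketch below computes three ratios from a tokenized fragment. It rests on stated assumptions rather than on the authors' exact definitions: N is taken to be the total number of word tokens, lexical diversity is taken as W / N, syntactic complexity as 1 - P / N, and the share of words occurring once as W1 / W. The function name diversity_coefficients and the dictionary keys are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def diversity_coefficients(words, n_sentences):
    """Return three text diversity ratios for one fragment.

    words: list of word tokens; n_sentences: number of sentences (P).
    """
    if not words:
        raise ValueError("the fragment contains no words")
    n = len(words)                      # N: total word tokens (assumed)
    freq = Counter(words)
    w = len(freq)                       # W: number of unique words
    w1 = sum(1 for c in freq.values() if c == 1)  # W1: words used once
    return {
        "lexical_diversity": w / n,
        "syntactic_complexity": 1 - n_sentences / n,
        "single_use_share": w1 / w,
    }


if __name__ == "__main__":
    sample = ["кіт", "спить", "кіт", "їсть", "рибу"]
    print(diversity_coefficients(sample, n_sentences=2))
```
]]></preformat>
      <p>Averaging these values over all fragments of one author gives per-author means of the kind discussed in the text.</p>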
      <p>After calculating the text diversity coefficients, we determined their average values for each
author and grouped the results by value. We found that, for each coefficient, a significant number of
authors fall into each group of values. Unfortunately, this does not allow
these properties to be used on their own to identify the author. Table 5 presents the number of
unique mean values of the text diversity coefficients and the maximum number of authors with the
same values.</p>
      <p>The analysis showed that the text diversity coefficients cannot serve as a standalone formal
feature for author identification. Therefore, these indicators are used
together with additional properties. For example, with the Decision Trees (LMT) algorithm,
adding the text diversity coefficients to the set of properties increased the F-Measure value
from 0.599 to 0.614.</p>
      <p>Frequency values (word labels missing): 2357, 35, 207, 24, 48, 85, 47, 49, 91, 593, 61, 164, 2518, 203, 57, 250, 49, 60, 59, 128, 418, 327.</p>
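      <table-wrap>
        <table>
          <thead>
            <tr><th>Word</th><th>Frequency</th></tr>
          </thead>
          <tbody>
            <tr><td>той/that</td><td>32</td></tr>
            <tr><td>тому/so</td><td>249</td></tr>
            <tr><td>тільки/only</td><td>87</td></tr>
            <tr><td>у/in</td><td>2155</td></tr>
            <tr><td>це/it</td><td>363</td></tr>
            <tr><td>цей/this</td><td>85</td></tr>
            <tr><td>цих/these</td><td>86</td></tr>
            <tr><td>цього/this</td><td>153</td></tr>
            <tr><td>цьому/this</td><td>112</td></tr>
            <tr><td>ця/this</td><td>41</td></tr>
            <tr><td>ці/these</td><td>85</td></tr>
            <tr><td>цієї/this</td><td>83</td></tr>
            <tr><td>чи/or</td><td>135</td></tr>
            <tr><td>що/what</td><td>1365</td></tr>
            <tr><td>щоб/in order to</td><td>95</td></tr>
            <tr><td>як/as</td><td>663</td></tr>
            <tr><td>якщо/if</td><td>293</td></tr>
            <tr><td>і/and</td><td>2398</td></tr>
            <tr><td>із/from</td><td>343</td></tr>
            <tr><td>їх/their</td><td>408</td></tr>
            <tr><td>її/her</td><td>323</td></tr>
          </tbody>
        </table>
      </table-wrap>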
      <p>For our text classification experiments, we used our corpus (2 subsets) and the five groups of
features, both separately and in combination. Weka software
(https://www.cs.waikato.ac.nz/ml/weka/) was used for the classification task. Bayes-based algorithms
(Naive Bayes Multinomial, NBM), Support Vector Machine (SMO) and Decision Trees (LMT, J48) were
used as classification methods with 10-fold cross-validation. In the
authors' preliminary experiments with other stylometric properties, these machine learning
methods demonstrated good results [12].</p>
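      <p>The 10-fold cross-validation protocol can be sketched independently of Weka. The Python example below only shows how instances might be divided into folds; it is not Weka's implementation. Stratifying by author, the fixed random seed and the helper names stratified_folds and train_test_splits are assumptions of this sketch.</p>
      <preformat><![CDATA[
```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds, spreading each author evenly."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled instances of one author round-robin over the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once for every fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    authors = ["author_a"] * 20 + ["author_b"] * 20
    for train, test in train_test_splits(authors):
        print(len(train), len(test))
```
]]></preformat>
      <p>Each instance is used for testing exactly once, so the reported quality is an average over fragments the classifier did not see during training.</p>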
      <p>Figure 5 shows the classification results for groups 1, 2 and 3 separately and together. For
individual groups, quality is not assessed using the F-measure, because it either cannot be calculated
or has a very small value for them; the result is presented as the percentage of correctly classified
instances. In these experiments, fragments of documents from 8 authors were used, that is, the
classification was carried out into 8 classes. The results for 32 classes are not as good as for 8 classes.
In our opinion, this is because the influence of the statistical characteristics decreases as the number of
classes increases.</p>
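      <p>The two quality measures mentioned above can be made concrete with a short Python sketch. It assumes that the F-measure is the support-weighted average of per-class F1 scores; the paper does not state which averaging was used. Precision is undefined for a class that is never predicted, and the sketch scores such a class as 0, which is also an assumption. The function names accuracy and weighted_f_measure are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of correctly classified instances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f_measure(y_true, y_pred):
    """Per-class F1 averaged with weights equal to the class sizes."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        # Assumption: a class with no correct predictions contributes F1 = 0.
        if tp == 0:
            f1 = 0.0
        else:
            precision = tp / predicted
            recall = tp / n_cls
            f1 = 2 * precision * recall / (precision + recall)
        total += f1 * n_cls
    return total / len(y_true)


if __name__ == "__main__":
    true = ["a", "a", "b", "b"]
    pred = ["a", "b", "b", "b"]
    print(accuracy(true, pred), weighted_f_measure(true, pred))
```
]]></preformat>
      <p>With many classes and few informative features, several classes may receive no correct predictions at all, which drives such an average towards very small values.</p>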
      <p>Experiments have confirmed that the classification quality depends on sets of features and machine
learning methods. In addition, scientific publications analyze other problems associated with the
preparation of data that affect the quality of the classification [2]:</p>
      <p>• Problem Scope – The number of authors in the research, equal to the number of classes in the
classification.</p>
      <p>• Training Size – The number of documents in the training set.</p>
      <p>Therefore, we conducted experiments for 32 authors (32 classes) and 8 authors (8 classes) and
compared the results. The result of the text classification using different sets of features for a different
number of classes is presented in Table 6.</p>
      <p>Compared with 32 classes, classification into 8 classes increased the share of Correctly Classified
Instances by 20% on average (minimum 15%, maximum 28%).
The dynamics of the classification quality for 8 authors (8 classes) with a different
number of indicators are shown in Fig. 6.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusions</title>
      <p>In this paper we described our experiments on text authorship attribution. These experiments were
run on our own text corpus of scientific publications. Bayes Based Algorithms (Naive Bayes
Multinomial, NBM), Support Vector Machine (SMO) and Decision Trees (LMT, J48) from the
WEKA toolkit were tested for this task. Two sets of experiments have been designed, with selections
of texts written by 32 and 8 authors respectively. As a novelty, we proposed our own set of author
style indicators, organizing them in 5 groups and testing these groups individually and in various
combinations.</p>
      <p>The result of the experiments demonstrated the usefulness of the proposed language-independent
stylometric properties indicators for text authorship attribution. Experiments showed that for 1-3, 1-4
and 1-5 groups of properties, the classification indicators are similar, despite the increase in the
number of features.</p>
    </sec>
    <sec id="sec-6">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>The best result (F-measure) of 32 classes we received for the SMO method (0.586) and</article-title>
          LTM
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>(0.614) for 1-5 groups of properties. The best result (F-measure) of 8 classes we received for the SMO</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>method (0,794) and LTM (0,806) for 1-5 groups of properties</article-title>
          . [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <article-title>A survey of modern authorship attribution methods</article-title>
          .
          <source>Journal of the American</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>Society for information Science and Technology</source>
          , Vol.
          <volume>60</volume>
          , no 3, (
          <year>2009</year>
          ), pp.
          <fpage>538</fpage>
          -
          <lpage>556</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>doi:10</source>
          .1002/asi.21001. [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Li. J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <article-title>A framework for authorship identification of online messages:</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>information science and technology</article-title>
          , Vol.
          <volume>57</volume>
          . №. 3, (
          <year>2006</year>
          ), pp.
          <fpage>378</fpage>
          -
          <lpage>393</lpage>
          . doi:
          <volume>10</volume>
          .1002/asi.20316. [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>AlSallal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Iqbal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Palade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>An integrated approach for intrinsic</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <article-title>plagiarism detection</article-title>
          .
          <source>Future Generation Computer Systems</source>
          , Vol.
          <volume>96</volume>
          ., (
          <year>2019</year>
          ) pp.
          <fpage>700</fpage>
          -
          <lpage>712</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>doi:10</source>
          .1016/j.future.
          <year>2017</year>
          .
          <volume>11</volume>
          .023. [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Alhijawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hriez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Awajan</surname>
          </string-name>
          ,
          <article-title>Text-based authorship identification - A survey</article-title>
          . Paper
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>presented at the 5th International Symposium on Innovation in Information and Communication</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Technology</surname>
            ,
            <given-names>ISIICT</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>(</article-title>
          <year>2018</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . doi:
          <volume>10</volume>
          .1109/ISIICT.
          <year>2018</year>
          .
          <volume>8613287</volume>
          . [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Halvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Graner</surname>
          </string-name>
          , R. Regev,
          <string-name>
            <surname>TAVeer:</surname>
          </string-name>
          <article-title>An interpretable topic-agnostic authorship</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>verification method</article-title>
          , ACM International Conference Proceeding Series,
          <year>2020</year>
          . [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , Authorship Attribution via
          <string-name>
            <surname>Coupon-Collector-Type</surname>
            <given-names>Indices</given-names>
          </string-name>
          , Journal of
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Quantitative</given-names>
            <surname>Linguistics</surname>
          </string-name>
          , Vol.
          <volume>27</volume>
          , no.
          <issue>4</issue>
          , (
          <year>2020</year>
          ), pp.
          <fpage>321</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>Doi:10.1080/09296174</source>
          .
          <year>2019</year>
          .
          <volume>1577939</volume>
          . [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Sadman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Datta Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poudyal</surname>
          </string-name>
          ,
          <article-title>Stylometry as a Reliable Method</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>for Fallback</surname>
            <given-names>Authentication</given-names>
          </string-name>
          , 17th International Conference on Electrical Engineering/Electronics,
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Computer</surname>
          </string-name>
          , Telecommunications and Information Technology,
          <source>ECTI-CON</source>
          <year>2020</year>
          , (
          <year>2020</year>
          ), pp.
          <fpage>660</fpage>
          . [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          , T. Wenzel, M. Potthast,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>On divergence-based author</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Information</given-names>
            <surname>Technology</surname>
          </string-name>
          , vol.
          <volume>62</volume>
          , no.
          <issue>2</issue>
          , (
          <year>2020</year>
          ), pp.
          <fpage>99</fpage>
          -
          <lpage>115</lpage>
          . doi:
          <volume>10</volume>
          .1515/itit-2019-
          <volume>0046</volume>
          . [9]
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Sharon Belvisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muhammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alonso-Fernandez</surname>
          </string-name>
          , Forensic Authorship Analysis of
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Microblogging</given-names>
            <surname>Texts Using N-Grams</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stylometric</given-names>
            <surname>Features</surname>
          </string-name>
          ,
          <year>2020</year>
          8th International
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>Workshop on Biometrics and Forensics, IWBF 2020 - Proceedings</source>
          ,
          <year>2020</year>
          . [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Heuristic authorship obfuscation</article-title>
          ,
          <source>ACL</source>
          <year>2019</year>
          -
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>57th Annual Meeting of the Association for Computational Linguistics</source>
          , Proceedings of the
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          Conference
          ,
          <year>2020</year>
          , pp.
          <fpage>1098</fpage>
          . [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Adamovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Miskovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Milosavljevic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sarac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Veinovic</surname>
          </string-name>
          , Automated language‐
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>for Information Science and Technology</source>
          , vol.
          <volume>70</volume>
          , no.
          <issue>8</issue>
          , (
          <year>2019</year>
          ), pp.
          <fpage>858</fpage>
          -
          <lpage>871</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1002/asi.24163</pub-id>
          . [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bobicev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hlavcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kanishcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Lazu</surname>
          </string-name>
          ,
          <article-title>Authorship Attribution in Scientific</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <article-title>Publications</article-title>
          .
          <source>Proceedings of Corpora-2019 conference</source>
          . Saint-Petersburg, Russia, (
          <year>2019</year>
          ), pp.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <fpage>174</fpage>
          -
          <lpage>181</lpage>
          . [13]
          <string-name>
            <given-names>V.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kanishcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hlavcheva</surname>
          </string-name>
          ,
          <article-title>Authorship Identification of the Scientific Text in</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <article-title>Ukrainian with Using the Lingvometry Methods</article-title>
          , 2018 IEEE 13th International Scientific and
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          (
          <year>2018</year>
          ), pp.
          <fpage>34</fpage>
          -
          <lpage>38</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/STC-CSIT.2018.8526735</pub-id>
          . [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bacciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Morgia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. N.</given-names>
            <surname>Nemmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Neri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stefa</surname>
          </string-name>
          ,
          <article-title>Cross-Domain Authorship</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <article-title>Attribution Combining Instance-Based and Profile-Based Features</article-title>
          . In
          <source>CLEF</source>
          , (
          <year>2019</year>
          ). [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shukla</surname>
          </string-name>
          ,
          <source>Foundations and Applications of Authorship Attribution Analysis</source>
          ,
          <year>2019</year>
          , 38 p. [16]
          <string-name>
            <given-names>V. A.</given-names>
            <surname>Vysotska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Pasichnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yu. M.</given-names>
            <surname>Shcherbyna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Shestakevych</surname>
          </string-name>
          . Matematychna
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <source>lingvistyka. Knyha 1. Kvantytatyvna linhvistyka</source>
          , Lviv,
          <publisher-name>Novyi Svit-2000</publisher-name>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>