<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Propaganda Detection in Text Data Based on NLP and Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University Institute of Lisbon</institution>
          ,
          <addr-line>Lisbon</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2080</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The goal of this research is to build a machine-learning model, as well as preliminary data processing and feature extraction algorithms, that would make it possible to successfully identify signs of propaganda in text data and to solve a binary classification task. The task is presented in two forms: article-level propaganda detection and sentence-level propaganda detection. The propaganda detection dataset for the article-level task consists of 35 993 articles (including headlines) written in English. Each article is marked as either “propaganda” or “non-propaganda”. The dataset also contains a unique identifier for each article.</p>
      </abstract>
      <kwd-group>
        <kwd>Content</kwd>
        <kwd>Text Data</kwd>
        <kwd>Propaganda Detection</kwd>
        <kwd>Data Classification</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Data Preprocessing</title>
      <p>The data is presented in the form of a text file that consists of tab-separated
article content, the assigned class, and a unique article identifier. After the file is
loaded and the identifier attribute is deleted, we convert our dataset to the
pandas.DataFrame format. Before we start the feature extraction process, we need to
perform a few operations on the data to clean it and prepare it for extraction. First, we
convert every word in the dataset to lowercase so that, in the process of vectorization,
two semantically identical words, one uppercase and one lowercase, are not counted as
separate tokens. In order to do so, we perform the following transformation:
data['article'] = data['article'].apply(lambda x: " ".join(w.lower() for w in x.split()))</p>
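      <p>The loading and lowercasing steps can be sketched as follows; an in-memory sample stands in for the real tab-separated file, whose name and path are not given here:

```python
import io

import pandas as pd

# A minimal sketch of the loading step described above. An in-memory sample
# stands in for the real tab-separated file, whose name/path is not given here.
sample = "First article text\tnon-propaganda\t101\nSecond article text\tpropaganda\t102\n"
data = pd.read_csv(io.StringIO(sample), sep="\t", header=None,
                   names=["article", "label", "id"])
data = data.drop(columns=["id"])  # delete the identifier attribute
data["article"] = data["article"].apply(lambda x: " ".join(w.lower() for w in x.split()))
```
</p>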
      <p>Next, we need to check whether the data is evenly distributed by category. As the
graph clearly shows, the data is not balanced and the non-propagandistic articles are in
the majority. To be more specific, the data contains 31 972 non-propagandistic
and 4 201 propagandistic articles.</p>
      <p>The next step is to remove punctuation symbols from the dataset: at this task level
punctuation is not a very informative feature, and it would also be counted as separate
tokens during the vectorization process, which could make the data noisy.</p>
      <p>In order to do so, we perform the following transformation:
data['article'] = data['article'].str.replace(r'[^\w\s]', '', regex=True)
Lastly, we need to extract and remove so-called stop words. Stop words are usually the
most common words in a language that do not contribute anything to the data
semantically, and therefore will be of no use to us in the process of building a model. We
use the nltk library's built-in corpora and remove the stop words from the data in a loop.
stop = nltk.corpus.stopwords.words('english')
data['article'] = data['article'].apply(lambda x: " ".join(x for
x in x.split() if x not in stop))
After this, we can split our data into separate training and test sets.</p>
      <p>X = data['article']
y = data['label']
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)</p>
    </sec>
    <sec id="sec-2">
      <title>2 Feature Extraction</title>
      <p>The raw data by itself does not carry any useful attributes and is not suitable for
use in a machine-learning model, so we need to extract features before we
can run any machine-learning algorithm.</p>
      <p>The first step in this process is text vectorization.</p>
      <p>During the vectorization process, every word (term) in the corpus is assigned a
unique number. The text data is transformed into an N-dimensional vector, where N is the
number of words in the corpus. The value of each vector element is the term frequency of
the corresponding word. We are going to use the CountVectorizer class from the scikit-learn
library to implement text vectorization.
vectorizer = CountVectorizer(analyzer='word',
token_pattern=r'\w{1,}', ngram_range=(1,2),
strip_accents='unicode', min_df=3, max_df=0.5)
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
The return value of the CountVectorizer fit and transform methods is a sparse matrix whose
number of columns equals the number of unique words (features) in the corpus.</p>
      <p>We need to make sure that, after the vectorization is applied, the number of features
of the training set and the number of features of the test set are equal.</p>
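      <p>The guarantee behind this check can be illustrated on a toy corpus: fitting the vectorizer on the training text and only transforming the test text forces both matrices to share one vocabulary. (The parameters are simplified here; min_df and max_df would prune such a tiny corpus.)

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy illustration of the feature-count check: fit on training text,
# transform-only on test text, so both share one vocabulary.
vectorizer = CountVectorizer(analyzer='word', token_pattern=r'\w{1,}')
train_docs = ["propaganda uses loaded language", "plain news reporting"]
test_docs = ["loaded reporting"]
X_tr = vectorizer.fit_transform(train_docs)
X_te = vectorizer.transform(test_docs)  # reuses the fitted vocabulary
assert X_tr.shape[1] == X_te.shape[1]   # same number of features
```
</p>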
      <p>
        The next step of our feature extraction process is the TF-IDF transformation [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1-5</xref>
        ].
      </p>
      <p>TF-IDF (Term Frequency-Inverse Document Frequency) is a numerical statistic that
is intended to reflect how important a word is to a document in a collection or corpus.</p>
      <p>
        TF is the ratio of the raw count of a term in a document (the number of times the word t
occurs in the document) to the overall number of words in the document [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6-9</xref>
        ]. TF
evaluates the importance of a particular word t within the scope of a document [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14">10-14</xref>
        ]:
tf(t, d) = n_t / Σ_k n_k,
where n_t is the number of occurrences of the word t in the document, while the denominator
Σ_k n_k represents the overall number of words in the document [
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19 ref20 ref21">15-21</xref>
        ].
      </p>
      <p>
        IDF is defined as:
idf(t, D) = log(|D| / |{d ∈ D : t ∈ d}|),
where |D| is the number of documents in the collection, while |{d ∈ D : t ∈ d}|
represents the number of documents that contain the word t [
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25 ref26 ref27">22-27</xref>
        ].
      </p>
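      <p>These two definitions can be computed by hand on a toy corpus of three tokenized "documents"; this is the plain, unsmoothed variant, not scikit-learn's smoothed formula:

```python
import math

# Hand-computed tf and idf on a toy corpus (plain, unsmoothed variant).
docs = [["propaganda", "is", "persuasion"],
        ["news", "is", "news"],
        ["persuasion", "works"]]

def tf(term, doc):
    # raw count of the term divided by the total number of words in the document
    return doc.count(term) / len(doc)

def idf(term, corpus):
    # log of (number of documents / number of documents containing the term)
    containing = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / containing)

# "news" occurs twice in a three-word document and in one of three documents
tfidf = tf("news", docs[1]) * idf("news", docs)
```
</p>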
      <p>We will use the TfidfTransformer class from the scikit-learn library to implement the
TF-IDF transformation.
transformer = TfidfTransformer(use_idf=True, smooth_idf=True)
X_train = transformer.fit_transform(X_train)
X_test = transformer.transform(X_test)
The transformation is performed on the previously formed sparse matrix of vectorized
text data.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Building a Model</title>
      <p>
        For this classification task, we will use the logistic regression model [
        <xref ref-type="bibr" rid="ref26 ref27 ref28 ref29 ref30 ref31 ref32 ref33 ref34 ref35 ref36 ref37 ref38 ref39">26-39</xref>
        ].
      </p>
      <p>Logistic regression uses the logistic function to model a binary dependent variable. It
can be described with the following mathematical equation:
y = 1 / (1 + e^(−(a + bX)))
where y is the dependent variable that varies in the range [0, 1], while X is a matrix of
independent variables, and a and b are numeric coefficients of the logistic model.</p>
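      <p>The equation above can be sketched directly; the logistic function's output always lies in (0, 1), which is what makes it usable for binary classification:

```python
import math

# The logistic function from the equation above: maps any real input into (0, 1).
def logistic(x, a, b):
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

p = logistic(0.0, a=0.0, b=1.0)  # 0.5 at the decision midpoint
```
</p>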
      <p>We will use the LogisticRegression class from the scikit-learn library to implement the
logistic regression model.
model = LogisticRegression(penalty='l2',
class_weight='balanced', solver='lbfgs')
model.fit(X_train, y_train)</p>
      <p>The parameters specified are: penalty='l2', which indicates that for the purpose of
regularization our model will use the ridge regression (L2) approach, while the parameter
solver='lbfgs' indicates that for the purpose of optimization our model will use
the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Model Evaluation</title>
      <p>To evaluate our model, we will use it to predict values for the test set data.
predictions = model.predict(X_test)
Then we build a confusion matrix.
The interpretation of the confusion matrix indicates that our model has successfully
classified 6097 non-propaganda articles and 694 propaganda articles, but failed to
classify 123 propaganda articles and 285 non-propaganda articles.</p>
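      <p>Counts like these are read off a confusion matrix as true/false positives and negatives; a made-up label vector (1 = propaganda, 0 = non-propaganda) shows the mechanics:

```python
from sklearn.metrics import confusion_matrix

# Reading per-class counts off a confusion matrix on made-up labels
# (1 = propaganda, 0 = non-propaganda).
y_true = [0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
```
</p>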
      <p>To view the model score:
model.score(X_test, y_test)
The achieved model score equals 0.9433254618697041.</p>
    </sec>
    <sec id="sec-4a">
      <title>5 Sentence Level Propaganda Detection Task</title>
      <p>The dataset for this type of task contains approximately 540 articles in English,
including headlines. The articles are broken up into separate sentences. Each sentence is
labelled as either “propaganda” or “non-propaganda”. The dataset also includes a unique
identifier for each article, as well as a unique identifier for each sentence within the
scope of its article. The dataset contains a total of 14 263 sentences.</p>
      <p>The data is stored in the form of a collection of text files, with a separate file for
each article. The data attributes are also stored separately. After each file collection is
loaded, we concatenate them to form a single pandas.DataFrame, while also eliminating the
unique article and sentence identifiers from the data.</p>
      <p>The next step is to once more build a plot to see how our data is distributed in terms
of category and to evaluate the level of balance in the data.</p>
      <p>As is evident from the plot, the data is once again unevenly distributed and unbalanced.
The number of non-propaganda sentences outweighs propaganda sentences in this dataset
as well. Specifically, the dataset contains 10 325 non-propaganda sentences and 3 938
propaganda sentences. First, we repeat the approach we took while solving the
previous task and remove all stop words from the data (comparing lowercased words so
that capitalized stop words are matched as well).
stop = stopwords.words('english')
data['sentence'] = data['sentence'].apply(lambda x: " ".join(w
for w in x.split() if w.lower() not in stop))
After that, we convert our text data to lowercase once again.
data['sentence'] = data['sentence'].apply(lambda x: " ".join(w.lower() for w in x.split()))
We do not eliminate punctuation symbols this time, since we will need them during
the feature extraction process. At the beginning of this process, we perform a POS
(Part-of-Speech) tagging operation. In order to do so, we use the spacy library.
tagged = pos_tagging(sentences_pos)
data['tagged'] = tagged
After this operation, our data has a new column called “tagged” that presents every
word in a sentence in the following format: “%word%_%part of speech
tag%_%lemma%”.</p>
      <p>The next step is to perform a few manual feature extraction techniques.</p>
      <p>Firstly, we need to mark every sentence as either containing or not containing
quotations, to identify the “Appeal to authority” propaganda technique.</p>
      <p>In order to do so, we initialize the corresponding function.
def get_quotations(sentences):
    result = []
    for sentence in sentences:
        match = 1 if '"' in sentence else 0
        result.append(match)
    return np.array(result).reshape(-1, 1)
The next step is to check each sentence for the presence of so-called glitter words. These
are words like “patriotism”, “democracy”, “duty”, “power”, etc. We do this to
identify the “Slogan” and “Flag Waving” propaganda techniques.</p>
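      <p>A quick sanity check of the quotation feature behaves as follows (the function is repeated here so the snippet is self-contained):

```python
import numpy as np

# Sanity check of the quotation feature described above.
def get_quotations(sentences):
    result = []
    for sentence in sentences:
        match = 1 if '"' in sentence else 0
        result.append(match)
    return np.array(result).reshape(-1, 1)

marks = get_quotations(['He said "we must act"', 'No quotes here'])
```
</p>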
      <p>In order to do so, we compute the number of matches of words from a sentence with
words from a glitter-word lexicon and divide it by the overall number of words in the
sentence in order to normalize the coefficient. We form a separate text file (a lexicon)
with such words and initialize the corresponding function.
def get_glitter(tagged):
    filename = 'glitter_words.txt'
    glitters = []
    append = glitters.append
    with open(os.path.join(LEXICONS_PATH, filename), encoding='utf-8') as f:
        for line in f.readlines():
            append(line.replace('\n', ''))
    result = []
    for sentence in tagged:
        words = 0
        matches = 0
        for wline in sentence.split():
            try:
                w, t, l = wline.split("_")
            except ValueError:
                continue
            w = w.lower()
            l = l.lower()
            words += 1
            if l in glitters or w in glitters:
                matches += 1
        if words == 0:
            result.append(0)
        else:
            result.append(matches / words)
    return np.array(result).reshape(-1, 1)</p>
      <p>Then, with a similar approach, we check each sentence for the presence of intensifying
words (“absolute”, “total”, “very”, “incredible”, etc.) and absolute pronouns
(“everyone”, “nobody”, etc.). In doing so, we try to identify such propaganda techniques as
“Loaded language” and “Bandwagon”.</p>
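      <p>The lexicon-ratio idea behind these features can be sketched in a self-contained form, with a tiny hard-coded intensifier lexicon standing in for the lexicon files used above:

```python
import numpy as np

# Sketch of the lexicon-ratio features: matches from a lexicon, normalized by
# sentence length. The tiny lexicon here stands in for the real lexicon files.
INTENSIFIERS = {"absolute", "total", "very", "incredible"}

def lexicon_ratio(sentences, lexicon):
    result = []
    for sentence in sentences:
        words = sentence.lower().split()
        matches = sum(1 for w in words if w in lexicon)
        result.append(matches / len(words) if words else 0.0)
    return np.array(result).reshape(-1, 1)

features = lexicon_ratio(["a very very big deal", "nothing special"], INTENSIFIERS)
```
</p>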
      <p>We form similar lexicons for each task and initialize similar functions. Finally, we
vectorize our text data using a Word2Vec shallow two-layer neural net with a pre-trained
Twitter 200-dimensional model to perform word embeddings.</p>
      <p>In order to do so, we will use the gensim library.
w2v_file = os.path.join(WORD2VEC_PATH, 'twitter.27B.200d.txt')
w2v_model = KeyedVectors.load_word2vec_format(w2v_file, binary=False)
def w2v_vectorize(tagged):
    global w2v_model
    X = []
    ndims = 200
    for sentence in tagged:
        words = []
        for wline in sentence.split():
            try:
                w, t, l = wline.split("_")
            except ValueError:
                continue
            words.append(w)
        row_data = np.mean([w2v_model[w] for w in words if w in w2v_model]
                           or [np.zeros(ndims)], axis=0).tolist()
        X.append(row_data)
    X = np.array(X)
    # min-max scaling of each dimension to the [0, 1] range
    X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    X_scaled = X_std * (1 - 0) + 0
    return X_scaled
Finally, we create a new pandas.DataFrame on the basis of the already existing one
with the use of all the aforementioned operations.
word2vec_features = w2v_vectorize(data['tagged'])
word2vec_columns = [f'dim{x}' for x in range(200)]
glitter_words = get_glitter(data['tagged'])
quotations = get_quotations(data['sentence'])
intensifiers = get_intensifiers(data['tagged'])
absolutes = get_absolutes(data['tagged'])
X = pd.DataFrame(word2vec_features, columns=word2vec_columns)
X['quotations'] = quotations
X['glitter_words'] = glitter_words
X['intensifiers'] = intensifiers
X['absolutes'] = absolutes
y = data['label']
We then split the dataset into training and test sets.</p>
      <p>X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2, random_state=42)
We will also use the logistic regression model for this task, as well as the Grid Search
cross-validation algorithm to find the best parameters of our model.
lr_model = LogisticRegression(solver='liblinear')
penalty = ['l1', 'l2']
C = np.logspace(0, 4, 10)
hyperparameters = dict(C=C, penalty=penalty)
clf = GridSearchCV(lr_model, hyperparameters, scoring='f1', cv=5)
best_model = clf.fit(X_train, y_train)
Best model parameters: penalty='l2', C=7.74.</p>
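      <p>The grid search step can be reproduced in miniature on synthetic data; the liblinear solver is chosen here because it supports both the 'l1' and 'l2' penalties being searched over:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# A runnable miniature of the grid search above, on synthetic data.
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)
grid = {'C': np.logspace(0, 4, 10), 'penalty': ['l1', 'l2']}
clf = GridSearchCV(LogisticRegression(solver='liblinear'), grid, scoring='f1', cv=5)
clf.fit(X_demo, y_demo)
best = clf.best_params_
```
</p>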
      <p>To evaluate the efficiency of our model, we predict values on the test set.
predictions = best_model.predict(X_test)
Then we build the confusion matrix (Fig. 8).</p>
      <p>The confusion matrix can be interpreted in the following way: our model has
successfully classified 1917 non-propaganda sentences and 205 propaganda sentences, but 585
propaganda sentences and 146 non-propaganda sentences were misclassified.</p>
      <p>We can also view the model score:
best_model.score(X_test, y_test)
The model score equals 0.7437784787942516.</p>
    </sec>
    <sec id="sec-5">
      <title>6 Conclusions</title>
      <p>In the course of this research, two machine-learning models were built to identify
propaganda: one for the article-level task and one for the sentence-level task. In the process,
vectorization, TF-IDF, POS tagging, word embedding and manual feature extraction
techniques were applied. Classification was performed with the use of a Logistic
Regression model and the Grid Search cross-validation algorithm. The models achieved the
following scores: 0.94 for the article-level model and 0.74 for the sentence-level model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Propaganda definitions, https://propaganda.qcri.org/annotations/definitions.html.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Text Summarization using NLTK: TF-IDF Algorithm, https://towardsdatascience.com/textsummarization-using-tf-idf-e64a0644ace3.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Spacy: Linguistic Features, https://spacy.io/usage/linguistic-features.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Introduction to Word Embedding and Word2Vec, https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Glittering Generalities, https://marketingwit.com/examples-of-glittering-generalities.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lytvyn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vysotska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Budz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelekh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sokulska</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalchuk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dzyubyk</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tereshchuk</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Komar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Development of the quantitative method for automated text content authorship attribution based on the statistical analysis of N-grams distribution</article-title>
          .
          <source>In: Eastern-European Journal of Enterprise Technologies</source>
          ,
          <volume>6</volume>
          (
          <issue>2</issue>
          -
          <fpage>102</fpage>
          ), pp.
          <fpage>28</fpage>
          -
          <lpage>51</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lytvyn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vysotska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grabar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharonova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cherednichenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanishcheva</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Preface: Computational Linguistics and Intelligent Systems (COLINS-</article-title>
          <year>2020</year>
          ).
          <source>In: CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2604</volume>
          . (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bisikalo</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vysotska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kravets</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Conceptual Model of Process Formation for the Semantics of Sentence in Natural Language</article-title>
          .
          <source>In: CEUR workshop proceedings, Vol2604</source>
          ,
          <fpage>151</fpage>
          -
          <lpage>177</lpage>
          . (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Vysotska</surname>
          </string-name>
          , V.:
          <article-title>Ukrainian Participles Formation by the Generative Grammars Use</article-title>
          .
          <source>In: CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>407</fpage>
          -
          <lpage>427</lpage>
          . (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bisikalo</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vysotska</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Linguistic analysis method of Ukrainian commercial textual content for data mining</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2608</volume>
          ,
          <fpage>224</fpage>
          -
          <lpage>244</lpage>
          . (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Khairova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kolesnyk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mamyrbayev</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petrasova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Applying VSM to Identify the Criminal Meaning of Texts</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>20</fpage>
          -
          <lpage>31</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Lande</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dmytrenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Radziievska</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Subject Domain Models of Jurisprudence According to Google Scholar Scientometrics Data</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>32</fpage>
          -
          <lpage>43</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Bisikalo</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kontsevoi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>System for Definition of Indicator Characteristics of Social Networks Participants Profiles</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>77</fpage>
          -
          <lpage>88</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Levchenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tyshchenko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dilai</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Associative Verbal Network of the Conceptual Domain БІДА (MISERY) in Ukrainian</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>106</fpage>
          -
          <lpage>120</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vasyliuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shyika</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shestakevych</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Information System of Psycholinguistic Text Analysis</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>178</fpage>
          -
          <lpage>188</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Khomytska</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teslyuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The Multifactor Method Applied for Authorship Attribution on the Phonological Level</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>189</fpage>
          -
          <lpage>198</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Andrunyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rusyn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pohreliuk</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dokhniak</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krylyshyn</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Information System of Photostock Web Galleries Based on Machine Learning Technology. In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>1032</fpage>
          -
          <lpage>1059</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zdebskyi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lytvyn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rybchak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kravets</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lozynska</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holoshchuk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubinska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dmytriv</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Intelligent System for Semantically Similar Sentences Identification and Generation Based on Machine Learning Methods</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>317</fpage>
          -
          <lpage>346</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Andrunyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasevych</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chernovol</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonyuk</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gozhyj</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gozhyj</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalinina</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korobchynskyi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Development of Information System for Aggregation and Ranking of News Taking into Account the User Needs</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>1127</fpage>
          -
          <lpage>1171</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Makara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rybchak</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peleshchak</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peleshchak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holoshchuk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubinska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dmytriv</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An Intelligent System for Generating End-User Symptom Recommendations Based on Machine Learning Technology</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>844</fpage>
          -
          <lpage>883</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Husak</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lozynska</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karpov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peleshchak</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vysotskyi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Information System for Recommendation List Formation of Clothes Style Image Selection According to User's Needs Based on NLP and Chatbots</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>788</fpage>
          -
          <lpage>818</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Lytvyn</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kubinska</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shestakevych</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demkiv</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shcherbyna</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Peculiarities of Generation of Semantics of Natural Language Speech by Helping Unlimited and Context-Dependent Grammar</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>536</fpage>
          -
          <lpage>551</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Bekesh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kravets</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demchuk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matseliukh</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batiuk</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peleshchak</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bigun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maiba</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Structural Modeling of Technical Text Analysis and Synthesis Processes</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>562</fpage>
          -
          <lpage>589</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Chyrun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Model of Adaptive Language Synthesis Based On Cosine Conversion Furies with the Use of Continuous Fractions</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>600</fpage>
          -
          <lpage>611</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Perkhach</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kysil</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dosyn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zavuschak</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hrendus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasyliuk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prodanyuk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Method of Structural Semantic Analysis of Dental Terms in the Instructions for Medical Preparations</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>662</fpage>
          -
          <lpage>669</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Khomytska</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teslyuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holovatyy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morushko</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Development of methods, models, and means for the author attribution of a text</article-title>
          .
          <source>In: Eastern-European Journal of Enterprise Technologies</source>
          ,
          <volume>3</volume>
          (
          <issue>2 (93)</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>46</lpage>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Khomytska</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teslyuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Authorship and Style Attribution by Statistical Methods of Style Differentiation on the Phonological Level</article-title>
          .
          <source>In: Advances in Intelligent Systems and Computing III. AISC 871</source>
          , Springer,
          <fpage>105</fpage>
          -
          <lpage>118</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Stasiuk</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Computer Sampling and Quantitative Analysis in Exploring Secondary Functions of Questions in Speech Genres of Intimate Communication</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>227</fpage>
          -
          <lpage>238</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Hadzalo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Analysis of Gender-Marked Units: Statistical Approach</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>462</fpage>
          -
          <lpage>471</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Romanyshyn</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Application of Corpus Technologies in Conceptual Studies (based on the Concept Ukraine Actualization in English and Ukrainian Political Media Discourse)</article-title>
          .
          <source>In: Computational Linguistics and Intelligent Systems, COLINS, CEUR workshop proceedings</source>
          , Vol-
          <volume>2604</volume>
          ,
          <fpage>472</fpage>
          -
          <lpage>488</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Saaya</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hong</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The development of trust matrix for recognizing reliable content in social media</article-title>
          .
          <source>In: International Journal of Computing</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <fpage>60</fpage>
          -
          <lpage>66</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Zhezhnych</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shilinh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melnyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Linguistic analysis of user motivations of information content for university entrant's web-forum</article-title>
          .
          <source>In: International Journal of Computing</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Pach</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A robust binarization and text line detection in historical handwritten documents analysis</article-title>
          .
          <source>In: International Journal of Computing</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <fpage>154</fpage>
          -
          <lpage>161</lpage>
          . (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Batura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakiyeva</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charintseva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A method for automatic text summarization based on rhetorical analysis and topic modeling</article-title>
          .
          <source>In: International Journal of Computing</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ),
          <fpage>118</fpage>
          -
          <lpage>127</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Sachenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pushkar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rippa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Intellectualization of Accounting System</article-title>
          .
          <source>In: International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications</source>
          ,
          <fpage>536</fpage>
          -
          <lpage>538</lpage>
          . (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Sachenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rippa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krupka</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Pre-Conditions of Ontological Approaches Application for Knowledge Management in Accounting</article-title>
          .
          <source>In: International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications</source>
          ,
          <fpage>605</fpage>
          -
          <lpage>608</lpage>
          . (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>An Evaluation of the Information Technology of Gene Expression Profiles Processing Stability for Different Levels of Noise Components</article-title>
          .
          <source>In: Data</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ), art. no.
          <elocation-id>48</elocation-id>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Babichev</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Durnyak</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pikh</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senkivskyy</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>An Evaluation of the Objective Clustering Inductive Technology Effectiveness Implemented Using Density-Based and Agglomerative Hierarchical Clustering Algorithms</article-title>
          .
          <source>In: Advances in Intelligent Systems and Computing</source>
          ,
          <volume>1020</volume>
          ,
          <fpage>532</fpage>
          -
          <lpage>553</lpage>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Yurynets</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yurynets</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dosyn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Risk Assessment Technology of Crediting with the Use of Logistic Regression Model</article-title>
          .
          <source>In: CEUR Workshop Proceedings</source>
          , Vol-
          <volume>2362</volume>
          ,
          <fpage>153</fpage>
          -
          <lpage>162</lpage>
          . (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>