<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>I. Panibrat</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadym Sobko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Security and Computer Science University of the National Education Commission</institution>
          ,
          <addr-line>2 Podchorazych str, Krakow, 30-084</addr-line>
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mikolaj Karpinski</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Military Institute, Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>Zdanovska str. 81a, Kyiv, 03189</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>State University of Information and Communication Technologies</institution>
          ,
          <addr-line>Solomianska str. 7, Kyiv, 03680</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper formulates and investigates the task of developing a system for the automatic classification of phraseological units in English texts, designs the structure of such a system and implements it in software. It proposes a hybrid method for the automatic classification of phraseological units in English texts, whose main idea is to use a rule-based method to identify and distinguish specific types of phraseological units and then to apply machine learning methods to classify them by their semantic and syntactic properties. To implement the hybrid method, a system is proposed whose structure includes the following modules: Hybrid Soft; tokenization; tagging; base determination; division; a corpus module; and a classification module. The system implementing the hybrid method was developed in Python. Its use not only reduces the time required for processing phraseological units but also reveals additional regularities in the features by which phraseological units are classified.</p>
      </abstract>
      <kwd-group>
        <kwd>phraseological units</kwd>
        <kwd>automatic classification</kwd>
        <kwd>artificial intelligence</kwd>
        <kwd>information technology</kwd>
        <kwd>algorithm</kwd>
        <kwd>method</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The study of phraseology is an important part of linguistics as it helps to understand the complex structure of language and how it is used in communication. In recent years a lot of attention has been paid to the study of natural language processing in general and phraseology in particular. This is due to the improvement of research tools. In particular, the development of information technology and artificial intelligence has made it possible to develop systems capable of analyzing and understanding human speech. The computational capabilities of these tools increase the number of linguistic tasks that can be solved and deepen the level of their processing.</p>
      <p>0000-0002-8846-332X (M. Karpinski); 0000-0003-2949-2187 (L. Borovyk); 0000-0001-7689-239X (S. Lienkov); 0000-0002-3014-131X (V. Savchenko); 0000-0002-3209-9119 (I. Panibrat); 0009-0004-6109-0262 (V. Sobko). © 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>One of the key problems in natural language processing (NLP) is phrase recognition and
classification. The need to automate this task is explained by the fact that many natural
language processing applications such as text mining, information search, machine translation,
and natural language generation, require preliminary phrase recognition and classification.
Automatic classification of phraseological units in English texts is an important task in natural
language processing which involves the identification and categorization of groups of words or
phrases based on their semantic and syntactic properties.</p>
      <p>The subject area of automatic phraseological unit (PhU) classification lies at the intersection
of computational linguistics and natural language processing. It involves the development of
algorithms and methods that can automatically identify and classify PhUs that are fixed or
semifixed expressions in a language with figurative or idiomatic meanings that cannot be easily
derived from the meanings of their individual words.</p>
      <p>Thus, automatic phraseology classification is a complex task that requires a combination of
linguistic knowledge, computing and algorithms.</p>
      <p>Taking into account the shortcomings of existing approaches to the classification of
phraseological units and the peculiarities of the functioning of software tools for the automatic
classification of phraseological units, the task of developing an effective system for the
automatic classification of phraseological units in English texts is becoming increasingly
important. The effectiveness of the system implies its reliability, applicability to the processing
of various sentence structures and different types of phraseological units, including fixed
expressions, idioms and phrases, as well as minimization of the shortcomings typical of existing
tools.</p>
      <p>Therefore, the purpose of the article is to elaborate the theoretical and applied foundations
for the development of an effective system for automatic classification of phraseological units
in English texts based on the use of modern information technologies in general and artificial
intelligence in particular.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Recently, the issue of classification of phraseological units in general and automatic
classification in particular has been the subject of research by a number of philologists and
specialists in the field of information technology [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-6</xref>
        ]. As part of their research they analyzed
various approaches, methods and developed software and hardware tools for automatic
classification of PhUs.
      </p>
      <p>Several approaches have been proposed for the automatic classification of PhUs which can
be generally divided into three categories: rule-based approach, statistics-based approach and
machine learning approach. The following methods have also been used to classify
phraseological units in the raw text: the use of constructed «local» grammars; the use of
dictionaries; the use of statistical processes.</p>
      <p>Table 1 provides a comparative assessment of these approaches by various criteria.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption><p>Evaluation of different approaches for automatic classification of PhUs</p></caption>
        <table>
          <thead>
            <tr><th>Approach</th><th>Mathematical methods underlying the approach</th><th>The essence of the approach</th><th>Benefits of the approach</th><th>Disadvantages of the approach</th></tr>
          </thead>
          <tbody>
            <tr><td>Rule-based</td><td>Constructed «local» grammars, dictionaries</td><td>Uses manually constructed rules and dictionaries to identify and classify PhUs</td><td>Flexibility, clarity, high accuracy</td><td>Limited scalability, error-prone, maintenance overhead, lack of adaptability</td></tr>
            <tr><td>Statistics-based</td><td>Probabilistic models, clustering, associative rules</td><td>Uses statistical measures of frequency and distribution of expressions to identify and classify PhUs</td><td>Scalability, data management, automatic feature extraction, data processing speed, generalization</td><td>Lack of contextual understanding, data displacement, domain dependency</td></tr>
            <tr><td>Machine learning</td><td>Decision trees, support vectors, deep neural networks</td><td>Uses a set of features derived from expressions and a labeled dataset to train a classifier that can automatically identify and classify new expressions</td><td>High accuracy, contextual understanding, reliability, scalability</td><td>Requires large amounts of training data, retooling, limited interpretation capabilities, lack of transparency</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The analysis of the data in Table 1, of the methods for classifying PhUs and of a number of thematic scientific sources, in particular [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1-6</xref>
        ], allows us to conclude that there are currently no perfectly working approaches and methods that could ensure the identification and classification of PhUs in any English text without error or distortion.
      </p>
      <p>It should also be noted that several basic software tools are available for the automatic classification of PhUs:</p>
      <p>Sketch Engine: a web-based corpus management and analysis tool that provides various functions for language processing, including automatic classification of PhUs [7]. It uses statistical analysis and machine learning algorithms to identify and classify the PhUs in a corpus of text.</p>
      <p>Linguistic Inquiry and Word Count (LIWC): a software tool that provides text analysis and language processing capabilities. It can identify and classify the PhUs in a text corpus based on created rules and statistical analysis [8].</p>
      <p>ConText: a software tool that provides natural language processing capabilities, including automatic classification of PhUs. It uses machine learning algorithms to identify and classify PhUs in clinical text.</p>
      <p>Natural Language Toolkit (NLTK) [9]: a Python library that provides various functionalities for natural language processing, including automatic classification of phraseological units. It contains a module for identifying and classifying phraseological units based on manual rules and statistical analysis [10].</p>
      <p>PhraseDetective: a web-based software tool that provides automatic classification of the phraseological units in a corpus of text. It uses a combination of statistical analysis and machine learning algorithms to identify and classify the phraseological units.</p>
      <p>In general, existing software tools for the automatic classification of PhUs provide a number of functionalities, but there is a need for further research and development in this area, driven by the need to improve the accuracy and efficiency of these tools.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>In order to achieve this goal, it seems advisable to clearly formulate the task of developing a
system for the automatic classification of PhUs in English texts, to design the structure of the
relevant system and to implement it in software.</p>
      <p>Problem statement for the development of a system for automatic classification of PhUs in English texts.</p>
      <p>The input to the system is an array of English text consisting of sentences, paragraphs or larger text segments. The output of the system is a list of the PhUs found in the text, representing the identified phrases classified according to certain features.</p>
      <p>The system should process the input data and automatically generate the result.
The system should demonstrate the following key capabilities:
1. Comprehensive perception: the system should be able to process a large amount of text
covering different genres, registers and linguistic contexts. It should be able to perceive the
PhUs in complex sentence structures such as compound or complex subordinate clauses as well
as interrogative and exclamatory sentences. The system should be able to classify a wide range
of linguistic units including fixed expressions, idiomatic phrases, phrases and other lexical
combinations.</p>
      <p>2. Accurate identification and classification: the system should provide high accuracy in
recognizing and classifying the PhUs in the text. It should use linguistic models, parsing,
semantic information and contextual clues to distinguish the PhUs from ordinary language
usage. The system should apply rule-based and machine learning techniques to improve the
accuracy and memorability of the classification of the PhUs.</p>
      <p>3. Scalability and efficiency: the system should be able to process large amounts of text
efficiently. It should optimize computing resources to provide timely results even when
processing large corpora or real-time text streams.</p>
      <p>4. Further analysis: the system should produce a list of identified and classified PhUs as an
output. The output should be suitable for further analysis such as language modeling, corpus
linguistics research or other natural language processing tasks.</p>
      <p>Designing the structure of the system for automatic classification of PhUs in English texts.</p>
      <p>The theoretical basis of the developed system of automatic classification of PhUs in English
texts should be an improved method of automatic classification of PhUs which would combine
the strengths of rule-based and machine learning methods. In the following, we will call this
method a hybrid method. The hybrid method should provide a more accurate classification of
the PhUs in English texts.</p>
      <p>The main idea of the hybrid method is to use a rule-based method to identify and distinguish
specific types of PhUs and then apply machine learning methods to classify PhUs based on their
semantic and syntactic properties. For example, a rule-based method can be used to identify
nouns that are composed of a noun and an adjective and a machine learning algorithm can be
used to classify these nouns based on their semantic similarity to other known PhUs.</p>
      <p>Several algorithms and technical tools can be used to achieve this combination of rule-based and machine learning techniques. One example is the Natural Language Toolkit (NLTK) in Python, which contains tools for pattern matching, parsing, and feature extraction as well as algorithms for classification, clustering, and information retrieval [11]. Another natural language processing tool is spaCy, which contains a rule-based matching system for detecting specific patterns in text as well as a machine learning pipeline for training and evaluating custom models for classification and other tasks [12].</p>
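As a minimal illustration of the rule-based side, the sketch below matches part-of-speech patterns in plain Python; the sentence, the hand-supplied tags and the pattern are invented for the example and do not use the actual NLTK or spaCy APIs mentioned above.

```python
# Minimal rule-based pattern matcher for PhU candidates (illustrative only;
# a real system would use NLTK's chunking or spaCy's Matcher instead).

def match_patterns(tagged_tokens, patterns):
    """Return (start, end) spans whose POS sequence matches any pattern."""
    matches = []
    tags = [tag for _, tag in tagged_tokens]
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tags[i:i + n] == pattern:
                matches.append((i, i + n))
    return matches

# Hypothetical sentence, tagged by hand: "kick the bucket" is VERB-DET-NOUN.
sentence = [("he", "PRON"), ("will", "AUX"), ("kick", "VERB"),
            ("the", "DET"), ("bucket", "NOUN")]
# One rule: a verb followed by a determiner and a noun is a PhU candidate.
spans = match_patterns(sentence, [["VERB", "DET", "NOUN"]])
candidates = [" ".join(tok for tok, _ in sentence[s:e]) for s, e in spans]
print(candidates)  # ['kick the bucket']
```

In NLTK or spaCy the same idea is expressed declaratively, but the principle is identical: patterns over token attributes propose candidate units, which later stages classify.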
      <p>Feature extraction involves identifying the characteristics of the PhUs that can distinguish
it from other types of expressions. These features may include frequency of occurrence, length
of the expression, presence of certain words or parts of speech and other linguistic properties.</p>
      <p>Classification involves the use of machine learning algorithms to group similar PhUs based on selected features.</p>
      <p>The classification process can be supervised or unsupervised. In supervised learning, the
algorithm is trained on a labeled dataset while in unsupervised learning, the algorithm identifies
data patterns without prior knowledge of the categories [13]. There are several machine
learning algorithms that can be used to automatically classify phraseology including supervised
learning, unsupervised learning, and semi-supervised learning [14].</p>
      <p>Supervised learning involves training a machine learning algorithm on a labeled dataset of
PhUs and their corresponding categories. The algorithm learns to identify the characteristics
and properties of different types of phrases and uses this knowledge to classify new phrases.
For example a supervised learning algorithm can be trained on a dataset of idiomatic
expressions and their corresponding categories (e.g., food-related idioms, weather-related
idioms etc.) to accurately classify new examples of idiomatic expressions based on their
semantic and syntactic properties [15].</p>
      <p>Unsupervised learning involves training a machine learning algorithm on an unlabeled set
of PhUs data allowing the algorithm to detect patterns and similarities in the data without any
prior knowledge of the categories. This approach is useful when the categories of phrases are
unknown, or when there are too many categories to be labeled manually. For example, the
algorithm can be used to combine similar idiomatic expressions into groups based on their
semantic and syntactic properties [16].</p>
      <p>Semi-supervised learning involves a combination of supervised and unsupervised learning
where the algorithm is trained on a small labeled data set and a large unlabeled data set. The
algorithm uses the labeled dataset to learn the characteristics and properties of the categories
and then applies this knowledge to the unlabeled dataset to identify similar instances. This
approach can be useful when the labeled dataset is small or when labeling the entire dataset is
not possible [17].</p>
      <p>When developing an information system for automatic classification of PhUs we divide the
structure of the information system into three main components: data pre-processing; feature
extraction; and classification.</p>
      <p>For example, suppose we have a set of PhUs that have been labeled as idioms or phrases. To classify a new PhU as an idiom or a phrase we can apply the k-nearest neighbors algorithm (k-NN): its characteristics, such as frequency and length, are computed, and its k nearest neighbors in the dataset are determined. If most of the neighbors are labeled as idioms, the PhU is classified as an idiom.</p>
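The k-NN procedure just described can be sketched as follows; the feature values (corpus frequency per million, length in words) and labels are invented for illustration.

```python
import math

# Toy k-NN classifier for PhUs: each unit is described by
# (corpus frequency per million, length in words). Values are invented.
train = [
    ((12.0, 3), "idiom"),   # e.g. "kick the bucket"
    ((8.0, 4), "idiom"),
    ((35.0, 2), "phrase"),  # e.g. "strong coffee"
    ((40.0, 2), "phrase"),
    ((10.0, 3), "idiom"),
]

def knn_classify(features, train, k=3):
    # Sort the labeled examples by Euclidean distance and take a majority
    # vote among the k nearest ones.
    nearest = sorted(train, key=lambda item: math.dist(features, item[0]))
    votes = [label for _, label in nearest[:k]]
    return max(set(votes), key=votes.count)

print(knn_classify((9.0, 3), train))   # 'idiom'
print(knn_classify((38.0, 2), train))  # 'phrase'
```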
      <p>Data preprocessing involves cleaning and organizing the data before it is passed to the feature extraction and classification algorithms. This process includes removing irrelevant information such as stop words and converting the text into a format that can be easily processed by the algorithms. Feature extraction involves identifying the characteristics of phrases that can distinguish them from other types of expressions. These features may include frequency of use, length of the expression, presence of certain words or parts of speech and other linguistic properties. Mathematical formulas can be used to extract features from the data.</p>
      <p>Other linguistic properties such as the presence of certain words or parts of speech can be
determined using NLP methods [18]. Classification involves grouping similar phrases based on
the features identified during the feature extraction process. Mathematical algorithms can be
used to classify phrases based on their features.</p>
      <p>The bag-of-words (BOW) model and the word embedding model are two popular methods used for feature extraction. In the BOW model each phraseological unit is represented as a vector of word frequencies: the number of times each word appears in the unit is counted, and the resulting vector has one dimension per unique word in the corpus [19]. For example, over the vocabulary (a, piece, of, cake, break, leg, hit, the, sack) the phraseological units «A piece of cake», «Break a leg» and «Hit the sack» are represented as follows:
[1, 1, 1, 1, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 1, 1]</p>
      <p>Each dimension of the vector represents the frequency of the corresponding word in the
phraseology.</p>
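Assuming simple whitespace tokenization and lowercasing, the BOW construction can be sketched as:

```python
# Bag-of-words vectors for the three example units, with the vocabulary
# built in order of first occurrence across the tiny corpus.
units = ["A piece of cake", "Break a leg", "Hit the sack"]
tokenized = [u.lower().split() for u in units]

vocab = []
for tokens in tokenized:
    for tok in tokens:
        if tok not in vocab:
            vocab.append(tok)

# One dimension per vocabulary word; the value is the count in the unit.
vectors = [[tokens.count(word) for word in vocab] for tokens in tokenized]
print(vocab)
# ['a', 'piece', 'of', 'cake', 'break', 'leg', 'hit', 'the', 'sack']
for vec in vectors:
    print(vec)
# [1, 1, 1, 1, 0, 0, 0, 0, 0]
# [1, 0, 0, 0, 1, 1, 0, 0, 0]
# [0, 0, 0, 0, 0, 0, 1, 1, 1]
```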
      <p>The word embedding model represents each word as a vector in a multidimensional space.
The vector representation of each PhU is created by averaging the vectors of the words that
make up the unit. For example the vector representation of the words of the phraseological
units «A piece of cake», «Break a leg» and «Hit the sack» will be as follows:
[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
[0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2]
[0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]</p>
      <p>Each dimension of the vector represents the average value of the corresponding vector of
words in the phrase.</p>
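The averaging step can be sketched with invented 3-dimensional word vectors; a real system would load pretrained embeddings (e.g. word2vec or GloVe) instead.

```python
# Averaging toy word vectors into a single PhU vector. The embeddings
# below are invented for illustration only.
embeddings = {
    "kick":   [0.2, 0.4, 0.6],
    "the":    [0.1, 0.1, 0.1],
    "bucket": [0.6, 0.1, 0.2],
}

def phu_vector(phrase, embeddings):
    # Look up each word's vector and take the per-dimension mean.
    vectors = [embeddings[word] for word in phrase.lower().split()]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

print(phu_vector("kick the bucket", embeddings))
# roughly [0.3, 0.2, 0.3] (the per-dimension means)
```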
      <p>After feature extraction, the next step is classification. One example of classifying PhUs by clear features is to distinguish between idioms and phrases by their structure and predictability. For example the idiom «kick the bucket» means «to die» and cannot be understood by looking up the meaning of «kick» or «bucket» in a dictionary. Idioms often have a metaphorical or figurative meaning that is not related to their literal meaning [20]. Phrases, on the other hand, are largely compositional: they can be predicted to some extent based on the meanings of their individual words.</p>
      <p>For example, let's look at the classification of phraseological units by four features:
1) Structure. For example, the phrase «kick the bucket» is an idiom which means that it has
a fixed structure and cannot be understood based on the meaning of its individual words.</p>
      <p>2) Semantic connection. For example, the phrase «strong coffee» is a collocation, i.e. it
consists of words that often occur together due to semantic similarity.</p>
      <p>3) Function. For example, the phrase «on the other hand» is a discourse marker, which
means that it is used to indicate a contrast or an alternative point of view.</p>
      <p>4) Origin. For example, the phrase «faux pas» is a loanword which means that it was
borrowed from the French language and is commonly used in English to refer to a social mistake
or blunder [21].</p>
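These four features can be illustrated with a toy rule-based lookup; the tiny lexicons below are hand-made stand-ins for real linguistic resources.

```python
# Toy rule-based classifier over the four features discussed above.
IDIOMS = {"kick the bucket", "break a leg"}
COLLOCATIONS = {"strong coffee", "heavy rain"}
DISCOURSE_MARKERS = {"on the other hand", "in other words"}
LOANWORDS = {"faux pas", "deja vu"}

def classify_phu(phrase):
    phrase = phrase.lower()
    if phrase in IDIOMS:
        return "idiom"             # fixed structure, non-compositional meaning
    if phrase in COLLOCATIONS:
        return "collocation"       # words that co-occur by semantic affinity
    if phrase in DISCOURSE_MARKERS:
        return "discourse marker"  # signals contrast or an alternative view
    if phrase in LOANWORDS:
        return "loanword"          # expression borrowed from another language
    return "unclassified"

print(classify_phu("faux pas"))           # loanword
print(classify_phu("on the other hand"))  # discourse marker
```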
      <p>Several methods can be used to classify PhUs including k-nearest neighbors, decision trees,
support vector machines (SVMs) and neural networks. The choice of a classification method
depends on the size of the data set, the complexity of the PhUs and the desired classification
accuracy.</p>
      <p>In the k-nearest neighbors method, a phraseology is classified based on the class label of its
k nearest neighbors in the feature space.</p>
      <p>The decision tree method creates a tree that represents decision rules for assigning class
labels to phraseological units.</p>
      <p>The SVM uses a hyperplane to divide phrases into different classes based on their feature
representations.</p>
      <p>In neural networks a deep learning model is trained on the feature representations of phrases
to predict their class membership [22].</p>
      <p>The development of an information system for automatic classification of PhUs has certain
difficulties. One of the biggest challenges is the diversity and complexity of PhUs in different
languages and cultures. In addition, the accuracy of the classification task strongly depends on
the quality of the training data and the choice of feature extraction and classification methods.
For example, the BOW model is simple and effective but it does not take into account the
semantic relations between words in phraseological units. In contrast, the word embedding
model takes into account the semantic relations between words but it requires a large amount
of training data and can be computationally expensive. Therefore, the choice of a feature
extraction method should be based on the characteristics of the dataset and the available
computing resources. Likewise, the choice of classification method depends on the size of the
dataset, the complexity of the feature and the desired classification accuracy. For example, the
k-nearest neighbors method is simple and easy to implement but it may not work properly
when the dataset is large and the number of classes is high. In contrast, the neural network
method is more complex and computationally expensive but it can achieve high accuracy even
with large and complex datasets [23].</p>
      <p>The conducted analysis and research allow us to propose the author's information system
for implementing the hybrid method for categorizing phrases:</p>
      <p>Data collection: the system collects data that includes a set of texts or documents to be
categorized. The data may also include phraseological units and related categories that can be
used as training data for a machine learning model.</p>
      <p>Pre-processing: The system pre-processes the data to prepare it for classification. This
includes tasks such as tokenization, stop word removal, stemming, and normalization.</p>
      <p>Rule-based method: the system applies a method to identify and classify phraseological
units in the text. This can be done by creating a set of rules that match certain patterns or
sequences of words that correspond to phrases [24].</p>
      <p>Machine learning-based method: the system uses a method to classify the text. This can be
done by training a classification model on training data that includes phraseology and related
categories. The machine learning model can then be used to automatically classify the
phraseology in the text.</p>
      <p>Hybrid Method: The system combines the results of a rule-based method and a machine
learning-based method to improve classification accuracy. This can be done by applying the
machine learning model only to idioms that were not classified by the rule-based method or by
using the rule-based method to refine the output of the machine learning model.</p>
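The first combination strategy, in which the machine learning model answers only where the rules are silent, can be sketched as follows; the rule table and the stand-in "model" are invented for illustration.

```python
# Sketch of the hybrid combination step: the rule-based component answers
# first, and a stand-in ML model is consulted only for units the rules
# left unclassified.
RULES = {"kick the bucket": "idiom", "on the other hand": "discourse marker"}

def ml_model(phrase):
    # Placeholder for a trained classifier; here a trivial length heuristic.
    return "idiom" if len(phrase.split()) >= 3 else "collocation"

def hybrid_classify(phrase):
    label = RULES.get(phrase.lower())
    return label if label is not None else ml_model(phrase)

print(hybrid_classify("kick the bucket"))  # idiom (from the rules)
print(hybrid_classify("strong coffee"))    # collocation (from the ML stand-in)
```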
      <p>Evaluation: The system evaluates classification performance using metrics such as precision,
recall, and F1-score. This can be done by comparing the system’s output with a set of manually
classified texts.</p>
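The metrics named above can be computed directly from paired gold and predicted labels; the labels below are toy data.

```python
# Precision, recall and F1 for one class, computed against a manually
# classified reference.
def prf1(gold, predicted, positive):
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold      = ["idiom", "phrase", "idiom", "idiom", "phrase"]
predicted = ["idiom", "idiom",  "idiom", "phrase", "phrase"]
print(prf1(gold, predicted, "idiom"))  # precision = recall = F1 = 2/3
```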
      <p>Output: The system outputs classification results which can be used for various purposes,
such as information retrieval, text analysis, or sentiment analysis [25].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>Classification of phraseological units using HS software can be implemented with the following
sequence of steps:</p>
      <p>1) HS Installation: install the HS library in the system. HS can be installed using pip, the Python package installer: the command line interface is opened and the command «pip install HS» is executed.</p>
      <p>2) Import HS and Data Preprocessing: import the necessary HS modules and packages into the Python code. In addition, the PhU dataset is preprocessed to ensure consistent formatting and to remove any irrelevant information. This may include removing punctuation, converting to lowercase or applying stemming.</p>
      <p>3) Feature Extraction: define the features that will be used for classification. In the case of phraseological units, features can be based on usage frequency, part-of-speech tags, or other linguistic characteristics.</p>
      <p>4) Dataset Split: split the preprocessed dataset into training and testing sets. The training set will be used to train the classification model while the testing set will be used to evaluate the model's performance.</p>
      <p>5) Classification Algorithm Selection: choose a classification algorithm from the options available in the HS software: the Naive Bayes algorithm and the Named Entity Recognition (NER) algorithm.</p>
      <p>6) Classifier Training: use the training set to train the selected classification algorithm, providing the features extracted from the training data along with the corresponding labels (categories).</p>
      <p>7) Model Evaluation: test the trained classifier on the test set, then measure accuracy or other relevant performance metrics to assess how well the model can classify PhUs.</p>
      <p>8) Using the Model for Prediction: once the model is trained and evaluated, it can be used to predict the categories of new, unseen phraseological units. Provide the extracted features of the new data to the classifier and obtain the predicted categories.</p>
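The steps above can be sketched end to end; since the HS library's API is not shown in the source, the sketch below uses only the Python standard library, a hand-built naive Bayes classifier and an invented labeled dataset as stand-ins.

```python
import math
from collections import Counter, defaultdict

# Invented labeled dataset of PhUs (stand-in for the preprocessed data).
data = [
    ("kick the bucket", "idiom"), ("break a leg", "idiom"),
    ("hit the sack", "idiom"), ("spill the beans", "idiom"),
    ("strong coffee", "collocation"), ("heavy rain", "collocation"),
    ("fast food", "collocation"), ("make a decision", "collocation"),
]
train, test = data[:6], data[6:]  # dataset split (step 4)

# Train a multinomial naive Bayes classifier on word features (step 6).
class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
vocab = set()
for phrase, label in train:
    for word in phrase.lower().split():
        word_counts[label][word] += 1
        vocab.add(word)

def predict(phrase):
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / len(train))  # log prior
        for word in phrase.lower().split():
            # Word likelihood with Laplace (add-one) smoothing.
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Evaluate on the held-out set (step 7).
accuracy = sum(predict(p) == l for p, l in test) / len(test)
print(accuracy)                     # accuracy on this toy split
print(predict("spill the coffee"))  # prediction for an unseen unit: 'idiom'
```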
      <p>An example code snippet demonstrating the implementation of the above steps using the HS software is shown in Figure 5.</p>
      <p>Figure 3: Software interface</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>Thus, the author’s software not only reduced the processing time for PhUs but also
established additional regularities for the features by which these objects are classified.</p>
      <p>During the research, a comparative evaluation of the application of the existing and
proposed software was conducted on other datasets of PhUs. The classification results were
found to be similar to those analyzed above. Furthermore, in each of the studied cases,
regularities regarding the results and time of automatic classification, as presented in the
analyzed variant, were observed.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>Based on the results of the study, the following conclusions can be drawn:</p>
      <list list-type="bullet">
        <list-item><p>currently, one of the key problems in natural language processing is the recognition and classification of phraseological units;</p></list-item>
        <list-item><p>the task of automating the classification of phraseological units in English texts is relevant;</p></list-item>
        <list-item><p>existing systems for automating the classification of phraseological units contain a number of shortcomings that prevent an effective solution to the problem of their qualitative classification;</p></list-item>
        <list-item><p>an urgent task is to develop an effective system for the automatic classification of phraseological units in English texts that would be reliable, applicable to the processing of various sentence structures and different types of phraseological units, including fixed expressions, idioms and phrases, and would contain a minimum number of shortcomings;</p></list-item>
        <list-item><p>solving the problem of developing a system for automatic classification of phraseological units in English texts, as formulated in the article, can increase the efficiency of classification of phraseological units in English texts;</p></list-item>
        <list-item><p>the theoretical basis for solving the problem formulated in the article can be the hybrid method proposed by the authors, whose main idea is to use a rule-based method to identify and distinguish specific types of PhUs and then to apply machine learning methods to classify PhUs based on their semantic and syntactic properties;</p></list-item>
        <list-item><p>an effective means of implementing the hybrid method can be a system whose structure includes the following modules: Hybrid Soft; tokenization; tagging; base determination; division; corpus module; classification module;</p></list-item>
        <list-item><p>it is advisable to use Python to develop a system for automatic classification of phraseological units in English texts that implements the hybrid method.</p></list-item>
      </list>
      <p>The direction of further research is to fully evaluate the effectiveness of the proposed system.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-7-1">
        <title>References</title>
<p>[5] L. Thompson, "Advances in Natural Language Processing Techniques." In: The Annual Conference of the Association for Computational Linguistics, Vancouver, (2023): 22-35.</p>
        <p>[6] Bishop, Christopher M., and Hugh Bishop. Deep Learning: Foundations and Concepts. Springer Nature, (2023).</p>
        <p>[7] Sun, Wei, and Eunjeong Park. "EFL learners' collocation acquisition and learning in corpus-based instruction: A systematic review." Sustainability 15.17 (2023): 13242.</p>
        <p>[8] Eichstaedt, Johannes C., et al. "Closed- and open-vocabulary approaches to text analysis: A review, quantitative comparison, and recommendations." Psychological Methods 26.4 (2021): 398.</p>
        <p>[9] Natural Language Toolkit. URL: https://www.nltk.org/ (2023).</p>
        <p>[10] Wang, Meng, and Fanghui Hu. "The application of NLTK library for Python natural language processing in corpus research." Theory and Practice in Language Studies 11.9 (2021): 1041-1049.</p>
        <p>[11] Zong, Chengqing, Rui Xia, and Jiajun Zhang. "Information extraction." Text Data Mining. Singapore: Springer Singapore, (2021): 227-283.</p>
        <p>[12] Chollet, François. Deep Learning with Python. Simon and Schuster, (2021).</p>
        <p>[13] North, Kai, Marcos Zampieri, and Matthew Shardlow. "Lexical complexity prediction: An overview." ACM Computing Surveys 55.9 (2023): 1-42.</p>
        <p>[14] Liu, Chen, et al. "FigMemes: A dataset for figurative language identification in politically-opinionated memes." Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. (2022).</p>
        <p>[15] Wankhade, Mayur, Annavarapu Chandra Sekhara Rao, and Chaitanya Kulkarni. "A survey on sentiment analysis methods, applications, and challenges." Artificial Intelligence Review 55.7 (2022): 5731-5780.</p>
        <p>[16] Li, Qian, et al. "A survey on text classification: From traditional to deep learning." ACM Transactions on Intelligent Systems and Technology (TIST) 13.2 (2022): 1-41.</p>
        <p>[17] Mardanova, Aziza. "Role of Phraseology in Developing Linguistic and Intercultural Communication Competences." American Journal of Philological Sciences 3.05 (2023): 68-72.</p>
        <p>[18] Spinde, Timo, et al. "Automated identification of bias inducing words in news articles using linguistic and context-oriented features." Information Processing &amp; Management 58.3 (2021): 102505.</p>
        <p>[19] Miletic, Filip. "Bridging Across Datasets and Disciplines: The Contribution of Corpus Phonology to the Study of Lexical Semantic Variation." Spoken English Varieties: Redefining and Representing Realities, Communities and Norms. (2021).</p>
        <p>[20] Sung, Min-Chang, and Hyunwoo Kim. "Effects of verb–construction association on second language constructional generalizations in production and comprehension." Second Language Research 38.2 (2022): 233-257.</p>
        <p>[21] Bozşahin, Cem. "Referentiality and Configurationality in the Idiom and the Phrasal Verb." Journal of Logic, Language and Information 32.2 (2023): 175-207.</p>
        <p>[22] Dhar, Ankita, et al. "Text categorization: past and present." Artificial Intelligence Review 54.4 (2021): 3007-3054.</p>
        <p>[23] Bardab, Saeed Ngmaldin, Tarig Mohamed Ahmed, and Tarig Abdalkarim Abdalfadil Mohammed. "Data mining classification algorithms: An overview." Int. J. Adv. Appl. Sci. 8.2 (2021): 1-5.</p>
        <p>[24] Lyu, Jinghui, et al. "A character-level convolutional neural network for predicting exploitability of vulnerability." 2021 International Symposium on Theoretical Aspects of Software Engineering (TASE). IEEE, (2021).</p>
        <p>[25] Maulud, Dastan Hussen, et al. "State of art for semantic analysis of natural language processing." Qubahan Academic Journal 1.2 (2021): 21-28.</p>
        <p>[26] Venkateswaran, N., et al. "Study on Sentence and Question Formation Using Deep Learning Techniques." Digital Natives as a Disruptive Force in Asian Businesses and Societies. IGI Global Scientific Publishing, (2023): 252-273.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
<string-name>
            <given-names>I.</given-names>
            <surname>Basaraba</surname>
          </string-name>
          ,
          <article-title>"English-language phraseological units: the problem of classification."</article-title>
          <source>Scientific notes of the V. I. Vernadsky Tavrichesky National University. Series: Philology. Social communications</source>
          ,
          <volume>31</volume>
          (
          <issue>70</issue>
          ), no.
          <issue>4</issue>
          (
          <year>2020</year>
          ):
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Basaraba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lemeshko</surname>
          </string-name>
          ,
<article-title>"Correlation of cognitive abilities and translation skills of phraseological units."</article-title>
          <source>SKASE Journal of Theoretical Linguistics</source>
          , Košice, Slovak Republic,
          <volume>18</volume>
          , no.
          <issue>2</issue>
          (
          <year>2021</year>
          ):
          <fpage>34</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Basaraba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Borovyk</surname>
          </string-name>
          ,
<article-title>"Application of Linguistic and Statistical (Quantitative) Methods to the Research of the Idiomatic Space of Military Fiction."</article-title>
          <source>SKASE</source>
          <volume>21</volume>
          , no.
          <issue>2</issue>
          (
          <year>2024</year>
          ):
          <fpage>141</fpage>
          -
          <lpage>160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
<given-names>I.</given-names>
            <surname>Basaraba</surname>
          </string-name>
          ,
          <article-title>"Challenges encountered in automatically classifying phraseological units."</article-title>
          <source>Current issues of the humanities: interuniversity collection of scientific works of young scientists of the Ivan Franko Drohobych State Pedagogical University</source>
          , Drohobych, no.
          <issue>75</issue>
          (
          <issue>1</issue>
          ) (
          <year>2024</year>
          ):
          <fpage>145</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>