<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Protection in the Utilization of Natural Language Processors for Trend Analysis and Public Opinion: Cryptographic Aspect</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Inna Rozlomii</string-name>
          <email>inna-roz@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliia Yehorchenkova</string-name>
          <email>nataliia.yehorchenkova@stuba.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Yarmilko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Naumenko</string-name>
          <email>naumenko.serhii1122@vu.cdu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bohdan Khmelnytsky National University of Cherkasy</institution>
          ,
          <addr-line>81, Shevchenko Blvd., Cherkasy, 18031</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Slovak University of Technology in Bratislava</institution>
          ,
          <addr-line>81, Vazovova 5, 812 43 Bratislava</addr-line>
          ,
          <country>Slovak Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the digital age, the rapid growth in the generation and processing of information is accompanied by a growing threat of unauthorized access, illegal distribution, and misuse. One promising strategy for protecting information from cyber threats and malicious attacks is the use of natural language processors (NLP). This article focuses on the methodology of data protection when Natural Language Processing is used for sentiment analysis and trend detection. Emphasis is placed on the relevance of NLP to text content analysis aimed at identifying suspicious or dangerous information. The article covers the stages of text data collection and processing, including gathering data from sources such as social media, news portals, forums, and blogs. Preliminary processing then removes noise and applies tokenization, stemming, and lemmatization to prepare the data for further analysis. NLP makes it possible to identify keywords, topics, sentiment, and text structure, which supports categorization and the identification of trends in public opinion. Additionally, a mathematical model for detecting phishing indicators is presented, along with an example of identifying suspicious text features. The use of cryptographic methods can effectively secure the processed data, reducing the risk of unauthorized access or misuse. The article describes data protection methods for NLP-based sentiment analysis in detail and underscores the need for cryptographic techniques to ensure the security of processed text data.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural language processing</kwd>
        <kwd>natural language processing technologies</kwd>
        <kwd>information security</kwd>
        <kwd>analysis of global trends</kwd>
        <kwd>cybersecurity</kwd>
        <kwd>disinformation</kwd>
        <kwd>phishing</kwd>
        <kwd>automatic text analysis</kwd>
        <kwd>text classification</kwd>
        <kwd>threat detection</kwd>
        <kwd>digital security</kwd>
        <kwd>cyber threats</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today's information society, against the backdrop of rapid technological advancements and
dynamic changes in the global information space, information security gains increasing importance and
relevance. The growing volume of information circulating on the Internet, along with the rapid
development of social networks, digital platforms, and online services, creates new opportunities for
communication, collaboration, and knowledge access. However, this also increases the risk of
unauthorized access to personal data, the spread of disinformation, phishing attacks, and other threats
that jeopardize the security and resilience of information processes.</p>
      <p>
        In this context, it is crucial not only to identify specific threats and react to them but also to anticipate
their emergence, understand the dynamics of global trends, and adapt security measures in a timely
manner. One way to achieve this goal is by utilizing Natural Language Processing (NLP) for the
analysis of public opinion and textual content [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. NLP encompasses a set of technologies, built on computational linguistics and machine
learning, that enable the automatic analysis and understanding of linguistic context [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>The aim of this article is to explore the relevance and importance of applying NLP in the context of
analyzing global trends and public opinion, as well as their role in ensuring the security of information
processes. To achieve this goal, a comprehensive approach is used, which includes data collection and
preparation, identification of key topics and terms, sentiment analysis, and the application of
classification and clustering methods. By investigating the capabilities of these technologies, the article
aims to highlight their potential in detecting and countering cyber threats, including phishing attacks,
the dissemination of disinformation, and other forms of attacks on information security. Additionally,
the article emphasizes the importance of analyzing global trends and public sentiment as tools for
prevention and response to potential threats. This contributes to a deeper understanding of the role of
NLP in the modern information environment and helps develop new approaches to ensuring the security
of information processes, providing more effective protection in the evolving digital landscape.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        Analysis of trends and public opinion is a key tool for business, politics, science, and social research
in the modern information society. The use of Natural Language Processors (NLP) allows for the
automated analysis of large volumes of textual information, enabling the detection of topics and
sentiment [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, this approach faces significant challenges in terms of
confidentiality, integrity, and security of processed information.
      </p>
      <p>
        NLP has long been used for the analysis of public opinion and trend identification in textual sources
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Previous research has focused on sentiment analysis, emotion classification, as well as identifying
keywords and phrases to understand public sentiments [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Recently, researchers have been paying attention to the application of cryptographic methods to
protect information processed using natural language processors [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This is important due to the risks
to the confidentiality and integrity of data during text processing by third parties.
      </p>
      <p>
        Other studies have developed cryptographic methods to protect information
transmitted and processed by natural language processors [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Some approaches include the
use of encryption, message signing, and other cryptographic protocols to ensure the confidentiality and
authenticity of data during processing [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        Additionally, the interest in text processors is evidenced by publications related to the analysis of
various languages, such as Indonesian [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Bengali [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Arabic [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and others.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Research methodology</title>
      <p>The methodology of the research discussed in this article is based on the combination of NLP
analysis with the analysis of global trends and public opinion to ensure information (data) protection in
the digital environment. In the context of the article, data protection is regarded as the application of
cryptographic methods and strategies to guarantee the confidentiality, integrity, and availability of data
being processed and stored while using Natural Language Processing for trend analysis and public
sentiment evaluation. This encompasses protection against unauthorized data access, preservation of the
confidentiality of personal information, and assurance that data remain available to authorized users
while staying inaccessible to unauthorized parties.
Such protection may involve encryption, authentication, digital signatures, and other cryptographic
methods to secure data during processing.</p>
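      <p>As an illustration of the integrity and confidentiality measures listed above, the following minimal Python sketch (standard library only) pseudonymizes author identifiers with a keyed hash and attaches an HMAC-SHA256 tag to each processed record, so that any modification of the record can be detected. The keys, record layout, and function names (pseudonymize, protect_record, verify_record) are hypothetical. HMAC provides integrity and authenticity but not secrecy; encrypting the text itself would additionally require an authenticated cipher such as AES-GCM from a dedicated cryptographic library.</p>
```python
import hashlib
import hmac
import json
import secrets

# Hypothetical secret keys; in practice they would come from a key-management service.
PSEUDONYM_KEY = secrets.token_bytes(32)
INTEGRITY_KEY = secrets.token_bytes(32)


def pseudonymize(user_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a raw user identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(user_id: str, text: str, key: bytes = INTEGRITY_KEY) -> dict:
    """Build a record with a pseudonymous author and an HMAC integrity tag."""
    record = {"author": pseudonymize(user_id), "text": text}
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_record(record: dict, key: bytes = INTEGRITY_KEY) -> bool:
    """Recompute the tag over author and text and compare in constant time."""
    body = {"author": record["author"], "text": record["text"]}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["tag"])


rec = protect_record("@example_user", "How to protect your data from hackers?")
print(verify_record(rec))  # True
rec["text"] = "tampered text"
print(verify_record(rec))  # False
```
      <p>Keying the pseudonyms with a secret, rather than using a plain hash, prevents anyone without the key from recomputing pseudonyms for known account names, while the same user still maps to the same pseudonym, so per-author trends remain analyzable.</p>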
      <p>In turn, the analysis of global trends is considered as the process of studying and identifying common
changes or patterns in data that reflect a particular evolution or movement in public opinion or consumer
behavior based on the analysis of a large volume of textual information, including social media, news,
blogs, and other online sources. This may include identifying popular topics, opinions, sentiments,
reactions, or thoughts that circulate online, as well as identifying changes in consumer habits, trends in
public opinion, or reactions to specific events. Such analysis can help in understanding public sentiment
towards specific issues, identifying risks, determining popular opinions, and forecasting potential
directions of development.</p>
      <p>The concept of "public opinion" is used to refer to the collective beliefs, views, and sentiments of
the public, which can be expressed through various sources such as social media, surveys, expert
opinions, and other communication channels. This encompasses a wide range of views, beliefs,
reactions, and sentiments that exist in society regarding specific issues, events, individuals, or
processes. Public opinion is important for determining trends and sentiments in society, as well as for
measuring the level of support or rejection of certain ideas, political decisions, goods, or services. The
analysis of public opinion can help in understanding the needs and expectations of society, as well as
in forming communication and influence strategies.</p>
      <p>
        To achieve the stated research goal, a systematic methodology has been developed, which includes
the following steps:
1. Data Collection and Preparation. The initial stage involves gathering various textual data
from different sources, such as social media, news portals, forums, etc. [
        <xref ref-type="bibr" rid="ref13 ref14">13-14</xref>
        ]. The sample
should be representative and cover current topics and trends. The collected data undergo
preliminary processing, such as tokenization, noise removal, and so on.
2. Analysis of Global Trends and Public Opinion. The application of natural language
processing allows for the identification of key topics, popularity, and sentiments in textual data
[15-16]. Using classification and clustering methods, connections between terms are
established, and patterns in global trends and public opinion are identified.
3. Detection of Threats and Anomalies. The use of NLP enables the detection of textual
indicators that may point to security threats, such as phishing attempts, the spread of
disinformation, insults, and more [17]. An automated analysis system can highlight suspicious
information and classify it based on predefined criteria.
4. Development of Forecasting Models. By utilizing data on global trends and public opinion,
predictive models can be created to help anticipate potential risks and threats in the future [18].
These models may be based on the analysis of past events and dependencies between various
factors.
5. Validation and Evaluation of Results. Assessing the effectiveness of developed models and
methods requires validation on real data or simulated scenarios. This stage involves comparing
the analysis results with actual events and evaluating the accuracy of forecasts.
6. Analysis of Innovative Approaches. Innovative approaches to the application of natural
language processors in ensuring information security and trend analysis include the use of deep
learning, neural networks, and other modern methods [19-20].
      </p>
      <p>This methodology enables the detection, analysis, and prediction of security threats and real-time
responses, relying on the analysis of textual content and global sentiments. The research findings can
be a valuable contribution to enhancing information security and risk management in the modern digital
environment.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Data collection</title>
      <p>The first stage of the research involves collecting a large volume of textual data from various sources
such as social networks, news portals, forums, and blogs [21] for further analysis and processing. Because
these sources differ widely, the sample must be representative, reflecting both linguistic and cultural
diversity [22].</p>
      <p>
        Steps of data collection:
1. Source Determination. The selection of information sources depends on the specific
objectives of the research. Social networks (Twitter, Facebook, Reddit), news portals (BBC,
CNN), forums, and blogs can provide diverse insights into global trends and public opinion
[21].
2. Data Collection. The use of APIs and web scraping assists in automatically gathering textual
data from selected sources. For example, Twitter API can be used to collect tweets on a specific
topic.
3. Linguistic Diversity. Ensuring representativeness involves choosing data from different
languages and cultures. For instance, if the research pertains to global trends in sustainable
technologies, it's important to consider data from various countries and linguistic communities
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10-12</xref>
        ].
      </p>
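      <p>Representativeness can be checked while data are being collected. The sketch below assumes that each collected document carries source and language metadata; the sample, the 15% minimum share, and the function names are hypothetical. It reports target languages that are under-represented, which would signal that further collection is needed.</p>
```python
from collections import Counter

# Hypothetical metadata for a collected sample: (source, language) per document.
sample = [
    ("twitter", "en"), ("twitter", "en"), ("news", "en"), ("forum", "uk"),
    ("news", "de"), ("blog", "en"), ("twitter", "es"), ("news", "en"),
]


def language_shares(records):
    """Return the fraction of documents collected in each language."""
    counts = Counter(lang for _, lang in records)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def underrepresented(records, target_languages, min_share=0.15):
    """List target languages whose share of the sample is below min_share."""
    shares = language_shares(records)
    return [lang for lang in target_languages if min_share > shares.get(lang, 0.0)]


print(language_shares(sample))  # {'en': 0.625, 'uk': 0.125, 'de': 0.125, 'es': 0.125}
print(underrepresented(sample, ["en", "uk", "de", "es", "ar"]))  # ['uk', 'de', 'es', 'ar']
```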
      <p>Figure 1 provides a scheme of the process of collecting textual data from various sources. The
scheme illustrates how, after collection, the information is analyzed and processed to identify key
aspects, assess sentiment, and perform semantic analysis, including the analysis of textual relationships
between words and concepts. Together, these stages help in understanding global trends and public
opinion based on an extensive selection of textual data.</p>
      <p>Integrating NLP into this process gives researchers and specialists additional tools for
understanding increasingly complex social and informational phenomena. The results of such research
can be used to develop more effective risk management strategies, ensure cybersecurity, and create more
objective information environments. Thus, the use of NLP in the collection and analysis of textual data
has great potential for improving the quality and security of the information space.</p>
      <p>The obtained data undergo preprocessing, which includes noise removal, tokenization, stemming,
and lemmatization of texts. This stage helps prepare the data for further analysis, reduce dimensionality,
and ensure normalization [23, 24].</p>
      <p>Data preprocessing is an important stage in preparing information for further analysis and research
[25]. Here are the steps typically involved in data preprocessing:
1. Noise removal: First, it's necessary to eliminate redundant or irrelevant information, such as
special characters, advertisements, URLs, punctuation marks, etc. This helps make the data
cleaner and facilitates more accurate analysis.
2. Tokenization. Text is divided into individual words or tokens. This can be done by splitting
the text using spaces or other delimiters. Consequently, each word becomes a separate element
to work with.
3. Stemming and Lemmatization. Stemming and lemmatization help reduce words to their base
forms. Stemming removes affixes (typically suffixes) using heuristic rules, while lemmatization
reduces words to their lemma (dictionary form). For example, a stemmer may reduce "studies" to
the truncated stem "studi", whereas a lemmatizer returns "study"; for "running", both yield "run".
4. Stopword Removal. Words that are extremely common and carry little meaningful
information (e.g., "and," "the," "in," "with") can be removed from the text. This helps reduce
noise and focus on keywords.
5. Normalization. To ensure uniformity, data may be transformed to lowercase, which helps
avoid duplicate words due to different letter casing.</p>
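      <p>The five steps above can be combined into a small pipeline. The following is a minimal standard-library sketch: the stop-word list is deliberately tiny, and naive_stem is a hypothetical toy suffix stripper that only shows where stemming fits. A real pipeline would use, for example, NLTK's PorterStemmer or spaCy's lemmatizer instead.</p>
```python
import re

# Small illustrative stop-word list; NLTK and spaCy ship much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "in", "with", "to", "of", "for", "on", "is", "are"}


def remove_noise(text):
    """Step 1: drop URLs, then replace every non-letter character with a space."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    return re.sub(r"[^A-Za-z\s]", " ", text)


def naive_stem(token):
    """Toy suffix stripper for illustration only (not the Porter algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Collapse a doubled final consonant, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return token


def preprocess(text):
    cleaned = remove_noise(text).lower()                 # steps 1 and 5: noise removal, lowercasing
    tokens = cleaned.split()                             # step 2: whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # step 4: stop-word removal
    return [naive_stem(t) for t in tokens]               # step 3: (toy) stemming


print(preprocess("Running late? See https://example.com for the latest updates!!!"))
# ['run', 'late', 'see', 'latest', 'updat']
```
      <p>The sketch removes stop words before stemming, so that the stop list is matched against unmodified words; this ordering differs slightly from the numbering of the steps.</p>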
      <p>After performing these steps, the data is ready for further analysis. It's important to note that data
preprocessing may vary depending on the specific task and data type, but the general principle involves
cleaning, normalizing, and structuring the information before its subsequent use.</p>
      <p>Specialized software tools and libraries for text processing can be used for these stages, such as
Natural Language Toolkit (NLTK) or spaCy for the Python programming language.</p>
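      <p>Following preprocessing, the data become more structured and ready for further analysis. Below
is an example of the output after punctuation removal, lowercasing, and tokenization; stop words are
still present and no stemming has been applied at this point.</p>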
      <p>Original text: "Global climate changes affect the economy and natural resources. Innovative
technologies contribute to sustainable economic development".</p>
      <p>Result of data preprocessing: ["global", "climate", "changes", "affect", "the", "economy", "and",
"natural", "resources", "innovative", "technologies", "contribute", "to", "sustainable", "economic",
"development"].</p>
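      <p>The listed result corresponds to lowercasing, punctuation removal, and tokenization only. A short standard-library Python check reproduces it and then applies a stop-word filter; the three-word stop list is an illustrative assumption.</p>
```python
import re

text = ("Global climate changes affect the economy and natural resources. "
        "Innovative technologies contribute to sustainable economic development")

# Lowercase the text and keep runs of letters; this yields the token list shown above.
tokens = re.findall(r"[a-z]+", text.lower())
print(tokens)

# A subsequent stop-word filter (illustrative three-word list) drops "the", "and", "to".
STOP_WORDS = {"the", "and", "to"}
content_tokens = [t for t in tokens if t not in STOP_WORDS]
print(len(tokens), len(content_tokens))  # 16 13
```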
      <p>This preprocessing ensures the uniformity of textual data and prepares them for further analysis,
making it easier to recognize key words, identify themes, and other aspects of global trends and public
opinion.</p>
    </sec>
    <sec id="sec-5">
      <title>3.3. Analysis of global trends</title>
      <p>After the preprocessing, the data undergo analysis using natural language processors [27]. Various
algorithms and models are employed to identify key words, topics, sentiment, and to determine the
structure of texts. NLP helps to extract and categorize data, enabling the detection of common trends
and differences in public opinion [28]. The application of NLP analysis opens up possibilities for a
detailed understanding of global trends and public sentiment. This section elaborates on the methods
and approaches that allow the identification of key topics, assessment of popularity, and determination
of sentiments in textual data using natural language analysis. Classification and clustering methods are
also used to find connections between terms and patterns in global trends and public opinion.</p>
      <p>In this section, we will conduct a detailed analysis of the process of identifying key topics, assessing
popularity and sentiments, as well as the application of classification and clustering methods for
analyzing global trends and public opinion.</p>
    </sec>
    <sec id="sec-6">
      <title>3.3.1. Identification of key topics and terms</title>
      <p>When analyzing textual data, we use text processing techniques to identify key themes and terms.
For example, consider a hypothetical dataset of text discussions on cybersecurity. We determine the
frequency of each key term's mentions and its importance using Term Frequency-Inverse Document
Frequency (TF-IDF).</p>
      <p>TF-IDF is a statistical measure that indicates the importance of a term within a text relative to the
entire corpus of texts [29]. It consists of two components: Term Frequency (TF), which shows how
often the term appears in a specific text, and Inverse Document Frequency (IDF), which indicates the
rarity of the term across the entire dataset of texts.</p>
      <p>Let's apply the TF-IDF method to this set of texts:
Text 1: "The cyberattack on a major bank was an explicit threat!"
Text 2: "How to protect your data from hackers?"
Text 3: "Top 5 most common cyber threats this year."</p>
      <p>Based on these texts, we calculate the TF-IDF values for key terms (Table 1).</p>
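      <p>For these three texts, TF-IDF can be computed directly from the definition tfidf(t, d) = tf(t, d) × log(N / df(t)), where N is the number of texts and df(t) is the number of texts containing term t. The sketch below uses this common variant; other weightings (for instance, smoothed IDF) give different absolute numbers, so its output is not meant to reproduce Table 1. A crude plural rule, an assumption of the sketch, merges "threat" and "threats"; because that term is the only one shared by two texts, it receives a lower weight than terms that occur in a single text.</p>
```python
import math
import re
from collections import Counter

texts = [
    "The cyberattack on a major bank was an explicit threat!",
    "How to protect your data from hackers?",
    "Top 5 most common cyber threats this year.",
]


def tokenize(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    # Crude plural stripping (an assumption for this sketch) so that
    # "threats" and "threat" are counted as the same term.
    return [t[:-1] if len(t) > 4 and t.endswith("s") else t for t in tokens]


docs = [tokenize(t) for t in texts]
n_docs = len(docs)
# Document frequency: in how many texts each term occurs at least once.
df = Counter(term for doc in docs for term in set(doc))


def tf_idf(doc):
    """TF = count / text length; IDF = log(N / df)."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}


for i, doc in enumerate(docs, start=1):
    scores = tf_idf(doc)
    print(f"Text {i}:", {t: round(s, 3) for t, s in scores.items()})
```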
    </sec>
    <sec id="sec-7">
      <title>3.3.2. Assessment of popularity and sentiments</title>
      <p>Alongside term frequencies, which indicate popularity, we conduct sentiment analysis to assess the attitudes associated with each key term.
Sentiment analysis evaluates the emotional tone of the text and determines whether it is positive,
negative, or neutral. Table 2 provides an example of sentiment values for the analyzed text dataset.</p>
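      <p>A minimal lexicon-based sketch of such sentiment scoring is shown below. The polarity lexicon, its weights, and the ±0.2 neutrality band are hypothetical choices for illustration and are not the values behind Table 2; in practice a curated lexicon (for example, VADER, which is distributed with NLTK) or a trained classifier would be used.</p>
```python
import re

# Hypothetical polarity lexicon with weights in [-1, 1].
LEXICON = {
    "cyberattack": -0.8, "threat": -0.7, "threats": -0.7,
    "hackers": -0.6, "protect": 0.5, "secure": 0.6, "safe": 0.6,
}


def sentiment_score(text):
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    hits = [LEXICON[t] for t in re.findall(r"[a-z]+", text.lower()) if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def sentiment_label(text, threshold=0.2):
    """Map the score to positive / negative / neutral using a symmetric band."""
    score = sentiment_score(text)
    if score > threshold:
        return "positive"
    if -score > threshold:
        return "negative"
    return "neutral"


for text in [
    "The cyberattack on a major bank was an explicit threat!",
    "How to protect your data from hackers?",
    "Top 5 most common cyber threats this year.",
]:
    print(sentiment_label(text), round(sentiment_score(text), 2), text)
```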
    </sec>
    <sec id="sec-8">
      <title>3.3.3. Classification and Clustering</title>
      <p>Classification and clustering methods help organize and group key terms and topics based on certain
characteristics. Applying classification and clustering methods in the analysis of textual data collected
from various sources allows for the systematic organization of a large amount of information and the
identification of connections that may be imperceptible during superficial analysis. For example,
through clustering, it is possible to identify groups of similar themes or viewpoints that form in public
opinion regarding information security (Figure 2).</p>
      <p>Using natural language processing and classification and clustering methods, we can delve deeper
into the relationships and patterns within textual data and gain a better understanding of global trends
in information security.</p>
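      <p>To make the idea concrete, the sketch below groups short texts by the cosine similarity of their word-count vectors using single-linkage (connected-components) clustering: any two texts whose similarity exceeds a threshold end up in the same group. The texts and the 0.2 threshold are hypothetical, and this simple method stands in for the k-means or topic-model approaches typically applied to large corpora.</p>
```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector as a Counter of lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.2):
    """Single-linkage grouping: texts linked by similarity above threshold share a label."""
    vectors = [vectorize(t) for t in texts]
    labels = list(range(len(texts)))  # each text starts in its own cluster
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) > threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]  # merge clusters
    return labels


posts = [
    "phishing email asks for your password",
    "new phishing email steals password data",
    "ransomware attack encrypts hospital files",
    "hospital hit by ransomware attack",
]
print(cluster(posts))  # [0, 0, 2, 2]
```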
    </sec>
    <sec id="sec-9">
      <title>4. Detection of threats and anomalies</title>
      <p>The increasing volume of data processed by NLP systems for trend analysis and public sentiment
evaluation makes the detection of threats and anomalies in the raw data an important challenge.
Researchers are actively working on the development and enhancement of
cryptographic methods and algorithms that efficiently identify potential vulnerabilities in natural
language processing systems.</p>
      <p>One key aspect of data protection involves securing natural language processing models from
potential attacks by malicious actors. To achieve this, it's necessary to implement monitoring systems
capable of promptly detecting deviations in the models' performance, which may indicate hacking attempts
or the introduction of malicious algorithms. Special attention should be paid to the detection of
abnormal patterns and data in the input streams fed into natural language processors. This can be
achieved through methods analyzing data structure and comparing it to reference templates, as well as
the application of machine learning algorithms for automatic anomaly detection.</p>
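      <p>As a simple instance of the statistical anomaly detection described above, the sketch below learns a baseline for one input feature, the share of special characters in a message, and flags inputs whose z-score exceeds a threshold. The feature, the four reference texts, and the threshold of 3 are assumptions for illustration; a production monitor would track many features over a much larger reference set.</p>
```python
import statistics


def special_char_ratio(text):
    """Share of characters that are neither alphanumeric nor whitespace."""
    if not text:
        return 0.0
    special = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return special / len(text)


def fit_baseline(texts):
    """Mean and sample standard deviation of the feature over reference texts."""
    values = [special_char_ratio(t) for t in texts]
    return statistics.mean(values), statistics.stdev(values)


def is_anomalous(text, baseline, z_threshold=3.0):
    """Flag a text whose feature lies more than z_threshold deviations from the mean."""
    mean, std = baseline
    value = special_char_ratio(text)
    if std == 0:
        return value != mean
    return abs(value - mean) / std > z_threshold


reference = [
    "Markets reacted calmly to the new data protection rules.",
    "Experts discuss how to secure online accounts.",
    "The report covers trends in public opinion on privacy.",
    "Users share tips on recognising phishing emails.",
]
baseline = fit_baseline(reference)
print(is_anomalous(reference[0], baseline))  # False
print(is_anomalous("$$$ WIN NOW!!! CLICK http://x.y/?a=1 ### !!!", baseline))  # True
```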
      <p>For effective protection against malicious attacks on data obtained during public sentiment analysis,
it is essential to implement comprehensive cryptographic methods such as encryption, digital
signatures, user authentication, and more. Additionally, it is recommended to regularly update
cryptographic protocols to address modern threats and vulnerabilities.</p>
      <p>In this section, we will explore the application of NLP for identifying textual features that may
indicate security threats, such as phishing attempts, disinformation, and abuse. We will also examine
an automated analysis system that allows the identification of suspicious information and classifies it
based on various criteria. It's worth noting that after NLP analysis, an expert analysis is conducted,
involving a deep examination of the sample, the identification of nuances and context that may be lost
during automated analysis. The expert approach helps ensure the accuracy and reliability of the results.</p>
      <p>Natural language processors can analyze text and identify key features that indicate potential security
threats. For example, phrases containing suspicious URLs or queries related to personal data can be
indicators of phishing attempts. Additionally, detecting intense negative language and insults can point
to potential instances of harm or offensive behavior.</p>
      <p>Let's consider an automated analysis system capable of identifying suspicious information and
classifying it based on various criteria. This system is used to enhance security and identify potential
threats in textual data.</p>
    </sec>
    <sec id="sec-10">
      <title>4.1. Operation of the automatic analysis system</title>
      <p>The automatic analysis system is based on trained machine learning models that recognize patterns
and differences in text (Figure 3).
The analysis process consists of the following stages:
1. Text Retrieval. The system initially obtains text data for analysis, which can come from
various sources such as social media, emails, news, etc.
2. Preprocessing. Text data undergo preprocessing, including tokenization (splitting into
individual words or tokens), removing unnecessary characters, converting to lowercase, etc.
3. Feature Extraction. Natural Language Processing (NLP) tools are used to extract features
from the processed text. These features may include words, phrases, collocations, lexical and
grammatical features, word frequencies, sentiment analysis, the use of linguistic devices
(metaphors, comparisons), the use of special symbols and emojis, links and URLs, word
repetition counts, and other aspects.</p>
      <p>4. Detection of Suspicious Features. The system analyzes the extracted features and looks for
those that may indicate potential threats or anomalies. These could include unusual queries,
suspicious URL links, negative tone, rapid tone changes, specific information requests,
business proposals from unknown sources, calls for immediate action, unexpected identity
changes, excessive use of special symbols and mixed-case letters, attempts at psychological
influence, etc.
5. Classification. Trained classifier models are applied to assign class labels to the text data. This
classification can be based on the level of suspicion, the type of threat (phishing,
disinformation, etc.), and other criteria (Table 3).</p>
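      <p>Stages 3 to 5 can be sketched in a few lines of Python. In this minimal version, assumed for illustration only, the indicator word lists, the feature set, and the rule that three or more flags means "high" suspicion are hypothetical; they stand in for the trained classifiers and the criteria of Table 3, which are not reproduced here.</p>
```python
import re

# Hypothetical indicator vocabularies; a deployed system would learn or curate these.
URGENCY = {"immediately", "urgent", "now", "blocked", "suspended", "verify"}
CREDENTIALS = {"password", "credentials", "login", "pin", "card"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def extract_features(text):
    """Stage 3: turn a message into a few numeric features."""
    words = re.findall(r"[a-z]+", text.lower())
    letters = [ch for ch in text if ch.isalpha()]
    return {
        "urls": len(URL_RE.findall(text)),
        "urgency_words": sum(w in URGENCY for w in words),
        "credential_words": sum(w in CREDENTIALS for w in words),
        "upper_ratio": sum(ch.isupper() for ch in letters) / len(letters) if letters else 0.0,
    }


def suspicious_features(features):
    """Stage 4: list the features that exceed simple, illustrative limits."""
    flags = []
    if features["urls"] > 0:
        flags.append("contains link")
    if features["urgency_words"] > 0:
        flags.append("calls for urgent action")
    if features["credential_words"] > 0:
        flags.append("requests credentials")
    if features["upper_ratio"] > 0.5:
        flags.append("excessive capitals")
    return flags


def classify(text):
    """Stage 5: rule-based stand-in for a trained classifier (suspicion level + reasons)."""
    flags = suspicious_features(extract_features(text))
    level = "high" if len(flags) >= 3 else "medium" if flags else "low"
    return level, flags


print(classify("Your account has been blocked. Click http://example.com and enter your password now."))
print(classify("The city council will discuss the new budget next week."))
```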
    </sec>
    <sec id="sec-11">
      <title>4.2. A mathematical model for detecting phishing indicators</title>
      <p>To detect phishing indicators in text, machine learning models such as a binary classifier (e.g.,
logistic regression or naive Bayes classifier) can be used.</p>
      <p>Let $x$ be the feature vector of a text, which includes the important parameters of the analyzed
text. Also, let $y$ represent the class label for the given text, where $y$ takes the value "phishing"
(encoded as $y = 1$) or "non-phishing" (encoded as $y = 0$). Thus, we have a training dataset $D$:</p>
      <p>$$D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}, \tag{1}$$</p>
      <p>where $n$ is the number of training examples.</p>
      <p>The model can be represented as follows:</p>
      <p>$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}, \tag{2}$$</p>
      <p>where $\theta$ represents the model parameters that are learned during training, and $h_\theta(x)$ is
the estimated probability that the text is phishing ($y = 1$).</p>
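      <p>A minimal pure-Python sketch of this model is shown below: predict_proba implements the hypothesis h_θ(x), and train minimizes the mean logarithmic loss by batch gradient descent, using the fact that the gradient of the log loss with respect to θ is (h_θ(x) − y)x for each example. The four-component feature vectors (with a constant bias term), the six labeled examples, the learning rate, and the number of epochs are hypothetical; labels are encoded as y = 1 for "phishing" and y = 0 for "non-phishing".</p>
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def train(data, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean logarithmic loss."""
    theta = [0.0] * len(data[0][0])
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in data:
            error = predict_proba(theta, x) - y  # derivative of the log loss w.r.t. theta^T x
            for j, xj in enumerate(x):
                grad[j] += error * xj / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical training set D: x = [bias, link present, urgency words, credential request],
# y = 1 for "phishing", 0 for "non-phishing".
D = [
    ([1, 1, 2, 1], 1), ([1, 1, 1, 1], 1), ([1, 0, 1, 1], 1),
    ([1, 0, 0, 0], 0), ([1, 1, 0, 0], 0), ([1, 0, 1, 0], 0),
]
theta = train(D)
print([round(predict_proba(theta, x), 2) for x, _ in D])
```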
      <p>During training, the goal is to minimize a loss function, for example the logarithmic loss
$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log h_\theta(x_i) + (1 - y_i)\log\left(1 - h_\theta(x_i)\right)\right]$,
using the gradient descent method. This method yields the parameter values $\theta$ that allow the
classifier to determine whether a text is a suspected phishing attempt.</p>
    </sec>
    <sec id="sec-12">
      <title>4.3. Example of detecting phishing signs</title>
      <p>Let's consider a concrete example of how the automatic analysis system works. Suppose we have the
following text with a phishing attempt: "Welcome! Your account has been blocked. Please click on the
link and enter your credentials to unlock."</p>
      <p>To analyze this text sample, we can use a previously trained model designed to detect phishing
features. After applying natural language processing to the text, we obtain a feature vector, which we
denote as $x$. Substituting $x$ into the model with the learned parameters $\theta$ gives the
probability that the given text is phishing.</p>
      <p>To illustrate the process, assume that processing the text yields the feature vector
$x = [0.2, -0.5, 0.8]$ and that the model parameters are $\theta = [0.1, 0.4, -0.7]$. The probability
can be calculated using the hypothesis function $h_\theta$ according to the following formula:</p>
      <p>$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}. \tag{3}$$</p>
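      <p>Substituting the values of the feature vector $x$ and the parameters $\theta$ into this formula gives
the estimated probability that the text is phishing.</p>
      <p>In our example, $\theta^{T} x = 0.1 \cdot 0.2 + 0.4 \cdot (-0.5) + (-0.7) \cdot 0.8 = -0.74$, so</p>
      <p>$$h_\theta(x) = \frac{1}{1 + e^{0.74}} \approx 0.32. \tag{4}$$
This value, approximately 0.32, is the estimated probability that the given text is phishing. With the
commonly used decision threshold of 0.5, the text would therefore not be flagged; the parameter values in
this example are illustrative, and in practice $\theta$ is obtained by training on labeled data as
described above.</p>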
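      <p>The arithmetic of this example can be checked with a few lines of standard-library Python, using only the feature vector and parameters given in the text:</p>
```python
import math

x = [0.2, -0.5, 0.8]      # feature vector from the example
theta = [0.1, 0.4, -0.7]  # model parameters from the example

z = sum(t * xi for t, xi in zip(theta, x))  # theta^T x = 0.02 - 0.20 - 0.56
p = 1.0 / (1.0 + math.exp(-z))              # h_theta(x), the logistic function
print(round(z, 2), round(p, 2))  # -0.74 0.32
```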
      <p>Thus, this example of phishing feature detection illustrates how automated text
analysis systems work in practice. They use NLP to extract features from text, and then
apply trained models to determine the likelihood of a specific threat. The significance and importance
of each feature can be determined by the model parameters, allowing systems to accurately detect
potential threats and anomalies in textual data. These approaches are an important tool for ensuring
cybersecurity and effectively identifying malicious actions in the modern digital environment.</p>
    </sec>
    <sec id="sec-13">
      <title>5. Discussions</title>
      <p>The presented research opens up prospects for further development in the field of information
security and public sentiment analysis using NLP. The research focuses on the use of NLP for detecting
cybersecurity threats and for trend analysis, with attention to the cryptographic protection of the processed data. The main results of the
article can serve as an important starting point for further research and practical applications.</p>
      <p>The research underscores the importance of using NLP for threat detection and global trend analysis.
Future research could focus on improving methods for anomaly detection and malicious activity
identification, as well as developing new algorithms for implementing intelligent cybersecurity systems.</p>
      <p>In the future, it may be possible to enhance the performance of sentiment analysis using NLP by
refining emotion classification algorithms and determining the importance of trends for different
aspects of society.</p>
      <p>Future research could also focus on analyzing real-world examples of NLP usage for information
security and trend analysis. This may include assessing the impact of such systems on practical aspects
of information security.</p>
    </sec>
    <sec id="sec-14">
      <title>6. Conclusion</title>
      <p>The article discusses the role of natural language processing (NLP) in detecting security threats
such as phishing attacks, misinformation, insults, and spam. It focuses on the capabilities of NLP for
analyzing global trends and public sentiment that may indicate potential risks and vulnerabilities in
the information space.</p>
      <p>The research underscores that the use of NLP for public sentiment analysis and trend detection has
become an integral part of the modern research process. The collection and processing of textual data
from various sources, including social media, news portals, forums, and blogs, demonstrate the
significant role of NLP in identifying the keywords, sentiment, and topics that reflect public opinion.
It is shown that careful preliminary processing of textual data, including noise removal, tokenization,
stemming, and lemmatization, is a critical step in preparing the data for further analysis and
contributes to the accuracy and completeness of the results.</p>
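      <p>The preprocessing steps summarized above can be illustrated with a minimal, dependency-free Python sketch. It assumes English text, treats URLs, user mentions, digits, and punctuation as noise, tokenizes on whitespace, and uses a deliberately crude suffix-stripping rule as a stand-in for a real stemmer such as the Porter stemmer; lemmatization, which requires a lexicon or morphological analyzer, is only noted in a comment. The function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import re

# Noise patterns (assumptions): URLs, @mentions, and any character that is
# not a lower-case ASCII letter or whitespace (so this sketch is English-only).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# Crude suffix list standing in for a real stemmer; checked in this order.
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def remove_noise(text):
    """Lower-case the text and replace URLs, mentions, digits and punctuation with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return NON_LETTER_RE.sub(" ", text)


def tokenize(text):
    """Split cleaned text on whitespace."""
    return text.split()


def stem(token):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Noise removal, then tokenization, then stemming.

    Lemmatization (e.g. mapping "better" to "good") would need a lexicon or a
    morphological analyzer and is omitted from this sketch.
    """
    return [stem(token) for token in tokenize(remove_noise(text))]


if __name__ == "__main__":
    print(preprocess("Markets crashed @user http://x.io !!"))
```
      </preformat>
      <p>Running the example prints ['market', 'crash']. The crude stemmer also produces non-words (for instance, "sharing" becomes "shar"), which is one reason production pipelines rely on established stemmers and lemmatizers.</p>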
      <p>The application of cryptographic methods to protect processed data is a key aspect of ensuring
the security and confidentiality of information, especially when sensitive data are involved. The
developed mathematical model for phishing detection and the examples of identifying suspicious
textual features further highlight the need for such protection of processed data.</p>
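      <p>As one concrete illustration of such protection, the following Python sketch uses only the standard library. It assumes two measures that the article does not prescribe: keyed pseudonymization of author identifiers with HMAC-SHA256, so that processed records no longer carry raw user names, and an HMAC integrity tag that makes later tampering with a processed record detectable. Confidentiality of the text itself would additionally require authenticated encryption (for example, AES-GCM from a vetted cryptographic library), which is not shown. The function names and record fields are hypothetical.</p>
      <preformat preformat-type="code">
```python
import hashlib
import hmac
import json
import secrets


def pseudonymize(user_id, key):
    """Replace a user identifier with a keyed HMAC-SHA256 pseudonym.

    Without the key, a guessed identifier cannot be hashed to check for a
    match, unlike with a plain unkeyed hash.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def seal_record(record, key):
    """Attach an HMAC tag computed over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify_record(sealed, key):
    """Recompute the tag and compare it in constant time."""
    payload = json.dumps(sealed["record"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])


if __name__ == "__main__":
    # Hypothetical keys; in practice they would live in a key-management system.
    pseudonym_key = secrets.token_bytes(32)
    integrity_key = secrets.token_bytes(32)
    processed = {
        "author": pseudonymize("@example_user", pseudonym_key),
        "tokens": ["market", "crash"],
        "sentiment": "negative",
    }
    sealed = seal_record(processed, integrity_key)
    print(verify_record(sealed, integrity_key))  # True
    sealed["record"]["sentiment"] = "positive"   # simulate tampering
    print(verify_record(sealed, integrity_key))  # False
```
      </preformat>
      <p>Keeping the pseudonymization key and the integrity key separate, and storing both outside the analytics pipeline, limits the damage if either key or the processed data set is exposed.</p>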
      <p>It is also worth noting that expert analysis plays a crucial role in understanding context and
nuances that may be lost in automated analysis. This emphasizes the need to account for the human
factor when processing information with NLP in order to ensure the accuracy and reliability of the
results.</p>
      <p>Overall, the use of NLP to identify security threats in the information space creates
opportunities for timely threat detection and effective response. The proposed research methodology
allows natural language processors to be used to analyze and understand global trends and public
sentiment while taking cultural and linguistic peculiarities into account. Combining technical analysis
with an expert approach helps to obtain objective and reliable results that can support forecasting
and strategic decision-making in various fields of activity.</p>
    </sec>
    <sec id="sec-15">
      <title>7. References</title>
      <p>[14] Z. Jiang, L. Liu, Research on sentiment analysis of online public opinion based on semantic, in:
H. Yuan, J. Geng, C. Liu, F. Bian, T. Surapunt (Eds.), Geo-Spatial Knowledge and Intelligence, GSKI 2017,
volume 849 of Communications in Computer and Information Science, Springer, Singapore, 2017,
pp. 313-321. doi: 10.1007/978-981-13-0896-3_31.</p>
      <p>[15] S. Salloum, T. Gaber, S. Vadera, K. Shaalan, A systematic literature review on phishing email
detection using natural language processing techniques, IEEE Access 10 (2022) 65703-65727. doi:
10.1109/ACCESS.2022.3183083.</p>
      <p>[16] X. Chen, R. Ding, K. Xu, S. Wang, T. Hao, Y. Zhou, A bibliometric review of natural language
processing empowered mobile computing, Wireless Communications and Mobile Computing (2018). doi:
10.1155/2018/1827074.</p>
      <p>[17] T. Peng, I. Harris, Y. Sawa, Detecting phishing attacks using natural language processing and
machine learning, in: 2018 IEEE 12th International Conference on Semantic Computing (ICSC),
Laguna Hills, CA, USA, 2018, pp. 300-301. doi: 10.1109/ICSC.2018.00056.</p>
      <p>[18] Y. Zhu, X. Li, J. Wang, Analysis and research of Weibo public opinion based on text, Journal of
Physics: Conference Series 1769(1) (2021). doi: 10.1088/1742-6596/1769/1/012018.</p>
      <p>[19] W. E. Zhang, Q. Z. Sheng, A. Alhazmi, C. Li, Adversarial attacks on deep-learning models in
natural language processing: A survey, ACM Transactions on Intelligent Systems and Technology
(TIST) 11(3) (2020) 1-41.</p>
      <p>[20] H. Gan, Research on data mining method based on privacy protection, in: 2020 3rd International
Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE),
Shenzhen, China, 2020, pp. 502-506. doi: 10.1109/AEMCSE50948.2020.00114.</p>
      <p>[21] N. Garg, K. Sharma, Text pre-processing of multilingual for sentiment analysis based on social
network data, International Journal of Electrical &amp; Computer Engineering 12(1) (2022)
20888708.</p>
      <p>[22] C. Qian, N. Mathur, N. H. Zakaria, R. Arora, V. Gupta, M. Ali, Understanding public opinions on
social media for financial sentiment analysis using AI-based techniques, Information Processing
&amp; Management 59(6) (2022). doi: 10.1016/j.ipm.2022.103098.</p>
      <p>[23] M. Anandarajan, C. Hill, T. Nolan, Text Preprocessing, in: Practical Text Analytics. Advances in
Analytics and Data Science, Springer, Cham, 2019, pp. 45-59. doi: 10.1007/978-3-319-95663-3_4.</p>
      <p>[24] A. Tabassum, R. R. Patil, A survey on text pre-processing &amp; feature extraction techniques in
natural language processing, International Research Journal of Engineering and Technology
(IRJET) 7(06) (2020) 4864-4867.</p>
      <p>[25] A. Kurniasih, L. P. Manik, On the Role of Text Preprocessing in BERT Embedding-based DNNs
for Classifying Informal Texts, Neuron 1024(512) (2022) 927-934.</p>
      <p>[26] J. Potočnik, E. Thomas, R. Killeen, S. Foley, A. Lawlor, J. Stowe, Automated vetting of radiology
referrals: exploring natural language processing and traditional machine learning approaches,
Insights into Imaging 13(1) (2022) 1-8.</p>
      <p>[27] H. Brown, K. Lee, F. Mireshghallah, R. Shokri, F. Tramèr, What does it mean for a language
model to preserve privacy? in: 2022 ACM Conference on Fairness, Accountability, and
Transparency (FAccT '22), Seoul, Republic of Korea, ACM, New York, NY, USA, 2022, pp. 2280-2292.
doi: 10.1145/3531146.3534642.</p>
      <p>[28] H. Yang, Q. He, Z. Liu, Q. Zhang, Malicious encryption traffic detection based on NLP, Security
and Communication Networks (2021). doi: 10.1155/2021/9960822.</p>
      <p>[29] M. I. Alfarizi, L. Syafaah, M. Lestandy, Emotional Text Classification Using TF-IDF (Term
Frequency-Inverse Document Frequency) And LSTM (Long Short-Term Memory), JUITA: Jurnal
Informatika 10(2) (2022) 225-232.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chowdhary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Chowdhary</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing</article-title>
          , in:
          <source>Fundamentals of Artificial Intelligence</source>
          . Springer, New Delhi,
          <year>2020</year>
          , pp.
          <fpage>603</fpage>
          -
          <lpage>649</lpage>
          . doi: 10.1007/978-81-322-3972-7_19.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Khurana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Natural language processing: State of the art, current trends and challenges</article-title>
          ,
          <source>Multimedia tools and applications</source>
          <volume>82</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          <fpage>3713</fpage>
          -
          <lpage>3744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Raina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <article-title>Natural Language Processing</article-title>
          , in: V. Raina, S. Krishnamurthy (Eds.),
          <source>Building an Effective Data Science Practice: A Framework to Bootstrap and Manage a Successful Data Science Practice</source>
          , Apress, Berkeley, CA,
          <year>2022</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>73</lpage>
          . doi: 10.1007/978-1-4842-7419-4_6.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Oshikawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A survey on natural language processing for fake news detection</article-title>
          , arXiv:1811.00770 [cs.CL] (
          <year>2018</year>
          ). doi: 10.48550/arXiv.1811.00770.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Khurana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Khatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Natural language processing: State of the art, current trends and challenges</article-title>
          ,
          <source>Multimedia tools and applications</source>
          <volume>82</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          <fpage>3713</fpage>
          -
          <lpage>3744</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Maulud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Zeebaree</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jacksi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. M.</given-names>
            <surname>Sadeeq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <article-title>State of art for semantic analysis of natural language processing</article-title>
          ,
          <source>Qubahan Academic Journal</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2021</year>
          )
          <fpage>21</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Cyber security meets artificial intelligence: a survey</article-title>
          ,
          <source>Frontiers of Information Technology &amp; Electronic Engineering</source>
          <volume>19</volume>
          (
          <issue>12</issue>
          ) (
          <year>2018</year>
          )
          <fpage>1462</fpage>
          -
          <lpage>1474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>May</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Denecke</surname>
          </string-name>
          ,
          <article-title>Security, privacy, and healthcare-related conversational agents: a scoping review</article-title>
          ,
          <source>Informatics for Health and Social Care</source>
          <volume>47</volume>
          (
          <issue>2</issue>
          ) (
          <year>2022</year>
          )
          <fpage>194</fpage>
          -
          <lpage>210</lpage>
          . doi: 10.1080/17538157.2021.1983578.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Jha</surname>
          </string-name>
          ,
          <article-title>Strengthening Smart Grid Cybersecurity: An In-Depth Investigation into the Fusion of Machine Learning and Natural Language Processing</article-title>
          ,
          <source>Journal of Trends in Computer Science and Smart Technology</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ) (
          <year>2023</year>
          )
          <fpage>284</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Pradana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hayaty</surname>
          </string-name>
          ,
          <article-title>The effect of stemming and removal of stopwords on the accuracy of sentiment analysis on Indonesian-language texts</article-title>
          ,
          <source>Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control</source>
          <volume>4</volume>
          (
          <issue>4</issue>
          ) (
          <year>2019</year>
          )
          <fpage>375</fpage>
          -
          <lpage>380</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Banik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H. H.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Seddiqui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Azim</surname>
          </string-name>
          ,
          <article-title>Survey on text-based sentiment analysis of Bengali language</article-title>
          , in:
          <source>2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT)</source>
          , Dhaka, Bangladesh,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          . doi: 10.1109/ICASERT.2019.8934481.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Hegazi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Al-Dossari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Yahy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Sumari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hilal</surname>
          </string-name>
          ,
          <article-title>Preprocessing Arabic text on social media</article-title>
          ,
          <source>Heliyon</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ) (
          <year>2021</year>
          ). doi: 10.1016/j.heliyon.2021.e06191.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Higgins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Soar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Pisani</surname>
          </string-name>
          ,
          <article-title>Natural language processing in electronic health records in relation to healthcare decision-making: a systematic review</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>155</volume>
          (
          <year>2023</year>
          ). doi: 10.1016/j.compbiomed.2023.106649.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>