<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Business Activity Indicators for Detecting the Impact of Income Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luka Kadyntsev</string-name>
          <email>luca.kadyntsev@knu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liudmyla Zubyk</string-name>
          <email>zubyk.liudmyla@knu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Kulibaba</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Ivanytska</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alona Chorna</string-name>
          <email>chornaa@mdpu.org.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bogdan Khmelnitsky Melitopol State Pedagogical University</institution>
          ,
          <addr-line>59 Naukovogo mistechka, Zaporizhzhya, 69000</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>60 Volodymyrska str., Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>458</fpage>
      <lpage>464</lpage>
      <abstract>
        <p>This article presents methods for collecting and analyzing data from media publications in order to relate them to stock market activity. Many existing solutions analyze user-specified topics and search for publication activity by given keywords. The proposed solution aims to minimize human input by collecting and categorizing information autonomously, which reduces the subjectivity of the result. Public interest in stock market trading is currently high, but it is nearly impossible for a person to process all the relevant information, and most tools are locked behind a paywall or require a certain skill level. The proposed solution is designed to be as simple as possible while still detecting media trends accurately: it uses clustering and determines the necessary number of clusters by itself.</p>
      </abstract>
      <kwd-group>
        <kwd>Algorithm</kwd>
        <kwd>cluster</kwd>
        <kwd>media</kwd>
        <kwd>finance</kwd>
        <kwd>stock market</kwd>
        <kwd>newsbreak</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The stock market is very sensitive to world
events [1]. The relevant news is not limited to
economics and politics: sports, cultural, and
military news can move the market as well. Thus,
analyzing the news can yield valuable
information that helps make financial
operations profitable.</p>
      <p>This task, however, requires heavy
automation. A person cannot properly
process thousands of publications a day, let alone
find all the emerging trends in this informational
noise. Many articles cover unimportant topics,
such as one-time stories that are never picked
up by other media outlets.</p>
      <p>This creates a need for a system that acts
as a filter and lets the user focus on the
important news events, namely those
mentioned in several publications over a
period of time. Such a filter greatly reduces the
time needed to work through the day’s news
and helps pick up trends that would otherwise be missed.</p>
      <p>In addition, not everyone has the computing
resources needed to perform a large number of
natural language processing operations. It would
therefore be beneficial if the system could be
split into client-side and server-side
applications, so that a user can gather data on
their own machine and then send it for
processing to a more powerful machine,
which can also serve other users’ tasks.</p>
    </sec>
    <sec id="sec-4">
      <title>2. Analysis of Publications, the Status of the Issue, and the Statement of the Problem</title>
      <sec id="sec-4-2">
        <title>2.1. Analysis of Research and Publications</title>
        <p>
          The connection between news and the stock
market was established quite a long time ago,
but most often research focuses on the
emotional evaluation of news with the
subsequent forecast of share price dynamics.
For example, researchers from Stanford
University, Kari Lee and Ryan Timmons [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ],
were able to use news analysis to increase the
average percentage of profit from trading per
month from 0.615% to 2.77%, which is more
than a fourfold increase.
        </p>
        <p>
          Another group of researchers, Dev Shah,
Haruna Isah, and Farhana Zulkernine, from
Queen’s University [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] was able to achieve an
accuracy of 70.59% in short-term prediction of
general trends in stock market price dynamics.
They also based their research on the overall
mood of the publications.
        </p>
        <p>
          Similar works use different
algorithms and approaches, but they share a
common problem: they do not handle
unsorted data well. All of these studies note the
need to pre-screen the data for nonfinancial
publications. They also describe processing and
prediction algorithms but not data collection
algorithms, which makes it hard to build large
training data sets quickly and forces researchers
to rely on existing ones [
          <xref ref-type="bibr" rid="ref4 ref5 ref6">4–6</xref>
          ].
        </p>
        <p>Data collection can be extremely
time-consuming and requires many operations.
Web scraping can be used, but it is slow, many
websites have unique page structures, and some
have policies against scraping.
Another possible method is reading a website’s
RSS feed, but not every website provides one.</p>
        <p>
          Processing algorithms also vary greatly,
from simple word counting to artificial
neural networks with complex preprocessing
algorithms [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>2.2. Analysis of the State of the Issue in the Applied Field</title>
        <p>The need to analyze data for financial
operations has existed for many years. Many
companies use existing intelligence gathering
and processing engines such as SemanticForce,
but these can require a lot of user input and are
priced too high for an individual user who is not
a corporate entity.</p>
        <p>Large financial trading companies often use
custom in-house solutions that focus on
specific markets and are completely
inaccessible to the general public.
Everyone interested in trading is looking for a
competitive edge in information processing.
At the same time, existing solutions are either
not efficient enough or too expensive and
complicated for a potential user.</p>
        <p>The analysis found no solution for the
“beginner” segment of trading enthusiasts
that provides adequate service with
a good price-to-quality ratio and a low entry
threshold.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3. Data Gathering Methods</title>
      <sec id="sec-5-1">
        <title>3.1. Web-scraping</title>
        <p>Web scraping is a data collection method in
which a website page is loaded and then
parsed to extract useful information.</p>
        <p>A multitude of tools including whole
solutions with customizable scheduling and
data formatting exist, with a lot of them being
free and open-source.</p>
        <p>Among the advantages of this method, the
following can be mentioned:</p>
        <list list-type="order">
          <list-item><p>Free and open-source software and libraries for web scraping are already available.</p></list-item>
          <list-item><p>The information fully corresponds to what the user would see on the page; nothing is lost.</p></list-item>
          <list-item><p>Web scraping is possible on every site that can be opened in a browser.</p></list-item>
          <list-item><p>It is faster than manually collecting data from websites.</p></list-item>
          <list-item><p>Results are structured, so they are easy to work with.</p></list-item>
        </list>
        <p>But this method also has its disadvantages:</p>
        <list list-type="order">
          <list-item><p>Time is spent waiting for each page to load.</p></list-item>
          <list-item><p>Sites have different layouts, which forces the scraper to be configured for almost every individual site.</p></list-item>
          <list-item><p>Browsers usually consume a lot of memory, so a web scraper cannot process many pages at the same time.</p></list-item>
        </list>
        <p>This method is often used when plenty of
time and processing power are available and
the websites being scraped have a similar
structure. With a slower machine or an
unstable internet connection, however, this
method is slow and ineffective.</p>
      </sec>
      <sec id="sec-5-2">
        <title>3.2. RSS Feed Reading</title>
        <p>Websites use RSS (RDF Site Summary or Really
Simple Syndication) to publish information in
an XML-based format that can be read with an
RSS reader, which is built into or available as an
extension for most major browsers and can be
installed as a standalone application on nearly any platform.</p>
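        <p>The pros of using RSS are as follows:</p>
        <list list-type="order">
          <list-item><p>Very high speed.</p></list-item>
          <list-item><p>No need to load the page completely.</p></list-item>
          <list-item><p>Sites return only the most important content, so there is no need to filter out unnecessary website elements.</p></list-item>
          <list-item><p>No need to adapt the reader to each site.</p></list-item>
        </list>
        <p>The cons are:</p>
        <list list-type="order">
          <list-item><p>Incomplete information: although the feed filters out unnecessary elements, it can also reduce the amount of necessary information.</p></list-item>
          <list-item><p>Some sites simply do not have RSS feeds.</p></list-item>
          <list-item><p>Despite the general similarity of the responses, some sites may still differ in one or two fields.</p></list-item>
        </list>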
        <p>Considering these advantages and
disadvantages, the method of RSS requests was
chosen. In this case, the speed of forming a
massive dataset is more useful than the
guaranteed reading of every site. In addition,
during data collection, it was found that only
about 15% of sites in the Ukrainian news
segment lack RSS, and the share is even lower
in the English-language segment.</p>
        <p>At the moment, the system works with
sites that have an RSS feed. The feed address
does not have to be specified: given a link to
the site itself, the system finds the address of
the feed automatically. If a site has no RSS
feed, the system simply skips it and displays a
message that information could not be
retrieved from that site.</p>
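        <p>The feed lookup and reading described above can be illustrated with a short Python sketch that uses only the standard library. It is not the system’s actual code. It assumes that a site advertises its feed through a link element with rel="alternate" and type="application/rss+xml", which is a common convention but not a guarantee. The names FeedLinkFinder, discover_feed, and parse_rss, as well as the sample page and feed, are hypothetical, and the inputs are inlined so that the sketch runs without network access.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise RSS."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "link"
                and "alternate" in (attrs.get("rel") or "").lower()
                and (attrs.get("type") or "").lower() == "application/rss+xml"
                and attrs.get("href")):
            self.feeds.append(attrs["href"])


def discover_feed(site_url, html):
    """Return the absolute URL of the first advertised RSS feed, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return urljoin(site_url, finder.feeds[0]) if finder.feeds else None


def parse_rss(xml_text):
    """Extract title, link and publication date from every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


# Hypothetical page and feed, inlined so the sketch runs offline.
PAGE = ('<html><head><link rel="alternate" type="application/rss+xml" '
        'href="/rss.xml"></head></html>')
FEED = """<rss version="2.0"><channel><title>Example</title>
<item><title>Rates rise</title><link>https://news.example/a</link>
<pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate></item>
</channel></rss>"""

feed_url = discover_feed("https://news.example/", PAGE)
items = parse_rss(FEED)
# A site without an advertised feed yields None and would be skipped.
no_feed = discover_feed("https://plain.example/", "<html><head></head></html>")
```
]]></preformat>
        <p>In a real run, the page and the feed would be downloaded first, and a site for which discover_feed returns None would be skipped with a message, as described above.</p>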
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Data Preprocessing</title>
      <sec id="sec-6-1">
        <title>4.1. Tokenization</title>
        <p>
          It was decided to use the spaCy library, which
makes it possible to carry out “smart” text
tokenization.
The algorithm can be summarized as follows [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
        </p>
        <list list-type="order">
          <list-item><p>Iterate over space-separated substrings.</p></list-item>
          <list-item><p>Check whether we have an explicitly defined special case for this substring. If we do, use it.</p></list-item>
          <list-item><p>Look for a token match. If there is a match, stop processing and keep this token.</p></list-item>
          <list-item><p>Check whether we have an explicitly defined special case for this substring. If we do, use it.</p></list-item>
          <list-item><p>Otherwise, try to consume one prefix. If we consumed a prefix, go back to step 3, so that the token match and special cases always get priority.</p></list-item>
          <list-item><p>If we didn’t consume a prefix, try to consume a suffix and then go back to step 3.</p></list-item>
          <list-item><p>If we can’t consume a prefix or a suffix, look for a URL match.</p></list-item>
          <list-item><p>If there’s no URL match, then look for a special case.</p></list-item>
          <list-item><p>Look for “infixes”, such as hyphens, and split the substring into tokens on all infixes.</p></list-item>
          <list-item><p>Once we can’t consume any more of the string, handle it as a single token.</p></list-item>
          <list-item><p>Make a final pass over the text to check for special cases that include spaces or that were missed due to the incremental processing of affixes.</p></list-item>
        </list>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Remove Unnecessary Tokens</title>
        <p>At this stage, function words (stop words) are
discarded from the set of tokens, which
reduces the amount of informational noise.</p>
        <p>
          Stop-word lists for English are much
more complete than those for Ukrainian. This
is explained by the “dominant”
status of English in programming [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. However,
the lists of Ukrainian stop words can always be
supplemented manually; this takes additional
time but significantly improves the final
result.
        </p>
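        <p>A minimal Python sketch of this filtering step is given below. It is only an illustration: the stop-word sets are tiny hypothetical samples rather than real registers, the regular-expression split stands in for the spaCy tokenizer, and the name remove_stop_words and its extra parameter (which models manually added Ukrainian stop words) are assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Illustrative stop-word sets; real registers are far longer.
STOP_WORDS = {
    "en": {"the", "a", "an", "of", "to", "in", "and", "is", "on"},
    "uk": {"і", "та", "в", "на", "до", "що", "це", "з"},
}


def remove_stop_words(text, lang, extra=()):
    """Lowercase the text, split it into word tokens and drop stop words.

    `extra` holds user-supplied stop words added on top of the built-in set.
    """
    stop = STOP_WORDS[lang] | set(extra)
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stop]


english = remove_stop_words("The price of oil is rising in Europe", "en")
ukrainian = remove_stop_words("Ціна на нафту зростає в Європі", "uk")
# Supplementing the Ukrainian list by hand removes one more service word.
extended = remove_stop_words("Ціна на нафту також зростає", "uk", extra={"також"})
```
]]></preformat>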
      </sec>
      <sec id="sec-6-3">
        <title>4.3. Lemmatization</title>
        <p>
          Reduction to the base form
(lemmatization) is a powerful mechanism for
unifying several forms of the same word.
Because Ukrainian is a highly inflected
language, this process is necessary for
processing it [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], since depending
on the case, the word can change so much that
the result of stemming can be significantly
different from the result of stemming the same
word in another case.
        </p>
        <p>Lemmatization algorithms rely on
existing, trained language models that
return the base form of a word by
searching their own data sets. As with
stop words, very few models support the
Ukrainian language at a level sufficient for
productive analysis. But, unlike stop-word lists,
such registers are very difficult, and usually
impossible, to fill with lemmas manually.</p>
        <p>Fortunately, the spaCy library used here
supports the Ukrainian language and has a
trained model capable of
lemmatization. But even this trained model
cannot process the names of cities and
leaves them as they are. This is only one
example of the many problems of processing
the Ukrainian language.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Clustering</title>
      <sec id="sec-7-1">
        <title>5.1. Hierarchical Clustering</title>
        <p>The following scheme can serve as an example
of the hierarchical algorithm:</p>
        <p>The hierarchical (agglomerative) clustering
algorithm can be described as follows:</p>
        <list list-type="order">
          <list-item><p>Designate each point as a separate cluster.</p></list-item>
          <list-item><p>Calculate the distance between all clusters.</p></list-item>
          <list-item><p>Combine the two nearest clusters into one.</p></list-item>
          <list-item><p>If more than one cluster remains, return to step 2.</p></list-item>
        </list>
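        <p>The four steps can be sketched in plain Python as follows. This is a deliberately naive illustration, not the system’s implementation: it recomputes all pairwise distances at every merge, uses Euclidean distance as an assumed distance function, and stops at a requested number of clusters instead of building a full dendrogram. The names linkage_distance and agglomerate and the sample points are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import dist


def linkage_distance(r, s, method):
    """Distance between clusters r and s (lists of points) under a linkage rule."""
    pairwise = [dist(a, b) for a in r for b in s]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # furthest pair of points
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(points, n_clusters=1, method="single"):
    """Merge the two nearest clusters until n_clusters remain."""
    clusters = [[p] for p in points]            # step 1: one cluster per point
    while len(clusters) > n_clusters:           # step 4: repeat while needed
        # step 2: find the pair of clusters with the smallest distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage_distance(
                clusters[pair[0]], clusters[pair[1]], method
            ),
        )
        # step 3: merge them (i < j, so removing j keeps index i valid)
        merged = clusters[i] + clusters.pop(j)
        clusters[i] = merged
    return clusters


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, n_clusters=3, method="average")
```
]]></preformat>
        <p>Stopping at n_clusters corresponds to cutting the dendrogram at a chosen level; with n_clusters=1 the loop runs until everything is merged.</p>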
        <p>The distance between clusters can be
calculated in several ways.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Single linkage</title>
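        <p>
          <disp-formula><label>(1)</label><tex-math><![CDATA[L(r,s) = \min_{i,j} D(x_{ri}, x_{sj})]]></tex-math></disp-formula>
where r and s are clusters, D() is the distance
function, and x<sub>ri</sub> and x<sub>sj</sub> are the closest points of
the respective clusters.</p>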
      </sec>
      <sec id="sec-7-3">
        <title>Complete linkage</title>
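        <p>
          <disp-formula><label>(2)</label><tex-math><![CDATA[L(r,s) = \max_{i,j} D(x_{ri}, x_{sj})]]></tex-math></disp-formula>
where r and s are clusters, D() is the distance
function, and x<sub>ri</sub> and x<sub>sj</sub> are the furthest points
of the respective clusters.</p>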
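        <p>Average linkage:
          <disp-formula><label>(3)</label><tex-math><![CDATA[L(r,s) = \frac{1}{n_r n_s} \sum_{i=1}^{n_r} \sum_{j=1}^{n_s} D(x_{ri}, x_{sj})]]></tex-math></disp-formula>
where r and s are clusters, D() is the distance
function, n<sub>r</sub> and n<sub>s</sub> are the numbers of entries in
the corresponding clusters, and x<sub>ri</sub> and x<sub>sj</sub> are
points in the respective clusters.</p>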
        <p>This type of algorithm immediately
provides a good visualization of the result,
which helps to picture the connections
between pieces of information. It also does not
require the number of clusters to be known in
advance, which allows the user to
“trim” the dendrogram at any
point. On the other hand, this is exactly what
makes it a poor fit for this work: it
reduces the degree of automation of the
system and forces the user to decide
whether trends are present,
which introduces subjectivity into
the process.</p>
      </sec>
      <sec id="sec-7-4">
        <title>5.2. Partitional Clustering</title>
        <p>Among the partitional algorithms, k-means
and k-medoids were considered.
The algorithms are very similar, but medoids,
unlike centroids, are real points, that is, actual
entries of the cluster. However, the use of
medoids has one undesirable side effect,
namely low sensitivity to abnormal points. The
processed data is such that
one cluster often has fewer than ten entries, while
the total number of entries is in the thousands,
which makes such insensitivity quite critical
during processing. Therefore, the k-means
algorithm was chosen, which is one of the
oldest, simplest, and most widely used
partitional clustering algorithms.</p>
        <p>The algorithm has the following steps:</p>
        <list list-type="order">
          <list-item><p>Choose the number of clusters K.</p></list-item>
          <list-item><p>Choose K initial centroids from random points.</p></list-item>
          <list-item><p>Assign each point to the nearest centroid, forming clusters.</p></list-item>
          <list-item><p>Calculate the dispersion and update the centroid of each cluster.</p></list-item>
          <list-item><p>Repeat steps 3 and 4 until the clusters become stable, the internal variance is minimal, and the variance between clusters is maximal.</p></list-item>
        </list>
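        <p>A compact plain-Python sketch of these steps is shown below. It is an illustration under simplifying assumptions rather than the system’s code: it uses Euclidean distance, stops when the assignments no longer change (or after max_iter passes), and keeps the old centroid if a cluster becomes empty. The function name k_means, the max_iter and seed parameters, and the sample points are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random
from math import dist


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 2: random initial centroids
    labels = None
    for _ in range(max_iter):
        # step 3: assign every point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:               # step 5: assignments are stable
            break
        labels = new_labels
        # step 4: move each centroid to the mean of its cluster
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # an empty cluster keeps its centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids, labels = k_means(points, k=2)
```
]]></preformat>
        <p>On this toy data the two groups are far apart, so the same partition is reached whichever points are drawn as initial centroids; on real data the result depends on the initialization.</p>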
      </sec>
      <sec id="sec-7-5">
        <title>5.3. Determine the Number of Clusters</title>
        <p>The k-means algorithm requires the number of
clusters to be specified in advance. At the same
time, the program is meant to identify trends in
the news, that is, it must detect the emergence of
new trends on its own. In addition, it is almost
impossible to determine the number of clusters
manually on very large data sets. For these
reasons, a mechanism for determining the
number of clusters had to be developed and
implemented.</p>
        <p>Several algorithms were tested on data sets
with 5, 15, and 25 actual clusters. Their estimates
are compared in the following table.</p>
        <table-wrap>
          <caption><p>Estimated number of clusters by method</p></caption>
          <table>
            <thead>
              <tr><th>Method</th><th>Estimated n of clusters (actual n = 5)</th><th>Estimated n of clusters (actual n = 15)</th><th>Estimated n of clusters (actual n = 25)</th></tr>
            </thead>
            <tbody>
              <tr><td>Elbow</td><td>5</td><td>10</td><td>16</td></tr>
              <tr><td>Davies-Bouldin</td><td>5</td><td>15</td><td>23</td></tr>
              <tr><td>Silhouette</td><td>5</td><td>14</td><td>25</td></tr>
              <tr><td>Calinski-Harabasz</td><td>5</td><td>15</td><td>25</td></tr>
              <tr><td>BIC</td><td>5</td><td>15</td><td>25</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The best results were shown by the BIC and the
Calinski-Harabasz index. However, the BIC has one
critical flaw: it does not handle data sets well when
the number of clusters is not significantly smaller
than the number of entries. For this reason,
the solution uses the Calinski-Harabasz
index to determine the number of clusters.</p>
        <p>The Calinski-Harabasz index is calculated as
follows:</p>
        <p>1. Calculate the intercluster (between-cluster)
variance:
          <disp-formula><label>(4)</label><tex-math><![CDATA[BGSS = \sum_{k=1}^{K} n_k \left\| C_k - C \right\|^2]]></tex-math></disp-formula>
where n<sub>k</sub> is the number of entries in cluster k,
C<sub>k</sub> is the centroid of cluster k, C is the centroid
of the entire dataset, and K is the number of
clusters.</p>
        <p>2. Calculate the intracluster (within-cluster)
variance for each cluster:
          <disp-formula><label>(5)</label><tex-math><![CDATA[WGSS_k = \sum_{i=1}^{n_k} \left\| X_{ik} - C_k \right\|^2]]></tex-math></disp-formula>
where n<sub>k</sub> is the number of entries in cluster k,
C<sub>k</sub> is the centroid of cluster k, and X<sub>ik</sub> is the ith
entry of cluster k.</p>
        <p>3. Add up all intracluster variances:
          <disp-formula><label>(6)</label><tex-math><![CDATA[WGSS = \sum_{k=1}^{K} WGSS_k]]></tex-math></disp-formula>
where K is the number of clusters and WGSS<sub>k</sub> is
the measure of intracluster variance for cluster k.</p>
        <p>4. Calculate the index itself:
          <disp-formula><label>(7)</label><tex-math><![CDATA[CH = \frac{BGSS / (K - 1)}{WGSS / (N - K)}]]></tex-math></disp-formula>
where K is the number of clusters, N is the total
number of entries, BGSS is the measure of
between-cluster variance, and WGSS is the sum
of the measures of within-cluster variance. The
goal is to increase the intercluster variance
while reducing the intracluster variance, so
the larger the index, the better.</p>
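        <p>The calculation can be illustrated with a short plain-Python sketch. It follows the standard definition of the index, the between-cluster variance divided by K − 1 over the summed within-cluster variance divided by N − K, and then picks the candidate number of clusters with the highest score. The function names, the sample points, and the two candidate labelings (which stand in for k-means output at K = 2 and K = 3) are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(x) / len(points) for x in zip(*points))


def calinski_harabasz(points, labels):
    """CH = (BGSS / (K - 1)) / (WGSS / (N - K)) for a labelled dataset."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if not 1 < k < n:
        raise ValueError("the index is defined only for 1 < K < N")
    c = centroid(points)
    # between-cluster variance: size-weighted squared distance of each
    # cluster centroid from the centroid of the whole dataset
    bgss = sum(len(m) * dist(centroid(m), c) ** 2 for m in clusters.values())
    # within-cluster variance: squared distance of each entry from its centroid
    # (a value of zero, e.g. for duplicate points only, would divide by zero below)
    wgss = sum(dist(x, centroid(m)) ** 2 for m in clusters.values() for x in m)
    return (bgss / (k - 1)) / (wgss / (n - k))


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
# Hypothetical labelings standing in for k-means results at K = 2 and K = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}
scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```
]]></preformat>
        <p>In an automated setting, the labelings would come from running the clustering once per candidate K, and the K with the largest index would be kept.</p>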
      </sec>
    </sec>
    <sec id="sec-8">
      <title>6. Finding the Trends</title>
      <p>
        In today’s world, about 80% of news loses its
relevance after 12 hours. Another 10%
maintain the level of relevance, and another
10% increase it [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Thus, it is the last group
that can be called “trends”.
      </p>
      <p>In this paper, a trend is considered to be
news that is mentioned on at least 2 different
days by more than 30% of sources on each of
those days. This filters out news that does not
hold the attention of readers, as well as news
that was published by several media outlets
and picked up late by others
but did not receive further
development.</p>
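        <p>This rule can be written down as a short Python sketch. It assumes that each publication has already been assigned to a topic (for example, a cluster) and that the 30% threshold is taken relative to the total number of monitored sources; both points are interpretations rather than details stated in the text. The function find_trends, its parameters, and the sample data are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def find_trends(mentions, n_sources, min_days=2, min_share=0.3):
    """Return topics that qualify as trends.

    mentions: iterable of (topic, day, source) tuples, one per publication.
    A topic is a trend if, on at least `min_days` distinct days, it was
    covered by more than `min_share` of the `n_sources` monitored sources.
    """
    sources_per_day = defaultdict(set)
    for topic, day, source in mentions:
        sources_per_day[(topic, day)].add(source)

    strong_days = defaultdict(int)
    for (topic, _day), sources in sources_per_day.items():
        if len(sources) / n_sources > min_share:
            strong_days[topic] += 1

    return sorted(t for t, days in strong_days.items() if days >= min_days)


mentions = [
    # "rates": 2 of 5 sources on day one, 3 of 5 on day two
    ("rates", "2024-01-01", "A"), ("rates", "2024-01-01", "B"),
    ("rates", "2024-01-02", "A"), ("rates", "2024-01-02", "C"),
    ("rates", "2024-01-02", "D"),
    # "match": widely covered, but on a single day only
    ("match", "2024-01-01", "A"), ("match", "2024-01-01", "B"),
    ("match", "2024-01-01", "C"), ("match", "2024-01-01", "D"),
    # "strike": two days, but only 1 of 5 sources each day
    ("strike", "2024-01-01", "E"), ("strike", "2024-01-02", "E"),
]
trends = find_trends(mentions, n_sources=5)
```
]]></preformat>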
      <p>With a list of trends, the user can
draw conclusions that help
them make financial decisions.</p>
    </sec>
    <sec id="sec-9">
      <title>7. Comparison with Analogues</title>
      <p>
        The proposed solution gives users an
easy way to quickly and automatically collect
large data samples, determine the required
number of groups, and divide the dataset into
the respective categories. Because it uses
low-complexity algorithms, the required
running time is minimal and computer
resource usage is also quite low [
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16 ref17 ref18 ref19 ref20">13–30</xref>
        ].
      </p>
      <p>While the system is not perfect and does not
guarantee 100% precision, its ratio of
effectiveness to time spent is satisfactory for a
regular trading enthusiast who wants to improve
their chances on the financial market.</p>
    </sec>
    <sec id="sec-10">
      <title>8. Conclusions</title>
      <p>This paper considered algorithms that are
already widely used and proven to be effective.</p>
      <p>The principles of the algorithms’ operation
were explained so that users can supply a custom
cluster distance calculation and
choose either lemmatization or stemming.</p>
      <p>The proposed system can also be run
without an internet connection by feeding it
pre-gathered datasets, which makes it possible to
use the solution in a fully autonomous mode.</p>
      <p>As trading is gaining popularity, a
decision was made to create an easy-to-use
and cheap-to-operate system that does not
require high-end equipment or training.
Thanks to this, more people will be able to
improve their performance in that field.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <article-title>Impact of News on the Trend of Stock Price Change: An Analysis based on the Deep Bidirectional LSTM Model</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>174</volume>
          (
          <year>2020</year>
          )
          <fpage>128</fpage>
          -
          <lpage>140</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.procs.2020.06.068</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Timmons</surname>
          </string-name>
          ,
          <article-title>Predicting the Stock Market with News Articles, CS224N Final Report (</article-title>
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Isah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zulkernine</surname>
          </string-name>
          ,
          <article-title>Predicting the Effects of News Sentiments on the Stock Market</article-title>
          ,
          <source>IEEE International Conference on Big Data</source>
          (
          <year>2018</year>
          )
          <fpage>4705</fpage>
          -
          <lpage>4708</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1109/BigData.2018.8621884</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bebeshko</surname>
          </string-name>
          , et al.,
          <source>Application of Game Theory, Fuzzy Logic and Neural Networks for Assessing Risks and Forecasting Rates of Digital Currency, J. Theor. Appl. Inf. Technol</source>
          .
          <volume>100</volume>
          (
          <issue>24</issue>
          ) (
          <year>2022</year>
          )
          <fpage>7390</fpage>
          -
          <lpage>7404</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Khorolska</surname>
          </string-name>
          , et al.,
          <article-title>Application of a Convolutional Neural Network with a Module of Elementary Graphic Primitive Classifiers in the Problems of Recognition of Drawing Documentation and Transformation of 2D to 3D Models</article-title>
          ,
          <source>J. Theor. Appl. Inf. Technol.</source>
          <volume>100</volume>
          (
          <issue>24</issue>
          ) (
          <year>2022</year>
          )
          <fpage>7426</fpage>
          -
          <lpage>7437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Obushnyi</surname>
          </string-name>
          , et al.,
          <article-title>Ensuring Data Security in the Peer-to-Peer Economic System of the DAO, in: Cybersecurity Providing in Information and Telecommunication Systems II</article-title>
          , vol.
          <volume>3187</volume>
          (
          <year>2021</year>
          )
          <fpage>284</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Obushnyi</surname>
          </string-name>
          , et al.,
          <article-title>Autonomy of Economic Agents in Peer-to-Peer Systems</article-title>
          ,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3288</volume>
          (
          <year>2022</year>
          )
          <fpage>125</fpage>
          -
          <lpage>133</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Virovets</surname>
          </string-name>
          , et al., Ways of Interaction of Autonomous Economic Agents in Decentralized Autonomous Organizations,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3421</volume>
          (
          <year>2023</year>
          )
          <fpage>182</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <source>Linguistic Features: Tokenization</source>
          . URL: https://spacy.io/usage/linguistic-features#tokenization
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>U.</given-names>
            <surname>Abdurakhimovich</surname>
          </string-name>
          ,
          <article-title>The Power of English for Programming. Why is English Important to Software Developers?</article-title>
          ,
          <source>Models Methods Increasing Effic. Innov. Res</source>
          .
          <volume>3</volume>
          (
          <issue>26</issue>
          ) (
          <year>2023</year>
          )
          <fpage>145</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Korenius</surname>
          </string-name>
          , et al.,
          <article-title>Stemming and Lemmatization in the Clustering of Finnish Text Documents</article-title>
          ,
          <source>Thirteenth ACM International Conference on Information and Knowledge Management (CIKM '04)</source>
          (
          <year>2004</year>
          )
          <fpage>625</fpage>
          -
          <lpage>633</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1145/1031171.1031285</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Castillo</surname>
          </string-name>
          , et al.,
          <article-title>Characterizing the Life Cycle of Online News Stories Using Social Media Reactions</article-title>
          ,
          <source>ACM Conference on Computer Supported Cooperative Work, CSCW</source>
          (
          <year>2013</year>
          ). doi: 10.1145/2531602.2531623.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ivanytska</surname>
          </string-name>
          , et al.,
          <article-title>The Advertising Prediction Model Based on Machine Learning Technologies</article-title>
          , in:
          <source>Information Technology and Implementation</source>
          , Vol.
          <volume>3179</volume>
          (
          <year>2021</year>
          )
          <fpage>35</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>W.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>A New Graphic Kernel Method of Stock Price Trend Prediction Based on Financial News Semantic and Structural Similarity</article-title>
          ,
          <source>Expert Syst. Appl</source>
          .
          <volume>118</volume>
          (
          <year>2019</year>
          )
          <fpage>411</fpage>
          -
          <lpage>424</lpage>
          . doi: 10.1016/j.eswa.2018.10.008.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Nam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Seong</surname>
          </string-name>
          ,
          <article-title>Financial News-Based Stock Movement Prediction Using Causality Analysis of Influence in the Korean Stock Market</article-title>
          ,
          <source>Decis. Support Syst</source>
          .
          <volume>117</volume>
          (
          <year>2019</year>
          )
          <fpage>100</fpage>
          -
          <lpage>112</lpage>
          . doi: 10.1016/j.dss.2018.11.004.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Md. E.</given-names>
            <surname>Karim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <article-title>A Deep Learning-Based Approach for Stock Price Prediction Using Bidirectional Gated Recurrent Unit and Bidirectional Long Short Term Memory Model</article-title>
          ,
          <source>2nd Global Conference for Advancement in Technology (GCAT)</source>
          (
          <year>2021</year>
          ). doi: 10.1109/GCAT52182.2021.9587895.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elkaffas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fakhr</surname>
          </string-name>
          ,
          <article-title>Stock Market Prediction Using Deep Reinforcement Learning</article-title>
          ,
          <source>Appl. Syst. Innov</source>
          .
          <volume>6</volume>
          (
          <issue>6</issue>
          ) (
          <year>2023</year>
          )
          <fpage>106</fpage>
          . doi: 10.3390/asi6060106.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Karzanov</surname>
          </string-name>
          ,
          <article-title>Headline-Driven Classification and Local Interpretation for Market Outperformance and Low-Risk Stock Prediction</article-title>
          ,
          <source>Computational Econom</source>
          . (
          <year>2023</year>
          ). doi: 10.1007/s10614-023-10449-5.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sueno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gerardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Medina</surname>
          </string-name>
          ,
          <article-title>Multi-class Document Classification Using Support Vector Machine (SVM) Based on Improved Naïve Bayes Vectorization Technique</article-title>
          ,
          <source>Int. J. Adv. Trends Comput. Sci. Eng</source>
          .
          <volume>9</volume>
          (
          <issue>3</issue>
          ) (
          <year>2020</year>
          )
          <fpage>3937</fpage>
          -
          <lpage>3944</lpage>
          . doi: 10.30534/ijatcse/2020/216932020.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sayoc</surname>
          </string-name>
          , et al.,
          <article-title>Nature Inspired Dimensional Reduction Technique for Fast and Invariant Visual Feature Extraction</article-title>
          ,
          <source>Int. J. Adv. Trends Comput. Sci. Eng</source>
          .
          <volume>8</volume>
          (
          <issue>3</issue>
          ) (
          <year>2019</year>
          )
          <fpage>195</fpage>
          -
          <lpage>200</lpage>
          . doi: 10.30534/ijatcse/2019/57832019.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ma'Ruf</surname>
          </string-name>
          ,
          <string-name>
            <surname>Adiwijaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Wisesty</surname>
          </string-name>
          ,
          <article-title>Analysis of the Influence of Minimum Redundancy Maximum Relevance as Dimensionality Reduction Method on Cancer Classification Based on Microarray Data Using Support Vector Machine Classifier</article-title>
          ,
          <source>J. Phys. Conf. Ser</source>
          .
          <volume>1192</volume>
          (
          <issue>1</issue>
          ) (
          <year>2019</year>
          ). doi: 10.1088/1742-6596/1192/1/012011.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alnemari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bagherzadeh</surname>
          </string-name>
          ,
          <article-title>A Two-Stage Efficient 3-D CNN Framework for EEG Based Emotion Recognition</article-title>
          ,
          <source>IEEE International Conference on Industrial Technology (ICIT)</source>
          , IEEE (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Transformer-Based Attention Network for Stock Movement Prediction</article-title>
          ,
          <source>Expert Syst. Appl</source>
          .
          <volume>202</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K.</given-names>
            <surname>Althelaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-S.</given-names>
            <surname>El-Alfy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          ,
          <article-title>Evaluation of Bidirectional LSTM for Short- and Long-Term Stock Market Prediction</article-title>
          ,
          <source>9th International Conference on Information and Communication Systems (ICICS)</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Shahi</surname>
          </string-name>
          , et al.,
          <article-title>Stock Price Forecasting with Deep Learning: A Comparative Study</article-title>
          ,
          <source>Math</source>
          .
          <volume>8</volume>
          (
          <issue>9</issue>
          ) (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <source>EOD Historical Data</source>
          . URL: https://eodhistoricaldata.com/
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <source>News API</source>
          . URL: https://newsapi.org/
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>FinBERT: A Pre-Trained Financial Language Representation Model for Financial Text Mining</article-title>
          ,
          <source>Twenty-Ninth International Joint Conference on Artificial Intelligence</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <article-title>News Headlines Dataset for Sarcasm Detection</article-title>
          (
          <year>2018</year>
          ). doi: 10.13140/RG.2.2.16182.40004.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>F.</given-names>
            <surname>Xing</surname>
          </string-name>
          , et al.,
          <article-title>Financial Sentiment Analysis: An Investigation into Common Mistakes and Silver Bullets</article-title>
          ,
          <source>28th International Conference on Computational Linguistics</source>
          (
          <year>2020</year>
          )
          <fpage>978</fpage>
          -
          <lpage>987</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>