<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Accuracy of the Uzbek Stop Words Detection: a Case Study on “School Corpus”</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khabibulla Madatov</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shukurla Bekchanov</string-name>
          <email>shukurla15@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jernej Vičič</string-name>
          <email>jernej.vicic@upr.si</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Centre of the Slovenian Academy of Sciences and Arts, The Fran Ramovš Institute</institution>
          ,
          <addr-line>Novi trg 2, 1000 Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Primorska, FAMNIT</institution>
          ,
          <addr-line>Glagoljaška 8, 6000 Koper</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Urgench State University</institution>
          ,
          <addr-line>14, Kh. Alimdjan str, Urgench city, 220100</addr-line>
          ,
          <country country="UZ">Uzbekistan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Stop words are very important for information retrieval and text analysis tasks in natural language processing. The current work presents a method to evaluate the quality of a list of stop words produced by automatic creation techniques. Although the method proposed in this paper was tested on an automatically generated list of stop words for the Uzbek language, it can be applied, with some modifications, to similar languages, either from the same family or ones that have an agglutinative nature. Since the Uzbek language belongs to the family of agglutinative languages, the automatic detection of stop words in it is a more complex process than in inflected languages. Moreover, we integrated our previous work on stop word detection on the example of the “School corpus” by investigating how to automatically analyse the detection of stop words in Uzbek texts. This work is devoted to answering whether there is a good way of evaluating available stop words for Uzbek texts, or whether it is possible to determine what part of an Uzbek sentence contains the majority of the stop words by studying the numerical characteristics of the probability of unique words. The results show acceptable accuracy of the stop word lists.</p>
      </abstract>
      <kwd-group>
        <kwd>stop word detection</kwd>
        <kwd>Uzbek language</kwd>
        <kwd>accuracy</kwd>
        <kwd>agglutinative language</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Natural Language Processing (NLP) tasks are applied in real-life scenarios more frequently than ever before, and a large body of research explores different approaches to improving their quality. Many NLP tasks, such as information retrieval, text summarization, and context embedding, rely on removing unimportant tokens and words from the context under focus. Such words are known as stop words. It is therefore desirable to develop an automatic method that identifies the words whose removal changes the meaning of the context little or not at all, and removes them from the context.</p>
      <p>
        In this work, we address the problem of automatic detection of stop words for the low-resource agglutinative Uzbek language and evaluate the proposed methods. The existing literature that deals with the stop word removal task for the Uzbek language [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [10] focuses on the creation process, the importance, and the availability of the proposed data, leaving a gap for further investigation, which we discuss in this paper.
      </p>
      <p>The term "stop words" is popular in the field of natural language processing. The definition used in this work is as follows: words are called stop words if removing them from a text does not change the meaning of the context and leaves the minimum number of words that can still hold that meaning.</p>
      <p>For instance, the following examples show which words would be considered stop words in the given sentences, and what the final context becomes after removing them:
● “Men bu maqolani qiynalib yozdim” (I wrote this article with difficulty). After removing the stop words (“men”, “bu”, “qiynalib”), the context becomes: “Maqolani yozdim” (I wrote the article);
● “Har bir inson baxtli bo’lishga haqlidir” (Every person has the right to be happy). After removing the stop words (“har”, “bir”), the context becomes: “Inson baxtli bo’lishga haqlidir” (A person has the right to be happy).</p>
      <p>This definition extends the traditional definition of stop words: it covers more words than is usually expected, while still including the traditional stop words.</p>
      <p>The Term Frequency - Inverse Document Frequency (TF-IDF) method [15] was used to detect stop words in Uzbek texts. TF-IDF is a numerical statistic intended to reflect how important a word is to a document in a corpus. The method treats the words with the lowest TF-IDF values as less important to the semantic meaning of the document and proposes them as stop word candidates.</p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we discussed the methods and algorithms for automatic detection and extraction of Uzbek stop words from previously collected text forming a new corpus called the “School corpus”. The stop word detection method based on TF-IDF was applied to this corpus, which was collected from 25 textbooks used for teaching at primary schools of Uzbekistan and consists of 731,156 words, of which 47,165 are unique. For each word from the set of unique words, we determined its frequency (the number of occurrences in the texts of the School corpus) and its inverse document frequency IDF(word) = ln(n/m), where n = 25 is the number of documents and m is the number of those documents that contain the word.
      </p>
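      <p>The computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy documents, and the choice to multiply the raw term count by IDF(word) = ln(n/m) are assumptions made for the example.</p>
      <preformat>
```python
import math
from collections import Counter


def tf_idf_per_document(documents):
    """Return one dict per document mapping each word to its TF-IDF score.

    documents: list of token lists (one list per document).
    TF is the raw count of the word in the document and
    IDF(word) = ln(n / m), with n documents in total and m documents
    that contain the word.
    """
    n = len(documents)
    # m: number of documents in which each word occurs at least once
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        tf = Counter(tokens)
        scores.append({w: tf[w] * math.log(n / doc_freq[w]) for w in tf})
    return scores


# Toy documents (hypothetical, not taken from the School corpus)
docs = [
    ["men", "bu", "maqolani", "yozdim"],
    ["men", "bu", "kitobni", "oqidim"],
    ["men", "maktabga", "bordim"],
]
scores = tf_idf_per_document(docs)
print(scores[0])
```
      </preformat>
      <p>In this toy example the word “men” occurs in every document, so its IDF is ln(3/3) = 0 and its TF-IDF is zero, which places it at the bottom of the ranking as a stop word candidate.</p>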
      <p>
        The existing fundamental papers that deal with stop words in general, let alone for the Uzbek language, barely address the quality of automatically detected lists of stop words. This also applies to our previous work, where only a preliminary manual expert observation of a part of the lists (unigrams only) was done. To the authors' knowledge, there has been no in-depth observation of the accuracy of automatically constructed lists of stop words for agglutinative languages. For instance, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][10] mostly focus on stop words in Uzbek texts and on methods for their automatic extraction, but none of them discusses the accuracy of the presented methods. This article is devoted to answering whether there is a good way of evaluating available stop words for Uzbek texts, or whether it is possible to determine what part of an Uzbek sentence contains the majority of the stop words by studying the numerical characteristics of the probability of unique words.
      </p>
      <p>
        The words were sorted by TF-IDF value in descending order, and the lowest 5 percent of them were tagged as stop words. We used this method to automatically detect stop words in the corpus [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Using this information, the article focuses on the following:
● To create a probability distribution model of the TF-IDF of unique words in order to determine the position of stop words along the corpus;
● To establish the accuracy of the detection method for stop words;
● To conclude on automatic position detection of stop words for the given text.
      </p>
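      <p>The selection rule above (sort by TF-IDF in descending order, tag the lowest 5 percent) can be sketched as follows. The function name and the toy weights are hypothetical, and rounding the candidate count to the nearest integer is an assumption of this sketch rather than a rule stated in the text.</p>
      <preformat>
```python
def lowest_fraction(weights, fraction=0.05):
    """Return the words whose weight falls in the lowest `fraction` of the ranking.

    weights: dict mapping each unique word to its TF-IDF weight.
    Words are sorted by weight in descending order and the tail is returned.
    """
    ranked = sorted(weights, key=weights.get, reverse=True)
    k = round(len(ranked) * fraction)  # number of candidates (assumed rounding)
    return ranked[len(ranked) - k:]


# Toy weights (hypothetical): 40 words with weights 1.0 .. 40.0
weights = {f"w{i}": float(i) for i in range(1, 41)}
candidates = lowest_fraction(weights)
print(candidates)
```
      </preformat>
      <p>With 40 words the sketch returns the two lowest-weighted words. Under the same rounding assumption, a list of 12837 unique words yields 642 candidates.</p>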
      <p>The rest of the paper is structured as follows. Section 2 reviews related work in the field of stop word removal and on the Uzbek language itself. Section 3 presents the main methodology of the paper, which includes the creation of the probability distribution law of TF-IDF of unique words (Section 3.1), the numerical characteristics of the probability of unique words (Section 3.2), and the evaluation of the created method using a small selected chunk (Section 3.3). The accuracy of the TF-IDF-based method for automatic detection of stop words in Uzbek texts is presented in Section 4. Section 5 presents conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>
        The Uzbek language belongs to the family of Turkic languages. Research on the Uzbek language has appeared mostly in the last few years. Most of the research done on Turkic languages can be applied to the Uzbek language as well, using cross-lingual learning and mapping approaches alongside some language-specific additions. The paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] presents a viability study of established techniques to align monolingual embedding spaces for Turkish, Uzbek, Azeri, Kazakh, and Kyrgyz, members of the Turkic family, which is heavily affected by the low-resource constraint.
Several authors present experiments and propose techniques for stopword extraction from text for agglutinative languages. For example, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] treats stopword detection as a binary classification problem; its evaluation shows that classification methods improve stopword detection with respect to frequency-based methods for agglutinative languages but fail for English. Ladani and Desai [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] present an overview of stopword removal techniques for Indian and non-Indian languages. Jayaweera et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose a dynamic approach to finding Sinhala stopwords, in which the cutoff point depends on the dataset. Wijeratne and de Silva [17] collected data from patent documents and listed the stopwords using term frequency. Rakholia et al. [14] proposed a rule-based approach to detect stopwords for the Gujarati language dynamically. They developed 11 static rules and used them to generate a stopword list at runtime. Fayaza et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] present a list of stopwords for the Tamil language and report an improvement in text clustering after stopword removal.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] provides the first annotated corpus for polarity classification for the Uzbek language. Three lists of stop words for the Uzbek language are presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] that were constructed using automatic detection of stop words by applying algorithms
and methods presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Paper [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] focuses on the automatic discovery of stop words in the
Uzbek language and its importance. Articles [12] and [13] are also mainly concentrated on the
creation of stop words in Uzbek.
      </p>
      <p>Matlatipov et al. [10] propose the first electronic dictionary of Uzbek word-ending invariants for morphological segmentation, a pre-processing step useful for neural machine translation.</p>
      <p>The article [11] presents an algorithm for the cosine similarity of Uzbek texts, based on TF-IDF. Another work on similarity in Uzbek, this time on the semantic similarity of words, created and evaluated a semantic evaluation dataset that provides both similarity and relatedness scores [16].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The scientific novelty of the methodology used in this work can be summarised as follows:
● The creation of a probability distribution law based on TF-IDF scores of unique words;
● A thorough investigation of the numerical characteristics of the probability of unique words;
● A better evaluation of the accuracy of the stop word detection method;
● A summary of the automatic detection of the position of stop words in given Uzbek texts.</p>
      <p>
        In our previous work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we proposed the usage of TF-IDF [15] to automatically extract stop words from a corpus of documents. The stop words are discovered based on Term Frequency - Inverse Document Frequency (TF-IDF). Term Frequency (TF) is the number of times a word occurs in a text. Inverse Document Frequency (IDF) is computed from the number of texts (documents) under consideration and the number of those texts that contain the given word. TF-IDF is one of the popular methods of knowledge discovery.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Probability distribution</title>
      <p>In order to determine the position of the stop words throughout the school corpus, we investigate the
probability distribution law of TF-IDF scores of stop words.</p>
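      <p>
        Word weight and its probability. Select a word <inline-formula><tex-math>w_i</tex-math></inline-formula> from the set of unique words extracted from a corpus. From here on, a word means a unique word from the corpus, and the corpus means the “School corpus” presented in previous work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For every <inline-formula><tex-math>w_i</tex-math></inline-formula>, calculate the average TF-IDF(<inline-formula><tex-math>w_i</tex-math></inline-formula>), called the weight of <inline-formula><tex-math>w_i</tex-math></inline-formula> and denoted as <inline-formula><tex-math>W_i</tex-math></inline-formula>. Note that <inline-formula><tex-math>W_i</tex-math></inline-formula> is not the probability of the word <inline-formula><tex-math>w_i</tex-math></inline-formula>.
      </p>
      <p>The probability of <inline-formula><tex-math>w_i</tex-math></inline-formula> can be calculated using the following formula:
<disp-formula><tex-math>p_i = \frac{W_i}{\sum_{j} W_j}</tex-math></disp-formula>
We assign <inline-formula><tex-math>p_i</tex-math></inline-formula> to each word. Now <inline-formula><tex-math>\sum_{i} p_i = 1</tex-math></inline-formula>.</p>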
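      <p>The step from weights to probabilities amounts to dividing each weight by the sum of all weights. A minimal sketch, with a hypothetical function name and toy weights that are not taken from the corpus:</p>
      <preformat>
```python
def to_probabilities(weights):
    """Normalise non-negative word weights so that they sum to one."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("all weights are zero, cannot normalise")
    return {word: w / total for word, w in weights.items()}


# Toy weights (hypothetical average TF-IDF values)
weights = {"maqolani": 3.0, "yozdim": 1.0, "bu": 0.0}
probs = to_probabilities(weights)
print(probs)
print(sum(probs.values()))
```
      </preformat>
      <p>After normalisation the values can be read as a discrete probability distribution over the unique words, which is what the following subsections rely on.</p>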
      <p>The probability density function. Suppose unique words are distributed independently in the total corpus. In that case, a word <inline-formula><tex-math>w_i</tex-math></inline-formula> can appear multiple times. To avoid repeating the word, we consider only its first appearance. For each word <inline-formula><tex-math>w_i</tex-math></inline-formula>, observe its index i as a random variable. As the probability density function of the unique words, we get the following function:
<disp-formula><tex-math>f(i) = p_i</tex-math></disp-formula>
f(i) can be considered as the probability density function of the word <inline-formula><tex-math>w_i</tex-math></inline-formula>.</p>
      <p>In the Cartesian coordinate plane, plot i on the OX axis and f(i) along the OY axis. Figure 1 presents the described observations extracted from the “School corpus”. We need it to observe the position of stop words along the corpus.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. Numerical characteristics of the probability</title>
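      <p>This section presents the numerical characteristics of the probability of unique words. Writing <inline-formula><tex-math>p_i = f(i)</tex-math></inline-formula> for the probability of the unique word at position i, they are calculated by the following formulas:</p>
      <p><disp-formula><tex-math>E = \sum_{i} i \, p_i</tex-math></disp-formula> the mathematical expectation of the unique words;</p>
      <p><disp-formula><tex-math>D = \sum_{i} (i - E)^2 \, p_i</tex-math></disp-formula> the dispersion of the unique words;</p>
      <p><disp-formula><tex-math>\sigma = \sqrt{D}</tex-math></disp-formula> the standard deviation of the unique words;</p>
      <p><disp-formula><tex-math>\mu_3 = \sum_{i} (i - E)^3 \, p_i</tex-math></disp-formula> the third central moment of the unique words;</p>
      <p><disp-formula><tex-math>A = \frac{\mu_3}{\sigma^3}</tex-math></disp-formula> the asymmetry of the theoretical distribution.</p>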
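      <p>The characteristics listed above can be computed directly from the probabilities. The sketch below assumes the standard definitions of expectation, dispersion, standard deviation, third central moment, and asymmetry (third central moment divided by the cubed standard deviation); the function name, the dictionary keys, and the toy distribution are hypothetical.</p>
      <preformat>
```python
import math


def distribution_summary(probs):
    """Numerical characteristics of a discrete distribution over positions.

    probs: list where probs[k] is the probability of the word at
    position i = k + 1 (positions are 1-based).
    """
    positions = range(1, len(probs) + 1)
    mean = sum(i * p for i, p in zip(positions, probs))
    variance = sum((i - mean) ** 2 * p for i, p in zip(positions, probs))
    std = math.sqrt(variance)
    third = sum((i - mean) ** 3 * p for i, p in zip(positions, probs))
    # Asymmetry is undefined for a degenerate distribution; report 0.0 then
    skewness = third / std ** 3 if std != 0 else 0.0
    return {"E": mean, "D": variance, "sigma": std, "mu3": third, "A": skewness}


# Toy distribution (hypothetical): more probability mass at early positions
summary = distribution_summary([0.5, 0.3, 0.2])
print(summary)
```
      </preformat>
      <p>For this toy distribution the asymmetry is positive, while a symmetric distribution such as [0.25, 0.5, 0.25] gives an asymmetry of zero.</p>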
      <sec id="sec-5-1">
        <title>The described values extracted from the corpus are presented in Table 1.</title>
        <p>The variety of words increases gradually with grades in the school literature, which means that the probability density function of unique words is not symmetrical. One may expect this even without a mathematical analysis. Mathematically, the data in Table 1, especially the asymmetry value, confirm that the probability density function is asymmetric.</p>
        <p>The stop words are distributed along the axis (not grouped in one part of the axis); they are represented by orange dots in Figure 2.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.3. Evaluation using a sub-corpus</title>
      <p>This section presents the probability density function of unique words of selected work from the
corpus. Each book from the corpus is devoted to one topic.</p>
      <p>Our assumption is that every book contains a culmination part that carries the topic, while the rest may consist of stop words. That is why we investigated just one book.</p>
      <p>A random book was selected from the 25 books in the corpus: 11th class literature. The book consists of 12837 unique words. The same process that was presented in Section 3.2 was applied to just this part of the corpus in order to create the probability density function of its unique words. Figure 3 shows the probability density function of the unique words of 11th class literature.</p>
      <sec id="sec-6-1">
        <title>Mathematical analysis of the distribution is presented in Table 2.</title>
        <p>
          Table 2: Distribution analysis of the selected single book.
We obtain Figure 4 by applying the rule of the stop word detection method described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>The non-zero asymmetry value means that the probability density function is asymmetric.</p>
        <p>The values were sorted in descending order, and the lowest 5 percent of them are candidates for stop words. Figure 4 graphically represents the process: words with a probability below the threshold 0.00001034371184 are candidates to be stop words.</p>
        <p>The number of these candidates is 642. Of these words, 85.8% are located outside of the interval (E-σ, E+σ): there are 545 stop words on the left side of the interval and 6 stop words on the right side. The same facts can be observed graphically in Figure 5. Taking into account the numerical characteristics of the 5% of words of the selected work and comparing Figure 3 and Figure 4, we detected their position along the text.</p>
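        <p>Counting how many candidates fall to the left of, inside, and to the right of an interval around the expectation can be sketched as follows. The function name, the toy positions, and the toy values of the expectation and standard deviation are hypothetical; only the counts 545, 6, and 642 in the last line come from the text.</p>
        <preformat>
```python
import bisect


def split_by_interval(positions, mean, std):
    """Count values left of, inside, and right of the open interval
    (mean - std, mean + std)."""
    ordered = sorted(positions)
    low, high = mean - std, mean + std
    left = bisect.bisect_right(ordered, low)  # values at or below low
    right = len(ordered) - bisect.bisect_left(ordered, high)  # values at or above high
    inside = len(ordered) - left - right
    return left, inside, right


# Hypothetical candidate positions and summary values, for illustration only
left, inside, right = split_by_interval([1, 2, 3, 10, 11, 19, 20], mean=10.0, std=5.0)
print(left, inside, right)

# Share outside the interval, using the counts reported in the text
reported_share = (545 + 6) / 642
print(round(reported_share, 3))
```
        </preformat>
        <p>The last line reproduces the arithmetic behind the reported figure: 551 of the 642 candidates lie outside the interval, which is about 85.8%.</p>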
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. Evaluation results</title>
      <p>The accuracy of the presented method is confirmed using the following reasoning.
Let us suppose the null hypothesis
H0: Stop words of the selected document (11th class literature) are located outside of the interval (E-σ, E+σ);
and the alternative hypothesis
H1: Stop words of the selected document (11th class literature) are located inside of the interval (E-σ, E+σ).</p>
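      <p>The critical value Z (Z-score or standard score) is obtained using this equation:</p>
      <p><disp-formula><tex-math>Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}</tex-math></disp-formula>
where N = 12837, <inline-formula><tex-math>\bar{x}</tex-math></inline-formula> = 6419, <inline-formula><tex-math>\mu</tex-math></inline-formula> = 7076.62.
In the presented task |Z| ≈ 21.526. Z is located on the left side of E-σ, meaning there is no reason to reject the null hypothesis.</p>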
      <p>This is the basis for rejecting the H1 hypothesis.</p>
    </sec>
    <sec id="sec-8">
      <title>5. Conclusions and further work</title>
      <p>
        Throughout this paper, we presented a natural extension of our previous research on the automatic detection of stop words in the Uzbek language [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The main focus of the analysis was twofold: a) a probability distribution model of the observed text and b) the accuracy of the detection method for stop words.
      </p>
      <p>From all theoretical investigations from previous sections, it can be concluded that, for a single genre,
the majority of stopwords have the following nature:
if , are located at the beginning parts of the text;
if , are located at the ending of the text;
if , are located at the beginning at the ending part of the text.</p>
      <p>In future works, we would like to use the results of this article as the basis for automatically
extracting keywords and automatically extracting the abstract of a given text.</p>
    </sec>
    <sec id="sec-9">
      <title>6. Acknowledgements</title>
      <p>The authors gratefully acknowledge the European Commission for funding the InnoRenew CoE project (Grant Agreement #739574) under the Horizon 2020 Widespread-Teaming program and the Republic of Slovenia (investment funding of the Republic of Slovenia and the European Union from the European Regional Development Fund).</p>
    </sec>
    <sec id="sec-10">
      <title>7. Conclusion</title>
      <p>
        The paper presents a natural extension of the previously presented research on automatic detection of stop words in the Uzbek language [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and presents two goals: a) a probability distributions model of the
observed text and b) the accuracy of the detection method for stop words.
      </p>
      <p>a) The probability density is defined and later used to observe the accuracy of the automatic method for extraction of stop words of the Uzbek language.
b) The accuracy of the method is presented in Section 4.</p>
      <p>From this fact it can be concluded that, for a single genre, more of the stop words for texts:
if , are located at the beginning parts of the text;
if , are located at the ending of the text;
if , are located at the beginning at the ending part of the text.</p>
      <p>In future work, we will use this result for automatically extracting keywords from a given text and automatically generating its annotation.</p>
    </sec>
    <sec id="sec-11">
      <title>8. References</title>
      <p>[10] S. Matlatipov, U. Tukeyev, M. Aripov. “Towards the Uzbek Language Endings as a Language Resource”, In: Advances in Computational Collective Intelligence. ICCCI 2020. Communications in Computer and Information Science, vol 1287. Springer, Cham, (2020).</p>
      <p>[11] S. Matlatipov. “Cosine Similarity and its Implementation to Uzbek Language Data”, Central Asian Problems of Modern Science and Education: Vol. 2020: Iss. 4, Article 8, (2020).</p>
      <p>[12] I. Rabbimov, S. Kobilov, I. Mporas. “Uzbek News Categorization using Word Embeddings and Convolutional Neural Networks”. 2020 IEEE 14th International Conference on Application of Information and Communication Technologies (AICT), pp. 1-5, (2020), doi: 10.1109/AICT50176.2020.9368822</p>
      <p>[13] I. Rabbimov, S. Kobilov. “Multi-Class Text Classification of Uzbek News Articles using Machine Learning”. Journal of Physics: Conference Series, (2020), doi: 10.1088/1742-6596/1546/1/012097</p>
      <p>[14] R. M. Rakholia, J. R. Saini. “A Rule-Based Approach to Identify Stop Words for Gujarati Language”, In Proceedings of the 5th International Conference on Frontiers in Intelligent Computing: Theory and Applications, pp. 797-806, (2017).</p>
      <p>[15] C. Sammut, G. Webb, eds. “Encyclopedia of Machine Learning”. Springer Science &amp; Business Media, (2011).</p>
      <p>[16] U. Salaev, E. Kuriyozov, C. Gomez-Rodriguez. “SimRelUz: Similarity and Relatedness scores as a Semantic Evaluation dataset for Uzbek language”. In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, pp. 199–206. European Language Resources Association, (2022).</p>
      <p>[17] Y. Wijeratne, N. de Silva. “Sinhala Language Corpora and Stopwords from a Decade of Sri Lankan Facebook”, arXiv, (2020), doi: 10.2139/ssrn.3650976.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fayaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Farhath</surname>
          </string-name>
          .
          <article-title>"Towards stop words identification in Tamil text clustering."</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications (IJACSA)</source>
          , Vol.
          <volume>12</volume>
          , No.
          <issue>12</issue>
          , (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. A. V. A.</given-names>
            <surname>Jayaweera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Senanayake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Haddela</surname>
          </string-name>
          ,
          <article-title>"Dynamic Stopword Removal for Sinhala Language,"</article-title>
          <source>2019 Natl. Inf. Technol. Conf. NITC 2019</source>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2019</year>
          , doi: 10.1109/NITC48475.2019.9114476.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kumova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Karaoğlan</surname>
          </string-name>
          .
          <article-title>"Stop word detection as a binary classification problem."</article-title>
          <source>Anadolu University Journal of Science and Technology A-Applied Sciences and Engineering</source>
          <volume>18</volume>
          , no.
          <issue>2</issue>
          (
          <year>2017</year>
          ):
          <fpage>346</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kuriyozov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Doval</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gomez-Rodriguez</surname>
          </string-name>
          .
          <article-title>“Cross-Lingual Word Embeddings for Turkic Languages”</article-title>
          ,
          <source>Proceedings of The 12th Language Resources and Evaluation Conference</source>
          , pp.
          <fpage>4054</fpage>
          -
          <lpage>4062</lpage>
          ,
          <year>2020</year>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Kuriyozov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matlatipov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gómez-Rodríguez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <year>2022</year>
          .
          <article-title>Construction and Evaluation of Sentiment Datasets for Low-Resource Languages: The Case of Uzbek</article-title>
          .
          <source>In Language and Technology Conference</source>
          (pp.
          <fpage>232</fpage>
          -
          <lpage>243</lpage>
          ). Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Ladani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Desai</surname>
          </string-name>
          ,
          <article-title>"Stopword Identification and Removal Techniques on TC and IR applications: A Survey,"</article-title>
          <source>2020 6th Int. Conf. Adv. Comput. Commun. Syst. ICACCS 2020</source>
          , pp.
          <fpage>466</fpage>
          -
          <lpage>472</lpage>
          , (
          <year>2020</year>
          ), doi: 10.1109/ICACCS48705.2020.9074166.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Madatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bekchanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vičič</surname>
          </string-name>
          .
          <article-title>“Lists of Uzbek Stopwords”</article-title>
          , Zenodo, (
          <year>2021</year>
          ), doi: 10.5281/zenodo.6319953
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Madatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bekchanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vičič</surname>
          </string-name>
          .
          <article-title>“Automatic Detection of Stop Words for Texts in the Uzbek Language”</article-title>
          ,
          <source>Preprints</source>
          , MDPI, (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Madatov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sharipov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bekchanov</surname>
          </string-name>
          .
          <article-title>“O‘zbek tili matnlaridagi nomuhim so‘zlar”</article-title>
          ,
          <source>Computer Linguistics: Problems, Solutions, Prospects</source>
          , Vol.
          <volume>1</volume>
          , No.
          <issue>1</issue>
          , (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>