<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sentiment Analysis of Twitter: Turkey Earthquake 2023 Case</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ala Kamal Rashid</string-name>
          <email>alakamal991@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oguz Fındık</string-name>
          <email>oguzfindik@karabuk.edu.tr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SMARTINDUSTRY-2024: International Conference on Smart Automation &amp; Robotics for Future Industry</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Karabuk, Computer Engineering</institution>
          ,
          <addr-line>Karabuk</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The most devastating earthquake of the past 20 years struck on February 6, 2023, in southern Turkey near the northern Syrian border. Thousands of people died and many more were left homeless, and because of the magnitude of the event, news of it quickly spread all over the world. The earthquake and its damage were discussed and analyzed from all sides. In this paper, a separate analysis is proposed for tweets posted within 14 days after the earthquake. Unlike previous work on text classification, which depends on a single type of label, this analysis creates three different types of labels (Manual label, NLTK_VADER label, and Cluster label) and classifies the tweet texts with machine learning algorithms. The two AI labels (NLTK_VADER and Cluster) are then compared, using the Jaccard similarity coefficient and the cosine similarity measure, to determine which is closer to manual labeling according to the number of tweets in each category (positive, negative, and neutral) and the accuracy of sentiment classification with each label. The results show that VADER labeling is more effective than Cluster labeling because its accuracy is much closer to that of Manual labeling.</p>
      </abstract>
      <kwd-group>
        <kwd>Turkey Earthquake</kwd>
        <kwd>Text classification</kwd>
        <kwd>Machine learning</kwd>
        <kwd>NLTK VADER</kwd>
        <kwd>Cluster</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Sentiment analysis is a technique used to determine the emotional tone or sentiment
expressed in a text. It involves analyzing the words and phrases used in the text to identify the
underlying sentiment, whether it is positive, negative, or neutral, and has a wide range of
applications, such as social media monitoring, customer feedback analysis, and market research [1].</p>
      <p>topic modeling [2].</p>
      <p>Ayşe Berika et al. (2022), in “Comparison of Different Heuristics Integrated
with Neural Networks: A Case Study for Earthquake Damage Estimation”, compared various machine
learning (ML) algorithms on a public earthquake dataset [3].</p>
      <p>
        <xref ref-type="bibr" rid="ref2">Sean Wilkinson et al. (2022)</xref>,
        in the article “Accuracy of a Pre-trained Sentiment Analysis
(SA) Classification Model on Tweets Related to Emergency Response and Early Recovery
Assessment: The Case of the 2019 Albanian Earthquake”, used supervised classification to label
tweets as positive, negative, or neutral and compared the result with an unsupervised classification [4].
      </p>
      <p>Asif Malik et al. (2019), in “Lexicon-Based Sentiment Comparison of iPhone and
Android Tweets During the Iran-Iraq Earthquake”, quantified the observed sentiment difference
between Android and iPhone tweets using unsupervised classification with a
lexicon-based approach [5].</p>
      <p>
        <xref ref-type="bibr" rid="ref14 ref15 ref4">Cagri Toraman et al. (2023)</xref>,
        in “Tweets Under the Rubble: Detection of Messages
Calling for Help in Earthquake Disaster”, classify tweets according to whether or not they call for help and
visualize them on an interactive map screen [6].
        <xref ref-type="bibr" rid="ref14 ref15">Yufei Xie et al. (2023)</xref>
        explore the
use of CNNs for sentiment analysis on data from Weibo, investigating the method's
effectiveness in the context of NLP tasks and evaluating its possible ramifications [7].
      </p>
      <p>On 6 February 2023, an Mw 7.8 earthquake struck southern and central Turkey and northern
and western Syria. The epicenter was near Gaziantep, and the event was the largest seismic event in Turkey since 1939
[8]. The devastating earthquake caused heavy damage, and many residents were killed or
injured under collapsed buildings.</p>
      <p>Social media plays an important role during such events and is used as a trusted source in many
areas. This is especially true of Twitter, which we consider the most accurate source among the various social
networks at present. Twitter is one of the most vibrant and widespread resources within social media [9]
and is widely used by academics: Google Scholar lists 27,000 research articles that include the word
Twitter in their title [10].</p>
      <p>
        In this research, we used a dataset of 28,000 tweets, available on Kaggle, that express people's feelings during the
        <xref ref-type="bibr" rid="ref5">Turkey earthquake in 2023</xref>.
        To analyze the tweets, three different types of
labels were created (Manual label, VADER label, and Cluster label), and the tweets were classified
using the following machine learning algorithms: Logistic Regression, Support Vector Machine
(SVM), K-Nearest Neighbor (KNN), and Decision Tree. This study aims to analyze tweets
posted during and after the earthquake to identify positive and negative sentiments
among citizens, and to determine which automatic labeling type yields accuracy
closest to that of manual labeling.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        Twitter is regarded as one of the best social media sources of data for text classification [11]. Our dataset consists of
28,000 tweets collected through the Twitter API about the
        <xref ref-type="bibr" rid="ref5">Turkey earthquake</xref>
        between 2023-02-07 and
2023-02-21. The collected dataset is available from the Kaggle website [12]. It contains 16 columns
(‘id’, ‘username’, ‘user location’, ‘user description’, ‘date’, ‘text’, ‘hashtags’, ‘source’,
‘retweets’, and so on) and 28,000 rows.
      </p>
      <p>In this study, we used 10% of the dataset, which is 2,800 tweets, and worked on the text field
in three stages: Text Preprocessing, Text Labeling, and Text Classification.</p>
      <sec id="sec-2-1">
        <title>2.1. Text Preprocessing</title>
        <p>The text of tweets is noisy data: it is ordinary people's speech and is full of
unusual words, emojis, hashtags, and so on. These texts need to be cleaned and the meaningless
words removed through several processes [13]. The following steps were performed:</p>
        <p>Cleaning removes special characters and numbers and converts the text to lowercase.
Tokenization splits the sentences into individual words. Stop word removal
removes common words such as 'the', 'and', and 'or' that carry little meaning and are
not considered keywords. Lemmatization reduces words to their base form,
so that "running" and "runs", for example, both become "run".</p>
        <p>In the last preprocessing step, the function (get-word) extracts words from the
text using a regular expression pattern, and missing values in a specific column ('remove shorts')
are filled with an empty string. The text column is thus cleaned of all unnecessary phrases, emojis, and words.
For example, the tweet (Prayers for Türkiye and Syria  Hope the rescue...) is converted after
preprocessing to (prayers trkiye syria hope rescue teams from va...).</p>
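        <p>The cleaning, tokenization, and stop word removal steps can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name <monospace>preprocess</monospace> and the very short stop word list are assumptions, and lemmatization is omitted because it needs an external lexical resource.</p>
        <preformat><![CDATA[
```python
import re

# Assumption: a tiny illustrative stop word list. A full list (for example
# the English stop words shipped with NLTK) would be used in practice.
STOP_WORDS = {"the", "and", "or", "for", "a", "an", "of", "to", "in", "is"}


def preprocess(text):
    """Clean, tokenize, and remove stop words from a single tweet."""
    text = text.lower()                    # convert to lowercase
    text = re.sub(r"[^a-z\s]", "", text)   # drop digits, emojis, punctuation
    tokens = text.split()                  # whitespace tokenization
    kept = [token for token in tokens if token not in STOP_WORDS]
    return " ".join(kept)


print(preprocess("Prayers for Türkiye and Syria  Hope the rescue teams"))
```
]]></preformat>
        <p>Because every character outside a-z is deleted, the letter "ü" disappears as well, which is consistent with "Türkiye" becoming "trkiye" in the example above.</p>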
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Text Labeling</title>
        <p>Text labeling is the process of identifying raw texts and adding one or more meaningful labels
to provide context so that a machine learning model can learn from them [14]. Labeling is typically
done according to guidelines defined for the task. There are several different types
of labeling; the most common are done manually by human annotators or through
automated methods. In this work, three types of labels were created for the text field
(Manual labeling, VADER labeling, and Cluster labeling), each with three
categories (positive, negative, and neutral).</p>
        <sec>
          <title>2.2.1. Manual labeling</title>
          <p>Manual labels are assigned by human annotators or experts based on their domain knowledge or specific
guidelines and are typically used in supervised learning settings, where the goal is to train a
model to predict or classify unseen data based on labeled examples. Manual annotation can
provide more accurate and meaningful categorizations than other labels, especially
when the true underlying structure of the data is known or can be reliably determined [15], but it
requires a lot of time and human expertise. Here, we read and analyzed the tweets carefully and,
based on our experience and on the positive and negative phrases used in the
texts, decided which tweets are positive, negative, or neutral.</p>
        </sec>
        <sec>
          <title>2.2.2. VADER labeling</title>
          <p>VADER (Valence Aware Dictionary and Sentiment Reasoner) is a widely used option for sentiment
analysis in Python: a lexicon- and rule-based tool that is available in NLTK and
was created specifically for sentiment analysis on social media [16]. VADER calculates the sentiment
of a text and also reports the proportions of the input that are rated
positive, negative, and neutral. The main measurement that the library provides is called the compound
score, or polarity score. It is obtained by summing the valence scores of the words in the text and normalizing the sum to lie between -1 (most negative)
and +1 (most positive). In this study, tweets were categorized according to their polarity scores as positive
(polarity score &gt; 0), negative (polarity score &lt; 0), or neutral
(polarity score = 0).</p>
        </sec>
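        <p>The thresholding rule for the polarity score can be sketched as follows. The function name <monospace>label_from_compound</monospace> is an assumption made for illustration. The score is passed in as a number so that the sketch runs without NLTK, and the comment shows where it would normally come from.</p>
        <preformat><![CDATA[
```python
def label_from_compound(compound):
    """Map a compound (polarity) score in [-1, 1] to a sentiment category."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"


# With NLTK installed and its 'vader_lexicon' resource downloaded, the score
# would typically be obtained like this:
#   from nltk.sentiment.vader import SentimentIntensityAnalyzer
#   compound = SentimentIntensityAnalyzer().polarity_scores(text)["compound"]

for score in (0.64, -0.3, 0.0):
    print(score, label_from_compound(score))
```
]]></preformat>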
      </sec>
      <sec id="sec-2-3">
        <title>2.2.3. Cluster labeling</title>
        <p>Cluster labels are assigned through unsupervised learning techniques, typically clustering algorithms such as
K-means, hierarchical clustering, or DBSCAN. These labels are derived solely from the data's
intrinsic structure without any external guidance or supervision. Each data point is assigned to a
cluster based on its similarity or proximity to other data points within the same cluster. Cluster
labels are useful for discovering patterns or groupings in the data when the true categories or
classes are unknown or not provided [17].</p>
        <p>Here, the following steps were performed, as shown in Figure 2. The texts were converted to numerical format with TF-IDF (Term Frequency-Inverse Document
Frequency). K-Means clustering was applied to group similar documents, and the Elbow Method and
the Silhouette Score were used to determine the optimal number of clusters (K). Principal
Component Analysis (PCA) was applied to reduce the dimensionality of the TF-IDF data to two components
for visualization. Finally, the quality of the clusters was evaluated using the Silhouette Score and the
Davies-Bouldin Index.</p>
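        <p>As an illustration of the vectorization step that precedes clustering, the sketch below computes textbook TF-IDF weights, that is, term frequency multiplied by log(N / document frequency). It is not the implementation used in the study: the function name <monospace>tf_idf</monospace> is an assumption, and library vectorizers such as the one in scikit-learn apply a smoothed and normalized variant by default, so their values would differ.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical preprocessed tweets, used only to exercise the function.
vectors = tf_idf(["pray turkey", "help turkey"])
print(vectors[0])
```
]]></preformat>
        <p>A term that occurs in every document ("turkey" here) receives weight 0, so only the distinguishing terms contribute to the distances used by the clustering algorithm.</p>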
      </sec>
      <sec id="sec-2-4">
        <title>2.3. Text Classification</title>
        <p>Sentiment classification often uses three categories: negative, neutral, and
positive. It is still possible that these categories do not fully reflect the real world [18]. Therefore,
several approaches, such as ML and deep learning, have been developed to make predictions more accurate.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.3.1. Approaches</title>
        <p>Machine learning techniques (logistic regression, support vector machine (SVM), K-nearest neighbor (KNN), and decision
tree) were implemented to classify the tweet texts with each of the three labels, and the accuracy measure was used to determine the proportion of instances that each model classified correctly
across all classes.</p>
        <sec id="sec-2-5-1">
          <disp-formula>
            <tex-math>\text{Accuracy} = \frac{\text{Correct predictions}}{\text{All predictions}} = \frac{TP + TN}{TP + TN + FP + FN}</tex-math>
          </disp-formula>
          <p>where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False
Negatives.</p>
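          <p>The accuracy measure translates directly into code. The sketch below is a minimal illustration; the function name <monospace>accuracy</monospace> and the example counts are hypothetical and are not taken from the study.</p>
          <preformat><![CDATA[
```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return (tp + tn) / total


# Hypothetical counts: 90 of 100 predictions are correct.
print(accuracy(tp=50, tn=40, fp=5, fn=5))
```
]]></preformat>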
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>2.3.2. Data splitting</title>
        <p>Data splitting involves using part of the data to fit the model (the training set) and setting aside the
remainder as an assessment (test) set [19]. The dataset was split in an 80:20
ratio (80% training and 20% testing) using stratified random sampling. Table 1
shows the resulting split.</p>
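        <p>A stratified 80:20 split can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name <monospace>stratified_split</monospace>, the fixed seed, and the per-class rounding are choices made for this example, and library routines may round slightly differently.</p>
        <preformat><![CDATA[
```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Return (train_indices, test_indices) that keep class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # random draw within a class
        n_test = round(len(indices) * test_ratio)  # this class's test share
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Hypothetical label list, used only to exercise the function.
labels = ["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```
]]></preformat>
        <p>Sampling within each class separately keeps the proportions of positive, negative, and neutral tweets the same in the training and test sets.</p>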
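        <sec id="sec-2-6-1">
          <title>Table 1. Dataset split (train set)</title>
          <table-wrap>
            <table>
              <thead>
                <tr>
                  <th>Quantity</th>
                  <th>Total</th>
                </tr>
              </thead>
              <tbody>
                <tr>
                  <td>Total data in the train set</td>
                  <td>2204</td>
                </tr>
                <tr>
                  <td>Total text data in the train set</td>
                  <td>2204</td>
                </tr>
                <tr>
                  <td>Total column in train set</td>
                  <td></td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>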
    </sec>
    <sec id="sec-3">
      <title>3. Comparison</title>
      <p>In this section, the automatic labelings (VADER and Cluster) are compared with manual labeling according to
the number of tweets in each sentiment category (positive, negative, and neutral) under each label, and then the accuracy of
the classification models is compared.</p>
      <sec id="sec-3-1">
        <title>3.1. Sentiment number comparison</title>
        <p>After all three labels were created, the categories were counted with the value counts function;
the results are shown in Table 2 and in the sentiment plot in Figure 3.</p>
        <p>As Table 2 and Figure 3 show, the distribution of sentiment labels varies across the labeling methods,
indicating potential differences in the labeling criteria. Of the 2,756 tweets in total, Cluster labeling has
the highest count of positive sentiment (2,647) and the lowest counts of negative
and neutral sentiment. The proportion of positive sentiment is generally higher than that of negative
sentiment under all three labelings.</p>
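        <p>Counting the categories of one label column can be sketched with the standard library as follows. The function name <monospace>sentiment_counts</monospace> and the example labels are hypothetical; with pandas, the equivalent operation on a column would be its value counts method.</p>
        <preformat><![CDATA[
```python
from collections import Counter

CATEGORIES = ("positive", "negative", "neutral")


def sentiment_counts(labels):
    """Count how many items fall into each sentiment category."""
    counts = Counter(labels)
    # Report every category, including those that never occur.
    return {category: counts.get(category, 0) for category in CATEGORIES}


# Hypothetical labels, used only to exercise the function.
print(sentiment_counts(["positive", "negative", "positive"]))
```
]]></preformat>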
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Accuracy Comparison</title>
        <p>The machine learning classification algorithms (Logistic Regression, Support Vector
Machine (SVM), K-nearest neighbor (KNN), and Decision Tree) were trained with each label to
determine which of the two automatic labels (VADER and Cluster) yields accuracy closer
to that of the manual label. This comparison is shown in Table 3.</p>
        <p>As Table 3 shows, for the Cluster labels the accuracy achieved by the models (K
Nearest Neighbor, Logistic Regression, Support Vector Machine, and Decision Tree) ranges
from around 96.93% to 98.97%. For the manual labels, the accuracy ranges from around
58.51% to 67.93%, and for the VADER sentiment labels it ranges from around
58.87% to 71.92%.</p>
        <p>The models trained on Cluster labels generally exhibit higher accuracies than those
trained on Manual labels and VADER sentiment labels. This may suggest that the clustering
algorithm captured underlying patterns in the data that the classifiers can reproduce easily.
It should also be noted that 2,647 of the 2,756 Cluster labels (about 96%) are positive, so a high
accuracy can be expected to a large extent from the class imbalance alone.</p>
        <p>Manual labels are typically assigned by human annotators based on domain knowledge or
specific guidelines, making them more interpretable and possibly more reliable in certain
contexts. VADER sentiment labels are derived from sentiment analysis techniques and may
capture sentiment-related information in the text but might not necessarily align with manual
annotations.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Similarity measures quantify how closely two sets of labels resemble each other; the Jaccard coefficient, for example, is based on their intersection
and union [20]. A higher similarity score indicates a closer resemblance between the labels.
According to the Jaccard similarity, cosine similarity, and accuracy measures, the results obtained
are as follows.</p>
      <sec>
        <title>4.1. Jaccard Similarity</title>
        <p>Jaccard similarity values range from 0 to 1, with 1 indicating complete similarity and 0 indicating no similarity. In
this case, all Jaccard similarity scores are 1.0, which indicates that the category names are the same in all
three labels.</p>
      </sec>
      <sec>
        <title>4.2. Cosine Similarity</title>
        <p>Cosine similarity values range from -1 to 1, with 1 indicating perfect similarity, 0 indicating no similarity (orthogonal vectors), and -1
indicating complete dissimilarity (opposite vectors). In this case, the cosine similarity between
VADER labeling and Manual labeling was 0.979, and between Cluster labeling and Manual
labeling it was 0.868, suggesting that the distributions of the first pair are more similar than
those of the second pair.</p>
      </sec>
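      <p>Both similarity measures are short to state in code. The sketch below is a minimal illustration: the function names and the category-count vectors are hypothetical and are not the values from Table 2. It assumes that Jaccard similarity is applied to the sets of category names and that cosine similarity is applied to vectors of per-category counts.</p>
      <preformat><![CDATA[
```python
import math


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


categories = {"positive", "negative", "neutral"}
print(jaccard(categories, categories))

# Hypothetical (positive, negative, neutral) counts for two labelings.
print(cosine([120, 60, 20], [110, 70, 20]))
```
]]></preformat>
      <p>Since category counts are never negative, the cosine similarity of two count vectors lies between 0 and 1 in practice.</p>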
      <sec id="sec-4-1">
        <title>4.3. Accuracy measures</title>
        <p>The difference in accuracy between each labeling method and manual labeling was computed
across all models. A smaller difference in accuracy indicates that the labeling approach is closer
to the manual label [21].</p>
        <p>For the Cluster labeling approach, the average difference in accuracy compared with manual
labeling is approximately 33.92%, while for the VADER labeling approach it is
approximately 3.26%. Because the average
difference for VADER labeling is the smaller of the two, the VADER labeling approach is much closer to the manual label than
the Cluster labeling approach. These results are shown in Figure 4.</p>
        <p>According to the results obtained under all three criteria, the automatic NLTK_VADER label is
closer to the manual label than the Cluster label is. Therefore, when a quick
assessment is needed and several topics must be analyzed, NLTK_VADER can be used to
label texts in place of manual labeling without using complex algorithms.</p>
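        <p>The averaging of accuracy differences can be sketched as follows. This is an illustration only: the function name <monospace>mean_accuracy_gap</monospace> and the per-model accuracies are hypothetical (they are not the values from Table 3), and the use of the absolute difference is an assumption about how the gap is measured.</p>
        <preformat><![CDATA[
```python
def mean_accuracy_gap(reference, candidate):
    """Average absolute accuracy difference over the models in both dicts."""
    models = reference.keys() & candidate.keys()
    if not models:
        raise ValueError("no models in common")
    return sum(abs(candidate[m] - reference[m]) for m in models) / len(models)


# Hypothetical accuracies (in percent) per model for two labelings.
manual = {"logistic_regression": 60.0, "decision_tree": 70.0}
automatic = {"logistic_regression": 64.0, "decision_tree": 68.0}
print(mean_accuracy_gap(manual, automatic))
```
]]></preformat>
        <p>The labeling approach with the smaller value is the one whose classification accuracy lies closer to that obtained with the manual labels.</p>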
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
        Many techniques and studies have been tried and tested in text classification. This paper
differs from past work in presenting a sentiment analysis conducted on
Twitter data related to the
        <xref ref-type="bibr" rid="ref5">Turkey earthquake of 2023</xref>.
        Twitter is a popular platform where users
express their opinions on a variety of themes related to their everyday lives by writing
tweets.
      </p>
      <p>This study analyzed 2,800 tweets posted during and after the earthquake to identify
positive and negative sentiments among citizens, and then used a machine learning classification
approach to determine which labeling type yields accuracy closest to that of manual
labeling.</p>
      <p>Finally, VADER labeling was found to be more effective and suitable for determining the
emotional tone or sentiment expressed in social media texts, especially tweets, because manual
labeling requires a lot of time and human expertise, and the accuracy of VADER labeling is
much closer to manual labeling accuracy than that of Cluster labeling.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future work</title>
      <p>Although the results obtained in this study were not highly accurate, they could be useful and
could be improved by using more appropriate criteria and methods in the future.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Thanks to those who helped us in this study, especially Professor Dr. Oguz Fındık in Turkey,
who supervised this paper, Dr. Aras Asad in the UK and Mr. Mario Caesar in Indonesia for
their comments, and my brother, Engineer Amanj Kamal in Iraq, for the linguistic review.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>Acta Infologica</source>
          <volume>6</volume>
          , (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Contreras</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alterman</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Hervás</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Accuracy of a pre-trained sentiment analysis (SA) classification model on tweets related to emergency response and early recovery assessment: the case of 2019 Albanian earthquake</article-title>
          .
          <source>Natural Hazards</source>
          <volume>113</volume>
          ,
          <fpage>403</fpage>
          -
          <lpage>421</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Juhász</surname>
            ,
            <given-names>P. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stéger</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kondor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Vattay</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>A Bayesian approach to identify Bitcoin users</article-title>
          .
          <source>PLoS One</source>
          <volume>13</volume>
          , (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Toraman</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kucukkaya</surname>
            ,
            <given-names>I. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ozcelik</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sahin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <article-title>Tweets Under the Rubble: Detection of Messages Calling for Help in Earthquake Disaster</article-title>
          . (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <article-title>Rapid report of seismic damage to hospitals in the 2023 Turkey earthquake sequences</article-title>
          .
          <source>Earthquake Research Advances</source>
          <volume>3</volume>
          ,
          <issue>100234</issue>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Hornmoen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Backholm</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Social Media Use in Crises and Risks: An Introduction to the Collection</article-title>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Antonakaki</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fragopoulou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ioannidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <article-title>A survey of Twitter research: Data model, graph structure, sentiment analysis and attacks</article-title>
          .
          <source>ScienceDirect</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Yue</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>A survey of sentiment analysis in social media</article-title>
          .
          <source>Knowl Inf Syst</source>
          <volume>60</volume>
          ,
          <fpage>617</fpage>
          -
          <lpage>663</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Preda</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Turkey Earthquake Tweets</article-title>
          .
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Wankhade</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>A. C. S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>A survey on sentiment analysis methods, applications, and challenges</article-title>
          .
          <source>Artif Intell Rev</source>
          <volume>55</volume>
          ,
          <fpage>5731</fpage>
          -
          <lpage>5780</lpage>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <article-title>Text Classification Using Label Names Only: A Language Model Self-Training Approach</article-title>
          . (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <article-title>Factors affecting inter-rater agreement in human classification of eye movements: a comparison of three datasets</article-title>
          .
          <source>Behav Res Methods</source>
          <volume>55</volume>
          ,
          <fpage>417</fpage>
          -
          <lpage>427</lpage>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <source>in Lecture Notes in Networks and Systems</source>
          vol.
          <volume>693</volume>
          LNNS 569-581 (Springer Science and Business Media Deutschland GmbH,
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Al Mahmoud</surname>
            ,
            <given-names>R. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammo</surname>
            ,
            <given-names>B. H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Faris</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Cluster-based ensemble learning model for improving sentiment classification of Arabic documents</article-title>
          .
          <source>Nat Lang Eng</source>
          (
          <year>2023</year>
          ) doi:10.1017/S135132492300027X.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Rahman</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          et al.
          <article-title>Multi-Tier Sentiment Analysis of Social Media Text Using Supervised Machine Learning</article-title>
          .
          <source>Comput. Mater. Contin</source>
          <volume>74</volume>
          ,
          <fpage>5527</fpage>
          -
          <lpage>5543</lpage>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Practitioner's Guide to Data Science</article-title>
          . (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
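          <string-name>
            <surname>Dasgupta</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>P. K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karuppasamy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Mahajan</surname>
            ,
            <given-names>A. D.</given-names>
          </string-name>
          <article-title>A Survey of Numerous Text Similarity Approach</article-title>
          .
          <source>International Journal of Scientific Research in Computer Science, Engineering and Information Technology</source>
          <fpage>184</fpage>
          -
          <lpage>194</lpage>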
          (
          <year>2023</year>
          ) doi:10.32628/cseit2390133.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Cascante-Bonilla</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Ordonez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning</article-title>
          .
          <source>www.aaai.org</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>