<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Hope Speech Detection using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mesay Gemeda Yigezu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Girma Yohannis Bade</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Kolesnikova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigori Sidorov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Gelbukh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Politécnico Nacional (IPN), Centro de Investigación en Computación (CIC), NLP Lab</institution>
          ,
          <addr-line>Mexico City</addr-line>
          ,
          <country country="MX">Mexico</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Millions of individuals use social media platforms such as Facebook, Twitter, Instagram, and YouTube to share or get opinions. These platforms spread both negative and positive thoughts. Hope speech is one of the positive kinds: it can relax an environment when people get anxious. This paper presents hope speech detection for posts in English and Spanish. As part of a shared task of IberLEF 2023, train and test data sets for both English and Spanish were provided, labeled as hope speech and not hope speech. We developed a model in Python and chose a support vector machine (SVM) for the assigned task. We trained the hope speech detection model on the train-development data set and evaluated it on the test data sets. The performance of the model was measured by the average macro F1 score metric. The model achieved an average macro F1 of 0.489 for English and 0.481 for Spanish.</p>
      </abstract>
      <kwd-group>
        <kwd>Hope speech</kwd>
        <kwd>Classification algorithm</kwd>
        <kwd>Shared task</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Social media platform</kwd>
        <kwd>Support vector machine</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Technology has a big impact on every aspect of our lives. It has been changing the way we communicate, purchase, and make decisions in different application areas. Millions of individuals use social media sites such as Facebook, Twitter, Instagram, and YouTube to share material and voice their opinions. These platforms spread both negative and positive opinions. Linguistic computational tasks that aim to find such posts on online social media and stop the spread of negativity include hate speech detection, offensive language identification, and abusive language detection [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [2]. On the other hand, people may look for good suggestions, encouragement, gratitude, appreciation, and acknowledgment; these are positive dimensions of social media posts and can be categorized as hope speech. Hope speech is a type of speech able to relax a hostile environment when people get anxious (Palakodety et al. [3]). Classifying a given comment as hope speech or non-hope speech is known as hope speech detection. This year, we participated in the IberLEF 2023 HOPE shared task [4]. In the context of this collaborative endeavor, hope speech supports people suffering from disease, stress, loneliness, or sadness; moreover, hope messages offer advice and inspire readers to do good things. To counteract sexual or racial prejudice or to promote less combative workplaces, it can be quite effective to automatically detect hope speech so that favorable remarks can be spread more widely.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related works</title>
      <p>Recent research on the improvement of free speech through social media was undertaken by [5]. The researchers presented a novel custom deep network architecture that employed a concatenation of embeddings from T5-Sentence, rather than removing seemingly offensive phrases, in order to detect and encourage positivity in the comments. Several machine learning methods, such as SVM, logistic regression, K-nearest neighbors, decision tree, and a newly suggested CNN-based model, were all tested. With a macro F1-score higher than the others, the suggested model performed best in the English language.</p>
      <p>Tonja et al. [6] discussed social media mining for health, particularly the classification of tweets and Reddit posts that self-report exact age. They applied transformer-based models such as BERT and RoBERTa to this classification task. The study also presented evaluation metrics for the classification of self-reported exact ages in tweets and Reddit posts and highlighted model performance that is comparable with previous works in the field.</p>
      <p>Puranik et al. [7] used a variety of transformer-based models to categorize social media comments in English, Malayalam, and Tamil as hope speech or not hope speech. The study’s full dataset includes 59,354 YouTube comments, of which 28,451 are in English, 20,198 are in Tamil, and 10,705 are in Malayalam. The comments are categorized as hope speech, not hope speech, and other languages. A character-level BERT model performed best on the validation dataset.</p>
      <p>Balouchzahi et al. [8] participated in the “Hope Speech Detection for Equality, Diversity, and
Inclusion-EACL 2021” shared task. The team proposed three models for classifying English and
code-mixed texts in Tamil-English and Malayalam-English into three categories - "Hope speech",
"Non-hope speech", and "other languages". The three models, CoHope-ML, CoHope-NN, and
CoHope-TL, are based on the Ensemble of classifiers, Keras Neural Network, and BiLSTM with
Conv1d model, respectively. The CoHope-ML model obtained the best results among the three
models, achieving the 1st, 2nd, and 3rd ranks with weighted F1-scores of 0.85, 0.92, and 0.59 for
Malayalam-English, English, and Tamil-English texts, respectively.</p>
      <p>
        Tonja et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] focused on violent and related problematic behaviors in social media, detecting and classifying aggressive and violent incidents in Spanish social media using language-specific pre-trained language models. Their model achieved an F1 score of 0.7455 for violent event identification and an F1 score of 0.4903 for violent event category recognition on the DA-VINCIS dataset.
      </p>
      <p>Mahajan et al. [9] carried out a study to forecast the presence of hope speech as well as the existence of samples from different languages in the data set. The method used RoBERTa to identify hope speech for English and XLM-RoBERTa for Tamil and Malayalam. Comments were labeled as hope speech, non-hope speech, and not-language. Their strategy achieved the highest F1 score in English. The work was also part of shared task 2 of 2022 on CodaLab.</p>
      <p>Arif et al. [10] presented the use of different algorithms for multiclass and cross-lingual fake news detection, achieving a macro F1-score of 28.60% for a monolingual task in English using a pre-trained RoBERTa model and 17.21% for a cross-lingual task in English and German using a Bi-LSTM deep learning algorithm.</p>
      <p>Balouchzahi et al. [11] provided a hope speech dataset that classifies English tweets into
two broad categories, "Hope" and "Not Hope," and then three more specific hope categories,
"Generalized Hope," "Realistic Hope," and "Unrealistic Hope." Finally, they provided a detailed
description of their annotation process and guidelines. In addition, in order to benchmark
the collected dataset, they reported several baselines that were based on various learning
approaches. These learning approaches included traditional machine learning, deep learning,
and transformers. They evaluated the baselines by using weighted-averaged and macro-averaged
F1 scores.</p>
      <p>Gupta et al. [12] looked for and promoted helpful and uplifting YouTube posts. They used a variety of machine learning models to categorize social media comments in English as hope speech or non-hope speech. The work was part of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI-ACL 2022.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The specific requirements of the shared task and the constraints imposed by the classification task served as the basis for the development of our methodology. When selecting machine learning models, it is usual practice to base the decision on how well those models perform in binary classification tasks. The choice of a particular model can be influenced by a number of different considerations, including the size and complexity of the dataset, the desired level of accuracy, and the availability of computational resources. As Figure 1 depicts, the methodology applied to this shared task is as follows.</p>
      <p>Step-1 Data Understanding and Preparation: To begin, it is necessary to gain an understanding of the issue we are attempting to resolve and of the data. We make certain that the data is representative, well balanced, and pre-processed, using methods such as normalization and the imputation of missing values as appropriate.</p>
      <p>Step-2 Feature Selection: Choose the features that will be most helpful in building the SVM model. This step determines which aspects provide the most useful information. In accordance with the parameters of our task, we utilized term frequency-inverse document frequency (TF-IDF), which assigns greater significance to words that appear frequently in a given document but rarely across the corpus. The TF is a measurement of how often a particular term or phrase appears in a given document. It is determined by dividing the number of times a particular word appears in a document by the total number of words in that document.</p>
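As a toy illustration of the TF-IDF weighting described above, here is a sketch using scikit-learn's TfidfVectorizer on a hypothetical three-post corpus (not the shared-task data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy posts standing in for the shared-task comments.
posts = [
    "you can do it stay strong",
    "everything will be fine",
    "nothing will ever change",
]

# TF-IDF gives high weight to terms frequent in one document but rare
# across the corpus; "will" appears in two documents, so it is
# down-weighted relative to document-specific words.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(posts)

print(tfidf.shape)  # (number of documents, vocabulary size)
```

Each row of the resulting sparse matrix is the TF-IDF feature vector for one post, which is what the classifier consumes in the next step.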
      <p>Step-3 Model Selection: We decided that the support vector machine model is the best fit for the issue at hand. We chose SVM because it works well in high-dimensional spaces, which makes it well suited for addressing difficult classification problems in which there are many features.</p>
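A minimal sketch of this setup, assuming scikit-learn and a hypothetical handful of labeled posts (a linear SVM over TF-IDF features, in the spirit of the pipeline described here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labeled posts: 1 = hope speech, 0 = not hope speech.
texts = [
    "stay strong you can do it",
    "better days are coming",
    "everything is ruined",
    "there is no point anymore",
]
labels = [1, 1, 0, 0]

# A linear SVM handles high-dimensional sparse TF-IDF features well.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(texts, labels)

prediction = model.predict(["you can get through this"])[0]
```

The pipeline object applies the same vectorizer to training and unseen text, so `predict` can be called directly on raw strings.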
      <p>Step-4 Training and Cross-Validation: These are crucial steps in the construction of machine learning models; they help ensure that the model can generalize effectively to new data and contribute to its overall accuracy.</p>
      <p>In machine learning, "training" refers to the process of teaching a model how to generate predictions using a labeled dataset. During training, the model learns to recognize patterns and correlations in the data that are important for producing correct predictions. The purpose of training is to develop a model that generalizes well to data it has not encountered before.</p>
      <p>The effectiveness of a machine learning model can be evaluated with a technique known as cross-validation. The dataset is divided into a training set and a validation set; the model is trained on the training set, and its accuracy is assessed on the validation set. This process is repeated several times, with different subsets of the data used for training and validation each time, and the performance indicators are then averaged across all iterations.</p>
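The split-train-validate-rotate-average procedure can be sketched as follows (synthetic features stand in for the real TF-IDF matrix):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary-classification data standing in for the real features.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# 5-fold cross-validation: train on four folds, validate on the held-out
# fold, rotate through all five splits, then average the scores.
scores = cross_val_score(SVC(), X, y, cv=5, scoring="f1_macro")
print(len(scores), scores.mean())
```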
      <p>In general, training and cross-validation are two essential steps in developing a machine learning model. They help guarantee that the model is reliable and generalizes well to new data.</p>
      <p>Step-5 Model Evaluation: Evaluate the model’s performance on the testing set using appropriate evaluation metrics such as precision, recall, and F1 score. We evaluated our model on the test dataset and submitted the result; the final evaluation was done by the shared task organizers.</p>
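As a small worked example of the macro-averaged metrics used here (hypothetical gold labels and predictions, computed with scikit-learn):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical gold labels and predictions: 1 = hope, 0 = not hope.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# "Macro" averaging computes each metric per class and averages the two,
# weighting both classes equally regardless of how frequent they are.
p = precision_score(y_true, y_pred, average="macro")
r = recall_score(y_true, y_pred, average="macro")
f = f1_score(y_true, y_pred, average="macro")
print(p, r, f)  # each class has P = R = 3/4 here, so all three are 0.75
```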
      <p>Step-6 Interpretation and Visualization: Create a visual representation of the findings in
order to acquire a deeper comprehension of the model’s behavior and overall performance.</p>
    </sec>
    <sec id="sec-4">
      <title>4. System Task Description</title>
      <sec id="sec-4-1">
        <title>4.1. Data sets Description</title>
        <p>Training, development, and test data sets were given to the participants for two languages, English [2] and Spanish [13]. The given data sets were annotated at the comment or post level, as shown in Table 1 and Table 2. A strength of the IberLEF 2023 shared task is that it used expanded and improved data sets compared with the previous shared tasks for both Spanish and English [14].</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Existing classification algorithms and selection</title>
        <p>Choosing the right classifier is the most crucial stage in the text classification pipeline. We cannot choose the most successful model for a text categorization application without a thorough conceptual knowledge of each approach (Lee and Shin [15]).</p>
        <p>
          In this section, we describe current text and document categorization methods. Historically, text categorization began with the Rocchio algorithm and later advanced to boosting and bagging, two popular ensemble learning approaches. Although more conventional, techniques such as logistic regression, Naive Bayes, and k-nearest neighbors [
          <xref ref-type="bibr" rid="ref2">16</xref>
          ] are still widely utilized in the scientific community. Support vector machines (SVM), particularly kernel SVM, are also widely employed as a classification method. For categorizing documents, tree-based classification algorithms such as decision trees and random forests are efficient and precise. There are also neural network-based text classification algorithms, including hierarchical attention networks (HAN), deep belief networks (DBN), CNN [
          <xref ref-type="bibr" rid="ref3">17</xref>
          ], RNN, and combination methods [
          <xref ref-type="bibr" rid="ref4">18</xref>
          ], and one could also apply transformer-based approaches [
          <xref ref-type="bibr" rid="ref5">19</xref>
          ].
        </p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Support Vector Machine (SVM) Classification Algorithm</title>
          <p>SVM was first developed for binary classification applications. However, many scholars use this prevalent strategy when working on multi-class problems.</p>
          <p>
            The study of text categorization using a string kernel is also known as kernel SVM. The fundamental concept behind the string kernel (SK) is to use a function to map strings into the feature space. Several applications, including the categorization of text, DNA, and proteins, have used such kernels as part of the SVM algorithm (Cervantes et al. [
            <xref ref-type="bibr" rid="ref6">20</xref>
            ]). SVM is most effective when there is a distinct line dividing classes and when the number of samples is less than the number of dimensions; for these advantages we chose SVM to classify whether the given social media posts are hope speech or not hope speech [
            <xref ref-type="bibr" rid="ref7">21</xref>
            ]. In addition, SVM is able to handle high-dimensional feature spaces and non-linear classification problems. SVMs are particularly useful when the data is not linearly separable and the decision boundary is complex, as they can transform the original data into a higher-dimensional space where it may become linearly separable [
            <xref ref-type="bibr" rid="ref8">22</xref>
            ].
          </p>
          <p>Furthermore, SVMs are robust to overfitting, as they use a regularization parameter that helps to prevent the model from fitting noise in the data. They can also work well with small to medium-sized datasets.</p>
        </sec>
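scikit-learn has no built-in string kernel, but a character n-gram representation fed to an SVM captures the same intuition of comparing texts by shared substrings. A hedged sketch on toy data (not the task corpus, and a practical stand-in rather than a true string kernel):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy labeled texts: 1 = hope speech, 0 = not hope speech.
texts = ["hope always wins", "keep hoping and smiling",
         "all hope is lost", "everything is pointless"]
labels = [1, 1, 0, 0]

# Character n-grams (2-4 chars, within word boundaries) approximate a
# string kernel's substring comparison; the SVC then separates the classes.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    SVC(kernel="linear"),
)
model.fit(texts, labels)
prediction = model.predict(["hope remains"])[0]
```

Character n-grams also tolerate the misspellings common in social media text, since a typo still shares most substrings with the intended word.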
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Challenges of the task</title>
      <p>This task is one of the essential activities in the NLP area. However, the data poses many challenges for NLP due to its lack of context, informal language [4], and imbalanced classes.</p>
      <p>Lack of context: Twitter is a well-known social networking site that produces enormous amounts of data every day. It is difficult to infer the context of a tweet in this task because Twitter data is short, limited to 240 characters per tweet. The lack of context causes ambiguity, which makes it challenging to extract the meaning of a tweet accurately.</p>
      <p>Informal language: social media users write informally, with misspellings, acronyms, and emojis, which makes it challenging for NLP algorithms to understand the intended meaning of a post. We addressed these problems in the pre-processing stage.</p>
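A minimal pre-processing sketch of the kind described (a hypothetical `normalize` helper, not our exact pipeline; it keeps only ASCII letters, so it would need extending for Spanish accented characters):

```python
import re

def normalize(post: str) -> str:
    """Toy cleaner for informal social media text (English/ASCII only)."""
    post = post.lower()
    post = re.sub(r"https?://\S+", " ", post)  # drop URLs
    post = re.sub(r"@\w+", " ", post)          # drop @mentions
    post = re.sub(r"[^a-z\s]", " ", post)      # drop emojis, digits, punctuation
    return re.sub(r"\s+", " ", post).strip()   # collapse whitespace

print(normalize("Stay strong @user!! http://t.co/abc"))  # -> "stay strong"
```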
      <p>Imbalanced datasets: these datasets pose a significant challenge for NLP tasks, in this task
unbalanced English datasets were given. As a result, biased models are inaccurate in predicting
minority classes. Moreover, imbalanced datasets can lead to the overfitting of models and poor
generalization to unseen data.</p>
      <p>
        There are several techniques that can be used to address the problem of imbalanced datasets in machine learning [
        <xref ref-type="bibr" rid="ref9">23</xref>
        ]. Among these techniques, we used oversampling, which involves duplicating instances from the minority class until it is balanced with the majority class.
      </p>
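The random oversampling we used can be sketched in a few lines, on a toy imbalanced set of hypothetical posts (duplicating minority items until the classes match):

```python
import random

# Hypothetical imbalanced data: 8 "not hope" (0) posts, 2 "hope" (1) posts.
data = [("post %d" % i, 0) for i in range(8)] + [("hope a", 1), ("hope b", 1)]

majority = [d for d in data if d[1] == 0]
minority = [d for d in data if d[1] == 1]

# Duplicate minority instances (sampling with replacement) until the
# minority class is as large as the majority class.
random.seed(0)
extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
balanced = majority + minority + extra

counts = {0: 0, 1: 0}
for _, label in balanced:
    counts[label] += 1
print(counts)  # {0: 8, 1: 8}
```

Note that oversampling should be applied only to the training split, never to the test data, so that evaluation reflects the true class distribution.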
      <sec id="sec-5-1">
        <title>5.1. Result and Discussion</title>
        <p>The developed model was based on an SVM string kernel classifier. We evaluated the developed model in terms of F1 scores. The model classifies social media comments/posts as hope speech or not hope speech, as required by the shared task. We tabulated the Precision (P), Recall (R), F1-score, and average macro F1-score of the model on the test data set in Table 3. The average macro F1 is 0.4894 for English and 0.4815 for Spanish. The model performed better in English than in Spanish because the data set given for Spanish is smaller than the one for English.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        Hope is a positive frame of mind that is both present- and future-focused. It is founded on
the desire for favorable results in one’s life or the world as a whole and may also be found in
motivational speeches about those who have faced and overcome hardship [
        <xref ref-type="bibr" rid="ref10">24</xref>
        ]. This study described multilingual hope speech detection using a machine learning algorithm. We applied an SVM algorithm to automatically classify whether a given text in English or Spanish is hope speech or not hope speech. Two hope speech classification models were developed, one per language, and their performance was tested using the average macro F1-score metric. The performance of the models depends highly on the size and quality of the data sets.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Future Work</title>
      <p>Since hope speech builds the soft mindsets of human beings, the task should be extended to other languages. In addition, the performance of the proposed model should be improved by increasing the dataset sizes and applying further algorithms to the languages used here.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work was done with partial support from the Mexican Government through grant A1S-47854 of CONACYT, Mexico, and grants 20220852, 20220859, and 20221627 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank CONACYT for the computing resources provided through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America Ph.D. Award.</p>
      <p>[1] (continued) Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2022), CEUR Workshop Proceedings, CEUR-WS.org, 2022.</p>
      <p>[2] B. R. Chakravarthi, HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion, in: Proceedings of the Third Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, 2020, pp. 41–53.</p>
      <p>[3] S. Palakodety, A. R. KhudaBukhsh, J. G. Carbonell, Hope speech detection: A computational analysis of the voice of peace, arXiv preprint arXiv:1909.12940 (2019).</p>
      <p>[4] S. M. Jiménez-Zafra, M. Á. García-Cumbreras, D. García-Baena, J. A. García-Díaz, B. R. Chakravarthi, R. Valencia-García, L. A. Ureña-López, Overview of HOPE at IberLEF 2023: Multilingual Hope Speech Detection, Procesamiento del Lenguaje Natural 71 (2023).</p>
      <p>[5] M. Ahmed, A. Najmul Islam, Deep learning: hope or hype, Annals of Data Science 7 (2020) 427–432.</p>
      <p>[6] A. L. Tonja, O. E. Ojo, M. A. Khan, A. G. M. Meque, O. Kolesnikova, G. Sidorov, A. Gelbukh, CIC NLP at SMM4H 2022: A BERT-based approach for classification of social media forum posts, in: Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop &amp; Shared Task, 2022, pp. 58–61.</p>
      <p>[7] K. Puranik, A. Hande, R. Priyadharshini, S. Thavareesan, B. R. Chakravarthi, IIITT@LT-EDI-EACL2021 - Hope speech detection: There is always hope in transformers, arXiv preprint arXiv:2104.09066 (2021).</p>
      <p>[8] F. Balouchzahi, B. Aparna, H. Shashirekha, MUCS@LT-EDI-EACL2021: CoHope - Hope speech detection for equality, diversity, and inclusion in code-mixed texts, in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, 2021, pp. 180–187.</p>
      <p>[9] K. Mahajan, E. Al-Hossami, S. Shaikh, TeamUNCC@LT-EDI-EACL2021: Hope speech detection using transfer learning with transformers, in: Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, 2021, pp. 136–142.</p>
      <p>[10] M. Arif, A. L. Tonja, I. Ameer, O. Kolesnikova, A. Gelbukh, G. Sidorov, A. G. M. Meque, CIC at CheckThat! 2022: Multi-class and cross-lingual fake news detection, Working Notes of CLEF (2022).</p>
      <p>[11] F. Balouchzahi, G. Sidorov, A. Gelbukh, PolyHope: Two-level hope speech detection from tweets, Expert Systems with Applications 225 (2023) 120078.</p>
      <p>[12] V. Gupta, R. Kumar, R. Pamula, IIT Dhanbad@LT-EDI-ACL2022 - Hope speech detection for equality, diversity, and inclusion, in: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, 2022, pp. 229–233.</p>
      <p>[13] D. García-Baena, M. Á. García-Cumbreras, S. M. Jiménez-Zafra, J. A. García-Díaz, R. Valencia-García, Hope speech detection in Spanish: The LGBT case, Language Resources and Evaluation (2023) 1–28.</p>
      <p>[14] S. M. Jiménez-Zafra, F. Rangel, M. Montes-y-Gómez, Overview of IberLEF 2023: Natural Language Processing Challenges for Spanish and other Iberian Languages, in: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2023), co-located with the 39th Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), CEUR-WS.org, 2023.</p>
      <p>[15] I. Lee, Y. J. Shin, Machine learning for enterprises: Applications, algorithm selection, and challenges, Business Horizons 63 (2020) 157–170.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          , G. Sidorov,
          <article-title>Detection of aggressive and violent incidents from social media in spanish using pre-trained language model</article-title>
          , in:
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Tash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ahani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gemeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hussain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <article-title>Word level language identification in code-mixed kannada-english texts using traditional machine learning algorithms</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Yigezu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Tash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Word level language identification in code-mixed kannada-english texts using deep learning approach</article-title>
          ,
          <source>in: Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jafari Meimandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heidarysafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Text classification algorithms: A survey</article-title>
          ,
          <source>Information</source>
          <volume>10</volume>
          (
          <year>2019</year>
          ). URL: https://www.mdpi.com/2078-2489/10/4/150. doi:10.3390/info10040150.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lambebo Tonja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Gemeda</given-names>
            <surname>Yigezu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kolesnikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Shahiki</given-names>
            <surname>Tash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sidorov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <article-title>Transformer-based model for word level language identification in code-mixed kannada-english texts</article-title>
          , arXiv e-prints (
          <year>2022</year>
          ) arXiv-
          <fpage>2211</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cervantes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Garcia-Lamont</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rodríguez-Mazahua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on support vector machine classification: Applications, challenges and trends</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>408</volume>
          (
          <year>2020</year>
          )
          <fpage>189</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Karamizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Abdullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Halimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shayan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Javad Rajabi</surname>
          </string-name>
          ,
          <article-title>Advantage and drawback of support vector machine functionality</article-title>
          , in: 2014 international conference on computer,
          <source>communications, and control technology (I4CT)</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>63</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Madhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rahman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>A comparative study of support vector machine and artificial neural network for option price prediction</article-title>
          ,
          <source>Journal of Computer and Communications</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>78</fpage>
          -
          <lpage>91</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <article-title>Learning from imbalanced data</article-title>
          ,
          <source>IEEE Transactions on knowledge and data engineering 21</source>
          (
          <year>2009</year>
          )
          <fpage>1263</fpage>
          -
          <lpage>1284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Multilingual hope speech detection in english and dravidian languages</article-title>
          ,
          <source>International Journal of Data Science and Analytics</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>389</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>