<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Detection of Hope Speech</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel García-Baena</string-name>
          <email>daniel.gbaena@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, SINAI research group, CEATIC, Universidad de Jaén</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Hope speech is a type of discourse that has the power to help, inspire people for good and even relax hostile environments. The automatic detection of hope speech is an open challenge in Natural Language Processing that has generally been eclipsed by hate speech detection. Simply deleting hate speech from the Internet restricts freedom of speech. Given the outstanding importance that psychology gives to hope, and the success of some social experiments that highlighted hope speech over other texts, we find it especially necessary to study in depth the automatic identification of hope speech. In this work, we describe a thesis project that focuses on the development of new datasets and systems that allow the automatic detection of hope speech, mainly in Spanish, by means of different classical machine learning techniques and new deep learning architectures.</p>
      </abstract>
      <kwd-group>
        <kwd>Hope speech</kwd>
        <kwd>natural language processing</kwd>
        <kwd>language that relaxes hostile environments</kwd>
        <kwd>language that</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] also introduces a possible variation of what is meant by hope speech, now taking into account
the ability of language to promote equality, diversity and inclusion (EDI) of women belonging
to the fields of science, technology, engineering and management (STEM); lesbian, gay, bisexual,
transgender, intersex and queer individuals (LGBTIQ); racial minorities; and individuals
with disabilities.
      </p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>This thesis aims to develop resources for automatically classifying hope
speech. To that end, we will create a new Spanish dataset for hope speech identification
and develop several systems for detecting hope speech.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Previous works</title>
      <p>As automatic hope speech detection is a recent task in Natural Language Processing (NLP), only
a few corpora are available. Until now, work on hope speech
identification has focused on developing new datasets for English, Malayalam and Tamil,
and on automatic detection systems based on classic machine learning strategies and modern deep
learning architectures. Both are discussed below.</p>
      <sec id="sec-2-1">
        <title>2.1. HopeEDI</title>
        <p>
          The HopeEDI dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] contains comments in English, Malayalam and Tamil. It consists of
comments posted on YouTube videos, collected from November
2019 to June 2020. The corpus can be downloaded free of charge from Hugging Face:
https://huggingface.co/datasets/hope_edi.
        </p>
        <p>The subject matter of the comments written in English is EDI (Equality, Diversity and
Inclusion). In this case, the comments come from videos posted by Indian and Sri Lankan users.
It is important to note that since India is a multilingual country, many of the comments may be
written in several languages at the same time (code-mixing).</p>
        <p>For the HopeEDI corpus, its author applied different machine learning algorithms to a
TF-IDF (Term Frequency-Inverse Document Frequency) representation of the tokens. Specifically,
the corpus was evaluated with the following classifiers: multinomial
Naïve Bayes (MNB) with alpha equal to 0.7, k-nearest neighbors, support
vector machine (SVM), decision tree (DT) and logistic regression (LR). None of these
techniques scored an F1 value better than 0.56, so the results
were quite disappointing.</p>
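        <p>The MNB baseline described above can be illustrated with a short, standard-library-only sketch. It is not the author's original code: the whitespace tokenizer, the smoothed idf formula, the toy comments and the labels hope and not_hope are all illustrative assumptions. Only the smoothing value alpha = 0.7 comes from the text.</p>
        <preformat>
```python
import math
from collections import Counter, defaultdict

ALPHA = 0.7  # additive smoothing value reported for the MNB baseline

# Toy comments and labels, invented for illustration only.
DOCS = [
    "we stand with you and support you",
    "stay strong things will get better",
    "nobody cares about you",
    "this video is boring",
]
LABELS = ["hope", "hope", "not_hope", "not_hope"]


def train_mnb(docs, labels, alpha=ALPHA):
    """Fit multinomial Naive Bayes on TF-IDF weights instead of raw counts."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed idf (an assumption): tokens seen in every document keep weight 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1 for tok, df in doc_freq.items()}

    # Sum the TF-IDF weight of every token per class.
    weight_sums = defaultdict(Counter)
    for toks, label in zip(tokenized, labels):
        for tok, tf in Counter(toks).items():
            weight_sums[label][tok] += tf * idf[tok]

    priors = Counter(labels)
    classes = {}
    for label, weights in weight_sums.items():
        total = sum(weights.values()) + alpha * len(idf)
        classes[label] = {
            "log_prior": math.log(priors[label] / n_docs),
            # Smoothing with alpha gives unseen tokens a small non-zero likelihood.
            "log_like": {tok: math.log((weights[tok] + alpha) / total) for tok in idf},
        }
    return {"idf": idf, "classes": classes}


def predict_mnb(model, doc):
    """Return the class with the highest posterior log-score for one comment."""
    idf = model["idf"]
    # Tokens outside the training vocabulary are ignored.
    counts = Counter(tok for tok in doc.lower().split() if tok in idf)

    def score(params):
        return params["log_prior"] + sum(
            tf * idf[tok] * params["log_like"][tok] for tok, tf in counts.items()
        )

    return max(model["classes"], key=lambda label: score(model["classes"][label]))


model = train_mnb(DOCS, LABELS)
print(predict_mnb(model, "stay strong we support you"))
```
        </preformat>
        <p>On real data the same idea would be evaluated on a held-out split; with four toy comments the sketch only shows how the TF-IDF weights and the alpha smoothing enter the class likelihoods.</p>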
      </sec>
      <sec id="sec-2-2">
        <title>2.2. India-Pakistan</title>
        <p>
          This dataset contains English comments posted on YouTube videos [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The
researchers chose this site as the source of the data because it is the most widely used video
broadcasting platform in India and Pakistan today. Unfortunately, the dataset is not publicly
available.
        </p>
        <p>To compile the dataset, a series of queries was prepared and then extended with searches
related to the Kashmir conflict, obtained by consulting search trends in India and Pakistan
between February 14, 2019 and March 13, 2019. These queries were then used to search for
related videos on YouTube, whose comments were retrieved through the public API of
that social network.</p>
        <p>The comments are all written in English and come mainly from Indian and Pakistani users.
There are also comments submitted by immigrants from India and Pakistan who were living in
Bangladesh, Nepal, the United States, the United Kingdom, Afghanistan, China, Canada and Russia.
The origin of the users was taken into account in order to maintain an
equal representation of citizens from both sides of the conflict.</p>
        <p>This time, the authors used a logistic regression classifier with L2 regularization (the same
penalty as in ridge regression). The experiment was run one hundred times on one hundred random
splits of the dataset and achieved an F1 value of 0.79.</p>
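        <p>The evaluation protocol described above, an L2-regularized logistic regression scored over one hundred random runs, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that each run draws a fresh random train/test split, and the bag-of-words features, the hyperparameters (l2, lr, epochs, test_ratio) and the toy comments are hypothetical.</p>
        <preformat>
```python
import math
import random
from collections import Counter

# Toy comments with hypothetical labels (1 = hope speech, 0 = not hope speech).
TEXTS = [
    "we want peace between both countries",
    "no more war let us live in peace",
    "love and peace from the other side",
    "hope both nations become friends",
    "they started this war",
    "we will never forgive them",
    "this channel is boring",
    "they only spread lies",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def featurize(text):
    """Bag-of-words counts; a stand-in for whatever features the authors used."""
    return Counter(text.lower().split())


def sigmoid(z):
    # Clamp the logit so math.exp never overflows.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logreg(samples, labels, l2=0.1, lr=0.5, epochs=100):
    """Binary logistic regression fitted by gradient descent with an L2 penalty."""
    weights, bias, n = Counter(), 0.0, len(samples)
    for _ in range(epochs):
        grad, grad_bias = Counter(), 0.0
        for feats, y in zip(samples, labels):
            err = sigmoid(bias + sum(weights[t] * v for t, v in feats.items())) - y
            grad_bias += err / n
            for t, v in feats.items():
                grad[t] += err * v / n
        for t in grad:
            # The l2 * weights[t] term is the gradient of the L2 penalty.
            weights[t] -= lr * (grad[t] + l2 * weights[t])
        bias -= lr * grad_bias
    return weights, bias


def predict(model, feats):
    weights, bias = model
    return 1 if bias + sum(weights[t] * v for t, v in feats.items()) > 0 else 0


def f1_score(gold, pred):
    """F1 of the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def repeated_split_f1(texts, labels, runs=100, test_ratio=0.25, seed=0):
    """Average F1 over `runs` independent random train/test splits."""
    rng = random.Random(seed)
    samples = [featurize(t) for t in texts]
    indices = list(range(len(samples)))
    n_test = max(1, round(test_ratio * len(samples)))
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        model = train_logreg([samples[i] for i in train], [labels[i] for i in train])
        pred = [predict(model, samples[i]) for i in test]
        scores.append(f1_score([labels[i] for i in test], pred))
    return sum(scores) / len(scores)


mean_f1 = repeated_split_f1(TEXTS, LABELS)
print(f"mean F1 over 100 random splits: {mean_f1:.3f}")
```
        </preformat>
        <p>Averaging over many random splits reduces the dependence of the reported score on any single partition, which matters for small or imbalanced datasets.</p>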
      </sec>
      <sec id="sec-2-3">
        <title>2.3. KanHope</title>
        <p>
          The KanHope dataset [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] contains comments in code-mixed Kannada-English. All data were collected
with the YouTube Comment Scraper tool between February 2020 and August 2020. The dataset
is publicly available on Hugging Face: https://huggingface.co/datasets/kan_hope.
        </p>
        <p>KanHope gathers comments from several videos on distinct topics such as movie trailers,
the India-China border dispute, people’s opinions about the ban on several mobile apps in India,
the Mahabharata and other social issues involving oppression, marginalization and mental health.
The KanHope authors emphasize the inclusion of people from marginalized communities,
such as LGBT, racial and gender minorities. All comments were from users based in India and,
since it is a multilingual country, the researchers were motivated to extract comments in order to work on
code-mixed texts.</p>
        <p>The corpus authors applied approaches ranging from classical machine learning to complex deep learning.
The DC-BERT4HOPE (roberta-mbert) model obtained the best F1-score,
0.752, followed by DC-BERT4HOPE (bert-mbert) with 0.735, mBERT with 0.726, DC-BERT4HOPE
(roberta-xlm) with 0.720 and random forest with 0.706.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. SpanishHopeEDI</title>
        <p>
          Finally, we have created SpanishHopeEDI [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], a new Spanish Twitter corpus
on the LGBT community, and we have conducted some experiments that can serve as a baseline
for further research. The dataset consists of 1,650 LGBT-related tweets annotated as HS (Hope
Speech) or NHS (Non Hope Speech). A tweet is considered HS if the text:
        </p>
        <p>1. Explicitly supports the social integration of minorities.</p>
        <p>2. Is a positive inspiration for the LGTBI community.</p>
        <p>3. Explicitly encourages LGTBI people who might find themselves in a situation or unconditionally promotes tolerance.</p>
        <p>On the contrary, a tweet is marked as NHS if the text:</p>
        <p>1. Expresses negative sentiment towards the LGTBI community.</p>
        <p>2. Explicitly seeks violence or uses gender-based insults.</p>
        <p>
          The dataset was created from LGBT-related tweets, all written in
Spanish and collected using the Twitter API. As seeds for the search we used a lexicon of
LGBT-related terms, such as #OrgulloLGTBI and #LGTB. In addition, it should be mentioned that
our SpanishHopeEDI dataset was included in the Second Workshop on Language Technology
for Equality, Diversity and Inclusion, held as part of ACL 2022 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Description, hypotheses and objectives</title>
      <p>
        EDI is an important issue in many different areas. Language is a fundamental tool for
communication and it must be inclusive and treat everyone equally. However, this is not always the case on social
media, as more and more offensive messages are posted against people because of
their race, color, ethnicity, gender, sexual orientation, nationality or religion. As Chakravarthi
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] stated, social media plays an essential role in the lives of vulnerable groups, such as
people belonging to the LGBT community, racial minorities or individuals with disabilities,
shaping their personalities and how they perceive society [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ].
Therefore, it is important to research the inclusion of these people and to
promote positive content on social media, in pursuit of EDI.
      </p>
      <p>
        The importance of hope has already been carefully studied by psychologists and, consequently,
we can affirm that hope plays a crucial role in the well-being, recovery and restoration of
humans [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Greater hope is consistently related to better outcomes in academics, athletics, physical health,
psychological adjustment and psychotherapy. In general, Hope Theory is comparable
to theories of Learned Optimism, Optimism, Self-Efficacy and Self-Esteem [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>Individuals with high levels of hope do not react to barriers in the same way as those with low
levels of hope; instead, they view barriers as challenges to overcome and use their pathway
thoughts to plan an alternative route to their goals [11, 12]. In addition, high levels of hope have
been found to correlate with a number of beneficial outcomes, such as academic performance
[13] and lower levels of depression [14]. In contrast, low levels of hope are associated with
negative outcomes, such as reduced well-being [15].</p>
      <p>Therefore, it is relevant to analyze the state of the art of automated hope speech detection
technologies from the perspective of NLP. Automated detection of hope speech
can be especially useful for disseminating hopeful messages to those going through difficult
times and for promoting positive messages that support EDI. Previous studies have
shown that a snowball effect occurs in social media: abusive comments lead to more abusive
comments, and positive comments inspire people to leave more positive comments [16, 17]. To
study this, Facebook conducted an experiment by modifying its Newsfeed algorithm to
show more positive or negative posts to certain users [18]. The results showed that people
tend to write positive posts when they see happy posts in their newsfeeds and vice versa. All
this suggests the importance of reinforcing positivity on social media by promoting
hope speech.</p>
      <p>Hence, we consider it important to pursue the following objectives:
1. To study the concept of hope speech theoretically, as well as its treatment from an NLP
point of view.
2. To analyze the existing hope speech detection solutions and discuss the
problems derived from them.
3. To review all available resources, providing experiences and an accessible
introduction for researchers who may be interested in tackling this problem.
4. To build a new dataset focused on the LGBT community for Spanish hope speech detection.
5. To create baseline experiments using machine learning and deep learning algorithms,
including cutting-edge technologies such as transformer models.
6. To carry out an extensive error analysis in order to determine future directions of
this study.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>The methodology proposed to achieve the objectives of this thesis is detailed
below:
1. First, it is necessary to carefully review the state of the art in hope speech classification.
Therefore, it will be important to evaluate both the existing corpora and the existing classification
systems.
2. Second, we will start from some of the currently available resources for hope
speech detection and develop new ones in order to make it possible to
detect hope speech in texts written in Spanish.
3. To that end, we will create a new corpus, containing texts written only in Spanish
and focused on EDI.
4. Then, we will create different systems that use this dataset to
automatically identify hope speech texts.
5. Finally, we will experiment with and evaluate our new resources in order to improve
them, always sharing our work with the scientific community by publishing all the results
and organizing shared tasks.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Research questions</title>
      <p>The main research questions that we aim to answer with this work are listed
below:
• How similar are the tasks of detecting hope speech and hate speech?
• Is it possible to write unambiguous annotation guidelines for hope speech?
• Do annotation guidelines for hope speech corpora depend on the language in which the texts
of the dataset were written?
• Is it interesting, or useful, to create hope speech datasets in order to
detect hope speech automatically?
• What can we learn from the existing datasets for hope speech detection in
languages other than Spanish?
• How could we improve new classification systems?
• In relation to hope speech detection, is it viable to identify the main causes of
classification errors?
• Which algorithms are best for the automatic detection of hope speech?</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by Project CONSENSO (PID2021-122263OB-C21), Project
MODERATES (TED2021-130145B-I00) and Project SocialTox (PDC2022-133146-C21) funded
by MCIN/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR,
Project PRECOM (SUBV-00016) funded by the Ministry of Consumer Affairs of the Spanish
Government, Project FedDAP (PID2020-116118GA-I00) supported by MICINN/AEI/10.13039/501100011033,
WeLee project (1380939, FEDER Andalucía 2014-2020) funded by the Andalusian Regional
Government and by a grant from Fondo Social Europeo and the Administration of the Junta de
Andalucía (DOC_01073).</p>
      <p>[11] C. R. Snyder, The psychology of hope: You can get there from here, Simon and Schuster, 1994.</p>
      <p>[12] C. R. Snyder, Hypothesis: There is hope, in: Handbook of hope, Elsevier, 2000, pp. 3–21.</p>
      <p>[13] C. R. Snyder, H. S. Shorey, J. Cheavens, K. M. Pulvers, V. H. Adams III, C. Wiklund, Hope and academic success in college, Journal of educational psychology 94 (2002) 820.</p>
      <p>[14] C. R. Snyder, B. Hoza, W. E. Pelham, M. Rapoff, L. Ware, M. Danovsky, L. Highberger, H. Rubinstein, K. J. Stahl, The development and validation of the children’s hope scale, Journal of pediatric psychology 22 (1997) 399–421.</p>
      <p>[15] E. Diener, Subjective well-being, The science of well-being (2009) 11–58.</p>
      <p>[16] A. Sundar, A. Ramakrishnan, A. Balaji, T. Durairaj, Hope speech detection for Dravidian languages using cross-lingual embeddings with stacked encoder architecture, SN Computer Science 3 (2022) 1–15.</p>
      <p>[17] L. Muchnik, S. Aral, S. J. Taylor, Social influence bias: A randomized experiment, Science 341 (2013) 647–651.</p>
      <p>[18] A. D. Kramer, J. E. Guillory, J. T. Hancock, Experimental evidence of massive-scale emotional contagion through social networks, Proceedings of the National Academy of Sciences 111 (2014) 8788–8790.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Palakodety</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>KhudaBukhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Carbonell</surname>
          </string-name>
          ,
          <article-title>Hope speech detection: A computational analysis of the voice of peace</article-title>
          , arXiv preprint arXiv:1909.12940 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>HopeEDI: A multilingual hope speech detection dataset for equality, diversity, and inclusion</article-title>
          , in:
          <source>Proceedings of the Third Workshop on Computational Modeling of People's Opinions, Personality, and Emotion's in Social Media</source>
          , Association for Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>53</lpage>
          . URL: https://aclanthology.org/2020.peoples-1.5.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Snyder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. S.</given-names>
            <surname>Shorey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. L.</given-names>
            <surname>Rand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Feldman</surname>
          </string-name>
          ,
          <article-title>Hope theory, measurements, and applications to school psychology</article-title>
          ,
          <source>School psychology quarterly</source>
          <volume>18</volume>
          (
          <year>2003</year>
          )
          <fpage>122</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sampath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Thamburaj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <article-title>Hope speech detection in under-resourced Kannada language</article-title>
          ,
          <year>2021</year>
          . arXiv:2108.04616.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Baena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <article-title>Hope speech detection in Spanish</article-title>
          ,
          <source>Language Resources and Evaluation</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          . doi:10.1007/s10579-023-09638-3.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. R.</given-names>
            <surname>Chakravarthi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muralidaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Priyadharshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Chinnaudayar</given-names>
            <surname>Navaneethakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCrae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>García-Cumbreras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Jiménez-Zafra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Valencia-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. Kumar</given-names>
            <surname>Kumaresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ponnusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>García-Baena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          ,
          <article-title>Overview of the shared task on hope speech detection for equality, diversity, and inclusion</article-title>
          , Association for Computational Linguistics (
          <year>2022</year>
          )
          <fpage>378</fpage>
          -
          <lpage>388</lpage>
          . URL: https://aclanthology.org/2022.ltedi-1.58. doi:10.18653/v1/2022.ltedi-1.58.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kitzie</surname>
          </string-name>
          ,
          <article-title>I pretended to be a boy on the internet: Navigating affordances and constraints of social networking sites and search engines for LGBTQ+ identity work</article-title>
          ,
          <source>First Monday</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Burnap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Colombo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Amery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hodorog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Scourfield</surname>
          </string-name>
          ,
          <article-title>Multi-class machine classification of suicide-related communication on Twitter</article-title>
          ,
          <source>Online social networks and media</source>
          <volume>2</volume>
          (
          <year>2017</year>
          )
          <fpage>32</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. N.</given-names>
            <surname>Milne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hachey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Calvo</surname>
          </string-name>
          ,
          <article-title>CLPsych 2016 shared task: Triaging content in online peer-support forums</article-title>
          ,
          <source>in: Proceedings of the third workshop on computational linguistics and clinical psychology</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Snyder</surname>
          </string-name>
          ,
          <article-title>Hope theory: Rainbows in the mind</article-title>
          .,
          <source>Psychological Inquiry</source>
          <volume>13</volume>
          (
          <year>2002</year>
          )
          <fpage>249</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>