<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Negation handling for Amharic sentiment classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Girma Neshir</string-name>
          <email>girma1978@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Rauber</string-name>
          <email>rauber@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Solomon Atnafu</string-name>
          <email>solomon.atnafu@aau.edu.et</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Addis Ababa University, Department of Computer Science</institution>
          ,
          <country country="ET">Ethiopia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Addis Ababa University, IT Doctoral Program</institution>
          ,
          <country country="ET">Ethiopia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technical University of Vienna, Institute of Information Systems Engineering</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Introduction: With the advancement of World Wide Web technology, users commonly express their feelings, emotions and opinions as comments on posted news, photos, audio and video. Opinionated content is increasingly produced in languages other than English. However, research on Amharic sentiment analysis is scarce, because the language lacks sufficient linguistic resources for preprocessing and sentiment analysis. Lexicon-based sentiment analysis faces several challenges, one of which is handling negation in the text. The most common approach to negation handling relies on negation keywords. However, identifying the scope of negation, that is, the part of the text affected by the presence of a negation word, is difficult. To the best of our knowledge, negation handling (NH) has never been studied for Amharic. This research therefore develops an automatic method to handle negation and combines it with character n-gram features for Amharic sentiment classification. The research questions addressed in this work are: (a) How can we automatically detect negation words in Amharic texts? (b) How can we design a framework for handling negation in Amharic sentiment analysis? (c) How can character-level n-gram features be captured to improve Amharic sentiment analysis in social media (e.g. Facebook)? (d) How can we evaluate the performance of the framework? Proposed Approach: As part of preprocessing, we normalized both the Amharic words in the Amharic news comments and the entries of the Amharic sentiment lexicon by replacing different letters that represent the same sound with a single identical symbol. Because Amharic is morphologically rich, we also applied a stemmer, after negation identification is completed, to reduce mismatches between Amharic words during string comparison.
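A minimal Python sketch of this normalization step follows. The homophone groups are an assumption, since the text does not list them: the sketch folds the ሐ and ኀ letter families into ሀ, ሠ into ሰ, ዐ into አ, and ፀ into ጸ. Because each Ethiopic letter family occupies consecutive Unicode code points for its seven vowel orders, one mapping per family covers all of its forms. The names <monospace>HOMOPHONE_BASES</monospace> and <monospace>normalize</monospace> are hypothetical.
<preformat preformat-type="code">
```python
# Minimal sketch of Amharic homophone normalization (assumed mapping).
# Each Ethiopic letter family occupies consecutive code points for its seven
# vowel orders, so mapping a family's base letter fixes all seven forms.

# (variant base letter, canonical base letter): hypothetical choice of groups
HOMOPHONE_BASES = [
    ("\u1210", "\u1200"),  # ሐ family to ሀ family
    ("\u1280", "\u1200"),  # ኀ family to ሀ family
    ("\u1220", "\u1230"),  # ሠ family to ሰ family
    ("\u12d0", "\u12a0"),  # ዐ family to አ family
    ("\u1340", "\u1338"),  # ፀ family to ጸ family
]


def build_normalization_table(pairs=HOMOPHONE_BASES):
    """Return a str.translate table covering every vowel order of each variant."""
    table = {}
    for variant, canonical in pairs:
        for order in range(7):  # the seven vowel orders of a letter family
            table[ord(variant) + order] = chr(ord(canonical) + order)
    return table


NORMALIZATION_TABLE = build_normalization_table()


def normalize(text):
    """Replace homophone letters with their canonical form; other text is unchanged."""
    return text.translate(NORMALIZATION_TABLE)
```
</preformat>
Applying <monospace>normalize</monospace> to both the comments and the lexicon entries makes variant spellings such as ሠላም and ሰላም (“peace”) compare equal.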
The proposed framework consists of three components: preprocessing, sentiment score calculation using negation detection, and machine learning using character-level n-gram features; it is shown in Fig. 1 of the Appendix. To compute the sentiment score using negation detection, each stemmed word <italic>w<sub>ij</sub></italic> of an Amharic news comment <italic>C<sub>i</sub></italic> is looked up in the Amharic sentiment lexicons (Manual, SOCAL, SWN) [<xref ref-type="bibr" rid="ref7">7</xref>]; if it is found in any of them, its sentiment score <italic>s<sub>ij</sub></italic> is retrieved and stored together with the word's position index in the comment. To compute the sentiment of the comment, we apply positional weighting inversion if the comment contains any negation clue; if no negation clue is found, the word scores are simply summed. The negation handling algorithm is given in Listing 1 of the Appendix.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>
        Besides the lexicon-based negation handling approach, character-aware language models are well suited to language identification and to reducing the sparsity and dimensionality of text features, and they help to handle spelling errors, abbreviations, special characters, etc.
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We therefore propose character-level n-gram features, rather than word-level n-gram features, to address these issues in the sentiment classification of Amharic Facebook news comments. For example, the negation-carrying Amharic word “አልወደውም” (“I do not like him”) has the character-level 2-gram features አ-አል-ልወ-ወደ-ደው-ውም-ም and the character-level 3-gram features አ-አል-አልወ-ልወደ-ወደው-ደውም-ውም-ም. The negation marker (morpheme) አል- is thus detected as a feature of the negation word “አልወደውም”. Prior to char n-gram based sentiment classification, we partition the Annotated Amharic Facebook News Comment corpus into training and test sets. Logistic regression (LR) and naïve Bayes (NB) models are then built on the term frequency-inverse document frequency (tf-idf) weights of the character-level and word-level (baseline) bigram and trigram features of the training set.
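The padded character n-grams listed in the example and the tf-idf weighting that the LR and NB models are trained on can be sketched in Python as follows. Two choices are assumptions rather than details given in the text: each word is padded with n-1 boundary markers (written as # and stripped afterwards), which is our reading of the example because it reproduces both lists exactly; and the smoothed idf with L2 normalization follows scikit-learn's defaults. The names <monospace>char_ngrams</monospace>, <monospace>comment_features</monospace> and <monospace>tfidf</monospace> are hypothetical.
<preformat preformat-type="code">
```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary marker, stripped from the features


def char_ngrams(word, n):
    """Character n-grams of a word padded with n-1 boundary markers per side."""
    padded = BOUNDARY * (n - 1) + word + BOUNDARY * (n - 1)
    grams = (padded[i:i + n] for i in range(len(padded) - n + 1))
    return [gram.strip(BOUNDARY) for gram in grams]


def comment_features(comment, ngram_range=(2, 3)):
    """Count the char n-grams of every whitespace-separated word in a comment."""
    counts = Counter()
    for word in comment.split():
        for n in range(ngram_range[0], ngram_range[1] + 1):
            counts.update(char_ngrams(word, n))
    return counts


def tfidf(corpus, ngram_range=(2, 3)):
    """L2-normalized tf-idf vectors (as dicts) using a smoothed idf."""
    docs = [comment_features(comment, ngram_range) for comment in corpus]
    n_docs = len(docs)
    df = Counter()
    for counts in docs:
        df.update(counts.keys())  # each n-gram counted once per comment
    # Smoothed idf, matching scikit-learn's default: ln((1 + N) / (1 + df)) + 1
    idf = {gram: math.log((1 + n_docs) / (1 + d)) + 1 for gram, d in df.items()}
    vectors = []
    for counts in docs:
        vec = {gram: tf * idf[gram] for gram, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({gram: w / norm for gram, w in vec.items()})
    return vectors
```
</preformat>
Here <monospace>char_ngrams("አልወደውም", 2)</monospace> returns the seven 2-gram features listed in the example. scikit-learn's <monospace>TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))</monospace> is a ready-made alternative, but it pads each word with a single space, so its boundary n-grams differ slightly from the lists in the text.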
Results: The accuracies of the individual and combined models for Amharic sentiment classification on the test set are presented in Table 1 of the Appendix. The results in Table 1 show that the negation handling algorithm alone achieves an accuracy of 86.2% in classifying the sentiment of Amharic texts. The character-level n-gram classifier is more effective for classifying Amharic sentiment than the word-level n-gram models (baseline) (accuracy of 95.27%). Finally, the hybrid model, obtained by combining the negation handling approach with the char n-gram models (NB+LR), outperforms the other models and their combinations, with an accuracy of 98%. It is, however, difficult to determine why errors occur in predicting the sentiment category of Amharic Facebook comments. For example, the Facebook comment “በቃል የሚነገሩ ነገሮችን በተግባር እንዲፈፀሙልን እንፈልጋለን” (“We need to see what we heard in words accomplished in practice”) is wrongly predicted. This comment does not express any opinion; comments of this kind express a wish that something be done, without necessarily expressing sentiment.
Further research is needed to reduce the sources of error in predicting the sentiment class of Amharic comments. Our recent findings are a good starting point for improving Amharic sentiment analysis of Facebook news comments. Fine-tuned char n-gram features prove more suitable and flexible than word-level n-gram models for sentiment analysis of a resource-limited language such as Amharic.
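The text does not specify how the negation handling score and the NB and LR predictions are combined into the hybrid model. As a purely illustrative assumption, the Python sketch below uses a majority vote over the three components, mapping the lexicon score to a label by its sign; the label names, the zero threshold, the tie-breaking rule and the function names are hypothetical.
<preformat preformat-type="code">
```python
from collections import Counter


def lexicon_label(score):
    """Map a negation-adjusted lexicon score to a label (zero threshold assumed)."""
    return "positive" if score >= 0 else "negative"


def hybrid_vote(lexicon_score, nb_label, lr_label):
    """Majority vote over the lexicon, NB and LR predictions (hypothetical scheme)."""
    votes = Counter([lexicon_label(lexicon_score), nb_label, lr_label])
    label, count = votes.most_common(1)[0]
    # If all three components disagree (possible with more than two classes),
    # fall back to the LR prediction; this tie-break is an arbitrary choice.
    return label if count > 1 else lr_label
```
</preformat>
For example, a comment with a negative negation-adjusted score that NB labels positive and LR labels negative is classified as negative, two votes to one. A weighted vote or a stacked classifier would be equally plausible readings of the text.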
Conclusions: In general, building extensive linguistic resources for sentiment classification in less dominant languages (e.g. Amharic) is expensive. To mitigate this problem, we proposed a negation handling approach and a char n-gram approach for sentiment analysis of Amharic Facebook news comments. So far, the proposed approach still lacks accuracy in Amharic sentiment classification. It may not sufficiently capture the language-specific features that help to identify the sentiment class of Amharic news comments in social media. Further work should be performed to reduce errors in the sentiment analysis of Amharic Facebook news comments. To address these issues, we may need to consider char n-gram embedding features learned from a corpus of the same domain (e.g. Facebook news comments). In addition, Amharic negation scope identification and handling should be investigated in future research.
      </p>
      <p>Appendix: List of figures, tables and algorithm listings</p>
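      <p>The negation-aware scoring step described in the Proposed Approach can be sketched in Python as follows; this is not the authors' Listing 1. The text does not define "positional weighting inversion", so the sketch assumes one reading: a lexicon word found within a window before a negation clue has its score sign-inverted and scaled by a weight that decays linearly with its distance from the clue. Looking backwards from the clue reflects the fact that Amharic is verb-final and marks negation on the verb (for example the prefix አል- with the suffix -ም, as in አልወደውም) or with a clause-final negative copula such as አይደለም ("is not"). The negation-clue heuristic, the window size of 3, the stemmer argument and all function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of negation-aware lexicon scoring (assumed reading of
# "positional weighting inversion"; not the authors' Listing 1).

NEGATION_WORDS = {"አይደለም", "የለም"}  # hypothetical list: "is not", "there is not"
NEGATION_PREFIXES = ("አል", "አይ")     # hypothetical negative verb prefixes al-, ay-


def is_negation_clue(word):
    """Heuristic test applied to the unstemmed surface word."""
    if word in NEGATION_WORDS:
        return True
    # Negated verbs typically combine a prefix such as al- or ay- with the suffix -m.
    return word.startswith(NEGATION_PREFIXES) and word.endswith("ም")


def comment_score(words, lexicon, stem=lambda word: word, window=3):
    """Negation-aware sentiment score of one tokenized comment."""
    # Negation clues are found before stemming, as in the described pipeline.
    clue_positions = [j for j, word in enumerate(words) if is_negation_clue(word)]
    hits = []  # (position index, lexicon score) for every word found in the lexicon
    for i, word in enumerate(words):
        key = stem(word)
        if key in lexicon:
            hits.append((i, lexicon[key]))
    if not clue_positions:
        return sum(score for _, score in hits)  # no negation: plain sum
    total = 0.0
    for i, score in hits:
        # Amharic is verb-final, so look for clues at or after the word.
        distances = [j - i for j in clue_positions if j - i in range(window + 1)]
        if distances:
            weight = (window + 1 - min(distances)) / (window + 1)  # 1.0 at distance 0
            total -= score * weight  # invert the polarity, scaled by proximity
        else:
            total += score
    return total
```
      </preformat>
      <p>For example, with a toy lexicon in which ጥሩ (“good”) scores 1.0, the tokens ጥሩ አይደለም (“it is not good”) score -0.75, because the clue is one position after the sentiment word, while ጥሩ ነው (“it is good”) scores 1.0.</p>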
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Rizkiana</given-names>
            <surname>Amalia</surname>
          </string-name>
          , Moch Arif Bijaksana, and
          <string-name>
            <given-names>Dhinta</given-names>
            <surname>Darmantoro</surname>
          </string-name>
          .
          <article-title>Negation handling in sentiment classification using rule-based adapted from Indonesian language syntactic for Indonesian text in Twitter</article-title>
          .
          In
          <source>Journal of Physics: Conference Series</source>
          , volume
          <volume>971</volume>
          , page 012039. IOP Publishing,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Amna</given-names>
            <surname>Asmi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Tanko</given-names>
            <surname>Ishaya</surname>
          </string-name>
          .
          <article-title>Negation identification and calculation in sentiment analysis</article-title>
          .
          In
          <source>The Second International Conference on Advances in Information Mining and Management</source>
          , pages
          <fpage>1</fpage>
          --
          <lpage>7</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Claudia</given-names>
            <surname>Diamantini</surname>
          </string-name>
          , Alex Mircoli, and
          <string-name>
            <given-names>Domenico</given-names>
            <surname>Potena</surname>
          </string-name>
          .
          <article-title>A negation handling technique for sentiment analysis</article-title>
          .
          In
          <source>2016 International Conference on Collaboration Technologies and Systems (CTS)</source>
          , pages
          <fpage>188</fpage>
          --
          <lpage>195</lpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Martine</given-names>
            <surname>Enger</surname>
          </string-name>
          , Erik Velldal, and
          <string-name>
            <given-names>Lilja</given-names>
            <surname>Øvrelid</surname>
          </string-name>
          .
          <article-title>An open-source tool for negation detection: a maximum-margin approach</article-title>
          .
          In
          <source>Proceedings of the Workshop Computational Semantics Beyond Events and Roles</source>
          , pages
          <fpage>64</fpage>
          --
          <lpage>69</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Umar</given-names>
            <surname>Farooq</surname>
          </string-name>
          , Hasan Mansoor, Antoine Nongaillard, Yacine Ouzrout, and Muhammad Abdul Qadir.
          <article-title>Negation handling in sentiment analysis at sentence level</article-title>
          .
          <source>JCP</source>
          ,
          <volume>12</volume>
          (
          <issue>5</issue>
          ):
          <fpage>470</fpage>
          --
          <lpage>478</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Bas</given-names>
            <surname>Heerschop</surname>
          </string-name>
          , Paul van Iterson, Alexander Hogenboom, Flavius Frasincar, and
          <string-name>
            <given-names>Uzay</given-names>
            <surname>Kaymak</surname>
          </string-name>
          .
          <article-title>Accounting for negation in sentiment analysis</article-title>
          .
          In
          <source>11th Dutch-Belgian Information Retrieval Workshop (DIR 2011)</source>
          , pages
          <fpage>38</fpage>
          --
          <lpage>39</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Girma Neshir</given-names>
            <surname>Alemneh</surname>
          </string-name>
          , Andreas Rauber, and
          <string-name>
            <given-names>Solomon</given-names>
            <surname>Atnafu</surname>
          </string-name>
          .
          <article-title>Dictionary Based Amharic Sentiment Lexicon Generation</article-title>
          , pages
          <fpage>311</fpage>
          --
          <lpage>326</lpage>
          . 08
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>