CEUR Workshop Proceedings Vol-1988, LPKM 2017, paper 6 — https://ceur-ws.org/Vol-1988/LPKM2017_paper_6.pdf
  Document embeddings for Arabic Sentiment Analysis

 Amira Barhoumi1,2 Yannick Estève1 Chafik Aloulou2                  Lamia Hadrich Belguith2
                      (1) LIUM, Université du Maine, 72000 Le Mans
            amira.barhoumi.etu@univ-lemans.fr , yannick.esteve@univ-lemans.fr
                          (2) MIRACL, Université de Sfax, Tunisie
     amirabarhoumi29@gmail.com , chafik.aloulou@fsegs.rnu.tn , l.belguith@fsegs.rnu.tn




       Abstract. Research and industry are focusing more and more on automatically
       finding the polarity of an opinion regarding a specific subject or entity. Paragraph
       vector has recently been proposed to learn embeddings, which have been leveraged
       for English sentiment analysis. This paper focuses on Arabic sentiment analysis and
       investigates the use of paragraph vector within machine learning techniques to
       determine the polarity of a given text. We tested several preprocessing methods, and
       we show that light stemming enhances the performance of classification.

       Keywords: Sentiment analysis, document embedding, paragraph vector, Arabic
       language.



1   Introduction

With the widespread use of the Internet and the revolution of social networks, anyone
can voice an opinion and express feelings and emotions regarding various topics, prod-
ucts, ideas, persons, etc. Many academic and industrial efforts focus on analyzing
opinions and sentiments by investigating automatic techniques to extract the relevant
information.
    Sentiment Analysis (SA) involves building systems that recognize the opinion ex-
pressed in a textual unit. It aims mainly to identify the subjectivity and polarity of a
given text. Generally, the polarity is positive, negative or neutral, with or without a
strength score. SA and its applications have spread to many languages, and most of
the work deals with Indo-European ones. Indeed, a large body of research targets the
English language, while few works address Arabic. In this work, we are interested in
the Arabic language.
    In this paper, we present an application of sentiment analysis to the Arabic lan-
guage. The main contributions of this work are as follows: (1) we measure the effi-
ciency of distributed representations for Arabic sentiment analysis (ASA), and (2) we
evaluate the performance of neural techniques for sentiment classification.
    The rest of the paper is structured as follows. Section 2 discusses related work
dedicated mainly to Arabic. In section 3, we present our methodology for ASA.
We report, in section 4, our experimental framework and discuss the obtained results.
Finally, we conclude in section 5 and outline future work.
2   Related work

Nowadays, sentiment analysis is attracting growing interest1 due to the explosion of the
number of Internet users and the proliferation of social networks.
     Most SA research has been carried out for the English language, and few works
exist for other languages. Recently, there has been a considerable effort to develop SA
systems for the Arabic language. In this section, we focus on works dedicated to ASA.
Most existing methods in sentiment analysis can be divided into three categories: the
knowledge based approach, the machine learning based approach and the hybrid
approach.
     The knowledge based approach uses lexicons or patterns. [4] proposed an approach
based on a local grammar which contains patterns that extract sentiment from a given
document. [3] followed the same pattern-based approach. Among lexicon-based works,
we cite [5], who manually constructed a lexicon of 4815 words (1942 positive words
and 2873 negative ones). Their system computes the number of positive and negative
words in a text in order to generate the overall polarity. Another work is that of [6],
who implemented a tool which determines the subjectivity, the polarity and the strength
of an opinion. They used two general lexicons and 16 specific lexicons (8 for positive
polarity and 8 for negative polarity). For the strength computation, they manually added
a score between 1 and 10 to each term in the lexicon. Another work has been done in
[25], where the authors presented a lexicon based approach for MSA. First, a lexicon
was built by applying a semi-automatic method. Then, the lexicon entries were used to
detect opinion words and assign a sentiment class to each one. [26] built a sentiment
lexicon of about 120,000 Arabic words and created a SA system on top of it. They
reported 86.89% classification accuracy.
     The machine learning based approach views SA as a classification task, using an-
notated data sets to train classifiers. [7] proposed a system that performs subjectivity
and sentiment analysis for social media using morphological features. [8] compared
Support Vector Machines (SVM), Naive Bayes (NB) classifiers and neural networks
trained on the Opinion Corpus for Arabic (OCA) [18] and the ACOM corpus [9] with
different combinations. Another machine learning approach was used in [18], where
the authors built the OCA corpus, which consists of movie reviews written in Arabic.
They also created an English version translated from Arabic, called EVOCA. SVM and
NB classifiers were then used to create SA systems for both languages. For instance,
SVM gives a 90% F-measure on OCA compared to 86.9% on EVOCA.
In multi-way sentiment analysis, [10] performed a multi-class classification, using a
scale from 1 to 5 to measure polarity. They tested SVM, the C4.5 decision tree, the J48
decision table, K-Nearest Neighbors (KNN), NB, Multinomial Naive Bayes (MNB) and
voting (a combination of KNN, decision tree and NB). They concluded that MNB is the
most efficient. The authors did a flat classification, i.e., there is only one level in the
hierarchy. However, [11] showed that a hierarchical classification of multi-way
sentiments is better than an ordinary flat classification. They implemented two
hierarchical structures: one with two levels and the other with four levels. They tested
SVM, NB, KNN and decision tree techniques and concluded that KNN is the most
efficient.
 1
     https://trends.google.com/trends/explore?q=sentiment%20analysis#qusentiment%20analysis
    The hybrid approach combines the two previous ones: it uses both lexicons and
machine learning algorithms. The earliest work is that of [12], who presented a
combined classification hierarchy by applying multiple classifiers sequentially. More-
over, [13] used a lexicon of 5244 adjectives and a lexicon of 3296 idioms to improve the
classification of sentences made with SVM. [14] applied a hybrid approach to predict
the sentiment strength of an Arabic tweet: they used a set of linear regression models
to predict initial scores for sentences, then adjusted these scores by applying a set of
rules extracted from an existing sentiment lexicon.
    Works on Arabic SA are fewer than those on English. The main reasons are the
following:
 – The limited number of resources developed for ASA: few corpora and lexicons are
   freely available [24]. For more details about previous works on ASA, we refer the
   reader to the extensive survey presented in [28]. [29] summarizes the list of all
   freely available SA corpora for MSA and its dialects.
 – MSA is a Semitic language with a rich morphology.
 – The diacritization problem of MSA (Table 1 shows how the meaning of the word
   "جمل" /jml/ changes with its diacritics).
 – Negation detection: the presence of a negation term reverses the polarity.
 – The structure of the statement (structured, semi-structured or unstructured) has an
   impact on polarity prediction [27].
 – The problem of figurative language: irony, sarcasm, etc.
 – The use of foreign words (English, French, Italian, etc.) in Internet users' content
   makes ASA more difficult.


        Word    Possible Diacritics    Transliteration    Translation    Polarity
                جُمَل                   /joumalun/         sentences      neutral
        جمل     جَمَل                   /jamalun/          camel          neutral
                جَمَّلَ                  /jammala/          beautify       positive

                        Table 1. Different meanings for the word "جمل"

3   Methodology
This work falls within the framework of the machine learning based approach. Many
machine learning algorithms require vector representations as input. The most common
representation used in NLP is the bag of words (BOW) representation. Despite its
popularity, BOW has two major drawbacks: the loss of word order and the fact that it
ignores word semantics. Distributed representations solve these problems. We
distinguish mainly two types of embeddings:
 – word embeddings: word2vec [21] and Glove [22], etc.
 – document embeddings: paragraph vector [15] for variable length texts, sentence
   vector [23], etc.
The paragraph vector algorithm produces distributed representations (Doc2vec) for
sequences of any length, ranging from phrases to documents. It efficiently computes
document vector representations in a fixed-dimensional vector space. Word vectors are
located in this space so that words with similar semantics, which share common
contexts, are mapped near each other.
    Doc2vec representations were used for English sentiment analysis by [15]. The
authors, Le and Mikolov, achieved the best performance with paragraph vector com-
pared to other approaches on the IMDB dataset [16], which contains 100,000 film
reviews. Motivated by their work, we propose using Doc2vec embeddings for Arabic
sentiment analysis. The main question asked in this work is how efficient Le and
Mikolov's SA method is for the Arabic language.
We built a system composed of two parts: the first applies some linguistic preprocess-
ing to the input text, and the second uses a classifier to predict the polarity of the input.
We trained two classifiers: a logistic regression (LR) and a multilayer perceptron
(MLP)2. The input vector of the classifier is the embedding obtained by learning
paragraph vector. This vector is a concatenation of two learned vectors, one from the
distributed memory version (DM) and one from the distributed bag of words version
(DBOW), each having 400 dimensions; the classifier's input therefore has 800
dimensions. We kept the same neural architecture and the same hyperparameters of the
paragraph vector model used by Le and Mikolov [15].


4   Experiments and results
In this section, we perform experiments on two tasks: binary sentiment polarity classifi-
cation and five-class classification. We test two classifiers: MLP and logistic regression.

Training data and feature extraction
Learning Doc2vec representations requires a large corpus. To the best of our knowl-
edge, the LABR dataset [17] is the biggest Arabic dataset for SA that is freely available3.
We used the LABR corpus for ASA. This corpus consists of 63,257 book reviews writ-
ten in MSA and colloquial Arabic, each with a rating from 1 to 5 stars. Table 2 describes
the distribution of the reviews over the different classes.

Data preprocessing
We use the LABR dataset, which contains book reviews. The plain reviews, without any
preprocessing, constitute the baseline of our experiments. In other words, each token in
the review is considered a normal word.
For sentiment analysis, some special characters such as ! and ? carry sentiment. Moreover,
 2
     The MLP contains one hidden layer with 50 units in order to predict the sentiment.
 3
     The LABR dataset is available at http://www.mohamedaly.info/datasets/labr
                     Very negative     Negative Neutral      Positive    Very positive   Total
     Training        2331              4195     9762         15189       19129           50606
     Test            608               1090     2439         3865        4649            12651

                   Table 2. LABR corpus: the reviews distribution on different classes



some combinations of these special characters, for example :) and :(, are smileys, which
are significant for our task, so it is important to treat them as words. Analyzing our
corpus, we found that many punctuation marks are agglutinated to words. For this
reason, the first preprocessing step applied to LABR consists in separating punctuation
marks from words and treating them as normal words.
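A minimal sketch of this separation step is given below; the emoticon list and the regular expression are illustrative assumptions, not the paper's actual code:

```python
import re

# A few common emoticons to keep as single tokens (illustrative list only).
EMOTICONS = {":)", ":(", ":D", ":P", ":-)", ":-("}

def separate_punctuation(text):
    """Split punctuation marks off words, but keep emoticons intact."""
    tokens = []
    for tok in text.split():
        if tok in EMOTICONS:
            tokens.append(tok)
        else:
            # \w+ keeps word characters together (Unicode-aware, so Arabic
            # works too); [^\w\s] isolates each punctuation mark.
            tokens.extend(re.findall(r"\w+|[^\w\s]", tok))
    return " ".join(tokens)

print(separate_punctuation("amazing!! :)"))  # amazing ! ! :)
```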
    Another experiment consists in applying stemming to LABR. Stemming (whether
light or not) reduces the size of the vocabulary. Stemming is the process of eliminating
the affixes of words and reducing them to their roots, whereas light stemming removes
only prefixes and/or suffixes, without touching the infixes of the word. For example,
the two words رائع and مروع (Table 3) share the same stem or root (ر, و, ع), but they
do not have the same polarity. Applying light stemming4 is therefore more relevant for
Arabic SA.
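For illustration, a highly simplified light stemmer might strip at most one common prefix and one common suffix per word. The affix lists below are a small hypothetical subset, not the stemmer actually used in this work (see footnote 4):

```python
# Simplified light stemmer sketch: strips one prefix and one suffix.
# These affix lists are an illustrative subset, not an exhaustive set.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]  # longest first
SUFFIXES = ["ها", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]

def light_stem(word):
    """Remove one known prefix and one known suffix, keeping a 2+ char stem."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[:-len(s)]
            break
    return word

print(light_stem("والكتاب"))  # كتاب  ("and the book" reduced to "book")
```

Unlike full (root) stemming, the infixes are never touched, which is what keeps رائع and مروع distinct after light stemming.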


      Stem          Light stem    Transliteration    Translation    Polarity
      (ر, و, ع)     مروع          /mrwE/             terrible       negative
      (ر, و, ع)     رائع          /r|}E/             fabulous       positive

                           Table 3. Light stemming and polarity


Arabic SA experiments
In this work, two types of classification are performed: binary classification and multi-
class classification. Binary classification considers only two classes: positive and
negative, whereas multi-class classification has five classes: very positive, positive,
neutral, negative and very negative. The same method and hyperparameters are used
for both classification tasks.
    To evaluate the performance of SA on the LABR dataset, we carried out several ex-
periments using various configurations. All the experiments were conducted in Python,
using Theano5 for classification and gensim6 for learning the vector representations.
For the machine learning methods, we investigate two classifiers: logistic regression
(LR) and the multi-layer perceptron (MLP). The input of each sentiment classifier is a
set of feature vectors obtained with the paragraph vector algorithm. We tested three
different
 4
   In this work, we use the light stemmer https://github.com/motazsaad/arabic-light-stemmer
 5
   http://deeplearning.net/software/theano/
 6
   https://radimrehurek.com/gensim/
types of Doc2vec vectors: (1) vectors obtained with the DM version of the paragraph
vector algorithm, (2) vectors obtained with the DBOW version, and (3) the concatena-
tion of the vectors obtained separately with DM and DBOW.
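The two classifiers can be sketched as follows. We use scikit-learn for brevity, whereas the experiments in this paper use Theano; random vectors stand in for the 800-dimensional Doc2vec features, and the MLP has one hidden layer with 50 units as in footnote 2:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Random 800-dim vectors stand in for the concatenated DM+DBOW embeddings.
X_train = rng.normal(size=(200, 800))
y_train = rng.integers(0, 2, size=200)   # binary polarity labels (0/1)

# The two classifiers compared in this section.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300).fit(X_train, y_train)

X_test = rng.normal(size=(10, 800))
print(lr.predict(X_test))    # array of 0/1 polarity predictions
print(mlp.predict(X_test))
```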


Results and discussion

In the binary classification framework, the results of the different classifiers with the
different preprocessing steps are presented in Table 4. The empty set symbol ∅ means
that no preprocessing was applied: we used the review as it stands, without any modi-
fication; it represents the baseline of our experiments. Without preprocessing, the MLP
classifier is more efficient than logistic regression; with preprocessing, however,
logistic regression becomes more efficient. We notice that the difference between the
performances of the two classifiers is small. The lowest error rate is 23.31%, obtained
with logistic regression after applying light stemming. There is a gain of about 2%
after light stemming and special character preprocessing. We think that this modest
gain obtained by applying light stemming comes from the quantity of MSA words in
the corpus: the reviews of the LABR dataset are written in both MSA and dialectal
Arabic.
We conclude that Arabic, as opposed to English, requires specific preprocessing in
order to enhance the performance of SA.


                                          Regression      MLP
                  ∅                       25.60%          24.61%
                  Special character       25.32%          25.46%
                  Light stemming          23.31%          23.35%

                Table 4. Error rate of various experiments over LABR dataset



    It is well known that paragraph vector can capture the semantic similarity between
words, and one of the objectives of this paper is to measure the efficiency of document
embeddings for ASA. For example, the words جيد "good" and ممتاز "excellent" are
close to each other.
To assess the effectiveness of the Doc2vec algorithm for Arabic, we look at the words
most similar to some sentiment words. Here, we report the 10 words most similar to
جيد "good", in the following order: جميل beautiful, رائع fabulous, ممتع enjoyable,
مفيد useful, شيق interesting, ممل boring, خفيف light, ممتاز excellent, لطيف nice,
جداً very. Among these words, seven are semantically similar to جيد "good". We notice
some similarity errors:


 – The word ممل "boring" is close to جيد "good", which is wrong.
 – The word ممل "boring" is closer to جيد "good" than ممتاز "excellent" is, which is
   also wrong.
We think that this similarity error is strongly linked to the way Doc2vec learns. The
paragraph vector algorithm extracts representations that cover syntactic and semantic
information based on context. This means that words with similar contexts end up very
close in the vector space, even antonyms. To circumvent this problem, the represen-
tations should be constructed by predicting the context and the polarity jointly.

                                           Regression       MLP
                  Error rate               67.62%           69.42%
                Table 5. Error rate in a multi-class classification framework

     For multi-class classification, we also tested the MLP and LR classifiers. The perfor-
mances obtained in the multi-class classification framework are reported in Table 5. In
this framework, the input of each classifier is the LABR dataset after light stemming
and special character preprocessing. Table 5 shows that logistic regression is more
efficient than the MLP: the error rate with regression is lower than with the MLP.
Moreover, the error rate in the binary classification framework is lower than in the
multi-class framework. Indeed, with the same dataset preprocessing and classifier
hyperparameters, the error rate obtained with regression is 23.31% in binary classifi-
cation and 67.62% in multi-class classification, which is obviously much harder to
handle. In fact, having more classes is not the only challenge imposed by multi-class
classification; another difficulty comes from the relations between some classes, i.e.,
between the positive and very positive polarities and between the negative and very
negative polarities.

                        works with LABR               Accuracy
                        Our work                      32.38%
                        [10]                          45%
                        [11]                          45.7%

                    Table 6. Flat hierarchy for multi-class classification

    In this work, we adopted a flat classification (Table 6) and obtained an accuracy of
32.38% using the regression classifier over Doc2vec representations. In comparison,
[10] used Multinomial Naive Bayes over BOW vectors and obtained 45% accuracy. All
works mentioned in Tables 6 and 7 use the LABR dataset for their experiments.

          Works with LABR          #Levels                  Accuracy
          [11]                     2                        46.2%
                                   4                        57.8%
                 Table 7. multi-level hierarchy for multi-class classification


    [11] showed that a multi-level hierarchy enhances the performance of the multi-class
framework (Table 7). They used a KNN classifier and obtained an accuracy of 46.2%
with a 2-level hierarchy, and 57.8% with a 4-level hierarchy.
5   Conclusion and future works
In this paper, we have presented an Arabic sentiment analysis system that uses docu-
ment embeddings. The aim of this study was to measure the utility of Doc2vec embed-
dings in an Arabic SA framework. The results reported in this paper reflect the difficulty
of Arabic with respect to English: Arabic is a morphologically rich language, so dealing
with it requires a preprocessing step. In order to study the potential of preprocessing,
we principally tested the contribution of light stemming to improving performance.
    As future work, we think that using tokenization in preprocessing could enhance
performance [30]. Removing stop words is another preprocessing option. We would
also like to test the common BOW representation: the input of our classifiers would
become BOW vectors rather than Doc2vec embeddings, so that we could compare the
two representations.


References
1. Pang, B., Lee, L.: Opinion Mining and Sentiment Analysis. In: Foundations and Trends in
   Information Retrieval (2008)
2. Korayem, M., Crandall, D., Abdul-Mageed, M.: Subjectivity and sentiment analysis for Ara-
   bic: A survey. In: Advanced Machine Learning Technologies and Applications, vol. 322, pp.
   128–139. (2012)
3. Farra, N., Challita, E., Assi, R.A., Hajji, H.: Sentence-level and document-level sentiment
   mining for arabic texts. In: ICDMW (2010)
4. Almas, Y., Ahmad, K.: A note on extracting sentiments in financial news in English, Arabic &
   Urdu. In: CAASL (2007)
5. Abdulla, N.A., Ahmed, N.A., Shehab, M.A., Al-Ayyoub, M., Al-Kabi, M.N.: Towards Im-
   proving the Lexicon-Based Approach for Arabic Sentiment Analysis. In: International Journal
   of Information Technology and Web Engineering, pp. 55–71. (2014)
6. Al-Kabi, M.N., Gigieh, A.H., Alsmadi, I.M., Wahsheh, H.A.: Opinion Mining and Analysis
   for Arabic Language. In: International Journal of Advanced Computer Science and Applica-
   tions, pp. 181–195. (2014)
7. Abdulla, N.A., Al-Ayyoub, M., Al-Kabi, M.N.: An extended analytical study of arabic senti-
   ments. In: International Journal of Big Data Intelligence, vol. 1, pp. 103–113. (2014)
8. Bayoudhi, A., Hadrich Belghith, L., Ghorbal, H.: Sentiment Classification of Arabic Docu-
   ments: Experiments with multi-type features and ensemble algorithm. In: 29th Pacific Asia
   Conference on Language, Information and computation, Shanghai (2015)
9. Mountassir, A., Benbrahim, H., Berrada, I.: Sentiment classification on Arabic corpora: A
   preliminary cross-study. In: Document Numérique, vol. 16, pp. 73–96. (2013)
10. Al Shboul, B., Al-Ayyoub, M., Jararweh, Y.: Multi-Way Sentiment Classification of Arabic
   Reviews. In: the 6th International Conference on Information and Communication Systems
   (ICICS 2015) (2015)
11. Al-Ayyoub, M., Nuseir, A., Kanaan, G., Al-Shalabi, R.: Hierarchical Classifiers for Multi-
   Way Sentiment Analysis of Arabic Reviews. In: International Journal of Advanced Computer
   Science and Application, vol. 7 (2016)
12. El-Halees, A.: Arabic opinion mining using combined models. In International Arab Con-
   ference On Information Technology, (2011)
13. Ibrahim, H.S., Abdou, S.M., Gheith, M.: Sentiment analysis for modern standard Arabic and
   colloquial. In: International Journal on Natural Language Computing (IJNLC), vol. 4 (2015)
14. Refaee, E., Rieser, V.: iLab-Edinburgh at SemEval-2016 Task 7: A Hybrid Approach for
   Determining Sentiment Intensity of Arabic Twitter Phrases. In: Proceedings of SemEval-2016,
   pp. 474–480 (2016)
15. Le, Q., Mikolov, T.: Distributed representations of sentences and documents. In: Proceedings
   of the 31st International Conference on Machine Learning (2014)
16. Maas, A.L., Daly, R.E., Pham, P.T., Huang, D., Ng, A.Y., Potts, C.: Learning word vectors
   for sentiment analysis. In: Proceedings of the 49th Annual Meeting of the Association for
   Computational Linguistics (2011)
17. Mahmoud, N., Aly, M., Atiya, A.: LABR 2.0: Large Scale Arabic Sentiment Analysis Bench-
   mark. In: arXiv e-print (arXiv:1411.6718). (2014)
18. Rushdi-Saleh, M., Martin-Valdivia, M.T., Urena-Lopez, L.A., Perea-Ortega, J.M.: Bilingual
   Experiments with an Arabic-English Corpus for Opinion Mining. In: Proceedings of Recent
   Advances in Natural Language Processing, pp. 740–745, (2011)
19. Biltawi, M., Etaiwi, W., Tedmori, S., Hudaib, A., Awajan, A.: Sentiment Classification Tech-
   niques for Arabic Language: A survey. In: the 7th International Conference on Information
   and Computation System ICICS (2016)
20. Korayem, M.: Sentiment/Subjectivity analysis survey for languages other than English. In:
   arXiv:1601.00087v2 [cs.CL] (2016)
21. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
   vector space. (2013)
22. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation.
   In: EMNLP, vol. 14, pp. 1532–1543. (2014)
23. Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recur-
   sive deep models for semantic compositionality over a sentiment treebank. In: Conference on
   Empirical Methods in Natural Language Processing (2013)
24. Al-Kabi, M., Al-Ayyoub, M., Alsmadi, I., Wahsheh, H.: A Prototype for a Standard Arabic
   Sentiment Analysis Corpus. In: International Arab Journal of Information Technology, pp.
   163–170. (2016)
25. Bayoudhi, A., Ghorbel, H., Koubaa, H., Hadrich Belguith, L.: Sentiment classification at dis-
   course segment level: Experiments on multi-domain Arabic corpus. In: Journal for Language
   Technology and Computational Linguistics (2015)
26. Al-Ayyoub, M., Bani Essa, S., Alsmadi, I.T.: Lexicon-based sentiment analysis of arabic
   tweets. In: International Journal of Social Network Mining, pp. 101–114 (2015)
27. Doaa Mohey El-Din Mohamed, H.: A survey on sentiment analysis challenges. In: Journal
   of King Saud University–Engineering Sciences (2016)
28. Biltawi, M., Etaiwi, W., Tedmori, S., Hudaib, A., Awajan, A.: Sentiment classification tech-
   niques for arabic language: A survey. (2016)
29. Mdhaffar, S., Bougares, F., Estève, Y., Hadrich Belguith, L.: Sentiment Analysis of Tunisian
   Dialect: Linguistic Resources and Experiments. In: Proceedings of The Third Arabic Natural
   Language Processing Workshop (WANLP), pp. 55–61 (2017)
30. Duwairi, R.M., Qarqaz, I.: Arabic Sentiment Analysis using Supervised Classification. In:
   The 1st International Workshop on Social Networks Analysis, Management and Security
   (SNAMS-2014), Barcelona, Spain (2014)