Identification of Opinions in Arabic Texts using Ontologies
                                            Farek Lazhar and Tlili-Guiassa Yamina


Abstract. A powerful tool for tracking opinions in forums, blogs, e-business sites, etc.,
has become essential for companies, politicians, and customers, because the huge amount
of available text makes manual exploration increasingly difficult and impractical. In this
paper, we present our approach to the identification of opinions based on an ontological
exploration of texts. This approach aims to study the role of domain ontologies and their
contribution to the identification phase. Our approach requires a domain ontology and a
sentiment lexicon as prerequisites.


1        INTRODUCTION

The opinions available on the Internet have a significant impact on users. For example,
users who have already researched opinions on a product are willing to pay more for a
product whose reviews are more favorable, and that product will sell better than one
whose reviews are less favorable [14].

Companies, politicians, and customers need a powerful tool to track the opinions,
sentiments, judgments, and beliefs that people express in blogs, comments, or other texts
about a product, a service, a person, an organization, etc. [13].

In opinion mining, using expressions as a “bag of sentiment words” to detect the semantic
orientation of the overall content of a text requires labelling those expressions as
positive, negative, or neutral towards a given topic [10].

Generally, research in this area can be grouped into three main categories:

   • Development of linguistic and cognitive models for opinion mining, where
     dictionary-based or corpus-based approaches are used automatically or
     semi-automatically to extract opinions from the semantic orientations of words and
     phrases [2];

   • Opinion extraction from texts, where all local opinions are aggregated to determine
     the overall orientation of a text [1], [2], [6];

   • Feature-based opinion mining, where all opinions expressed towards the
     characteristics of a product or an object are extracted and summarized [5], [8], [9].

This article focuses on the identification and classification of opinions in Arabic texts,
which aims to calculate the semantic orientation of the entire content of a text as
positive or negative toward a subject or an object, from the subjective expressions
carrying the semantic orientations of the different features. The key questions to ask
are:

   • How do we obtain this set of features?

   • Which features are related to each other?

   • Which knowledge representation model should be used to produce an understandable
     summary for the studied domain?

To answer these questions, we propose in this paper to study the role of ontologies in
opinion mining. More specifically, our goal is to study how a domain ontology can be used
to:

   • Structure the features;

   • Extract explicit and implicit features from the texts;

   • Produce summaries based on reviews and user comments.

The paper is organized as follows. Section 2 presents the state of the art of the main
approaches used in the field and the motivations of our work. Section 3 presents our
approach and the general architecture of the opinion identification process.


2        STATE OF THE ART

2.1      Related Work

Overall, two main types of work can be distinguished: those based on simple feature
extraction from texts, and those that organize features into a hierarchy using taxonomies
or ontologies. The extraction process mainly concerns explicit features. We can
distinguish two main families:

   • Opinion Mining without Knowledge Representation Models


All approaches that do not use knowledge representation models rely on algorithms to
discover the different characteristics of a product or an object. Only the expressions of
opinion (adjectival and adverbial) are extracted; a summary is then produced that shows,
for each characteristic, the positive and negative opinions and the total number in each
category [2], [8].

The main limitation of these approaches is the large number of extracted features and
their lack of organization. In addition, similar concepts are not grouped (for example, in
some domains the words “موعد” and “لقاء” have the same meaning, “appointment”), and
possible relationships between the features of an object are not recognized (for example,
“قهوة”, “coffee”, is a more specific term than “شرب”, “drink”). Thus, the polarity of the
text (positive, negative, or neutral) is determined by assigning the dominant polarity of
the opinion words, regardless of the polarities associated with each feature individually
[10].

   • Opinion Mining with Knowledge Representation Models

This family can itself be divided into two subfamilies:

(a) Use of Taxonomies

These approaches do not seek a flat list of features, but rather a hierarchically
organized list obtained through taxonomies. Recall that a taxonomy is a list of terms
organized hierarchically through an “is a kind of” relation. In [5], the authors use
predefined taxonomies and semantic similarity measures to automatically extract the
features and calculate the distances between concepts.
Generally, the use of taxonomies is coupled with a classification technique; the sentences
corresponding to the leaves of the taxonomy are extracted. At the end of the process, a
summary that can be more or less detailed is produced.

(b) Use of Ontologies

These approaches aim to organize the features using elaborate representation models.
Unlike a taxonomy, an ontology is not restricted to a hierarchical relationship between
concepts; it can describe other types of paradigmatic relations such as synonymy, or more
complex relationships such as composition or spatial relations.
Generally, the extracted features correspond exclusively to terms contained in the
ontology. The feature extraction phase is guided by a domain ontology, built manually [11]
or semi-automatically [7], [9], which is then enriched by automatically extracting terms
that correspond to newly identified features.
Similar features are grouped together using semantic similarity measures.

Ontologies have also been used to support polarity mining. For example, in [4], the
authors manually built an ontology for movie reviews and incorporated it into the polarity
classification task, which substantially improved the performance of their approach.


2.2      Ontology Based Opinion Mining

In [13], the use of a hierarchy of features improves the performance of feature-based
identification systems. However, works using domain ontologies exploit the ontology as a
taxonomy, using only “is a” relations between concepts. They do not really use all the
data stored in an ontology, such as the lexical components and other types of
relationships. We believe that opinion mining can gain several advantages from the full
use of domain ontology capabilities:

   • Structuring of features: Ontologies are tools that provide a lot of semantic
     information. They help to define the concepts, relationships, and entities that
     describe a domain with an unlimited number of terms;

   • Extraction of features: Relationships between concepts and lexical information can
     be used to extract explicit and implicit features.


3        OUR APPROACH

3.1      Description

For each studied domain, our approach requires three basic elements:

   • A domain ontology O, where each concept and each property is associated with a set
     of labels that correspond to its semantics;

   • A lexical resource L of opinion expressions;

   • A set of texts T consisting of comments and views.

Our approach is based on the conceptual model described in [10] and on the definition
given in [3], which defines an elementary discourse unit (EDU) as a clause containing at
least one elementary opinion unit (EOU), or a sequence of clauses that together bear a
rhetorical relation to a segment expressing an opinion. An EOU is an explicit opinion
expression composed of a noun, an adjective, or a verb with its possible modifiers
(negation and adverbs).
In a review, the opinion holder comments on a set of features of an object or a product
using opinion expressions. Each feature corresponds to a concept or a property in the
ontology O.
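
The three elements O, L, and T can be pictured as plain data structures. The sketch below is only an illustration, not the representation used by our system: the class names, the `parent`, `source`, and `target` fields, and the numeric polarities in L are all assumptions, and the sample entries simply reuse the examples quoted in this paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Concept:
    name: str
    labels: Set[str] = field(default_factory=set)  # surface forms found in texts
    parent: Optional[str] = None                   # "is a" link to a broader concept


@dataclass
class Property:
    name: str
    labels: Set[str]
    source: str  # concept the relation starts from
    target: str  # concept the relation points to


@dataclass
class DomainOntology:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    properties: Dict[str, Property] = field(default_factory=dict)

    def label_index(self) -> Dict[str, str]:
        """Map every label to the name of the concept or property it denotes."""
        index: Dict[str, str] = {}
        entries = list(self.concepts.values()) + list(self.properties.values())
        for entry in entries:
            for label in entry.labels:
                index[label] = entry.name
        return index


# O: toy ontology (hypothetical content taken from the examples in the text).
O = DomainOntology(
    concepts={
        "drink": Concept("drink", {"شرب"}),
        "coffee": Concept("coffee", {"قهوة"}, parent="drink"),
        "driver": Concept("driver", {"سائق"}),
        "car": Concept("car", {"سيارة"}),
    },
    properties={"drive": Property("drive", {"يسوق"}, source="driver", target="car")},
)

# L: opinion expressions with an assumed polarity of +1 or -1.
L: Dict[str, int] = {"ممتاز": 1, "خلاب": 1, "بطيء": -1}

# T: the texts (comments and views) to analyse.
T: List[str] = ["طبيعة خلابة"]

index = O.label_index()
print(index["قهوة"], "is a", O.concepts[index["قهوة"]].parent)
```

Keeping the labels separate from the concept names is what lets several surface forms point to the same feature, and the `parent` link is the minimal hook needed to group specific features under broader ones.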


For each extracted EDU, the system:

   • Extracts EOUs using a rule-based approach;

   • Extracts features, which corresponds to a term extraction process using the domain
     ontology;

   • Associates, with each feature within the EDU, its set of opinion expressions.

We detail these steps below:

(a) Extraction of Elementary Opinion Units

Nouns, adjectives, or verbs may be associated with certain modifiers such as negation
words and adverbs. For example, “ممتاز”, “excellent”, and “ليس جيدا”, “not good”, are
EOUs.

For example, in the following comment, the EDUs are between square brackets, the EOUs are
underlined, and the characteristics of the object are in bold. There is an inverse
relationship between EDUa and EDUb, representing the review expressed in EDUd.

   [يوم أمس، اشتريت جهاز هاتف]a
   [حتى إذا كان الهاتف ممتازا]b
   [فإن التصميم بسيط جدا]c
   [الشيء المخيب للآمال في هذه العلامة]d

   [Yesterday, I purchased a phone]a. [Even if the phone is excellent]b, [the design is
   very basic]c, [which is disappointing in this brand]d.

              Figure 1. Example showing EOUs Extraction

(b) Features Extraction

This step aims to extract from the comment all the labels of the ontology. As each concept
is an explicit feature, we simply project the lexical components of the ontology onto the
text to obtain, for each EDU, all of its features. To extract the implicit features,
ontology properties are used. Recall that these properties define the relationships
between concepts of the ontology. For example, the property “يسوق”, “drive”, links the
concepts “سائق”, “driver”, and “سيارة”, “car”.

(c) Linking Opinion Expressions with Extracted Features

In this step, the opinion expressions extracted in step (a) have to be linked to the
features extracted in step (b), i.e., we associate with each EDUi the set of pairs
(fi, OEi). During this step, we distinguish the following cases:

   • Known Opinionated Features and Known Opinion Expressions: In this case, the
     opinionated features match the opinion expressions used. For example, if our
     ontology contains the concept “طبيعة”, “nature”, and the sentiment lexicon contains
     the word “خلاب”, “amazing”, then from the EDU “طبيعة خلابة”, “amazing nature”, it is
     easy to extract the pair (طبيعة, خلابة), (nature, amazing) from the text.

   • Known Opinionated Features and Unknown Opinion Expressions: As in the EDU
     “نتائج مقبولة”, “acceptable results”, where the opinion word “مقبول”, “acceptable”,
     was not extracted in step (a) (see Section 3.1). In this case, the lexicon of
     opinions can be automatically updated with the recovered opinion word.

   • Unknown Opinionated Features and Known Opinion Expressions: As in the EDU
     “غابة مطرية رائعة”, “wonderful rainforest”, where the feature “مطرية”, “rainforest”,
     has not been extracted in step (b) (see Section 3.1). In this case, the domain
     ontology can be updated by adding a new concept or a new property in the right
     place.

   • Opinion Expressions Only: As in the EDU “بطيء”, “It's slow”. This kind of EDU
     expresses an implicit feature. In this case, we use the ontology properties to
     retrieve the associated concept in the ontology.

   • Features Only: An EDU with features alone can also indicate the presence of an
     implicit opinion expression towards the feature, as in
     “الحديقة أصبحت ملجأ للمنحرفين”, “the park became a haven for perverts”, which
     expresses a negative opinion towards “الحديقة”, “the park”.


3.2      Architecture of our Approach

In this section, we present the general architecture of our approach and the different
modules constituting our system:

   [Figure 2 diagram: Texts → EDUs Segmentation → EDUs → EOUs Extracting (using the
   Sentiments Lexicon) and Features Extracting (using the Domain Ontology) → EOUs and
   Features → Features and EOUs Associating → Classification (using Classification
   Techniques) → Classification Result]

              Figure 2. General architecture of our approach
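
To make the flow from segmentation to association more concrete, here is a minimal sketch of steps (a) to (c). It is an illustration under several assumptions, not our implementation: English glosses stand in for Arabic tokens (real Arabic text would first need morphological normalization), and every name and resource in it (`LABELS`, `IMPLICIT`, `LEXICON`, `MODIFIERS`, `segment_edus`, `extract_eous`, `extract_features`, `associate`) is hypothetical. Without part-of-speech information, the sketch cannot tell an unknown opinion word from a features-only EDU, so both fall under one label; the supervised classification module is left out.

```python
import re

# Hypothetical toy resources. English glosses stand in for Arabic tokens.
LABELS = {"nature": "Nature", "results": "Results", "phone": "Phone", "design": "Design"}
IMPLICIT = {"slow": "Speed"}  # opinion word -> concept reached through a property
LEXICON = {"amazing": 1, "excellent": 1, "wonderful": 1, "good": 1, "basic": -1, "slow": -1}
NEGATIONS = {"not"}
MODIFIERS = NEGATIONS | {"very"}


def segment_edus(text):
    """Split a text into EDUs on the delimiters '.', ',', '?' and '!'."""
    return [part.strip() for part in re.split(r"[.,?!]", text) if part.strip()]


def extract_eous(tokens):
    """Step (a): a lexicon word plus the modifiers that directly precede it."""
    eous = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        polarity, start = LEXICON[token], i
        while start > 0 and tokens[start - 1] in MODIFIERS:
            start -= 1
            if tokens[start] in NEGATIONS:
                polarity = -polarity  # a negation flips the orientation
        eous.append((" ".join(tokens[start:i + 1]), polarity))
    return eous


def extract_features(tokens):
    """Step (b): project the ontology labels onto the EDU."""
    return [LABELS[t] for t in tokens if t in LABELS]


def associate(edu):
    """Step (c): pair features with EOUs and name the case that applies."""
    tokens = edu.lower().split()
    eous, features = extract_eous(tokens), extract_features(tokens)
    if features and eous:
        case = "known feature, known opinion"
    elif features:
        case = "known feature, no known opinion"
    elif eous:
        # No explicit feature: try to reach a concept through a property.
        features = [IMPLICIT[t] for t in tokens if t in IMPLICIT]
        if features:
            case = "opinion only, implicit feature"
        else:
            case = "unknown feature, known opinion"
    else:
        case = "no feature, no opinion"
    return case, [(f, eou) for f in features for eou in eous]


review = "Yesterday, I purchased a phone. Even if the phone is excellent, the design is very basic!"
for edu in segment_edus(review):
    print(edu, "->", associate(edu))
```

On this review, the third and fourth EDUs yield the pairs (Phone, excellent) and (Design, very basic). An EDU labelled "known feature, no known opinion" is a candidate for updating the lexicon, and one labelled "unknown feature, known opinion" is a candidate for updating the ontology.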


As indicated in Figure 2, our system contains the following modules:

     1.   Texts EDUs Segmentation: Generally, the extraction of elementary discourse
          units (EDUs) depends on the use of delimiters such as “.”, “,”, “?”, and “!”;

     2.   EOUs Extracting: Elementary opinion units (EOUs) and their semantic
          orientations are usually extracted using a lexicon of emotions specific to the
          domain of study;

     3.   Features Extraction: Features can be extracted by a simple projection of the
          ontology onto the elementary discourse units (EDUs);

     4.   Associating EOUs to Features: Each extracted feature should be associated with
          one or more elementary opinion units in order to extract its semantic
          orientation;

     5.   Classification: The last phase of our work is to classify the identified
          opinions into positive or negative classes using supervised classification
          techniques.


4         CONCLUSION
In this paper, we presented our approach based on an ontological exploration of Arabic
texts. Our method is promising because the use of ontologies improves the extraction of
features and facilitates the association between opinion expressions and the opinionated
features of the object. On the one hand, a domain ontology is useful for its list of
concepts, which bring much semantic data into the system. The labels of the ontology
concepts make it possible to recognize terms that refer to the same concept and provide a
hierarchy between these concepts. On the other hand, an ontology is useful for its list of
properties between concepts, which make it possible to recognize the opinions expressed
on implicit features.


REFERENCES
[1] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan, ‘Thumbs up? Sentiment
    Classification using Machine Learning Techniques’, Proceedings of EMNLP, (2002)
[2] Turney, Peter D., and Michael L. Littman, ‘Unsupervised Learning of Semantic
    Orientation from a Hundred-Billion-Word Corpus’, National Research Council, Institute
    for Information Technology, Technical Report ERB-1094 (NRC#44929), (2002)
[3] Asher, Nicholas, and Alex Lascarides, ‘Logics of Conversation’, Cambridge University
    Press, (2003)
[4] Pimwadee Chaovalit, Lina Zhou, ‘Movie Review Mining: a Comparison between Supervised
    and Unsupervised Classification Approaches’, HICSS, (2005)
[5] Carenini, Giuseppe, Raymond T. Ng, and Ed Zwart, ‘Extracting Knowledge from Evaluative
    Text’, In Proceedings of the 3rd international conference on Knowledge capture, (2005)
[6] Kim, Soo-Min, and Eduard Hovy, ‘Extracting Opinions, Opinion Holders, and Topics
    Expressed in Online News Media Text’, In Proceedings of ACL/COLING Workshop on
    Sentiment and Subjectivity in Text, Sydney, Australia, (2006)
[7] Feiguina, Olga, ‘Résumé automatique des commentaires de Consommateurs’, Mémoire
    présenté à la Faculté des études supérieures en vue de l’obtention du grade de M.Sc.
    en informatique, Département d’informatique et de recherche opérationnelle,
    Université de Montréal, (2006)
[8] Hu et al., ‘Mining and Summarizing Customer Reviews’, In Proceedings of the 10th ACM
    SIGKDD international conference on Knowledge discovery and data mining, (2008)
[9] Cheng, Xiwen, and Feiyu Xu, ‘Fine-grained Opinion Topic and Polarity Identification’,
    In Proceedings of the Sixth International Language Resources and Evaluation
    (LREC'08), Marrakech, Morocco, (2008)
[10] Farek Lazhar et al., ‘Identification d’opinions dans les textes arabes’, IC, (2009)
[11] Zhao, Lili, and Chunping Li, ‘Ontology Based Opinion Mining for Movie Reviews’, In
    Proceedings of the 3rd International Conference on Knowledge Science, Engineering and
    Management, (2009)
[12] Asher, Nicholas, Farah Benamara, and Yvette Y. Mathieu, ‘Appraisal of Opinion
    Expressions in Discourse’, Lingvisticæ Investigationes, John Benjamins Publishing
    Company, Amsterdam, Vol. 32:2, (2009)
[13] Anaïs Cadilhac et al., ‘Ontolexical resources for feature based opinion mining: a
    case study’, Beijing, (2010)
[14] Gillot Sébastien, ‘Fouille d’opinions, Rapport de stage’, (2010)
[15] Alexander Pak et al., ‘Classification en polarité de sentiments avec une
    représentation textuelle à base de sous-graphes d’arbres de dépendances’, TALN 2011,
    Montpellier, 27 juin – 1er juillet, (2011)

