Identification of Opinions in Arabic Texts using Ontologies

Farek Lazhar and Tlili-Guiassa Yamina

Abstract. A powerful tool to track opinions in forums, blogs, e-business sites, etc., has become essential for companies, politicians, and customers, because the huge amount of available text makes manual exploration increasingly difficult and impractical. In this paper, we present our approach to the identification of opinions, based on an ontological exploration of texts. This approach aims to study the role of domain ontologies and their contributions in the identification phase. The approach requires a domain ontology and a sentiment lexicon as prerequisites.

1 INTRODUCTION

The views available on the Internet have a significant impact on users. For example, users who have already researched opinions on a product are willing to pay more for a product whose opinions are more favorable, and that product will sell better than one whose opinions are less favorable [14]. Companies, politicians, and customers need a powerful tool to track the opinions, sentiments, judgments, and beliefs that people express in blogs, comments, or other texts toward a product, a service, a person, an organization, etc. [13].

In the opinion mining area, using expressions as a “bag of sentiment words” to detect the semantic orientation of the overall content of a text requires assigning each expression a value (positive, negative, or neutral) with respect to a given topic [10]. Generally, research works in this area can be grouped into three main categories:

• Development of linguistic and cognitive models for opinion mining, where dictionary-based or corpus-based approaches are used automatically or semi-automatically to extract opinions based on the semantic orientations of words and phrases [2];

• Opinion extraction from texts, where all the local opinions are aggregated to determine the overall orientation of a text [1], [2], [6];

• Feature-based opinion mining, where all the opinions expressed towards the characteristics of a product or an object are extracted and summarized [5], [8], [9].

This article focuses on the identification and classification of opinions in Arabic texts. The aim is to calculate the semantic orientation of the entire content of a text as positive or negative toward a subject or an object, starting from the subjective expressions that carry the semantic orientations of the different features. The key questions that we should ask are:

• How to get this set of features?

• What features are related to each other?

• What model of knowledge representation should be used to produce an understandable summary for the studied domain?

To answer these questions, we propose in this paper to study the role of ontologies in opinion mining. More specifically, our goal is to study how a domain ontology can be used to:

• Structure the features;

• Extract explicit and implicit features from the texts;

• Produce summaries based on reviews and user comments.

The paper is organized as follows. Section 2 presents the state of the art of the main approaches used in the field and the motivations of our work. Section 3 presents our approach and the general architecture of the opinion identification process.

2 STATE OF THE ART

2.1 Related Work

Overall, two main types of work are distinguished: those based on simple feature extraction from the texts, and those that organize features into a hierarchy using taxonomies or ontologies. The extraction process mainly concerns explicit features. We can distinguish two main families:

• Opinion Mining without Knowledge Representation Models

All approaches that do not use knowledge representation models rely on algorithms to discover the different characteristics of a product or an object. Only the expressions of opinions (adjectival and adverbial) are extracted, then a summary is produced to show, for each characteristic, the positive and the negative opinions and the total number in each category [2], [8].

The main limitation of these approaches is the large number of extracted features and their lack of organization. In addition, similar concepts are not grouped (for example, in some domains, the words “موعد” and “لقاء”, which have the same meaning, “appointment”), and possible relationships between the features of an object are not recognized (for example, “قهوة”, “coffee”, is a specific term of “شرب”, “drink”). Thus, the polarity (positive, negative, or neutral) of the text is determined by assigning the dominant polarity of the opinion words, regardless of the polarities associated with each feature individually [10].

• Opinion Mining with Knowledge Representation Models

This family can itself be divided into two subfamilies:

(a) Use of Taxonomies

This kind of approach does not seek a flat list of features, but rather a hierarchically organized list obtained through taxonomies. Recall that a taxonomy is a list of terms organized hierarchically through a relation of the type “is a kind of”. In [5], the authors use predefined taxonomies and semantic similarity measures to automatically extract the features and calculate the distances between concepts.

Generally, the use of taxonomies is coupled with a classification technique: the sentences corresponding to the leaves of the taxonomy are extracted. At the end of the process, a summary that can be more or less detailed is produced.

(b) Use of Ontologies

These approaches aim to organize the features using elaborate representation models. Unlike a taxonomy, an ontology is not restricted to a hierarchical relationship between concepts, but can describe other types of paradigmatic relations such as synonymy, or more complex relationships such as composition or spatial relations.

Generally, the extracted features correspond exclusively to terms contained in the ontology. The feature extraction phase is guided by a domain ontology, built manually [11] or semi-automatically [7], [9], which is then enriched by a process of automatic term extraction that identifies new features. Similar features are grouped together using semantic similarity measures.

Ontologies have also been used to support polarity mining. For example, in [4], the authors manually built an ontology for movie reviews and incorporated it in the polarity classification task, which substantially improved the performance of their approach.

2.2 Ontology Based Opinion Mining

According to [13], using a hierarchy of features improves the performance of feature-based identification systems. However, works using domain ontologies exploit the ontology as a taxonomy, using only “is a” relations between concepts. They do not really use all the data stored in an ontology, such as the lexical components and other types of relationships. We believe that making full use of domain ontology capabilities brings several advantages to opinion mining:

• Structuring of features: Ontologies are tools that provide a lot of semantic information. They help to define the concepts, relationships, and entities that describe a domain with an unlimited number of terms;

• Extraction of features: Relationships between concepts and lexical information can be used to extract explicit and implicit features.

3 OUR APPROACH

3.1 Description

For each studied domain, our approach requires three basic elements:

• A domain ontology O, where each concept and each property is associated with a set of labels that correspond to their semantics;

• A lexical resource L of opinion expressions;

• A set of texts T, consisting of comments and views.

Our approach builds on the conceptual model described in [10] and on the definition given in [3], which defines an elementary discourse unit (EDU) as a clause containing at least one elementary opinion unit (EOU), or a sequence of clauses that together bear a rhetorical relation to a segment expressing an opinion. Note that an EOU is an explicit opinion expression composed of a noun, an adjective, or a verb with its possible modifiers (negation and adverbs).

In a review, the opinion holder comments on a set of features of an object or a product using opinion expressions. Each feature corresponds to a concept or a property in the ontology O.

For each extracted EDU, the system:

• Extracts EOUs using a rule-based approach;

• Extracts features, which corresponds to a term extraction process using the domain ontology;

• Associates, with each feature within the EDU, the set of opinion expressions.

We detail these steps below:

(a) Extraction of Elementary Opinion Units

Nouns, adjectives, or verbs may be associated with certain modifiers such as words of negation and adverbs. For example, “ممتاز”, “excellent”, and “ليس جيدا”, “not good”, are EOUs.

In the following comment, the EDUs are between square brackets, the EOUs are underlined, and the characteristics of the object are in bold. There is an inverse relationship between EDU_a and EDU_b, representing the review expressed in EDU_d.

[يوم أمس، اشتريت جهاز هاتف]a
[حتى إذا كان الهاتف ممتازا]b
[فان التصميم بسيط جدا]c
[الشيء المخيب للآمال في هذه العلامة]d

[Yesterday, I purchased a phone]a. [Even if the phone is excellent]b, [the design is very basic]c, [which is disappointing in this brand]d.

Figure 1. Example showing EOUs Extraction

(b) Features Extraction

This step aims to extract from the comment all the labels of the ontology. As each concept is an explicit feature, we simply project the lexical components of the ontology onto the text to obtain, for each EDU, all the features. To extract the implicit features, ontology properties are used. Recall that these properties define the relationships between concepts of the ontology. For example, the property “يسوق”, “drive”, links the concepts “سائق”, “driver”, and “سيارة”, “car”.

(c) Linking Opinion Expressions with Extracted Features

In this step, the opinion expressions extracted in step (a) have to be linked to the features extracted in step (b), i.e., we associate with each EDU_i the set of pairs (f_i, OE_i). During this step, we distinguish the following cases:

• Known Opinionated Features and Known Opinion Expressions: In this case, the opinionated features are matched to the opinion expressions used. For example, if our domain ontology contains the concept “طبيعة”, “nature”, and the sentiment lexicon contains the word “خلاب”, “amazing”, then from the EDU “طبيعة خلابة”, “amazing nature”, it is easy to extract the pair (طبيعة, خلابة), (nature, amazing).

• Known Opinionated Features and Unknown Opinion Expressions: As in the EDU “نتائج مقبولة”, “acceptable results”, where the opinion word “مقبول”, “acceptable”, was not extracted in step (a) (see Section 3.1). In this case, the lexicon of opinions can be automatically updated with the recovered opinion word.

• Unknown Opinionated Features and Unknown Opinion Expressions: As in the EDU “غابة مطرية رائعة”, “wonderful rainforest”, where the feature “مطرية”, “rainforest”, has not been extracted in step (b) (see Section 3.1). In this case, the domain ontology can be updated by adding a new concept or a new property in the right place.

• Opinion Expressions Only: As in the EDU “بطيء”, “It's slow”. This kind of EDU expresses an implicit feature. In this case, we use the ontology properties to retrieve the associated concept in the ontology.

• Features Only: An EDU with features alone can also indicate the presence of an implicit opinion expression towards the feature, as in “الحديقة أصبحت ملجأ للمنحرفين”, “the park became a haven for perverts”, which expresses a negative opinion towards “الحديقة”, “the park”.

3.2 Architecture of our Approach

In this section, we present the general architecture of our approach and the different modules constituting our system:

[Figure 2: pipeline diagram. Texts → EDUs Segmentation → EDUs → EOUs Extracting (using the Sentiments Lexicon) and Features Extracting (using the Domain Ontology) → EOUs and Features → Features and EOUs Associating → Classification (using Classification Techniques) → Classification Result]

Figure 2. General architecture of our approach

As indicated in Figure 2, our system contains the following modules:

1. Texts EDUs Segmentation: Generally, the extraction of elementary discourse units (EDUs) depends on the use of delimiters such as “.”, “,”, “?”, “!”;

2. EOUs Extracting: Elementary opinion units (EOUs) and their semantic orientations are usually extracted using a lexicon of emotions specific to the domain of study;

3. Features Extraction: Features can be extracted by a simple projection of the ontology onto the elementary discourse units (EDUs);

4. Associating EOUs to Features: Each extracted feature should be associated with one or more elementary opinion units in order to extract its semantic orientation;

5. Classification: The last phase of our work is to classify the identified opinions into positive or negative classes using supervised classification techniques.

4 CONCLUSION

In this paper, we presented our approach based on an ontological exploration of Arabic texts. Our method is promising because the use of ontologies improves the extraction of features and facilitates the association between opinion expressions and the opinionated features of the object. On the one hand, a domain ontology is useful through its list of concepts, which bring a large amount of semantic data into the system. The labels of the ontology concepts make it possible to recognize terms that refer to the same concept, and the ontology provides a hierarchy between these concepts. On the other hand, an ontology is useful through its list of properties between concepts, which make it possible to recognize the opinions expressed on implicit features.

REFERENCES

[1] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan, ‘Thumbs up? Sentiment Classification using Machine Learning Techniques’, Proceedings of EMNLP, (2002)
[2] Turney, Peter D., and Michael L. Littman, ‘Unsupervised Learning of Semantic Orientation from a Hundred-Billion-Word Corpus’, National Research Council, Institute for Information Technology, Technical Report ERB-1094 (NRC#44929), (2002)
[3] Asher, Nicholas, and Alex Lascarides, ‘Logics of Conversation’, Cambridge University Press, (2003)
[4] Pimwadee Chaovalit and Lina Zhou, ‘Movie Review Mining: a Comparison between Supervised and Unsupervised Classification Approaches’, HICSS, (2005)
[5] Carenini, Giuseppe, Raymond T. Ng, and Ed Zwart, ‘Extracting Knowledge from Evaluative Text’, In Proceedings of the 3rd international conference on Knowledge capture, (2005)
[6] Kim, Soo-Min, and Eduard Hovy, ‘Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text’, In Proceedings of ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia, (2006)
[7] Feiguina, Olga, ‘Résumé automatique des commentaires de Consommateurs’, Mémoire présenté à la Faculté des études supérieures en vue de l’obtention du grade de M.Sc. en informatique, Département d’informatique et de recherche opérationnelle, Université de Montréal, (2006)
[8] Hu et al., ‘Mining and Summarizing Customer Reviews’, In Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, (2008)
[9] Cheng, Xiwen, and Feiyu Xu, ‘Fine-grained Opinion Topic and Polarity Identification’, In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, (2008)
[10] Farek Lazhar et al., ‘Identification d’opinions dans les textes arabes’, IC, (2009)
[11] Zhao, Lili, and Chunping Li, ‘Ontology Based Opinion Mining for Movie Reviews’, In Proceedings of the 3rd International Conference on Knowledge Science, Engineering and Management, (2009)
[12] Asher, Nicholas, Farah Benamara, and Yvette Y. Mathieu, ‘Appraisal of Opinion Expressions in Discourse’, Lingvisticæ Investigationes, John Benjamins Publishing Company, Amsterdam, Vol. 32:2, (2009)
[13] Anaïs Cadilhac et al., ‘Ontolexical resources for feature based opinion mining: a case study’, Beijing, (2010)
[14] Gillot Sébastien, ‘Fouille d’opinions, Rapport de stage’, (2010)
[15] Alexander Pak et al., ‘Classification en polarité de sentiments avec une représentation textuelle à base de sous-graphes d’arbres de dépendances’, TALN 2011, Montpellier, 27 juin – 1er juillet, (2011)
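
The steps of Section 3.1 and the first four modules of Section 3.2 can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and resource in it is hypothetical. English glosses stand in for the Arabic labels, the dictionary `IMPLICIT_FEATURES` stands in for a lookup through ontology properties, and pairing every feature with every opinion expression found in the same EDU is a simplifying assumption. The supervised classification module is not covered.

```python
import re

# Hypothetical toy resources; English glosses stand in for Arabic labels.
ONTOLOGY_LABELS = {      # label -> concept (explicit features)
    "phone": "Phone",
    "design": "Design",
    "nature": "Nature",
}
IMPLICIT_FEATURES = {    # opinion word -> concept (stand-in for an ontology property lookup)
    "slow": "Speed",
}
LEXICON = {              # opinion word -> prior polarity (+1 / -1)
    "excellent": 1,
    "amazing": 1,
    "good": 1,
    "basic": -1,
    "slow": -1,
    "disappointing": -1,
}
NEGATIONS = {"not"}
DELIMITERS = r"[.,?!]"   # the delimiters listed for the segmentation module


def segment_edus(text):
    """Split a comment into EDUs on the delimiter characters."""
    return [part.strip() for part in re.split(DELIMITERS, text) if part.strip()]


def extract_eous(tokens):
    """Rule: a lexicon word, with its polarity flipped when directly preceded by a negation."""
    eous = []
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            expr = f"{tokens[i - 1]} {tok}" if negated else tok
            eous.append((expr, -LEXICON[tok] if negated else LEXICON[tok]))
    return eous


def extract_features(tokens):
    """Project the ontology labels onto the EDU tokens (explicit features only)."""
    return [ONTOLOGY_LABELS[tok] for tok in tokens if tok in ONTOLOGY_LABELS]


def analyse(text):
    """Return, for each EDU, its list of (feature, opinion expression, polarity) triples."""
    results = []
    for edu in segment_edus(text):
        tokens = edu.lower().split()
        eous = extract_eous(tokens)
        features = extract_features(tokens)
        if not features:
            # "Opinion expressions only" case: fall back to an implicit feature.
            features = [IMPLICIT_FEATURES[t] for t in tokens if t in IMPLICIT_FEATURES]
        pairs = [(f, expr, pol) for f in features for expr, pol in eous]
        results.append((edu, pairs))
    return results
```

Under these assumptions, `analyse("The phone is excellent, the design is not good. It is slow!")` segments the comment into three EDUs and associates them with (Phone, excellent, +1), (Design, not good, -1), and (Speed, slow, -1) respectively. An EDU containing only a feature, such as "Yesterday I purchased a phone", produces no pair. The "unknown" cases of step (c), which update the lexicon or the ontology, are left out of the sketch.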