CRF-based Arabic Opinion Summarization System

Imen Touati, Marwa Graja, Mariem Ellouze, and Lamia Hadrich Belguith

Miracl Laboratory, Arabic Natural Language Processing Research Group (ANLP-RG), University of Sfax, Tunisia
{imen_touati@yahoo.fr, marwa.graja@fsegs.rnu.tn, mariem.ellouze@planet.tn, l.belguith@fsegs.rnu.tn}

Abstract. This paper presents a study of supervised opinion summarization in Modern Standard Arabic. We use a corpus of news articles and conditional random fields (CRF) as the machine learning technique. We investigate several features to identify those that achieve the best results. Our contribution is the use of opinion-specific features to summarize Arabic news articles with CRF models. Experimental results show that the proposed approach is effective for identifying the opinionated sentences to include in the summary, reaching an F-measure of 90.77%.

Keywords: Arabic news article, opinion analysis, Arabic opinion summarization

1 Introduction

Opinion summarization lies at the intersection of two important fields: opinion mining and text summarization. Opinion mining, also called sentiment analysis, aims to analyze people's opinions, judgments, and evaluations about specific entities, individuals, events, or topics. Text summarization is an old field, dating back to the fifties [17], that aims to extract key sentences from a document. Recently, huge corpora have appeared with the growth of the Internet. To obtain a comprehensive understanding of the detailed opinions in the massive number of product reviews, blogs, news articles, etc., many studies on document summarization of evaluative text and on review mining and summarization have emerged ([5],[6],[10]). With opinion summarization, the goal is no longer only to summarize the information in the text; it is also necessary to determine the expressed opinions with their semantic orientation (positive, negative) and, more narrowly, their semantic categorization [22].
Opinion summarization is a challenging and very useful task. The Text Analysis Conference (TAC 2008) dedicated a pilot task to opinion summarization, in which participants were asked to write summaries of opinions from blogs. Traditional summarization techniques focus on identifying a document's main topics, removing redundancies, and ordering the extracted sentences [11].

Lecture Notes in Computer Science: Authors' Instructions

In our current work, we address the problem of opinion summarization by considering the creation of simple opinion summaries. Our contribution consists in seeking a more precise definition of the features that can be used effectively in the automatic extraction of opinion summaries.

The remainder of this paper is organized as follows. In section 2, we focus on the most closely related studies on opinion summarization. In section 3, we present the experiments performed for detecting opinionated sentences. Finally, the conclusion is presented in the last section.

2 Opinion Summarization

In the literature, different studies give different definitions of what an opinion summary should be. In general, an opinion summary may take different forms: a single paragraph, a structured sentence, attribute-value pairs, or simply an overall score of the sentiment conveyed in a document.

To address this problem, researchers have studied approaches for automatically summarizing or analyzing opinions expressed in review data ([15],[20]). [12] classifies existing approaches into two main categories: aspect-oriented summarization and non-aspect-oriented summarization.

Generally, reviews have been the focus of most research in the sentiment summarization field. Most existing work on opinion or sentiment summarization falls under the umbrella of feature-based or aspect-based summarization. The key idea of this technique is to identify the features of a product and the opinion sentences about each feature.
[16] defines a set of user questions to summarize an English review, in order to help customers who want to quickly capture the main idea of a lengthy product review before reading the details. They treat the problem of aligning questions to a review as a text summarization problem, with the goal of finding relevant and non-redundant questions for a review.

Other works create a textual sentiment summary based on the extraction of relevant sentences. The work of [2] selects a single passage that reflects the opinion of the document's author, while [19] proposes tracking the sentiment flow within the document to create a sentiment summary, choosing the sentences at the local extrema of the flow (plus the first and last sentences).

Other studies are influenced by information extraction methods and propose to view the summary representation as a template ([4],[8]). [23] proposes to select a set of the most representative review sentences for the nominal features of each product.

[21] is interested in summarizing multiple contrastive viewpoints in opinionated text. The work of [7] defines a novel task of generating entity comparisons from textual corpora in which each document describes one entity at a time. [13] summarizes reviews by choosing complementary reviews and ranking them according to different strategies.

Several interesting and advanced works have been carried out on English. In contrast, and to the best of our knowledge, no work has been done on opinion summarization for Arabic, although Arabic document summarization is quite a hot topic in the Arabic research community because of its utility for many NLP tasks ([1],[3]).
3 CRF-based Arabic Opinion Summarization

3.1 Corpus

The corpus used in our experiments is a set of news articles from the Arabic TreeBank (ATB Part 3 v3.2) [18] and from the websites of news channels such as "Aljazeera.Net", "BBC Arabic" and "France 24 Arabic". In each article, the relevant sentences were annotated manually. Each article presents a set of opinions expressed by different holders about a topic from the political domain. A holder may be a person, an organization, a country, a political party, etc. The topic may be a political event, a political figure, etc.

Unlike [24], we annotate targets together with the opinion expressions that are linked to them and their type. Each labeled target has a type (main-topic, part-of-topic or Other-topic). Our corpus has also been annotated with semantic opinion expressions [22].

Fig. 1. A sample of annotated sentence

3.2 Problem definition

The standard formulation of the opinion summarization problem assumes a document $D$ composed of a set of sentences $D = \{x_1, \ldots, x_N\}$ that contains opinions about a specific topic $T$. The objective is to generate a summary $S$ of the opinions expressed in the document $D$ about $T$. Each opinion is conveyed by an opinion expression, uttered by a holder (the source of the opinion) about the topic $T$.

In this work, we consider an extractive summarization setting in which $S$ is built by extracting the most important opinion sentences about the main topic from the document $D$. We assume that $D$ is the set of candidate sentences for our summary.

3.3 CRF definition

In order to investigate the opinion summarization task in Arabic news articles, we applied a machine learning process based on Conditional Random Fields (CRF) models [14]. CRF, as a sequential discriminative probabilistic model, has proved its efficiency in various Natural Language Processing applications, such as named entity identification and morphological tagging.
It is also used for many opinion mining tasks on English and Chinese texts. For Arabic, CRF has been adopted by [9] for opinion holder extraction.

We can address the problem of opinion summarization as a sequential classification problem in which we estimate the conditional probability of a sequence of output values (the class of each lexical unit) $S = y_1 \ldots y_N$ given an input sequence of observations $D = x_1 \ldots x_N$. The conditional probability $p(S|D)$ for linear-chain CRFs is then given by [14]:

$$p(S|D) = \frac{1}{Z(D)} \exp\Big( \sum_{j} \lambda_j\, t_j(y_{i-1}, y_i, D, i) + \sum_{k} \mu_k\, s_k(y_i, D, i) \Big) \qquad (1)$$

where $Z(D)$ is the normalization factor, $t_j$ and $s_k$ are the transition and state feature functions, and $\lambda_j$ and $\mu_k$ are their weights. The exponent is understood to be summed over all positions $i$ of the sequence.

Our implementation of Conditional Random Fields is based on the CRF++ tool (see footnote 1), which implements the CRF models introduced in [14] for sequence labeling.

3.4 Features

Detecting sentences in which a holder expresses an opinion about the main topic of a news article is a challenge in Arabic opinion mining. This affects the selection of training features for the considered task. We therefore propose to train with the following set of opinion-specific features:

– Token: this feature represents the string of the current token. It introduces lexical information about the domain. We will refer to this feature as Tok in the templates table.
– Opinion Expression: this feature indicates whether the considered sentence contains an opinion expression. We will refer to this feature as OpExp in the templates table.
– Holder: this feature indicates whether a holder expresses an opinion in the considered sentence. We will refer to this feature as Hold in the templates table.
– Target: this feature indicates whether the considered sentence contains a span of text representing the target of the conveyed opinion. We will refer to this feature as Targ in the templates table.
– main-topic: this feature indicates whether the target of the opinion expressed in the sentence is the main topic of the news article. We will refer to this feature as maintop in the templates table.
– N-gram: this feature represents bi- and tri-gram expressions. We will refer to this feature as bi- or tri- in the templates table.
– Tokens in context: this feature consists of the words preceding and following the considered one, which form a window of variable size (1 and 2). To determine the best window size, we performed a set of experiments with different window sizes on our data. We will refer to this feature as +1 or +2 in the templates table, with the sign indicating whether the token follows (+) or precedes (-) the current token.

Footnote 1: https://taku910.github.io/crfpp/

Table 1. Different feature combinations

Template      Features
Template0     Tok
Template1     Tok+(+1)+(-1)
Template2     Tok+(+1)+(-1)+(bi-)
Template3     Tok+(+1)+(-1)+(bi-)
Template4     Tok+OpExp
Template5     Tok+OpExp+Hold
Template6     Tok+OpExp+Hold+Targ
Template7     Tok+OpExp+Hold+Targ
Template8     Tok+OpExp+Hold+Targ+(+1)+(-1)
Template9     Tok+OpExp+Hold+Targ+maintop
Template10    Tok+OpExp+Hold+Targ+maintop+(+1)+(-1)

3.5 Experiments

Usually, the evaluation process consists in comparing the result file of the test step with a carefully annotated file. We evaluated our proposed system with three metrics: precision (P), recall (R) and F-measure. Precision evaluates the noise of a system, while recall evaluates its coverage. These metrics are often combined using the well-known weighted harmonic F-measure. To evaluate the system on this task, we verify whether it correctly identifies the whole sequence of words composing each opinionated sentence.

All experiments reported in this work are performed using simple validation (Table 2). For all templates, the best performance is in bold.
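The metrics above can be illustrated with a minimal sketch. It assumes a token-level comparison between a gold file and a system output, a hypothetical label "OP" for tokens belonging to an opinionated sentence, and the balanced form of the F-measure; none of these names come from our implementation, and the function names are illustrative only.

```python
from typing import Sequence, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: Sequence[str], pred: Sequence[str],
             positive: str = "OP") -> Tuple[float, float, float]:
    """Token-level precision, recall and F-measure for one label.

    `positive` is a hypothetical label marking tokens that belong to an
    opinionated sentence; it is an assumption of this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # True positives, false positives and false negatives for `positive`.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    gold = ["OP", "OP", "O", "O", "OP"]
    pred = ["OP", "O", "O", "OP", "OP"]
    print(evaluate(gold, pred))
    # Precision and recall reported for Template 9 in Table 2.
    print(round(f_measure(96.82, 85.43), 2))
```

With the precision and recall reported for Template 9 in Table 2 (96.82 and 85.43), the balanced formula gives 90.77 after rounding, which agrees with the reported F-measure.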
3.6 Discussions

We have carried out standard training and evaluation. We find that the main-topic feature plays a key role in selecting opinionated sentences for the summary, since we obtained the best result (F-measure 90.77%) with Template9. Experiments show that adding the context and bigram features yields a considerable increase over Template0 (F-measure 61.86% for Template2 against 42.29%). However, forming the bigram with the word following or preceding the considered one, as done in Template2 and Template3 respectively, makes almost no difference (61.86% against 61.82%). The inclusion of the opinion expression feature alone in Template4 brings almost no gain over Template0 (42.57% against 42.29%): precision rises to 98.77% but recall falls to 27.13%.

The evaluation shows that such features encourage the inclusion in the summary of sentences that preserve the overall opinion distribution expressed across the original document. We conclude that the proposed new features offer improvements over traditional summarization features for opinionated text.

Table 2. Simple validation

Template   Precision (%)   Recall (%)   F-measure (%)
0          60.12           32.62        42.29
1          95.48           44.95        61.13
2          95.55           45.74        61.86
3          95.33           45.74        61.82
4          98.77           27.13        42.57
5          97.84           76.23        85.70
6          97.85           76.57        85.91
7          97.95           74.89        84.88
8          100             69.84        82.24
9          96.82           85.43        90.77
10         100             80.16        88.98

4 Conclusion

We have studied summarization in the field of sentiment analysis with the objective of producing opinion summaries in standard Arabic. Our study focuses on the problem of automatically extracting opinionated sentences from Arabic news articles in order to form a summary of the opinions they express. After determining the opinion words, their holders, their targets and the main topic, our summarization system, based on CRF models, generates an easily readable summary of the considered news article.

References

1. Al-Saleh, A.B., Menai, M.E.B.: Automatic Arabic text summarization: a survey. Artif. Intell. Rev. vol. 45, 203–234 (2016)
2. Beineke, P., Hastie, T., Manning, C., Vaithyanathan, S.: An exploration of sentiment summarization.
   In: Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text: Theories and Applications, Stanford, US (2004)
3. Belguith, L., Ellouze, M., Maaloul, M., Jaoua, M., Jaoua, F., Blache, P.: Automatic summarization. In: Zitouni, I. (ed.) Natural Language Processing of Semitic Languages, Theory and Applications of Natural Language Processing, pp. 371–408. Springer, Berlin (2014)
4. Cardie, C., Wiebe, J., Wilson, T., Litman, D.: Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering. In: Proceedings of the AAAI Spring Symposium on New Directions in Question Answering, pp. 20–27 (2003)
5. Carenini, G., Cheung, J.C.K.: Extractive vs. NLG-based abstractive summarization of evaluative text: The effect of corpus controversiality. In: Proceedings of the 5th International Natural Language Generation Conference (2008)
6. Carenini, G., Cheung, J.C.K., Pauls, A.: Multi-Document Summarization of Evaluative Text. Computational Intelligence. vol. 29, 545–576 (2012)
7. Contractor, D., Singla, P., Mausam: Entity-balanced Gaussian pLSA for Automated Comparison. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 69–79 (2016)
8. Dini, L., Mazzini, G.: Opinion classification through Information Extraction. In: Proceedings of the Conference on Data Mining Methods and Databases for Engineering, Finance and Other Fields (Data Mining), pp. 299–310 (2002)
9. Elarnaoty, M., AbdelRahman, S.: A machine learning approach for opinion holder extraction in Arabic language. International Journal of Artificial Intelligence and Applications. vol. 3(2) (2012)
10. Di Fabbrizio, G., Aker, A., Gaizauskas, R.: Summarizing on-line product and service reviews using aspect rating distributions and language modeling. IEEE Intelligent Systems. vol. 28, 28–37 (2013)
11. Goldstein, J., Kantrowitz, M., Mittal, V., Carbonell, J.: Summarizing text documents: sentence selection and evaluation metrics. In: SIGIR, pp. 121–128 (1999)
12. Kim, H.D., Ganesan, K., Sondhi, P., Zhai, C.: Comprehensive Review of Opinion Summarization. Computer Science Research and Tech Reports (2011)
13. Krestel, R., Dokoohaki, N.: Diversifying customer review rankings. Neural Networks. vol. 66, 36–45 (2015)
14. Lafferty, J.D., McCallum, A., Pereira, F.C.N.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceedings of the Eighteenth International Conference on Machine Learning (ICML '01) (2001)
15. Liu, B.: Sentiment Analysis and Opinion Mining. Morgan & Claypool, San Rafael, CA (2012)
16. Liu, M., Fang, Y., Park, D.H., Hu, X., Yu, Z.: Retrieving Non-Redundant Questions to Summarize a Product Review. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 385–394. New York (2016)
17. Luhn, H.P.: The Automatic Creation of Literature Abstracts. IBM Journal of Research and Development. vol. 2, 159–165 (1958)
18. Maamouri, M., Bies, A., Kulick, S., Krouna, S., Gaddeche, F., Zaghouani, W.: Arabic TreeBank (ATB): Part 3 Version 3.2. Linguistic Data Consortium. Catalog No: LDC2010T08 (2010)
19. Mao, Y., Lebanon, G.: Sequential Models for Sentiment Prediction. In: Proceedings of the ICML Workshop: Learning in Structured Output Spaces Open Problems in Statistical Relational Learning Statistical Network Analysis: Models, Issues and New Directions (2006)
20. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. vol. 2, 1–135 (2008)
21. Paul, M.J., Zhai, C., Girju, R.: Summarizing Contrastive Viewpoints in Opinionated Text. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 66–76 (2010)
22. Touati, I., Graja, M., Ellouze, M., Hadrich Belguith, L.: Arabic Fine-Grained Opinion Categorization Using Discriminative Machine Learning Technique. In: Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, pp. 104–113. Cairo (2016)
23. Wang, D., Zhu, S., Li, T.: SumView: A Web-based engine for summarizing product reviews and customer opinions. Expert Syst. Appl. vol. 40, 27–33 (2013)
24. Farra, N., Mckeown, K., Habash, N.: Annotating Targets of Opinions in Arabic using Crowdsourcing. In: Proceedings of the Second Workshop on Arabic Natural Language Processing, pp. 89–98 (2015)