Extracting Linguistic Features From Opinion Data Streams For Multi-Domain Sentiment Analysis

Mauro Dragoni
Fondazione Bruno Kessler, Trento, Italy
dragoni@fbk.eu

Abstract. The approach described in this paper explores the use of a semantic structured representation of sentences extracted from texts for multi-domain sentiment analysis purposes. The presented algorithm is built upon a domain-based supervised approach using index-like structures for representing information extracted from text. The algorithm extracts dependency parse relationships from the sentences contained in a training set. Then, such relationships are aggregated in a semantic structure together with both polarity and domain information. Such information is exploited in order to have a more fine-grained representation of the learned sentiment information. When the polarity of a new text has to be computed, the text is converted into the same semantic representation, which is used (i) for detecting the domain to which the text belongs and then (ii), once the domain is assigned to the text, for extracting the polarity from the index-like structure. First experiments, performed by using the Blitzer dataset for training the system, demonstrated the feasibility of the proposed approach.

1 Introduction

Sentiment analysis is a natural language processing task whose aim is to classify documents according to the opinion (polarity) they express on a given subject [1]. Generally speaking, sentiment analysis aims at determining the attitude of a speaker or a writer with respect to a topic or the overall tonality of a document. This task has attracted considerable interest due to its wide range of applications. In recent years, the exponential growth of the Web as a means for exchanging public opinions about events, facts, products, etc., has led to an extensive usage of sentiment analysis approaches, especially for marketing purposes.

By formalizing the sentiment analysis problem, a "sentiment" or "opinion" has been defined by [2] as a quintuple:

$\langle o_j, f_{jk}, so_{ijkl}, h_i, t_l \rangle$,  (1)

where $o_j$ is a target object, $f_{jk}$ is a feature of the object $o_j$, and $so_{ijkl}$ is the sentiment value of the opinion of the opinion holder $h_i$ on feature $f_{jk}$ of object $o_j$ at time $t_l$. The value of $so_{ijkl}$ can be positive (denoting a state of happiness, bliss, or satisfaction), negative (denoting a state of sorrow, dejection, or disappointment), neutral (when it is not possible to denote any particular sentiment), or a more granular rating. The term $h_i$ encodes the opinion holder, and $t_l$ is the time when the opinion is expressed.

Such an analysis may be document-based, where a positive, negative, or neutral sentiment is assigned to the entire document content; or sentence-based, where individual sentences are analyzed separately and classified according to the different polarity values. In the latter case, it is often desirable to identify, with high precision, the entity attributes towards which the detected sentiment is directed. Depending on the scenario in which the opinion is needed, a document-based analysis may be preferred to a sentence-based one, or vice versa. In this work, we want to extract the general opinion of an entire document; therefore, our approach relies on a document-based analysis.

A further aspect that is important to take into account is that, in the classic sentiment analysis problem, the polarity of each document term is considered independently of the domain to which the document belongs.
We illustrate the intuition behind domain-specific term polarity by considering the following example:

1. The sideboard is small and it is not able to contain a lot of stuff.
2. The small dimensions of this decoder allow to move it easily.

In these two sentences the adjective "small" is used in two different domains. In the first sentence, we consider the Furnishings domain and, within it, the polarity of the adjective "small" is clearly "negative", because it highlights an issue of the described item. On the other hand, in the second sentence, where we consider the Electronics domain, the polarity of such an adjective may be considered "positive". A first attempt to explore how term polarity is conditioned by the domain is presented in [3].

Unlike the approaches already discussed in the literature (presented in Section 2), we address the multi-domain sentiment analysis problem from a different perspective. Firstly, we extract semantic and linguistic relationships from document terms, and then we aggregate them in a structured representation where domain information, and the related polarities, are preserved. Such a structured representation is stored in an index-like repository (from now on simply referred to as the "index"). When the polarity of a new document has to be computed, its structured representation is built and, combined with domain information, it is used for querying the index in order to estimate the polarity of the whole document.

The rest of the work is structured as follows. Section 2 presents a survey of works about sentiment analysis. Section 3 describes the proposed approach by explaining how texts are converted into a semantic structured representation, stored during the training phase, and exploited during the test phase. Section 4 reports the comparison between the presented approach and three baselines. Finally, Section 5 concludes the paper.

2 Related Work

The topic of sentiment analysis has been studied extensively in the literature [2], where several techniques have been proposed and validated.

Machine learning techniques are the most common approaches used for addressing this problem, given that any existing supervised method can be applied to sentiment classification. For instance, in [4], the authors compared the performance of Naive Bayes, Maximum Entropy, and Support Vector Machines for sentiment analysis using different features: unigrams, bigrams, the combination of both, the incorporation of part-of-speech and position information, or only adjectives. Moreover, besides the use of standard machine learning methods, researchers have also proposed several custom techniques specifically for sentiment classification, like the use of an adapted score function based on the evaluation of positive or negative words in product reviews [5], as well as the definition of weighting schemata for enhancing classification accuracy [6].

An obstacle to research in this direction is the need for labeled training data, whose preparation is a time-consuming activity. Therefore, in order to reduce the labeling effort, opinion words have been used in training procedures. In [7] and [8], the authors used opinion words to label portions of informative examples for training the classifiers. Opinion words have been exploited also for improving the accuracy of sentiment classification, as presented in [9], where a framework incorporating lexical knowledge in supervised learning to enhance accuracy has been proposed.
Opinion words have been used also in unsupervised learning approaches like the one presented in [10].

Another research direction concerns the exploitation of discourse-analysis techniques. [11] discusses some discourse-based supervised and unsupervised approaches for opinion analysis, while in [12] the authors present an approach to identify discourse relations.

The approaches presented above are applied at the document level [13,14,15,16], i.e., the polarity value is assigned to the entire document content. However, in some cases, for improving the accuracy of the sentiment classification, a more fine-grained analysis of a document is needed. Hence, the sentiment classification of single sentences has to be performed. In the literature, we may find approaches ranging from the use of fuzzy logic [17,18,19] to the use of aggregation techniques [20] for computing the score aggregation of opinion words. In the case of sentence-level sentiment classification, two different sub-tasks have to be addressed: (i) determining whether the sentence is subjective or objective and, (ii) in case the sentence is subjective, determining whether the opinion expressed in the sentence is positive, negative, or neutral. The task of classifying a sentence as subjective or objective, called "subjectivity classification", has been widely discussed in the literature [21,22,23], and systems capable of identifying the opinion holder, target, and polarity have been presented [24]. Once subjective sentences are identified, the same methods as for sentiment classification may be applied. For example, in [25] the authors consider gradable adjectives for sentiment spotting, while in [26,27] the authors built models to identify some specific types of opinions.

In recent years, with the growth of product reviews, marketing activities became the perfect testbed for validating sentiment analysis techniques [28]. However, improving the ability to detect the different opinions concerning the same product expressed in the same review became a challenging problem. Such a task has been faced by introducing "aspect" extraction approaches able to identify, within each sentence, the aspect to which the opinion refers. In the literature, many approaches have been proposed: conditional random fields (CRF) [29], hidden Markov models (HMM) [30], sequential rule mining [31], dependency tree kernels [32], clustering [33], and genetic algorithms [34]. In [35], a method was proposed to extract both opinion words and aspects simultaneously by exploiting some syntactic relations between opinion words and aspects.

Particular attention should also be given to the application of sentiment analysis in social networks [36]. More and more often, people use social networks for expressing their moods concerning their latest purchase or, in general, about new products. Such a social network environment has opened up new challenges due to the different ways people express their opinions, as described by [37] and [38], who mention "noisy data" as one of the biggest hurdles in analyzing social network texts. One of the first studies on sentiment analysis on micro-blogging websites is discussed in [39], where the authors present a distant supervision-based approach for sentiment classification.
At the same time, the social dimension of the Web opens up the opportunity to combine computer science and social sciences to better recognize, interpret, and process opinions and sentiments expressed over it. Such a multi-disciplinary approach has been called sentic computing [40]. Application domains where sentic computing has already shown its potential are the cognitive-inspired classification of images [41], of texts in natural language, and of handwritten text [42].

Finally, an interesting recent research direction is domain adaptation, as it has been shown that sentiment classification is highly sensitive to the domain from which the training data is extracted. A classifier trained using opinionated documents from one domain often performs poorly when it is applied or tested on opinionated documents from another domain, as illustrated by the example presented in Section 1. The reason is that words, and even language constructs, used in different domains for expressing opinions can be quite different. To make matters worse, the same word may have positive connotations in one domain but negative ones in another; therefore, domain adaptation is needed. In the literature, different approaches related to multi-domain sentiment analysis have been proposed. Briefly, two main categories may be identified: (i) the transfer of learned classifiers across different domains [3,43,44], and (ii) the propagation of labels through graph structures [45,46,17,47].

All approaches presented above are based on the use of statistical techniques for building sentiment models; the exploitation of semantic information is not taken into account. In this work, we propose a first version of a semantic-based approach preserving the semantic relationships between the terms of each sentence in order to exploit them both for building the model and for estimating document polarity. The proposed approach, which falls into the multi-domain sentiment analysis category, does not use pre-determined polarity information associated with terms; instead, it learns such polarities directly from the domain-specific documents used for training the models.

3 The Approach

As introduced in Section 1, the proposed system implements an index-like approach based on the use of structured representations of documents. Such a representation is used both for preserving the domain information associated with each document and for estimating the polarity of unclassified ones. Document polarity is estimated through the computation of a Score Status Value (SSV) [48] representing the aggregation of the polarities estimated for each feature extracted from the document. In this section, the steps carried out for implementing our approach are presented.

3.1 Feature Extraction

The first task consists in detecting the features that are exploited for building the sentiment model. The proposed approach has been designed upon two main desiderata:

1. The need to preserve and exploit semantic relationships between document terms requires a structured representation of information able to address this issue. In particular, we want to store the linguistic information of each term together with its semantic relationships with the other terms;
2. The described approach addresses the problem of sentiment analysis in a multi-domain environment; therefore, each extracted feature has to embed domain-specific information in order to exploit it during the estimation of document polarity.

Addressing the two desiderata described above requires parsing raw texts in order to extract significant linguistic and semantic information. The proposed solution for extracting the set of features is based on the use of a natural language processing library, namely the Stanford CoreNLP toolkit [49]. For each document of the training set, we apply the Stanford parser for extracting term dependencies. Such dependencies are taken into account for preserving the semantics between terms in the structured representation used for representing document content. As an example, let's consider the following sentence:

"I came here to reflect my happiness by fishing."

By applying the Stanford parser, we obtain the following list of dependencies between terms:

nsubj(came-2, I-1)
nsubj(reflect-5, I-1)
root(ROOT-0, came-2)
advmod(came-2, here-3)
aux(reflect-5, to-4)
xcomp(came-2, reflect-5)
poss(happiness-7, my-6)
dobj(reflect-5, happiness-7)
prep_by(reflect-5, fishing-9)

Each dependency is composed of three elements: the name of the "relation" (R), the "governor" (G), that is the first term of the dependency, and the "dependent" (D), that is the second one. First of all, we remove from the dependency list the ones containing a stop word as governor or dependent element; the list of stop words used in this work is the one provided by Apache with the Lucene and Solr packages. Exceptions are made when one of the two terms contained in a dependency is an adjective. From the dependency list presented above, the pruned list is the following:

poss(happiness-7, my-6)
dobj(reflect-5, happiness-7)
prep_by(reflect-5, fishing-9)

Then, for each dependency contained in the pruned list, we compile a set of "field–value" pairs. Each pair is a "feature" associated with the dependency extracted from the document. Table 1 shows, using the dependency "dobj(reflect-5, happiness-7)" as an example, the list of extracted features.

Field Name   Content
RGD          "dobj-reflect-happiness"
RDG          "dobj-happiness-reflect"
GD           "reflect-happiness"
DG           "happiness-reflect"
G            "reflect"
D            "happiness"

Table 1: Field structure and corresponding content stored in the index.

There are three considerations explaining the rationale behind the presented set of six features.

– The choice of considering the governor and the dependent in both orders accounts for the possibility that the parser may produce a different output depending on how the text is written within the sentence. Such an order is also affected by the parser used. In our approach we decided to adopt the Stanford parser but, obviously, any parser producing a list of dependencies like the one presented above can be used.
– For the same reason, we decided to extract features stripped of the relation element, because different parsers may use different kinds of dependencies. The meaning of these features (the third and fourth ones) is to track the co-occurrence of terms independently of the relationship between them.
– Finally, the "G" and "D" features are used for backup purposes. Indeed, if only a small number of samples is available for training a particular model, the use of single terms allows applying a bag-of-words approach as a backup for computing document polarity. For these two features, only nouns, verbs, adverbs, and adjectives are considered.
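As a minimal illustration of the extraction step described above, the following Python fragment derives the six field–value features of Table 1 from a single dependency. It is an illustrative sketch, not the actual implementation: it assumes that the dependencies have already been extracted, pruned, and stripped of their positional suffixes, and the helper name is hypothetical.

# Sketch: derive the six field-value features of Table 1 from one
# dependency triple (relation, governor, dependent).
def extract_features(relation, governor, dependent):
    return {
        "RGD": f"{relation}-{governor}-{dependent}",  # relation + terms in parser order
        "RDG": f"{relation}-{dependent}-{governor}",  # relation + terms in reversed order
        "GD": f"{governor}-{dependent}",              # co-occurrence, relation dropped
        "DG": f"{dependent}-{governor}",
        "G": governor,                                # single-term backup features
        "D": dependent,
    }

# Example: the dependency used in Table 1.
print(extract_features("dobj", "reflect", "happiness"))
# {'RGD': 'dobj-reflect-happiness', 'RDG': 'dobj-happiness-reflect',
#  'GD': 'reflect-happiness', 'DG': 'happiness-reflect',
#  'G': 'reflect', 'D': 'happiness'}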
The set of features extracted from each dependency is given as input to the component that combines such features with both the polarity and the domain information in order to construct the final representation of each document.

3.2 Structured Representation Construction

Once all features have been extracted, they are passed to the component in charge of structuring and storing them in the model repository that, for simplicity, we call the "index". As mentioned earlier, domain and polarity information are associated with each feature in order to build its equivalent structured representation; the polarity associated with each feature contained in the model is the average of the polarities of the documents in which the feature occurs. This expedient is necessary for distinguishing the polarities that each feature may assume in different domains. Indeed, classic approaches based on the use of polarized vocabularies do not consider the possibility that a particular feature may assume different polarities depending on the context in which it occurs; an example has been presented in Section 1.

In light of this, the construction of the structured representation of each feature has to consider two aspects: (i) each feature may appear in different domains, and (ii) for each feature, an estimation of the polarity for each domain has to be computed. Therefore, each feature is translated into the corresponding structured representation shown below. Considering as an example the feature "RGD - dobj-reflect-happiness", we have the following structure:

feature-type: RGD
feature-value: dobj-reflect-happiness
domain_1: polarity_1
domain_2: polarity_2
...
domain_n: polarity_n

The estimation of the $polarity_i$ values associated with each domain is done by analyzing only the explicit information extracted from the training set. Values are computed as:

$polarity_i(F) = \frac{k_{F_i}}{T_{F_i}} \in [-1, 1] \quad \forall i = 1, \ldots, n$,  (2)

where $F$ is the feature taken into account, the index $i$ refers to the domain $D_i$ to which the feature belongs, $n$ is the number of domains available in the training set, $k_{F_i}$ is the arithmetic sum of the polarities observed for the feature $F$ in the training set restricted to domain $D_i$, and $T_{F_i}$ is the number of instances of the training set, restricted to domain $D_i$, in which feature $F$ occurs.

Once all structured representations are built, they are stored in the repository. Such a repository represents a multi-domain model for sentiment analysis purposes.

3.3 Polarity Computation

When an unclassified document needs to be evaluated, a procedure similar to the one adopted for building the model is used for computing its polarity.

A document is given as input to the Stanford parser and the list of dependencies is extracted and pruned of the ones containing stop words. Then, for each valid dependency, we build the related structured representation and we use it for estimating the polarity by analyzing the information contained in the model. The final document polarity will be the average of the polarities estimated for each extracted dependency.
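As a companion to Equation (2) and to the averaging step just described, the following Python sketch shows how per-domain polarities can be accumulated at training time and averaged at classification time. It is an illustrative approximation under assumed names (SentimentIndex and its methods are hypothetical), not the actual implementation.

# Sketch: Equation (2) at training time and the averaging of the
# per-feature polarities at classification time.
from collections import defaultdict

class SentimentIndex:  # hypothetical name, for illustration only
    def __init__(self):
        # (feature_type, feature_value) -> domain -> [sum of polarities, count]
        self.entries = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))

    def train(self, features, domain, polarity):
        """Record one training document; 'features' maps feature types to
        values (as produced by extract_features), 'polarity' is in [-1, 1]."""
        for ftype, fvalue in features.items():
            acc = self.entries[(ftype, fvalue)][domain]
            acc[0] += polarity  # k_{F_i}: arithmetic sum of observed polarities
            acc[1] += 1         # T_{F_i}: occurrences of F in the domain

    def dp(self, ftype, fvalue, domain):
        """Equation (2): polarity_i(F) = k_{F_i} / T_{F_i}; None if unseen."""
        acc = self.entries.get((ftype, fvalue), {}).get(domain)
        return acc[0] / acc[1] if acc else None

    def document_polarity(self, feature_sets, domain):
        """Average the polarities of all features of all dependency
        structures extracted from a document (the SSV computation)."""
        values = [p
                  for features in feature_sets
                  for ftype, fvalue in features.items()
                  if (p := self.dp(ftype, fvalue, domain)) is not None]
        return sum(values) / len(values) if values else 0.0

At test time, document_polarity would be called with the feature sets of all pruned dependencies of the document and the domain assigned to it, which mirrors the worked example below.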
Let's consider the following sentence:

"I feel good and I feel healthy."

After the execution of the Stanford parser and the pruning of the exceeding dependencies by using the same strategy described earlier, we obtain the following set of dependencies:

acomp(feel-2, good-3)
acomp(feel-6, healthy-7)

From these two dependencies, we generate the following two structures:

FEATURE ID: F1
feature-type: RGD; feature-value: acomp-feel-good
feature-type: RDG; feature-value: acomp-good-feel
feature-type: GD; feature-value: feel-good
feature-type: DG; feature-value: good-feel
feature-type: G; feature-value: feel
feature-type: D; feature-value: good

FEATURE ID: F2
feature-type: RGD; feature-value: acomp-feel-healthy
feature-type: RDG; feature-value: acomp-healthy-feel
feature-type: GD; feature-value: feel-healthy
feature-type: DG; feature-value: healthy-feel
feature-type: G; feature-value: feel
feature-type: D; feature-value: healthy

For each structure $I$ presented above, given the domain $D$ to which the structure belongs, we compute the SSV representing the polarity of $I$ in that domain. The equation below shows how the SSV is computed:

$SSV(I) = AVG(DP(RGD_{F1}) + DP(RDG_{F1}) + DP(GD_{F1}) + DP(DG_{F1}) + DP(G_{F1}) + DP(D_{F1}) +$
$\qquad\qquad DP(RGD_{F2}) + DP(RDG_{F2}) + DP(GD_{F2}) + DP(DG_{F2}) + DP(G_{F2}) + DP(D_{F2}))$,  (3)

where $DP$ is the function extracting the polarity of a feature for the domain $D$, and $AVG$ refers to the averaging of all detected polarities.

4 Experimental Evaluation

In this section, we present the results obtained from our experimental campaign, where we compared our representation in different settings.

Dataset Construction and Baselines. The training and testing of the system have been done on two different datasets. For creating the training model, we built structured document representations by using the reviews contained in the Blitzer dataset and by applying the DRANZIERA protocol [50]. In particular, we used the balanced version of the dataset in order to have the same number of positive and negative samples. Concerning the test operation, we created a test set of 32,000 reviews compiled by using the same strategy used for building the Blitzer dataset (the test set is available at https://goo.gl/siOJbZ). The test set is also balanced with respect to the number of positive and negative opinions. The same philosophy has been used for the domains: for each of the 16 domains used in the test set, we had 1,000 positive, and as many negative, reviews.

Our approach (Structured Domain Dependent, SDD) has been compared with three baselines:

– Most Frequent Polarity: the accuracy obtained by the system if it guesses the same polarity for all samples contained in the test set.
– Structured Domain Independent: the accuracy obtained by using the proposed structured representation without considering domain information.
– Bag-Of-Word Domain Dependent: the accuracy obtained by using the classic statistical bag-of-words approach while also considering domain information.

Results and Discussion. Table 2 shows the results obtained by the three baselines and by the proposed approach. The first column contains the name of the approach, while the second one reports the accuracy obtained on the test set.

Approach                              Accuracy
Most Frequent Polarity (MFP)          0.5000
Structured Domain Independent (SDI)   0.5407
Bag-Of-Word Domain Dependent (BDD)    0.6350
Structured Domain Dependent (SDD)     0.6834

Table 2: Accuracy obtained by our approach with respect to the three chosen baselines.
Results show that the proposed approach leads to better results with respect to all the baselines. Besides this, there is also a significant difference between the accuracies obtained by using domain-dependent features (the BDD and SDD approaches) and the one obtained without considering domain information.

Focusing on the two approaches exploiting domain information, Table 3 reports the detailed accuracy obtained on each domain. The first column contains the name of the domain, the second column the number of features for each domain, and the last two columns the accuracies obtained by the BDD and SDD approaches, respectively.

Domain                 Features    BDD Accuracy  SDD Accuracy
automotive             259,239     0.6230        0.6935
baby                   924,365     0.5980        0.5830
beauty                 601,163     0.6390        0.6470
cell phones service    484,796     0.6115        0.6570
computer video games   1,247,408   0.5165        0.5725
electronics            944,796     0.6155        0.7180
gourmet food           417,309     0.6310        0.6275
health personal care   768,616     0.6590        0.7180
jewelry watches        358,677     0.6375        0.6540
kitchen housewares     793,167     0.6460        0.7290
musical instruments    130,005     0.6540        0.7225
office products        180,172     0.6535        0.7105
software               1,146,081   0.6680        0.7070
sports outdoors        869,576     0.6540        0.6810
tools hardware         40,962      0.6830        0.7250
toys games             833,887     0.6700        0.7885

Table 3: Accuracy obtained in each domain by the BDD and SDD approaches.

By observing the results reported in Table 3, no particular correlation between the number of features and the accuracy of the approach can be noticed. Unexpectedly, the worst result is obtained for the domain having the highest number of features, while one of the best results, obtained on the "tools hardware" domain, is reported with a very low number of features compared to the others. One of the possible reasons may be the significant presence, in the set of documents used for building the model, of features having uncertain polarity. Indeed, if many features are used in both positive and negative contexts, it is difficult for the system to exploit such information during the test phase for estimating document polarity. Further investigation in this direction may clarify this aspect. Finally, we may notice that for two domains, "gourmet food" and "baby", the bag-of-words approach outperforms the semantic one.

Approach Limits. As we mentioned at the end of Section 2, the approach presented in this paper is a first attempt at exploring the use of structured representations of documents for addressing the sentiment analysis problem. For this reason, we performed a critical analysis of our work in order to highlight its limits and to outline a roadmap for future implementations. In particular, we detected three directions for extending the proposed approach:

– Improve dependency pruning: in the feature extraction process, we pruned part of the dependencies extracted by the Stanford parser. In the light of the results reported in Table 3, we inferred that having a huge number of features does not necessarily lead to higher results. Therefore, a more restrictive policy should be implemented for pruning dependencies, trying to retain the most significant features while discarding the ones causing information overlap between domains.
– Language coverage: a typical problem affecting the construction of language models is their language coverage.
Indeed, without a large corpus for training the system, a significant amount of term information might be excluded. This issue is strictly connected with the next one and may share the same possible solution.
– Improve the semantic aspect: one possibility for addressing the problem of language coverage is the adoption of external semantic resources, for instance WordNet, for extending the meaning of each feature. This way, we will be able to reduce the total number of features, thanks to the use of a concept-based representation of each feature instead of a term-based one, and, at the same time, to increase the language coverage. Working in this direction will mean that the current structured representation will have to be revised accordingly.

5 Conclusion

In this paper, we described a system exploiting a structured representation of documents for the problem of multi-domain sentiment analysis. Even if the representation used for structuring documents and the metric adopted for estimating document polarity are quite simple, the system obtained reasonable performance in the provided evaluation. Future work will address the possibility of exploiting more sophisticated metrics that consider the membership of a document in a certain domain not in a binary but in a fuzzy fashion, measuring some sort of semantic relatedness of the sentence under test with each domain and using such measures as weights in the polarity detection phase. Moreover, we intend to explore the integration of knowledge bases in order to move toward a more cognitive technique able to improve the language coverage of the approach.

References

1. Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of EMNLP, Philadelphia, Association for Computational Linguistics (July 2002) 79–86
2. Liu, B., Zhang, L.: A survey of opinion mining and sentiment analysis. In Aggarwal, C.C., Zhai, C.X., eds.: Mining Text Data. Springer (2012) 415–463
3. Blitzer, J., Dredze, M., Pereira, F.: Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In: ACL. (2007) 187–205
4. Pang, B., Lee, L.: A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: ACL. (2004) 271–278
5. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In: WWW. (2003) 519–528
6. Paltoglou, G., Thelwall, M.: A study of information retrieval weighting schemes for sentiment analysis. In: ACL. (2010) 1386–1395
7. Tan, S., Wang, Y., Cheng, X.: Combining learn-based and lexicon-based techniques for sentiment detection without using labeled examples. In: SIGIR. (2008) 743–744
8. Qiu, L., Zhang, W., Hu, C., Zhao, K.: SELC: A self-supervised model for sentiment classification. In: CIKM. (2009) 929–936
9. Melville, P., Gryc, W., Lawrence, R.D.: Sentiment analysis of blogs by combining lexical knowledge with text classification. In: KDD. (2009) 1275–1284
10. Taboada, M., Brooke, J., Tofiloski, M., Voll, K.D., Stede, M.: Lexicon-based methods for sentiment analysis. Computational Linguistics 37(2) (2011) 267–307
11. Somasundaran, S.: Discourse-level relations for Opinion Analysis. PhD thesis, University of Pittsburgh (2010)
12. Wang, H., Zhou, G.: Topic-driven multi-document summarization. In: IALP. (2010) 195–198
13. Dragoni, M.: Shellfbk: An information retrieval-based system for multi-domain sentiment analysis.
In: Proceedings of the 9th International Workshop on Semantic Evaluation, SemEval 2015, Denver, Colorado, Association for Computational Linguistics (June 2015) 502–509
14. Petrucci, G., Dragoni, M.: An information retrieval-based system for multi-domain sentiment analysis. In Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A., eds.: Semantic Web Evaluation Challenges - Second SemWebEval Challenge at ESWC 2015, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers. Volume 548 of Communications in Computer and Information Science., Springer (2015) 234–243
15. Rexha, A., Kröll, M., Dragoni, M., Kern, R.: Exploiting propositions for opinion mining. In Sack, H., Dietze, S., Tordai, A., Lange, C., eds.: Semantic Web Challenges - Third SemWebEval Challenge at ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Volume 641 of Communications in Computer and Information Science., Springer (2016) 121–125
16. Federici, M., Dragoni, M.: A knowledge-based approach for aspect-based opinion mining. In Sack, H., Dietze, S., Tordai, A., Lange, C., eds.: Semantic Web Challenges - Third SemWebEval Challenge at ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Volume 641 of Communications in Computer and Information Science., Springer (2016) 141–152
17. Dragoni, M., Tettamanzi, A.G., da Costa Pereira, C.: Propagating and aggregating fuzzy polarities for concept-level sentiment analysis. Cognitive Computation 7(2) (2015) 186–197
18. Dragoni, M., Tettamanzi, A.G.B., da Costa Pereira, C.: A fuzzy system for concept-level sentiment analysis. In Presutti, V., Stankovic, M., Cambria, E., Cantador, I., Iorio, A.D., Noia, T.D., Lange, C., Recupero, D.R., Tordai, A., eds.: Semantic Web Evaluation Challenge - SemWebEval 2014 at ESWC 2014, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers. Volume 475 of Communications in Computer and Information Science., Springer (2014) 21–27
19. Petrucci, G., Dragoni, M.: The IRMUDOSA system at ESWC-2016 challenge on semantic sentiment analysis. In Sack, H., Dietze, S., Tordai, A., Lange, C., eds.: Semantic Web Challenges - Third SemWebEval Challenge at ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Volume 641 of Communications in Computer and Information Science., Springer (2016) 126–140
20. da Costa Pereira, C., Dragoni, M., Pasi, G.: A prioritized "and" aggregation operator for multidimensional relevance assessment. In Serra, R., Cucchiara, R., eds.: AI*IA 2009: Emergent Perspectives in Artificial Intelligence, XIth International Conference of the Italian Association for Artificial Intelligence, Reggio Emilia, Italy, December 9-12, 2009, Proceedings. Volume 5883 of Lecture Notes in Computer Science., Springer (2009) 72–81
21. Federici, M., Dragoni, M.: Towards unsupervised approaches for aspects extraction. In Dragoni, M., Recupero, D.R., Denecke, K., Deng, Y., Declerck, T., eds.: Joint Proceedings of the 2nd Workshop on Emotions, Modality, Sentiment Analysis and the Semantic Web and the 1st International Workshop on Extraction and Processing of Rich Semantics from Medical Texts co-located with ESWC 2016, Heraklion, Greece, May 29, 2016. Volume 1613 of CEUR Workshop Proceedings., CEUR-WS.org (2016)
22. Riloff, E., Patwardhan, S., Wiebe, J.: Feature subsumption for opinion analysis. In: EMNLP. (2006) 440–448
23. Wilson, T., Wiebe, J., Hwa, R.: Recognizing strong and weak opinion clauses.
Computational Intelligence 22(2) (2006) 73–99
24. Aprosio, A.P., Corcoglioniti, F., Dragoni, M., Rospocher, M.: Supervised opinion frames detection with RAID. In Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A., eds.: Semantic Web Evaluation Challenges - Second SemWebEval Challenge at ESWC 2015, Portorož, Slovenia, May 31 - June 4, 2015, Revised Selected Papers. Volume 548 of Communications in Computer and Information Science., Springer (2015) 251–263
25. Hatzivassiloglou, V., Wiebe, J.: Effects of adjective orientation and gradability on sentence subjectivity. In: COLING. (2000) 299–305
26. Kim, S.M., Hovy, E.H.: Crystal: Analyzing predictive opinions on the web. In: EMNLP-CoNLL. (2007) 1056–1064
27. Rexha, A., Kröll, M., Dragoni, M., Kern, R.: Polarity classification for target phrases in tweets: A word2vec approach. In Sack, H., Rizzo, G., Steinmetz, N., Mladenic, D., Auer, S., Lange, C., eds.: The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Volume 9989 of Lecture Notes in Computer Science. (2016) 217–223
28. Dragoni, M., Recupero, D.R.: Challenge on fine-grained sentiment analysis within ESWC2016. In Sack, H., Dietze, S., Tordai, A., Lange, C., eds.: Semantic Web Challenges - Third SemWebEval Challenge at ESWC 2016, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. Volume 641 of Communications in Computer and Information Science., Springer (2016) 79–94
29. Jakob, N., Gurevych, I.: Extracting opinion targets in a single and cross-domain setting with conditional random fields. In: EMNLP. (2010) 1035–1045
30. Jin, W., Ho, H.H., Srihari, R.K.: OpinionMiner: A novel machine learning system for web opinion mining and extraction. In: KDD. (2009) 1195–1204
31. Liu, B., Hu, M., Cheng, J.: Opinion observer: Analyzing and comparing opinions on the web. In: WWW. (2005) 342–351
32. Wu, Y., Zhang, Q., Huang, X., Wu, L.: Phrase dependency parsing for opinion mining. In: EMNLP. (2009) 1533–1541
33. Su, Q., Xu, X., Guo, H., Guo, Z., Wu, X., Zhang, X., Swen, B., Su, Z.: Hidden sentiment association in chinese web opinion mining. In: WWW. (2008) 959–968
34. Dragoni, M., Azzini, A., Tettamanzi, A.: A novel similarity-based crossover for artificial neural network evolution. In Schaefer, R., Cotta, C., Kolodziej, J., Rudolph, G., eds.: Parallel Problem Solving from Nature - PPSN XI, 11th International Conference, Kraków, Poland, September 11-15, 2010, Proceedings, Part I. Volume 6238 of Lecture Notes in Computer Science., Springer (2010) 344–353
35. Qiu, G., Liu, B., Bu, J., Chen, C.: Opinion word expansion and target extraction through double propagation. Computational Linguistics 37(1) (2011) 9–27
36. Dragoni, M.: A three-phase approach for exploiting opinion mining in computational advertising. IEEE Intelligent Systems 32(3) (2017) 21–27
37. Barbosa, L., Feng, J.: Robust sentiment detection on twitter from biased and noisy data. In: COLING (Posters). (2010) 36–44
38. Bermingham, A., Smeaton, A.F.: Classifying sentiment in microblogs: Is brevity an advantage? In: CIKM. (2010) 1833–1836
39. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford University (2009)
40. Cambria, E., Hussain, A.: Sentic Computing: Techniques, Tools, and Applications. Volume 2 of SpringerBriefs in Cognitive Computation. Springer, Dordrecht, Netherlands (2012)
41. Cambria, E., Hussain, A.: Sentic album: Content-, concept-, and context-based online personal photo management system. Cognitive Computation 4(4) (2012) 477–496
42. Wang, Q.F., Cambria, E., Liu, C.L., Hussain, A.: Common sense knowledge for handwritten chinese recognition. Cognitive Computation 5(2) (2013) 234–242
43. Pan, S.J., Ni, X., Sun, J.T., Yang, Q., Chen, Z.: Cross-domain sentiment classification via spectral feature alignment. In: WWW. (2010) 751–760
44. Yoshida, Y., Hirao, T., Iwata, T., Nagata, M., Matsumoto, Y.: Transfer learning for multiple-domain sentiment analysis—identifying domain dependent/independent word polarity. In: AAAI. (2011) 1286–1291
45. Ponomareva, N., Thelwall, M.: Semi-supervised vs. cross-domain graphs for sentiment analysis. In: RANLP. (2013) 571–578
46. Huang, S., Niu, Z., Shi, C.: Automatic construction of domain-specific sentiment lexicon based on constrained label propagation. Knowl.-Based Syst. 56 (2014) 191–200
47. Dragoni, M., da Costa Pereira, C., Tettamanzi, A.G.B., Villata, S.: SMACk: An argumentation framework for opinion mining. In Kambhampati, S., ed.: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, IJCAI/AAAI Press (2016) 4242–4243
48. da Costa Pereira, C., Dragoni, M., Pasi, G.: Multidimensional relevance: Prioritized aggregation in a personalized information retrieval setting. Inf. Process. Manage. 48(2) (2012) 340–357
49. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, Maryland, Association for Computational Linguistics (June 2014) 55–60
50. Dragoni, M., Tettamanzi, A., da Costa Pereira, C.: DRANZIERA: An evaluation protocol for multi-domain opinion mining. In Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., Piperidis, S., eds.: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016, European Language Resources Association (ELRA) (2016)