Sentic API: A Common-Sense Based API for Concept-Level Sentiment Analysis

http://sentic.net/api

Erik Cambria (Temasek Laboratories, National University of Singapore; cambria@nus.edu.sg)
Soujanya Poria (School of Electrical & Electronic Engineering, Nanyang Technological University; sporia@ntu.edu.sg)
Alexander Gelbukh (Center for Computing Research, National Polytechnic Institute of Mexico; gelbukh@cic.ipn.mx)
Kenneth Kwok (Temasek Laboratories, National University of Singapore; kenkwok@nus.edu.sg)

ABSTRACT

The bag-of-concepts model can represent semantics associated with natural language text much better than bags-of-words. In the bag-of-words model, in fact, a concept such as cloud_computing would be split into two separate words, disrupting the semantics of the input sentence. Working at concept-level is important for tasks such as opinion mining, especially in the case of microblogging analysis. In this work, we present Sentic API, a common-sense based application programming interface for concept-level sentiment analysis, which provides semantics and sentics (that is, denotative and connotative information) associated with 15,000 natural language concepts.

Categories and Subject Descriptors
H.3.1 [Information Systems Applications]: Linguistic Processing; I.2.7 [Natural Language Processing]: Language parsing and understanding

General Terms
Algorithms

Keywords
Natural language processing; Sentiment analysis

1. INTRODUCTION

Hitherto, online information retrieval, aggregation, and processing have mainly been based on algorithms relying on the textual representation of webpages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling, and counting the number of words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Machine-learning algorithms, in fact, are limited by the fact that they can process only the information that they can 'see'. As human text processors, we do not have such limitations, as every word we see activates a cascade of semantically related concepts, relevant episodes, and sensory experiences, all of which enable the completion of complex tasks, such as word-sense disambiguation, textual entailment, and semantic role labeling, in a quick and effortless way. Machine-learning techniques, moreover, are intrinsically meant for crunching numerical data. Through workarounds such as word frequency counting, it is indeed possible to apply such techniques also in the context of natural language processing (NLP), but it would be no different from trying to understand an image by solely looking at bits-per-pixel information.

Concept-level sentiment analysis, instead, focuses on a semantic analysis of text through the use of web ontologies or semantic networks, which allow the aggregation of conceptual and affective information associated with natural language opinions. By relying on large semantic knowledge bases, such approaches step away from the blind use of keywords and word co-occurrence counts and rely instead on the implicit features associated with natural language concepts. Unlike purely syntactical techniques, concept-based approaches are also able to detect sentiments that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey any emotion, but which are implicitly linked to other concepts that do so. The bag-of-concepts model can represent semantics associated with natural language much better than bags-of-words. In the bag-of-words model, in fact, a concept such as cloud_computing would be split into two separate words, disrupting the semantics of the input sentence (in which, for example, the word cloud could wrongly activate concepts related to weather).
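The difference between the two models can be sketched with a toy concept matcher; the mini concept lexicon and test sentences below are hypothetical, invented purely for illustration and not part of the Sentic API:

```python
# Toy illustration: bag-of-words vs bag-of-concepts tokenization.
# The mini concept lexicon is hypothetical, for demonstration only.

CONCEPT_LEXICON = {("cloud", "computing"): "cloud_computing",
                   ("buy", "christmas", "present"): "buy_christmas_present"}

def bag_of_words(sentence):
    """Plain whitespace tokenization: multiword concepts are lost."""
    return sentence.lower().split()

def bag_of_concepts(sentence, lexicon=CONCEPT_LEXICON):
    """Greedy left-to-right matching of multiword concepts,
    longest n-grams first; unmatched tokens pass through as words."""
    words, out, i = sentence.lower().split(), [], 0
    while i < len(words):
        for ngram, concept in sorted(lexicon.items(), key=lambda kv: -len(kv[0])):
            if tuple(words[i:i + len(ngram)]) == ngram:
                out.append(concept)
                i += len(ngram)
                break
        else:
            out.append(words[i])
            i += 1
    return out

print(bag_of_words("cloud computing is booming"))
# ['cloud', 'computing', 'is', 'booming']
print(bag_of_concepts("cloud computing is booming"))
# ['cloud_computing', 'is', 'booming']
```

With the bag-of-words output, the token cloud is indistinguishable from the weather sense; the concept token cloud_computing keeps the intended meaning intact.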
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyright © 2014 held by the author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the #Microposts2014 Workshop proceedings, available online as CEUR Vol-1141 (http://ceur-ws.org/Vol-1141).
#Microposts2014, April 7th, 2014, Seoul, Korea.

· #Microposts2014 · 4th Workshop on Making Sense of Microposts · @WWW2014

By allowing for the inference of semantics and sentics, the analysis at concept-level enables a comparative fine-grained feature-based sentiment analysis. Rather than gathering isolated opinions about a whole item (e.g., iPhone 5S or Galaxy S5), users are generally more interested in comparing different products according to their specific features (e.g., iPhone 5S's vs Galaxy S5's touchscreen), or sub-features (e.g., fragility of iPhone 5S's vs Galaxy S5's touchscreen). In this context, the construction of comprehensive common and common-sense knowledge bases is key for feature-spotting and polarity detection, respectively.

Common-sense, in particular, is necessary to properly deconstruct natural language text into sentiments; for example, to appraise the concept small_room as negative for a hotel review and small_queue as positive for a post office, or the concept go_read_the_book as positive for a book review but negative for a movie review.

The rest of the paper is organized as follows: Section 2 presents available resources for concept-level sentiment analysis; Section 3 illustrates the techniques exploited to build the Sentic API; Section 4 describes in detail how the API is developed and how it can be used; Section 5 proposes an evaluation of the API; finally, Section 6 concludes the paper and suggests further research directions.

2. RELATED WORK

Commonly used resources for concept-level sentiment analysis include ANEW [3], WordNet-Affect (WNA) [21], ISEAR [1], SentiWordNet [9], and SenticNet [7]. In [22], for example, a concept-level sentiment dictionary is built through a two-step method combining iterative regression and random walk with in-link normalization. ANEW and SenticNet are exploited for propagating sentiment values based on the assumption that semantically related concepts share common sentiment. Moreover, polarity accuracy, Kendall distance, and average-maximum ratio are used, instead of mean error, to better evaluate sentiment dictionaries.

A similar approach is adopted in [19], which presents a methodology for enriching SenticNet concepts with affective information by assigning an emotion label to them. The authors use various features extracted from ISEAR, as well as similarity measures that rely on the polarity data provided in SenticNet (those based on WNA) and ISEAR distance-based measures, including point-wise mutual information and emotional affinity.

Another recent work that builds upon an existing affective knowledge base is [14], which proposes the re-evaluation of objective words in SentiWordNet by assessing the sentimental relevance of such words and their associated sentiment sentences. Two sampling strategies are proposed and integrated with support vector machines for sentiment classification. According to the experiments, the proposed approach significantly outperforms the traditional sentiment mining approach, which ignores the importance of objective words in SentiWordNet.

In [2], the main issues related to the development of a corpus for opinion mining and sentiment analysis are discussed, both by surveying the existing work in this area and by presenting, as a case study, an ongoing project for Italian, called Senti-TUT, where a corpus for the investigation of irony about politics in social media is developed.

Other work explores the ensemble application of knowledge bases and statistical methods. In [24], for example, a hybrid approach combining lexical analysis and machine learning is proposed in order to cope with ambiguity and integrate the context of sentiment terms. The context-aware method identifies ambiguous terms that vary in polarity depending on the context and stores them in contextualized sentiment lexicons. In conjunction with semantic knowledge bases, these lexicons help ground ambiguous sentiment terms to concepts that correspond to their polarity.

More machine-learning based works include [10], which introduces a new methodology for the retrieval of product features and opinions from a collection of free-text customer reviews about a product or service. Such a methodology relies on a language modeling framework that can be applied to reviews in any domain and language, provided with a seed set of opinion words. The methodology combines both a kernel-based model of opinion words (learned from the seed set of opinion words) and a statistical mapping between words to approximate a model of product features from which the retrieval is carried out.

3. TECHNIQUES ADOPTED

In this work, we exploit the ensemble application of spectral association [12], an approximation of many steps of spreading activation, and CF-IOF (concept frequency - inverse opinion frequency), an approach similar to TF-IDF weighting, to extract semantics from ConceptNet [20], a semantic network of common-sense knowledge. The extraction of sentics, in turn, is performed through the combined use of AffectiveSpace [4], a multi-dimensional vector space representation of affective common-sense knowledge, and the Hourglass of Emotions [6], a brain-inspired emotion categorization model.

3.1 Spectral Association

Spectral association is a technique that involves assigning activations to 'seed concepts' and applying an operation that spreads their values across the graph structure of ConceptNet. This operation transfers the most activation to concepts that are connected to the key concepts by short paths or by many different paths in common-sense knowledge.

In particular, we build a matrix C that relates concepts to other concepts, instead of their features, and add up the scores over all relations that relate one concept to another, disregarding direction. Applying C to a vector containing a single concept spreads that concept's value to its connected concepts. Applying C^2 spreads that value to concepts connected by two links (including back to the concept itself). As we aim to spread the activation through any number of links, with diminishing returns, the operator we want is:

    1 + C + C^2/2! + C^3/3! + ... = e^C

We can calculate this odd operator, e^C, because we can factor C. C is already symmetric, so instead of applying Lanczos' method [15] to C C^T and getting the singular value decomposition (SVD), we can apply it directly to C and get the spectral decomposition C = V Λ V^T. As before, we can raise this expression to any power and cancel everything but the power of Λ. Therefore, e^C = V e^Λ V^T. This simple twist on the SVD lets us calculate spreading activation over the whole matrix instantly. We can truncate this matrix to k axes and therefore save space while generalizing from similar concepts. We can also rescale the matrix, so that activation values have a maximum of 1 and do not tend to collect in highly-connected concepts, by normalizing the truncated rows of V e^(Λ/2) to unit vectors, and multiplying that matrix by its transpose to get a rescaled version of V e^Λ V^T.

3.2 CF-IOF Weighting

CF-IOF is a technique that identifies common topic-dependent semantics in order to evaluate how important a concept is to a set of opinions concerning the same topic. It is hereby used to feed spectral association with 'seed concepts'. Firstly, the frequency of a concept c for a given domain d is calculated by counting the occurrences of the concept c in the set of available d-tagged opinions and dividing the result by the sum of the number of occurrences of all concepts in the set of opinions concerning d. This frequency is then multiplied by the logarithm of the inverse frequency of the concept in the whole collection of opinions, that is:

    CF-IOF_{c,d} = ( n_{c,d} / Σ_k n_{k,d} ) · log( Σ_k n_k / n_c )

where n_{c,d} is the number of occurrences of concept c in the set of opinions tagged as d, n_k is the total number of concept occurrences, and n_c is the number of occurrences of c in the whole set of opinions.

A high weight in CF-IOF is reached by a high concept frequency (in the given opinions) and a low opinion frequency of the concept in the whole collection of opinions. Therefore, thanks to CF-IOF weights, it is possible to filter out common concepts and detect relevant topic-dependent semantics.

3.3 AffectiveSpace

To extract sentics from natural language text, we use AffectiveSpace, a multi-dimensional vector space built upon ConceptNet and WNA. Concept similarity does not depend on the concepts' absolute positions in the vector space, but rather on the angle they make with the origin. For example, we can find concepts such as beautiful_day, birthday_party, laugh, and make_person_happy very close in direction in the vector space, while concepts like sick, feel_guilty, be_laid_off, and shed_tear are found in a completely different direction (nearly opposite with respect to the centre of the space).
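This direction-based notion of similarity can be illustrated with a toy sketch; the 3-dimensional coordinates below are invented for demonstration and are not actual AffectiveSpace vectors:

```python
from math import sqrt

def cosine(u, v):
    """Cosine of the angle between two concept vectors:
    similarity depends on direction, not on absolute position."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical 3-dimensional coordinates (first axis ~ positive valence).
space = {
    "birthday_party":    ( 0.9,  0.2,  0.1),
    "make_person_happy": ( 0.8,  0.3,  0.0),
    "feel_guilty":       (-0.7, -0.1,  0.2),
    "shed_tear":         (-0.8, -0.2,  0.1),
}

# Positive-valence concepts point in nearly the same direction...
assert cosine(space["birthday_party"], space["make_person_happy"]) > 0.9
# ...while negative ones point nearly opposite to them.
assert cosine(space["birthday_party"], space["shed_tear"]) < -0.8
```

Note that scaling any of these vectors by a positive constant leaves every cosine unchanged, which is exactly why similarity here is a property of direction rather than position.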
The alignment operation performed over these two knowledge bases yields a matrix, A, in which common-sense and affective knowledge coexist, i.e., a 15,000 × 118,000 matrix whose rows are concepts (e.g., dog or bake_cake), whose columns are either common-sense or affective features (e.g., isA-pet or hasEmotion-joy), and whose values indicate truth values of assertions. Therefore, in A, each concept is represented by a vector in the space of possible features whose values are positive for features that produce an assertion of positive valence (e.g., 'a penguin is a bird'), negative for features that produce an assertion of negative valence (e.g., 'a penguin cannot fly'), and zero when nothing is known about the assertion.

The degree of similarity between two concepts, then, is the dot product between their rows in A. The value of such a dot product increases whenever two concepts are described by the same feature and decreases when they are described by features that are negations of each other. In particular, we use truncated SVD [23] in order to obtain a new matrix containing both hierarchical affective knowledge and common-sense. The resulting matrix has the form Ã = U_k Σ_k V_k^T and is a low-rank approximation of A, the original data. This approximation is based on minimizing the Frobenius norm [13] of the difference between A and Ã under the constraint rank(Ã) = k. By the Eckart–Young theorem [8], it represents the best approximation of A in the least-square sense; in fact:

    min_{Ã | rank(Ã)=k} |A − Ã| = min_{Ã | rank(Ã)=k} |Σ − U* Ã V| = min_{Ã | rank(Ã)=k} |Σ − S|

assuming that Ã has the form Ã = U S V*, where S is diagonal. From the rank constraint, i.e., that S has k non-zero diagonal entries, the minimum of the above statement is obtained as follows:

    min_{Ã | rank(Ã)=k} |Σ − S| = min_{s_i} sqrt( Σ_{i=1}^{n} (σ_i − s_i)^2 )
                                = min_{s_i} sqrt( Σ_{i=1}^{k} (σ_i − s_i)^2 + Σ_{i=k+1}^{n} σ_i^2 )
                                = sqrt( Σ_{i=k+1}^{n} σ_i^2 )

Therefore, Ã of rank k is the best approximation of A in the Frobenius norm sense when σ_i = s_i (i = 1, ..., k) and the corresponding singular vectors are the same as those of A. If we choose to discard all but the first k principal components, common-sense concepts and emotions are represented by vectors of k coordinates: these coordinates can be seen as describing concepts in terms of 'eigenmoods' that form the axes of AffectiveSpace, i.e., the basis e_0, ..., e_{k−1} of the vector space. For example, the most significant eigenmood, e_0, represents concepts with positive affective valence. That is, the larger a concept's component in the e_0 direction is, the more affectively positive it is likely to be. Thus, by exploiting the information-sharing property of truncated SVD, concepts with the same affective valence are likely to have similar features; that is, concepts conveying the same emotion tend to fall near each other in AffectiveSpace.

3.4 The Hourglass of Emotions

To reason on the disposition of concepts in AffectiveSpace, we use the Hourglass of Emotions (Figure 1), an affective categorization model developed starting from Plutchik's studies on human emotions [18]. In the model, sentiments are reorganized around four independent dimensions whose different levels of activation make up the total emotional state of the mind. The Hourglass of Emotions, in fact, is based on the idea that the mind is made of different independent resources and that emotional states result from turning some set of these resources on and turning another set of them off [16]. The primary quantity we can measure about an emotion we feel is its strength. But when we feel a strong emotion, it is because we feel a very specific emotion. And, conversely, we cannot feel a specific emotion like 'fear' or 'amazement' without that emotion being reasonably strong. Mapping this space of possible emotions leads to an hourglass shape.
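The Eckart–Young error bound derived above can be checked on a toy diagonal matrix; for a diagonal A with descending non-negative entries, keeping the top-k entries is exactly the rank-k truncated SVD, so the sketch (all values invented for illustration) verifies that the residual equals the square root of the sum of the discarded σ_i^2:

```python
from math import sqrt

def frobenius(M):
    """Frobenius norm of a matrix given as a list of rows."""
    return sqrt(sum(x * x for row in M for x in row))

def diagonal(values):
    """Square diagonal matrix with `values` on the diagonal."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

sigmas = [3.0, 2.0, 1.0]   # toy singular values, descending
k = 2
A   = diagonal(sigmas)                                   # toy 'data' matrix
A_k = diagonal(sigmas[:k] + [0.0] * (len(sigmas) - k))   # rank-k truncation
diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, A_k)]

# Eckart-Young: |A - A_k|_F equals sqrt of the sum of discarded sigma_i^2.
assert abs(frobenius(diff) - sqrt(sum(s * s for s in sigmas[k:]))) < 1e-12
```

For a general (non-diagonal) matrix the same identity holds after rotating into the singular bases, which is precisely what the derivation above exploits.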
Figure 1: The Hourglass model

In the model, affective states are not classified, as often happens in the field of emotion analysis, into basic emotional categories, but rather into four concomitant but independent dimensions, characterized by six levels of activation, which determine the intensity of the expressed/perceived emotion as a float ∈ [-1,+1]. Such levels are also labeled as a set of 24 basic emotions (six for each of the affective dimensions) in a way that allows the model to specify the affective information associated with text both in a dimensional and in a discrete form.

4. BUILDING AND USING THE API

Currently available lexical resources for opinion polarity and affect recognition, such as SentiWordNet or WNA, are known to be rather noisy and limited. These resources, in fact, mainly provide opinion polarity and affective information at syntactical level, leaving out polarity and affective information for common-sense knowledge concepts like celebrate_special_occasion, accomplish_goal, bad_feeling, be_on_cloud_nine, or lose_temper, which are usually found in natural language text to express viewpoints and affect.

In order to build a comprehensive resource for opinion mining and sentiment analysis, we use the techniques described in Section 3 to extract both cognitive and affective information from natural language text in such a way that it is possible to map it into a fixed structure. In particular, we propose to bridge the cognitive and affective gap between word-level natural language data and their relative concept-level opinions and sentiments by building semantics and sentics on top of them (Figure 2). To this end, the Sentic API provides polarity (a float number between -1 and +1 that indicates whether a concept is positive or negative), semantics (a set of five semantically-related concepts), and sentics (affective information in terms of the Hourglass affective dimensions) associated with 15,000 natural language concepts. This information is encoded in RDF/XML using the descriptors defined by the Human Emotion Ontology (HEO) [11].

Figure 2: The semantics and sentics stack

4.1 Extracting Semantics

The extraction of semantics associated with common-sense knowledge concepts is performed through the ensemble application of spectral association and CF-IOF on the graph structure of ConceptNet. In particular, we apply CF-IOF on a set of 10,000 topic-tagged posts extracted from LiveJournal^1, a virtual community of more than 23 million users, who are allowed to label their posts not only with a topic tag but also with a mood label, by choosing from more than 130 predefined moods or by creating custom mood themes.

Thanks to CF-IOF weights, it is possible to filter out common concepts and detect domain-dependent concepts that individualize topics typically found in online opinions, such as art, food, music, politics, family, entertainment, photography, travel, and technology. These concepts represent seed concepts for spectral association, which spreads their values across the ConceptNet graph. In particular, in order to accordingly limit the spreading activation of ConceptNet nodes, the rest of the concepts detected via CF-IOF are given as negative inputs to spectral association, so that just domain-specific concepts are selected.

^1 http://livejournal.com

4.2 Extracting Sentics

The extraction of sentics associated with common-sense knowledge concepts is performed through the combined use of AffectiveSpace and the Hourglass model. In particular, we discard all but the first 100 singular values of the SVD and organize the resulting vector space using a k-medoids clustering approach [17], with respect to the Hourglass of Emotions (i.e., by using the model's labels as 'centroid concepts'). By calculating the relative distances (dot product) of each concept from the different centroids, it is possible to calculate its affective valence in terms of Pleasantness, Attention, Sensitivity, and Aptitude, which is stored in the form of a four-dimensional vector, called sentic vector.

4.3 Encoding Semantics and Sentics

In order to represent the Sentic API in a machine-accessible and machine-processable way, results are encoded in RDF triples using an XML syntax (Figure 3). In particular, concepts are identified using the ConceptNet Web API, and statements are encoded in RDF/XML format on the basis of HEO. Statements have forms such as concept – hasPleasantness – pleasantnessValue, concept – hasPolarity – polarityValue, and concept – isSemanticallyRelatedTo – concept.

Given the concept celebrate_special_occasion, for example, the Sentic API provides a set of semantically related concepts, e.g., celebrate_birthday, and a sentic vector specifying the Pleasantness, Attention, Sensitivity, and Aptitude associated with the concept (which can be decoded into the emotions of ecstasy and anticipation, and from which a positive polarity value can be inferred).

Figure 3: XML file resulting from querying about the semantics of celebrate_special_occasion

Encoding semantics and sentics in RDF/XML using the descriptors defined by HEO allows cognitive and affective information to be stored in a Sesame triple-store, a purpose-built database for the storage and retrieval of RDF metadata. Sesame can be embedded in applications and used to conduct a wide range of inferences on the information stored, based on RDFS and OWL type relations between data. In addition, it can also be used in a standalone server mode, much like a traditional database with multiple applications connecting to it.

4.4 Exploiting Semantics and Sentics

Thanks to its Semantic Web aware format, the Sentic API is very easy to interface with any real-world application that needs to extract semantics and sentics from natural language. This cognitive and affective information is supplied both at category-level (through domain and sentic labels) and at dimensional-level (through polarity values and sentic vectors).

Sentic labels, in particular, are useful in case we deal with real-time adaptive applications (in which, for example, the style of an interface or the expression of an avatar has to quickly change according to labels such as 'excitement' or 'frustration' detected from user input). Polarity values and sentic vectors, in turn, are useful for tasks such as information retrieval and polarity detection (in which it is necessary to process batches of documents and, hence, perform calculations, such as addition, subtraction, and average, on both conceptual and affective information).

After feeding the extracted concepts to the Sentic API, we can exploit semantics and sentics to detect opinion targets and obtain, for each of these, the relative affective information both in a discrete way (with one or more emotional labels) and in a dimensional way (with a polarity value ∈ [-1,+1]), as shown in Table 1.

Table 1: Structured output example

    Opinion Target   Category                      Moods                      Polarity
    'iphone4'        'phones', 'electronics'       'ecstasy', 'interest'      +0.71
    'speaker'        'electronics', 'music'        'annoyance'                -0.34
    'touchscreen'    'electronics'                 'ecstasy', 'anticipation'  +0.82
    'camera'         'photography', 'electronics'  'acceptance'               +0.56
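Dimensional-level information of this kind can be aggregated numerically; the sketch below applies the polarity formula of the Hourglass model [6] to hypothetical sentic vectors, which are invented purely for illustration:

```python
def polarity(sentic_vectors):
    """Polarity of a small bag of concepts (SBoC) from the sentic
    vectors (Pleasantness, Attention, Sensitivity, Aptitude) of its
    concepts, following the Hourglass polarity formula:
        p = sum_i [Plsnt_i + |Attnt_i| - |Snst_i| + Aptit_i] / (3N)
    """
    n = len(sentic_vectors)
    return sum(p + abs(a) - abs(s) + t
               for p, a, s, t in sentic_vectors) / (3 * n)

# Hypothetical sentic vectors for two concepts of one opinion.
sboc = [(0.9, 0.6, 0.0, 0.8),    # e.g., a very positive concept
        (-0.3, 0.1, 0.4, -0.2)]  # e.g., a mildly negative one

p = polarity(sboc)
assert -1.0 <= p <= 1.0   # polarity always stays in [-1, +1]
```

The division by 3N keeps the result in [-1,+1], since each concept contributes at most three units of positive (or negative) valence across the four dimensions.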
Averaging results obtained at category-level is also possible by using a continuous 2D space whose dimensions are evaluation and activation, but the best strategy is usually to consider the opinionated document as composed of small bags of concepts (SBoCs) and feed these into the Sentic API to perform statistical analysis of the resulting sentic vectors.

To this end, we use a pre-processing module that interprets all the affective valence indicators usually contained in text, such as special punctuation, complete upper-case words, onomatopoeic repetitions, exclamation words, negations, degree adverbs, and emoticons, and eventually lemmatizes the text. A semantic parser then deconstructs the text into concepts using a lexicon based on 'sentic n-grams', i.e., sequences of lexemes which represent multiple-word common-sense and affective concepts extracted from ConceptNet, WNA, and other linguistic resources. We then use the resulting SBoC as input for the Sentic API and look it up in order to obtain the relative sentic vectors, which we average in order to detect the primary and secondary moods conveyed by the analyzed text and/or its polarity, given by the formula [6]:

    p = Σ_{i=1}^{N} [ Plsnt(c_i) + |Attnt(c_i)| − |Snst(c_i)| + Aptit(c_i) ] / (3N)

where N is the size of the SBoC.

As an example of how the Sentic API can be exploited for microblogging analysis, intermediate and final outputs obtained when a natural language opinion is given as input to the system can be examined. The tweet "I think iPhone4 is the top of the heap! OK, the speaker is not the best i hv ever seen bt touchscreen really puts me on cloud 9... camera looks pretty good too!" is selected. After the pre-processing and semantic parsing operations, the following SBoCs are obtained:

SBoC#1:
SBoC#2:
SBoC#3:
SBoC#4:

5. EVALUATION

As a use case evaluation of the proposed API, we select the problem of crowd validation of the UK national health service (NHS) [5], that is, the exploitation of the wisdom of patients to adequately validate the official hospital ratings made available by UK health-care providers and NHS Choices^2. To validate such data, we exploit patient stories extracted from PatientOpinion^3, a social enterprise providing an online feedback service for users of the UK NHS. The problem is that this social information is often stored in natural language text and, hence, is intrinsically unstructured, which makes comparison with the structured information supplied by health-care providers very difficult. To bridge the gap between such data (which are different at structure-level yet similar at concept-level), we exploit the Sentic API to marshal PatientOpinion's social information into a machine-accessible and machine-processable format and, hence, compare it with the official hospital ratings provided by NHS Choices and each NHS trust.

In particular, we use Sentic API's inferred ratings to validate the information declared by the relevant health-care providers, crawled separately from each NHS trust website, and the official NHS ranks, extracted using the NHS Choices API^4. This kind of data usually consists of ratings that associate a polarity value to specific features of health-care providers, such as 'communication', 'food', 'parking', 'service', 'staff', and 'timeliness'. The polarity can be either a number in a fixed range or simply a flag (positive/negative).

Since each patient opinion can regard more than one topic, and the polarity values associated with each topic are often independent from each other, we need to extract, from each opinion, a set of topics and then, for each topic detected, the polarity associated with it. Thus, after deconstructing each opinion into a set of SBoCs, we analyze these through the Sentic API in order to tag each SBoC with one of the relevant topics (if any) and calculate a polarity value. We ran this process on a set of 857 topic- and polarity-tagged short stories extracted from the PatientOpinion database and computed recall and precision rates as evaluation metrics.

As for the SBoC categorization, results showed that the Sentic API can detect topics in patient stories with satisfactory accuracy. In particular, the classification of stories about 'food' and 'communication' was performed with a precision of 80.2% and 73.4% and recall rates of 69.8% and 61.4%, for a total F-measure of 74.6% and 66.8%, respectively.

As for the polarity detection, in turn, the positivity and negativity of patient opinions were identified with particularly high precision (91.4% and 86.9%, respectively) and good recall rates (81.2% and 74.3%), for a total F-measure of 85.9% and 80.1%, respectively. More detailed comparative statistics are listed in Table 2, where the Sentic API is compared against WNA and SenticNet with respect to the polarity detection F-measures obtained on the 857 short stories.

Table 2: Comparative evaluation against WNA and SenticNet

    Category           WNA      SenticNet   Sentic API
    clinical service   59.12%   69.52%      78.06%
    communication      66.81%   76.35%      80.12%
    food               67.95%   83.61%      85.94%
    parking            63.02%   75.09%      79.42%
    staff              58.37%   67.90%      76.19%
    timeliness         57.98%   66.00%      75.98%

^2 http://nhs.uk
^3 http://patientopinion.org.uk
^4 http://data.gov.uk/data

6. CONCLUSION

Today user-generated contents are perfectly suitable for human consumption, but they remain hardly accessible to machines. Currently available information retrieval tools still face many limitations. To bridge the conceptual and affective gap between word-level natural language data and the concept-level opinions and sentiments conveyed by them, we developed Sentic API, a common-sense based application programming interface that provides semantics and sentics associated with 15,000 natural language concepts.

We showed how the Sentic API can easily be embedded in real-world NLP applications, specifically in the field of microblogging analysis, where statistical methods usually fail, as syntax-based text processing works well only on formal-English documents and after training on big text corpora. We keep developing the resource in such a way that it can be continuously enhanced with more concepts from the ever-growing Open Mind corpus and other publicly available common and common-sense knowledge bases. We are also developing novel techniques and tools to allow the Sentic API to be more easily merged with external domain-dependent knowledge bases, in order to improve the extraction of semantics and sentics from many different types of media and contexts.

7. REFERENCES

[1] C. Bazzanella. Emotions, language and context. In E. Weigand, editor, Emotion in Dialogic Interaction. Advances in the Complex, pages 59–76. Benjamins, Amsterdam, 2004.
[2] C. Bosco, V. Patti, and A. Bolioli. Developing corpora for sentiment analysis and opinion mining: A survey and the Senti-TUT case study. IEEE Intelligent Systems, 28(2):55–63, 2013.
[3] M. Bradley and P. Lang. Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings. Technical report, The Center for Research in Psychophysiology, University of Florida, 1999.
[4] E. Cambria and A. Hussain. Sentic Computing: Techniques, Tools, and Applications. Springer, Dordrecht, Netherlands, 2012.
[5] E. Cambria, A. Hussain, C. Havasi, C. Eckl, and J. Munro. Towards crowd validation of the UK national health service. In WebSci, Raleigh, 2010.
[6] E. Cambria, A. Livingstone, and A. Hussain. The hourglass of emotions. In A. Esposito, A. Vinciarelli, R. Hoffmann, and V. Muller, editors, Cognitive Behavioral Systems, volume 7403 of Lecture Notes in Computer Science, pages 144–157. Springer, Berlin Heidelberg, 2012.
[7] E. Cambria, R. Speer, C. Havasi, and A. Hussain. SenticNet: A publicly available semantic resource for opinion mining. In AAAI CSK, pages 14–18, Arlington, 2010.
[8] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[9] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, 2006.
[10] L. García-Moya, H. Anaya-Sanchez, and R. Berlanga-Llavori. A language model approach for retrieving product features and opinions from customer reviews. IEEE Intelligent Systems, 28(3):19–27, 2013.
[11] M. Grassi. Developing HEO human emotions ontology. Volume 5707 of Lecture Notes in Computer Science, pages 244–251. Springer, Berlin Heidelberg, 2009.
[12] C. Havasi, R. Speer, and J. Holmgren. Automated color selection using semantic knowledge. In AAAI CSK, Arlington, 2010.
[13] R. Horn and C. Johnson. Norms for vectors and matrices. In Matrix Analysis, chapter 5. Cambridge University Press, 1990.
[14] C. Hung and H.-K. Lin. Using objective words in SentiWordNet to improve sentiment classification for word of mouth. IEEE Intelligent Systems, 28(2):47–54, 2013.
[15] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4):255–282, 1950.
[16] M. Minsky. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster, New York, 2006.
[17] H. Park and C. Jun. A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.
[18] R. Plutchik. The nature of emotions. American Scientist, 89(4):344–350, 2001.
[19] S. Poria, A. Gelbukh, A. Hussain, D. Das, and S. Bandyopadhyay. Enhanced SenticNet with affective labels for concept-based opinion mining. IEEE Intelligent Systems, 28(2):31–38, 2013.
[20] R. Speer and C. Havasi. ConceptNet 5: A large semantic network for relational knowledge. In E. Hovy, M. Johnson, and G. Hirst, editors, Theory and Applications of Natural Language Processing, chapter 6. Springer, 2012.
[21] C. Strapparava and A. Valitutti. WordNet-Affect: An affective extension of WordNet. In LREC, pages 1083–1086, Lisbon, 2004.
[22] A. Tsai, R. Tsai, and J. Hsu. Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Intelligent Systems, 28(2):22–30, 2013.
[23] M. Wall, A. Rechtsteiner, and L. Rocha. Singular value decomposition and principal component analysis. In D. Berrar, W. Dubitzky, and M. Granzow, editors, A Practical Approach to Microarray Data Analysis, pages 91–109. Springer, 2003.
[24] A. Weichselbraun, S. Gindl, and A. Scharl. Extracting and grounding context-aware sentiment lexicons. IEEE Intelligent Systems, 28(2):39–46, 2013.