Sentic API
A Common-Sense Based API for Concept-Level Sentiment Analysis
http://sentic.net/api

Erik Cambria, Temasek Laboratories, National University of Singapore, cambria@nus.edu.sg
Soujanya Poria, School of Electrical & Electronic Engineering, Nanyang Technological University, sporia@ntu.edu.sg
Alexander Gelbukh, Center for Computing Research, National Polytechnic Institute of Mexico, gelbukh@cic.ipn.mx
Kenneth Kwok, Temasek Laboratories, National University of Singapore, kenkwok@nus.edu.sg

ABSTRACT

The bag-of-concepts model can represent semantics associated with natural language text much better than bags-of-words. In the bag-of-words model, in fact, a concept such as cloud_computing would be split into two separate words, disrupting the semantics of the input sentence. Working at concept-level is important for tasks such as opinion mining, especially in the case of microblogging analysis. In this work, we present Sentic API, a common-sense based application programming interface for concept-level sentiment analysis, which provides semantics and sentics (that is, denotative and connotative information) associated with 15,000 natural language concepts.

Categories and Subject Descriptors

H.3.1 [Information Systems Applications]: Linguistic Processing; I.2.7 [Natural Language Processing]: Language parsing and understanding

General Terms

Algorithms

Keywords

Natural language processing; Sentiment analysis

1. INTRODUCTION

Hitherto, online information retrieval, aggregation, and processing have mainly been based on algorithms relying on the textual representation of webpages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling, and counting the number of words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Machine-learning algorithms, in fact, are limited by the fact that they can process only the information that they can 'see'. As human text processors, we do not have such limitations, as every word we see activates a cascade of semantically related concepts, relevant episodes, and sensory experiences, all of which enable the completion of complex tasks – such as word-sense disambiguation, textual entailment, and semantic role labeling – in a quick and effortless way. Machine-learning techniques, moreover, are intrinsically meant for crunching numerical data. Through workarounds such as word frequency counting, it is indeed possible to apply such techniques also in the context of natural language processing (NLP), but doing so is no different from trying to understand an image by looking solely at its bits-per-pixel information.

Concept-level sentiment analysis, instead, focuses on a semantic analysis of text through the use of web ontologies or semantic networks, which allow the aggregation of conceptual and affective information associated with natural language opinions. By relying on large semantic knowledge bases, such approaches step away from the blind use of keywords and word co-occurrence counts and rely instead on the implicit features associated with natural language concepts. Unlike purely syntactical techniques, concept-based approaches are also able to detect sentiments that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey any emotion but are implicitly linked to other concepts that do so. The bag-of-concepts model can represent semantics associated with natural language much better than bags-of-words. In the bag-of-words model, in fact, a concept such as cloud_computing would be split into two separate words, disrupting the semantics of the input sentence (in which, for example, the word cloud could wrongly activate concepts related to weather).

By allowing for the inference of semantics and sentics, analysis at concept-level enables a comparative, fine-grained, feature-based sentiment analysis. Rather than gathering isolated opinions about a whole item (e.g., iPhone 5S or Galaxy S5), users are generally more interested in comparing different products according to their specific features (e.g., iPhone 5S's vs Galaxy S5's touchscreen) or sub-features (e.g., fragility of iPhone 5S's vs Galaxy S5's touchscreen). In this context, the construction of comprehensive common and common-sense knowledge bases is key for feature-spotting and polarity detection, respectively.




Common-sense, in particular, is necessary to properly deconstruct natural language text into sentiments – for example, to appraise the concept small_room as negative for a hotel review and small_queue as positive for a post office, or the concept go_read_the_book as positive for a book review but negative for a movie review.

The rest of the paper is organized as follows: Section 2 presents available resources for concept-level sentiment analysis; Section 3 illustrates the techniques exploited to build the Sentic API; Section 4 describes in detail how the API is developed and how it can be used; Section 5 proposes an evaluation of the API; finally, Section 6 concludes the paper and suggests further research directions.

2. RELATED WORK

Commonly used resources for concept-level sentiment analysis include ANEW [3], WordNet-Affect (WNA) [21], ISEAR [1], SentiWordNet [9], and SenticNet [7]. In [22], for example, a concept-level sentiment dictionary is built through a two-step method combining iterative regression and random walk with in-link normalization. ANEW and SenticNet are exploited for propagating sentiment values based on the assumption that semantically related concepts share common sentiment. Moreover, polarity accuracy, Kendall distance, and average-maximum ratio are used, instead of mean error, to better evaluate sentiment dictionaries.

A similar approach is adopted in [19], which presents a methodology for enriching SenticNet concepts with affective information by assigning an emotion label to them. The authors use various features extracted from ISEAR, as well as similarity measures that rely on the polarity data provided in SenticNet (those based on WNA) and ISEAR distance-based measures, including point-wise mutual information and emotional affinity. Another recent work that builds upon an existing affective knowledge base is [14], which proposes the re-evaluation of objective words in SentiWordNet by assessing the sentimental relevance of such words and their associated sentiment sentences. Two sampling strategies are proposed and integrated with support vector machines for sentiment classification. According to the experiments, the proposed approach significantly outperforms the traditional sentiment mining approach, which ignores the importance of objective words in SentiWordNet. In [2], the main issues related to the development of a corpus for opinion mining and sentiment analysis are discussed both by surveying the existing work in this area and by presenting, as a case study, an ongoing project for Italian, called Senti-TUT, in which a corpus for the investigation of irony about politics in social media is developed.

Other work explores the ensemble application of knowledge bases and statistical methods. In [24], for example, a hybrid approach combining lexical analysis and machine learning is proposed in order to cope with ambiguity and integrate the context of sentiment terms. The context-aware method identifies ambiguous terms that vary in polarity depending on the context and stores them in contextualized sentiment lexicons. In conjunction with semantic knowledge bases, these lexicons help ground ambiguous sentiment terms to concepts that correspond to their polarity.

More machine-learning based works include [10], which introduces a new methodology for the retrieval of product features and opinions from a collection of free-text customer reviews about a product or service. The methodology relies on a language modeling framework that can be applied to reviews in any domain and language, provided a seed set of opinion words is available. It combines a kernel-based model of opinion words (learned from the seed set) and a statistical mapping between words to approximate a model of product features from which the retrieval is carried out.

3. TECHNIQUES ADOPTED

In this work, we exploit the ensemble application of spectral association [12], an approximation of many steps of spreading activation, and CF-IOF (concept frequency – inverse opinion frequency), an approach similar to TF-IDF weighting, to extract semantics from ConceptNet [20], a semantic network of common-sense knowledge. The extraction of sentics, in turn, is performed through the combined use of AffectiveSpace [4], a multi-dimensional vector space representation of affective common-sense knowledge, and the Hourglass of Emotions [6], a brain-inspired emotion categorization model.

3.1 Spectral Association

Spectral association is a technique that involves assigning activations to 'seed concepts' and applying an operation that spreads their values across the graph structure of ConceptNet. This operation transfers the most activation to concepts that are connected to the key concepts by short paths or by many different paths in common-sense knowledge.

In particular, we build a matrix C that relates concepts to other concepts, instead of to their features, and add up the scores over all relations that relate one concept to another, disregarding direction. Applying C to a vector containing a single concept spreads that concept's value to its connected concepts. Applying C^2 spreads that value to concepts connected by two links (including back to the concept itself). As we aim to spread the activation through any number of links, with diminishing returns, the operator we want is:

    1 + C + \frac{C^2}{2!} + \frac{C^3}{3!} + \cdots = e^C

We can calculate this operator, e^C, because we can factor C. C is already symmetric, so instead of applying Lanczos' method [15] to C C^T and getting the singular value decomposition (SVD), we can apply it directly to C and get the spectral decomposition C = V Λ V^T. As before, we can raise this expression to any power and cancel everything but the power of Λ. Therefore, e^C = V e^Λ V^T. This simple twist on the SVD lets us calculate spreading activation over the whole matrix instantly. We can truncate this matrix to k axes and therefore save space while generalizing from similar concepts. We can also rescale the matrix, so that activation values have a maximum of 1 and do not tend to collect in highly-connected concepts, by normalizing the truncated rows of V e^(Λ/2) to unit vectors and multiplying that matrix by its transpose to get a rescaled version of V e^Λ V^T.
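
The following sketch (ours, not part of the original system) illustrates this computation with NumPy, assuming a small symmetric concept-concept matrix C and a truncation rank k; the real matrix is built from ConceptNet and is far larger and sparser:

import numpy as np

def spectral_association(C, k=100):
    """Approximate spreading activation over a symmetric concept-concept
    matrix C by computing exp(C) through its spectral decomposition,
    truncated to k eigen-axes and rescaled so that self-activation is 1."""
    # spectral decomposition of the symmetric matrix: C = V diag(lam) V^T
    lam, V = np.linalg.eigh(C)
    # keep the k largest eigenvalues (eigh returns them in ascending order)
    lam, V = lam[-k:], V[:, -k:]
    # rows of V * exp(lam/2), normalized to unit length
    U = V * np.exp(lam / 2.0)
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # rescaled version of V exp(Lam) V^T
    return U @ U.T

# toy usage: seed a single concept and read off the spread activations
C = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
activation = spectral_association(C, k=3)
seed = np.array([1., 0., 0.])          # activate the first concept only
print(activation @ seed)               # activation spread to its neighbours
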
3.2 CF-IOF Weighting

CF-IOF is a technique that identifies common topic-dependent semantics in order to evaluate how important a concept is to a set of opinions concerning the same topic. It is hereby used to feed spectral association with 'seed concepts'. Firstly, the frequency of a concept c for a given domain d is calculated by counting the occurrences of the concept c in the set of available d-tagged opinions and dividing the result by the total number of occurrences of all concepts in the set of opinions concerning d. This frequency is then multiplied by the logarithm of the inverse frequency of the concept in the whole collection of opinions, that is:

    CF\text{-}IOF_{c,d} = \frac{n_{c,d}}{\sum_k n_{k,d}} \log \sum_k \frac{n_k}{n_c}

where n_{c,d} is the number of occurrences of concept c in the set of opinions tagged as d, n_k is the total number of concept occurrences, and n_c is the number of occurrences of c in the whole set of opinions.




A high weight in CF-IOF is reached by a high concept frequency (in the given opinions) and a low opinion frequency of the concept in the whole collection of opinions. Therefore, thanks to CF-IOF weights, it is possible to filter out common concepts and detect relevant topic-dependent semantics.
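
As an illustration only, the sketch below (assuming opinions are already represented as (domain, concept-list) pairs; the variable names are ours) computes the CF-IOF weights defined above and can be used to rank candidate seed concepts for a domain:

import math
from collections import Counter

def cf_iof(opinions, domain):
    """CF-IOF weights for a collection of (domain_tag, [concepts]) opinions.
    Follows the formula above: concept frequency within the domain, times
    the log of (total concept occurrences / occurrences of the concept)."""
    domain_counts = Counter(c for d, cs in opinions if d == domain for c in cs)
    all_counts = Counter(c for _, cs in opinions for c in cs)
    n_domain = sum(domain_counts.values())          # sum_k n_{k,d}
    n_total = sum(all_counts.values())              # sum_k n_k
    return {c: (n / n_domain) * math.log(n_total / all_counts[c])
            for c, n in domain_counts.items()}

# toy usage: 'buy_ticket' is frequent in 'travel' opinions but rare overall,
# so it receives a high weight and becomes a seed concept for that domain
opinions = [('travel', ['buy_ticket', 'pack_bag', 'good_time']),
            ('food',   ['eat_cake', 'good_time']),
            ('travel', ['buy_ticket', 'miss_flight'])]
print(sorted(cf_iof(opinions, 'travel').items(), key=lambda kv: -kv[1]))
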
3.3 AffectiveSpace

To extract sentics from natural language text, we use AffectiveSpace, a multi-dimensional vector space built upon ConceptNet and WNA. The alignment operation performed over these two knowledge bases yields a matrix A in which common-sense and affective knowledge coexist, i.e., a 15,000 × 118,000 matrix whose rows are concepts (e.g., dog or bake_cake), whose columns are either common-sense or affective features (e.g., isA-pet or hasEmotion-joy), and whose values indicate the truth values of assertions.

Therefore, in A, each concept is represented by a vector in the space of possible features, whose values are positive for features that produce an assertion of positive valence (e.g., 'a penguin is a bird'), negative for features that produce an assertion of negative valence (e.g., 'a penguin cannot fly'), and zero when nothing is known about the assertion. The degree of similarity between two concepts, then, is the dot product between their rows in A. The value of such a dot product increases whenever two concepts are described by the same feature and decreases when they are described by features that are negations of each other. In particular, we use truncated SVD [23] in order to obtain a new matrix containing both hierarchical affective knowledge and common-sense.

The resulting matrix has the form Ã = U_k Σ_k V_k^T and is a low-rank approximation of A, the original data. This approximation is based on minimizing the Frobenius norm [13] of the difference between A and Ã under the constraint rank(Ã) = k. By the Eckart–Young theorem [8], it represents the best approximation of A in the least-square sense; in fact:

    \min_{\tilde{A} \mid rank(\tilde{A})=k} |A - \tilde{A}| = \min_{\tilde{A} \mid rank(\tilde{A})=k} |\Sigma - U^* \tilde{A} V| = \min_{\tilde{A} \mid rank(\tilde{A})=k} |\Sigma - S|

assuming that Ã has the form Ã = U S V^*, where S is diagonal. From the rank constraint, i.e., that S has k non-zero diagonal entries, the minimum of the above statement is obtained as follows:

    \min_{\tilde{A} \mid rank(\tilde{A})=k} |\Sigma - S| = \min_{s_i} \sqrt{\sum_{i=1}^{n} (\sigma_i - s_i)^2} = \min_{s_i} \sqrt{\sum_{i=1}^{k} (\sigma_i - s_i)^2 + \sum_{i=k+1}^{n} \sigma_i^2} = \sqrt{\sum_{i=k+1}^{n} \sigma_i^2}

Therefore, Ã of rank k is the best approximation of A in the Frobenius norm sense when σ_i = s_i (i = 1, ..., k) and the corresponding singular vectors are the same as those of A. If we choose to discard all but the first k principal components, common-sense concepts and emotions are represented by vectors of k coordinates: these coordinates can be seen as describing concepts in terms of 'eigenmoods' that form the axes of AffectiveSpace, i.e., the basis e_0, ..., e_{k-1} of the vector space. For example, the most significant eigenmood, e_0, represents concepts with positive affective valence. That is, the larger a concept's component in the e_0 direction is, the more affectively positive it is likely to be. Thus, by exploiting the information-sharing property of truncated SVD, concepts with the same affective valence are likely to have similar features – that is, concepts conveying the same emotion tend to fall near each other in AffectiveSpace.

Concept similarity does not depend on concepts' absolute positions in the vector space, but rather on the angle they make with the origin. For example, we can find concepts such as beautiful_day, birthday_party, laugh, and make_person_happy very close in direction in the vector space, while concepts like sick, feel_guilty, be_laid_off, and shed_tear are found in a completely different direction (nearly opposite with respect to the centre of the space).
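
A minimal sketch of the projection just described, using NumPy's SVD on a toy concept-by-feature matrix (the real A is 15,000 × 118,000 with k = 100; the data below are made up for illustration):

import numpy as np

def affective_space(A, k=100):
    """Project a concept-by-feature matrix A onto its first k singular
    vectors ('eigenmoods'); each row becomes a k-dimensional concept vector."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]          # concept coordinates in AffectiveSpace

def angular_similarity(space, i, j):
    """Similarity as the cosine of the angle between two concept vectors,
    since direction (not absolute position) carries the affective information."""
    u, v = space[i], space[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy usage with a 4-concept x 5-feature matrix of assertion truth values
A = np.array([[ 1,  1,  0,  0,  1],
              [ 1,  0,  1,  0,  1],
              [-1,  0,  0,  1, -1],
              [ 0, -1,  0,  1, -1]], dtype=float)
space = affective_space(A, k=2)
print(angular_similarity(space, 0, 1))   # similar direction: kindred concepts
print(angular_similarity(space, 0, 2))   # nearly opposite direction
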


3.4 The Hourglass of Emotions

To reason on the disposition of concepts in AffectiveSpace, we use the Hourglass of Emotions (Figure 1), an affective categorization model developed starting from Plutchik's studies on human emotions [18]. In the model, sentiments are reorganized around four independent dimensions whose different levels of activation make up the total emotional state of the mind. The Hourglass of Emotions, in fact, is based on the idea that the mind is made of different independent resources and that emotional states result from turning some set of these resources on and turning another set of them off [16].

The primary quantity we can measure about an emotion we feel is its strength. But when we feel a strong emotion, it is because we feel a very specific emotion. And, conversely, we cannot feel a specific emotion like 'fear' or 'amazement' without that emotion being reasonably strong. Mapping this space of possible emotions leads to an hourglass shape.

Figure 1: The Hourglass model




In the model, affective states are not classified, as often happens in the field of emotion analysis, into basic emotional categories, but rather into four concomitant yet independent dimensions, characterized by six levels of activation, which determine the intensity of the expressed/perceived emotion as a float ∈ [-1,+1]. Such levels are also labeled as a set of 24 basic emotions (six for each of the affective dimensions) in a way that allows the model to specify the affective information associated with text both in a dimensional and in a discrete form.
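
As a rough illustration of this dimensional-to-discrete mapping, the sketch below converts an activation float in [-1,+1] into one of six levels; the equal-width interval boundaries are our assumption, and the level labels (shown for the Pleasantness dimension) follow the Hourglass model as we understand it rather than an official specification:

# Six levels per dimension, from most negative to most positive activation.
# Labels shown for Pleasantness; boundaries assumed to split [-1, +1] into
# six equal intervals.
PLEASANTNESS_LEVELS = ['grief', 'sadness', 'pensiveness',
                       'serenity', 'joy', 'ecstasy']

def level_label(activation, labels=PLEASANTNESS_LEVELS):
    """Map a float activation in [-1, +1] to one of the six level labels."""
    if not -1.0 <= activation <= 1.0:
        raise ValueError('activation must lie in [-1, +1]')
    index = min(int((activation + 1.0) / 2.0 * 6), 5)   # 0..5
    return labels[index]

print(level_label(0.9))    # 'ecstasy'
print(level_label(-0.2))   # 'pensiveness'
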
4. BUILDING AND USING THE API

Currently available lexical resources for opinion polarity and affect recognition, such as SentiWordNet or WNA, are known to be rather noisy and limited. These resources, in fact, mainly provide opinion polarity and affective information at the syntactical level, leaving out polarity and affective information for common-sense knowledge concepts like celebrate_special_occasion, accomplish_goal, bad_feeling, be_on_cloud_nine, or lose_temper, which are usually found in natural language text to express viewpoints and affect.

In order to build a comprehensive resource for opinion mining and sentiment analysis, we use the techniques described in Section 3 to extract both cognitive and affective information from natural language text in such a way that it is possible to map it into a fixed structure. In particular, we propose to bridge the cognitive and affective gap between word-level natural language data and their relative concept-level opinions and sentiments by building semantics and sentics on top of them (Figure 2). To this end, the Sentic API provides polarity (a float number between -1 and +1 that indicates whether a concept is positive or negative), semantics (a set of five semantically-related concepts), and sentics (affective information in terms of the Hourglass affective dimensions) associated with 15,000 natural language concepts. This information is encoded in RDF/XML using the descriptors defined by the Human Emotion Ontology (HEO) [11].

Figure 2: The semantics and sentics stack

4.1 Extracting Semantics

The extraction of semantics associated with common-sense knowledge concepts is performed through the ensemble application of spectral association and CF-IOF on the graph structure of ConceptNet. In particular, we apply CF-IOF on a set of 10,000 topic-tagged posts extracted from LiveJournal (http://livejournal.com), a virtual community of more than 23 million users who are allowed to label their posts not only with a topic tag but also with a mood label, by choosing from more than 130 predefined moods or by creating custom mood themes.

Thanks to CF-IOF weights, it is possible to filter out common concepts and detect domain-dependent concepts that individualize topics typically found in online opinions such as art, food, music, politics, family, entertainment, photography, travel, and technology. These concepts represent seed concepts for spectral association, which spreads their values across the ConceptNet graph. In particular, in order to accordingly limit the spreading activation of ConceptNet nodes, the rest of the concepts detected via CF-IOF are given as negative inputs to spectral association, so that just domain-specific concepts are selected.

4.2 Extracting Sentics

The extraction of sentics associated with common-sense knowledge concepts is performed through the combined use of AffectiveSpace and the Hourglass model. In particular, we discard all but the first 100 singular values of the SVD and organize the resulting vector space using a k-medoids clustering approach [17], with respect to the Hourglass of Emotions (i.e., by using the model's labels as 'centroid concepts').

By calculating the relative distances (dot product) of each concept from the different centroids, it is possible to calculate its affective valence in terms of Pleasantness, Attention, Sensitivity, and Aptitude, which is stored in the form of a four-dimensional vector, called a sentic vector.
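
A sketch of this dot-product step (our illustration only): the centroid vectors would come from the AffectiveSpace positions of the Hourglass 'centroid concepts', which we stand in for with random vectors here:

import numpy as np

DIMENSIONS = ['Pleasantness', 'Attention', 'Sensitivity', 'Aptitude']

def sentic_vector(concept_vec, centroid_vecs):
    """Four-dimensional sentic vector of a concept, obtained (as described
    above) from its dot products with the centroid vectors representing the
    four Hourglass dimensions in the 100-dimensional AffectiveSpace."""
    return {dim: float(np.dot(concept_vec, centroid_vecs[dim]))
            for dim in DIMENSIONS}

# toy usage with random 100-dimensional vectors standing in for the real
# AffectiveSpace coordinates of a concept and of the four centroids
rng = np.random.default_rng(0)
centroids = {dim: rng.standard_normal(100) for dim in DIMENSIONS}
concept = rng.standard_normal(100)
print(sentic_vector(concept, centroids))
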
4.3 Encoding Semantics and Sentics

In order to represent the Sentic API in a machine-accessible and machine-processable way, results are encoded in RDF triples using an XML syntax (Figure 3). In particular, concepts are identified using the ConceptNet Web API and statements are encoded in RDF/XML format on the basis of HEO. Statements have forms such as concept – hasPleasantness – pleasantnessValue, concept – hasPolarity – polarityValue, and concept – isSemanticallyRelatedTo – concept.

Given the concept celebrate_special_occasion, for example, the Sentic API provides a set of semantically related concepts, e.g., celebrate_birthday, and a sentic vector specifying the Pleasantness, Attention, Sensitivity, and Aptitude associated with the concept (which can be decoded into the emotions of ecstasy and anticipation, and from which a positive polarity value can be inferred).

Figure 3: XML file resulting from querying about the semantics of celebrate_special_occasion

Encoding semantics and sentics in RDF/XML using the descriptors defined by HEO allows cognitive and affective information to be stored in a Sesame triple store, a purpose-built database for the storage and retrieval of RDF metadata. Sesame can be embedded in applications and used to conduct a wide range of inferences on the information stored, based on RDFS- and OWL-type relations between data. In addition, it can also be used in standalone server mode, much like a traditional database with multiple applications connecting to it.
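
Triples of the forms listed above could be produced, for instance, with rdflib (a library choice of ours, not necessarily what the authors use); the namespace IRIs and the numeric values below are placeholders rather than the actual HEO and ConceptNet identifiers:

from rdflib import Graph, Literal, Namespace, URIRef

# Placeholder namespaces: the real resource uses HEO descriptors and
# ConceptNet concept identifiers, whose exact IRIs are not given here.
HEO = Namespace('http://example.org/heo#')
CN = Namespace('http://conceptnet.io/c/en/')

g = Graph()
concept = URIRef(CN['celebrate_special_occasion'])
related = URIRef(CN['celebrate_birthday'])

# Statements of the forms described above (values are placeholders)
g.add((concept, HEO['hasPleasantness'], Literal(0.8)))
g.add((concept, HEO['hasPolarity'], Literal(0.7)))
g.add((concept, HEO['isSemanticallyRelatedTo'], related))

print(g.serialize(format='xml'))   # RDF/XML, ready for a triple store
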




4.4 Exploiting Semantics and Sentics

Thanks to its Semantic Web aware format, the Sentic API is very easy to interface with any real-world application that needs to extract semantics and sentics from natural language. This cognitive and affective information is supplied both at category level (through domain and sentic labels) and at dimensional level (through polarity values and sentic vectors).

Sentic labels, in particular, are useful when dealing with real-time adaptive applications (in which, for example, the style of an interface or the expression of an avatar has to quickly change according to labels such as 'excitement' or 'frustration' detected from user input). Polarity values and sentic vectors, in turn, are useful for tasks such as information retrieval and polarity detection (in which batches of documents need to be processed and, hence, calculations such as addition, subtraction, and averaging have to be performed on both conceptual and affective information).

Averaging results obtained at category level is also possible by using a continuous 2D space whose dimensions are evaluation and activation, but the best strategy is usually to consider the opinionated document as composed of small bags of concepts (SBoCs) and feed these into the Sentic API to perform statistical analysis of the resulting sentic vectors.

To this end, we use a pre-processing module that interprets all the affective valence indicators usually contained in text, such as special punctuation, complete upper-case words, onomatopoeic repetitions, exclamation words, negations, degree adverbs, and emoticons, and eventually lemmatizes the text.

A semantic parser then deconstructs the text into concepts using a lexicon based on 'sentic n-grams', i.e., sequences of lexemes which represent multiple-word common-sense and affective concepts extracted from ConceptNet, WNA, and other linguistic resources. We then use the resulting SBoC as input for the Sentic API and look up each of its concepts in order to obtain the relative sentic vectors, which we average in order to detect the primary and secondary moods conveyed by the analyzed text and/or its polarity, given by the formula [6]:

    p = \sum_{i=1}^{N} \frac{Plsnt(c_i) + |Attnt(c_i)| - |Snst(c_i)| + Aptit(c_i)}{3N}

where N is the size of the SBoC.
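
A direct transcription of this formula (our sketch), assuming each concept's sentics are available as a dictionary keyed by the four Hourglass dimensions, as in the sketch of Section 4.2:

def polarity(sentic_vectors):
    """Polarity of a small bag of concepts (SBoC), computed with the formula
    above from the sentic vectors returned for its N concepts."""
    N = len(sentic_vectors)
    total = sum(s['Pleasantness'] + abs(s['Attention'])
                - abs(s['Sensitivity']) + s['Aptitude']
                for s in sentic_vectors)
    return total / (3 * N)

# toy usage with two hand-written sentic vectors
sboc = [{'Pleasantness': 0.8, 'Attention': 0.4, 'Sensitivity': 0.1, 'Aptitude': 0.6},
        {'Pleasantness': 0.5, 'Attention': 0.2, 'Sensitivity': 0.0, 'Aptitude': 0.3}]
print(polarity(sboc))   # a value in [-1, +1]
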
As an example of how the Sentic API can be exploited for microblogging analysis, the intermediate and final outputs obtained when a natural language opinion is given as input to the system can be examined. The tweet "I think iPhone4 is the top of the heap! OK, the speaker is not the best i hv ever seen bt touchscreen really puts me on cloud 9... camera looks pretty good too!" is selected. After the pre-processing and semantic parsing operations, the following SBoCs are obtained:

SBoC#1:
SBoC#2:
SBoC#3:
SBoC#4:

After feeding the extracted concepts to the Sentic API, we can exploit semantics and sentics to detect opinion targets and obtain, for each of these, the relative affective information both in a discrete way (with one or more emotional labels) and in a dimensional way (with a polarity value ∈ [-1,+1]), as shown in Table 1.

Table 1: Structured output example
Opinion Target    Category                        Moods                       Polarity
'iphone4'         'phones', 'electronics'         'ecstasy', 'interest'       +0.71
'speaker'         'electronics', 'music'          'annoyance'                 -0.34
'touchscreen'     'electronics'                   'ecstasy', 'anticipation'   +0.82
'camera'          'photography', 'electronics'    'acceptance'                +0.56

5. EVALUATION

As a use case evaluation of the proposed API, we select the problem of crowd validation of the UK national health service (NHS) [5], that is, the exploitation of the wisdom of patients to adequately validate the official hospital ratings made available by UK health-care providers and NHS Choices (http://nhs.uk). To validate such data, we exploit patient stories extracted from PatientOpinion (http://patientopinion.org.uk), a social enterprise providing an online feedback service for users of the UK NHS. The problem is that this social information is often stored as natural language text and, hence, is intrinsically unstructured, which makes comparison with the structured information supplied by health-care providers very difficult. To bridge the gap between such data (which are different at structure level yet similar at concept level), we exploit the Sentic API to marshal PatientOpinion's social information into a machine-accessible and machine-processable format and, hence, compare it with the official hospital ratings provided by NHS Choices and each NHS trust.

In particular, we use the Sentic API's inferred ratings to validate the information declared by the relevant health-care providers, crawled separately from each NHS trust website, and the official NHS ranks, extracted using the NHS Choices API (http://data.gov.uk/data). This kind of data usually consists of ratings that associate a polarity value with specific features of health-care providers such as 'communication', 'food', 'parking', 'service', 'staff', and 'timeliness'. The polarity can be either a number in a fixed range or simply a flag (positive/negative).

Since each patient opinion can regard more than one topic, and the polarity values associated with each topic are often independent from each other, we need to extract, from each opinion, a set of topics and then, for each topic detected, the polarity associated with it. Thus, after deconstructing each opinion into a set of SBoCs, we analyze these through the Sentic API in order to tag each SBoC with one of the relevant topics (if any) and calculate a polarity value. We ran this process on a set of 857 topic- and polarity-tagged short stories extracted from the PatientOpinion database and computed recall and precision rates as evaluation metrics.

As for the SBoC categorization, results showed that the Sentic API can detect topics in patient stories with satisfactory accuracy. In particular, the classification of stories about 'food' and 'communication' was performed with a precision of 80.2% and 73.4% and recall rates of 69.8% and 61.4%, for a total F-measure of 74.6% and 66.8%, respectively.




Table 2: Comparative evaluation against WNA and SenticNet
Category            WNA       SenticNet   Sentic API
clinical service    59.12%    69.52%      78.06%
communication       66.81%    76.35%      80.12%
food                67.95%    83.61%      85.94%
parking             63.02%    75.09%      79.42%
staff               58.37%    67.90%      76.19%
timeliness          57.98%    66.00%      75.98%

As for the polarity detection, in turn, positivity and negativity of patient opinions were identified with particularly high precision (91.4% and 86.9%, respectively) and good recall rates (81.2% and 74.3%), for a total F-measure of 85.9% and 80.1%, respectively. More detailed comparative statistics are listed in Table 2, where the Sentic API is compared against WNA and SenticNet with respect to the polarity detection F-measures obtained on the 857 short stories.
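
Assuming the standard balanced F-measure, F_1 = 2PR/(P+R), these figures are mutually consistent; for negativity detection, for instance:

    F_1 = \frac{2 \cdot 0.869 \cdot 0.743}{0.869 + 0.743} \approx 0.801

which matches the reported 80.1%.
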
                                                                                 volume 5707 of Lecture Notes in Computer Science, pages
                                                                                 244–251. Springer, Berlin Heidelberg, 2009.
6. CONCLUSION

Today, user-generated contents are perfectly suitable for human consumption, but they remain hardly accessible to machines. Currently available information retrieval tools still face many limitations. To bridge the conceptual and affective gap between word-level natural language data and the concept-level opinions and sentiments conveyed by them, we developed the Sentic API, a common-sense based application programming interface that provides semantics and sentics associated with 15,000 natural language concepts.

We showed how the Sentic API can easily be embedded in real-world NLP applications, specifically in the field of microblogging analysis, where statistical methods usually fail, as syntax-based text processing works well only on formal-English documents and after training on big text corpora. We keep developing the resource so that it can be continuously enhanced with more concepts from the ever-growing Open Mind corpus and other publicly available common and common-sense knowledge bases. We are also developing novel techniques and tools to allow the Sentic API to be more easily merged with external domain-dependent knowledge bases, in order to improve the extraction of semantics and sentics from many different types of media and contexts.

7. REFERENCES

[1] C. Bazzanella. Emotions, language and context. In E. Weigand, editor, Emotion in Dialogic Interaction: Advances in the Complex, pages 59–76. Benjamins, Amsterdam, 2004.
[2] C. Bosco, V. Patti, and A. Bolioli. Developing corpora for sentiment analysis and opinion mining: A survey and the Senti-TUT case study. IEEE Intelligent Systems, 28(2):55–63, 2013.
[3] M. Bradley and P. Lang. Affective norms for English words (ANEW): Stimuli, instruction manual and affective ratings. Technical report, The Center for Research in Psychophysiology, University of Florida, 1999.
[4] E. Cambria and A. Hussain. Sentic Computing: Techniques, Tools, and Applications. Springer, Dordrecht, Netherlands, 2012.
[5] E. Cambria, A. Hussain, C. Havasi, C. Eckl, and J. Munro. Towards crowd validation of the UK national health service. In WebSci, Raleigh, 2010.
[6] E. Cambria, A. Livingstone, and A. Hussain. The hourglass of emotions. In A. Esposito, A. Vinciarelli, R. Hoffmann, and V. Muller, editors, Cognitive Behavioral Systems, volume 7403 of Lecture Notes in Computer Science, pages 144–157. Springer, Berlin Heidelberg, 2012.
[7] E. Cambria, R. Speer, C. Havasi, and A. Hussain. SenticNet: A publicly available semantic resource for opinion mining. In AAAI CSK, pages 14–18, Arlington, 2010.
[8] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
[9] A. Esuli and F. Sebastiani. SentiWordNet: A publicly available lexical resource for opinion mining. In LREC, 2006.
[10] L. García-Moya, H. Anaya-Sanchez, and R. Berlanga-Llavori. A language model approach for retrieving product features and opinions from customer reviews. IEEE Intelligent Systems, 28(3):19–27, 2013.
[11] M. Grassi. Developing HEO human emotions ontology. In Lecture Notes in Computer Science, volume 5707, pages 244–251. Springer, Berlin Heidelberg, 2009.
[12] C. Havasi, R. Speer, and J. Holmgren. Automated color selection using semantic knowledge. In AAAI CSK, Arlington, 2010.
[13] R. Horn and C. Johnson. Norms for vectors and matrices. In Matrix Analysis, chapter 5. Cambridge University Press, 1990.
[14] C. Hung and H.-K. Lin. Using objective words in SentiWordNet to improve sentiment classification for word of mouth. IEEE Intelligent Systems, 28(2):47–54, 2013.
[15] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards, 45(4):255–282, 1950.
[16] M. Minsky. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. Simon & Schuster, New York, 2006.
[17] H. Park and C. Jun. A simple and fast algorithm for k-medoids clustering. Expert Systems with Applications, 36(2):3336–3341, 2009.
[18] R. Plutchik. The nature of emotions. American Scientist, 89(4):344–350, 2001.
[19] S. Poria, A. Gelbukh, A. Hussain, D. Das, and S. Bandyopadhyay. Enhanced SenticNet with affective labels for concept-based opinion mining. IEEE Intelligent Systems, 28(2):31–38, 2013.
[20] R. Speer and C. Havasi. ConceptNet 5: A large semantic network for relational knowledge. In E. Hovy, M. Johnson, and G. Hirst, editors, Theory and Applications of Natural Language Processing, chapter 6. Springer, 2012.
[21] C. Strapparava and A. Valitutti. WordNet-Affect: An affective extension of WordNet. In LREC, pages 1083–1086, Lisbon, 2004.
[22] A. Tsai, R. Tsai, and J. Hsu. Building a concept-level sentiment dictionary based on commonsense knowledge. IEEE Intelligent Systems, 28(2):22–30, 2013.
[23] M. Wall, A. Rechtsteiner, and L. Rocha. Singular value decomposition and principal component analysis. In D. Berrar, W. Dubitzky, and M. Granzow, editors, A Practical Approach to Microarray Data Analysis, pages 91–109. Springer, 2003.
[24] A. Weichselbraun, S. Gindl, and A. Scharl. Extracting and grounding context-aware sentiment lexicons. IEEE Intelligent Systems, 28(2):39–46, 2013.



