Sentimentator: Gamifying Fine-grained Sentiment Annotation

Emily Öhman & Kaisla Kajava
University of Helsinki
firstname.lastname@helsinki.fi

February 5, 2018

Abstract

We introduce Sentimentator, a publicly available gamified web-based annotation platform for fine-grained sentiment annotation at the sentence level. Sentimentator is unique in that it moves beyond binary classification. We use a ten-dimensional model which allows for the annotation of 51 unique sentiments and emotions. The platform is gamified with a scoring system designed to reward users for high-quality annotations. Sentimentator introduces several features that have previously been unavailable, or at best very limited, for sentiment annotation. In particular, it provides streamlined multi-dimensional annotation optimized for sentence-level annotation of movie subtitles. Because the platform is publicly available, it will benefit anyone interested in fine-grained sentiment analysis and emotion detection, as well as in the annotation of other datasets.

1 Introduction

The main problem with sentiment analysis methods, even conventional ones, tends to boil down to a lack of tagged corpora. Proper annotation is costly and in some cases unfeasible [6]. Sentimentator addresses this lack of annotated corpora and provides a novel tool for efficiently producing datasets that cover a wide range of genres (within the domain of movie subtitles).

A crowd-sourced, gamified annotation scheme based on Plutchik's eight emotions [25] as well as the sentiments positive, negative, and neutral presents new opportunities, but also challenges. Tagging a sentence with more than two or three dimensions is more time consuming and requires more reflection on the part of the annotator. We address this by gamifying the process in order to (1) provide a simple and straightforward user interface for the annotation, and (2) present an inviting option for students and other non-experts to help with the annotation by setting up a game-like platform.

The reason we have chosen to gamify the annotation process is the increased accuracy [22] and lower cost compared to more traditional crowd-sourcing methods. We want to produce more training data easily and at lower cost in order to train better machine learning-based classifiers on top of the annotated datasets.

The output of sentiment analysis is often expressed as a numeric value on a sliding scale of negative, neutral, and positive sentiment, or simply as a ternary score of one of the aforementioned values. This approach is limited [4] and applicable only to some of the myriad possible uses of sentiment analysis. For these other uses to be feasible, a new approach beyond positive and negative is necessary. We propose to use Plutchik's eight core emotions [25] (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) alongside the sentiments of positive and negative typically used in sentiment analysis.

Figure 1: Plutchik's wheel of emotions (source of figure and table: https://en.wikipedia.org/wiki/Contrasting_and_categorization_of_emotions)

With the use of an intensity measure, Sentimentator effectively allows for sentiment annotations on the entire wheel. Furthermore, because intensity adjustment and combination of emotions is possible, the difficulty of the annotation task does not increase linearly with the number of dimensions in our scheme. A further 24 combinations of emotions are possible through combinations of the eight core emotions, such that, for example, 'awe' can be expressed by annotating for 'fear' and 'surprise'. Therefore, 51 unique emotions and sentiments can be described by the Sentimentator annotation scheme, as illustrated in the sketch below.
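To make the label space concrete, the following minimal Python sketch (an illustration, not the platform's code) enumerates the categories implied by the scheme: the eight core emotions at three intensities each, the 24 pairwise combinations of core emotions, and the sentiments positive and negative plus neutral. The assumption that emotions opposite each other on the wheel (e.g. joy and sadness) do not combine into a named dyad follows the standard presentation of Plutchik's wheel and yields exactly 24 combinations.

```python
from itertools import combinations

# The ten annotation dimensions: Plutchik's eight core emotions
# plus the sentiments positive and negative; neutral is counted separately.
CORE_EMOTIONS = ["joy", "trust", "fear", "surprise",
                 "sadness", "disgust", "anger", "anticipation"]
SENTIMENTS = ["positive", "negative"]

# Each core emotion has a mild, basic, and intense variant (see Table 1),
# e.g. serenity - joy - ecstasy.
INTENSITIES = ["mild", "basic", "intense"]

# Opposite emotions sit across the wheel and are not combined into dyads.
OPPOSITES = {frozenset(p) for p in [("joy", "sadness"), ("trust", "disgust"),
                                    ("fear", "anger"),
                                    ("surprise", "anticipation")]}

# Every non-opposite pair of core emotions names a combined emotion,
# e.g. fear + surprise = awe.
dyads = [pair for pair in combinations(CORE_EMOTIONS, 2)
         if frozenset(pair) not in OPPOSITES]

graded = [(e, i) for e in CORE_EMOTIONS for i in INTENSITIES]

total = len(graded) + len(dyads) + len(SENTIMENTS) + 1  # +1 for neutral
print(len(graded), len(dyads), total)  # 24 24 51
```

Table 1 below lists the mild, basic, and intense variants of each core emotion together with its opposite.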
Table 1: Emotions and opposites (source: https://en.wikipedia.org/wiki/Contrasting_and_categorization_of_emotions)

Mild emotion    Mild opposite   Basic emotion   Basic opposite   Intense emotion   Intense opposite
Serenity        Pensiveness     Joy             Sadness          Ecstasy           Grief
Acceptance      Boredom         Trust           Disgust          Admiration        Loathing
Apprehension    Annoyance       Fear            Anger            Terror            Rage
Distraction     Interest        Surprise        Anticipation     Amazement         Vigilance

The dataset is currently under development; once it has been completed and tested, it will be made publicly available. As soon as some annotated data exists, it can be used for training and testing. Previous work suggests that our approach, if implemented correctly, should be on par with or better than some of the best methods currently available [11, 28]. Fine-grained sentiment analysis provides exciting new avenues of research. With a properly tagged dataset, many researchers will be able to improve on their previous methods, as labeled data for sentiment analysis, especially fine-grained data, is hard to come by [30].

There are a number of areas where sentiment analysis could become an invaluable tool for digital humanities scholars, for example history, literature, translation studies, language studies, and the social sciences. Historians and social scientists could study how the attitude towards a specific topic has changed over time [27]. In literature, story arcs could be analyzed automatically to find over-arching themes and to identify how stories develop within different genres [15, 16], and sociolinguists or translation studies researchers could compare how different languages express emotion and sentiment in what are supposedly identical texts by using sentiment analysis on parallel corpora [23].

In section 2 we present an overview of relevant related work and current approaches. In section 3 we discuss gamification from a theoretical perspective, and in section 3.1 we turn to our framework and platform; there we also consider the practical applications of the ideas discussed in section 3 in greater detail. The last two sections are reserved for future work and a concluding discussion.

2 Related Work

There are many approaches to mining data for sentiments. They range from purely lexical to fully unsupervised [14], with many hybrid methods in between. Andreevskaia and Bergler [1] suggest that the reason for the prevalence of unsupervised knowledge-based methods in binary sentence classification is the lack of labeled training data. This is the main issue Sentimentator will address.

There are a few applications that offer solutions similar to ours on some level (see for example [1, 7, 22, 21, 16]), but none of them combines all three of the following: (1) domain independence, (2) sentence-level annotation, and (3) going beyond positive and negative, i.e. multi-dimensional or fine-grained annotation.

Most current approaches still focus on the positive-negative axis of polarity. This binary, or at best ternary with 'neutral', approach is far too restricted for many applications [17], and new methods increasingly incorporate other dimensions into sentiment analysis beyond the binary approach.
For example, Honkela et al. [12] use the five-dimensional PERMA model (Positive emotion (P), Engagement (E), Relationships (R), Meaning (M), and Achievement (A)), and EmoTwitter [21] utilizes a ten-dimensional model (positive, negative, joy, sadness, anger, anticipation, trust, disgust, fear, and surprise) based on the NRC lexicon [20], which in turn uses Plutchik's wheel of emotions.

Although sentence- or phrase-level sentiment analysis is important for many applications [24], e.g. question answering tasks [31], there are few sentence-level annotated datasets because of the time-consuming annotation process. Sentences and other short text spans also contain few sentiment clues; if there is only one sentiment clue in a sentence, the entire analysis may rest on a single word, which makes it challenging to reach a correct analysis [2]. Wilson et al. [30, 31] show that for sentence-level sentiment analysis to work, it is important to be able to tell when a sentence is neutral. This reduces the risk of assigning sentiments and emotions where there are none and allows for contextually accurate sentiment and emotion analysis. The annotation scheme of Sentimentator allows for neutral tagging, increasing the likelihood of correct contextual analysis.

It has long been discussed how classifiers trained on data from one domain might not work as well when applied to data from a different domain [3, 24, 11]. Boland et al. [2] therefore suggest annotating training data at sentence level and without context. Furthermore, ignoring context means that even if a sentence is implicitly negative because the following sentence is expected to be explicitly negative, it should be tagged as positive or neutral (depending on the sentiments in that sentence alone), as otherwise that one sentiment would be weighted twice [2]. Our annotation scheme also allows for all possible permutations and mixed sets of the ten dimensions, so mixed sentiments or emotions in a sentence are not a problem, as all of them can co-exist.

3 Gamifying Annotation

Gamification happens when game elements are used in a non-game context to "improve user experience and engagement" [5]. In the latter half of the 2010s there has been an increase in gamification [9], mainly for marketing [10], but also for scientific purposes [8].

Sentimentator players (annotators) select emotions and sentiments for random sentences. Other common gamification elements included in Sentimentator are badges, leaderboards, levels/ranks, avatars, and feedback. Variation in the game content is key to minimizing the repetitiveness of the tasks. We offer annotators simple annotation tasks, correction of automatically annotated data, and sentence ranking tasks.

Groh discusses some pitfalls of gamification, stating that "pleasure is not additive and rewards can backfire" [9]. We follow the principles described by Deterding et al. [5] and Schell [26] in order to avoid these pitfalls. These principles are (1) relatedness (being connected to other players), (2) competence (mastering the game's problems), and (3) autonomy (being in control of one's own life).

A simple way to increase the relatedness of our platform is to allow players to see their own and their peers' progress, as well as to see in real time how their work affects their grade (if annotation is part of coursework) or some other real-world benefit. This can be done partly with leaderboards, but also by showing the student a progress bar indicating how close they are to the next goal/rank/level.
As with Zooniverse [8] (https://www.zooniverse.org/), there is an opportunity to be part of a larger scientific community and to contribute to the advancement of science, however small the increment. PlanetHunters (Zooniverse, https://www.planethunters.org/) has even offered co-author credit to those who have helped locate new exoplanets via its gamified data analysis platform.

For annotators to feel competent and to feel that they are improving, they need feedback on their progress in relation to others. It is not desirable for annotators to see how other annotators have annotated the same data, but annotations can still be compared and scored. When an annotation is compared with those made by other annotators, the reliability/accuracy score depends on the reliability rating of the other annotator: if the annotations correlate well with those of annotators with a higher reliability rating, the score given is also higher, and vice versa. This means that a player's rank also affects how other players score. Additionally, a score is affected by how well the annotation correlates with validated test sentences. See sections 3.1 and 3.2 for more details on gameplay and scoring.

The validated sentences are sentences that have been annotated by expert annotators who have received thorough instructions on how to annotate, with the aim of consistency across annotators. The results of these expert annotators will be reviewed before they are used as seed sentences. The "gamer" annotators will receive a similar tutorial via Sentimentator, but their annotations will generally only be compared against the validated seed sentences and the annotations of their peers.

The first players of Sentimentator are students of language technology. It is difficult not to offer these students extrinsic rewards (such as extra credit), especially in the initial stages of gathering testing and training data. Some of this loss of autonomy is counteracted by emphasizing the scientific contribution that they make and by keeping them posted about, e.g., articles published using datasets they helped create. Once the platform is open to all, however, there is significant autonomy.

3.1 Gameplay

Annotators are greeted by an info screen where they are presented with Plutchik's wheel [25] (see Figure 1). They are told how to tag the different emotions (e.g. the emotion 'remorse' would suggest 'disgust' and 'sadness' of a higher intensity). There are three ways to play the game: the first is to receive pre-analyzed sentences (tagged by lexical lookup) and adjust the annotation, the second is to receive untagged sentences and annotate them from scratch, and the third is a sentence intensity ranking task.

The first type of gameplay consists of annotating unvalidated pre-annotated sentences. The sentences have been pre-tagged using simple lexical comparison (sketched below). The annotator/player needs to judge whether the analysis is correct or needs adjustment. The scoring is a simple fraction of the full score until the annotations can be compared to peer annotations.

Figure 2: Sentimentator prototype interface (for the prototype CSS we used http://getskeleton.com)

In the second type of gameplay, both validated and unvalidated sentences are presented to the annotator, who does not know which type is in question. The annotator/player needs to recognize the emotions and sentiments present in the sentence without context. The scoring is a simple fraction of the full score until the annotations can be compared to peer annotations. In the case of validated sentences, the scores received follow the formula in section 3.2 and significantly affect rank.
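The pre-annotation used in the first game type relies on simple lexical lookup. Below is a minimal sketch of such a lookup, assuming an NRC-style word-to-emotion lexicon; the lexicon entries and the function shown here are hypothetical illustrations, not the platform's actual pre-tagging code.

```python
from collections import Counter

# A toy NRC-style lexicon for illustration only; the real pre-annotation
# step would use an actual emotion lexicon.
EMOTION_LEXICON = {
    "wonderful": {"joy", "trust", "positive"},
    "afraid":    {"fear", "negative"},
    "scream":    {"fear", "anger", "negative"},
}

def pre_tag(sentence):
    """Count how many words in the sentence evoke each emotion/sentiment."""
    counts = Counter()
    for token in sentence.lower().split():
        token = token.strip(".,!?'\"")
        for label in EMOTION_LEXICON.get(token, ()):
            counts[label] += 1
    return counts

counts = pre_tag("I was afraid I would scream.")
print(sorted(counts.items()))   # [('anger', 1), ('fear', 2), ('negative', 2)]
```

The player is then shown these suggested labels (with the intensity slider at its default position) and either accepts or adjusts them.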
All of the game types have the intensity of sentiments/emotions built into the annotation through a slider that is pre-set to 50%. The slider can be adjusted higher or lower to signify the intensity the annotator judges the sentence to possess.

In the ranking task, sentences are shown regardless of whether their intensity has been adjusted, and the player drags and drops them into order from most intense to least intense. Through this best-worst scaling approach [19] we are able to obtain more accurate intensity scores. We will use both sentences that have already been annotated and those that have not, in order to collect data on how the nature of the task affects intensity rankings.

All annotations are done without any context, as suggested by the results of Boland et al. [2]. Their research shows that when an annotated corpus is used for training and testing, providing context during annotation is confusing and leads to erroneous annotations. The issues involved in choosing the correct annotation are discussed in section 3.3.

3.2 Scoring

As discussed in section 3, it is important for players to feel competent and to feel that they are mastering a skill. Scoring is therefore one of the most important aspects of gamification. Players need to feel that they are being compensated appropriately for the work they are doing, even if it is a game and the compensation is in the form of points.

Players accumulate both rank (R) and level, where rank is a prestige or reliability score based on how well the player's annotations correlate with the validated test sentences:

\[ R = \frac{T_{sv}}{V_{max}} \]

where T_sv stands for the player's total score from validated sentences and V_max for the maximum possible score from validated sentences for the player in question.

Level is a straightforward measure of the number of annotated sentences:

\[ \mathrm{Level} = \frac{T_{sv}}{V_{max}} \times \frac{A_p}{T_a} \times 100 \]

where A_p stands for the total number of sentences annotated by the player and T_a for the total number of sentences in the dataset, so that 0 ≤ R ≤ 1 and 0 ≤ Level ≤ 100.

There are two main types of scores: those based on rank (i.e. prestige or reliability) and those based on validated sentences. All tasks yield a pre-adjustment score (S). This score is based simply on doing the task, without regard to how well the task has been completed or how it correlates with other players' annotations.

The calculation of the score received from annotating validated sentences (S_v) is fairly straightforward:

\[ S_v = \frac{S}{V_s} \]

where V_s stands for the maximum score possible for that task, as given by the score of the validated sentence.

The score based on peer annotation (S_p) accumulates rank only after a certain number of annotations have been made for the same sentence. The rank (or reliability/prestige) rating of the other annotator (R_oa) who has annotated the same sentence before influences the score for the annotation as follows:

\[ S_p = \frac{P_s}{S_{oa}} \times R_{oa} \]

where P_s stands for the pre-adjustment annotation score of the peer and S_oa for the score of the other annotator. In practice this works much like a weighted average across all peers. The number of annotators per sentence is also limited. The rank influences the score as it stands at the time of the annotation, i.e. the rank that was valid when the annotation was made is the one considered.
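As a concrete reading of these formulas, the sketch below implements R, Level, S_v, and S_p in Python. The function names and the example numbers are illustrative; the platform's actual implementation (e.g. how peer scores are averaged and how many peers are required) may differ.

```python
def rank(total_score_validated, max_score_validated):
    """R = T_sv / V_max: a reliability score between 0 and 1."""
    return total_score_validated / max_score_validated

def level(annotated_by_player, total_sentences,
          total_score_validated, max_score_validated):
    """Level = (T_sv / V_max) * (A_p / T_a) * 100, between 0 and 100."""
    return (total_score_validated / max_score_validated) \
        * (annotated_by_player / total_sentences) * 100

def validated_score(pre_adjustment_score, validated_max):
    """S_v = S / V_s for a task backed by a validated seed sentence."""
    return pre_adjustment_score / validated_max

def peer_score(peer_pre_score, other_annotator_score, other_annotator_rank):
    """S_p = (P_s / S_oa) * R_oa: agreement weighted by the peer's rank."""
    return (peer_pre_score / other_annotator_score) * other_annotator_rank

# Illustrative example: a player who has annotated 500 of 4000 sentences
# and scored 80 of a possible 100 points on validated sentences.
print(rank(80, 100))               # 0.8
print(level(500, 4000, 80, 100))   # 10.0
```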
If an annotator's rank improves or declines later, this reflects their annotation skill at that point in time, not at the time of the original annotation. Dynamically re-scoring old annotations would therefore not accurately reflect their reliability.

3.3 Choosing the Right Annotation

There is a lot to consider when choosing the right annotation. It is virtually impossible for all annotators to annotate every sentence in exactly the same way, which results in noisy annotations. Hsueh et al. [13] discuss measures to control the quality of annotations; in their study they compare noisy annotations against gold standard labels. As we do not have the option of comparing against a gold standard, we will have to rely heavily on the scores received for annotating validated sentences (see section 3.2). However, with enough annotations we will be able to remove the annotations made by the noisiest group of annotators (in Hsueh et al. [13] this group consisted of 20% of the annotators).

As our scoring already relies on validated sentences even when unvalidated sentences are being annotated, we are unlikely to need much screening for noisy annotations. It is, however, important to retain the possibility of excluding noisy annotators from the final annotation output. It is also important to be able to exclude ambiguous examples from the annotations in order to maximize the quality of the labels [13]. Even though this only becomes an issue once we have annotated data, it is an important aspect to keep in mind when creating the framework.

All sentences will have been annotated by at least three annotators before their annotations are made final. Naturally, these tags will not always be identical. The way Sentimentator is constructed allows differing annotations to be checked easily, as sketched below: the first step is an automatic comparison against validated sentences, and the second is to defer to the annotation made by the highest-ranked annotator. However, where discrepancies are deemed considerable, annotations can be flagged for review by experts.
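A minimal sketch of this resolution step is given below. The data model, the agreement threshold, and the function names are our own illustration of the procedure just described, not the platform's actual code.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    annotator_rank: float   # the annotator's rank R, between 0 and 1
    labels: frozenset       # e.g. frozenset({"fear", "surprise"})

def resolve(annotations, validated_labels=None, min_agreement=0.5):
    """Return (final_labels, flagged_for_expert_review)."""
    # Step 1: an expert-validated annotation, if available, takes precedence.
    if validated_labels is not None:
        return validated_labels, False
    # Step 2: otherwise defer to the highest-ranked annotator.
    best = max(annotations, key=lambda a: a.annotator_rank)
    # Flag for expert review if too few peers agree with that choice.
    agreeing = sum(a.labels == best.labels for a in annotations)
    flagged = agreeing / len(annotations) < min_agreement
    return best.labels, flagged

anns = [Annotation(0.9, frozenset({"fear", "surprise"})),
        Annotation(0.7, frozenset({"fear"})),
        Annotation(0.4, frozenset({"fear", "surprise"}))]
labels, flagged = resolve(anns)
print(sorted(labels), flagged)   # ['fear', 'surprise'] False
```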
3.4 Data

We use the publicly available dataset OPUS (http://opus.lingfil.uu.se; we use the newest, 2018, version, which at the time of writing has not yet been made publicly available). Our initial focus is the English and Finnish parallel corpus of movie subtitles, but the number of languages that can be annotated is limited only by the data itself. The current version has been tested on eight languages. We chose movie subtitles [29, 18] because they contain a lot of emotional content in a style applicable to many different types of tasks [23], and because a high-quality parallel corpus exists for many different languages.

4 Future Work

The evaluation of this framework can only begin once a certain number of lines have been annotated and cross-checked. For a demonstration, some results can be achieved with approximately 1000 annotated lines, but for proper sentiment analysis at least four times that is required. This means that at least three people will need to annotate 4000 lines, and preferably many more people will annotate tens of thousands of lines/sentences. One simple way of spreading out this task, and of utilizing expert annotators at low cost, is to outsource it as extra-credit coursework in computational linguistics, corpus linguistics, and similar courses. Once enough data has been annotated for training and testing, we can evaluate our framework and compare it against the current gold standard.

We plan to evaluate the final dataset by taking into account both the distribution of the data and classification performance using a set of different classifier types. We intend to evaluate the distributional balance of the data with regard to the number and quality of lines/sentences for each label or label combination. In this way we can reveal patterns in the dataset which may affect classification results. For example, sentences of a given label may be considerably longer or shorter than sentences of another label, or contain rare words. Similarly, the sentences may originate from movies of a specific genre or time period and thus contain a particular type of language use, such as jargon or archaic words. This allows us to evaluate the sparsity of the data both in the dataset as a whole and across different labels. We can then assess whether some parts of the dataset are sparser and thus less likely to allow classifiers to detect meaningful patterns.

Using a set of different classifiers also allows us to evaluate the quality of the dataset. By building confusion matrices for each classifier, we can observe the classification accuracy, precision, recall, and F-measure for each class in the dataset as well as the overall performance of the classifier.

Other future work includes testing the finalized semi-supervised algorithm on actual datasets. In addition to the suggestions in the introduction, possible explorations include newspaper or online discussion forum data dumps retrieved with search keys for migration and other current issues.

A comprehensive set of high-quality annotations also allows for comparisons between intra-lingual annotations of the same sentences by different users, as well as for identifying possible patterns in cross-lingual annotations of parallel sentences. Another interesting question to investigate is whether showing users sentences which have already been annotated influences their choices when selecting the most suitable tags for those sentences. In this research setting, users would choose the gameplay option where they evaluate annotated sentences with the task of either accepting or editing those annotations. This data would then be compared to parallel annotations of sentences which users have annotated from scratch.

We also hope that other researchers in various fields, including computational linguistics and the humanities, will find both the annotation platform and the dataset useful and will publish their own research based on our work.

5 Conclusions and Discussion

We have introduced Sentimentator, a publicly available, gamified, web-based annotation tool specifically for fine-grained sentiment analysis. Not only do we go beyond binary sentiment classification, but our annotation scheme allows for even more detailed fine-grained annotation by adjusting the intensity of Plutchik's eight core emotions. The expanded scheme gives us eight core emotions with three intensities each and 24 combinations of the core emotions, for a total of 48 separate emotions, plus the two sentiments and neutral, i.e. 51 sentiments and emotions available for annotation (see Figure 1 and Table 1 for specifics). The gamification of annotation decreases the cost of annotation and increases the size of the final dataset. It has also been shown to give more accurate annotations than traditional crowd-sourcing methods [22].
Furthermore, we have carefully designed the scoring to reward more accurate annotations and to improve the annotation experience by making it more interesting. After the initial evaluation tasks, the dataset, as well as the platform itself, will be made open to anyone who needs a sentiment-annotated dataset. This type of data is rare to come by, and we have high hopes for the applications of the dataset and the platform itself.

References

[1] Andreevskaia, A., and Bergler, S. CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging. In Proceedings of the 4th International Workshop on Semantic Evaluations (Stroudsburg, PA, USA, 2007), SemEval '07, Association for Computational Linguistics, pp. 117-120.

[2] Boland, K., Wira-Alam, A., and Messerschmidt, R. Creating an annotated corpus for sentiment analysis of German product reviews.

[3] Dave, K., Lawrence, S., and Pennock, D. M. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of the 12th International Conference on World Wide Web (New York, NY, USA, 2003), WWW '03, ACM, pp. 519-528.

[4] de Albornoz, J. C., Plaza, L., and Gervás, P. SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis.

[5] Deterding, S., Sicart, M., Nacke, L., O'Hara, K., and Dixon, D. Gamification: Using game-design elements in non-gaming contexts. In CHI '11 Extended Abstracts on Human Factors in Computing Systems (2011), ACM, pp. 2425-2428.

[6] Ehrmann, M., Turchi, M., and Steinberger, R. Building a multilingual named entity-annotated corpus using annotation projection. In Recent Advances in Natural Language Processing (RANLP) (2011), 118.

[7] Eryigit, G., Cetin, F. S., Yanik, M., Temel, T., and Çiçekli, I. Turksent: A sentiment annotation tool for social media. In LAW@ACL (2013), pp. 131-134.

[8] Greenhill, A., Holmes, K., Lintott, C., Simmons, B., Masters, K., Cox, J., and Graham, G. Playing with science: Gamised aspects of gamification found on the online citizen science project Zooniverse. In GAMEON'2014 (2014), EUROSIS.

[9] Groh, F. Gamification: State of the art definition and utilization. Institute of Media Informatics, Ulm University 39 (2012).

[10] Hamari, J., and Koivisto, J. Social motivations to use gamification: An empirical study of gamifying exercise. In ECIS (2013), p. 105.

[11] He, Y., and Zhou, D. Self-training from labeled features for sentiment analysis. Information Processing & Management 47, 4 (2011), 606-616.

[12] Honkela, T., Korhonen, J., Lagus, K., and Saarinen, E. Five-dimensional sentiment analysis of corpora, documents and words. In Advances in Self-Organizing Maps and Learning Vector Quantization - Proceedings of the 10th International Workshop, WSOM 2014 (2014), pp. 209-218.

[13] Hsueh, P.-Y., Melville, P., and Sindhwani, V. Data quality from crowdsourcing: A study of annotation selection criteria. In Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing (Stroudsburg, PA, USA, 2009), HLT '09, Association for Computational Linguistics, pp. 27-35.

[14] Hu, X., Tang, J., Gao, H., and Liu, H. Unsupervised sentiment analysis with emotional signals. In Proceedings of the 22nd International Conference on World Wide Web (2013), ACM, pp. 607-618.

[15] Jockers, M. L. Text Analysis with R for Students of Literature. Springer, 2014.

[16] Kakkonen, T., and Kakkonen, G. G. SentiProfiler: Creating comparable visual profiles of sentimental content in texts. Language Technologies for Digital Humanities and Cultural Heritage 62 (2011), 189-204.
[17] Li, J., and Hovy, E. Reflections on sentiment/opinion analysis. In A Practical Guide to Sentiment Analysis. Springer, 2017, pp. 41-59.

[18] Lison, P., and Tiedemann, J. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In LREC (2016), N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, and S. Piperidis, Eds., European Language Resources Association (ELRA).

[19] Mohammad, S. M., and Bravo-Marquez, F. Emotion intensities in tweets. CoRR abs/1708.03696 (2017).

[20] Mohammad, S. M., and Turney, P. D. Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29, 3 (2013), 436-465.

[21] Munezero, M., Montero, C. S., Mozgovoy, M., and Sutinen, E. EmoTwitter - A fine-grained visualization system for identifying enduring sentiments in tweets. In CICLing (2) (2015), A. F. Gelbukh, Ed., vol. 9042 of Lecture Notes in Computer Science, Springer, pp. 78-91.

[22] Musat, C.-C., Ghasemi, A., and Faltings, B. Sentiment analysis using a novel human computation game. In Proceedings of the 3rd Workshop on the People's Web Meets NLP: Collaboratively Constructed Semantic Resources and Their Applications to NLP (Stroudsburg, PA, USA, 2012), Association for Computational Linguistics, pp. 1-9.

[23] Öhman, E., Honkela, T., and Tiedemann, J. The challenges of multi-dimensional sentiment analysis across languages. PEOPLES 2016 (2016), 138.

[24] Pang, B., and Lee, L. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1-2 (2008), 1-135.

[25] Plutchik, R. A general psychoevolutionary theory of emotion. Theories of Emotion 1 (1980), 3-31.

[26] Schell, J. The pleasure revolution: Why games will lead the way. Google Tech Talks, November 2011.

[27] Sprugnoli, R., Tonelli, S., Marchetti, A., and Moretti, G. Towards sentiment analysis for historical texts. Digital Scholarship in the Humanities 31, 4 (2016), 762-772.

[28] Täckström, O., and McDonald, R. Semi-supervised latent variable models for sentence-level sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: Short Papers - Volume 2 (2011), Association for Computational Linguistics, pp. 569-574.

[29] Tiedemann, J. Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (Istanbul, Turkey, May 2012), N. Calzolari (Conference Chair), K. Choukri, T. Declerck, M. U. Dogan, B. Maegaard, J. Mariani, J. Odijk, and S. Piperidis, Eds., European Language Resources Association (ELRA).

[30] Wilson, T., Wiebe, J., and Hoffmann, P. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005), Association for Computational Linguistics, pp. 347-354.

[31] Wilson, T., Wiebe, J., and Hoffmann, P. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics 35, 3 (2009), 399-433.