<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hassan</forename><surname>Saif</surname></persName>
							<email>h.saif@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yulan</forename><surname>He</surname></persName>
							<email>y.he@cantab.net</email>
							<affiliation key="aff1">
								<orgName type="department">School of Engineering and Applied Science</orgName>
								<orgName type="institution">Aston University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Miriam</forename><surname>Fernandez</surname></persName>
							<email>m.fernandez@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Harith</forename><surname>Alani</surname></persName>
							<email>h.alani@open.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Media Institute</orgName>
								<orgName type="institution">The Open University</orgName>
								<address>
									<country key="GB">United Kingdom</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CD977B15A4588689D500D617F92C24BB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T15:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Sentiment Analysis</term>
					<term>Semantics</term>
					<term>Lexicon Adaptation</term>
					<term>Twitter</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words' sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Sentiment analysis on Twitter has attracted much attention recently due to the rapid growth in Twitter's popularity as a platform for people to express their opinions and attitudes towards a great variety of topics. Most existing approaches to Twitter sentiment analysis can be categorised into machine learning <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b12">13]</ref> and lexicon-based approaches <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b5">6]</ref>.</p><p>Lexicon-based approaches use lexicons of words weighted with their sentiment orientations to determine the overall sentiment in texts. These approaches have been shown to be more applicable to Twitter data than machine learning approaches, since they do not require training from labelled data and therefore offer domain-independent sentiment detection <ref type="bibr" target="#b14">[15]</ref>. Nonetheless, lexicon-based approaches are limited by the sentiment lexicon used <ref type="bibr" target="#b20">[21]</ref>. Firstly, sentiment lexicons are composed of a generally static set of words that does not cover the wide variety of new terms that constantly emerge on the social web. Secondly, words in the lexicons have fixed prior sentiment orientations, i.e., each term always has the same associated sentiment orientation regardless of the context in which the term is used.</p><p>To overcome the above limitations, several lexicon bootstrapping and adaptation methods have been previously proposed. 
However, these methods are either supervised <ref type="bibr" target="#b15">[16]</ref>, i.e., they require training from human-coded corpora, or based on studying the statistical, syntactical or linguistic relations between words in general textual corpora (e.g., the Web) <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b18">19]</ref> or in static lexical knowledge sources (e.g., WordNet) <ref type="bibr" target="#b4">[5]</ref>, thereby ignoring the specific textual context in which the words appear. In many cases, however, the sentiment of a word is implicitly associated with the semantics of its context <ref type="bibr" target="#b2">[3]</ref>.</p><p>In this paper we propose an unsupervised approach for adapting sentiment lexicons based on the contextual semantics of their words in a tweet corpus. In particular, our approach studies the co-occurrences between words to capture their contexts in tweets and update their prior sentiment orientations and/or sentiment strengths in a given lexicon accordingly.</p><p>As a case study we apply our approach to Thelwall-Lexicon <ref type="bibr" target="#b14">[15]</ref>, which, to our knowledge, is the state-of-the-art sentiment lexicon for social data. We evaluate the adapted lexicons by performing lexicon-based sentiment polarity detection (positive vs. negative) on three Twitter datasets. Our results show that the adapted lexicons produce a significant improvement in the sentiment detection accuracy and F-measure in two datasets but give a slightly lower F-measure in one dataset.</p><p>In the rest of this paper, related work is discussed in Section 2, and our approach is presented in Section 3. Experiments and results are presented in Section 4. Discussion and future work are covered in Section 5. Finally, we conclude our work in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Existing approaches to bootstrapping and adapting sentiment lexicons can be categorised into dictionary-based and corpus-based approaches. The dictionary-based approach <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b13">14]</ref> starts with a small set of general opinionated words (e.g., good, bad) and a lexical knowledge base (e.g., WordNet). After that, the approach expands this set by searching the knowledge base for words that have lexical or linguistic relations to the opinionated words in the initial set (e.g., synonyms, glosses, etc.).</p><p>Alternatively, the corpus-based approach measures the sentiment orientation of words automatically based on their association with other strongly opinionated words in a given corpus <ref type="bibr" target="#b16">[17,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b18">19]</ref>. For example, Turney and Littman <ref type="bibr" target="#b16">[17]</ref> used Pointwise Mutual Information (PMI) to measure the statistical correlation between a given word and a balanced set of 14 positive and negative paradigm words (e.g., good, nice, nasty, poor). Although this work does not require large lexical input knowledge, its identification speed is very limited <ref type="bibr" target="#b20">[21]</ref> because it uses web search engines to retrieve the relative co-occurrences of words.</p><p>Following the aforementioned approaches, several lexicons such as MPQA <ref type="bibr" target="#b19">[20]</ref> and SentiWordNet <ref type="bibr" target="#b0">[1]</ref> have been induced and successfully used for sentiment analysis on conventional text (e.g., movie review data). 
However, on Twitter these lexicons are less suitable due to their limited coverage of Twitter-specific expressions, such as abbreviations and colloquial words (e.g., "looov", "luv", "gr8") that are often found in tweets.</p><p>A few sentiment lexicons have recently been built to work specifically with social media data, such as Thelwall-Lexicon <ref type="bibr" target="#b15">[16]</ref> and Nielsen-Lexicon <ref type="bibr" target="#b7">[8]</ref>. These lexicons have proven to work effectively on Twitter data. Nevertheless, such lexicons are similar to other traditional ones, in the sense that they all offer fixed and context-insensitive word-sentiment orientations and strengths. Although a training algorithm has been proposed to update the sentiment of terms in Thelwall-Lexicon <ref type="bibr" target="#b15">[16]</ref>, it needs to be trained on human-coded corpora, which are labour-intensive to obtain.</p><p>To address the above limitations, we have designed our lexicon-adaptation approach in a way that allows it to (i) work in an unsupervised fashion, avoiding the need for labelled data, and (ii) exploit the contextual semantics of words. This allows us to capture their contextual information in tweets and update their prior sentiment orientation and strength in a given sentiment lexicon accordingly. The main principle behind our approach is that the sentiment of a term is not static, as found in general-purpose sentiment lexicons, but rather depends on the context in which the term is used, i.e., it depends on its contextual semantics. <ref type="foot" target="#foot_0">3</ref> Therefore, our approach functions in two main steps as shown in Figure <ref type="figure" target="#fig_0">1</ref>. First, given a tweet collection and a sentiment lexicon, the approach builds a contextual semantic representation for each unique term in the tweet collection and subsequently uses it to derive the term's contextual sentiment orientation and strength. 
The SentiCircle representation model is used to this end <ref type="bibr" target="#b9">[10]</ref>. Secondly, a rule-based algorithm is applied to amend the prior sentiment of terms in the lexicon based on their corresponding contextual sentiment. Both steps are further detailed in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Capturing Contextual Semantics and Sentiment</head><p>The first step in our pipeline is to capture the words' contextual semantics and sentiment in tweets. To this end, we use our previously proposed semantic representation model, SentiCircle <ref type="bibr" target="#b9">[10]</ref>.</p><p>Following the distributional hypothesis that words that co-occur in similar contexts tend to have similar meanings <ref type="bibr" target="#b17">[18]</ref>, SentiCircle extracts the contextual semantics of a word from its co-occurrence patterns with other words in a given tweet collection. These patterns are then represented as a geometric circle, which is subsequently used to compute the contextual sentiment of the word by applying simple trigonometric identities to it. In particular, for each unique term m in a tweet collection, we build a two-dimensional geometric circle, where the term m is situated at the centre of the circle, and each point around it represents a context term c_i (i.e., a term that occurs with m in the same context). The position of c_i, as illustrated in Figure <ref type="figure" target="#fig_1">2</ref>, is defined jointly by its Cartesian coordinates x_i, y_i as:</p><formula xml:id="formula_0">x_i = r_i \cos(\theta_i \cdot \pi), \qquad y_i = r_i \sin(\theta_i \cdot \pi)</formula><p>where θ_i is the polar angle of the context term c_i and its value equals the prior sentiment of c_i in a sentiment lexicon before adaptation, and r_i is the radius of c_i and its value represents the degree of correlation (tdoc) between c_i and m, which can be computed as:</p><formula xml:id="formula_1">r_i = \mathrm{tdoc}(m, c_i) = f(c_i, m) \times \log \frac{N}{N_{c_i}}</formula><p>where f(c_i, m) is the number of times c_i occurs with m in tweets, N is the total number of terms, and N_{c_i} is the total number of terms that occur with c_i. Note that all terms' radii in the SentiCircle are normalised. Also, all angle values are in radians. 
The trigonometric properties of the SentiCircle allow us to encode the contextual semantics of a term as sentiment orientation and sentiment strength. The Y-axis defines the sentiment of the term, i.e., a positive y value denotes a positive sentiment and vice versa. The X-axis defines the sentiment strength of the term: the smaller the x value, the stronger the sentiment. <ref type="foot" target="#foot_1">4</ref> This, in turn, divides the circle into four sentiment quadrants. Terms in the two upper quadrants have a positive sentiment (sin θ &gt; 0), with the upper left quadrant representing stronger positive sentiment since it has larger angle values than those in the top right quadrant. Similarly, terms in the two lower quadrants have negative sentiment values (sin θ &lt; 0). Moreover, a small region called the "Neutral Region" can be defined. This region, as shown in Figure <ref type="figure" target="#fig_1">2</ref>, is located very close to the X-axis in the "Positive" and the "Negative" quadrants only, where terms lying in this region have very weak sentiment (i.e., |θ| ≈ 0).</p><formula xml:id="formula_2">Figure 2 labels: C_i; r_i = TDOC(C_i); θ_i = Prior_Sentiment(C_i); axes X, Y; x_i</formula><p>Calculating Contextual Sentiment. In summary, the SentiCircle of a term m is composed of the set of (x, y) Cartesian coordinates of all the context terms of m. An effective way to compute the overall sentiment of m is by calculating the geometric median of all the points in its SentiCircle. Formally, for a given set of n points (p_1, p_2, ..., p_n) in a SentiCircle Ω, the 2D geometric median g is defined as:</p><formula xml:id="formula_3">g = \arg\min_{g \in \mathbb{R}^2} \sum_{i=1}^{n} \lVert p_i - g \rVert_2</formula><p>We call the geometric median g the SentiMedian as its position in the SentiCircle determines the final contextual-sentiment orientation and strength of m.</p><p>Note that the boundaries of the neutral region can be computed by measuring the density distribution of terms in the SentiCircle along the Y-axis. 
In this paper we use similar boundaries to the ones used in <ref type="bibr" target="#b9">[10]</ref> since we use the same evaluation datasets.</p></div>
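<div xmlns="http://www.tei-c.org/ns/1.0"><p>The construction described in this subsection can be illustrated with a minimal Python sketch. It is not the original SentiCircle implementation: the function names, the toy counts, the scaling of priors to [-1, 1], the normalisation of radii by the largest radius, and the use of Weiszfeld's iteration to approximate the geometric median are all assumptions made for illustration.</p>
<p>
```python
import math


def tdoc(cooccurrences, total_terms, terms_with_context):
    """Degree of correlation: f(c_i, m) * log(N / N_ci)."""
    return cooccurrences * math.log(total_terms / terms_with_context)


def senticircle_point(radius, prior):
    """Map a context term to (x, y); prior is assumed scaled to [-1, 1]."""
    angle = prior * math.pi
    return (radius * math.cos(angle), radius * math.sin(angle))


def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the 2D geometric median with Weiszfeld's iteration."""
    gx = sum(p[0] for p in points) / len(points)
    gy = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Clamp the distance so a point sitting on g does not divide by zero.
            dist = max(math.hypot(px - gx, py - gy), eps)
            num_x += px / dist
            num_y += py / dist
            denom += 1.0 / dist
        gx, gy = num_x / denom, num_y / denom
    return gx, gy


# Hypothetical context terms of a term m: (f(c_i, m), N_ci, prior in [-1, 1]).
contexts = [(3, 50, 0.6), (1, 200, -0.2), (2, 80, 0.8)]
N = 1000  # hypothetical total number of terms
radii = [tdoc(f, N, n_ci) for f, n_ci, _ in contexts]
max_r = max(radii)  # assumed normalisation: divide by the largest radius
points = [senticircle_point(r / max_r, prior)
          for r, (_, _, prior) in zip(radii, contexts)]
senti_median = geometric_median(points)
```
</p>
<p>In this sketch the sign of the y coordinate of senti_median gives the contextual orientation, and its x coordinate indicates the strength. Weiszfeld's iteration is only one possible way to approximate the geometric median; the text above does not state which solver is used.</p></div>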
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Lexicon Adaptation</head><p>The second step in our approach is to update the sentiment lexicon with the terms' contextual sentiment information extracted in the previous step. As mentioned earlier, in this work we use Thelwall-Lexicon <ref type="bibr" target="#b15">[16]</ref> as a case study. Therefore, in this section we first describe this lexicon and its properties, and then introduce our proposed adaptation method.</p><p>Thelwall-Lexicon consists of 2546 terms coupled with integer values between -5 (very negative) and +5 (very positive). Based on the terms' prior sentiment orientations and strengths (SOS), we group them into three subsets of 1919 negative terms (SOS ∈ [-5, -2]), 398 positive terms (SOS ∈ [2, 5]) and 229 neutral terms (SOS ∈ {-1, 1}). The adaptation method uses a set of antecedent-consequent rules that decide how the prior sentiment of the terms in Thelwall-Lexicon should be updated according to the positions of their SentiMedians (i.e., their contextual sentiment). In particular, for a term m, the method checks (i) its prior SOS value in Thelwall-Lexicon and (ii) the SentiCircle quadrant in which the SentiMedian of m resides. The method subsequently chooses the best-matching rule to update the term's prior sentiment and/or strength.</p><p>Table <ref type="table" target="#tab_0">1</ref> shows the complete list of rules in the proposed method. As noted, these rules are divided into updating rules, i.e., rules for updating the existing terms in Thelwall-Lexicon, and expanding rules, i.e., rules for expanding the lexicon with new terms. 
The updating rules are further divided into rules that deal with terms that have similar prior and contextual sentiment orientations (i.e., both positive or both negative), and rules that deal with terms that have different prior and contextual sentiment orientations (i.e., negative prior and positive contextual sentiment, or vice versa).</p><p>Although they look complicated, the notion behind the proposed rules is rather simple: check how strong the contextual sentiment is and how weak the prior sentiment is → update the sentiment orientation and strength accordingly. The strength of the contextual sentiment can be determined based on the sentiment quadrant of the SentiMedian of m, i.e., the contextual sentiment is strong if the SentiMedian resides in the "Very Positive" or "Very Negative" quadrants (see Figure <ref type="figure" target="#fig_1">2</ref>). On the other hand, the prior sentiment of m (i.e., prior_m) in Thelwall-Lexicon is weak if |prior_m| ≤ 3 and strong otherwise.</p><p>Updating Rules (Similar Sentiment Orientations)</p><formula xml:id="formula_4">Id: Antecedents ⇒ Consequent
1: (|prior| ≤ 3) ∧ (SentiMedian ∉ StrongQuadrant) ⇒ |prior| = |prior| + 1
2: (|prior| ≤ 3) ∧ (SentiMedian ∈ StrongQuadrant) ⇒ |prior| = |prior| + 2
3: (|prior| &gt; 3) ∧ (SentiMedian ∉ StrongQuadrant) ⇒ |prior| = |prior| + 1
4: (|prior| &gt; 3) ∧ (SentiMedian ∈ StrongQuadrant) ⇒ |prior| = |prior| + 1</formula><p>Updating Rules (Different Sentiment Orientations)</p><p>For example, the word "revolution" in Thelwall-Lexicon has a weak negative sentiment (prior = -2) while it has a neutral contextual sentiment since its SentiMedian resides in the neutral region (SentiMedian ∈ NeutralRegion). Therefore, rule number 10 is applied and the term's prior sentiment in Thelwall-Lexicon is updated to neutral (|prior| = 1). In another example, the words "Obama" and "Independence" are not covered by Thelwall-Lexicon, and therefore, they have no prior sentiment. 
However, their SentiMedians reside in the "Positive" quadrant of their SentiCircles, and therefore rule number 12 is applied and both terms are assigned a positive sentiment strength of 3 and added to the lexicon.</p><formula xml:id="formula_5">5: (|prior| ≤ 3) ∧ (SentiMedian ∉ StrongQuadrant) ⇒ |prior| = 1
6: (|prior| ≤ 3) ∧ (SentiMedian ∈ StrongQuadrant) ⇒ prior = −prior
7: (|prior| &gt; 3) ∧ (SentiMedian ∉</formula></div>
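<div xmlns="http://www.tei-c.org/ns/1.0"><p>The rules of Table 1 can be sketched as two small Python functions. This is an illustrative reading rather than the authors' code. The region labels "strong", "weak" and "neutral", the function names, the precedence given to the neutral region, the cap of the strength at 5, and the interpretation of rules 6 and 8 as a sign flip are assumptions.</p>
<p>
```python
def adapt_prior(prior, contextual_sign, region):
    """Apply updating rules 1-10 to a non-zero integer prior in [-5, 5].

    contextual_sign: +1 or -1, the orientation of the SentiMedian.
    region: "strong", "weak" or "neutral" (hypothetical labels for the
    strong quadrants, the remaining quadrants, and the neutral region).
    """
    sign = 1 if prior > 0 else -1
    strength = abs(prior)
    weak_prior = 3 >= strength
    if region == "neutral":
        # Rules 9 and 10: weaken the prior towards neutral.
        strength = 1 if weak_prior else strength - 1
    elif sign == contextual_sign:
        # Rules 1-4: same orientation, so strengthen the prior.
        strength += 2 if (weak_prior and region == "strong") else 1
    elif region == "strong":
        # Rules 6 and 8: read here as flipping the orientation (assumption).
        sign = -sign
    else:
        # Rules 5 and 7: different orientation, weak contextual sentiment.
        strength = 1 if weak_prior else strength - 1
    # Assumption: keep the result on the lexicon's -5..+5 scale.
    return sign * min(strength, 5)


def expand_strength(contextual_sign, region):
    """Expanding rules 11-13: sentiment for a term missing from the lexicon."""
    magnitude = {"neutral": 1, "weak": 3, "strong": 5}[region]
    return contextual_sign * magnitude


revolution = adapt_prior(-2, -1, "neutral")  # the "revolution" example, rule 10
obama = expand_strength(1, "weak")           # the "Obama" example, rule 12
```
</p>
<p>With these assumptions the two worked examples from the text come out as described: "revolution" moves from -2 to a neutral -1, and "Obama" is added with a positive strength of 3.</p></div>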
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evaluation Results</head><p>We evaluate our approach on Thelwall-Lexicon using three adaptation settings: (i) the update setting, where we update the prior sentiment of existing terms in the lexicon, (ii) the expand setting, where we expand Thelwall-Lexicon with new opinionated terms, and (iii) the update+expand setting, where we apply both aforementioned settings together. To this end, we use three Twitter datasets: OMD, HCR and STS-Gold. The numbers of positive and negative tweets within these datasets are summarised in Table <ref type="table">2</ref>, and detailed in the references added in the table. To evaluate the adapted lexicons under the above settings, we perform binary polarity classification on the three datasets. To this end, we use the sentiment detection method proposed with Thelwall-Lexicon <ref type="bibr" target="#b14">[15]</ref>. According to this method a tweet is considered positive if its aggregated positive sentiment strength is 1.5 times higher than the aggregated negative one, and negative vice versa.</p></div>
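<div xmlns="http://www.tei-c.org/ns/1.0"><p>The classification rule above can be sketched in Python as follows. The toy lexicon, the pre-tokenised input and the function name are hypothetical. We also assume that "1.5 times higher" means that the aggregated positive strength exceeds 1.5 times the aggregated negative strength, and that every other tweet is labelled negative, which is one possible reading of "vice versa" in a binary setting.</p>
<p>
```python
def classify_tweet(tokens, lexicon, ratio=1.5):
    """Label a tokenised tweet as "positive" or "negative".

    lexicon maps a term to a signed integer strength; terms that are
    missing from the lexicon contribute nothing.
    """
    scores = [lexicon.get(token, 0) for token in tokens]
    positive = sum(s for s in scores if s > 0)
    negative = sum(-s for s in scores if 0 > s)
    # Assumed reading: positive only when it outweighs negative by the ratio.
    if positive > ratio * negative:
        return "positive"
    return "negative"


# Hypothetical lexicon entries, for illustration only.
toy_lexicon = {"love": 3, "good": 2, "hate": -4}
label_a = classify_tweet(["love", "good"], toy_lexicon)
label_b = classify_tweet(["love", "good", "hate"], toy_lexicon)
```
</p>
<p>Note that under this reading a tweet with no lexicon terms is labelled negative; the original method may handle such tweets differently.</p></div>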
<div xmlns="http://www.tei-c.org/ns/1.0"><figure type="table"><head>Table 2.</head><label>2</label><figDesc>Twitter datasets used for the evaluation</figDesc><table><row><cell>Dataset</cell><cell>Tweets</cell><cell>Positive</cell><cell>Negative</cell></row><row><cell>Obama-McCain Debate (OMD) <ref type="bibr" target="#b3">[4]</ref></cell><cell>1081</cell><cell>393</cell><cell>688</cell></row><row><cell>Health Care Reform (HCR) <ref type="bibr" target="#b11">[12]</ref></cell><cell>1354</cell><cell>397</cell><cell>957</cell></row><row><cell>Stanford Sentiment Gold Standard (STS-Gold) <ref type="bibr" target="#b8">[9]</ref></cell><cell>2034</cell><cell>632</cell><cell>1402</cell></row></table></figure><p>Applying our adaptation approach to Thelwall-Lexicon results in dramatic changes to it. Table <ref type="table">3</ref> shows the percentage of words in the three datasets that were found in Thelwall-Lexicon with their sentiment changed after adaptation. One can notice that on average 9.61% of the words in our datasets were found in the lexicon. However, updating the lexicon with the contextual sentiment of words resulted in 33.82% of these words flipping their sentiment orientation and 62.94% changing their sentiment strength while keeping their prior sentiment orientation. Only 3.24% of the words in Thelwall-Lexicon remained untouched. Moreover, 21.37% of words previously unseen in the lexicon were assigned a contextual sentiment by our approach and subsequently added to Thelwall-Lexicon.</p><p>Table <ref type="table">3</ref>. Average percentage of words in the three datasets that had their sentiment orientation or strength updated by our adaptation approach</p><p>Table <ref type="table" target="#tab_2">4</ref> shows the average results of binary sentiment classification performed on our datasets using (i) the original Thelwall-Lexicon (Original), (ii) Thelwall-Lexicon induced under the update setting (Updated), and (iii) Thelwall-Lexicon induced under the update+expand setting (Updated+Expanded). 
<ref type="foot" target="#foot_2">5</ref> The table reports the results in accuracy and three sets of precision (P), recall (R), and F-measure (F1), one for positive sentiment detection, one for negative, and one for the average of the two.</p><p>From the results in Table <ref type="table" target="#tab_2">4</ref>, we notice that the best classification performance in accuracy and F1 is obtained on the STS-Gold dataset regardless of the lexicon being used. We also observe that the negative sentiment detection performance is always higher than the positive detection performance for all datasets and lexicons.</p><p>As for the different lexicons, we notice that on OMD and STS-Gold the adapted lexicons outperform the original lexicon in both accuracy and F-measure. For example, on OMD the adapted lexicons show an average improvement of 2.46% and 4.51% in accuracy and F1 respectively over the original lexicon. On STS-Gold the performance improvement is less significant than that on OMD, but we still observe roughly 1% improvement in accuracy and F1 compared to using the original lexicon. As for the HCR dataset, the adapted lexicons give on average similar accuracy, but 1.36% lower F-measure. This performance drop can be attributed to the poor detection performance on positive tweets. Specifically, we notice from Table <ref type="table" target="#tab_2">4</ref> a major loss in recall on positive tweet detection using both adapted lexicons. One possible reason is the sentiment class distribution in our datasets.</p><p>In particular, one may notice that HCR is the most imbalanced amongst the three datasets. Moreover, by examining the numbers in Table <ref type="table">3</ref>, we can see that HCR presents the lowest number of new opinionated words among the three datasets (i.e., 10.61% lower than the average), which could be another potential reason for not observing any performance improvement.</p></div>
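<div xmlns="http://www.tei-c.org/ns/1.0"><p>The figures quoted above can be cross-checked against Table 4 with a few lines of Python. The sketch below is only a worked check: the function and variable names are ours, and the numbers are the OMD rows of Table 4.</p>
<p>
```python
def f1(precision, recall):
    """F-measure as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# OMD rows of Table 4 (accuracy and average F1, in percent).
omd_original = {"accuracy": 66.79, "avg_f1": 61.4}
omd_adapted = [
    {"accuracy": 69.29, "avg_f1": 65.8},   # Updated
    {"accuracy": 69.2, "avg_f1": 66.03},   # Updated+Expanded
]


def mean_gain(metric):
    """Mean of the two adapted lexicons minus the original lexicon."""
    adapted = sum(row[metric] for row in omd_adapted) / len(omd_adapted)
    return adapted - omd_original[metric]


positive_f1 = f1(55.99, 40.46)         # OMD, original lexicon, positive class
accuracy_gain = mean_gain("accuracy")  # about 2.46 points
f1_gain = mean_gain("avg_f1")          # about 4.51 points
```
</p>
<p>The positive-class F1 recomputed from the tabulated precision and recall agrees with the 46.97 reported in Table 4, and the two gains match the 2.46% and 4.51% quoted in the text up to rounding.</p></div>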
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion and Future Work</head><p>We demonstrated the value of using the contextual semantics of words for adapting sentiment lexicons from tweets. Specifically, we used Thelwall-Lexicon as a case study and evaluated its adaptation to three datasets of different sizes. Although the potential is palpable, our results were not conclusive, since a performance drop was observed on the HCR dataset using our adapted lexicons. Our initial observations suggest that the quality of our approach might depend on the sentiment class distribution in the dataset. Therefore, a deeper investigation in this direction is required. We used the SentiCircle approach to extract the contextual semantics of words from tweets. In future work we will try other contextual semantic approaches and study how the semantic extraction quality affects the adaptation performance.</p><p>Our adaptation rules in this paper are specific to Thelwall-Lexicon. These rules, however, can be generalized to other lexicons, which constitutes another future direction of this work.</p><p>All words that have contextual sentiment were used for adaptation. Nevertheless, the results suggested that the prior sentiments in the lexicon might need to remain unchanged for words with specific syntactical or linguistic properties in tweets. Part of our future work is to detect and filter those words that are more likely to have stable sentiment regardless of the contexts in which they appear.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>In this paper we proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data. Our approach extracts the contextual semantics of words and uses them to update the words' prior sentiment orientations and/or strength in a given sentiment lexicon. The evaluation was done on Thelwall-Lexicon using three Twitter datasets. Results showed that lexicons adapted by our approach improved the sentiment classification performance in both accuracy and F1 in two out of three datasets.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. The systematic workflow of our proposed lexicon adaptation approach.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. SentiCircle of a term m. Neutral region is shaded in blue.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head></head><label></label><figDesc>Updating Rules (Different Sentiment Orientations), continued
7: (|prior| &gt; 3) ∧ (SentiMedian ∉ StrongQuadrant) ⇒ |prior| = |prior| − 1
8: (|prior| &gt; 3) ∧ (SentiMedian ∈ StrongQuadrant) ⇒ prior = −prior
9: (|prior| &gt; 3) ∧ (SentiMedian ∈ NeutralRegion) ⇒ |prior| = |prior| − 1
10: (|prior| ≤ 3) ∧ (SentiMedian ∈ NeutralRegion) ⇒ |prior| = 1
Expanding Rules
11: SentiMedian ∈ NeutralRegion ⇒ (|contextual| = 1) ∧ AddTerm
12: SentiMedian ∉ StrongQuadrant ⇒ (|contextual| = 3) ∧ AddTerm
13: SentiMedian ∈ StrongQuadrant ⇒ (|contextual| = 5) ∧ AddTerm</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Adaptation rules for Thelwall-Lexicon, where prior: prior sentiment value, StrongQuadrant: very negative/positive quadrant in the SentiCircle, Add: add the term to Thelwall-Lexicon.</figDesc><table /></figure>
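<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4 .</head><label>4</label><figDesc>Cross comparison results of original and the adapted lexicons</figDesc><table><row><cell>Dataset</cell><cell>Lexicon</cell><cell>Accuracy</cell><cell>Positive P</cell><cell>Positive R</cell><cell>Positive F1</cell><cell>Negative P</cell><cell>Negative R</cell><cell>Negative F1</cell><cell>Average P</cell><cell>Average R</cell><cell>Average F1</cell></row>
<row><cell>OMD</cell><cell>Original</cell><cell>66.79</cell><cell>55.99</cell><cell>40.46</cell><cell>46.97</cell><cell>70.64</cell><cell>81.83</cell><cell>75.82</cell><cell>63.31</cell><cell>61.14</cell><cell>61.4</cell></row>
<row><cell>OMD</cell><cell>Updated</cell><cell>69.29</cell><cell>58.89</cell><cell>51.4</cell><cell>54.89</cell><cell>74.12</cell><cell>79.51</cell><cell>76.72</cell><cell>66.51</cell><cell>65.45</cell><cell>65.8</cell></row>
<row><cell>OMD</cell><cell>Updated+Expanded</cell><cell>69.2</cell><cell>58.38</cell><cell>53.18</cell><cell>55.66</cell><cell>74.55</cell><cell>78.34</cell><cell>76.4</cell><cell>66.47</cell><cell>65.76</cell><cell>66.03</cell></row>
<row><cell>HCR</cell><cell>Original</cell><cell>66.99</cell><cell>43.39</cell><cell>41.31</cell><cell>42.32</cell><cell>76.13</cell><cell>77.64</cell><cell>76.88</cell><cell>59.76</cell><cell>59.47</cell><cell>59.6</cell></row>
<row><cell>HCR</cell><cell>Updated</cell><cell>67.21</cell><cell>42.9</cell><cell>35.77</cell><cell>39.01</cell><cell>75.07</cell><cell>80.25</cell><cell>77.58</cell><cell>58.99</cell><cell>58.01</cell><cell>58.29</cell></row>
<row><cell>HCR</cell><cell>Updated+Expanded</cell><cell>66.99</cell><cell>42.56</cell><cell>36.02</cell><cell>39.02</cell><cell>75.05</cell><cell>79.83</cell><cell>77.37</cell><cell>58.8</cell><cell>57.93</cell><cell>58.19</cell></row>
<row><cell>STS-Gold</cell><cell>Original</cell><cell>81.32</cell><cell>68.75</cell><cell>73.1</cell><cell>70.86</cell><cell>87.52</cell><cell>85.02</cell><cell>86.25</cell><cell>78.13</cell><cell>79.06</cell><cell>78.56</cell></row>
<row><cell>STS-Gold</cell><cell>Updated</cell><cell>81.71</cell><cell>69.46</cell><cell>73.42</cell><cell>71.38</cell><cell>87.7</cell><cell>85.45</cell><cell>86.56</cell><cell>78.58</cell><cell>79.43</cell><cell>78.97</cell></row>
<row><cell>STS-Gold</cell><cell>Updated+Expanded</cell><cell>82.3</cell><cell>70.48</cell><cell>74.05</cell><cell>72.22</cell><cell>88.03</cell><cell>86.02</cell><cell>87.01</cell><cell>79.26</cell><cell>80.04</cell><cell>79.62</cell></row></table></figure>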
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">We define context as a textual corpus or a set of tweets.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">This is because cos θ &lt; 0 for large angles.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">Note that in this work we do not report the results obtained under the expand setting since no improvement was observed compared to the other two settings.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This work was supported by the EU-FP7 project SENSE4US (grant no. 611242).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">S</forename><surname>Baccianella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Esuli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Seventh conference on International Language Resources and Evaluation</title>
				<meeting><address><addrLine>Malta; Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010-05">May. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Twitter mood predicts the stock market</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bollen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zeng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computational Science</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An introduction to concept-level sentiment analysis</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Soft Computing and Its Applications</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="478" to="483" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Characterizing debate performance via aggregated Twitter sentiment</title>
		<author>
			<persName><forename type="first">N</forename><surname>Diakopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Shamma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 28th Int. Conf. on Human factors in computing systems</title>
				<meeting>28th Int. Conf. on Human factors in computing systems</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Determining term subjectivity and term orientation for opinion mining</title>
		<author>
			<persName><forename type="first">A</forename><surname>Esuli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">EACL</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">2006</biblScope>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Unsupervised sentiment analysis with emotional signals</title>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd World Wide Web conf</title>
				<meeting>the 22nd World Wide Web conf</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Twitter sentiment analysis: The good the bad and the omg!</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kouloumpis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Moore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ICWSM</title>
				<meeting>the ICWSM<address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">A new ANEW: Evaluation of a word list for sentiment analysis in microblogs</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">Å</forename><surname>Nielsen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1103.2903</idno>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold</title>
		<author>
			<persName><forename type="first">H</forename><surname>Saif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fernandez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings, 1st ESSEM Workshop</title>
				<meeting>1st ESSEM Workshop<address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">SentiCircles for contextual and conceptual semantic sentiment analysis of Twitter</title>
		<author>
			<persName><forename type="first">H</forename><surname>Saif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fernandez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 11th Extended Semantic Web Conf. (ESWC)</title>
				<meeting>11th Extended Semantic Web Conf. (ESWC)<address><addrLine>Crete, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Semantic sentiment analysis of Twitter</title>
		<author>
			<persName><forename type="first">H</forename><surname>Saif</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Alani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 11th Int. Semantic Web Conf. (ISWC)</title>
				<meeting>11th Int. Semantic Web Conf. (ISWC)<address><addrLine>Boston, MA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Twitter polarity classification with label propagation over lexical links and the follower graph</title>
		<author>
			<persName><forename type="first">M</forename><surname>Speriosu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sudan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Upadhyay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baldridge</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the EMNLP First workshop on Unsupervised Learning in NLP</title>
				<meeting>the EMNLP First workshop on Unsupervised Learning in NLP<address><addrLine>Edinburgh, Scotland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distant supervision for emotion classification with discrete binary values</title>
		<author>
			<persName><forename type="first">J</forename><surname>Suttles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ide</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="121" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Extracting semantic orientations of words using spin model</title>
		<author>
			<persName><forename type="first">H</forename><surname>Takamura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Inui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Okumura</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 43rd Annual Meeting on Association for Computational Linguistics</title>
				<meeting>43rd Annual Meeting on Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Sentiment strength detection for the social web</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thelwall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Paltoglou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. American Society for Information Science and Technology</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="163" to="173" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Sentiment strength detection in short informal text</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thelwall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Buckley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Paltoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kappas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. American Society for Info. Science and Technology</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="issue">12</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Measuring praise and criticism: Inference of semantic orientation from association</title>
		<author>
			<persName><forename type="first">P</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Littman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Information Systems</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="315" to="346" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">From frequency to meaning: Vector space models of semantics</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pantel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of artificial intelligence research</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="141" to="188" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">The viability of web-derived polarity lexicons</title>
		<author>
			<persName><forename type="first">L</forename><surname>Velikovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Blair-Goldensohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hannan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>McDonald</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: ACL</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Recognizing contextual polarity in phrase-level sentiment analysis</title>
		<author>
			<persName><forename type="first">T</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wiebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hoffmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Empirical Methods in NLP Conf. (EMNLP)</title>
				<meeting>Empirical Methods in NLP Conf. (EMNLP)<address><addrLine>Vancouver, Canada</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Identifying the semantic orientation of terms using S-HAL for sentiment analysis</title>
		<author>
			<persName><forename type="first">T</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cheng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="279" to="289" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
