<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Polarity Imbalance in Lexicon-based Sentiment Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Marco</forename><surname>Vassallo</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">CREA Research Centre for Agricultural Policies and Bio-economy</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Giuliano</forename><surname>Gabrieli</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">CREA Research Centre for Agricultural Policies and Bio-economy</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università degli Studi di Torino</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Cristina</forename><surname>Bosco</surname></persName>
							<affiliation key="aff1">
								<orgName type="department">Dipartimento di Informatica</orgName>
								<orgName type="institution">Università degli Studi di Torino</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Polarity Imbalance in Lexicon-based Sentiment Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F9CA1CE4E6235C3DB4F2D195B13C56B0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-19T15:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Polarity imbalance is an asymmetric situation that occurs when using parametric threshold values in lexicon-based Sentiment Analysis (SA). Varying the thresholds may have an opposite impact on the prediction of negative and positive polarity. We hypothesize that this may be due to asymmetries in the data, in the lexicon, or both. We therefore carry out experiments to evaluate the effect of the lexicon and of the topics addressed in the data. Our experiments are based on a weighted version of the Italian linguistic resource MAL (Morphologically-inflected Affective Lexicon), using as weighting corpus TWITA, a large-scale corpus of messages from Twitter in Italian. The novel Weighted-MAL (W-MAL), presented for the first time in this paper, achieves better polarity classification results, especially for negative tweets, while alleviating the aforementioned polarity imbalance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Polarity imbalance is an asymmetry that arises when parametric threshold values are employed in dictionary-based Sentiment Analysis (SA). Varying the threshold values can have an opposite impact on the prediction of negative and positive polarity. We hypothesize that this effect is due to asymmetries in the data, in the dictionary, or both. We conducted experiments to measure the effect of the lexicon and of the topics addressed in our dataset. Our experiments are based on a weighted version of the Italian resource MAL (Morphologically-inflected Affective Lexicon), using as weighting corpus TWITA, a large-scale corpus of messages from Twitter in Italian. The new resource, Weighted-MAL (W-MAL), presented for the first time in this paper, achieves better polarity classification results, especially for negative messages, while alleviating the aforementioned polarity imbalance problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction and Motivation</head><p>Sentiment Analysis (SA) is the task of Natural Language Processing that aims at extracting opinions from natural language expressions, e.g., reviews or social media posts. The basic approaches to SA typically fall into one of two categories: dictionary-based and supervised machine learning. Methods based on a dictionary make use of affective lexicons, language resources where each word or lemma is associated with a score indicating its affective valence (e.g., polarity). They are faster than supervised statistical approaches and can be applied to multiple domains with minimal adaptation overhead, unless the resource itself is domain-specific. However, they only achieve good performance for identifying coarse opinion tendencies in large datasets, since they cannot take into account the impact of context on the polarity value associated with a word. Supervised statistical methods, on the other hand, tend to provide better quality predictions across benchmarks, due to their better ability to generalize over individual words and expressions and to learn higher-level features. These models also show a better ability to adapt to specific domains, provided that suitable training data is available.</p><p>In order to access the lexical entries in an affective dictionary, each single word must be lemmatized. Unfortunately, lemmatization is an error-prone process, with a potentially negative impact on the performance of downstream tasks such as SA. <ref type="bibr" target="#b10">Vassallo et al. 
(2019)</ref> introduced a novel computational linguistic resource, the Morphologically-inflected Affective Lexicon (henceforth MAL), to address this issue by avoiding the lemmatization step in favor of a morphologically rich affective resource.</p><p>In the experiments we carried out on a specific text genre, namely social media, we observed that using a threshold to assign polarity classes is beneficial, and that using the MAL instead of a lemmatization step improves SA performance overall, in particular thanks to a better prediction of the negative polarity. However, varying the threshold has an opposite impact on the prediction of negative and positive tweets. In this paper, we investigate the motivation behind this polarity imbalance. In particular, we speculate that it may be due to asymmetries in the data (e.g., different internal topics), in the lexicon (e.g., different amounts of negative and positive terms), or both, and we provide experiments to better understand this result and validate these hypotheses. We can therefore summarize our research questions as follows:</p><p>• Is the polarity imbalance due to the topic addressed?</p><p>• Is the polarity imbalance due to the lexicon (i.e., the resources we used, Sentix and MAL)?</p><p>• Is the polarity imbalance due to both?</p><p>A further contribution of the paper is a statistical method for finding the threshold for using the lexicon in SA tasks. The paper is organized as follows. In the next section, affective lexicons and the MAL resource are discussed. In Section 3, we describe the issues related to polarity imbalance in lexicon-based approaches to SA. The fourth section is devoted to discussing the impact of the lexicon on SA and to introducing W-MAL. Section 5 discusses how the topics addressed in the text may impact SA.</p><p>The final section provides concluding remarks and some hints about future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Affective Lexicons</head><p>SA is typically cast as a text classification task, most often approached with supervised statistical models in the NLP research community <ref type="bibr" target="#b1">(Barbieri et al., 2016)</ref>. However, there are several scenarios where dictionary-based methods are preferred, including large-scale industry-ready systems and domain-specific applications. While generally less accurate than supervised classification, dictionary-based methods tend to be robust in classifying sentiment across different domains, faster, and more scalable.</p><p>For the Italian language, several sentiment dictionaries, or, using a more general term, affective lexicons, have been published, with different levels of annotation granularity and public availability, as summarized on the website of the Italian Association of Computational Linguistics<ref type="foot" target="#foot_0">1</ref>.</p><p>Sentix is one of the first affective lexicons created for the Italian language, with a first release described in <ref type="bibr" target="#b2">(Basile and Nissim, 2013)</ref> and a second release called Sentix 2.0<ref type="foot" target="#foot_1">2</ref>. It provides an automatic alignment between SentiWordNet, an automatically-built polarity lexicon for English by <ref type="bibr" target="#b0">Baccianella et al. (2010)</ref>, and the Italian portion of MultiWordNet <ref type="bibr" target="#b5">(Pianta et al., 2002)</ref>. 
While the first version of Sentix associated two independent positive and negative polarity scores with each word, in Sentix 2.0<ref type="foot" target="#foot_2">3</ref> all the senses of each lemma have been collapsed into a single entry by means of a weighted average, where the weights are proportional to sense frequencies computed on the sense-annotated corpus SemCor <ref type="bibr" target="#b4">(Langone et al., 2004)</ref>. Moreover, the positive and negative polarity scores have been combined into a single polarity score ranging from -1 (totally negative) to 1 (totally positive). Sentix 2.0 includes 41,800 different lemmas.</p><p>In order to use a lemma-based affective lexicon such as Sentix, lemmatization is a necessary step. In our previous work, we found that this intermediate step causes a considerable amount of noise, in the form of lemmatization errors such as the ones shown in Table <ref type="table">1</ref> <ref type="bibr" target="#b10">(Vassallo et al., 2019)</ref>. We therefore built a new resource on top of Sentix, described in the next section.</p><p>Table <ref type="table">1</ref>: A tweet with the output of the three lemmatization models, where the lemmas are alphabetically ordered and the errors marked in bold. Original: "@ANBI Nazionale Allarme idrico. Dopo il Po anche l'Adige è in crisi d'acqua https://t.co/GLTlMNqzEv di @AgriculturaIT". ISDT: acqua adigire allarme crisi d dopo idrico po (Sentix score: 0.080). POSTWITA: acqua adigere allarme crisi di dopo idrico po (Sentix score: 0.080). PARTUT: acquare adigere allarme crisi d dopo idrico po (Sentix score: -0.078).</p></div>
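To make the dependence on lemmatizer output concrete, the following sketch scores a bag of lemmas against a toy lemma-to-polarity dictionary. The lexicon and its values are invented for illustration (they are not actual Sentix 2.0 entries): a mis-lemmatized token such as "acquare" simply falls outside the dictionary and silently shifts the sentence score.

```python
# Toy lemma-based scoring: hypothetical mini-lexicon of lemma -> polarity.
# (Invented values; real Sentix 2.0 scores differ.)
sentix = {"acqua": 0.1, "allarme": -0.3, "crisi": -0.4, "adige": 0.0}

def score(lemmas, lexicon):
    """Sum the polarity of every lemma found in the lexicon;
    unknown lemmas contribute 0."""
    return sum(lexicon.get(lemma, 0.0) for lemma in lemmas)

correct = ["acqua", "adige", "allarme", "crisi"]      # gold lemmas
garbled = ["acquare", "adigere", "allarme", "crisi"]  # lemmatizer errors

print(round(score(correct, sentix), 2))  # -0.6
print(round(score(garbled, sentix), 2))  # -0.7
```

This is the noise that motivates the MAL: any lemmatization error changes which dictionary entries are matched, and therefore the final score.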
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">MAL</head><p>We proposed the Morphologically-inflected Affective Lexicon (MAL) in <ref type="bibr">Vassallo et al. (2019)</ref>.</p><p>It is an extension of Sentix in which the entries associated with polarity scores are the inflected forms of each lemma rather than the lemmas themselves, and the polarity score of each form is drawn from the original lemmas in Sentix. The approach consists of linking the lexical items found in tweets with the entries of Sentix 2.0, without the application of an explicit lemmatization step.</p><p>The lexicon is expanded by considering all the acceptable forms of its lemmas, extracted from the Morph-It collection of Italian forms <ref type="bibr" target="#b11">(Zanchetta and Baroni, 2005)</ref>. Each form takes the polarity score of its original lemma; when different lemmas can assume the same form, the arithmetic mean of their polarity scores is assigned.</p><p>The MAL comprises 148,867 forms, all linked to the lemmas of Sentix 2.0.</p><p>Using the MAL, we performed a series of experiments on the impact of lemmatization on dictionary-based SA, which showed how reducing lemmatization errors leads to better polarity classification performance.</p></div>
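The expansion step described above can be sketched as follows, with toy data and invented polarity values (the real resource is built from Sentix 2.0 and Morph-It): each form inherits its lemma's score, and a form shared by several lemmas takes the arithmetic mean of their scores.

```python
from collections import defaultdict
from statistics import mean

# Invented lemma polarities (not actual Sentix 2.0 values).
lemma_polarity = {"pesca": 0.3, "pescare": -0.1}

# Hypothetical (form, lemma) pairs, as one might extract from Morph-It.
# "pesca" is ambiguous: a noun ("peach"/"fishing") and a verb form of "pescare".
form_lemma = [("pesca", "pesca"), ("pesche", "pesca"),
              ("pesca", "pescare"), ("pescano", "pescare")]

# Group lemmas by surface form, then average their polarity scores.
lemmas_of = defaultdict(list)
for form, lemma in form_lemma:
    lemmas_of[form].append(lemma)

mal = {form: mean(lemma_polarity[l] for l in lemmas)
       for form, lemmas in lemmas_of.items()}

print(round(mal["pesca"], 2))   # 0.1  (mean of 0.3 and -0.1)
print(round(mal["pesche"], 2))  # 0.3  (single lemma, score inherited)
```

At lookup time the tweet tokens are matched directly against the form keys, so no lemmatizer (and none of its errors) is involved.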
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Polarity Imbalance in Lexicon-based Sentiment Analysis</head><p>When using an affective lexicon to predict the polarity of natural language sentences, a threshold must be fixed to translate the numerical scores into discrete classes, e.g., positive, neutral, and negative. In <ref type="bibr" target="#b10">Vassallo et al. (2019)</ref>, we showed how the variation of such a threshold has different, opposite impacts on the accuracy of the classification, using as a benchmark the corpus annotated with sentiment polarity made available by the SENTIment POLarity Classification (SENTIPOLC) shared task at EVALITA 2016. More precisely, the red dotted lines labeled ALL in Figure <ref type="figure">1</ref> show that the F1 score of the classification of positive polarity instances increases with stricter thresholds, while the F1 score of negative polarity instances decreases.</p><p>We postulate two non-mutually exclusive hypotheses on the origin of the polarity imbalance, namely the effect of lexicon and topic. The affective scores in the lexicon may be biased towards one end of the polarity spectrum due to a number of causes, resulting in skewed classification results. On the other hand, some topics tend to attract opinions more polarized towards one end of the spectrum than the other (e.g., "war" is an inherently negative topic), therefore the classification might be influenced by this intrinsic polarization.</p></div>
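The thresholding step can be sketched as follows (a minimal illustration; in the actual experiments the lexicon scores are summed over each tweet and the threshold is swept as in Figure 1):

```python
def polarity_class(score, threshold):
    """Map a summed lexicon score to a discrete polarity label.
    Scores inside the [-threshold, +threshold] band are neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# A stricter (larger) threshold widens the neutral band. When the score
# distribution is skewed, borderline positive and borderline negative
# instances cross the band at different rates -- the polarity imbalance.
print(polarity_class(0.08, 0.05))   # positive
print(polarity_class(0.08, 0.10))   # neutral
print(polarity_class(-0.20, 0.10))  # negative
```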
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">The Effect of Lexicon on SA</head><p>In order to shed some light on the polarity imbalance due to the lexicon, we applied a weighted approach to MAL, developing the Weighted Morphologically-inflected Affective Lexicon (W-MAL). It originates from the intuition that less frequent terms should have a higher impact on the computation of the polarity of the sentence where they occur. This principle stems from the observation that rarer, more sought-after terms are often used to convey stronger opinions and feelings.</p><p>We therefore computed the relative frequency of every item in MAL using TWITA, a large-scale corpus of messages from Twitter in the Italian language <ref type="bibr" target="#b3">(Basile et al., 2018)</ref>. TWITA is indeed large (over 500 million tweets from 2012 to 2018, with collection currently ongoing) and domain-agnostic enough to provide a sufficiently representative sample of the distribution of Italian words, although specific to one social media platform.</p><p>Despite its size, not all the terms from the MAL occur in TWITA: 57.9% of the 148,867 terms in MAL were found in TWITA, due to the sparseness of particular inflected forms, and to the presence of multi-word expressions in the lexicon (18,661, about 12%) that were not considered for matching the resources. For comparison, 73.36% of Sentix lemmas were found in TWITA. (Figure <ref type="figure">1</ref>: Results of the polarity classification on SENTIPOLC. The threshold value on the X-axis is applied to transform the sum of the scores from the lexicon into a positive or negative label.)</p><p>Accordingly, the scores of MAL were recalculated by weighting them with the associated word frequencies in TWITA, using the Zipf scale measure <ref type="bibr" target="#b9">(van Heuven et al., 2014)</ref>. We chose this measure because it is easy to interpret and fast to compute. 
The Zipf scale is a logarithmic scale based on the well-known Zipf law of word frequency distribution <ref type="bibr" target="#b12">(Zipf, 1949)</ref>. Computing the Zipf values of term frequencies from TWITA is straightforward, essentially taking the logarithm of the absolute frequency scaled down by a multiplicative factor:</p><formula xml:id="formula_0">Zipf(i) = log10( f(i) / ( Σ_{i=1..N} f(i) / 10^6 + N / 10^6 ) ) + 3</formula><p>where N is the number of distinct tokens in TWITA (6,644,867), f(i) is the absolute frequency of the i-th token in TWITA, and the sum of the token frequencies Σ_{i=1..N} f(i) = 6,906,070,053, therefore:</p><formula xml:id="formula_1">Zipf(i) = log10( f(i) / (6,906.07 + 6.644) ) + 3</formula><p>The original Zipf scale is continuous and ranges from 1 (very low frequency) to 6 (very high frequency), or even 7 (e.g., for very frequent words such as auxiliary verbs). By computing the Zipf score of the MAL terms on TWITA, we found some terms with very low frequencies, resulting in negative values because of the logarithmic function. These were re-coded with the minimum Zipf value. The resulting weights in the W-MAL range from a minimum of -5.16 to a maximum of 5.95 (the original MAL scores ranged from -1 to 1). Terms not found in TWITA were kept in the W-MAL with their original MAL score.</p><p>We initially applied the Zipf scale to the MAL polarity scores by simply multiplying the two scores, thus giving more weight to highly frequent terms. However, using the affective lexicon with this weighting scheme decreased its polarity classification performance. We therefore reversed the Zipf scale, weighting the original scores inversely with respect to word frequency. In this way, we tested our conjecture that low-frequency terms should receive more weight. We replicated the polarity detection experiment on SENTIPOLC. 
The results, shown as the green solid lines labeled ALL in Figure <ref type="figure">1</ref>, indicate a better performance overall, and a reduced imbalance between the positive (F1-score standard deviation across the thresholds of 0.035 with W-MAL vs. 0.054 with MAL) and, especially, the negative polarity class (F1-score standard deviation across the thresholds of 0.008 with W-MAL vs. 0.042 with MAL).</p><p>To further clarify the effect found on the polarity scores, we show two example tweets in Figure <ref type="figure" target="#fig_0">2</ref>. In the figure, the MAL and W-MAL scores are included for the highlighted words, along with the total polarity scores computed with both dictionaries, showing how the final judgment can change from neutral to polarized (bottom example) or switch polarity entirely (top example). In particular, in the top example the scores are associated with "confondesse" (to confuse, in the subjunctive mood) and with "diritto" (right), while in the bottom example the scores are associated with "Istituto" (school) and with the periphrastic verbal form "viene taciuto" (is silenced). This result confirms our speculation that negative polarity is expressed with more specific words than positive polarity. Psychology studies also show that more complex forms of language are used for expressing criticism rather than positive evaluations <ref type="bibr" target="#b8">(Stewart, 2015)</ref>. We also notice that the F1-score on the negative polarity class is generally higher than that on the positive polarity class. This means that, with the inversely weighted scheme, the negative polarity of tweets is better predicted than the positive polarity. This outcome is further supported by the directly proportional version of W-MAL, which performed worse than the inverse version in terms of prediction. 
This trend was also observed across most of the results of the SENTIPOLC shared task, mostly based on supervised models with lexical features, further indicating that the vocabulary of negative sentiment is richer than that of positive sentiment.</p><p>The translation of the examples in Figure <ref type="figure" target="#fig_0">2</ref> is as follows. Top example: "They would be #thegoodschool if meritocracy were not confused with 'doormatcracy': the one whereby even a right becomes a concession." Bottom example: "@steGiannini #thegoodschool In the rankings of the School there are also TFA qualified teachers with 48 months of service. Why is it silenced?", where @steGiannini refers to the Italian Minister of Education.</p></div>
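The Zipf computation above can be reproduced numerically as follows. This is a sketch using the corpus constants reported in the text; the `w_mal` helper at the end is an illustrative assumption about the inverse weighting (the paper states the principle but not the exact combination formula), not the resource's actual recipe.

```python
import math

SUM_FREQ_M = 6906.07  # total TWITA token frequency, in millions (from the text)
N_TYPES_M = 6.644     # number of distinct tokens, in millions (from the text)

def zipf(freq):
    """Zipf scale value of a term with absolute frequency `freq` in TWITA."""
    return math.log10(freq / (SUM_FREQ_M + N_TYPES_M)) + 3

# A term occurring ~6.9M times sits at the top of the scale, while a term
# occurring a handful of times falls near (or below) zero.
print(round(zipf(6_912_714), 3))  # 6.0
print(round(zipf(7), 3))          # 0.005

# Hypothetical inverse weighting: divide the MAL polarity score by the Zipf
# value so that rarer words contribute more to the sentence score; the clamp
# avoids tiny or negative divisors for ultra-rare terms.
def w_mal(polarity, freq):
    return polarity / max(zipf(freq), 0.1)
```

With this shape of weighting, a strongly negative but rare word dominates the tweet score, which matches the observed improvement on negative tweets.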
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">The Effect of Topic on Sentiment Analysis</head><p>In order to investigate the interaction between the imbalance of dictionary-based polarity classification and a possible asymmetry in the data (i.e., different internal topics), we performed this classification with MAL and with W-MAL (reversed Zipf scale) on a benchmark with explicitly stated topics. The test set of SENTIPOLC is composed of 1,982 Italian tweets, organized into 496 general, i.e., domain-independent, tweets and 1,486 political tweets, obtained by filtering data with specific keywords related to Italian political figures. The results of this experiment are also included in Figure <ref type="figure">1</ref> with the GENERAL and POLITICAL labels.</p><p>The first observation we draw from this experiment is that the polarity imbalance is a phenomenon restricted to the topic-specific section of the dataset. This confirms the hypothesis that dictionary-based polarity classification is affected by the imbalance issue to the extent that the topic is specific. In particular, we hypothesize that some topics (such as politics) tend to attract opinions more polarized towards one end of the spectrum (the negative one in this case), therefore inducing the observed imbalance.</p><p>The second observation is that weighting the polarity scores in the dictionary based on word frequency (W-MAL) provides better overall results. In particular, the F1 scores are better in the topic-specific case, specifically due to a better prediction of the negative polarity. This result reinforces the idea that a polarized topic induces polarity imbalance, and therefore a method that alleviates such imbalance (i.e., a weighting scheme) leads to better performance. In our view, a reason for this effect is that topic-specific messages make use of less frequent words on average.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion and Future Work</head><p>The weighting scheme proposed in this work is a promising solution to the polarity imbalance in dictionary-based SA. The experiments show that weighting the polarity scores with word frequencies yields a more precise prediction of the polarized tweets, with a lessened bias in the thresholds for neutral scores. The novel resource presented here, W-MAL, is an attempt to better characterize the most sought-after words, which have an impact on the interaction between sentiment and topic. We believe it also represents a promising attempt to control for context-dependency when using lexicon-based methods for SA. In particular, with this resource we try to give voice to the linguistic intuition that the use of a specific form within a message may meaningfully affect the sentiment the message expresses. For instance, in the top example in Figure <ref type="figure" target="#fig_0">2</ref>, by choosing the subjunctive form "confondesse" of the verb "confondere" (to confuse), the author adds to the meaning of the verb a sense of doubtfulness and unreality. This is reinforced by the fact that this form introduces a clause coordinated with a clause headed by a verb in the conditional mood, i.e., "sarebbe" (a form of "to be"). This form of "confondere" seems especially adequate for contexts where a negative polarity is expressed, and less appropriate in other cases. The use of this specific mood therefore has a meaningful impact on the sentiment expressed. The MAL properly encodes this information, which may be lost when a lemmatization step is applied to the text and all forms are subsequently treated as bearing the same meaning without further nuances. 
The W-MAL goes one step further: it encodes probabilistic information about how suitable a form is for expressing a particular sentiment with respect to the other forms available in a given context. For all the aforementioned reasons, this work has drawn our attention to the necessity of weighting affective lexicons for dictionary-based SA with corpus-based word frequencies. The resource is freely available at https://github.com/valeriobasile/sentixR/blob/master/sentix/inst/extdata/W-MAL.tsv</p><p>In future work, we plan to work on more refined weighting strategies, e.g., leveraging the frequency information of word forms in addition to lemmas, and taking the topic distribution into consideration. Reducing the computational load is a challenging goal as well (see <ref type="bibr" target="#b7">Prakash et al. (2015)</ref>). On the other hand, modern transformer-based models have reached state-of-the-art results on the task of polarity detection <ref type="bibr" target="#b6">(Polignano et al., 2019)</ref>, although they are far more expensive and time-consuming to run. We therefore plan to compare the predictions of these systems, and to study ways to integrate their respective strengths (i.e., the speed and transparency of the dictionary-based approach vs. the superior prediction capability of deep neural models) in order to boost the overall performance.</p><p>The present work was originally conceived in the framework of the AGRItrend project led by the CREA Research Centre for Agricultural Policies and Bio-economy, which aims at collecting and analyzing social media data for opinions in the domain of public policies and agriculture. As such, we plan to study the impact of the techniques presented in this paper on that particular domain, and to observe whether the same, or different, patterns emerge. 
Along a similar line, so far we have conducted experiments on data from Twitter, which facilitates access to large quantities of data but restricts the range of text styles and genres found in them.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2:</head><label>2</label><figDesc>Figure 2: A comparison between the scores calculated for polarized words of a tweet according to MAL and W-MAL in two tweets from the test set.</figDesc><graphic coords="5,325.70,62.81,181.42,212.61" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="4,78.80,62.81,439.94,257.78" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.ai-lc.it/en/affectivelexica-and-other-resources-for-italian/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/valeriobasile/sentixR</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/valeriobasile/sentixR</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Baccianella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Esuli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabrizio</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC&apos;10)</title>
				<meeting>the Seventh conference on International Language Resources and Evaluation (LREC&apos;10)</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
	<note>European Languages Resources Association (ELRA)</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Overview of the Evalita 2016 SENTIment POLarity Classification Task</title>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicole</forename><surname>Novielli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop</title>
				<meeting>Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop<address><addrLine>EVALITA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016. 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sentiment analysis on Italian tweets</title>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Malvina</forename><surname>Nissim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</title>
				<meeting>the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="100" to="107" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Long-term Social Media Data Collection at the University of Turin</title>
		<author>
	<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mirko</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Manuela</forename><surname>Sanguinetti</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</title>
				<meeting>the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018)</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Annotating WordNet</title>
		<author>
			<persName><forename type="first">Helen</forename><surname>Langone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benjamin</forename><forename type="middle">R</forename><surname>Haskell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">A</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004</title>
				<meeting>the Workshop Frontiers in Corpus Annotation at HLT-NAACL 2004</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="63" to="69" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics (ACL)</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">MultiWordNet: developing an aligned multilingual database</title>
		<author>
			<persName><forename type="first">Emanuele</forename><surname>Pianta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luisa</forename><surname>Bentivogli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Girardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the First International Conference on Global WordNet</title>
				<meeting>the First International Conference on Global WordNet</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="293" to="302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">ALBERTO: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Polignano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierpaolo</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>De Gemmis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giovanni</forename><surname>Semeraro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)</title>
				<meeting>the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Statistically weighted reviews to enhance sentiment classification</title>
		<author>
			<persName><forename type="first">Saurabh</forename><surname>Prakash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chakravarthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kaveri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Karbala International Journal of Modern Science</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="26" to="31" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The language of praise and criticism in a student evaluation survey</title>
		<author>
			<persName><forename type="first">Martyn</forename><surname>Stewart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Studies In Educational Evaluation</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="1" to="9" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">SUBTLEX-UK: a new and improved word frequency database for British English</title>
		<author>
			<persName><forename type="first">Walter</forename><forename type="middle">J B</forename><surname>Van Heuven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pawel</forename><surname>Mandera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Emmanuel</forename><surname>Keuleers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marc</forename><surname>Brysbaert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Quarterly Journal of Experimental Psychology</title>
		<imprint>
			<biblScope unit="volume">67</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1176" to="1190" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">The tenuousness of lemmatization in lexicon-based sentiment analysis</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Vassallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giuliano</forename><surname>Gabrieli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cristina</forename><surname>Bosco</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)</title>
				<meeting>the Sixth Italian Conference on Computational Linguistics (CLiC-it 2019)</meeting>
		<imprint>
			<publisher>Academia University Press</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Morph-it! a free corpus-based morphological resource for the Italian language</title>
		<author>
			<persName><forename type="first">Eros</forename><surname>Zanchetta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Baroni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Corpus Linguistics</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">George</forename><forename type="middle">Kingsley</forename><surname>Zipf</surname></persName>
		</author>
		<title level="m">Human Behavior and the Principle of Least Effort: an Introduction to Human Ecology</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1949">1949</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
