<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Defining a Gold Standard for a Swedish Sentiment Lexicon: Towards Higher-Yield Text Mining in the Digital Humanities</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jacobo</forename><surname>Rouces</surname></persName>
							<email>jacobo.rouces@gu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Språkbanken</orgName>
								<orgName type="institution">University of Gothenburg</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lars</forename><surname>Borin</surname></persName>
							<email>lars.borin@gu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Språkbanken</orgName>
								<orgName type="institution">University of Gothenburg</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nina</forename><surname>Tahmasebi</surname></persName>
							<email>nina.tahmasebi@gu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Språkbanken</orgName>
								<orgName type="institution">University of Gothenburg</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stian</forename><forename type="middle">Rødven</forename><surname>Eide</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Språkbanken</orgName>
								<orgName type="institution">University of Gothenburg</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Defining a Gold Standard for a Swedish Sentiment Lexicon: Towards Higher-Yield Text Mining in the Digital Humanities</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8DB4BED98CD484FC922FA63F56099B38</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>There is an increasing demand for multilingual sentiment analysis, yet most work on sentiment lexicons is still based on English lexical resources such as WordNet. In addition, many of the non-English sentiment lexicons that do exist have been compiled by (machine) translation from English resources, arguably obscuring possible language-specific characteristics of sentiment-loaded vocabulary. In this paper we describe the creation from scratch of a gold standard for the sentiment annotation of Swedish terms as a first step towards the creation of a full-fledged sentiment lexicon for Swedish.</p><note place="foot" n="1">The language dependence of NLP tools is a complex and sorely under-researched area; see, e.g., the insightful discussion in <ref type="bibr" target="#b1">(Bender, 2011)</ref>.</note><note place="foot" n="2">According to a standard reference, Ethnologue <ref type="bibr" target="#b12">(Simons and Fennig, 2017)</ref>, there are about 7,000 languages in the world. A fair estimate would be that at most 1,000 of these have a tradition of writing <ref type="bibr" target="#b2">(Borin, 2009)</ref>. Sentiment analysis tools are available for far fewer languages than this.</note><note place="foot" n="3">E.g., the NRC Emotion Lexicon: http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm</note><note place="foot" n="4">Notably, our use of word sense is to be construed as 'lexical word sense', which is also intended to cover lexicalized multi-word expressions.</note></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>As the amount of digital text available to scholars grows beyond all bounds, dashing any hope of dealing with it in time-honored "close-reading" fashion, text mining (TM; also "text data mining" or "text analytics") is seeing increasing use as a research tool in the humanities and social sciences. TM relies heavily on linguistic processing of the texts in order to produce reliable results. Consequently, text mining for a particular language will be limited by the accuracy of the natural language processing (NLP) tools available for that language.<ref type="foot">1</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Sentiment Analysis and its Uses in Digital Humanities</head><p>The NLP subfield known as sentiment analysis or opinion mining is an important component technology of TM, and one that has seen an explosive expansion over the last decade or so. Since the publication of the comprehensive overview of the field by <ref type="bibr" target="#b9">Pang and Lee (2008)</ref>, hundreds of papers as well as dedicated workshops at NLP conferences have been devoted to this topic.</p><p>Even though sentiment analysis has become a standard item in the NLP toolbox, many theoretical and methodological questions remain to be answered, and many resource gaps remain to be filled. Regarding the latter, we note that most work on automated sentiment analysis has been done on English and a few other languages; for most of the world's languages, even the written ones,<ref type="foot">2</ref> this tool is not available. All sentiment analysis methods in the literature rely on lexical knowledge in one way or another, often in the form of a sentiment lexicon, i.e., a list of words (lemmas or text words) and multi-word expressions annotated with sentiment information. Such a lexicon is necessarily a language-specific resource. The present paper describes the first steps towards the development of an extensive sentiment lexicon for written (standard) Swedish.</p><p>There is an increasing demand for multilingual sentiment analysis, as well as, in particular in the digital humanities, for sentiment analysis tools for historical texts, whereas most published work deals with contemporary English, more often than not texts from product and service review websites. In fact, many of the non-English sentiment lexicons that do exist have been compiled by (machine) translation from English resources,<ref type="foot">3</ref> arguably obscuring possible language-specific characteristics of sentiment-loaded vocabulary.</p><p>The theoretical and methodological issues arising in connection with sentiment analysis of texts are at least partly due to the position of this field at the intersection of two linguistic subfields: pragmatics and lexical semantics. In practice, this means that the literature contains many different proposals as to how sentiment information should be represented in the lexicon, to which kinds of lexical entities it should be attached (lemmas, lexemes or word senses), and how contextual information should be encoded and used in calculating the sentiment of a text passage from its constituent parts.</p><p>The methodological position taken here is that prior sentiment (or polarity) forms part of a word's sense, and that a word sense has only one prior polarity.<ref type="foot">4</ref> Connotations are considered to form part of the word sense (unlike, e.g., the practice in Princeton WordNet; <ref type="bibr" target="#b6">Fellbaum, 1998</ref>). It follows that if a word appears in text with two different sentiment values, this must reflect either two different senses of the lexeme or a contextual effect, to be accounted for by invoking the venerable linguistic device of compositionality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Towards a Swedish Sentiment Lexicon</head><p>In this paper we describe the creation of a gold standard (GS) for the sentiment annotation of Swedish terms as a first step towards the creation of a full-fledged sentiment lexicon for Swedish, i.e., a lexicon containing information about the prior sentiment values of lexical items. For this purpose, we use human annotations of items sampled from a general-purpose computational lexical resource. More specifically, we employ a multi-stage approach combining corpus-based frequency sampling, direct annotation and Best-Worst Scaling (BWS) <ref type="bibr" target="#b7">(Kiritchenko and Mohammad, 2016)</ref>.</p><p>The remainder of this paper is structured as follows. In Section 4 we describe SALDO, the Swedish lexical resource forming the basis for both the GS and the sentiment lexicon under construction. In Section 5 we describe our approach to compiling the GS. Section 6 is devoted to an analysis of the GS in order to arrive at a suitable sentiment model to be encoded in a Swedish sentiment lexicon. In Section 7 we wrap up and point to future research directions. Most technical details of our work have been left out of the present exposition. The companion papers <ref type="bibr" target="#b10">Rouces et al (forthcoming-a)</ref> and <ref type="bibr" target="#b11">Rouces et al (forthcoming-b)</ref> provide detailed technical information pertaining to the compilation of the GS and the construction of the sentiment lexicon, respectively.</p><p>The literature proposes different ways of modeling the sentiment of a word sense or unit of text. The simplest is the bipolar model, which assigns to each lexical unit a scalar, often normalized to the interval [−1, +1], with −1 representing the most negative possible sentiment and +1 the most positive. SentiWordNet <ref type="bibr">(Baccianella et al, 2010)</ref> and its gold standard Micro-WNOp <ref type="bibr" target="#b4">(Cerini et al, 2007)</ref> instead use a model with two degrees of freedom: each semantic unit in WordNet is assigned a three-dimensional vector (pos, neg, neu) of positive, negative and neutral components, normalized so that pos + neg + neu = 1. This model can be trivially converted to the bipolar one using sen = pos − neg.</p></div>
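<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the relation between the two sentiment models concrete, the following minimal Python sketch converts a SentiWordNet-style triple (pos, neg, neu) into a bipolar score sen = pos − neg. The function name to_bipolar and its tolerance parameter are illustrative assumptions of ours, not part of SentiWordNet or Micro-WNOp.</p><p>
```python
import math


def to_bipolar(pos, neg, neu, tol=1e-9):
    """Convert a (pos, neg, neu) triple into a bipolar score in [-1, +1].

    Illustrative helper (not part of any published tool): it first checks
    the SentiWordNet-style constraints and then applies sen = pos - neg.
    """
    components = (pos, neg, neu)
    # Each component must lie in the interval [0, 1].
    if not (min(components) >= 0 and 1 >= max(components)):
        raise ValueError("each component must lie in [0, 1]")
    # The components must sum to 1, which leaves two degrees of freedom.
    if not math.isclose(sum(components), 1.0, abs_tol=tol):
        raise ValueError("pos + neg + neu must equal 1")
    return pos - neg
```
</p><p>For example, to_bipolar(0.625, 0.125, 0.25) returns 0.5. The mapping only works in one direction: the triples (0.5, 0.0, 0.5) and (0.75, 0.25, 0.0) both yield sen = 0.5, so the bipolar score alone cannot tell how much of a word sense's weight is neutral.</p></div>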
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">SALDO</head><p>Both our GS and the sentiment lexicon under construction are based on SALDO, an existing large lexical-semantic computational resource for Swedish <ref type="bibr" target="#b3">(Borin et al, 2013)</ref>. For the work described here, we use the current stable version, SALDO v. 2.3, which contains 131,020 word senses.<ref type="foot" target="#foot_0">5</ref> SALDO is organized as a lexical-semantic network of word senses whose topology reflects the semantic distance among them. It is superficially similar to WordNet, but the principles by which it is structured are quite different. The basic organizational principle of SALDO is hierarchical. Every entry in SALDO (representing a word sense<ref type="foot" target="#foot_1">6</ref>) is supplied with one or more semantic descriptors, which are themselves also entries in the dictionary. All entries in SALDO are actually occurring words or conventionalized or lexicalized multi-word expressions (MWEs) of the language. The primary (obligatory) descriptor is the entry that, better than any other entry, fulfills two requirements: (1) it is a semantic neighbor of the entry to be described; and (2) it is more central than that entry.</p><p>That two entries are semantic neighbors means that there is a direct semantic relationship between them, for instance synonymy, hyponymy, or an argument-predicate relationship. Centrality is determined by means of several criteria, the most important being frequency: a frequent entry is more central than an infrequent one.<ref type="foot">7</ref> The basic linguistic idea underlying SALDO is, in effect, that, semantically speaking, the whole vocabulary of a language can be described as having a center (or core) and, consequently, a periphery. In SALDO, the higher levels in the hierarchy contain simpler and more basic entries. Contrast this with WordNet, where the higher nodes in the hierarchy contain very abstract vocabulary (e.g., 'entity').</p></div>
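<div xmlns="http://www.tei-c.org/ns/1.0"><p>Because every SALDO entry has a single primary descriptor that is more central than the entry itself, following primary descriptors upwards traces a path towards the core of the vocabulary. The Python sketch below illustrates this on invented toy data: the sense identifiers, the frequencies and the artificial top node ROOT are hypothetical and are not taken from the actual SALDO files. It also computes the share of entries whose primary descriptor is at least as frequent as the entry itself, i.e., the frequency criterion for centrality discussed above.</p><p>
```python
# Invented toy data, not real SALDO content: sense id -> primary descriptor.
PRIMARY = {
    "tax..1": "hund..1",   # 'dachshund' -> 'dog'
    "valp..1": "hund..1",  # 'puppy' -> 'dog'
    "hund..1": "djur..1",  # 'dog' -> 'animal'
    "djur..1": "ROOT",     # hypothetical artificial top node
}

# Invented corpus frequencies for the same senses.
FREQ = {"tax..1": 120, "valp..1": 900, "hund..1": 15000, "djur..1": 9000}


def path_to_root(sense, primary=PRIMARY, root="ROOT"):
    """Follow primary descriptors from sense up to the top node."""
    path = [sense]
    while path[-1] != root:
        parent = primary[path[-1]]
        if parent in path:
            raise ValueError("cycle in primary descriptors at " + parent)
        path.append(parent)
    return path


def share_descriptor_at_least_as_frequent(primary=PRIMARY, freq=FREQ, root="ROOT"):
    """Fraction of entries whose primary descriptor is at least as frequent."""
    pairs = [(s, d) for s, d in primary.items() if d != root]
    hits = sum(1 for s, d in pairs if freq[d] >= freq[s])
    return hits / len(pairs)
```
</p><p>With this toy data, path_to_root("tax..1") returns ["tax..1", "hund..1", "djur..1", "ROOT"], and the computed share is 2/3, because the invented frequency of djur..1 is lower than that of hund..1. Frequency is thus a strong but not infallible indicator of centrality.</p></div>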
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Compiling the Gold Standard from SALDO</head><p>We aim for a GS that assigns a sentiment value to each SALDO entry it contains. The GS should support the bipolar sentiment model, but we also want to investigate the feasibility of using the SentiWordNet model. We used a three-stage procedure to compile the GS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Corpus-Based Sampling</head><p>First, an initial sample was drawn from SALDO following the distribution given by the estimated frequency of each word sense in the Gigaword corpus <ref type="bibr" target="#b5">(Eide et al, 2016)</ref>, a one-billion-word mixed-genre corpus of written Swedish.<ref type="foot" target="#foot_2">8</ref> Without this frequency weighting, the Zipfian distribution of many kinds of linguistic items <ref type="bibr" target="#b0">(Baayen, 2001)</ref> would cause the GS to consist mostly of words that occur very rarely in written text, including rather obscure and outdated terms, since the lexicon has been designed to cover the period from the mid-20th century until today.</p><p>We used the subset of the corpus covering the period from 1990 to the present (about 940 million words). Because the tokens in the corpus are not sense-disambiguated, we followed a simple heuristic. SALDO does not record the corpus frequency of the different senses of a lemma, but by design the first sense is the most common one. Because the most common sense of a lemma in SALDO tends to account for around 70% of its occurrences in corpus data <ref type="bibr" target="#b8">(Nieto Piña and Johansson, 2016)</ref>, we assume a distribution where the first of a lemma's n senses is given a probability of p = 0.7, and each of the n − 1 remaining senses is given p = 0.3/(n − 1). Then, for each occurrence of a polysemous lemma in the corpus, an associated word sense is sampled according to p, and the count c for that word sense is incremented. By basing the sampling on a large corpus covering the last two decades, we make the GS more representative of modern written language: in effect, this is equivalent to sampling tokens (sense-disambiguated lemmas) directly from modern text. By filtering out obscure and dated terms, we also reduce the proportion of terms that the annotators may not understand.</p><note place="foot" n="7">The actual work on SALDO relies mainly on the lexicographical experience and linguistic intuition of the compilers, who use clues such as stylistic value, word-formation complexity, the type of semantic relation holding between an entry and its primary descriptor, acquisition order in first-language acquisition, etc. Frequency correlates highly with these, however: it turns out that about 90% of the SALDO entries have primary descriptors that are at least as frequent as the entries themselves in a corpus of more than one billion words of Swedish. A more detailed description and discussion of the semantic organization of SALDO can be found in <ref type="bibr" target="#b3">Borin et al (2013, 1196–1200)</ref>.</note></div>
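<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sense-assignment heuristic of Section 5.1 can be sketched in Python as follows. The sketch assumes that the corpus is available as a sequence of lemmatized tokens and that the lexicon maps each lemma to its senses in SALDO order (first sense first); the function names, the input format and the example lemma are hypothetical, and the random seed is fixed only to make runs reproducible.</p><p>
```python
import random
from collections import Counter


def sense_distribution(n, p_first=0.7):
    """Heuristic prior over the n senses of a lemma (first sense most common)."""
    if n == 1:
        return [1.0]  # monosemous lemmas always receive their single sense
    rest = (1.0 - p_first) / (n - 1)
    return [p_first] + [rest] * (n - 1)


def sample_sense_counts(tokens, senses_of, seed=0):
    """Assign one sampled sense to every lemma token and count the results.

    tokens: iterable of lemmas in corpus order (assumed input format).
    senses_of: dict mapping lemma -> list of sense ids in lexicon order.
    """
    rng = random.Random(seed)
    counts = Counter()
    for lemma in tokens:
        senses = senses_of.get(lemma)
        if not senses:
            continue  # lemma not in the lexicon: ignored
        weights = sense_distribution(len(senses))
        counts[rng.choices(senses, weights=weights, k=1)[0]] += 1
    return counts
```
</p><p>The resulting counts c approximate sense frequencies in modern text and can then serve as sampling weights for selecting word senses to annotate. For instance, sample_sense_counts(["bank"] * 1000, {"bank": ["bank..1", "bank..2"]}) assigns roughly 700 of the 1,000 tokens to bank..1.</p></div>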
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Best-Worst Scaling Filtered by Direct Annotation</head><p>Having annotators directly assign continuous sentiment scores to lexicon entries poses several problems. In particular, it is difficult for annotators to remain consistent, both within their own annotations and with one another. Best-Worst Scaling (BWS) annotation <ref type="bibr" target="#b7">(Kiritchenko and Mohammad, 2016)</ref> has been proposed as an alternative. With BWS, annotators are presented with tuples (usually 4-tuples) of items, and for each tuple they select the highest- and lowest-ranking item according to the score at hand (in this case, the most positive and the most negative item). Provided that the tuples are generated so that certain statistical properties hold for how the items appear in them, the number of times an item is chosen as most positive minus the number of times it is chosen as most negative can be used as its sentiment score. However, we found that when the items are chosen by direct sampling from the lexicon or from a general corpus, most 4-tuples do not contain any item with a clear non-neutral polarity, let alone both a clearly positive and a clearly negative item. Increasing the size of the tuples could solve this, but would impose a higher cognitive load on the annotators. Our solution to this problem is to pre-filter the initial set of terms by means of a preceding direct but coarse-grained annotation (DA), so that the subset of word senses fed into the BWS annotation has a more even distribution of sentiment values.</p><p>Using the corpus-derived distribution described above, we independently sampled 1,998 word senses from SALDO, creating the set of word senses to be annotated directly, W<hi rend="subscript">DA</hi>. The sampling was filtered in order to avoid including too many difficult-to-judge non-content items (SALDO contains all parts of speech) in the annotation set. We also left out all multi-word expressions and single-letter lemmas (typically corresponding to the names of letters of the alphabet, musical notes, or units of measurement). Thus only single-word adjectives, interjections, nouns, and verbs with lemmas of two or more letters were sampled.<ref type="foot" target="#foot_3">9</ref> We also sampled 200 additional word senses for a joint annotation exercise involving all annotators of W<hi rend="subscript">DA</hi>, with the purpose of standardizing the annotation criteria.</p><p>Each of the three annotators then independently assigned one of the labels "positive", "negative" or "neutral" to each word sense in W<hi rend="subscript">DA</hi>. All three annotators, coauthors of the present paper, are NLP researchers with formal backgrounds in linguistics and computer science and native-level knowledge of Swedish.</p><p>For the BWS annotation, we selected those elements of W<hi rend="subscript">DA</hi> that had been labeled as non-neutral by at least two annotators (278 items in total; henceforth W<hi rend="subscript">BWS</hi>), which ensured that most 4-tuples had clear candidates for most positive and most negative. From this set, we generated 572 4-tuples in order to obtain a sufficient number of annotations per item <ref type="bibr" target="#b7">(Kiritchenko and Mohammad, 2016)</ref>.</p><p>We developed a web application (see Figure <ref type="figure" target="#fig_1">1</ref>) that allows annotators to assign sentiments to SALDO word senses using Best-Worst Scaling. For each tuple, the user selects the most positive and the most negative item, and can also choose an 'I don't know' option. The application includes an interactive menu of pending groups and can save partial annotations to, and load them from, local files, allowing the annotators to organize their work over several sessions. We employed four annotators, different from those of the direct annotation, who likewise had a formal background in (computational) linguistics and/or computer science, as well as native-level knowledge of Swedish.</p></div>
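<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two selection steps described above (keeping the items labeled non-neutral by at least two of the three DA annotators, then grouping them into 4-tuples) can be sketched in Python as follows. The tuple generator is our own simplified scheme, which repeatedly shuffles the item list and cuts it into consecutive 4-tuples so that every item appears a nearly equal number of times; it is not necessarily the procedure used in this work or recommended by Kiritchenko and Mohammad (2016), and the function names are hypothetical.</p><p>
```python
import random


def select_for_bws(labels, min_votes=2):
    """Return the items labelled non-neutral by at least min_votes annotators.

    labels: dict mapping item -> list of labels, each one of
    "positive", "negative" or "neutral" (one label per annotator).
    """
    return [item for item, votes in labels.items()
            if sum(v != "neutral" for v in votes) >= min_votes]


def make_tuples(items, n_tuples, size=4, seed=0):
    """Generate n_tuples tuples of size distinct items each.

    Each pass shuffles the item list and cuts it into consecutive chunks,
    so every item appears in a nearly equal number of tuples (items left
    over at the end of a pass are skipped for that pass).
    """
    if size > len(items) or 0 > n_tuples:
        raise ValueError("need at least size items and n_tuples >= 0")
    rng = random.Random(seed)
    tuples = []
    while len(tuples) != n_tuples:
        pool = list(items)
        rng.shuffle(pool)
        for start in range(0, len(pool) - size + 1, size):
            tuples.append(tuple(pool[start:start + size]))
            if len(tuples) == n_tuples:
                break
    return tuples
```
</p><p>With the 278 selected items, make_tuples(selected, 572) would place each item in about 572 × 4 / 278 ≈ 8.2 tuples on average. Shuffling whole passes keeps the per-item counts close to that average, which a fully independent random draw of tuples would not guarantee.</p></div>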
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Annotation Outcomes and Choice of Sentiment Model</head><p>We calculated interannotator agreement and other statistics for the annotations. In brief, interannotator agreement was higher for BWS than for DA. See <ref type="bibr" target="#b10">Rouces et al (forthcoming-a)</ref> for a detailed discussion.</p><p>The following table shows some representative scores obtained by BWS annotation. The histograms in Figure <ref type="figure" target="#fig_2">2</ref> show the distributions of the (bipolar) sentiment values obtained with the two kinds of annotation, illustrating how effectively the preliminary filtering steps ensured that the BWS annotators were presented mainly with non-neutral items.</p><p>The output of the BWS annotation can be used for both the SentiWordNet model and the bipolar model. In the results of the BWS annotation, 86 of the 278 items have pos<hi rend="subscript">BWS</hi>(w) &gt; 0 and neg<hi rend="subscript">BWS</hi>(w) &gt; 0, but in many cases one of these two components is small, i.e., a strong bias towards one polarity is common. The average over all w of min(pos<hi rend="subscript">BWS</hi>(w), neg<hi rend="subscript">BWS</hi>(w)), which reflects the overlap between the positive and negative components, is 0.022. For Micro-WNOp, the GS used for SentiWordNet, which uses the same model but was obtained by direct annotation of the two variables 'pos' and 'neg', the corresponding value is 0.015. Our higher value is probably due to the fact that we constructed W<hi rend="subscript">BWS</hi> to contain a high proportion of non-neutral word senses. As a result, a non-negligible proportion of the BWS 4-tuples contained elements that were either all negative or all positive, making the choice of most positive or most negative a sort of "lesser evil" or "lesser good", respectively. As an example, <hi rend="italic">absurd</hi>, from the table in Section 6, appeared in the annotation interface in a tuple containing [<hi rend="italic">dålig</hi> 'bad', <hi rend="italic">utplåna</hi> 'obliterate', <hi rend="italic">irriterad</hi> 'irritated', <hi rend="italic">absurd</hi>].</p></div>
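<div xmlns="http://www.tei-c.org/ns/1.0"><p>This section does not spell out how pos_BWS(w) and neg_BWS(w) are derived from the BWS judgements (the companion paper gives the details). A common choice, which we assume here, is to take them as the fractions of w's appearances in which it was chosen as most positive and as most negative, respectively, so that sen = pos_BWS(w) − neg_BWS(w) equals the best-minus-worst count divided by the number of appearances. The Python sketch below computes these values together with the overlap statistic, the average of min(pos_BWS(w), neg_BWS(w)). Discarding tuples answered with 'I don't know' is also our assumption, and the function names are hypothetical.</p><p>
```python
from collections import Counter


def bws_scores(annotations):
    """Compute (pos, neg, sen) for every item from best/worst judgements.

    annotations: list of (items, best, worst) triples, where items is the
    tuple shown to the annotator and best/worst are the chosen items, or
    None for an "I don't know" answer (assumed input format).
    """
    seen, best, worst = Counter(), Counter(), Counter()
    for items, b, w in annotations:
        if b is None or w is None:
            continue  # assumption: "I don't know" answers are discarded
        seen.update(items)
        best[b] += 1
        worst[w] += 1
    return {
        item: (best[item] / n, worst[item] / n, (best[item] - worst[item]) / n)
        for item, n in seen.items()
    }


def mean_overlap(scores):
    """Average over items of min(pos, neg), the overlap statistic."""
    return sum(min(pos, neg) for pos, neg, _ in scores.values()) / len(scores)


def count_mixed(scores):
    """Number of items with both pos > 0 and neg > 0."""
    return sum(1 for pos, neg, _ in scores.values() if pos > 0 and neg > 0)
```
</p><p>On the toy input [(("a", "b", "c", "d"), "a", "d"), (("a", "b", "c", "e"), "b", "a")], item a is chosen once as most positive and once as most negative in its two appearances, so it gets pos = neg = 0.5 and sen = 0; it is the only mixed item, and mean_overlap returns 0.1 over the five items.</p></div>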
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Towards a Sentiment Lexicon for Swedish</head><p>We are currently putting the resulting GS to its intended use: training and comparing different lexicon-based algorithms for creating a complete sentiment lexicon for Swedish. We have conducted initial experiments using both purely lexicon-based methods and methods combining lexical data and corpus information. This work is described in <ref type="bibr" target="#b11">Rouces et al (forthcoming-b)</ref>. The resulting resource, SenSALDO, will contribute significantly to the development of higher-yield TM tools in support of digital humanities research targeting Swedish data.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Section 7 we wrap up and point to future research directions. Most technical details of our work have been left out of the present exposition. The companion papers Rouces et al (forthcoming-a) and Rouces et al (forthcoming-b) provide detailed technical information pertaining to the compilation of the GS and the construction of the sentiment lexicon, respectively.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1 :</head><label>1</label><figDesc>Fig. 1: Screenshot of the Best-Worst Scaling annotation interface. The labels for each group are, from left to right, 'most negative', 'word', 'part of speech', 'associated words', 'most positive' and 'don't know/uncertain'.</figDesc><graphic coords="6,136.63,138.43,322.02,160.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 :</head><label>2</label><figDesc>Fig. 2: Histograms of the sen values resulting from direct (left) and BWS (right) annotation (y-axis: frequency).</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_0">SALDO is freely available (under a CC-BY license) at https://spraakbanken.gu.se/eng/resource/saldo.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_1">Each word sense in SALDO is additionally connected to one or more form units (lemmas plus part of speech and full inflectional and compounding information).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_2">The corpus is freely available (under a CC-BY license) at https://spraakbanken.gu.se/eng/resource/gigaword.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_3">Lexical adverbs were not included, since this set holds too many function words. Swedish has very few deadjectival adverbs of the English type <hi rend="italic">quickly</hi>; their function is normally fulfilled by the neuter singular indefinite form of the adjective.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work has been supported by a framework grant (Towards a knowledge-based culturomics;contract 2012-5738) as well as funding to Swedish CLARIN (Swe-Clarin;contract , both awarded by the Swedish Research Council, and by infrastructure funding granted to Språkbanken by the University of Gothenburg.</p></div>
			</div>

			<div type="references">

				<listBibl>

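<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Word Frequency Distributions</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">H</forename><surname>Baayen</surname></persName>
		</author>
		<imprint>
			<pubPlace>Dordrecht</pubPlace>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining</title>
		<author>
			<persName><forename type="first">S</forename><surname>Baccianella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Esuli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Sebastiani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2010</title>
				<meeting>LREC 2010<address><addrLine>Valletta</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="2200" to="2204" />
		</imprint>
	</monogr>
</biblStruct>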

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">On achieving and evaluating language-independence in NLP</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Bender</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Linguistic Issues in Language Technology</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Linguistic diversity in the information society</title>
		<author>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the SALT-MIL workshop 2009</title>
				<meeting>the SALT-MIL workshop 2009<address><addrLine>Donostia</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
		<respStmt>
			<orgName>University of the Basque Country</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">SALDO: A touch of yin to WordNet&apos;s yang</title>
		<author>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Forsberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lönngren</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Resources and Evaluation</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="1191" to="1211" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Micro-WNOp: A gold standard for the evaluation of automatically compiled lexical resources for opinion mining</title>
		<author>
			<persName><forename type="first">S</forename><surname>Cerini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Compagnoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Demontis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Formentelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gandini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Language Resources and Linguistic Theory</title>
		<imprint>
			<publisher>Franco Angeli</publisher>
			<pubPlace>Milano</pubPlace>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="200" to="210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The Swedish culturomics gigaword corpus: A one billion word Swedish reference dataset for NLP</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Eide</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tahmasebi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the From Digitization to Knowledge workshop at DH 2016</title>
				<meeting>the From Digitization to Knowledge workshop at DH 2016<address><addrLine>Kraków</addrLine></address></meeting>
		<imprint>
			<publisher>LiUEP</publisher>
			<pubPlace>Linköping</pubPlace>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="8" to="12" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">WordNet: An Electronic Lexical Database</title>
		<editor>
			<persName><forename type="first">C</forename><surname>Fellbaum</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="1998">1998</date>
			<publisher>The MIT Press</publisher>
			<pubPlace>Cambridge, Mass</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Capturing reliable fine-grained sentiment associations by crowdsourcing and best-worst scaling</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kiritchenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL 2016, ACL</title>
				<meeting>NAACL 2016, ACL<address><addrLine>San Diego</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="811" to="817" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Embedding senses for efficient graph-based word sense disambiguation</title>
		<author>
			<persName><forename type="first">L</forename><surname>Nieto Piña</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Johansson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of TextGraphs-10, ACL</title>
				<meeting>TextGraphs-10, ACL<address><addrLine>San Diego</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Opinion mining and sentiment analysis</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Foundations and Trends in Information Retrieval</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="1" to="135" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Generating a gold standard for a Swedish sentiment lexicon</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rouces</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tahmasebi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Eide</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2018</title>
				<meeting>LREC 2018<address><addrLine>Miyazaki</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
		</imprint>
	</monogr>
	<note>forthcoming-a</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">SenSALDO: Creating a sentiment lexicon for Swedish</title>
		<author>
			<persName><forename type="first">J</forename><surname>Rouces</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tahmasebi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Borin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Eide</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2018</title>
				<meeting>LREC 2018<address><addrLine>Miyazaki</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
		</imprint>
	</monogr>
	<note>forthcoming-b</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<ptr target="http://www.ethnologue.com" />
		<title level="m">Ethnologue: Languages of the World</title>
		<edition>Twentieth edition</edition>
				<editor>
			<persName><forename type="first">G</forename><forename type="middle">F</forename><surname>Simons</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Fennig</surname></persName>
		</editor>
		<imprint>
			<publisher>SIL International</publisher>
			<pubPlace>Dallas</pubPlace>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
