<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Affective Content Classification using Convolutional Neural Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Daniel</forename><surname>Claeser</surname></persName>
							<email>daniel.claeser@fkie.fraunhofer.de</email>
							<affiliation key="aff0">
								<orgName type="institution">Fraunhofer FKIE</orgName>
								<address>
									<addrLine>Fraunhoferstrasse 20</addrLine>
									<postCode>53343</postCode>
									<settlement>Wachtberg</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Affective Content Classification using Convolutional Neural Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">0BD86F76CEDC6169297A546F610D7199</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:28+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Convolutional Neural Networks</term>
					<term>Unsupervised Learning</term>
					<term>GloVe</term>
					<term>FastText</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present a three-layer convolutional neural network for the classification of the two binary target variables 'Social' and 'Agency' in the HappyDB corpus, exploiting the lexical density of a closed domain and a high degree of regularity in linguistic patterns. Incorporating demographic information is demonstrated to improve classification accuracy. Custom embeddings learned from additional unlabeled data perform competitively with established pre-trained models based on much more comprehensive general training corpora. The top-performing model achieves accuracies of 0.90 for the 'Social' and 0.875 for the 'Agency' variable.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The CL-Aff Shared Task <ref type="bibr" target="#b0">[1]</ref>, held as part of the Affective Content Analysis workshop at AAAI 2019, invited participants to analyze and classify the contents of HappyDB <ref type="bibr" target="#b1">[2]</ref>, a corpus of 100,000 'Happy Moments'. Subtask 1 consisted of classifying contents with respect to two binary variables, 'Agency' and 'Social', with 'Agency' indicating whether the author of a happy moment was in control of events and 'Social' indicating whether additional people were explicitly or implicitly involved. In addition, an open-ended second subtask invited participants to share insights from the corpus with respect to 'ingredients of happiness'.</p><p>To the best of the author's knowledge, no similar shared task or challenge has previously been proposed, and while there has been extensive research on sentiment and affect analysis, the task at hand is very specific, with its scope limited to pre-classified data describing 'happy moments'. It could therefore not be approached with established techniques for sentiment or polarity analysis. It was instead treated as a classification task aiming at the detection of semantic ('Social' variable) and syntactic ('Agency' variable) patterns, with both implicit and explicit concepts present in the data.</p><p>In recent years, embedding-based deep learning techniques have gained momentum, superseding conventional machine learning techniques in a broad range of linguistic tasks and currently constituting the absolute majority of publications at the four major venues of computational linguistics <ref type="bibr" target="#b4">[5]</ref>. The use of neural networks employing vector embeddings seemed a natural choice given the need to extend the language model to abstract concepts beyond the lexical surface structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Dataset</head><p>A comprehensive description of the dataset, along with informative basic statistics, can be found in the original HappyDB paper <ref type="bibr" target="#b1">[2]</ref>. The following section describes additional insights into the data structure that proved relevant for the classification approach and its performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Analyzed subsets</head><p>It was quickly noted that 95.2% of the provided happy moments were tagged as submitted from just two countries, the United States (8,378 or 79.3%) and India (1,674 or 15.9%), while the remainder of the corpus, just 508 happy moments, was distributed among 69 other countries. In light of this uneven distribution and the resulting difficulty of claiming statistically significant insights on the smaller subsets, only the submissions from the aforementioned two countries were considered for further evaluation and additional classification experiments.</p></div>
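<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a minimal illustration of this subset selection, the following pandas sketch tallies the country tags and keeps the two dominant groups. The file and column names are assumptions for illustration, not the corpus' documented schema.</p><p>
import pandas as pd

# File and column names ("labeled_moments.csv", "country", "moment")
# are assumptions, not the official HappyDB schema.
df = pd.read_csv("labeled_moments.csv")

counts = df["country"].value_counts()
print(counts.head())  # USA and IND are expected to dominate by far

subset = df[df["country"].isin(["USA", "IND"])].copy()
print(f"kept {len(subset)} of {len(df)} happy moments")
</p></div>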
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Duplicates</head><p>While the authors of HappyDB took basic cleaning and quality assurance measures with respect to misspellings and the removal of non-informative entries, the corpus contains a considerable proportion of duplicates.</p><p>The corpus contains 1,674 entries with the country tag 'IND'; manual inspection of these moments revealed a high number of duplicates. After removing exact literal duplicates, the subset was 391 entries lighter, leaving 1,283 entries. Removing punctuation to further catch small variations in otherwise identical utterances, like the college example in Table <ref type="table" target="#tab_0">1</ref>, left 1,246 unique entries, reducing the number of available examples for training and evaluation of the classifier by more than 25%. The seven most common duplicate entries alone make up 391 (23.3%) of all moments with country tag 'IND'. Note that the entry "i went to college" occurs with and without a full stop 15 times each. Additionally, the majority of these duplicates were submitted along with contradicting demographic information. While sentences like "i went to college" might indeed have been submitted by multiple participants, more distinct duplicates like the irregular pattern second from the bottom, or complex utterances like the example at the bottom (shortened from originally "when i am getting ready to go to my office my parents send off with cute smile and say have a nice day and take care"), were almost certainly submitted multiple times by the same worker. Even the cleaned-up subset still contains several very similar complex utterances. Undeniably, the presence of such a high proportion of duplicates in one category has a considerable distorting effect on training and evaluation of a classifier.</p><p>The situation was far less critical for the 'USA' subset of the corpus, with 208 duplicates amounting to less than 2.5% of its entries. The overall duplicate ratio over the entire corpus was 6.2%.</p><p>Only the cleaned-up versions of the 'USA' and 'IND' subsets were considered for further analysis and for training the classifiers.</p></div>
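<div xmlns="http://www.tei-c.org/ns/1.0"><p>A two-pass deduplication along these lines can be sketched as follows; this is an illustration under the same assumed schema as above, not the exact cleaning script used for the experiments.</p><p>
import string
import pandas as pd

# Assumed schema: columns "country" and "moment".
df = pd.read_csv("labeled_moments.csv")

# Pass 1: drop exact literal duplicates within each country subset.
df = df.drop_duplicates(subset=["country", "moment"])

# Pass 2: lowercase and strip punctuation so that small variations such as
# "i went to college." and "i went to college" collapse onto one key.
strip = str.maketrans("", "", string.punctuation)
df["norm"] = (df["moment"].str.lower()
                          .str.translate(strip)
                          .str.strip())
df = df.drop_duplicates(subset=["country", "norm"]).drop(columns="norm")
</p></div>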
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Lexical, syntactic and idiomatic properties</head><p>The material provided by participants from the US and India differed in several linguistic dimensions. The exact linguistic background of individual authors remained unclear, as both countries are polyglot; however, it seems reasonable to assume that the majority of US participants are native speakers of English or highly fluent in the language. The vast majority of authors submitting from India are, in contrast, assumed to use English as a second language, with a more diverse linguistic background than the US participants. Assuming a descriptive rather than prescriptive point of view, it is not of particular interest whether certain patterns in the Indian subset might be considered correct or appropriate by native and proficient speakers of English, as long as they are distinct and reproducible enough for a classifier to learn. The intuition that patterns in this subset might be distinct enough for the classifier to benefit from learning them separately was proven correct experimentally.</p><p>American and Indian submissions differed considerably with respect to syntactic patterns to start with: statements from US authors contained 13.52 tokens per sentence on average with a standard deviation of 6.78, while Indian statements contained 12.71 tokens on average with a considerably higher standard deviation of 10.59, caused e.g. by a larger proportion of particularly long statements. While the authors were originally instructed to state complete sentences, the level of compliance varied between the two groups, with e.g. US authors starting 8.4% of sentences with a gerund form compared to 5.7% of Indian authors. Tables <ref type="table" target="#tab_2">2 and 3</ref> show the most common trigrams starting sentences in the two groups, demonstrating that US authors use a considerably higher share of idiomatic expressions such as "i got to" and framing expressions such as "an event that [made me happy]" and "i was happy", marked in bold. The Indian statements might in that light tentatively be characterized as more straightforward. Additional differences include Indian authors using the simple and progressive present in place of the simple past more often than US authors, and a higher rate of omission of particles such as prepositions. Indian statements were lexically more dense, with a types-to-tokens ratio of 9.67 compared to 8.00 in US statements.</p></div>
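<div xmlns="http://www.tei-c.org/ns/1.0"><p>The surface statistics reported above (sentence-initial trigrams, tokens per statement) can be reproduced with a few lines of standard Python; whitespace tokenization is an assumption here, as the original tokenizer is not specified.</p><p>
from collections import Counter
import statistics

def initial_trigrams(moments):
    """Count the three-token prefixes that open each statement."""
    counts = Counter()
    for m in moments:
        tokens = m.lower().split()  # assumed whitespace tokenization
        if len(tokens) >= 3:
            counts[" ".join(tokens[:3])] += 1
    return counts

def length_stats(moments):
    """Mean and standard deviation of tokens per statement."""
    lengths = [len(m.split()) for m in moments]
    return statistics.mean(lengths), statistics.stdev(lengths)

# usage sketch: initial_trigrams(usa_moments).most_common(20)
</p></div>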
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Syntactic patterns</head><p>Participants in the crowdsourcing process creating the HappyDB corpus were explicitly asked to state moments that made them happy in single full sentences.</p><p>While not all participants strictly complied with those instructions, the overwhelming majority of statements are in the form of full declarative sentences. Syntax in the corpus can thus be regarded as fixed and discarded as a distinct piece of information in the classification process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Experiments and results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Basic considerations and setup</head><p>Given the almost uniform syntactic structure of the corpus with respect to declarative sentences, a convolutional neural network was determined to be a more appropriate architecture than a time-step based approach: considering syntax more or less fixed relieves the classifier of the effort of interpreting the complete input as a sequence and allows it to focus on detecting the presence or absence of features relating to agency or social participation in the utterance. Two binary classifiers were trained to address each variable separately. A large search space of configurations was explored, yielding the following configuration with the best performance in terms of accuracy: two convolutional layers with 128 filters each and a step size of 5, followed by a dense layer with 128 units. Applying dropout of 10% and 20% yielded slight but statistically insignificant improvements. Batch sizes were iterated in steps of 8, 16, 32, 64 and multiples of 64 up to 1,024, with medium batch sizes of around 384 performing best in the vast majority of configurations. Table <ref type="table" target="#tab_3">4</ref> shows overall results for the best-performing configurations with the architecture described above.</p><p>As higher-dimensional embeddings consistently outperformed low-dimensional models, only the 300-dimensional models were considered for further experiments.</p></div>
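<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal Keras sketch of this configuration is given below. The vocabulary size and the pooling scheme are assumptions, and the "step size of 5" is interpreted here as the convolution window width; the original implementation may differ in these details.</p><p>
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, EMB_DIM = 20_000, 300  # assumed vocabulary size

model = tf.keras.Sequential([
    # In the experiments this layer is initialized from pre-trained
    # GloVe/FastText vectors or the custom HappyDB embeddings.
    layers.Embedding(VOCAB_SIZE, EMB_DIM),
    layers.Conv1D(128, 5, activation="relu"),  # 128 filters, window 5
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),               # pooling scheme is assumed
    layers.Dropout(0.2),                       # 10-20% gave slight gains
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),     # one binary target variable
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=384, ...)  # batch sizes near 384 performed best
</p></div>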
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Pre-trained and customized embeddings</head><p>Three major groups of pre-trained embeddings were used to initialize the embedding layer of the neural network: FastText by Facebook AI <ref type="bibr" target="#b3">[4]</ref>, GloVe by Stanford University <ref type="bibr" target="#b2">[3]</ref>, and custom FastText embeddings trained on the joint set of labeled and unlabeled HappyDB data provided by the task's authors.</p><p>To assess how well the supplied labeled and unlabeled HappyDB data could reflect the syntactic and semantic relations of the domain, compared to the broader knowledge of the pre-trained embeddings distributed by the authors of FastText and GloVe, FastText embeddings of different dimensionality were trained with both available approaches, CBOW and SkipGram, and evaluated as displayed in Table <ref type="table" target="#tab_3">4</ref>.</p></div>
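<div xmlns="http://www.tei-c.org/ns/1.0"><p>Training the custom embeddings can be sketched with gensim as follows. The window size, minimum count, epoch count and file name are assumptions not stated in the paper.</p><p>
from gensim.models import FastText

# Joint labeled + unlabeled HappyDB statements, one per line
# (file name is an assumption).
with open("happydb_all_moments.txt", encoding="utf-8") as f:
    corpus = [line.lower().split() for line in f]

# Train SkipGram and CBOW variants in all evaluated dimensionalities.
for sg, name in [(1, "skipgram"), (0, "cbow")]:
    for dim in (25, 50, 100, 200, 300):
        model = FastText(sentences=corpus, vector_size=dim, sg=sg,
                         window=5, min_count=1, epochs=10)  # assumed settings
        model.wv.save(f"happydb_{name}_{dim}d.kv")
</p></div>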
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Constructing two binary classifiers</head><p>Based on the aforementioned considerations, one binary classifier was constructed for each dependent variable, 'Agency' and 'Social', each with the target values 'yes' and 'no' as labeled in the training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Training classifiers on four classes</head><p>Table <ref type="table" target="#tab_4">5</ref> shows the uneven distribution of the two variables and their co-occurrences in the corpus, illuminating some basic connections in agreement with the psychological findings cited by the authors of HappyDB: a majority of 73.8% of happy moments involves active participation or control by the author. Within these moments, an absolute majority of 54.4% involves no other people than the acting authors themselves. In turn, within the 26.2% of moments with no active participation of the author, the probability is 74.9% that other people are involved, reflecting the intuition that in most instances something, or somebody, needs to cause the happiness after all. This connection raised interest in the performance of a classifier treating each combination of the two variables as a distinct class, thus forming the four classes "Agency no, social no", "Agency yes, social no", "Agency no, social yes" and "Agency yes, social yes". While there is apparently a strong conditional probability of "Social: yes" given "Agency: no", the significantly lowered number of samples per class was expected to cause a drop in performance, especially with only 693 samples for the "Agency no, social no" class, an assumption that was confirmed by the experimental results displayed in Table <ref type="table" target="#tab_5">6</ref>. The results of the three top-performing high-dimensional configurations affirmed those expectations and discouraged further experiments: combining the two variables into four categories decreased performance well below the results achievable in the binary setting, even when evaluating only one variable per category.</p></div>
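<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four-class reformulation amounts to a simple label join; a sketch under the assumed column names "agency" and "social":</p><p>
# Combine the two binary labels into one four-way target;
# "no_no" is the rarest class with only 693 samples.
df["joint"] = df["agency"].astype(str) + "_" + df["social"].astype(str)
print(df["joint"].value_counts())
</p></div>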
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Training separate classifiers by countries</head><p>The presence of the aforementioned distinct syntactic and lexical characteristics in the two largest groups by country inspired the question whether classification performance would benefit from training separate classifiers for each group. Since only the USA and India subsets contained more than 1,000 samples, the exploration was limited to those subsets. Three separate classifiers were trained: one each for 'IND' and 'USA' with 1,246 samples (the number of available samples for 'IND', chosen to receive a balanced setting), and one with 1,246 samples split between the two countries proportionally, in alignment with the original full training corpus. The results in Tables <ref type="table" target="#tab_6">7</ref> and <ref type="table" target="#tab_7">8</ref> show a modest but statistically significant (confidence level 0.95) improvement for both language groups, with the moments submitted under country code 'IND' benefitting considerably more. We suggest this might be an effect of the more compact syntax patterns discussed above. The picture is even clearer for the 'Social' variable. For both variables, the separate classifiers achieve better performance than their combined average. However, how this effect converges towards larger training sets has not been investigated.</p></div>
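<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the three equally sized training sets (per-country and proportionally mixed) under the assumed schema; the random seeds and exact sampling procedure are assumptions.</p><p>
import pandas as pd

N = 1246  # number of available 'IND' samples after cleaning
ind = df[df["country"] == "IND"].sample(N, random_state=0)
usa = df[df["country"] == "USA"].sample(N, random_state=0)

# Mixed set: N samples split between the two countries in proportion
# to their share of the full training corpus.
usa_share = (df["country"] == "USA").mean()
n_usa = round(N * usa_share)
mixed = pd.concat([usa.sample(n_usa, random_state=0),
                   ind.sample(N - n_usa, random_state=0)])
</p></div>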
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Classification by concepts</head><p>The authors of HappyDB report on successful efforts to categorize the corpus by a set of crowd-sourced category labels. Additionally, they identified a set of concepts or topics of happy moments in a seemingly rather intuitive and subjective way. To apply a limited test of replicability to this set of topics, a classifier with the aforementioned architecture was trained on a subset of the corpus consisting of happy moments labeled with exactly one concept, limited to concepts with more than 1,000 labeled examples: career (1,280), entertainment (1,135), family (1,259) and food (1,007). Results are shown in Table <ref type="table" target="#tab_8">9</ref>.</p></div>
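<div xmlns="http://www.tei-c.org/ns/1.0"><p>The subset construction can be sketched as below; the column name "concepts" and the pipe separator for multi-label entries are assumptions about the released data format.</p><p>
# Keep moments labeled with exactly one concept, restricted to the
# concepts with more than 1,000 single-label examples.
single = df[~df["concepts"].fillna("").str.contains("|", regex=False)]
single = single[single["concepts"].ne("")]
counts = single["concepts"].value_counts()
frequent = counts[counts.gt(1000)].index  # career, entertainment, family, food
concept_df = single[single["concepts"].isin(frequent)]
</p></div>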
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>We introduced a rather simplistic architecture to classify the HappyDB contents with respect to the two binary variables 'Agency' and 'Social'. HappyDB proved to be a high-quality linguistic resource with a high degree of replicability in terms of machine learning and classification, as demonstrated by the experimental results both for the target variables defined by the Shared Task and for the reproduction of the concepts introduced by the HappyDB authors. We observe that while embeddings trained only on HappyDB, without any external world knowledge supplied, do not statistically significantly outperform established general-purpose embeddings such as FastText and GloVe trained on Wikipedia and crawled web content, they are almost competitive while utilizing a vocabulary of fewer than 20,000 types, as opposed to up to 2 million types in the pre-trained embeddings. We observe no particular benefit from social-media-based embeddings, in accordance with the assumption that most statements were given in a rather formal register as intended by the corpus' authors. Classification appears to benefit from taking the linguistic backgrounds of different groups of authors into account, and we recommend cleaning the remaining duplicates from the corpus to avoid distortions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Acknowledgement</head><p>I would like to express my gratitude to Fahrettin Gökgöz and Albert Pritzkau of Fraunhofer FKIE and Maria Jabari of University of Bonn for their expertise and insights supporting the system design and dataset analysis.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Example duplicates for country code 'IND'</figDesc><table><row><cell>Occurrences</cell><cell>Duplicate</cell></row><row><cell>126</cell><cell>i went to temple</cell></row><row><cell>100</cell><cell>i went to shopping</cell></row><row><cell>15</cell><cell>i went to college.</cell></row><row><cell>15</cell><cell>i went to college</cell></row><row><cell>13</cell><cell>the day with my wife</cell></row><row><cell>12</cell><cell>my boy friend love feeling</cell></row><row><cell>10</cell><cell>when i am getting ready to [...]*</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>20 most common trigrams at sentence beginning, USA</figDesc><table><row><cell>Trigram</cell><cell>Relative (%)</cell><cell>Cumulated (%)</cell></row><row><cell>i was happy</cell><cell>2.65</cell><cell>2.65</cell></row><row><cell>i went to</cell><cell>2.52</cell><cell>5.17</cell></row><row><cell>i was able</cell><cell>2.30</cell><cell>7.47</cell></row><row><cell>i got to</cell><cell>2.18</cell><cell>9.65</cell></row><row><cell>i got a</cell><cell>2.17</cell><cell>11.82</cell></row><row><cell>i had a</cell><cell>1.76</cell><cell>13.59</cell></row><row><cell>i bought a</cell><cell>1.13</cell><cell>14.71</cell></row><row><cell>i received a</cell><cell>0.87</cell><cell>15.58</cell></row><row><cell>i found out</cell><cell>0.87</cell><cell>16.45</cell></row><row><cell>an event that</cell><cell>0.82</cell><cell>17.28</cell></row><row><cell>i made a</cell><cell>0.69</cell><cell>17.96</cell></row><row><cell>i watched a</cell><cell>0.62</cell><cell>18.59</cell></row><row><cell>i went on</cell><cell>0.53</cell><cell>19.11</cell></row><row><cell>i found a</cell><cell>0.53</cell><cell>19.64</cell></row><row><cell>i ate a</cell><cell>0.51</cell><cell>20.15</cell></row><row><cell>i went out</cell><cell>0.49</cell><cell>20.64</cell></row><row><cell>it made me</cell><cell>0.42</cell><cell>21.06</cell></row><row><cell>my wife and</cell><cell>0.40</cell><cell>21.47</cell></row><row><cell>my husband and</cell><cell>0.37</cell><cell>21.83</cell></row><row><cell>i took my</cell><cell>0.37</cell><cell>22.20</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>20 most common trigrams at sentence beginning, India</figDesc><table><row><cell>Occurrence</cell><cell>Trigram</cell><cell>Relative (%)</cell><cell>Cumulated (%)</cell></row><row><cell>53</cell><cell>i went to</cell><cell>4.31</cell><cell>4.31</cell></row><row><cell>20</cell><cell>i got a</cell><cell>1.63</cell><cell>5.93</cell></row><row><cell>20</cell><cell>i bought a</cell><cell>1.63</cell><cell>7.56</cell></row><row><cell>15</cell><cell>i went for</cell><cell>1.22</cell><cell>8.78</cell></row><row><cell>14</cell><cell>my happiest moment</cell><cell>1.14</cell><cell>9.92</cell></row><row><cell>10</cell><cell>yesterday i went</cell><cell>0.81</cell><cell>10.73</cell></row><row><cell>10</cell><cell>i met my</cell><cell>0.81</cell><cell>11.54</cell></row><row><cell>9</cell><cell>me and my</cell><cell>0.73</cell><cell>12.28</cell></row><row><cell>9</cell><cell>i was very</cell><cell>0.73</cell><cell>13.01</cell></row><row><cell>9</cell><cell>in the past</cell><cell>0.73</cell><cell>13.74</cell></row><row><cell>8</cell><cell>when i am</cell><cell>0.65</cell><cell>14.39</cell></row><row><cell>8</cell><cell>last month i</cell><cell>0.65</cell><cell>15.04</cell></row><row><cell>8</cell><cell>i got my</cell><cell>0.65</cell><cell>15.69</cell></row><row><cell>7</cell><cell>my best friend</cell><cell>0.57</cell><cell>16.26</cell></row><row><cell>7</cell><cell>i purchased a</cell><cell>0.57</cell><cell>16.83</cell></row><row><cell>7</cell><cell>i had a</cell><cell>0.57</cell><cell>17.40</cell></row><row><cell>6</cell><cell>we bought a</cell><cell>0.49</cell><cell>17.89</cell></row><row><cell>6</cell><cell>the day i</cell><cell>0.49</cell><cell>18.37</cell></row><row><cell>6</cell><cell>bought a new</cell><cell>0.49</cell><cell>18.86</cell></row><row><cell>5</cell><cell>we went to</cell><cell>0.41</cell><cell>19.27</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Accuracy, Macro F1 Agency, Social (abbreviated A, S) 10-fold cross-validated</figDesc><table><row><cell>Embedding</cell><cell>Dims</cell><cell>Accuracy A</cell><cell>Accuracy S</cell><cell>F1 A</cell><cell>F1 S</cell></row><row><cell>GloVe, 6B</cell><cell>300</cell><cell>0.868</cell><cell>0.887</cell><cell>0.835</cell><cell>0.885</cell></row><row><cell>GloVe, 840B</cell><cell>300</cell><cell>0.871</cell><cell>0.8975</cell><cell>0.841</cell><cell>0.894</cell></row><row><cell>FastText, Wiki-News</cell><cell>300</cell><cell>0.875</cell><cell>0.900</cell><cell>0.842</cell><cell>0.888</cell></row><row><cell>FastText, Wiki-News Subword</cell><cell>300</cell><cell>0.87</cell><cell>0.8925</cell><cell>0.839</cell><cell>0.889</cell></row><row><cell>FastText Crawl</cell><cell>300</cell><cell>0.872</cell><cell>0.896</cell><cell>0.842</cell><cell>0.892</cell></row><row><cell>FastText Crawl Subword</cell><cell>300</cell><cell>0.871</cell><cell>0.896</cell><cell>0.840</cell><cell>0.892</cell></row><row><cell>FastText, Wikipedia</cell><cell>300</cell><cell>0.873</cell><cell>0.898</cell><cell>0.841</cell><cell>0.896</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>300</cell><cell>0.874</cell><cell>0.894</cell><cell>0.843</cell><cell>0.889</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>300</cell><cell>0.869</cell><cell>0.889</cell><cell>0.838</cell><cell>0.884</cell></row><row><cell>GloVe, Twitter</cell><cell>300</cell><cell>0.871</cell><cell>0.894</cell><cell>0.840</cell><cell>0.891</cell></row><row><cell>GloVe, 6B</cell><cell>200</cell><cell>0.87</cell><cell>0.885</cell><cell>0.840</cell><cell>0.882</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>200</cell><cell>0.873</cell><cell>0.896</cell><cell>0.842</cell><cell>0.892</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>200</cell><cell>0.869</cell><cell>0.885</cell><cell>0.839</cell><cell>0.880</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>100</cell><cell>0.872</cell><cell>0.895</cell><cell>0.840</cell><cell>0.891</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>100</cell><cell>0.868</cell><cell>0.882</cell><cell>0.838</cell><cell>0.879</cell></row><row><cell>GloVe, Twitter</cell><cell>100</cell><cell>0.871</cell><cell>0.894</cell><cell>0.837</cell><cell>0.890</cell></row><row><cell>GloVe, 6B</cell><cell>100</cell><cell>0.867</cell><cell>0.881</cell><cell>0.821</cell><cell>0.878</cell></row><row><cell>GloVe, 6B</cell><cell>50</cell><cell>0.862</cell><cell>0.879</cell><cell>0.818</cell><cell>0.876</cell></row><row><cell>GloVe, Twitter</cell><cell>50</cell><cell>0.868</cell><cell>0.871</cell><cell>0.821</cell><cell>0.867</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>50</cell><cell>0.863</cell><cell>0.871</cell><cell>0.830</cell><cell>0.867</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>25</cell><cell>0.862</cell><cell>0.877</cell><cell>0.832</cell><cell>0.873</cell></row><row><cell>GloVe, Twitter</cell><cell>25</cell><cell>0.863</cell><cell>0.871</cell><cell>0.832</cell><cell>0.868</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Distribution of classes and co-occurrence of target variables</figDesc><table><row><cell></cell><cell>Social no</cell><cell>Social yes</cell><cell>Sum</cell></row><row><cell>Agency no</cell><cell>693</cell><cell>2071</cell><cell>2764</cell></row><row><cell>Agency yes</cell><cell>4242</cell><cell>3554</cell><cell>7796</cell></row><row><cell>Sum</cell><cell>4935</cell><cell>5625</cell><cell>10560</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Results of initial experiments with four classes</figDesc><table><row><cell>Embedding</cell><cell>Agency</cell><cell>Social</cell><cell>Both</cell></row><row><cell>FastText, HappyDB, 300d, SkipGram</cell><cell>0.771</cell><cell>0.820</cell><cell>0.691</cell></row><row><cell>GloVe, 6B, 300d</cell><cell>0.753</cell><cell>0.801</cell><cell>0.690</cell></row><row><cell>FastText, Wiki-News</cell><cell>0.803</cell><cell>0.822</cell><cell>0.686</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7 .</head><label>7</label><figDesc>Accuracy for 'Agency' with USA and IND trained separately and jointly</figDesc><table><row><cell>Embedding</cell><cell>Acc USA</cell><cell>Acc IND</cell><cell>Acc Mixed</cell></row><row><cell>GloVe840</cell><cell>0.849</cell><cell>0.857</cell><cell>0.844</cell></row><row><cell>GloVe6b</cell><cell>0.849</cell><cell>0.854</cell><cell>0.850</cell></row><row><cell>FastText Crawl</cell><cell>0.852</cell><cell>0.859</cell><cell>0.856</cell></row><row><cell>FastText Crawl Subword</cell><cell>0.852</cell><cell>0.863</cell><cell>0.843</cell></row><row><cell>FastText Wiki-News</cell><cell>0.857</cell><cell>0.857</cell><cell>0.845</cell></row><row><cell>FastText Wiki-News Subword</cell><cell>0.842</cell><cell>0.845</cell><cell>0.829</cell></row><row><cell>FastText Wikipedia</cell><cell>0.850</cell><cell>0.848</cell><cell>0.855</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>0.856</cell><cell>0.859</cell><cell>0.854</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>0.860</cell><cell>0.857</cell><cell>0.842</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 8 .</head><label>8</label><figDesc>Accuracy for 'Social' with USA and IND trained separately and jointly</figDesc><table><row><cell>Embedding</cell><cell>Acc USA</cell><cell>Acc IND</cell><cell>Acc Mixed</cell></row><row><cell>GloVe840</cell><cell>0.895</cell><cell>0.866</cell><cell>0.838</cell></row><row><cell>GloVe6b</cell><cell>0.880</cell><cell>0.859</cell><cell>0.821</cell></row><row><cell>FastText Crawl</cell><cell>0.894</cell><cell>0.863</cell><cell>0.830</cell></row><row><cell>FastText Crawl Subword</cell><cell>0.867</cell><cell>0.855</cell><cell>0.820</cell></row><row><cell>FastText Wiki-News</cell><cell>0.896</cell><cell>0.859</cell><cell>0.824</cell></row><row><cell>FastText Wiki-News Subword</cell><cell>0.843</cell><cell>0.847</cell><cell>0.823</cell></row><row><cell>FastText Wikipedia</cell><cell>0.880</cell><cell>0.851</cell><cell>0.823</cell></row><row><cell>FastText HappyDB, CBOW</cell><cell>0.890</cell><cell>0.868</cell><cell>0.832</cell></row><row><cell>FastText HappyDB, Skip</cell><cell>0.887</cell><cell>0.864</cell><cell>0.829</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 9 .</head><label>9</label><figDesc>Classification by concepts, four most common single labels</figDesc><table><row><cell>Embedding</cell><cell>Dimensions</cell><cell>Accuracy</cell></row><row><cell>GloVe6b</cell><cell>300</cell><cell>0.908</cell></row><row><cell>GloVe840</cell><cell>300</cell><cell>0.913</cell></row><row><cell>FastText, Wiki-News</cell><cell>300</cell><cell>0.912</cell></row><row><cell>FastText, Wiki-News Subword</cell><cell>300</cell><cell>0.884</cell></row><row><cell>FastText, Crawl</cell><cell>300</cell><cell>0.916</cell></row><row><cell>FastText, Crawl Subword</cell><cell>300</cell><cell>0.899</cell></row><row><cell>FastText, Wikipedia</cell><cell>300</cell><cell>0.908</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>300</cell><cell>0.915</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>300</cell><cell>0.892</cell></row><row><cell>GloVe, Twitter</cell><cell>200</cell><cell>0.910</cell></row><row><cell>GloVe6B</cell><cell>200</cell><cell>0.901</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>200</cell><cell>0.914</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>200</cell><cell>0.893</cell></row><row><cell>FastText, HappyDB, Skip</cell><cell>100</cell><cell>0.911</cell></row><row><cell>FastText, HappyDB, CBOW</cell><cell>100</cell><cell>0.889</cell></row><row><cell>GloVe, Twitter</cell><cell>100</cell><cell>0.901</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The CL-Aff Happiness Shared Task: Results and Key Insights</title>
		<author>
			<persName><forename type="first">K</forename><surname>Jaidka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mumick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Chhaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ungar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019)</title>
				<meeting>the 2nd Workshop on Affective Content Analysis @ AAAI (AffCon2019)<address><addrLine>Hawaii</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Hap-pyDB: A Corpus of 100,000 Crowdsourced Happy Moments</title>
		<author>
			<persName><forename type="first">A</forename><surname>Asai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Evensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Golshan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Halevy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lopatenko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of LREC 2018. European Language Resources Association (ELRA)</title>
				<meeting>LREC 2018. European Language Resources Association (ELRA)<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Bag of tricks for efficient text classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)</title>
				<meeting>the 15th Conference of the European Chapter of the Association for Computational Linguistics (EACL)</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Recent trends in deep learning based natural language processing</title>
		<author>
			<persName><forename type="first">T</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hazarika</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ieee Computational intelligenCe magazine</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="55" to="75" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
