<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Author Profiling with Bidirectional RNNs using Attention with GRUs Notebook for PAN at CLEF 2017</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Don</forename><surname>Kodiyan</surname></persName>
							<email>kodiydon@students.zhaw.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Florin</forename><surname>Hardegger</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stephan</forename><surname>Neuhaus</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mark</forename><surname>Cieliebak</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Zurich University of Applied Sciences</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Author Profiling with Bidirectional RNNs using Attention with GRUs Notebook for PAN at CLEF 2017</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">29DA38640CE51DFD455C2FF06FBB1F38</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes our approach for the Author Profiling Shared Task at PAN 2017. The goal was to classify the gender and language variety of a Twitter user solely from their tweets. Author profiling can be applied in various fields such as marketing, security and forensics. Twitter already uses similar techniques to deliver personalized advertisements to its users. PAN 2017 provided a corpus for this purpose in four languages: English, Spanish, Portuguese and Arabic. To solve the problem we used a deep learning approach, which has shown recent success in Natural Language Processing. Our submitted model consists of a bidirectional Recurrent Neural Network with Gated Recurrent Units (GRUs) combined with an attention mechanism. We achieved an average accuracy over all languages of 75.31% in gender classification and 85.22% in language variety classification.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Social media has become an important platform for communication and the exchange of information. In contrast to classical letters and emails, the language on social media is much more personal. This raises the question whether text style and content allow conclusions to be drawn about demographic traits of the author, such as age, gender, or language variety. Such insights can be used in various applications, such as forensics, security, or marketing. For instance, on the basis of such profiles it would be possible to determine which users could be interested in a new product or campaign, how urgent a complaint is, or whether a profile in an online forum might be fake.</p><p>The Author Profiling Shared Task at PAN aims to answer these questions by extracting information about authors based on their linguistic style of writing <ref type="bibr" target="#b12">[14,</ref><ref type="bibr" target="#b11">13]</ref>. The goal of the 2017 shared task at PAN is to detect an author's gender and language variety from his or her Twitter texts. Both training and test data are provided in four different languages: English, Spanish, Portuguese and Arabic.</p><p>We have implemented a solution based on a bidirectional recurrent neural network (bi-RNN) using gated recurrent units (GRUs) in combination with an attention mechanism.</p><p>The paper is structured as follows. In Section 3, we give a short overview of related work. Then, in Section 4, we describe our model, and Section 5 compares the different attempts and their results on test data. Conclusions are drawn in the last section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">PAN</head><p>PAN is a series of shared tasks on digital text forensics. Shared tasks are organized evaluations in which participants submit systems for a specific problem of interest. This paper is the result of our participation in the Author Profiling Shared Task of 2017. Author profiling here comprises gender and language variety prediction for the author of a given set of Twitter documents. To solve these problems, training and test datasets are available <ref type="bibr" target="#b14">[16]</ref>.</p><p>PAN 2017 Training Data. The PAN 2017 training data consists of Twitter profiles in four different languages: English, Spanish, Portuguese and Arabic. The corpus was annotated with gender and language variety information about the authors.</p><p>For each of the language varieties, there are 600 Twitter profiles. In each language there is the same number of male and female profiles. The dataset includes exactly 100 tweets for each author.</p><p>Language Variety. Language variety is defined as a specific variation of an author's native language. For instance, one has to identify whether an English author has a language variation from Australia, Canada, Great Britain, Ireland, New Zealand or the United States.</p><p>TIRA. TIRA is an evaluation-as-a-service platform <ref type="bibr" target="#b10">[12]</ref>. The submission for the PAN shared task was done with this tool. The submitted models were self-evaluated on a virtual machine hosted by the organizers. The test data was only available on this virtual machine and was not visible to the participants.</p><p>Evaluation. Submissions at PAN 2017 are evaluated by accuracy. The individual accuracy for gender and variety identification was calculated for each language as follows:</p><formula>accuracy = correct predictions / total predictions.<label>(1)</label></formula><p>The joint accuracy is calculated from the cases where both gender and variety are correctly predicted together. 
The final ranking is calculated from the accuracy averaged over all four languages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Related Work</head><p>In this chapter we provide an overview of the most relevant work for the Author Profiling Task with neural networks.</p><p>Neural Networks. Neural networks have achieved great results in natural language processing in the past few years. In many tasks like machine translation <ref type="bibr" target="#b8">[10]</ref> and sentiment analysis <ref type="bibr" target="#b5">[7]</ref>, neural networks have proven to be very successful. The two state-of-the-art neural network families used today are recurrent neural networks (RNNs) and convolutional neural networks (CNNs). The main challenge in most NLP tasks is to condense the input sequence while keeping the most important information. Research on neural machine translation (NMT) already focuses heavily on this challenge. For that reason we applied techniques from NMT to the Author Profiling Task.</p><p>RNNs and CNNs. The recent success of RNNs is largely achieved through long short-term memory networks (LSTMs) and gated recurrent unit networks (GRUs) <ref type="bibr" target="#b4">[6]</ref>. With their capability to model long-term dependencies, LSTMs and GRUs have achieved state-of-the-art results in various NLP tasks. The work of Bahdanau et al. <ref type="bibr">[1]</ref> proposed an attention mechanism to condense a sequence. In combination with a bidirectional RNN (bi-RNN), this approach learns to automatically weight the most relevant information in the input sequence. This leads to substantial improvements in machine translation and other fields like automatic summarization <ref type="bibr" target="#b2">[4]</ref>. The latest research of Gehring et al. <ref type="bibr" target="#b8">[10]</ref> has shown that CNNs are capable of achieving state-of-the-art results in NMT. Those results were achieved by applying the attention mechanism to CNNs. 
CNNs are computationally less expensive compared to LSTMs and GRUs, which makes them preferable for large datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Methodology</head><p>In this chapter we describe our technical solution. The main focus is on the system architecture of the neural network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Preprocessing</head><p>Every tweet was preprocessed by converting it to lower-case. We replaced URLs and usernames with a standardized token. We converted hashtags to regular words and used the TweetTokenizer from NLTK <ref type="bibr" target="#b0">[2]</ref> to tokenize the tweets. We use a vocabulary to map each token to a token-ID. The IDs point to a vector representation of the token, which is used later. After the preprocessing step we obtain a list of tweets for each author, where each tweet is a list of token-IDs.</p></div>
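These preprocessing steps can be sketched as follows. This is a minimal illustration only: a naive whitespace tokenizer stands in for NLTK's TweetTokenizer, and the vocabulary handling is simplified; all token names are assumptions.

```python
import re

URL_TOKEN, USER_TOKEN = "<url>", "<user>"

def preprocess(tweet, vocab):
    """Lower-case, normalize URLs/usernames, strip hashtags, tokenize, map to IDs."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", URL_TOKEN, text)  # replace URLs with a token
    text = re.sub(r"@\w+", USER_TOKEN, text)         # replace usernames with a token
    text = re.sub(r"#(\w+)", r"\1", text)            # hashtag -> regular word
    # Naive whitespace tokenizer as a stand-in for NLTK's TweetTokenizer.
    tokens = text.split()
    # Map tokens to IDs; unseen tokens are added to the vocabulary.
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]

vocab = {}
ids = preprocess("Check this out http://t.co/abc #CoolStuff @friend", vocab)
print(ids)  # six token-IDs, one per resulting token
```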
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Embeddings</head><p>Each token in a tweet is represented by pretrained word embeddings <ref type="bibr" target="#b6">[8]</ref>. For English and Spanish we used embeddings created with word2vec <ref type="bibr" target="#b9">[11]</ref>. For both languages a corpus of 200 million unlabelled tweets was used. The skip-gram algorithm was used for training with a window size of 5, a sample size of 1e-05, a minimum frequency of 15 and 200 dimensions.</p><p>For Portuguese and Arabic we used pretrained embeddings from <ref type="bibr" target="#b1">[3]</ref>, which were trained on a Wikipedia corpus<ref type="foot" target="#foot_0">1</ref>. They have an output dimension of 300.</p></div>
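The lookup from token-IDs to word vectors can be sketched with a toy embedding matrix. The random vectors below are stand-ins; the real matrix holds the pretrained 200- or 300-dimensional vectors described above, and the zero row for padding is an assumption for illustration.

```python
import numpy as np

d = 200  # embedding dimension for the English/Spanish word2vec vectors
rng = np.random.default_rng(0)
# Toy stand-in for the pretrained embedding matrix: one row per vocabulary entry.
vocab = {"<pad>": 0, "hello": 1, "world": 2}
embeddings = np.zeros((len(vocab), d))
embeddings[1:] = rng.normal(size=(len(vocab) - 1, d))  # row 0 stays zero for padding

def embed(token_ids):
    """Map a list of token-IDs to their stacked word vectors."""
    return embeddings[np.array(token_ids)]

vecs = embed([1, 2])
print(vecs.shape)  # (2, 200)
```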
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Architecture</head><p>In this section we describe our model, which consists of a bi-RNN with GRUs followed by an attention mechanism.</p><p>Embedding Layer. The embedding layer is used to map the token-IDs to their vector representations. Each token-ID is used to look up the word vector in the embeddings. Those vectors are concatenated and passed to the next layer. This results in an output matrix S ∈ R^(d×n), where d stands for the dimension of the word vectors and n for the size of the input. To determine n, we took the tweet with the largest number of tokens from our training dataset and rounded that number up to the next multiple of 10. This resulted in a maximum input size of n = 60. Shorter inputs were padded with zeros to match that size. To reduce the effect of unknown and padded words we used masking <ref type="bibr" target="#b3">[5]</ref>. This way our model only uses known words and skips zero values.</p><p>GRU Layer. This layer consists of two GRUs with u units each. We used one GRU for each direction, which resulted in two matrices R_F ∈ R^(u×n) and R_B ∈ R^(u×n). Finally both matrices were concatenated, resulting in a matrix R ∈ R^(2u×n). For our model we used u = 50.</p><p>Attention Layer. This layer is used to weight the most important parts of the GRU-encoded input and deliver a condensed representation of the input. The output matrix R of the previous layer, the weight matrix W_a ∈ R^(2u×2u) and the bias b ∈ R^(2u) are used to calculate a hidden state h_t:</p><formula xml:id="formula_0">h_t = tanh(W_a R + b). (2)</formula><p>The hidden state h_t and the weight vector W_u ∈ R^(2u) are used to calculate the final attention a for each word:</p><formula>a = softmax(h_t W_u). (3)</formula><p>The attention vector a is then multiplied with R and the results are summed. This yields a condensed representation of the sentence as a vector s ∈ R^(2u).</p><p>Softmax Layer. As the final layer we used a fully connected layer with softmax as the activation function. 
The number of output nodes depended on the number of classes. For gender prediction, 2 nodes were required; for language variety prediction, between 2 and 7 nodes were required, depending on the language.</p><p>Dropout. Dropout drops individual nodes during training with a probability of p and is therefore used to reduce overfitting <ref type="bibr" target="#b13">[15]</ref>. We used dropout on our softmax layer with p = 0.2.</p><p>Optimization. Our model is trained using the AdaDelta optimizer <ref type="bibr" target="#b15">[17]</ref>. We used 10^-5 and default values for the other hyper-parameters.</p><p>Author Prediction. Our model is trained to classify single tweets. To get the classification of an author, his or her tweets are classified separately. The outputs of our model, i.e. the outputs of the softmax layer, are then summed, and the class with the highest value is the final prediction. For example, if we want to predict the gender of a user u who has three tweets t_1, t_2, t_3, we first classify the tweets separately. This could result in the following predictions: t_1 = [0.4, 0.6], t_2 = [0.3, 0.7], t_3 = [1.0, 0.0]. The first number of each output indicates the probability that the tweet was written by a female and the second number the probability that it was written by a male. The outputs of the tweets t_1, t_2, t_3 are summed, which results in [1.7, 1.3]. In this example, user u would be predicted to be female.</p></div>
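The author-level aggregation in the worked example above can be reproduced directly. A minimal sketch, assuming class 0 is female and class 1 is male:

```python
import numpy as np

def predict_author(tweet_probs):
    """Sum per-tweet softmax outputs; the argmax gives the author-level class."""
    totals = np.sum(tweet_probs, axis=0)
    return totals, int(np.argmax(totals))

# Worked example from the text: [female, male] probabilities for three tweets.
probs = np.array([[0.4, 0.6], [0.3, 0.7], [1.0, 0.0]])
totals, cls = predict_author(probs)
print(totals, cls)  # totals ≈ [1.7, 1.3] -> class 0 (female)
```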
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Training</head><p>To train our models for submission we used 90% of the training data; the remaining 10% were used as a validation set. The validation set was used to select a model checkpoint during training. For more details on model checkpoints, see Section 5.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluation</head><p>We distinguish between the evaluation during development and the benchmark measured on actual test data on TIRA. The results during the development phase were achieved on the provided training corpus with cross validation.</p><p>Cross Validation. Our models were trained with 10-fold cross validation. We used cross validation to calculate a representative score for the model. The data in each fold was used as follows: 80% training data, 10% validation data and 10% test data. The evaluation on the test data does not influence the training and is only used to evaluate the model. We used a validation set in combination with model checkpoints to prevent overfitting. Model checkpoints will be explained in the following section.</p><p>F1 Score. During the training phase we used the F1 score to find the best model. The F1 score considers both precision and recall. We used the F1 score because it penalizes one-sided predictions of a model. In the following calculations, the abbreviations tp, fp and fn denote true positives, false positives and false negatives. Precision is the ratio of correctly predicted instances (tp) to all instances classified as this class (tp + fp):</p><formula xml:id="formula_1">precision = tp / (tp + fp).<label>(4)</label></formula><p>Recall is the ratio of correctly classified instances (tp) to the total number of instances in the corresponding class (tp + fn):</p><formula xml:id="formula_2">recall = tp / (tp + fn).<label>(5)</label></formula><p>The harmonic mean of these two scores is called the F1 score. It is calculated as follows:</p><formula xml:id="formula_3">F1 = 2 · precision · recall / (precision + recall).<label>(6)</label></formula></div>
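Equations (4)-(6) translate directly into code; the counts in the example call are made up for illustration.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives and false negatives (Eqs. 4-6)."""
    precision = tp / (tp + fp)   # Eq. (4)
    recall = tp / (tp + fn)      # Eq. (5)
    return 2 * precision * recall / (precision + recall)  # Eq. (6)

# e.g. 80 true positives, 20 false positives, 30 false negatives
print(round(f1_score(80, 20, 30), 4))  # 0.7619
```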
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Model Checkpoints</head><p>The accuracy and F1 score of the model were measured during training. The scores were evaluated on a validation and a test dataset. If the model achieved a higher F1 score on the validation data than any previous one, the model (and its weights) was saved. An example of the measured scores is shown in Figure <ref type="figure" target="#fig_1">2</ref>.</p><p>The goal is to select the best weights for a model during the training phase. Figure <ref type="figure" target="#fig_1">2</ref> shows that our model performs very similarly on validation and test data. That means that by choosing the best weights on the validation set, the chances are high that the model performs equally well on the test set. This makes our model very stable and predictable.</p></div>
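The checkpoint-selection rule described above (keep the weights whenever the validation F1 improves on all previous epochs) can be sketched as follows; the epoch history is invented for illustration.

```python
def select_checkpoint(epoch_scores):
    """Keep the model weights from the epoch with the highest validation F1.

    epoch_scores: list of (val_f1, weights) pairs recorded during training.
    """
    best_f1, best_weights = float("-inf"), None
    for val_f1, weights in epoch_scores:
        if val_f1 > best_f1:  # strictly better than every previous checkpoint
            best_f1, best_weights = val_f1, weights
    return best_f1, best_weights

history = [(0.61, "w1"), (0.68, "w2"), (0.65, "w3"), (0.70, "w4"), (0.69, "w5")]
print(select_checkpoint(history))  # (0.7, 'w4')
```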
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Analysis of the Attention</head><p>While working with the attention mechanism we developed a tool to visualize how the different words in a tweet are weighted. This tool helped us to understand which words are more important for our model. An example for language variety is shown in Figure <ref type="figure" target="#fig_2">3</ref>, where multiple tweets of British and American authors are compared.</p><p>In Figure <ref type="figure" target="#fig_2">3</ref> the attention of the words is highlighted. As we can see, some typical American English and British English words are marked. For example, in the first tweet  </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 .</head><label>1</label><figDesc>Figure 1. Representation of the bi-GRU+Attention model. We used n = 4 and u = 3 for visualization purposes.</figDesc><graphic coords="4,185.28,252.11,244.80,180.66" type="bitmap" /></figure>
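For reference, the attention weights behind this visualization come from Equations (2) and (3) in Section 4.3, which can be sketched in a few lines of numpy. The toy sizes n = 4 and u = 3 match Figure 1; the weight values are random stand-ins, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
u, n = 3, 4                          # GRU units per direction, input length (toy sizes)
R = rng.normal(size=(2 * u, n))      # bi-GRU output, R in R^(2u x n)
W_a = rng.normal(size=(2 * u, 2 * u))
b = rng.normal(size=(2 * u, 1))
W_u = rng.normal(size=(2 * u,))

h = np.tanh(W_a @ R + b)             # Eq. (2): hidden state, 2u x n
scores = W_u @ h                     # one score per input position
a = np.exp(scores) / np.exp(scores).sum()  # Eq. (3): softmax attention weights
s = R @ a                            # weighted sum -> sentence vector in R^(2u)
print(a.shape, s.shape)  # (4,) (6,)
```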
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 .</head><label>2</label><figDesc>Figure 2. Accuracy graphs of the bi-GRU+Attention model during training. Visualized comparison of validation (orange) and test (blue) accuracy scores on author level. The X axis shows the number of epochs and the Y axis the corresponding accuracy value.</figDesc><graphic coords="7,138.48,128.65,338.41,206.67" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 .</head><label>3</label><figDesc>Figure 3. Visualized comparison of attention weights between British and American Twitter users. The left side visualizes the attention of each word in a tweet. The darker the background color, the stronger those words are weighted. On the right side the final prediction and its probability are shown. In these examples, all predictions are correct.</figDesc><graphic coords="7,181.68,413.69,251.98,180.68" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Distribution of data for language variety in the PAN 2017 training corpus</figDesc><table><row><cell>Native Language</cell><cell>Author Profiles</cell><cell>Language Variations</cell></row><row><cell>English</cell><cell>3600</cell><cell>Australia, Canada, Great Britain, Ireland, New Zealand, United States</cell></row><row><cell>Spanish</cell><cell>4200</cell><cell>Argentina, Chile, Colombia, Mexico, Peru, Spain, Venezuela</cell></row><row><cell>Portuguese</cell><cell>1200</cell><cell>Brazil, Portugal</cell></row><row><cell>Arabic</cell><cell>2400</cell><cell>Egypt, Gulf, Levantine, Maghrebi</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://wikipedia.org</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>the word "color" is marked as very important, and in the third tweet the word "Walmart"; both are common words in American English. In the second and fourth tweets, the words "bloody" and "cheeky" are marked as significant; both are common words in British English.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Cross Validation Results</head><p>During our preparation for the PAN shared task, several models were tested and compared. Our baseline was a CNN model <ref type="bibr" target="#b5">[7]</ref> which had already participated in PAN 2016. The model has a 2-layer CNN architecture with a fully connected softmax layer at the end.</p><p>The experiments have shown that the bi-GRU+Attention model has the best performance on both classification tasks (gender, variety). The measured scores of both models are shown in Table <ref type="table">2</ref> and Table <ref type="table">3</ref>.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">PAN 2017 Results</head><p>We trained two distinct models for each language: one for gender and one for variety. These models were uploaded to the virtual machine and evaluated on the actual test dataset. Table <ref type="table">4</ref> shows the results obtained on the PAN 2017 Author Profiling test dataset. The highest score in gender prediction was achieved in English. Portuguese gender prediction follows with 0.075% less accuracy. The gender predictions in Spanish and Arabic are lower than the others. We assume that this issue is related to the lower vocabulary coverage: for both Spanish and Arabic, the vocabulary coverage is below 80%, in contrast to around 90% coverage of the vocabularies in English and Portuguese.</p><p>In general, good scores are achieved for variety prediction. Outstanding is the variety accuracy of 91.43% for the Spanish language, which comprises seven language variations. Only in English and Arabic did the score drop below 80%. The lowest score, 76.88%, is achieved for variety prediction in Arabic, due to low vocabulary coverage.</p><p>The exact vocabulary coverage of the used embeddings is shown in Table <ref type="table">5</ref>. The results in Table <ref type="table">5</ref> seem to imply that the accuracy of gender prediction correlates with vocabulary coverage.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>In this paper, we presented deep learning models to predict the gender and language variety of Twitter profiles. We described a bidirectional RNN with GRUs and an attention mechanism. We compared the average accuracy of our models over all languages with a previously developed CNN model. The RNN exceeds the CNN in gender prediction by 1.45% and in variety prediction by 2.69% on average over four languages on PAN 2017 training data.</p><p>For future work, we would like to see whether a combination of several high-quality solutions for Author Profiling with a random forest could outperform each of the subsystems. This has been done successfully for sentiment analysis <ref type="bibr" target="#b7">[9]</ref>, and it would be interesting to see if it works for Author Profiling as well. </p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Natural Language Processing with Python</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Loper</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>O&apos;Reilly Media</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Enriching Word Vectors with Subword Information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno>CoRR abs/1607.04606</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks</title>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1875" to="1886" />
			<date type="published" when="2015-11">Nov 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Keras</title>
		<author>
			<persName><forename type="first">F</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://github.com/fchollet/keras" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Empirical evaluation of gated recurrent neural networks on sequence modeling</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ç</forename><surname>Gülçehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno>CoRR abs/1412.3555</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Sentiment Analysis using Convolutional Neural Networks with Multi-Task Training and Distant Supervision on Italian Tweets</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deriu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cieliebak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Evaluation of NLP and Speech Tools for Italian</title>
				<imprint>
			<publisher>EVALITA</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deriu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lucchi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">D</forename><surname>Luca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Severyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cieliebak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hofmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jaggi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 26th International Conference on World Wide Web</title>
				<meeting>the 26th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1045" to="1052" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">JOINT_FORCES: Unite Competing Sentiment Classifiers with Random Forest</title>
		<author>
			<persName><forename type="first">O</forename><surname>Dürr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Uzdilli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cieliebak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SemEval 2014-Proceedings of the 8th International Workshop on Semantic Evaluation</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="366" to="369" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Convolutional Sequence to Sequence Learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gehring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Auli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Grangier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yarats</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017-05">May 2017</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Distributed Representations of Words and Phrases and their Compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno>CoRR abs/1310.4546</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Improving the Reproducibility of PAN&apos;s Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gollub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 5th International Conference of the CLEF Initiative (CLEF 14</title>
				<editor>
			<persName><forename type="first">E</forename><surname>Kanoulas</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Lupu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Clough</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Sanderson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Hanbury</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Toms</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2014-09">Sep 2014</date>
			<biblScope unit="page" from="268" to="299" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Overview of PAN&apos;17: Author Identification, Author Profiling, and Author Obfuscation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. 8th International Conference of the CLEF Initiative (CLEF 17)</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Jones</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Lawless</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg New York</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2017-09">Sep 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<title level="m">Working Notes Papers of the CLEF 2017 Evaluation Labs</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Mandl</surname></persName>
		</editor>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Dropout: A Simple Way to Prevent Neural Networks from Overfitting</title>
		<author>
			<persName><forename type="first">N</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1929" to="1958" />
			<date type="published" when="2014-01">Jan 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ljubešić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tiedemann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th Workshop on Building and Using Comparable Corpora (BUCC)</title>
				<meeting>the 7th Workshop on Building and Using Comparable Corpora (BUCC)<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="11" to="15" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">ADADELTA: An Adaptive Learning Rate Method</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Zeiler</surname></persName>
		</author>
		<idno>CoRR abs/1212.5701</idno>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
