<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Twitter Text and Image Gender Classification with a Logistic Regression N-gram Model: Notebook for PAN at CLEF 2018</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Moniek</forename><surname>Nieuwenhuis</surname></persName>
							<email>m.l.nieuwenhuis@student.rug.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jeroen</forename><surname>Wilkens</surname></persName>
							<email>j.r.wilkens@student.rug.nl</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Groningen</orgName>
								<address>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Twitter Text and Image Gender Classification with a Logistic Regression N-gram Model: Notebook for PAN at CLEF 2018</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">EC276FED603A46A9B9AC3D3E2725A61B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present our participation in the PAN 2018 Author Profiling shared task, classifying authors by gender for English, Arabic and Spanish. We participated in all sub-tasks and propose a system for classification with text, images and the combination of the two. Our final submitted system is a Logistic Regression classifier that uses word and character n-grams as textual features and a set of automatically derived image-based features, such as the presence, proportion and number of faces (to detect selfies) as well as the faces' emotions and gender. We experimented with word embeddings, which negatively affected our system's performance. Our cross-validated training results show slight improvements in performance for Arabic and Spanish when image-based features are added to text-based features. Our highest scores on the PAN 2018 test dataset are accuracies of 81.2% for English using only text-based features, 78.7% for Arabic using both text- and image-based features and 80.3% for Spanish using only text-based features. Overall, we finished 6th in the global ranking with an average accuracy for our text and image combination system of 79.6%.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The field of author profiling is about inferring traits of an author, such as gender, age and personality. With the rise of social media platforms such as Twitter and Facebook, author profiling has gained more interest. Profiling an author is desirable from multiple viewpoints: for example, from a security point of view, to detect authors with criminal intentions, and from a marketing point of view, to narrow down target audiences for online advertisements.</p><p>In the past years, multiple shared tasks have been organized on the topic of author profiling <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b15">16]</ref>. In this paper, we describe our approach for the Author Profiling shared task at PAN 2018 <ref type="bibr" target="#b16">[17]</ref>. This year's Author Profiling task is the 6th iteration of the task and differs slightly from previous years, since the gold standard data now includes images. The task is to build a system that classifies a Twitter author's gender from 100 of their tweets and 10 posted images. Though the images are new for this shared task, previous work has already created systems that are capable of detecting gender, emotional expressions and personalities from images <ref type="bibr" target="#b1">[2]</ref>. By combining such image classification systems with textual classification systems, it can be determined whether the addition of images improves the final accuracy.</p><p>In the last two years, the winning systems for 2016 <ref type="bibr" target="#b21">[22]</ref> and 2017 <ref type="bibr" target="#b2">[3]</ref> were both SVM classifiers that made use of word n-grams and character n-grams. 
Although deep learning methods were introduced, such as Recurrent Neural Networks <ref type="bibr" target="#b6">[7]</ref> and Convolutional Neural Networks <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>, they have not been able to beat those systems yet. Therefore, our approach focuses on the successful models of the previous iterations of the shared task: an SVM classifier as in <ref type="bibr" target="#b2">[3]</ref> and a Logistic Regression classifier as used in <ref type="bibr" target="#b9">[10]</ref>. We take these systems as baselines and try to improve them by performing a parameter search, experimenting with word embeddings and adding image-based features. The latter include a feature to indicate selfies, since females tend to post more selfies than males <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b20">21]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Method</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Data</head><p>The PAN 2018 training corpus consists of tweets in three languages: English, Spanish and Arabic. For each author there are 100 tweets and 10 images, and each author is labeled by gender. The gender labels (male and female) are evenly distributed over the training corpus. Table <ref type="table" target="#tab_0">1</ref> shows an overview of the PAN 2018 training corpus released by the organizers. </p></div>
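<div xmlns="http://www.tei-c.org/ns/1.0"><table><row><cell>Language</cell><cell>Tweets</cell><cell>Authors</cell><cell>Images</cell></row><row><cell>Arabic</cell><cell>150,000</cell><cell>1,500</cell><cell>15,000</cell></row><row><cell>English</cell><cell>300,000</cell><cell>3,000</cell><cell>30,000</cell></row><row><cell>Spanish</cell><cell>300,000</cell><cell>3,000</cell><cell>30,000</cell></row></table></div>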
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">N-grams</head><p>The main set of features we used were n-grams. The winners of the previous year's Author Profiling shared task <ref type="bibr" target="#b2">[3]</ref>, as well as <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b18">19]</ref>, showed that word n-grams and character n-grams are very robust features for this task. Another advantage of n-grams is that they are not handcrafted and are thus easy to generate. Moreover, they do not depend on pre-trained word embeddings or on large text corpora for training word embeddings. For all three languages, we experimented with different lengths of word and character n-grams.</p></div>
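<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how such features can be generated, the following minimal sketch counts word 1- and 2-grams and character 3- to 5-grams for a single tweet. The function names and the plain whitespace split are assumptions made for this example only; it is not the feature extraction code of the submitted system.</p><p>
```python
from collections import Counter


def word_ngrams(tokens, n_values=(1, 2)):
    """Count word n-grams for each n in n_values."""
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text, n_values=(3, 4, 5)):
    """Count character n-grams (spaces included) for each n in n_values."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Hypothetical example tweet, split on whitespace for simplicity.
tweet = "good morning twitter"
word_features = word_ngrams(tweet.split())
char_features = char_ngrams(tweet)
print(word_features["good morning"])  # count of one word bigram
print(char_features["goo"])           # count of one character trigram
```
</p><p>In a full system, the two count dictionaries would typically be merged into one sparse feature vector per author before being passed to the classifier.</p></div>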
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Word Embeddings</head><p>Aside from the n-gram features, we experimented with word embeddings. For English, we used pre-trained word embeddings from GloVe <ref type="bibr" target="#b14">[15]</ref>, with vector lengths of 100 and 200 dimensions, that were created from a corpus of 2 billion tweets containing 27 billion tokens. For Spanish, we used pre-trained word embeddings from <ref type="bibr" target="#b7">[8]</ref> with a vector length of 200 dimensions. These embeddings were constructed from a total of 58.7 million Spanish tweets containing 1.1 billion tokens. For Arabic, we trained our own 200-dimensional word embeddings on roughly 70 million recently scraped Arabic tweets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Images</head><p>This year's new addition to the gender classification task is classification by images. Our approach is to utilize existing image feature extraction tools from related research. In our system, we used the software from <ref type="bibr" target="#b1">[2]</ref>.</p><p>In that study, a convolutional neural network (CNN) was implemented to find and classify faces in images by gender and emotion. The CNN model consists of 4 residual depth-wise separable convolutions, where each convolution is followed by a batch normalization operation and a ReLU activation function. The last layer of the model applies global average pooling and a soft-max activation function to produce the prediction. The system achieved an accuracy of 95% on the IMDB gender dataset and 66% on the FER-2013 emotion dataset. That system, including all code and pre-trained models, is available under an open-source license. <ref type="foot" target="#foot_0">1</ref></p><p>The software from <ref type="bibr" target="#b1">[2]</ref> was used in our system without preprocessing the images. The software converts the images of a Twitter user to a set of 13 features. The first two features are about the presence and number of detected faces in the images. Table <ref type="table" target="#tab_1">2</ref> shows the number of users for which a face was detected, per language and gender. We see that there is little to no difference between the genders in this respect.</p><p>Features three and four cover the relative area of the images that is covered by a detected face. These features are intended to capture selfies. Previous research <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b20">21]</ref> studied differences in selfie-related behaviour between males and females. One of the findings from those studies is that females tend to take and post more selfies on social media. 
A large face area in an image with only one detected face could indicate that the image is a selfie, and therefore these features could be helpful in the classification task.</p><p>Features five and six are about the gender of the detected faces. The values of these two features are floats ranging from 0 to 1, indicating the proportion of each gender. Table <ref type="table" target="#tab_2">3</ref> shows, per gender of the user, the probability that a face in a posted image is male or female. We see that for all languages, males post more images of male faces than of female faces; the difference is especially large for Arabic. English and Spanish females post slightly more images with female faces than with male faces. Arabic is the exception: female users post more male faces than female faces, but still to a lesser extent than their male counterparts. Lastly, when one of the seven emotions from <ref type="bibr" target="#b1">[2]</ref> could be detected, which the software was not always capable of, we stored the proportions of these emotions in seven float values ranging from 0 to 1. Table <ref type="table" target="#tab_3">4</ref> shows an overview of the emotions for each gender and language. The table shows that, generally, there are small to no differences between males and females regarding emotions. The only conclusion that holds for all languages is that females tend to post more images of happy people. For English, males post more images of angry people, but for Arabic the opposite holds. Also, no face is ever classified as surprised, which raises the question of whether the system of <ref type="bibr" target="#b1">[2]</ref> can accurately detect this emotion. Overall, we expect that these features will not be (very) beneficial for our system.</p></div>
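<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the aggregation step concrete, the sketch below turns per-image face detections into user-level features for the first six features described above. The input layout (one list of face dictionaries per image, with the keys "area" and "gender"), the function name and the exact averaging choices are assumptions for illustration; the output format of the software from [2] and the authors' aggregation code may differ. The seven emotion proportions are omitted for brevity and could be computed in the same way as the gender proportions.</p><p>
```python
def aggregate_image_features(images):
    """Summarise per-image face detections into user-level features.

    images: one list per image; each face is a dict with the hypothetical
    keys "area" (fraction of the image, 0 to 1) and "gender".
    Assumes the user has at least one image.
    """
    n_images = len(images)
    faces = [face for image in images for face in image]
    n_faces = len(faces)
    features = {
        "avg_faces": n_faces / n_images,
        "images_with_face": sum(1 for image in images if image),
        # Assumption: areas are averaged over all images of the user.
        "avg_face_area": sum(f["area"] for f in faces) / n_images,
        "avg_largest_face_area": sum(
            max(f["area"] for f in image) for image in images if image
        ) / n_images,
    }
    for gender in ("male", "female"):
        same = sum(1 for f in faces if f["gender"] == gender)
        # Proportion of detected faces with this gender (0.0 if no faces).
        features["prop_" + gender] = same / n_faces if n_faces else 0.0
    return features


# Hypothetical detections for a user with four images.
example_images = [
    [{"area": 0.5, "gender": "female"}],
    [],
    [{"area": 0.125, "gender": "male"}, {"area": 0.25, "gender": "female"}],
    [],
]
example_features = aggregate_image_features(example_images)
print(example_features)
```
</p><p>In this toy input, the first image has a single large face, the pattern that the area features are meant to pick up as a possible selfie.</p></div>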
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Models</head><p>To arrive at our best system, we experimented with different classifiers. We used the Python package scikit-learn (Sklearn) <ref type="bibr" target="#b13">[14]</ref> to implement the LinearSVC classifier as used in <ref type="bibr" target="#b2">[3]</ref> and the Logistic Regression classifier with the parameters C = 1e2 and fit_intercept = False as used in <ref type="bibr" target="#b9">[10]</ref>. We also tried a K-Nearest Neighbour classifier. The results of all tested classification models can be found in Table <ref type="table" target="#tab_4">5</ref>. For every model, we measured performance by accuracy in a 10-fold cross-validation setup. All models use the n-gram features as used in <ref type="bibr" target="#b2">[3]</ref>. We found that the Logistic Regression classifier gave the best performance, so we used this classifier for our subsequent experiments. For the Logistic Regression model we performed a parameter search, mainly to find the optimal lengths of word and character n-grams. Our baseline n-gram model was the one used in <ref type="bibr" target="#b2">[3]</ref>, which uses word 1- and 2-grams and character 3- to 5-grams. We tested different n-gram settings but were unable to outperform the settings from <ref type="bibr" target="#b2">[3]</ref>. Table <ref type="table" target="#tab_5">6</ref> shows the results for the best settings, as well as the best results found for word and character n-grams separately. </p></div>
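<div xmlns="http://www.tei-c.org/ns/1.0"><p>The evaluation protocol, accuracy averaged over 10 folds, can be sketched in plain Python as follows. This is only an illustration of the procedure: the interleaved fold assignment, the function names and the majority-class stand-in classifier are assumptions, and the experiments reported here presumably relied on the cross-validation utilities of the scikit-learn package instead.</p><p>
```python
from collections import Counter


def k_fold_accuracy(items, labels, train_fn, predict_fn, k=10):
    """Mean accuracy over k folds; fold i holds items i, i + k, i + 2k, ...

    Assumes there are at least k items, so that no fold is empty.
    """
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(items)) if i % k == fold]
        train_idx = [i for i in range(len(items)) if i % k != fold]
        model = train_fn([items[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx
                      if predict_fn(model, items[i]) == labels[i])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def train_majority(train_items, train_labels):
    """Toy stand-in for a classifier: remember the most frequent label."""
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, item):
    """Always predict the label stored by train_majority."""
    return model


# Hypothetical toy data: 12 female and 8 male authors.
toy_items = list(range(20))
toy_labels = ["female"] * 12 + ["male"] * 8
toy_accuracy = k_fold_accuracy(toy_items, toy_labels,
                               train_majority, predict_majority)
print(round(toy_accuracy, 3))
```
</p><p>Any classifier can be plugged in through the two callbacks, which keeps the comparison between models on identical folds.</p></div>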
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6">Text Preprocessing</head><p>To preprocess the text data, we lowercased all tweets and subsequently tokenized them with the NLTK Tweet Tokenizer. <ref type="foot" target="#foot_1">2</ref> We also replaced every username with @username and every URL with URL. Table <ref type="table" target="#tab_6">7</ref> shows that our preprocessing methods do indeed improve performance for each language. Generalizing over URLs was especially beneficial. </p></div>
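<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preprocessing steps can be sketched with the standard library alone. The regular expressions and the function name below are assumptions chosen for this example, and a plain whitespace split stands in for the NLTK tweet tokenizer, so the token boundaries will differ from those of the actual system.</p><p>
```python
import re

# Hypothetical patterns; the authors' exact rules are not specified.
URL_PATTERN = re.compile(r"https?://\S+")
USER_PATTERN = re.compile(r"@\w+")


def preprocess(tweet):
    """Lowercase a tweet, mask URLs and usernames, and split on whitespace."""
    text = tweet.lower()
    text = URL_PATTERN.sub("URL", text)
    text = USER_PATTERN.sub("@username", text)
    # Whitespace splitting is a stand-in for the NLTK tweet tokenizer.
    return text.split()


print(preprocess("Thanks @JohnDoe! See https://t.co/abc123 NOW"))
```
</p><p>Masking is applied after lowercasing so that the URL placeholder keeps its capitals and stays distinct from ordinary lowercased tokens.</p></div>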
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results</head><p>In this section we will report the results of our systems on the training corpus (10-fold CV) and final test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Training Results</head><p>The results in Table <ref type="table" target="#tab_7">8</ref> show that using only n-grams as features gives a good baseline result that is in line with the findings in <ref type="bibr" target="#b2">[3]</ref>. The model that only uses embeddings performs worse than the model that uses n-grams. Moreover, the model that combines embeddings and n-grams also performs worse.</p><p>The image-only model performed poorly, with accuracies around 60%. However, using these features in combination with the n-gram features gave us slight performance improvements for Arabic and Spanish. Although we found an increase in accuracy for Arabic, approximate randomization testing <ref type="bibr" target="#b10">[11]</ref> <ref type="foot" target="#foot_2">3</ref> showed that this improvement is not significant.</p></div>
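<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with approximate randomization, the following minimal sketch estimates a two-sided p-value for the accuracy difference between two systems evaluated on the same items. It is an independent illustration of the general technique, not the script referenced in the footnote; the function name, the number of trials and the add-one smoothing are assumptions.</p><p>
```python
import random


def approximate_randomization(correct_a, correct_b, trials=10000, seed=0):
    """Two-sided p-value for the accuracy difference of two paired systems.

    correct_a and correct_b hold 1 (correct) or 0 (wrong) per test item.
    """
    rng = random.Random(seed)
    # Difference in number of correct items; same ordering as accuracy.
    observed = abs(sum(correct_a) - sum(correct_b))
    at_least_as_large = 0
    for _ in range(trials):
        total_a = 0
        total_b = 0
        for a, b in zip(correct_a, correct_b):
            # Swap the two outputs for this item with probability 0.5.
            if rng.random() >= 0.5:
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) >= observed:
            at_least_as_large += 1
    # Add-one smoothing keeps the estimate away from exactly zero.
    return (at_least_as_large + 1) / (trials + 1)


# Hypothetical outputs: two systems that agree on every item.
same = [1, 0, 1, 1, 0, 1]
print(approximate_randomization(same, same, trials=100))
```
</p><p>A large p-value means that randomly swapping the two systems' outputs often produces a difference at least as large as the observed one, so the observed improvement cannot be called significant.</p></div>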
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Official Results</head><p>We submitted three final systems: one for classification based on text only, one for images only and one for the combination of text and images. For all three final systems, we used a Logistic Regression classifier. Table <ref type="table" target="#tab_8">9</ref> shows our official results on the PAN 2018 test set. We see that our performance for English is a bit lower than we expected, but for Spanish and Arabic we obtain a better performance than on the training set. Averaged over all languages, we scored an accuracy of 0.799 with the text-only submission. Adding the image features does not influence the score much, but since the average score decreases, it is questionable whether our (simplistic) approach of processing the images is helpful for this task.</p><p>For the combination system with both text and image features we scored an average accuracy of 0.7963 and finished 6th in the global ranking. In general, we can say that our Logistic Regression model does quite well, making it a robust, straightforward and reliable method for gender classification. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion and Future Work</head><p>In this paper, we presented our approach for the PAN 2018 author profiling shared task for predicting an author's gender using text-based and image-based features. We submitted a Logistic Regression classifier using word and character n-grams as text-based features and several automatically extracted image features. We found that using only text-based n-gram features gave us the best results for English and Spanish, whereas the combination of text-based and image-based features gave us the best results for Arabic.</p><p>As additional text-based features we tested word embeddings, but results on the training data show that these hurt our system's performance.</p><p>For this shared task we experimented with using images to predict an author's gender. We used an image feature extraction tool to classify detected faces in images by gender and emotion. We also tried to construct a feature that could indicate selfies, as females tend to post more selfies than males. Our results showed that using only such image-based features performs poorly, with accuracy scores around 60% with a Logistic Regression classifier. Adding these features to a text-based n-gram model does not influence the score much. On the PAN 2018 test dataset, the images decreased the scores slightly for English and Spanish, but gave us a small improvement for Arabic. Our submitted system only used image-based features extracted from detected faces, but the data showed that not all images include a face. Therefore, for future research we suggest a system that enlarges the set of image-based features.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><figDesc>1. Average number of faces 2. Number of images that include a face 3. Average area the faces take up 4. Average area the largest face takes up 5. Average number of men 6. Average number of women 7. 
Percentage of faces being angry 8. Percentage of faces being disgusted 9. Percentage of faces being fearful 10. Percentage of faces being happy 11. Percentage of faces being sad 12. Percentage of faces being surprised 13. Percentage of faces being neutral</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>PAN 2018 dataset overview.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Number of users for which a face is detected.</figDesc><table><row><cell></cell><cell>Male</cell><cell>Female</cell></row><row><cell>English</cell><cell>1,401</cell><cell>1,396</cell></row><row><cell>Arabic</cell><cell>683</cell><cell>651</cell></row><row><cell>Spanish</cell><cell>1,430</cell><cell>1,394</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>The probability that a face in an image is male or female, per gender of the user.</figDesc><table><row><cell></cell><cell cols="2">Male</cell><cell cols="2">Female</cell></row><row><cell></cell><cell>Male faces</cell><cell>Female faces</cell><cell>Male faces</cell><cell>Female faces</cell></row><row><cell>English</cell><cell>0.598</cell><cell>0.402</cell><cell>0.441</cell><cell>0.559</cell></row><row><cell>Arabic</cell><cell>0.740</cell><cell>0.260</cell><cell>0.564</cell><cell>0.436</cell></row><row><cell>Spanish</cell><cell>0.622</cell><cell>0.378</cell><cell>0.496</cell><cell>0.504</cell></row></table></figure>
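<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Emotion probabilities of detected faces per language and gender.</figDesc><table><row><cell></cell><cell></cell><cell>Angry</cell><cell>Disgust</cell><cell>Fear</cell><cell>Happy</cell><cell>Sad</cell><cell>Surprise</cell><cell>Neutral</cell></row><row><cell>English</cell><cell>Male</cell><cell>0.061</cell><cell>0.001</cell><cell>0.030</cell><cell>0.280</cell><cell>0.098</cell><cell>0.000</cell><cell>0.167</cell></row><row><cell></cell><cell>Female</cell><cell>0.049</cell><cell>0.001</cell><cell>0.036</cell><cell>0.330</cell><cell>0.107</cell><cell>0.000</cell><cell>0.148</cell></row><row><cell>Arabic</cell><cell>Male</cell><cell>0.068</cell><cell>0.002</cell><cell>0.034</cell><cell>0.212</cell><cell>0.119</cell><cell>0.000</cell><cell>0.155</cell></row><row><cell></cell><cell>Female</cell><cell>0.075</cell><cell>0.001</cell><cell>0.028</cell><cell>0.255</cell><cell>0.152</cell><cell>0.000</cell><cell>0.206</cell></row><row><cell>Spanish</cell><cell>Male</cell><cell>0.058</cell><cell>0.001</cell><cell>0.026</cell><cell>0.217</cell><cell>0.110</cell><cell>0.000</cell><cell>0.179</cell></row><row><cell></cell><cell>Female</cell><cell>0.053</cell><cell>0.004</cell><cell>0.027</cell><cell>0.260</cell><cell>0.101</cell><cell>0.000</cell><cell>0.174</cell></row></table></figure>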
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Results on text of different models on 10-fold CV.</figDesc><table><row><cell>System</cell><cell>En</cell><cell>Ar</cell><cell>Es</cell></row><row><cell>SVM</cell><cell>0.826</cell><cell>0.772</cell><cell>0.773</cell></row><row><cell>Logistic regression</cell><cell>0.831</cell><cell>0.779</cell><cell>0.776</cell></row><row><cell>K-Nearest Neighbour</cell><cell>0.647</cell><cell>0.622</cell><cell>0.597</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Results of different n-gram combinations for the Logistic Regression model (10-fold CV).</figDesc><table><row><cell>N-grams</cell><cell>En</cell><cell>Ar</cell><cell>Es</cell></row><row><cell>word n-grams (n=1,2) + char n-grams (n=3,4,5)</cell><cell>0.831</cell><cell>0.779</cell><cell>0.776</cell></row><row><cell>bag of words</cell><cell>0.811</cell><cell>0.769</cell><cell>0.767</cell></row><row><cell>word n-grams (n=1,2)</cell><cell>0.804</cell><cell>0.757</cell><cell>0.756</cell></row><row><cell>char n-grams (n=3,4,5)</cell><cell>0.814</cell><cell>0.789</cell><cell>0.776</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7 .</head><label>7</label><figDesc>Accuracies after cumulatively adding the different preprocessing methods, using 10-fold CV.</figDesc><table><row><cell>Preprocessing</cell><cell>En</cell><cell>Ar</cell><cell>Es</cell></row><row><cell>Baseline</cell><cell>0.816</cell><cell>0.754</cell><cell>0.759</cell></row><row><cell>+ Tokenization</cell><cell>0.818</cell><cell>0.764</cell><cell>0.760</cell></row><row><cell>+ Lowercasing tweets</cell><cell>0.818</cell><cell>0.764</cell><cell>0.760</cell></row><row><cell>+ URL to URL</cell><cell>0.827</cell><cell>0.774</cell><cell>0.767</cell></row><row><cell>+ Usernames to @username</cell><cell>0.831</cell><cell>0.779</cell><cell>0.776</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table 8 .</head><label>8</label><figDesc>Accuracy per feature set with the Logistic Regression model, averaged over 10-fold CV.</figDesc><table><row><cell>Features</cell><cell>En</cell><cell>Ar</cell><cell>Es</cell></row><row><cell>n-grams</cell><cell>0.831</cell><cell>0.779</cell><cell>0.776</cell></row><row><cell>embeddings</cell><cell>0.786</cell><cell>0.725</cell><cell>0.759</cell></row><row><cell>images</cell><cell>0.604</cell><cell>0.618</cell><cell>0.592</cell></row><row><cell>n-grams + embeddings</cell><cell>0.775</cell><cell>0.728</cell><cell>0.755</cell></row><row><cell>n-grams + images</cell><cell>0.823</cell><cell>0.792</cell><cell>0.778</cell></row><row><cell>n-grams + embeddings + images</cell><cell>0.815</cell><cell>0.715</cell><cell>0.741</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 9 .</head><label>9</label><figDesc>Results on PAN 2018 test set.</figDesc><table><row><cell>Language</cell><cell>Text</cell><cell>Image</cell><cell>Combination</cell></row><row><cell>English</cell><cell>0.812</cell><cell>0.610</cell><cell>0.810</cell></row><row><cell>Arabic</cell><cell>0.783</cell><cell>0.623</cell><cell>0.787</cell></row><row><cell>Spanish</cell><cell>0.803</cell><cell>0.587</cell><cell>0.792</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/oarriaga/face_classification</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://www.nltk.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">We used the script by Vincent Van Asch: https://www.clips.uantwerpen.be/scripts/art</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Arabic tweeps gender and dialect prediction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Alrifai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rebdawi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ghneim</surname></persName>
		</author>
		<imprint>
			<biblScope unit="volume">13</biblScope>
		</imprint>
	</monogr>
	<note>Cappellato et al</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Real-time convolutional neural networks for emotion and gender classification</title>
		<author>
			<persName><forename type="first">O</forename><surname>Arriaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Valdenegro-Toro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Plöger</surname></persName>
		</author>
		<idno>CoRR abs/1710.07557</idno>
		<ptr target="http://arxiv.org/abs/1710.07557" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">N-gram: New groningen author-profiling model</title>
		<author>
			<persName><forename type="first">A</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dwyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Medvedeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rawee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Haagsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference and Labs of the Evaluation Forum (CLEF 2017): Information Access Evaluation meets Multilinguality, Multimodality, and Visualization</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="14" to="19" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Including dialects and language varieties in author profiling</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Ciobanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zampieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Malmasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1707.00621</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Do age and gender differences exist in selfie-related behaviours?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dhir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pallesen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Torsheim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Andreassen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computers in Human Behavior</title>
		<imprint>
			<biblScope unit="volume">63</biblScope>
			<biblScope unit="page" from="549" to="555" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">INSA Lyon and UNI Passau&apos;s participation at PAN@CLEF&apos;17: Author profiling task</title>
		<author>
			<persName><forename type="first">G</forename><surname>Kheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Laporte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Granitzer</surname></persName>
		</author>
		<imprint>
			<biblScope unit="volume">13</biblScope>
		</imprint>
	</monogr>
	<note>Cappellato et al</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Author profiling with bidirectional rnns using attention with grus: notebook for pan at clef 2017</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kodiyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hardegger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Neuhaus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cieliebak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Evaluation Labs and Workshop-Working Notes Papers</title>
				<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-09-14">11-14 September 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Ug18 at semeval-2018 task 1: Generating additional training data for predicting emotion intensity in spanish</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kuijper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lenthe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Noord</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="279" to="285" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Language-and subtask-dependent feature selection and classifier parameter tuning for author profiling</title>
		<author>
			<persName><forename type="first">I</forename><surname>Markov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gómez-Adorno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sidorov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Author profiling -gender and language variety prediction</title>
		<author>
			<persName><forename type="first">M</forename><surname>Martinc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Skrjanec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zupan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pollak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>Pan</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Computer-intensive methods for testing hypotheses</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">W</forename><surname>Noreen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1989">1989</date>
			<publisher>Wiley</publisher>
			<pubPlace>New York</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Language variety and gender classification for author profiling in PAN</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ogaltsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Romanov</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">13</biblScope>
		</imprint>
	</monogr>
	<note>Cappellato et al</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Using character n-grams and style features for gender and language variety classification</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Oliveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">F</forename><surname>De Oliveira Neto</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">GloVe: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<ptr target="http://www.aclweb.org/anthology/D14-1162" />
	</analytic>
	<monogr>
		<title level="m">Empirical Methods in Natural Language Processing (EMNLP)</title>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Overview of PAN&apos;17 - author identification, author profiling, and author obfuscation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M R</forename><surname>Pardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>CLEF</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2018 Evaluation Labs</title>
		<title level="s">CEUR Workshop Proceedings, CLEF and CEUR-WS</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">Y</forename><surname>Nie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2018-09">Sep 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Verhoeven</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Daelemans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings/Balog</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">UniNE at CLEF 2017: TF-IDF and deep-learning for author profiling</title>
		<author>
			<persName><forename type="first">N</forename><surname>Schaetti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cappellato et al</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Convolutional neural networks for author profiling</title>
		<author>
			<persName><forename type="first">S</forename><surname>Sierra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Solorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>González</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Selfie posting behaviors are associated with narcissism among men</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sorokowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sorokowska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oleszkiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Frackowiak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Huk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Pisanski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Personality and Individual Differences</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="page" from="123" to="127" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Op Vollenbroek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Carlotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kreutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Medvedeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pool</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bjerva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Haagsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nissim</surname></persName>
		</author>
		<title level="m">GronUP: Groningen user profiling</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
