<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Authorship Profiling Without Using Topical Information: Notebook for PAN at CLEF 2018</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jussi</forename><surname>Karlgren</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Theoretical Computer Science</orgName>
								<orgName type="institution">KTH</orgName>
								<address>
									<settlement>Stockholm</settlement>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<settlement>Gavagai, Stockholm</settlement>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Lewis</forename><surname>Esposito</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Department of Linguistics</orgName>
								<orgName type="institution">Stanford</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chantal</forename><surname>Gratton</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Department of Linguistics</orgName>
								<orgName type="institution">Stanford</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pentti</forename><surname>Kanerva</surname></persName>
							<affiliation key="aff3">
								<orgName type="department">Redwood Center for Theoretical Neuroscience</orgName>
								<orgName type="institution">UC Berkeley</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Authorship Profiling Without Using Topical Information: Notebook for PAN at CLEF 2018</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F1AE14A2F23C7623E8FC3359E050ED0E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes an experiment made for the PAN 2018 shared task on author profiling. The task is to distinguish female from male authors of microblog posts published on Twitter using no extraneous information except what is in the posts; this experiment focusses on using non-topical information from the posts, rather than gender differences in referential content.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>unsupported by quantitative data. Indeed, some work has found that there are no differences in the amount of speech men and women produce, such as the study by Mehl and colleagues, which equipped male and female university students in the US and Mexico with microphones that recorded them at random intervals over several days <ref type="bibr" target="#b16">[17]</ref>. Other work has found that men actually speak more than women, particularly in formal and task-oriented activities <ref type="bibr" target="#b8">[9]</ref>, and even young boys outstrip their female classmates, speaking three times as much and calling out answers eight times more often <ref type="bibr" target="#b21">[22]</ref>.</p><p>Another common ideology about language differences among women and men is that women use more hedges than men. This idea generally arises from the folk ideology that women tend to be less sure of themselves. But just as the quantitative evidence described above doesn't support the "talkative women" ideology, work on hedges has found that men and women use hedges at comparable rates <ref type="bibr" target="#b18">[19]</ref>. Similarly, the notion that women also use other linguistic features that signal low confidence, like creaky voice and innovative like, isn't supported by the data either. Men and women have been shown to use creaky voice at roughly equal rates <ref type="bibr" target="#b0">[1]</ref>, and the same holds for different discourse functions of like <ref type="bibr" target="#b4">[5]</ref>.</p><p>But none of this is to say that men and women don't participate in linguistic practices in unique ways. The ideologies described above are exactly that: ideologies, rooted in bias and lacking quantitative reality. Quantitative sociolinguists nonetheless consistently find broad gender patterns in the use of linguistic features. 
Women, more often than not, drive vocalic sound change <ref type="bibr" target="#b12">[13]</ref>, leading men in the use of incoming variants. Searching for biological or essentialist motivations is an untenable approach, as male-led sound changes have indeed been documented (e.g., [18]), ruling out the potential for sex-based effects on linguistic production. For this reason, Eckert urges us to consider the kinds of social milieu that men and women occupy in society <ref type="bibr" target="#b6">[7]</ref>. As men have historically enjoyed greater power than women in all domains of public and private life, and given that women have been deprived of social and political capital, women may have greater motivation to make use of various kinds of symbolic capital. It should thus not be surprising that women, in the aggregate, are more advanced than men in innovative phonological changes that, at their inception, are believed by many sociolinguists to be imbued with socio-symbolic meanings <ref type="bibr" target="#b7">[8]</ref>. Beyond components of sound change, there are no doubt other linguistic features that men and women employ variably, but the perhaps more interesting question for scholars is why these differences exist, and for whom they do not exist.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Features and Variables of Interest</head><p>In the present data set, the gender of authors can be expected to be distinguishable with a precision of around 80% using largely lexical cues <ref type="bibr" target="#b20">[21]</ref>. Lexical variation is highly determined by topic, and essentially many of the results can be reduced to the observation that female and male authors write about different things: many discourse topics are strongly gendered.</p><p>If the task is to distinguish female and male authors in this specific data set or very similar ones from more or less the same time period, a well-trained topical detector will be useful. If the task is to detect what differences may be systematic between genders across topics and over time, topic will be less reliable as a gender marker. Our experiments start from the assumption that topic is a confounding and non-sustainable variable for the general case. We also wish to point out that for many downstream tasks, the distinction between male and female author may be less useful than other stable characteristics, and that as in many classification tasks, assuming that the number of classes is fixed a priori may lower both the reliability and the usefulness of the classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Linguistic Processing</head><p>We process the linguistic data in a vector space model which incorporates lexical linguistic items together with constructional linguistic items in a unified computational framework.</p><p>Vector Space Models for Meaning. Vector space models are frequently used in information access, both for research experiments and as a building block for systems in practical use at least since the early 1970s <ref type="bibr" target="#b22">[23,</ref><ref type="bibr" target="#b5">6]</ref>. Vector space models have attractive qualities: processing vector spaces can be done in a manageable implementational framework, they are mathematically well-defined and understood, and they are intuitively appealing, conforming to everyday metaphors such as "near in meaning" <ref type="bibr" target="#b23">[24]</ref>. The vector space model for meaning is the basis for almost all information retrieval experimentation and implementation, most machine learning experiments, and is now the standard approach in most categorisation schemes, topic models, deep learning models, and other similar approaches. In this experiment we encode each post of each author into a vector, and use those vectors to represent the author's profile.</p><p>Construction Grammar. The construction grammar framework is characterised by the central claim that linguistic information is encoded similarly or even identically in lexical items (the words) and in their configurations (the syntax), both being linguistic items with equal salience and presence in the linguistic signal. The parsimonious character of construction grammar in its most radical formulations (e.g., [4]) is attractive as a framework for integrating a dynamic and learning view of language use with formal expression of language structure: it allows the representation of words together with constructions in a common framework. 
For our purposes construction grammar gives a theoretical foundation to a consolidated representation of both individual items in utterances and their configuration. In this experiment, after dependency analysis of each sentence of each post, features of potential interest in each sentence are extracted to represent the sentence together with some of its lexical items.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Technical Description</head><p>To represent authors by features of their posts as vectors, we use a high-dimensional model based on random indexing <ref type="bibr" target="#b9">[10]</ref>. The idea is to compute with high-dimensional vectors <ref type="bibr" target="#b10">[11]</ref> using operations that do not modify vector dimensionality during the course of operation and use. We use 2,000-dimensional vectors in these demonstrations and experiments. Information encoded into a vector is distributed over all vector elements. Computing begins by assigning random seed vectors or index vectors for basic objects.</p><p>In working with text, each observed word and each observed construction of interest in the collection can be represented by an index vector consisting of 0s, 1s and −1s. These can easily be generated on the fly if new lexical or constructional items appear during processing. Index vectors remain unchanged throughout computations. Typically, index vectors are sparse, and in our model have 10 non-zero elements with an equal number of 1s and −1s. Each item is also given a context vector, where observations of cooccurring items are recorded through vector addition and, if necessary, vector permutation, which reorders (scrambles) vector coordinates. These operations are inexpensive computationally and allow for a very large feature space within a bounded memory footprint. As in most similar models, vector similarity is measured by cosine between the vectors, with values between −1 and 1 <ref type="bibr" target="#b11">[12]</ref>.</p></div>
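<div xmlns="http://www.tei-c.org/ns/1.0"><p>The operations described above can be illustrated with a minimal sketch. It assumes plain Python lists as vectors and a fixed random seed; the helper names (index_vector, add_into, encode, cosine) and the toy posts are hypothetical and are not taken from the implementation used in the experiment. Only the dimensionality of 2,000 and the 10 non-zero elements per index vector come from the description above.</p>
<p>
```python
import math
import random

DIM = 2000      # vector dimensionality stated in the text
NONZERO = 10    # non-zero elements per index vector, half +1 and half -1


def index_vector(rng):
    """Return a sparse ternary index vector with five 1s and five -1s."""
    vec = [0] * DIM
    positions = rng.sample(range(DIM), NONZERO)
    for k, pos in enumerate(positions):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def add_into(target, source, weight=1.0):
    """Record an observation by adding a weighted vector into target."""
    for i, value in enumerate(source):
        if value != 0:
            target[i] += weight * value


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


rng = random.Random(0)   # fixed seed, an assumption made for reproducibility
index = {}               # maps each item to its index vector


def lookup(item):
    """Fetch the index vector of an item, generating it on first use."""
    if item not in index:
        index[item] = index_vector(rng)
    return index[item]


def encode(items):
    """Sum the index vectors of the items into one representation."""
    vec = [0.0] * DIM
    for item in items:
        add_into(vec, lookup(item))
    return vec


# Toy posts, invented for illustration only.
post_a = encode(["happy", "birthday", "love"])
post_b = encode(["happy", "birthday", "game"])
post_c = encode(["win", "wrong", "sure"])
print(cosine(post_a, post_b), cosine(post_a, post_c))
```
</p><p>Posts that share items end up with a high cosine, while posts without shared items stay close to zero, since sparse random index vectors rarely overlap in a 2,000-dimensional space.</p></div>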
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Representation of Posts</head><p>The posts were segmented into sentences and word tokens using NLTK <ref type="bibr" target="#b1">[2]</ref>, and each token tagged by Penn Treebank lexical category using CoreNLP <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b14">15]</ref>. The sentences were further analysed for syntactic dependencies, again using CoreNLP.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Full Text Baseline</head><p>As a baseline, all words of each post are included in the representation. Each word was assigned a random index vector and added into the representation weighted by logarithmic frequency weighting to damp the relative effect of highly frequent words and increase the weight of infrequent ones. This weighting scheme was not optimised especially for this material.</p><p>A quick glance through the lexical data will show that some words are typically used more often by female than by male authors. The numbers in Table <ref type="table" target="#tab_0">1</ref> are taken directly from the vector space model. The proportion of female and male authors in the 100 authors closest to each word in the vector space is given, along with the frequency of the word in the entire training collection. Some terms (game, win, birthday) can fairly be called topical. Others reflect more stylistic or attitudinal usage (happy, love, wrong, sure). Terms such as stuff, while referential, simultaneously reveal volumes about the author's attitude to the topic under treatment. How to establish that cline of referentiality or topicality vs attitude is a research challenge which partially could be addressed using measures from search technology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">POS sequences</head><p>Each sentence was represented as a sequence of Penn Treebank POS labels. These labels are not always well chosen, but no correction of the output of the NLTK tagger was done. Subsequences of length three were extracted for each sentence.</p><p>One random permutation Π was generated for each POS label. One random vector pos was generated for encoding all POS labels. Each triple was represented by taking the POS vector and passing it through the POS permutations for the POS labels of the triple. All resulting triple vectors were then added into the post representation. This representation preserves the sequence of POS labels without conflating them for each position in a triple. For example, the sequence DT, JJ, NN will be encoded as</p><formula xml:id="formula_0">S(\mathrm{DT}, \mathrm{JJ}, \mathrm{NN}) = \Pi_{\mathrm{NN}}(\Pi_{\mathrm{JJ}}(\Pi_{\mathrm{DT}}(\mathit{pos})))<label>(1)</label></formula></div>
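<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the triple encoding, the following sketch applies one random permutation per POS label to a single seed vector, in the order of the labels in the triple, and sums the resulting triple vectors. It is a sketch under stated assumptions: permutations are stored as Python lists of coordinate positions, the seed is fixed, and all function names are hypothetical. The tag sequence is the one given for the example sentence in this paper.</p>
<p>
```python
import random

DIM = 2000
rng = random.Random(0)  # fixed seed, an assumption made for reproducibility


def random_permutation():
    """Return a random reordering of the coordinate positions."""
    perm = list(range(DIM))
    rng.shuffle(perm)
    return perm


def permute(vec, perm):
    """Apply a permutation: coordinate i of the result is vec[perm[i]]."""
    return [vec[j] for j in perm]


def sparse_vector():
    """Sparse ternary vector with five 1s and five -1s."""
    vec = [0] * DIM
    for k, pos in enumerate(rng.sample(range(DIM), 10)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


POS_SEED = sparse_vector()   # the single vector called pos in the text
PERMUTATIONS = {}            # one permutation per POS label


def permutation_for(label):
    """Fetch the permutation of a label, generating it on first use."""
    if label not in PERMUTATIONS:
        PERMUTATIONS[label] = random_permutation()
    return PERMUTATIONS[label]


def encode_triple(labels):
    """Pass the seed vector through the permutations in sequence order."""
    vec = POS_SEED
    for label in labels:
        vec = permute(vec, permutation_for(label))
    return vec


def encode_sentence(tags):
    """Sum the vectors of all subsequences of length three."""
    total = [0] * DIM
    for start in range(len(tags) - 2):
        triple = encode_triple(tags[start:start + 3])
        total = [a + b for a, b in zip(total, triple)]
    return total


tags = ["NN", "VBP", "DT", "NN", "NN", "NN", "PRP", "MD", "VB",
        "IN", "DT", "JJ", "NN", "."]
sentence_vector = encode_sentence(tags)
```
</p><p>Because the permutations are applied in sequence, encode_triple(["DT", "JJ", "NN"]) and encode_triple(["NN", "JJ", "DT"]) yield different vectors, which is the order-preserving property the encoding relies on.</p></div>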
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Constructional Elements</head><p>Some interesting observations can be made from a more general view of the terminological variation and from some hypotheses about syntactic, stylistic, and attitudinal variation. Table <ref type="table" target="#tab_1">2</ref> gives some statistics for some observable aggregate features of interest. Some of these are based on lists of lexical items of similar distributional and attitudinal qualities used in various sentiment analysis tasks; others are based on features extracted from dependency analyses from the Stanford CoreNLP package <ref type="bibr" target="#b14">[15]</ref>. Amplifiers in general are slightly more prevalent in posts by female authors, but this separates interestingly with type of amplifier. Amplifiers can be separated into grade amplifiers (very, extremely, ...), veracity amplifiers (truly, really, ...), and anomaly amplifiers (surprisingly, amazingly, ...). The anomaly amplifiers, which express surprise, carry most of the difference between female and male authors.</p><p>First person singular personal pronouns (I, me, myself, my, mine) are used more by female authors than male authors. We and its inflected forms, by contrast, are evenly distributed.</p><p>Profanity is used more by male authors; interjections (lol, omg, hey, oh, wtf, ...) more by female authors. Some verbal constructions are skewed: male authors use more passives; female authors more progressive tense. Modal auxiliaries are used more by male authors to a certain extent, and this coupled with the observation that male authors also use more hedges and downtoners can most likely be traced to differences in which discourses male and female authors engage in: male authors appear to participate more often in political debates and argumentation compared to female authors. 
These and other similar features (tense of main verb, definiteness of subject and object, various categories of adverbials of place, time, and manner) are each encoded with a random index vector and, in keeping with the construction grammar principles mentioned above, included in the representation as if they were lexical items.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">Generalised Lexical Elements</head><p>To reduce the topical content, nouns, verbs, and adjectives are replaced with their corresponding POS tag, using the Penn tagset. This means adjective comparison, verb tense, and noun number are preserved, but the actual referential meaning of the word will have been taken out.</p><p>(2) a. Anyone have a travel rest pillow I could borrow for a long trip? b. NN, VBP, a, NN, NN, NN, I, could, VB, for, a, JJ, NN, "."</p></div>
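<div xmlns="http://www.tei-c.org/ns/1.0"><p>The replacement step can be sketched as follows, assuming the tokens have already been tagged with Penn Treebank labels. The function name generalise and the prefix test are illustrative assumptions, not the code used in the experiment. In this sketch punctuation tokens are left as they are, whereas the printed example shows the tag of the final punctuation mark.</p>
<p>
```python
# Penn Treebank tags starting with these prefixes mark nouns, verbs, adjectives.
CONTENT_PREFIXES = ("NN", "VB", "JJ")


def generalise(tagged_tokens):
    """Replace content words by their POS tag and keep all other tokens."""
    result = []
    for word, tag in tagged_tokens:
        if tag.startswith(CONTENT_PREFIXES):
            result.append(tag)      # keeps tense, number, and comparison
        else:
            result.append(word)     # function words stay as they are
    return result


# Tags as given for the example sentence in this paper.
tagged = [("Anyone", "NN"), ("have", "VBP"), ("a", "DT"), ("travel", "NN"),
          ("rest", "NN"), ("pillow", "NN"), ("I", "PRP"), ("could", "MD"),
          ("borrow", "VB"), ("for", "IN"), ("a", "DT"), ("long", "JJ"),
          ("trip", "NN"), ("?", ".")]
print(generalise(tagged))
```
</p></div>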
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.5">Centroids and Pool Depth</head><p>As a final series of representational parameter choices, given a vector space of sentences along the lines above, we must first determine (1) if a post is best represented as an average, or a vector centroid, of its constituent sentence vectors or as a bag of separate vectors; (2) if an author is best represented as an average, or a vector centroid, of its constituent post or sentence vectors or as a bag of separate vectors; and (3) if a gender is best represented as an average, or a vector centroid, of its constituent sentence, post or author vectors or as a bag of separate vectors. We have here elected to use an author centroid for each author comprised of a sum of post vectors, in turn comprised of a sum of sentence vectors, but not to average the authors into a single gender vector.</p><p>Given such an author space and a new author of unknown gender with a vector in the space, the next question is to decide how to assess its position in author space. We can assign the author the same gender as its nearest neighbour in space or use a broader range to pool a number of neighbours. In the following tables, we show results from using only the very nearest neighbour and from the 11 closest neighbours.</p><p>Both these questions, centroids or bags of vectors and how to assess position in author space, are amenable to further experimentation and attendant improvement using classification algorithms of various levels of sophistication.</p></div>
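<div xmlns="http://www.tei-c.org/ns/1.0"><p>Pooling over neighbours can be sketched as a majority vote among the most similar author centroids. The text does not specify how the pooled neighbours are combined, so the unweighted vote below is an assumption, as are the function names and the two-dimensional toy vectors, which are invented for illustration only.</p>
<p>
```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query, authors, pool_depth=1):
    """Return the majority label among the pool_depth most similar authors.

    authors is a list of (vector, label) pairs. An odd pool depth avoids
    ties when there are two labels.
    """
    ranked = sorted(authors, key=lambda pair: cosine(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:pool_depth])
    return votes.most_common(1)[0][0]


# Toy author centroids, invented for illustration only.
authors = [([1.0, 0.1], "female"), ([0.9, 0.3], "female"),
           ([0.1, 1.0], "male"), ([0.6, 0.6], "male"), ([0.5, 0.7], "male")]
print(classify([1.0, 0.5], authors, pool_depth=1))
print(classify([1.0, 0.5], authors, pool_depth=3))
```
</p><p>A pool depth of 1 corresponds to assigning the gender of the single nearest neighbour; a pool depth of 11 would pool the 11 closest neighbours in the same way.</p></div>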
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Cross Validation Results on the Training Data</head><p>All training sentences, posts, and authors are encoded as vectors using all the above features. The nature of the representation is such that these overlaid encodings of multiple features can be used fully or with only some of the features in play. Test sentences, posts, and authors are encoded with all or some subset of the features, and the classification is done using simple cosine calculation to find the closest neighbour to the test author in question.</p><p>Tables <ref type="table">3 and 4</ref> give a combined picture of the quality of the various feature sets: all words (WDS), generalised content words (NON-TOPIC), part of speech triples (POS), constructional features (CXG), together and separately. The results given are based on 3-fold cross-validation over the training data. The submitted run is based on the -WDS condition, using all feature types except content words, and at a pool depth of 1. Notable from the results is that precision for the female authors is greater (at an attendant cost to recall). This gives us reason to believe that the representation of female authorship in this space is different from that of male authorship. One tentative but likely explanation</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Examples of lexical skewness in the data</figDesc><table><row><cell>1)</cell><cell>a. Anyone have a travel rest pillow I could borrow for a long trip?</cell></row></table><note>b. NN, VBP, DT , NN, NN, NN, PRP, MD, VB, IN, DT, JJ, NN, "." c. [[NN, VBP, DT] , [VBP, DT, NN], ... ]</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Examples of complex feature skewness in the data</figDesc><table><row><cell>all amplifiers</cell><cell>43 57</cell></row><row><cell>grade amplifiers</cell><cell>47 53</cell></row><row><cell>anomaly amplifiers</cell><cell>36 64</cell></row><row><cell>veracity amplifiers</cell><cell>42 58</cell></row><row><cell>hedges and downtoners</cell><cell>74 26</cell></row><row><cell>uncertainty</cell><cell>64 36</cell></row><row><cell>p1 singular</cell><cell>17 83</cell></row><row><cell>p1 plural</cell><cell>53 47</cell></row><row><cell>p2</cell><cell>37 63</cell></row><row><cell>p3</cell><cell>59 41</cell></row><row><cell>profanity</cell><cell>69 31</cell></row><row><cell>interjection</cell><cell>37 63</cell></row><row><cell>passive constructions</cell><cell>67 33</cell></row><row><cell>progressive tense</cell><cell>40 60</cell></row><row><cell>should</cell><cell>61 39</cell></row><row><cell>would</cell><cell>72 28</cell></row><row><cell>could</cell><cell>56 44</cell></row><row><cell>think and cogitation verbs</cell><cell>66 34</cell></row><row><cell>utterance verbs</cell><cell>67 33</cell></row><row><cell>love terms</cell><cell>15 85</cell></row><row><cell>hate terms</cell><cell>43 57</cell></row><row><cell>boredom terms</cell><cell>59 41</cell></row><row><cell>dislike terms</cell><cell>56 44</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements Jussi Karlgren's work was done as a visiting scholar at the Department of Linguistics at Stanford University, supported by a generous VINNMER Marie Curie grant from VINNOVA, the Swedish Governmental Agency for Innovation Systems.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>is that there are more than two styles, and that there are more female styles than male styles among them in this material. These are initial explorations to establish stylistic and attitudinal differences between categories of author. We believe that it would be more functionally appropriate to work with a broader palette of categories than two sexually determined categories; that topical variation outweighs gender variation; that gender variation largely is socially determined in ways that have been studied extensively in sociolinguistics; that the intrinsic differences between categories invite further study of the variational space; that the signal found in these data could be better accommodated as an encoding to a more competent classifier; and that constructional analysis can be a key to a computationally habitable combination of lexical and syntactic analysis pipelines. We also acknowledge that none of these issues have fully been explored in this present experiment.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Creaky Voice in a diverse gender sample: Challenging ideologies about sex, gender and creak in American English</title>
		<author>
			<persName><forename type="first">K</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ud Dowla Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zimman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">New Ways of Analyzing Variation</title>
		<imprint>
			<biblScope unit="volume">44</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Natural language processing with Python: analyzing text with the natural language toolkit</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Loper</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2009">2009</date>
			<publisher>O&apos;Reilly Media</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">The Female Brain</title>
		<author>
			<persName><forename type="first">L</forename><surname>Brizendine</surname></persName>
		</author>
		<ptr target="https://books.google.com/books?id=-tpoFcql0kgC" />
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Morgan Road Books</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Radical and typological arguments for radical construction grammar</title>
		<author>
			<persName><forename type="first">W</forename><surname>Croft</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Construction Grammars: Cognitive grounding and theoretical extensions</title>
				<editor>
			<persName><forename type="first">J</forename><forename type="middle">O</forename><surname>Östman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Fried</surname></persName>
		</editor>
		<meeting><address><addrLine>Amsterdam</addrLine></address></meeting>
		<imprint>
			<publisher>John Benjamins</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Like and language ideology: Disentangling fact from fiction</title>
		<author>
			<persName><forename type="first">A</forename><surname>D'Arcy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Speech</title>
		<imprint>
			<biblScope unit="volume">82</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="386" to="419" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">The most influential paper Gerard Salton never wrote</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dubin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Library Trends</title>
		<imprint>
			<biblScope unit="volume">52</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="748" to="764" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The whole woman: sex and gender differences in variation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Eckert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Variation and Change</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="245" to="267" />
			<date type="published" when="1989">1989</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Eckert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annual Review of Anthropology</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="page" from="87" to="100" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>James</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Drakich</surname></persName>
		</author>
		<title level="m">Understanding gender differences in amount of talk: A critical review of research</title>
				<imprint>
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Random indexing of text samples for latent semantic analysis</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kanerva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kristoferson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Holst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Cognitive Science Society</title>
				<meeting>the Cognitive Science Society</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors</title>
		<author>
			<persName><forename type="first">P</forename><surname>Kanerva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive Computation</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="139" to="159" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Hyperdimensional utterance spaces-a more transparent language representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Karlgren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kanerva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Design of Experimental Search &amp; Information Retrieval Systems</title>
				<meeting>Design of Experimental Search &amp; Information Retrieval Systems<address><addrLine>Bertinoro, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>DESIRES</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Labov</surname></persName>
		</author>
		<ptr target="https://books.google.com/books?id=LS_Ux3CEI5QC" />
		<title level="m">Principles of Linguistic Change, Social Factors. Principles of Linguistic Change</title>
				<imprint>
			<publisher>Wiley</publisher>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Language and woman&apos;s place</title>
		<author>
			<persName><forename type="first">R</forename><surname>Lakoff</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/4166707" />
	</analytic>
	<monogr>
		<title level="j">Language in Society</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="45" to="80" />
			<date type="published" when="1973">1973</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">The Stanford CoreNLP Natural Language Processing Toolkit</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Surdeanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Finkel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Bethard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>McClosky</surname></persName>
		</author>
		<ptr target="http://www.aclweb.org/anthology/P/P14/P14-5010" />
	</analytic>
	<monogr>
		<title level="m">Association for Computational Linguistics (ACL) System Demonstrations</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="55" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Building a large annotated corpus of English: The Penn Treebank</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Marcus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Marcinkiewicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Santorini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="313" to="330" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Are women really more talkative than men?</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">R</forename><surname>Mehl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vazire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ramírez-Esparza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Slatcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
		<ptr target="http://science.sciencemag.org/content/317/5834/82" />
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">317</biblScope>
			<biblScope unit="issue">5834</biblScope>
			<biblScope unit="page" from="82" to="82" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Country ideology and the California Vowel Shift</title>
		<author>
			<persName><forename type="first">R</forename><surname>Podesva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>D'Onofrio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Hofwegen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Language Variation and Change</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="157" to="186" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Sex similarities and differences in stance in informal American conversation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Precht</surname></persName>
		</author>
		<idno type="DOI">10.1111/j.1467-9841.2008.00354.x</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9841.2008.00354.x" />
	</analytic>
	<monogr>
		<title level="j">Journal of Sociolinguistics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="89" to="111" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</title>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-y-Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes Papers of the CLEF 2018 Evaluation Labs</title>
		<title level="s">CEUR Workshop Proceedings, CLEF and CEUR-WS.org</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><forename type="middle">Y</forename><surname>Nie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2018-09">Sep 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<title level="m">Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter</title>
				<imprint>
			<date type="published" when="2017-09">Sep 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Is the O.K. classroom O.K.?</title>
		<author>
			<persName><forename type="first">D</forename><surname>Sadker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sadker</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/20387346" />
	</analytic>
	<monogr>
		<title level="j">The Phi Delta Kappan</title>
		<imprint>
			<biblScope unit="volume">66</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="358" to="361" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A vector space model for automatic indexing</title>
		<author>
			<persName><forename type="first">G</forename><surname>Salton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="613" to="620" />
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Word space</title>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1993 Conference on Advances in Neural Information Processing Systems, NIPS&apos;93</title>
				<meeting>the 1993 Conference on Advances in Neural Information Processing Systems, NIPS&apos;93<address><addrLine>San Francisco, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Morgan Kaufmann Publishers Inc</publisher>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="895" to="902" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</title>
		<author>
			<persName><forename type="first">E</forename><surname>Stamatatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rangel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tschuggnall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kestemont</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Stein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Potthast</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. 9th International Conference of the CLEF Initiative (CLEF 18)</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Bellot</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Trabelsi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mothe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Murtagh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Nie</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Soulier</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>SanJuan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Cappellato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Ferro</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer</publisher>
			<pubPlace>Berlin Heidelberg New York</pubPlace>
			<date type="published" when="2018-09">Sep 2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
