<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Authorship Profiling Without Using Topical Information: Notebook for PAN at CLEF 2018</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jussi Karlgren</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lewis Esposito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chantal Gratton</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pentti Kanerva</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Linguistics</institution>
          ,
          <addr-line>Stanford</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Theoretical Computer Science</institution>
          ,
          <addr-line>KTH, Stockholm</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Gavagai</institution>
          ,
          <addr-line>Stockholm</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Redwood Center for Theoretical Neuroscience</institution>
          ,
          <addr-line>UC Berkeley</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>This paper describes an experiment made for the PAN 2018 shared task on author profiling. The task is to distinguish female from male authors of microblog posts published on Twitter using no extraneous information except what is in the posts; this experiment focusses on using non-topical information from the posts, rather than gender differences in referential content. The full task allows for using both images and text of the posts, which are given in three languages: in this experiment we have only made use of the English-language material, and only the text. The training material consists of 1500 female and 1500 male authors, with 100 posts each. Microblog posts are short and these consist on average of X words and Y sentences [25,20].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>What People Have Thought About Male And Female Language And Why</title>
      <p>
        Robin Lakoff’s 1973 article Language and Woman’s Place [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] initiated conversations
surrounding the role of gender in linguistic practice. While her work might better be
described as a collection of ideologies of gendered language rather than an accurate
depiction of men’s and women’s linguistic styles, it nonetheless cemented the legitimacy
and significance of language and gender studies in its own right. And indeed, ideologies
of how men and women differ in their use of language still pervade public discourse;
and these discourses taint research from various disciplines that makes bold claims about
the gendered use of language without taking gender as a serious social construct worth
investigating.
      </p>
      <p>
        Perhaps one of the biggest myths about men’s and women’s language is that women talk
more than men, and the ubiquity of this belief has led researchers in fields tangential to
linguistics to look for biological causes (e.g. [3]) despite the fact that such research is
unsupported by quantitative data. Indeed, some work has found no differences in the
amount of speech men and women produce, such as the research by Mehl and colleagues,
who equipped male and female university students in the US and Mexico with
microphones that recorded them at random intervals over several days [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
Other work has found that men actually speak more than women, particularly in formal
and task-oriented activities [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and even young boys outstrip their female classmates,
speaking three times as much and calling out answers eight times as often [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>
        Another common ideology about language differences among women and men is
that women use more hedges than men. This idea generally arises from the folk ideology
that women tend to be less sure of themselves. But just as the quantitative evidence
described above doesn’t support the “talkative women” ideology, work on hedges has
similarly found that men and women use hedges at comparable rates [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Similarly,
the notion that women also use other linguistic features that signal low confidence, like
creaky voice and innovative like, isn’t supported by the data either. Men and women
have been shown to use creaky voice at roughly equal rates [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and the same holds for
different discourse functions of like [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        But none of this is to say that men and women don’t participate in linguistic
practices in unique ways. The ideologies described above are exactly that: ideologies, rooted
in bias and lacking quantitative reality. Quantitative sociolinguists nonetheless
consistently find broad gender patterns in the use of linguistic features. Women, more often
than not, drive vocalic sound change [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], leading men in the use of incoming variants.
Searching for biological or essentialist motivations is an untenable approach, as
male-led sound changes have indeed been documented (e.g. [18]), ruling out the potential for
sex-based effects on linguistic production. For this reason, Eckert urges us to consider
the kinds of social milieu that men and women occupy in society [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. As men have
historically enjoyed greater power than women in all domains of public and private life,
and given that women have been deprived of social and political capital, women may have
greater motivation to make use of various kinds of symbolic capital. It should thus not
be surprising that women, in the aggregate, are more advanced than men in innovative
phonological changes that, at their inception, are believed by many sociolinguists to be
imbued with socio-symbolic meanings [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Beyond components of sound change, there
are no doubt other linguistic features that men and women employ variably, but the
perhaps more interesting question for scholars is why these differences exist, and for
whom they do not exist.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Features and Variables of Interest</title>
      <p>
        In the present data set, the gender of authors can be expected to be distinguishable
with a precision of around 80% using largely lexical cues [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Lexical variation is
highly determined by topic, and much of the result can essentially be reduced to the
observation that female and male authors write about different things: many discourse
topics are strongly gendered.
      </p>
      <p>If the task is to distinguish female and male authors in this specific data set or very
similar ones from more or less the same time period, a well-trained topical detector will
be useful. If the task is to detect what differences may be systematic between genders
across topics and over time, topic will be less reliable as a gender marker. Our
experiments start from the assumption that topic is a confounding and non-sustainable
variable for the general case. We also wish to point out that for many downstream tasks, the
distinction between male and female author may be less useful than other stable
characteristics, and that, as in many classification tasks, assuming that the number of classes
is fixed a priori may lower both the reliability and the usefulness of the classification.</p>
      <sec id="sec-2-1">
        <title>Linguistic Processing</title>
        <p>We process the linguistic data in a vector space model which incorporates lexical
linguistic items together with constructional linguistic items in a unified computational
framework.</p>
        <p>
          <italic>Vector Space Models for Meaning.</italic> Vector space models have frequently been used in
information access, both for research experiments and as a building block for systems
in practical use, at least since the early 1970s [
          <xref ref-type="bibr" rid="ref23 ref6">23,6</xref>
          ]. Vector space models have
attractive qualities: processing vector spaces can be done in a manageable
implementational framework, they are mathematically well-defined and understood, and
they are intuitively appealing, conforming to everyday metaphors such as “near in
meaning” [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The vector space model for meaning is the basis for almost all
information retrieval experimentation and implementation, most machine learning
experiments, and is now the standard approach in most categorisation schemes, topic
models, deep learning models, and other similar approaches. In this experiment we
encode each post of each author into a vector, and use those vectors to represent the
author’s profile.
        </p>
        <p><italic>Construction Grammar.</italic> The construction grammar framework is characterised by
the central claim that linguistic information is encoded similarly or even
identically by lexical items (the words) and by their configurations (the syntax), both
being linguistic items with equal salience and presence in the linguistic signal. The
parsimonious character of construction grammar in its most radical formulations
(e.g. [4]) is attractive as a framework for integrating a dynamic and learning view
of language use with formal expression of language structure: it allows the
representation of words together with constructions in a common framework. For our
purposes construction grammar gives a theoretical foundation to a consolidated
representation of both individual items in utterances and their configuration. In this
experiment, after dependency analysis of each sentence of each post, features of
potential interest in each sentence are extracted to represent the sentence together
with some of its lexical items.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Technical Description</title>
      <p>
        To represent authors by features of their posts as vectors, we use a high-dimensional
model based on random indexing [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The idea is to compute with high-dimensional
vectors [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] using operations that do not modify vector dimensionality during the course
of operation and use. We use 2,000-dimensional vectors in these demonstrations and
experiments. Information encoded into a vector is distributed over all vector elements.
Computing begins by assigning random seed vectors or index vectors for basic objects.
In working with text, each observed word and each observed construction of interest in
the collection can be represented by an index vector consisting of 0s, 1s and −1s. These
can easily be generated on the fly if new lexical or constructional items appear during
processing. Index vectors remain unchanged throughout computations. Typically, index
vectors are sparse, and in our model have 10 non-zero elements with an equal number
of 1s and −1s. Each item is also given a context vector, where observations of
cooccurring items are recorded through vector addition and, if necessary, vector
permutation, which reorders (scrambles) vector coordinates. These operations are inexpensive
computationally and allow for a very large feature space within a bounded memory
footprint. As in most similar models, vector similarity is measured by the cosine between
the vectors, with values between −1 and 1 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
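      <p>As an illustration only, the operations described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the code used in the experiment: the function names (index_vector, random_permutation, permute, add_into, cosine, lookup) are hypothetical, and plain lists stand in for whatever vector library an implementation would use. The dimensionality (2,000) and the number of non-zero elements (10, half of them 1 and half −1) follow the description above.</p>
      <preformat preformat-type="code">
```python
import math
import random

DIM = 2000       # vector dimensionality given in the text
NONZERO = 10     # non-zero elements per index vector, half +1 and half -1


def index_vector(rng, dim=DIM, nonzero=NONZERO):
    """Return a sparse ternary index vector with equally many +1 and -1 entries."""
    vec = [0] * dim
    positions = rng.sample(range(dim), nonzero)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def random_permutation(rng, dim=DIM):
    """Return a random reordering of the coordinate positions."""
    perm = list(range(dim))
    rng.shuffle(perm)
    return perm


def permute(vec, perm):
    """Reorder (scramble) the coordinates of vec according to perm."""
    return [vec[j] for j in perm]


def add_into(target, vec, weight=1.0):
    """Accumulate vec into target in place; dimensionality never changes."""
    for i, value in enumerate(vec):
        if value:
            target[i] += weight * value


def cosine(a, b):
    """Cosine similarity, between -1 and 1; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


rng = random.Random(0)
lexicon = {}     # index vectors are generated on the fly and never modified


def lookup(item):
    """Return the index vector for item, creating it on first sight."""
    if item not in lexicon:
        lexicon[item] = index_vector(rng)
    return lexicon[item]


# Record three observations in one context vector by vector addition.
context = [0.0] * DIM
for word in ["happy", "birthday", "happy"]:
    add_into(context, lookup(word))

print(cosine(context, lookup("happy")))   # expected to be high: added twice
print(cosine(context, lookup("game")))    # expected to be near zero: never added
```
      </preformat>
      <p>Because index vectors are sparse and random, two unrelated items are nearly orthogonal, which is what lets many observations be overlaid in one vector of fixed size.</p>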
    </sec>
    <sec id="sec-4">
      <title>Representation of Posts</title>
      <p>
        The posts were segmented into sentences and word tokens using NLTK [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and each
token was tagged with its Penn Treebank lexical category using CoreNLP [
        <xref ref-type="bibr" rid="ref15 ref16">16,15</xref>
        ]. The sentences
were further analysed for syntactic dependencies, again using CoreNLP.
      </p>
      <sec id="sec-4-1">
        <title>Full Text Baseline</title>
        <p>As a baseline, all words of each post are included in the representation. Each word was
assigned a random index vector and added into the representation, weighted by
logarithmic frequency weighting to damp the relative effect of highly frequent words and
increase the weight of infrequent ones. This weighting scheme was not optimised
specifically for this material.</p>
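        <p>The baseline encoding can be sketched as follows. The exact weighting formula is not given here, so the function log_weight below is an assumption chosen only because it has the stated effect (frequent words are damped, infrequent ones are boosted); encode_post and index_vector are likewise hypothetical names, and the two toy posts are invented for illustration.</p>
        <preformat preformat-type="code">
```python
import math
import random
from collections import Counter

DIM = 2000


def index_vector(rng, dim=DIM, nonzero=10):
    """Sparse ternary random index vector, half +1 and half -1."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def log_weight(word, counts, total):
    """Assumed damping weight: the rarer the word, the larger the weight."""
    return math.log(1.0 + total / max(counts[word], 1))


def encode_post(tokens, lexicon, counts, total, rng):
    """Sum the weighted index vectors of all tokens in one post."""
    post = [0.0] * DIM
    for token in tokens:
        if token not in lexicon:
            lexicon[token] = index_vector(rng)
        weight = log_weight(token, counts, total)
        for i, value in enumerate(lexicon[token]):
            if value:
                post[i] += weight * value
    return post


# Invented toy posts; collection frequencies are counted over all posts.
posts = [["happy", "birthday", "to", "the", "team"],
         ["the", "game", "the", "win"]]
counts = Counter(token for post in posts for token in post)
total = sum(counts.values())

rng = random.Random(1)
lexicon = {}
vectors = [encode_post(post, lexicon, counts, total, rng) for post in posts]

# The frequent word receives the smaller weight.
print(log_weight("the", counts, total), log_weight("game", counts, total))
```
        </preformat>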
        <p>A quick glance through the lexical data will show that some words are typically used
more often by female than by male authors. The numbers in Table 1 are taken directly
from the vector space model. The proportion of female and male authors among the 100
authors closest to each word in the vector space is given, along with the frequency of the word in
the entire training collection.</p>
        <p>Some terms (game, win, birthday) can fairly be called topical. Others reflect more
stylistic or attitudinal usage (happy, love, wrong, sure). Terms such as stuff, while
referential, simultaneously reveal volumes about the author’s attitude to the topic under
treatment. How to establish that cline of referentiality or topicality vs attitude is a research
challenge which could partially be addressed using measures from search technology.</p>
      </sec>
      <sec id="sec-4-2">
        <title>POS sequences</title>
        <p>Table 1 terms: sure, wrong, hope, life, game, team, win, America, birthday, happy, love, stuff, fun, thank, thanks, women, Yes, amazing.</p>
        <p>Each sentence was represented as a sequence of Penn Treebank POS labels. These
labels are not always well chosen, but no correction of the output of the NLTK tagger
was done. Subsequences of length three were extracted for each sentence, as in Example (1).</p>
        <p>(1) Anyone have a travel rest pillow I could borrow for a long trip?</p>
        <p>NN, VBP, DT, NN, NN, NN, PRP, MD, VB, IN, DT, JJ, NN, "."</p>
        <p>[[NN, VBP, DT], [VBP, DT, NN], ... ]</p>
        <p>One random permutation was generated for each POS label. One random vector
<italic>pos</italic> was generated for encoding all POS labels. Each triple was represented by taking
the <italic>pos</italic> vector and passing it through the permutations for the POS labels of the
triple. All resulting triple vectors were then added into the post representation. This
representation preserves the sequence of POS labels without conflating them for each
position in a triple.</p>
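        <p>For example, the sequence DT, JJ, NN will be encoded as
<disp-formula id="eq-1"><label>(1)</label><tex-math>S(\mathrm{DT}, \mathrm{JJ}, \mathrm{NN}) = \Pi_{\mathrm{NN}}(\Pi_{\mathrm{JJ}}(\Pi_{\mathrm{DT}}(\mathit{pos})))</tex-math></disp-formula>
where Π<sub>X</sub> denotes the random permutation generated for POS label X.</p>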
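        <p>The triple encoding can be illustrated with the following sketch, which assumes sparse ternary vectors of 2,000 dimensions as in the technical description. The class PosTripleEncoder and the helpers sparse_vector, permute, and triples are hypothetical names introduced for this illustration, not part of the system used in the experiment. The label of the first position is applied innermost, so the order of labels within a triple changes the result.</p>
        <preformat preformat-type="code">
```python
import random

DIM = 2000


def sparse_vector(rng, dim=DIM, nonzero=10):
    """Sparse ternary random vector, half +1 and half -1."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def permute(vec, perm):
    """Reorder the coordinates of vec according to perm."""
    return [vec[j] for j in perm]


def triples(labels):
    """All contiguous subsequences of length three."""
    return [labels[i:i + 3] for i in range(len(labels) - 2)]


class PosTripleEncoder:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.pos = sparse_vector(self.rng)   # the single shared pos vector
        self.perms = {}                      # one random permutation per POS label

    def permutation(self, label):
        """Return the permutation for label, generating it on first use."""
        if label not in self.perms:
            perm = list(range(DIM))
            self.rng.shuffle(perm)
            self.perms[label] = perm
        return self.perms[label]

    def encode_triple(self, triple):
        """Pass the pos vector through the permutations of the three labels."""
        vec = self.pos
        for label in triple:                 # the first label ends up innermost
            vec = permute(vec, self.permutation(label))
        return vec

    def encode_sentence(self, labels):
        """Add the vectors of all triples of one sentence."""
        total = [0] * DIM
        for triple in triples(labels):
            for i, value in enumerate(self.encode_triple(triple)):
                total[i] += value
        return total


labels = ["NN", "VBP", "DT", "NN", "NN", "NN", "PRP",
          "MD", "VB", "IN", "DT", "JJ", "NN", "."]
encoder = PosTripleEncoder(seed=0)
sentence_vector = encoder.encode_sentence(labels)
print(triples(labels)[:2])
```
        </preformat>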
      </sec>
      <sec id="sec-4-3">
        <title>Constructional Elements</title>
        <p>
          Some interesting observations can be made from a more general view of the
terminological variation, and from some hypotheses about syntactic, stylistic, and attitudinal
variation. Table 2 gives some statistics for some observable aggregate features of
interest. Some of these are based on lists of lexical items of similar distributional and
attitudinal qualities used in various sentiment analysis tasks; others are based on
features extracted from dependency analyses from the Stanford CoreNLP package [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>Amplifiers in general are slightly more prevalent in posts by female authors, but this
differs interestingly by type of amplifier. Amplifiers can be separated into grade
amplifiers (very, extremely, ...), veracity amplifiers (truly, really, ...), and anomaly
amplifiers (surprisingly, amazingly, ...). The anomaly amplifiers, which express surprise, carry most of the
difference between female and male authors.</p>
        <p>First person singular personal pronouns (I, me, myself, my, mine) are used more by
female authors than male authors. We and its inflected forms, by contrast, are evenly
distributed.</p>
        <p>Profanity is used more by male authors; interjections (lol, omg, hey, oh, wtf, ...) more
by female authors.</p>
        <p>Some verbal constructions are skewed: male authors use more passives; female
authors more progressive tense. Modal auxiliaries are used more by male authors to a
certain extent, and this, coupled with the observation that male authors also use more
hedges and downtoners, can most likely be traced to differences in which discourses
male and female authors engage in: male authors appear to participate more often in
political debates and argumentation than female authors do.</p>
        <p>Table 2 features: all amplifiers; grade amplifiers; anomaly amplifiers; veracity amplifiers; hedges and downtoners; uncertainty; p1 singular; p1 plural; p2; p3; profanity; interjection; passive constructions; progressive tense; should; would; could; think and cogitation verbs; utterance verbs; love terms; hate terms; boredom terms; dislike terms.</p>
        <p>These and other similar features (tense of main verb, definiteness of subject and
object, various categories of adverbials of place, time, and manner) are each encoded
with a random index vector and, in keeping with the construction grammar principles
mentioned above, included in the representation as if they were lexical items.</p>
        <p>To reduce the topical content, nouns, verbs, and adjectives are replaced with their
corresponding POS tag, using the Penn tagset. This means adjective comparison, verb tense,
and noun number are preserved, but the actual referential meaning of the word will have
been taken out, as in Example (2).</p>
        <p>(2) Anyone have a travel rest pillow I could borrow for a long trip?</p>
        <p>NN, VBP, a, NN, NN, NN, I, could, VB, for, a, JJ, NN, "."</p>
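        <p>The replacement of content words by their POS tags amounts to a simple filter over tagged tokens, sketched below. The function name generalise is hypothetical, the tags are entered by hand to mirror the example sentence rather than produced by a tagger, and sentence-final punctuation is left out of the illustration.</p>
        <preformat preformat-type="code">
```python
# Penn Treebank tags for nouns, verbs, and adjectives all start with one of
# these prefixes (NN, NNS, NNP, VB, VBD, VBP, JJ, JJR, JJS, ...), so number,
# tense, and comparison survive in the tag while the word itself is dropped.
CONTENT_PREFIXES = ("NN", "VB", "JJ")


def generalise(tagged_tokens):
    """Replace content words with their POS tag; keep all other words."""
    return [tag if tag.startswith(CONTENT_PREFIXES) else word
            for word, tag in tagged_tokens]


# Hand-tagged version of the example sentence.
tagged = [("Anyone", "NN"), ("have", "VBP"), ("a", "DT"), ("travel", "NN"),
          ("rest", "NN"), ("pillow", "NN"), ("I", "PRP"), ("could", "MD"),
          ("borrow", "VB"), ("for", "IN"), ("a", "DT"), ("long", "JJ"),
          ("trip", "NN")]

print(generalise(tagged))
```
        </preformat>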
      </sec>
      <sec id="sec-4-4">
        <title>Centroids and Pool Depth</title>
        <p>As a final series of representational parameter choices, given a vector space of sentences
along the lines above, we must determine (1) if a post is best represented as an
average, or a vector centroid, of its constituent sentence vectors or as a bag of separate
vectors; (2) if an author is best represented as an average, or a vector centroid, of their
constituent post or sentence vectors or as a bag of separate vectors; and (3) if a gender
is best represented as an average, or a vector centroid, of its constituent sentence, post
or author vectors or as a bag of separate vectors. We have here elected to use an author
centroid for each author composed of a sum of post vectors, in turn composed of a sum
of sentence vectors, but not to average the authors into a single gender vector.</p>
        <p>Given such an author space and a new author of unknown gender with a vector in
the space, the next question is to decide how to assess its position in author space. We
can assign the author the same gender as its nearest neighbour in space or use a broader
range to pool a number of neighbours. In the following tables, we show results from
using only the very nearest neighbour and from the 11 closest neighbours.</p>
        <p>Both these questions, centroids or bags of vectors and how to assess position in
author space, are amenable to further experimentation and attendant improvement using
classification algorithms of various levels of sophistication.</p>
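        <p>The author centroid and the pooling of neighbours can be sketched as below. This is an illustration under assumptions, not the implementation used for the submitted run: the names add_vectors, author_vector, and classify are hypothetical, the pool is resolved by a simple majority vote (one possible pooling rule), and a toy three-dimensional space with invented values stands in for the 2,000-dimensional author space.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter


def add_vectors(vectors):
    """Element-wise sum of equally long vectors."""
    return [sum(values) for values in zip(*vectors)]


def author_vector(posts):
    """Author centroid: a sum of post vectors, each a sum of sentence vectors."""
    return add_vectors([add_vectors(sentences) for sentences in posts])


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(query, labelled_authors, pool_depth=1):
    """Majority label among the pool_depth authors nearest to query."""
    ranked = sorted(labelled_authors,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:pool_depth])
    return votes.most_common(1)[0][0]


# Invented toy author space.
authors = [
    ([1.0, 0.0, 0.0], "female"),
    ([0.9, 0.1, 0.0], "female"),
    ([0.0, 1.0, 0.0], "male"),
    ([0.0, 0.9, 0.2], "male"),
    ([0.1, 1.0, 0.0], "male"),
]
new_author = author_vector([
    [[0.5, 0.1, 0.0], [0.1, 0.1, 0.0]],   # post 1: two sentence vectors
    [[0.2, 0.1, 0.0]],                     # post 2: one sentence vector
])

print(classify(new_author, authors, pool_depth=1))   # female
print(classify(new_author, authors, pool_depth=5))   # male
```
        </preformat>
        <p>In this toy space the two settings disagree: the nearest neighbour is a female author, while a pool covering all five authors is dominated by the three male ones. An odd pool depth avoids ties between two classes.</p>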
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Cross Validation Results on the Training Data</title>
      <p>All training sentences, posts, and authors are encoded as vectors using all the above
features. The nature of the representation is such that these overlaid encodings of
multiple features can be used fully or with only some of the features in play. Test
sentences, posts, and authors are encoded with all or some subset of the features, and the
classification is done using a simple cosine calculation to find the closest neighbour to
the test author in question.</p>
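      <p>One way to evaluate such a nearest-neighbour classifier on labelled training data is k-fold cross-validation with precision and recall computed per class, sketched below. The fold assignment by striding, the function names (folds, cross_validate, nearest_label, precision_recall), and the two-dimensional toy data are assumptions made for this illustration; they do not reproduce the experimental setup.</p>
      <preformat preformat-type="code">
```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_label(query, training):
    """Label of the training item closest to query by cosine (pool depth 1)."""
    return max(training, key=lambda item: cosine(query, item[0]))[1]


def folds(items, k=3):
    """Split items into k folds by striding through the list."""
    return [items[i::k] for i in range(k)]


def cross_validate(data, k=3):
    """Return gold and predicted labels with each fold held out once."""
    parts = folds(data, k)
    gold, predicted = [], []
    for i, held_out in enumerate(parts):
        training = [item for j, part in enumerate(parts) if j != i
                    for item in part]
        for vector, label in held_out:
            gold.append(label)
            predicted.append(nearest_label(vector, training))
    return gold, predicted


def precision_recall(gold, predicted, label):
    """Precision and recall for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy data: (author vector, label) pairs.
data = [
    ([1.0, 0.0], "female"), ([0.0, 1.0], "male"),
    ([0.9, 0.1], "female"), ([0.1, 0.9], "male"),
    ([0.8, 0.3], "female"), ([0.3, 0.8], "male"),
]
gold, predicted = cross_validate(data, k=3)
for label in ("female", "male"):
    print(label, precision_recall(gold, predicted, label))
```
      </preformat>
      <p>Striding assumes that the data have been shuffled or are balanced across the list; with ordered data a shuffle before the split would be needed.</p>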
      <p>Tables 3 and 4 give a combined picture of the quality of the various feature sets:
all words (WDS), generalised content words (NON-TOPIC), part of speech triples (POS),
and constructional features (CXG), together and separately. The results given are based on
3-fold cross-validation over the training data. The submitted run is based on the -WDS
condition, using all feature types except content words, and at a pool depth of 1. Notable
from the results is that precision for the female authors is greater (at an attendant cost
to recall). This gives us reason to believe that the representation of female authorship in
this space is different from that of male authorship. One tentative but likely explanation
is that there are more than two styles, and that there are more female styles than male
styles among them in this material.</p>
      <p>These are initial explorations to establish stylistic and attitudinal differences between
categories of author. We believe that it would be more functionally appropriate to work
with a broader palette of categories than two sexually determined categories; that
topical variation dominates gender variation; that gender variation largely is socially
determined in ways that have been studied extensively in sociolinguistics; that the intrinsic
differences between categories invite further study of the variational space; that the
signal found in these data could be better accommodated as an encoding to a more
competent classifier; and that constructional analysis can be a key to a computationally
habitable combination of lexical and syntactic analysis in one pipeline. We also acknowledge
that none of these issues has been fully explored in the present experiment.</p>
      <p><italic>Acknowledgements.</italic> Jussi Karlgren’s work was done as a visiting scholar at the
Department of Linguistics at Stanford University, supported by a generous VINNMER Marie
Curie grant from VINNOVA, the Swedish Governmental Agency for Innovation
Systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Becker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ud Dowla Khan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Creaky Voice in a diverse gender sample: Challenging ideologies about sex, gender and creak in American English</article-title>
          .
          <source>New Ways of Analyzing Variation</source>
          <volume>44</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural language processing with Python: analyzing text with the natural language toolkit</source>
          .
          <publisher-name>O'Reilly Media</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brizendine</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>The Female Brain</article-title>
          . Morgan Road Books (
          <year>2006</year>
          ), https://books.google.com/books?id=-tpoFcql0kgC
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Croft</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Logical and typological arguments for radical construction grammar</article-title>
          . In:
          <string-name>
            <surname>Östman</surname>
            ,
            <given-names>J.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fried</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.)
          <source>Construction Grammars: Cognitive grounding and theoretical extensions</source>
          .
          <publisher-name>John Benjamins</publisher-name>
          , Amsterdam (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>D'Arcy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Like and language ideology: Disentangling fact from fiction</article-title>
          .
          <source>American Speech</source>
          <volume>82</volume>
          (
          <issue>4</issue>
          )
          ,
          <fpage>386</fpage>
          -
          <lpage>419</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dubin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The most influential paper Gerard Salton never wrote</article-title>
          .
          <source>Library Trends</source>
          <volume>52</volume>
          (
          <issue>4</issue>
          ),
          <fpage>748</fpage>
          -
          <lpage>764</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The whole woman: sex and gender differences in variation</article-title>
          .
          <source>Language Variation and Change</source>
          <volume>1</volume>
          ,
          <fpage>245</fpage>
          -
          <lpage>267</lpage>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Three waves of variation study: The emergence of meaning in the study of sociolinguistic variation</article-title>
          .
          <source>Annual Review of Anthropology</source>
          <volume>41</volume>
          ,
          <fpage>87</fpage>
          -
          <lpage>100</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>James</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drakich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Understanding gender differences in amount of talk: A critical review of research</article-title>
          . (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kristoferson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Holst</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Random indexing of text samples for latent semantic analysis</article-title>
          .
          In:
          <source>Proceedings of the Cognitive Science Society</source>
          . vol.
          <volume>1</volume>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors</article-title>
          .
          <source>Cognitive Computation</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          ),
          <fpage>139</fpage>
          -
          <lpage>159</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Karlgren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanerva</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Hyperdimensional utterance spaces-a more transparent language representation</article-title>
          .
          In:
          <source>Proceedings of Design of Experimental Search &amp; Information Retrieval Systems</source>
          , Bertinoro, Italy, (DESIRES) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Labov</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Principles of Linguistic Change, Social Factors</article-title>
          .
          <source>Principles of Linguistic Change</source>
          , Wiley (
          <year>2001</year>
          ), https://books.google.com/books?id=LS_Ux3CEI5QC
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lakoff</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Language and woman's place</article-title>
          .
          <source>Language in Society</source>
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>45</fpage>
          -
          <lpage>80</lpage>
          (
          <year>1973</year>
          ), http://www.jstor.org/stable/4166707
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finkel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McClosky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Stanford CoreNLP Natural Language Processing Toolkit</article-title>
          . In:
          <source>Association for Computational Linguistics (ACL) System Demonstrations</source>
          . pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2014</year>
          ), http://www.aclweb.org/anthology/P/P14/P14-5010
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Marcus</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marcinkiewicz</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santorini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Building a large annotated corpus of English: The Penn Treebank</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <fpage>313</fpage>
          -
          <lpage>330</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mehl</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vazire</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramírez-Esparza</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slatcher</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <article-title>Are women really more talkative than men?</article-title>
          <source>Science</source>
          <volume>317</volume>
          (
          <issue>5834</issue>
          ),
          <fpage>82</fpage>
          -
          <lpage>82</lpage>
          (
          <year>2007</year>
          ), http://science.sciencemag.org/content/317/5834/82
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Podesva</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'Onofrio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Hofwegen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Country ideology and the California Vowel Shift</article-title>
          .
          <source>Language Variation and Change</source>
          <volume>27</volume>
          (
          <issue>2</issue>
          ),
          <fpage>157</fpage>
          -
          <lpage>186</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Precht</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sex similarities and differences in stance in informal American conversation</article-title>
          .
          <source>Journal of Sociolinguistics</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>111</lpage>
          (
          <year>2008</year>
          ), https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9841.2008.00354.x
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (eds.)
          <source>Working Notes Papers of the CLEF 2018 Evaluation Labs</source>
          . CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter</article-title>
          (Sep
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sadker</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sadker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Is the O.K. classroom O.K.?</article-title>
          <source>The Phi Delta Kappan</source>
          <volume>66</volume>
          (
          <issue>5</issue>
          ),
          <fpage>358</fpage>
          -
          <lpage>361</lpage>
          (
          <year>1985</year>
          ), http://www.jstor.org/stable/20387346
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          :
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>18</volume>
          (
          <issue>11</issue>
          ),
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          (
          <year>1975</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Word space</article-title>
          . In:
          <source>Proceedings of the 1993 Conference on Advances in Neural Information Processing Systems</source>
          ,
          <source>NIPS'93</source>
          . pp.
          <fpage>895</fpage>
          -
          <lpage>902</lpage>
          . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN-2018: Author Identification, Author Profiling, and Author Obfuscation</article-title>
          . In:
          <string-name>
            <surname>Bellot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trabelsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murtagh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanjuan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (eds.)
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          .
          <source>9th International Conference of the CLEF Initiative (CLEF 18)</source>
          . Springer, Berlin Heidelberg New York (
          Sep
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>