<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the 2nd Author Profiling Task at PAN 2014</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francisco Rangel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Rosso</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irina Chugur</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Potthast</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Trenkmann</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Stein</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben Verhoeven</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Walter Daelemans</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Autoritas Consulting</institution>
          ,
          <addr-line>S.A.</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CLiPS - Computational Linguistics Group, University of Antwerp</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Natural Language Engineering Lab, Universitat Politècnica de València</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidad Nacional de Educación a Distancia</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Web Technology &amp; Information Systems, Bauhaus-Universität Weimar</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>898</fpage>
      <lpage>927</lpage>
      <abstract>
        <p>This overview presents the framework and the results of the Author Profiling task at PAN 2014. This year's objective is to analyse how well the detection approaches adapt to different genres. For this purpose, a corpus with four different parts (subcorpora) has been compiled: social media, Twitter, blogs, and hotel reviews. The Twitter subcorpus was constructed in cooperation with RepLab in order to also investigate a reputational perspective. Altogether, the approaches of 10 participants are evaluated.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Despite the enormous impact of social media on our daily life, we observe a lack of
information about those who create the contents. In this regard, author profiling tries to
determine the gender, age, native language, or personality type of authors by analysing
their published texts. Author profiling is of growing importance: for example, from a marketing
viewpoint, companies may be interested in knowing the demographics of their target
group in order to achieve a better market segmentation; from a forensic viewpoint,
determining the linguistic profile of a person who wrote a "suspicious text" may provide
valuable background information.</p>
      <p>
        In the Author Profiling task at PAN 2013,<sup>1</sup> the identification of age and gender
relied on a large corpus collected from social media [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. This year, in PAN 2014,<sup>2</sup> we
continue focusing on age and gender aspects but, in addition, compiled a corpus of four
different genres, namely social media, blogs, Twitter, and hotel reviews. Except for the
hotel review subcorpus, which is available in English only, all documents are provided
in both English and Spanish. Note that most of the existing research in computational
linguistics [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and social psychology [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] focuses on the English language, and the
question is whether the observed relations pertain to other languages as well.
      </p>
      <p>The remainder of this paper is organised as follows. Section 2 covers the state of
the art, Section 3 describes the corpus and evaluation measures, and Section 4 presents
the approaches submitted by the participants. Sections 5 and 6 discuss the results and draw
conclusions, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The study of how certain linguistic features vary according to the profile of their authors
is a subject of interest for several different areas such as psychology, linguistics and,
more recently, computational linguistics. Pennebaker et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] connected language use
with personality traits, studying how the variation of linguistic characteristics in a text
can provide information regarding the gender and age of its author. Argamon et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
analysed formal written texts extracted from the British National Corpus, combining
function words with part-of-speech features and achieving approximately 80%
accuracy in gender prediction. Other researchers (Holmes and Meyerhoff [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Burger and
Henderson [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) have also investigated how to obtain age and gender information from
formal texts.
      </p>
      <p>
        With the rise of social media, the focus has shifted to other kinds of writing that are more
colloquial, less structured, and less formal, such as blogs or fora. Koppel et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] studied the
problem of automatically determining an author’s gender by proposing combinations
of simple lexical and syntactic features, and achieving approximately 80% accuracy.
Schler et al. [29] studied the effect of age and gender in the writing style in blogs; they
gathered over 71,000 blogs and obtained a set of stylistic features like non-dictionary
words, parts-of-speech, function words and hyperlinks, combined with content features,
such as word unigrams with the highest information gain. They obtained an accuracy
of about 80% for gender identification and about 75% for age identification. They
modeled age in three classes: 10s (13-17), 20s (23-27) and 30s (33-47). They demonstrated
that language features in blogs correlate with age, as reflected in, for example, the use
of prepositions and determiners. Goswami et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] added some new features such as slang
words and the average length of sentences, improving accuracy to 80.3% in age group
identification and to 89.2% in gender detection.
      </p>
      <p>
        It is to be noted that the previously described studies were conducted with texts of at
least 250 words. The effect of data size is known, however, to be an important factor in
machine learning algorithms of this type. In fact, Zhang and Zhang [34] experimented
with short segments of blog posts, specifically 10,000 segments with 15 tokens per
segment, and obtained 72.1% accuracy for gender prediction, as opposed to more than 80%
in the previous studies. Similarly, Nguyen et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] studied the use of language and
age among Dutch Twitter users, where the documents are really short, with an average
length of less than 10 terms. They modelled age as a continuous variable (as they had
previously done in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]), and used an approach based on logistic regression. They also
measured the effect of the gender in the performance of age identification,
considering both variables as inter-dependent, and achieved correlations up to 0.74 and mean
absolute errors between 4.1 and 6.8 years.
      </p>
      <p>
        One common problem when investigating author profiling is the need to obtain
labelled data for the authors, that is, their age and gender. Studies in classical literature
deal with a small number of well-known authors, where manual labelling can easily be
applied. However, at the scale of current social media data this is a more
difficult task, which should be automated. In some cases, researchers manually label
the collection [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] with some risk of bias. In other cases, as in the vast majority of
the aforementioned studies, researchers took into account information provided by the
authors themselves. For example, in blog platforms, the contributors self-specify their
profiles. This is the case for Peersman et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] who retrieved a dataset from Netlog,<sup>3</sup>
where authors report their gender and exact age, and Koppel et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], who retrieved
the dataset from Blogspot.<sup>4</sup> This is likely to introduce some noise to the evaluation set,
but it also reflects the realistic state of the available data.
      </p>
      <p>The task of obtaining author profiles is attracting growing interest in the scientific
community, as can be seen in the number of related shared tasks organised in the last
two years: a) the shared task on Native Language Identification at the BEA-8 Workshop at
NAACL-HLT 2013;<sup>5</sup> b) the task on Computational Personality Recognition (WCPR) at
ICWSM 2013<sup>6</sup> and at ACM Multimedia 2014;<sup>7</sup> and c) the task on Author Profiling at
PAN 2013 and PAN 2014.</p>
      <p>
        With respect to the task on Author Profiling at PAN 2013 [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], most of the
participants used combinations of style-based features such as frequency of punctuation
marks, capital letters, quotations, and so on, together with POS tags and content-based
features such as Latent Semantic Analysis, bag-of-words, TF-IDF, dictionary-based
words, topic-based words, and so on. It is worth mentioning the usage of second order
representations based on relationships between documents and profiles by the winner
of the PAN-AP 2013 task [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and the use of collocations by the winner of the English
task [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Last but not least, the interest in different aspects of author profiling is also evident on the
Kaggle platform,<sup>8</sup> where companies and research departments shared their needs and
independent researchers joined challenges such as Psychopathy Prediction Based on Twitter
Usage,<sup>9</sup> Personality Prediction Based on Twitter Stream,<sup>10</sup> or Gender Prediction from
Handwriting.<sup>11</sup> This shows the rising interest of industry in author profiling.</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Framework</title>
      <p>In this section we describe the construction of the corpus, covering its particular properties,
challenges, and novelties. Finally, the evaluation measures are described.</p>
      <sec id="sec-3-1">
        <title>Corpus</title>
        <p>In order to study how the different author profiling approaches apply to different genres,
we have built a corpus with four different genres: social media, blogs, Twitter, and hotel
reviews. The respective subcorpora cover English and Spanish, with the exception of
the hotel reviews, which have been provided in English only. The corpus documents
are encoded as XML files, one per author, with the contents between &lt;document&gt; tags.
Each author is labeled with age and gender information. For labeling age, instead of the
three age classes a) 10s (13-17); b) 20s (23-27); c) 30s (33-47) used in PAN-AP 2013,
this year we opted for modelling age in a more fine-grained way and considered the
following classes: a) 18-24; b) 25-34; c) 35-49; d) 50-64; e) 65+.</p>
        <p>3 http://www.netlog.com
4 http://blogspot.com
5 https://sites.google.com/site/nlisharedtask2013/
6 http://mypersonality.org/wiki/doku.php?id=wcpr13
7 https://sites.google.com/site/wcprst/home/wcpr14
8 http://www.kaggle.com/
9 http://www.kaggle.com/c/twitter-psychopathy-prediction
10 http://www.kaggle.com/c/twitter-personality-prediction
11 http://www.kaggle.com/c/icdar2013-gender-prediction-from-handwriting</p>
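        <p>The five age classes of this edition can be illustrated with a small lookup. The following is only a minimal sketch: the names AGE_CLASSES and age_to_class are hypothetical and not part of the task material, and rejecting ages below 18 is an assumption made here because no class is defined for them.</p>
        <preformat><![CDATA[
```python
# Hypothetical helper, not part of the PAN 2014 material.
# The class labels are the ones listed in the corpus description.
AGE_CLASSES = ["18-24", "25-34", "35-49", "50-64", "65+"]
_UPPER_BOUNDS = [24, 34, 49, 64]  # inclusive upper bound of the first four classes


def age_to_class(age):
    """Map an age in years to its age class label."""
    if age < 18:
        # Assumption: ages below 18 fall outside the defined classes.
        raise ValueError("no age class is defined below 18")
    for label, upper in zip(AGE_CLASSES, _UPPER_BOUNDS):
        if age <= upper:
            return label
    return AGE_CLASSES[-1]  # open-ended class for 65 and above
```
]]></preformat>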
        <p>As in the previous edition, each subcorpus was split into three parts for training,
early birds, and test respectively.</p>
        <p>Social Media. We have built the social media subcorpus by selecting a part of the
PAN-AP-13 corpus. We have selected those authors with an average number of words
in their posts greater than 100. We also manually reviewed the documents in order to
remove those authors who seem to be fake profiles such as bots, for example, authors
selling the same product (e.g., mobiles, ads) in most of their posts or authors with a high
amount of text reuse (e.g., teenagers sharing poetry or homework). The final distribution
of the number of authors is shown in Table 1. The social media subcorpus is balanced
by gender, so each gender accounts for half of the authors.</p>
        <p>Blogs. The objective of collecting blogs is to build a gold standard for author profiling
in this specific genre. To achieve this objective, we manually selected and annotated
the documents. Firstly, we looked for public LinkedIn profiles which share a personal
blog URL. We verified that the blog exists, that it is written in one of the languages we are
interested in (English or Spanish), and that it is updated by only one person who
is easily identifiable. We discarded organizational blogs when we were not sure that the
blog was updated by the person identified in the LinkedIn profile. Secondly, we looked
for age information. In some cases the birth date is published in the user’s profile, but
in most cases it is not, so we looked for the degree starting date in the education section.
We used the information shown in Table 2 to figure out the age range. We discarded
users whose education dates were not clear. Thirdly, if we could figure out the age, we
identified the gender by the user’s photograph and name. Again, for those cases where
the gender information was not clear, we discarded the user. Finally, this process was
done by two independent annotators and a third one decided in case of disagreement.
For each blog, we provided up to 25 posts. We provided the contents obtained from the
RSS feed, but we allowed users to download the full text from the permalink.</p>
        <p>The final distribution of the number of authors is shown in Table 3. The blogs
subcorpus is balanced by gender, so each gender accounts for half of the authors.</p>
        <p>Twitter. We manually selected and annotated the documents, following the same
methodology as for the blogs. We built this subcorpus in collaboration with RepLab,<sup>12</sup>
where the main goal of author profiling, viewed in the context of reputation monitoring
on Twitter, is to decide how influential a given user is in the domain to which the entity
under study belongs. This includes determining the type of author (e.g., journalist,
stakeholder, professional) and their degree of influence on opinions within the domain.
For the shared PAN-RepLab author profiling task, 131 Twitter profiles from several
domains (energy, environmental, banking, automotive, and Corporate Social
Responsibility sectors) were annotated with age and gender. The profiles were selected from
the RepLab 2013 corpus and from a list of influential authors provided by the online
division of a leading Public Relations consultancy (Llorente &amp; Cuenca).<sup>13</sup> Note that
balancing the list of profiles by age and gender turned out to be a challenging task, because
influential Twitter authors in the considered economic domains tend to be male and of
quite a narrow age range (35-49). In addition to age and gender, tweets in RepLab were
manually tagged by reputation experts with a) type of author; and b) opinion-maker
labels (Influencer, Non-influencer, and Undecidable).</p>
        <p>
          For more details on the RepLab 2014 author profiling data set please refer to [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
Due to Twitter's terms of service, we provided the tweet URLs so that participants could
download them. For each Twitter profile, we provided up to 1000 tweets. The final
distribution of the number of authors is shown in Table 4. The Twitter subcorpus is
balanced by gender, so half of the authors are male and the other half are female.
        </p>
        <p>12 http://nlp.uned.es/replab2014
13 http://www.llorenteycuenca.com/</p>
        <p>Hotel Reviews. To study the applicability of author profiling approaches to the
review genre, we have compiled the Webis-TripAd-13 corpus, a large subset of hotel
reviews from the PAN 2014 author profiling evaluation corpus. The corpus has been
carefully constructed to ensure its quality with regard to text cleanliness and annotation
accuracy.
        </p>
        <p>The Webis-TripAd-13 corpus is derived from another corpus that was originally
used for aspect-level rating prediction [31].<sup>14</sup> The original corpus was crawled from the
hotel review site TripAdvisor<sup>15</sup> in the period of one month from mid-February to
mid-March 2009, and contains 235,793 reviews about 1,850 different hotels. Each review
comprises its author’s user name, the review text, and the date the review was written.
In addition, there are seven numerical aspect ratings and an overall rating score assigned
by the user, which serve as ground truth for aspect-level rating prediction or sentiment
analysis tasks in general. However, the original dataset does not feature age and gender
annotations.</p>
        <p>In order to make this dataset applicable to author profiling and to ensure its quality,
we applied the following four post-processing steps: first, we removed short reviews of
less than 10 words, which were found to be malformed due to parsing errors.
Second, we removed reviews whose text was not found to be English according to
a language detector. Third, since the original dataset does not provide any age and
gender information, we compiled a list of the user names that submitted the reviews and
crawled the corresponding user profiles from the TripAdvisor website. Fourth, given
this metadata, we discarded all reviews written by authors whose age and gender were
not given on their user profile or whose user profile was inactive. Moreover, to ensure
data quality, we reviewed user profiles and reviews with regard to sanity (i.e., whether
the information given made sense). The final Webis-TripAd-13 corpus contains 58,101
reviews and covers six age classes. The distribution of reviews across these classes is
shown in columns 3 and 4 of Table 5.<sup>16</sup></p>
        <p>14 http://times.cs.uiuc.edu/~wang296/Data
15 http://www.tripadvisor.com
16 This version of the corpus has been released at: http://www.webis.de/research/corpora</p>
        <p>To match the requirements of PAN’s author profiling evaluation corpus, we unified
the Webis-TripAd-13 corpus accordingly: to obtain a nearly uniform age class
distribution, we sampled 700 authors from each of the three major classes (25–34, 35–49,
50–64). For the two minor classes (18–24, 65+), however, the number of authors
available was limited by the size of the smaller age class, so that 254 authors (18–24) and
547 authors (65+) remained, respectively. Class 13–17 was discarded completely since
the number of available authors was found to be not representative for evaluation
purposes. The final distribution of the subset of the Webis-TripAd-13 corpus that forms
part of the PAN author profiling evaluation corpus is shown in Table 5, columns 7 and 8.</p>
        <p>For evaluating the participants’ approaches we have used accuracy. More specifically, we
calculated the ratio of the number of authors correctly predicted to the total
number of authors. We calculated accuracy separately for each subcorpus, language, gender,
and age class. Moreover, we calculated a combined accuracy for the joint identification of age and
gender. The final score used to rank the participants is the average of the combined
accuracies for each subcorpus and language.</p>
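        <p>The accuracy measures can be sketched as follows. This is an illustrative sketch only, not the official evaluation script: the function names evaluate and final_score and the data layout (one age and gender pair per author id) are hypothetical. It assumes that a joint prediction counts as correct only when both age and gender are correct, and that a missing prediction counts as wrong.</p>
        <preformat><![CDATA[
```python
def evaluate(truth, predicted):
    """Return age, gender and joint accuracy for one subcorpus and language.

    truth and predicted map an author id to an (age_class, gender) pair.
    Assumption: authors without a prediction are counted as incorrect.
    """
    total = len(truth)
    if total == 0:
        raise ValueError("no authors to evaluate")
    age_ok = gender_ok = joint_ok = 0
    for author, (age, gender) in truth.items():
        pred_age, pred_gender = predicted.get(author, (None, None))
        age_hit = pred_age == age
        gender_hit = pred_gender == gender
        age_ok += age_hit
        gender_ok += gender_hit
        # Assumption: joint identification requires both labels to be correct.
        joint_ok += age_hit and gender_hit
    return {
        "age": age_ok / total,
        "gender": gender_ok / total,
        "joint": joint_ok / total,
    }


def final_score(joint_by_part):
    """Average the joint accuracies over all (subcorpus, language) parts."""
    return sum(joint_by_part.values()) / len(joint_by_part)
```
]]></preformat>
        <p>For example, calling final_score with the joint accuracies of every subcorpus and language pair yields the single figure by which systems would be ranked under these assumptions.</p>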
        <p>
          We computed statistical significance of performance differences between systems
using approximate randomisation testing [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].<sup>17</sup> As noted by Yeh [33], for comparing
output from classifiers, frequently used statistical significance tests such as paired
t-tests make assumptions that do not hold for precision scores and f-scores. Approximate
randomisation testing does not make these assumptions and can handle complicated
distributions as well as normal distributions. We did a pairwise comparison of the
accuracies of all systems and, with p &lt; 0.05, we consider the systems to be significantly
different from each other. The complete set of statistical significance tests is illustrated
in Appendix A.
        </p>
        <p>17 We used the implementation by Vincent Van Asch available from the CLiPS website:
http://www.clips.uantwerpen.be/scripts/art</p>
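        <p>The idea behind approximate randomisation testing for two systems evaluated on the same authors can be sketched as below. This is a generic textbook-style sketch and not the CLiPS implementation used for the task; the function name, the number of trials, the fixed seed and the add-one smoothing of the p-value are all assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import random


def approximate_randomisation(correct_a, correct_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the accuracy difference of two systems.

    correct_a and correct_b are equally long lists of 0/1 flags, one per
    author, saying whether system A (resp. B) predicted that author correctly.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")
    # Comparing counts is equivalent to comparing accuracies (same denominator).
    observed = abs(sum(correct_a) - sum(correct_b))
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        sum_a = sum_b = 0
        for a, b in zip(correct_a, correct_b):
            # Under the null hypothesis the two outputs are exchangeable,
            # so swap them for this author with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            sum_a += a
            sum_b += b
        if abs(sum_a - sum_b) >= observed:
            at_least_as_extreme += 1
    # Assumption: add-one smoothing so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```
]]></preformat>
        <p>Under this sketch, two systems would be called significantly different when the returned value is below 0.05.</p>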
        <p>In the case of age identification we also measured the average and standard deviation of
the distance between the predicted and the true class. We define the distance between
classes as the number of hops between them, with the maximum distance equal to 4 in
the case of the most distant ones (18-24 and 65+). In case the participant did not provide a
prediction, we added 1 to the maximum distance, penalising this missing value with a
distance of 5. We also calculated the total time needed to process the test data, in order
to investigate the applicability in a real-world setting.</p>
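        <p>The hop distance between age classes can be sketched as follows. The names used here are hypothetical, a missing prediction is represented by None, and the use of the population standard deviation is an assumption, since the text does not state which variant was used.</p>
        <preformat><![CDATA[
```python
from statistics import mean, pstdev

AGE_CLASSES = ["18-24", "25-34", "35-49", "50-64", "65+"]
# Maximum distance (4 hops) plus 1, used when no prediction was provided.
MISSING_PENALTY = len(AGE_CLASSES)


def class_distance(true_class, predicted_class):
    """Number of hops between two age classes; 5 for a missing prediction."""
    if predicted_class is None:
        return MISSING_PENALTY
    return abs(AGE_CLASSES.index(true_class) - AGE_CLASSES.index(predicted_class))


def distance_summary(pairs):
    """Average and standard deviation of distances over (true, predicted) pairs.

    Assumption: population standard deviation (pstdev).
    """
    distances = [class_distance(true, pred) for true, pred in pairs]
    return mean(distances), pstdev(distances)
```
]]></preformat>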
      </sec>
      <sec id="sec-3-2">
        <title>Software Submissions</title>
        <p>
          For the second time, we invited software submissions instead of run submissions.
With software submissions, participants are asked to submit executables of their
author profiling software instead of just the output (i.e., runs) of their software on a
given test set. Our rationale is to increase the sustainability of our shared task
and to allow for the re-evaluation of approaches to Author Profiling later on, for
example, on future evaluation corpora. To facilitate software submissions, we develop the
TIRA experimentation platform [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ], which makes handling software submissions
at scale as simple as handling run submissions. Using TIRA, participants deploy their
software into virtual machines at our site, which allows us to keep them in a running
state [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Overview of the Submitted Approaches</title>
      <p>
        Ten teams participated in the Author Profiling task. Eight of them submitted a
notebook paper, one further team (liau14) provided us with a description of its approach, and
castillojuarez14 did not comment on any changes with respect to last year’s
system [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Pre-processing. Various participants cleaned the HTML and XML to obtain plain
text [
        <xref ref-type="bibr" rid="ref13 ref18 ref19 ref4">18, 19, 4, 13, 32</xref>
        ]. One participant [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] removed URLs, user mentions and hashtags
from the Twitter texts. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], participants carried out case conversion, deleted invalid
characters and multiple white spaces, and similarly in [32] where the participants also
escaped invalid characters. Only in [30] and [32] participants performed tokenisation,
whereas in [32] they studied the effect of subset selection, and in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] they tried to
remove spam bots by deleting contents with a high percentage of the % character.
      </p>
      <p>
        Features. Many participants [
        <xref ref-type="bibr" rid="ref13 ref18 ref19 ref20 ref4">20, 19, 13, 4, 32, 18</xref>
        ] and (liau14) considered different
kinds of stylistic features. For example, frequencies of different punctuation marks were
used in [
        <xref ref-type="bibr" rid="ref13 ref18 ref20 ref4">13, 20, 4, 18, 32</xref>
        ], size of sentences, words that appear once and twice or the
use of deflections in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], the number of characters, words and sentences in [32]. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]
participants measured the number of posts per user, the frequency of capital letters and
capital words, whereas in [32] participants measured the correctness, cleanliness and
diversity of the texts. Only in [32] and [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] participants took advantage of the HTML
information, using the occurrence of tags such as img, href or br. Different
readability features were used in [
        <xref ref-type="bibr" rid="ref13 ref19 ref20 ref4">20, 19, 13, 4, 32</xref>
        ]. For example, Automated Readability
Index [
        <xref ref-type="bibr" rid="ref13 ref19">19, 13</xref>
        ], Coleman-Liau Index [
        <xref ref-type="bibr" rid="ref13 ref19">19, 13</xref>
        ], Rix Readability Index [
        <xref ref-type="bibr" rid="ref13 ref19">19, 13</xref>
        ], Gunning
Fog Index [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Flesch-Kincaid [32]. A lexical analysis was carried out in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
where participants employed parts-of-speech as features together with the identification
of proper nouns or words with character flooding (e.g., hellooooo). The occurrence of
emoticons was used in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and liau14.
      </p>
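      <p>To make some of the stylistic features more concrete, the sketch below computes a punctuation ratio, a count of words with character flooding, and the Automated Readability Index. It is not taken from any participant's system: the function name, the regular expressions, and the crude word and sentence splitting are assumptions made for illustration. The index uses its standard formula, 4.71 times characters per word plus 0.5 times words per sentence minus 21.43.</p>
      <preformat><![CDATA[
```python
import re
import string

# Assumption: "character flooding" means the same character three or more
# times in a row, as in "hellooooo".
_FLOODING = re.compile(r"(\w)\1{2,}")


def stylistic_features(text):
    """Compute a few simple style features for one document."""
    words = re.findall(r"\w+", text)
    # Crude sentence splitting on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    n_sentences = max(len(sentences), 1)
    n_chars = sum(len(word) for word in words)
    punctuation = sum(ch in string.punctuation for ch in text)
    return {
        "punctuation_ratio": punctuation / max(len(text), 1),
        "flooded_words": sum(1 for word in words if _FLOODING.search(word)),
        "automated_readability_index": (
            4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43
        ),
    }
```
]]></preformat>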
      <p>
        With respect to content features, in [
        <xref ref-type="bibr" rid="ref18">30, 18</xref>
        ] and (liau14) participants modeled the
language with n-grams or bag-of-words. In [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] they extracted topic words such as
money, home, smartphone, games, sports, job, marketing, etc. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] participants used
MRC and LIWC features to extract frequency of words related to different
psycholinguistic concepts such as familiarity, concreteness, imagery, motion, emotion, religion,
and so on. Some participants used dictionaries to differentiate words per subcorpus and
class [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], identify lexical errors [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], foreign words [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or specific phrases such as my
husband or my wife [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and liau14.
      </p>
      <p>
        Specific features were used in [32], where participants obtained features employed
in information retrieval (IR) such as the cosine similarity or the Okapi BM25. Finally,
in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] participants estimated the sentiment of the sentences and in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] participants
used a second order representation based on relationships among terms, documents,
profiles and subprofiles.
      </p>
      <p>
        Classification approaches. All the participants approached the task as a machine
learning task. For example, logistic regression was used in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and liau14, and also
in [32] where participants used a different algorithm per subcorpus, for instance
LogitBoost, rotation forest, multi-class classifier, multilayer perceptron and simple logistic.
In [30] participants used multinomial Naïve Bayes, in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] libLINEAR, in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] random
forests, in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] support vector machines and in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] decision tables. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] participants
implemented their own frequency-based prediction function.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation and Discussion of the Submitted Approaches</title>
      <p>We divided the evaluation into two steps, providing an early bird option for those
participants who wanted to receive some feedback. There were 7 early bird submissions and
eventually 10 for the final evaluation. We show results separately for the evaluation on each
corpus part and for each language. Results are given as accuracy of identification of age,
gender, as well as the joint identification of age and gender. Results for early birds are
shown in Tables 6 to 9, whereas final results are shown in Tables 10 to 13. For the final
evaluation, a baseline was provided for comparison purposes. This baseline considered
the 1,000 most frequent character trigrams. Some participants did not run their systems
on any of the subcorpora.</p>
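      <p>The feature side of such a baseline can be sketched as below. Only the use of the 1,000 most frequent character trigrams comes from the text; the function names are hypothetical, and the classifier trained on these counts is not specified in the text and is therefore left out.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def char_trigrams(text):
    """All overlapping character trigrams of a text, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_vocabulary(documents, size=1000):
    """Select the most frequent character trigrams over a set of documents."""
    counts = Counter()
    for document in documents:
        counts.update(char_trigrams(document))
    return [gram for gram, _ in counts.most_common(size)]


def vectorise(text, vocabulary):
    """Represent a text by the counts of the vocabulary trigrams it contains."""
    counts = Counter(char_trigrams(text))
    return [counts[gram] for gram in vocabulary]
```
]]></preformat>
      <p>In such a setup the vocabulary would be built on the training documents only and then reused unchanged to vectorise the test documents.</p>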
      <p>
        As can be seen in the early bird results, the best ones were obtained for Twitter,
both in English and Spanish, with no big differences between the two languages. In
the case of blogs, there are similar results for gender identification, but for age and joint
identification the best results were obtained on the Spanish partition. The English blogs
subcorpus is the one with the lowest results in age and joint identification, together
with social media in English and hotel reviews in joint identification. For social media,
the results are better in Spanish than in English for all the predictions. Spanish social
media got one of the highest accuracies in gender identification, together with hotel
reviews and Twitter texts. With respect to hotel reviews, gender accuracies are close
to Twitter, but age and joint identification are among the lowest of all subcorpora.
The highest values were obtained by shrestha14 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] on Spanish Twitter with 0.8846 in
gender identification, 0.6923 in age identification and 0.6154 in joint identification of
both age and gender.
      </p>
      <sec id="sec-5-1">
        <title>English</title>
        <p>
          As with the early birds, the best results in the final evaluation were achieved for
Twitter. Here, gender identification accuracies are higher in English whereas age and
joint identification accuracies are higher in Spanish. In any case, all the results are much lower
than the early bird ones, where the size of the set was approximately 10%. With
respect to blogs, the best results in gender identification were achieved in English
and those in age identification in Spanish. Although joint identification obtained similar
values, in English there are more participants with higher results. The lowest accuracy
for gender identification was reported for the Spanish blogs, with values very close to
random chance. These results are even worse than the early bird ones. Most of
the participants obtained better results for English than in the early birds, except
marquardt14 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], who obtained worse results. Results in social media and hotel reviews are
very similar to the early bird ones, probably because of the large number of authors.
The results for blogs are very similar to those for social media in age identification. The
lowest results in joint identification were achieved in English social media and in hotel
reviews, where the lowest results in age identification were also obtained. The
lowest results in gender identification were achieved in English blogs, with values very
close to random chance. In contrast, the highest results for gender
identification were achieved in hotel reviews and on Twitter. The high ranking of the baseline
approach in hotel reviews is noteworthy, with a gender identification accuracy of 0.6626
and a joint identification result in mid-ranking.
        </p>
        <p>
          The highest effectiveness values were achieved by liau14 in gender identification
on English Twitter (accuracy of 0.7338) and by shrestha14 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] in age identification
on Spanish Twitter (accuracy of 0.6111) as well as in joint identification on Spanish
Twitter (accuracy of 0.4333). It is difficult to relate approaches
to results, but looking at the three highest accuracies per subcorpus and task (gender,
age and joint identification), it seems that overall simple content features such as
bag-of-words or word n-grams achieve the best results. Specifically, the bag-of-words used
by liau14, the word n-grams used by shrestha14 [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and the term vector model used by
villenaroman14 [30] achieved the best results for almost all genres. Also noteworthy is the
contribution of the IR features used by weren14 [32] in all the identifications in English
blogs, joint identification in English social media, age identification in Spanish
Twitter, Spanish social media and hotel reviews, and gender identification in Spanish blogs.
The mix of content and style features of
marquardt14 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] gave good results in gender identification in Spanish Twitter and in
the three identifications in Spanish blogs. The second ranking in gender identification in
Spanish social media was obtained with the character n-grams baseline, but its low rankings in
the other subcorpora suggest that character n-grams are not
a good approach for author profiling in general. The overall best performance was
obtained by lopezmonroy14 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] employing a second order representation based on terms.
Table 14 shows the joint identification accuracies per subcorpus and their average.
        </p>
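        <p>The accuracy measures used throughout this section can be sketched as follows in Python. This is a minimal illustration, assuming that each author has exactly one gender label and one age class and that the average over subcorpora is an unweighted mean; the function names, the author identifiers and the class labels in the toy data are hypothetical.</p>
        <preformat>
```python
def accuracies(predictions, truth):
    """Gender, age and joint accuracy for one subcorpus.

    Both arguments map an author id to a (gender, age_class) pair.
    Authors without a prediction count as errors.
    """
    n = len(truth)
    gender_hits = age_hits = joint_hits = 0
    for author, (true_gender, true_age) in truth.items():
        pred_gender, pred_age = predictions.get(author, (None, None))
        gender_ok = pred_gender == true_gender
        age_ok = pred_age == true_age
        gender_hits += gender_ok
        age_hits += age_ok
        # The joint prediction is correct only if both labels are correct.
        joint_hits += gender_ok and age_ok
    return {"gender": gender_hits / n, "age": age_hits / n, "joint": joint_hits / n}


def average_joint(per_subcorpus):
    """Unweighted mean of the joint accuracies over the subcorpora."""
    values = [scores["joint"] for scores in per_subcorpus.values()]
    return sum(values) / len(values)


# Toy data (hypothetical authors and labels).
truth = {"a1": ("female", "25-34"), "a2": ("male", "18-24"),
         "a3": ("male", "35-49"), "a4": ("female", "65+")}
pred = {"a1": ("female", "25-34"), "a2": ("male", "25-34"),
        "a3": ("female", "35-49"), "a4": ("male", "18-24")}
scores = accuracies(pred, truth)
```
        </preformat>
        <p>Joint accuracy can never exceed the smaller of the two single-task accuracies, which is why the joint values reported above are consistently the lowest.</p>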
      </sec>
      <sec id="sec-5-2">
        <title>Spanish</title>
        <p>From Table 14, which shows the joint identification accuracies per subcorpus and their average,
we can infer that: a) the best results were obtained on Twitter, possibly
due to the higher number of documents (tweets) per author in comparison to the other
genres and quite likely also to the spontaneous way people express themselves; b) the
lowest results were achieved in English social media and hotel reviews, due to the lowest
results in gender identification in the first case and in age identification in the second.</p>
        <p>Figure 1 shows the average and standard deviation of the distances between predicted
and true classes per subcorpus. The highest average distance is produced for
hotel reviews, with a value of 1.69. The lowest average distance and standard deviation are
produced for Twitter. The similarity in distances between the social media subcorpora
and the Spanish blogs is noteworthy. The complete list of distances among participants
for each subcorpus is shown in Appendix B.</p>
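        <p>One way to compute such a distance is sketched below in Python. The sketch assumes that the age classes are ordered and that the distance between two classes is the number of steps separating them; both the class labels and this definition are assumptions of the illustration rather than a description of the organisers' script.</p>
        <preformat>
```python
from statistics import mean, pstdev

# Illustrative ordered age classes; the labels are an assumption of this sketch.
AGE_CLASSES = ["18-24", "25-34", "35-49", "50-64", "65+"]


def class_distance(predicted, true):
    """Number of class steps between the predicted and the true age class."""
    return abs(AGE_CLASSES.index(predicted) - AGE_CLASSES.index(true))


def distance_summary(pairs):
    """Mean and population standard deviation over (predicted, true) pairs."""
    distances = [class_distance(p, t) for p, t in pairs]
    return mean(distances), pstdev(distances)


# Toy usage with hypothetical predictions for four authors.
pairs = [("18-24", "18-24"), ("25-34", "18-24"),
         ("65+", "18-24"), ("35-49", "50-64")]
avg, std = distance_summary(pairs)
```
        </preformat>
        <p>Unlike accuracy, this measure distinguishes a prediction in a neighbouring class from one several classes away.</p>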
        <p>Appendix A details the statistical significance of all pairwise system comparisons.
As can be seen in Table A17, although lopezmonroy14 is first in the general
ranking, this system is not significantly different from shrestha14,
villenaroman14 and weren14. All systems are significantly different from the baseline, although
weren14, villenaroman14 and marquardt14 form a group close to the baseline. It is noteworthy that
most of the systems are statistically indistinguishable on English social media,
Spanish Twitter, and blogs (both languages).</p>
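        <p>The significance test behind these comparisons is not described in this section. Purely as an illustration of how two systems can be compared on the same test authors, the following Python sketch implements a generic paired approximate randomization test on per-author correctness flags. The function name, the number of trials and the fixed seed are assumptions of this sketch and are not taken from the evaluation setup.</p>
        <preformat>
```python
import random


def approximate_randomization(correct_a, correct_b, trials=10000, seed=0):
    """Two-sided p-value for the accuracy difference of two systems.

    correct_a and correct_b are equally long lists of 0/1 flags saying
    whether each system labelled the same test author correctly.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    observed = abs(sum(correct_a) - sum(correct_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        total_a = total_b = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() >= 0.5:  # swap the two outcomes for this author
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (at_least_as_extreme + 1) / (trials + 1)
```
        </preformat>
        <p>A small p-value indicates that the observed accuracy difference is unlikely if the two systems were interchangeable; a large one corresponds to the case reported above as statistically indistinguishable.</p>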
        <p>With respect to age identification, all systems are significantly different from the
baseline except ashok14 (this team did not participate in the Spanish task). For some
pairs of systems the differences are not statistically significant, such as
lopezmonroy14 and liau14 or weren14 and villenaroman14. In blogs most of the systems are
indistinguishable from each other but significantly different from the baseline. On the other subcorpora,
most of the systems are also different from the baseline. Looking at the accuracies, the
results show that most of the systems work significantly better than the baseline in age
identification.</p>
        <p>With respect to gender identification, all the systems are statistically different
from the baseline, but lopezmonroy14, marquardt14, shrestha14, villenaroman14 and
weren14 form a closer group. In English social media, English and Spanish blogs and
Spanish Twitter, most of the systems are not significantly different from each other.
Although all the systems are different from the baseline, most of them are statistically
indistinguishable from one another. Therefore, we cannot conclude that the systems perform better or
worse than the baseline in gender identification. For example, in English social media
all systems that are different from the baseline performed better in gender identification, and
on Twitter most of them performed better, but for Spanish social media the opposite
happened and all the systems performed worse. The same happened in hotel
reviews (in English), where most of the systems performed worse.</p>
        <p>
          Table 15 shows the runtime results. The fastest team was liau14 with
bag-of-words features. With regard to the smallest data sets (Twitter and blogs), the teams fall into
two groups depending on their runtime. The fastest teams utilised bag-of-words
(liau14), word n-grams [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], style features [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], style and content features [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] or, in
some cases, the second order features of [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. For the largest subcorpora, such as
social media and reviews, the differences among runtimes are more evident. The fastest
approaches also utilised simple content features and in some cases stylistic ones. The slowest
ones, by a wide margin, utilised IR-based features [32], parts-of-speech [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] or
combinations of style and content-based features [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. One of the slowest approaches [30]
utilised term vectors, but the team reported that the low runtime performance was due
to the Weka library.
        </p>
        <p>
          We executed the PAN-AP 2013 approaches for gender identification on the social
media documents of PAN-AP 2014 (social media was the data used in PAN-AP 2013). A
comparison for age identification was not possible due to the different age classes in
PAN-AP 2013 and PAN-AP 2014. Most of the approaches failed at execution time, so
we only show those which could be executed. The only team with results for both years
is lopezmonroy (identified as pastor in PAN-AP 2013, where it was the team obtaining the best
performance). Table 16 shows the comparison. In English, although the best
result was obtained by lopezmonroy13 [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], the majority of PAN-AP 2014 approaches
obtained better results than those of PAN-AP 2013. In Spanish, results are more balanced
between teams of the two years, although the two best results were obtained
by cagnina13 and haro13 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], respectively. The high number of approaches below the baseline in
Spanish is noteworthy, as are the higher accuracies obtained in Spanish than in
English (Spanish being a gender-marked language). With respect to the team that participated in both
years, lopezmonroy13 achieved better results than lopezmonroy14 in English but not in
Spanish.
        </p>
        <p>
          In this paper we present the results of the 2nd International Author Profiling Task at
PAN-2014 within CLEF-2014. Given four different genres, namely social media, blogs,
Twitter, and hotel reviews, in two languages, English and Spanish, the 10 participants
of the task had to identify the gender and age of anonymous authors.
        </p>
        <p>
          The participants used several different features to approach the problem:
content-based (bag of words, word n-grams, term vectors, named entities, dictionary words,
slang words, contractions, sentiment words, and so on) and stylistic
(frequencies, punctuation, POS, HTML use, readability measures and many different
statistics). One participant [32] also combined many different IR-based features such as
cosine similarity or Okapi BM25. This evaluation showed that good results were
obtained by approaches which used simple content features (except the second order
representation in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and the IR-based features in [32]), for example bag-of-words
(liau14), word n-grams [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] and term vectors [30]. Character n-grams proved
not to be a good approach for author profiling in general. The best results were obtained with a
second order representation based on relationships among terms, documents, profiles
and subprofiles [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>We draw the following conclusions with respect to the different corpus parts: a) the
highest accuracies were achieved on Twitter. We think this is due to the
larger number of documents (tweets) per profile and the more spontaneous way
of communicating in this social medium; b) the lowest results were obtained in English
social media and hotel reviews, due to the lowest results in gender and age identification,
respectively; c) the highest distance between predicted and true classes in age
identification occurs in hotel reviews. Further analysis is needed in order to understand whether, for
instance, there are cases of deceptive opinions.</p>
        <p>Acknowledgements. The PAN task on author profiling has been organised in the
framework of the WIQ-EI IRSES project (Grant No. 269180) within the FP 7 Marie
Curie People Framework of the European Commission. We would like to thank Atribus
by Corex for sponsoring the award for the winning team. We thank Julio Gonzalo,
Jorge Carrillo and Damiano Spina from UNED for helping with the Twitter
subcorpus. The work of the first author was partially funded by Autoritas Consulting SA and
by Ministerio de Economía y Competitividad de España under grants ECOPORTUNITY
IPT-2012-1220-430000 and CSO2013-43054-R. The work of the second author was carried out in
the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts:
Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on
Multimodal Interaction in Intelligent Systems.
        </p>
        <p>
          29. Jonathan Schler, Moshe Koppel, Shlomo Argamon, and James W. Pennebaker. Effects of age and gender on blogging. In AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pages 199–205. AAAI, 2006.
30. Julio Villena-Román and José-Carlos González-Cristóbal. DAEDALUS at PAN
2014: Guessing Tweet Author’s Gender and Age—Notebook for PAN at CLEF
2014. In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
31. Hongning Wang, Yue Lu, and Chengxiang Zhai. Latent Aspect Rating Analysis
on Review Text Data: A Rating Regression Approach. In Proceedings of the 16th
ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pages 783–792, 2010.
32. Edson R.D. Weren, Viviane P. Moreira, and José P.M. de Oliveira. Exploring
Information Retrieval features for Author Profiling—Notebook for PAN at CLEF
2014. In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
33. Alexander Yeh. More accurate tests for the statistical significance of result
differences. In Proceedings of the 18th Conference on Computational Linguistics
- Volume 2, pages 947–953, Stroudsburg, PA, USA, 2000. Association for
Computational Linguistics.
34. Cathy Zhang and Pengyu Zhang. Predicting gender from blog posts. Technical
report, Technical Report. University of Massachusetts Amherst, USA, 2010.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Appendix A</title>
    </sec>
    <sec id="sec-7">
      <title>Pairwise Comparison of All Systems</title>
      <p>For all subsequent tables, the significance levels are encoded with the symbols =, *, ** and ***.</p>
      <preformat>
                baker  castillojuarez  liau  lopezmonroy  marquardt  mechti  shrestha  villenaroman  weren  baseline
ashok           ***    =               ***   ***          ***        ***     ***       ***           ***    ***
baker                  ***             =     =            =          =       *         =             =      =
castillojuarez                         ***   ***          ***        ***     ***       ***           ***    ***
liau                                         =            =          =       =         =             =      =
lopezmonroy                                               =          =       =         =             =      =
marquardt                                                            =       =         =             =      =
mechti                                                                       *         =             =      =
shrestha                                                                               =             =      **
villenaroman                                                                                         =      =
weren                                                                                                       =
      </preformat>
      <p>Table A18. Significance of accuracy differences between system pairs for joint identification in
English social media.</p>
      <preformat>
                baker  castillojuarez  liau  lopezmonroy  marquardt  mechti  shrestha  villenaroman  weren  baseline
ashok           =      =               =     =            =          =       =         =             =      =
baker                  =               **    =            =          =       **        **            **     =
castillojuarez                         **    =            =          =       **        **            *      =
liau                                         =            =          =       =         =             =      **
lopezmonroy                                               =          =       =         =             =      =
marquardt                                                            =       =         =             =      =
mechti                                                                       =         *             =      =
shrestha                                                                               =             =      *
villenaroman                                                                                         =      **
weren                                                                                                       *
      </preformat>
      <p>Table A19. Significance of accuracy differences between system pairs for joint identification in
Spanish social media.</p>
      <p>Table A17. Significance of accuracy differences between system pairs for joint identification in
the entire corpus.</p>
      <preformat>
                baker  castillojuarez  liau  lopezmonroy  marquardt  mechti  shrestha  villenaroman  weren  baseline
ashok           ***    =               ***   ***          ***        ***     ***       ***           ***    ***
baker                  ***             =     =            =          =       *         =             =      =
castillojuarez                         ***   ***          ***        ***     ***       ***           ***    ***
liau                                         =            =          =       =         =             =      =
lopezmonroy                                               =          =       =         =             =      =
marquardt                                                            =       =         =             =      =
mechti                                                                       *         =             =      =
shrestha                                                                               =             =      *
villenaroman                                                                                         =      =
weren                                                                                                       =
      </preformat>
    </sec>
    <sec id="sec-8">
      <title>Appendix B</title>
    </sec>
    <sec id="sec-9">
      <title>Distances in Age Identification</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Yuridiana</given-names>
            <surname>Aleman</surname>
          </string-name>
          , Nahun Loya, Darnes Vilarino Ayala, and
          <string-name>
            <given-names>David</given-names>
            <surname>Pinto</surname>
          </string-name>
          .
          <article-title>Two Methodologies Applied to the Author Profiling Task-Notebook for PAN at CLEF 2013</article-title>
          . In Forner et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Enrique</given-names>
            <surname>Amigó</surname>
          </string-name>
          , Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, Maarten de Rijke, and
          <string-name>
            <given-names>Damiano</given-names>
            <surname>Spina</surname>
          </string-name>
          .
          <article-title>Overview of RepLab 2014: author profiling and reputation dimensions for Online Reputation Management</article-title>
          .
          <source>In Proceedings of the Fifth International Conference of the CLEF Initiative</source>
          ,
          <year>September 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Shlomo</given-names>
            <surname>Argamon</surname>
          </string-name>
          , Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni.
          <article-title>Gender, genre, and writing style in formal written texts</article-title>
          . TEXT,
          <volume>23</volume>
          :
          <fpage>321</fpage>
          -
          <lpage>346</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Christopher Ian</given-names>
            <surname>Baker</surname>
          </string-name>
          .
          <article-title>Proof of Concept Framework for Prediction-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>John D.</given-names>
            <surname>Burger</surname>
          </string-name>
          , John Henderson, George Kim, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zarrella</surname>
          </string-name>
          .
          <article-title>Discriminating gender on twitter</article-title>
          .
          <source>In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11</source>
          , pages
          <fpage>1301</fpage>
          -
          <lpage>1309</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Linda</given-names>
            <surname>Cappellato</surname>
          </string-name>
          , Nicola Ferro,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Halvey</surname>
          </string-name>
          , and Wessel Kraaij, editors.
          <source>CLEF 2014 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          ,
          <source>ISSN 1613-0073</source>
          , http://ceur-ws.org/Vol-1180/,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Fermin</given-names>
            <surname>Cruz</surname>
          </string-name>
          , Rafa Haro, and
          <string-name>
            <given-names>Javier</given-names>
            <surname>Ortega</surname>
          </string-name>
          .
          <article-title>ITALICA at PAN 2013: An Ensemble Learning Approach to Author Profiling-Notebook for PAN at CLEF 2013</article-title>
          . In Forner et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Pamela</given-names>
            <surname>Forner</surname>
          </string-name>
          , Roberto Navigli, and Dan Tufis, editors.
          <source>CLEF 2013 Evaluation Labs and Workshop - Working Notes Papers</source>
          ,
          23-26 September, Valencia, Spain,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Gollub</surname>
          </string-name>
          , Benno Stein, and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Burrows</surname>
          </string-name>
          .
          <article-title>Ousting Ivory Tower Research: Towards a Web Framework for Providing Experiments as a Service</article-title>
          . In Bill Hersh, Jamie Callan, Yoelle Maarek, and Mark Sanderson, editors,
          <source>35th International ACM Conference on Research and Development in Information Retrieval (SIGIR 12)</source>
          , pages
          <fpage>1125</fpage>
          -
          <lpage>1126</lpage>
          . ACM,
          <year>August 2012</year>
          .
          <source>ISBN 978-1-4503-1472-5</source>
          . doi: http://dx.doi.org/10.1145/2348283.2348501.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Gollub</surname>
          </string-name>
          , Benno Stein, Steven Burrows, and
          <string-name>
            <given-names>Dennis</given-names>
            <surname>Hoppe</surname>
          </string-name>
          .
          <article-title>TIRA: Configuring, Executing, and Disseminating Information Retrieval Experiments</article-title>
          . In A Min Tjoa, Stephen Liddle,
          <string-name>
            <surname>Klaus-Dieter Schewe</surname>
          </string-name>
          , and Xiaofang Zhou, editors,
          <source>9th International Workshop on Text-based Information Retrieval (TIR 12) at DEXA</source>
          , pages
          <fpage>151</fpage>
          -
          <lpage>155</lpage>
          , Los Alamitos, California,
          <year>September 2012</year>
          .
          <source>IEEE. ISBN 978-1-4673-2621-6</source>
          . doi: http://doi.ieeecomputersociety.org/10.1109/DEXA.2012.55.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Tim</given-names>
            <surname>Gollub</surname>
          </string-name>
          , Martin Potthast, Anna Beyer, Matthias Busse, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, and
          <string-name>
            <given-names>Benno</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <article-title>Recent Trends in Digital Text Forensics and its Evaluation</article-title>
          . In Pamela Forner, Henning Müller, Roberto Paredes, Paolo Rosso, and Benno Stein, editors,
          <source>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization. 4th International Conference of the CLEF Initiative (CLEF 13)</source>
          , pages
          <fpage>282</fpage>
          -
          <lpage>302</lpage>
          , Berlin Heidelberg New York,
          <year>September 2013</year>
          . Springer.
          <source>ISBN 978-3-642-40801-4</source>
          . doi: http://dx.doi.org/10.1007/978-3-642-40802-1_28.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Sumit</given-names>
            <surname>Goswami</surname>
          </string-name>
          , Sudeshna Sarkar, and
          <string-name>
            <given-names>Mayur</given-names>
            <surname>Rustagi</surname>
          </string-name>
          .
          <article-title>Stylometric analysis of bloggers' age and gender</article-title>
          . In Eytan Adar, Matthew Hurst, Tim Finin, Natalie S. Glance, Nicolas Nicolov, and Belle L. Tseng, editors,
          <source>ICWSM. The AAAI Press</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Gilad</given-names>
            <surname>Gressel</surname>
          </string-name>
          ,
          <string-name>
            <surname>Hrudya</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surendran</surname>
            <given-names>K</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thara</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aravind</surname>
            <given-names>A</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Prabaharan</given-names>
            <surname>Poornachandran</surname>
          </string-name>
          .
          <article-title>Ensemble Learning Approach for Author Profiling-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Janet</given-names>
            <surname>Holmes</surname>
          </string-name>
          and
          <string-name>
            <given-names>Miriam</given-names>
            <surname>Meyerhoff</surname>
          </string-name>
          .
          <article-title>The Handbook of Language and Gender</article-title>
          . Blackwell Handbooks in Linguistics. Wiley,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Moshe</given-names>
            <surname>Koppel</surname>
          </string-name>
          , Shlomo Argamon, and Anat Rachel Shimoni.
          <article-title>Automatically categorizing written texts by author gender</article-title>
          .
          <source>Literary and Linguistic Computing 17(4)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A. Pastor</given-names>
            <surname>Lopez-Monroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Montes-Y-Gomez</surname>
          </string-name>
          , Hugo Jair Escalante, Luis Villasenor-Pineda, and Esau Villatoro-Tello.
          <article-title>INAOE's Participation at PAN'13: Author Profiling task-Notebook for PAN at CLEF 2013</article-title>
          . In Forner et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>A. Pastor</given-names>
            <surname>López-Monroy</surname>
          </string-name>
          , Manuel Montes y Gómez, Hugo Jair-Escalante, and Luis Villaseñor Pineda.
          <article-title>Using Intra-Profile Information for Author Profiling-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Suraj</given-names>
            <surname>Maharjan</surname>
          </string-name>
          , Prasha Shrestha, and
          <string-name>
            <given-names>Thamar</given-names>
            <surname>Solorio</surname>
          </string-name>
          .
          <article-title>A Simple Approach to Author Profiling in MapReduce-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>James</given-names>
            <surname>Marquardt</surname>
          </string-name>
          , Golnoosh Farnadi, Gayathri Vasudevan,
          <string-name>
            <given-names>Marie-Francine</given-names>
            <surname>Moens</surname>
          </string-name>
          , Sergio Davalos, Ankur Teredesai, and Martine De Cock.
          <article-title>Age and Gender Identification in Social Media-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Seifeddine</given-names>
            <surname>Mechti</surname>
          </string-name>
          , Maher Jaoua, and Lamia Hadrich Belguith.
          <article-title>Machine learning for classifying authors of anonymous tweets, blogs and reviews-Notebook for PAN at CLEF 2014</article-title>
          . In Cappellato et al. [
          <volume>6</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Michal</given-names>
            <surname>Meina</surname>
          </string-name>
          , Karolina Brodzinska, Bartosz Celmer, Maja Czokow, Martyna Patera, Jakub Pezacki, and
          <string-name>
            <given-names>Mateusz</given-names>
            <surname>Wilk</surname>
          </string-name>
          .
          <article-title>Ensemble-based Classification for Author Profiling Using Various Features-Notebook for PAN at CLEF 2013</article-title>
          . In Forner et al. [
          <volume>8</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>Dong</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Noah A.</given-names>
            <surname>Smith</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Carolyn P.</given-names>
            <surname>Rosé</surname>
          </string-name>
          .
          <article-title>Author age prediction from text using linear regression</article-title>
          .
          In
          <source>Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities</source>
          , LaTeCH '11
          , pages
          <fpage>115</fpage>
          -
          <lpage>123</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Dong</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Rilana Gravel, Dolf Trieschnigg, and
          <string-name>
            <given-names>Theo</given-names>
            <surname>Meder</surname>
          </string-name>
          .
          <article-title>"How old do you think I am?": A study of language and age in Twitter</article-title>
          .
          <source>Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Eric W.</given-names>
            <surname>Noreen</surname>
          </string-name>
          .
          <source>Computer intensive methods for testing hypotheses: an introduction</source>
          . Wiley, New York,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <given-names>Claudia</given-names>
            <surname>Peersman</surname>
          </string-name>
          , Walter Daelemans, and Leona Van Vaerenbergh.
          <article-title>Predicting age and gender in online social networks</article-title>
          .
          In
          <source>Proceedings of the 3rd international workshop on Search and mining user-generated contents</source>
          , SMUC '11
          , pages
          <fpage>37</fpage>
          -
          <lpage>44</lpage>
          , New York, NY, USA,
          <year>2011</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>James W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          .
          <source>The Secret Life of Pronouns: What Our Words Say About Us</source>
          .
          <publisher-name>Bloomsbury USA</publisher-name>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>James W.</given-names>
            <surname>Pennebaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mathias R.</given-names>
            <surname>Mehl</surname>
          </string-name>
          , and Kate G. Niederhoffer.
          <article-title>Psychological aspects of natural language use: Our words, our selves</article-title>
          .
          <source>Annual Review of Psychology</source>
          ,
          <volume>54</volume>
          (
          <issue>1</issue>
          ):
          <fpage>547</fpage>
          -
          <lpage>577</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28. Francisco Rangel, Paolo Rosso, Moshe Koppel, Efstathios Stamatatos, and
          <string-name>
            <given-names>Giacomo</given-names>
            <surname>Inches</surname>
          </string-name>
          .
          <article-title>Overview of the Author Profiling Task at PAN 2013-Notebook for PAN at CLEF 2013</article-title>
          . In Forner et al. [
          <volume>8</volume>
          ].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>