<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>PANcakes Team: A Composite System of Genre-Agnostic Features For Author Profiling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pepa Gencheva</string-name>
          <email>pkgencheva@uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Boyanov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Deneva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Preslav Nakov</string-name>
          <email>pnakov@qf.org.qa</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yasen Kiprov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivan Koychev</string-name>
          <email>koychev@uni-so</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georgi Georgiev</string-name>
          <email>g.d.georgiev@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FMI, Sofia University "St. Kliment Ohridski"</institution>
          ,
          <addr-line>Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute</institution>
          ,
          <addr-line>HBKU, Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>We present the system we built for the PAN-2016 Author Profiling Task [9]. The task was to predict the gender and the age group of a person given several samples of his/her writing, and it was offered in three languages: English, Spanish, and Dutch. We participated in both subtasks, for all three languages. Our approach focused on extracting genre-agnostic features such as bag-of-words, sentiment and topic derivation, and stylistic features. We then used these features to train SVM-based classifiers, as implemented in LIBLINEAR for the gender classification subtask and in LIBSVM for the age classification subtask.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Author Profiling is a task in Natural Language Processing that aims at identifying
different characteristics of the authors by analyzing texts written by them. The task can
range from classifying the author by his/her age, gender or mother tongue, to finding
his/her socio-economic category.</p>
      <p>
        The PAN-2016 Author Profiling Task [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] asks participants to identify the gender and
age group of a person, given a set of documents s/he has authored. The task is made more
challenging because training data is provided only for social media documents,
while the evaluation is performed on data from another genre. Furthermore, the task
is held in English, Spanish, and Dutch. Thus, the participants must provide a cross-genre,
multilingual solution to the problem. Their systems are evaluated through TIRA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
which is a platform for evaluation as a service.
      </p>
      <p>In this paper, we present our approach to the Author Profiling Task. Our main focus
is on extracting genre- and language-agnostic features based on the content of the
documents written by the author. We also experimented with bootstrapping the algorithm
via several iterations of learning and classification.</p>
      <p>The paper is structured as follows. In the next section, we give a brief overview of
some related work. Then, in the following sections, we describe in detail the data used,
the different steps we undertake for tackling the task, as well as the evaluation of the
system.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>The task of Author Profiling was first introduced to the PAN series of scientific events
in 2013. It started as a task for identifying the author’s gender and age in English and
Spanish. In the following years, the notion of posts’ genre was introduced, the Dutch
language was added to the task, and in 2015 the task also included author personality
profiling.</p>
      <p>
        The participants from the previous years applied a large variety of techniques. They
used different pre-processing steps such as removing HTML code, hash-tags, or
user mentions from Twitter posts, replacing or removing URLs, etc. The features used
range from simple style-based features to more complex content-based
features and combinations thereof [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In terms of cross-genre classification, this is the first year that the PAN Author
Profiling Task is set up to evaluate the algorithms on domains different from the training
domain.</p>
    </sec>
    <sec id="sec-3">
      <title>Data</title>
      <p>For training we used the dataset provided by the PAN-2016 Author Profiling Task
organizers. It consisted of three different training sets for the different languages: English,
Spanish, and Dutch. The English, Spanish, and Dutch training datasets contained
Twitter posts for 436, 250, and 384 authors, respectively. For the purpose of the task, only
the posts’ text could be used for training.</p>
      <p>The training datasets contained labels, classifying the authors by gender (male or
female), and by age-group (18-24, 25-34, 35-49, 50-64, 65+). For Dutch, only the gender
subtask was available.</p>
      <p>For testing, we used training sets from the PAN-2014 Author Profiling Task, which
contained posts in three non-Twitter domains: social media, blogs, and reviews. These datasets
are available for English and Spanish only. For Dutch, we used the training dataset from
the PAN-2015 Author Profiling Task, even though it is from the Twitter domain. We
also assembled an additional test dataset that combined posts from the three available
non-Twitter domains of the PAN-2014 Author Profiling Task.</p>
      <p>
        We also used the NRC Word-Emotion Association Lexicon [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It contains 14,182
English words associated with positive and negative sentiment and with emotions such as
anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. The words have been
translated into several other languages, so the associations can be used in them, too.
We use the lexicon for English, Spanish, and Dutch. We also used
lists of swear words for English, Spanish, and Dutch, which we compiled
from several resources [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10,11,12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Our Genre-Agnostic Approach</title>
      <p>We designed and implemented a system that could be easily configured with different
parameters and extended with new features.</p>
      <p>We then proceeded to extract a large variety of features. They can be divided into
semantic and lexical features. The lexical features can further be split into content-based
and style-based features. The semantic features and the content-based lexical features are
based on the pre-processed tokens, while the style-based lexical features are based on both
the pre-processed and the raw posts of the authors. From these features, we further
selected those that serve best in the classification task, and we used them to train
our model. We concentrated on tuning the model for age and gender classification
in English. For the models for Dutch and Spanish, we used the same features and model
parameters as the ones chosen for English.</p>
      <p>
        The system is built on top of the tools provided by the scikit-learn machine learning
library in Python [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The library contains various machine learning algorithms, some feature
extraction mechanisms, and a convenient way to combine all the features and the
chosen classification method into a pipeline.
      </p>
      <p>In the next subsections, we describe in detail the individual steps in the model
construction.</p>
      <sec id="sec-4-1">
        <title>Pre-processing</title>
        <p>The pre-processing step is an important part of the domain-agnostic author profiling
task. In this step, we remove all the genre-specific strings from the authors’ posts.
Most of them are Twitter-style sequences. We also join all of an author’s posts to create
a larger text and to eliminate the differences in post length across domains
such as Twitter and blogs.</p>
        <p>We first strip the HTML tags from the posts. Then, we remove all genre-specific
sequences in the text, including strings such as user-mentions, at-mentions, hash-tags,
URLs, punctuation sequences, emoticons, etc. The next step is to tokenize the cleaned
text. The tokens created in this way are used for most of the features. Some of the
features use the raw post as well.</p>
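        <p>The pre-processing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the submitted system: the regular expressions, the tokenizer, and the function names clean_post and preprocess_author are hypothetical, since the paper does not specify the exact patterns.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import html
import re

# Illustrative patterns only; the exact expressions used by the system are not given.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"(?:https?://|www\.)\S+")
MENTION = re.compile(r"(?<!\w)@\w+")
HASHTAG = re.compile(r"(?<!\w)#\w+")
EMOTICON = re.compile(r"(?<!\w)[:;=8][\-o*']?[)\](\[dDpP/\\|](?!\w)")
PUNCT_RUN = re.compile(r"[!?.,]{2,}")
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def clean_post(post):
    """Strip HTML tags and genre-specific sequences from a single post."""
    text = HTML_TAG.sub(" ", post)
    text = html.unescape(text)
    for pattern in (URL, MENTION, HASHTAG, EMOTICON, PUNCT_RUN):
        text = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess_author(posts):
    """Join all cleaned posts of one author and tokenize the result."""
    joined = " ".join(clean_post(p) for p in posts)
    return TOKEN.findall(joined.lower())
```
]]></preformat>
        <p>Joining the posts before tokenization mirrors the idea of treating each author as a single document, regardless of whether the source genre produces short or long posts.</p>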
      </sec>
      <sec id="sec-4-2">
        <title>Features</title>
      </sec>
      <sec id="sec-4-3">
        <title>Lexical Style-based Features</title>
        <p>The lexical style-based features aim at finding a correlation between the author’s gender
and age and his/her manner of writing, using simple surface metrics. We count
the following: function words used by the author; words that are not in
a predefined dictionary for the respective language; words starting with a capital
letter; words written in all capital letters; sentences; punctuation
signs in the posts’ text; URLs; e-mail addresses; phone numbers;
and different pronouns. We also include the type-token ratio, i.e.,
the number of unique words used by the author divided by the total number of words used,
as well as the average word length, the average sentence length, and the count of numbers used in the posts.
We further take into account the number of img and br HTML tags used in the posts.</p>
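        <p>To make some of these surface metrics concrete, the following minimal sketch computes a subset of them with the Python standard library. The function name style_features, the word and sentence patterns, and the feature names are assumptions made for this illustration; the actual system may define words and sentences differently.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
import string

# Hypothetical definitions of "word" and "sentence boundary" for this sketch.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")
SENTENCE_END = re.compile(r"[.!?]+")


def style_features(text):
    """Compute a few surface style metrics for the joined text of one author."""
    words = WORD.findall(text)
    letters = [c for c in text if c.isalpha()]
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    n_words = len(words)
    return {
        "n_sentences": len(sentences),
        "n_punctuation": sum(c in string.punctuation for c in text),
        "n_capitalized": sum(w[0].isupper() for w in words),
        "n_all_caps": sum(len(w) > 1 and w.isupper() for w in words),
        # Unique (lower-cased) words divided by the total number of words.
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "capital_letter_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }
```
]]></preformat>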
        <p>
          Another feature checks whether the person has mentioned something that
makes it obvious whether s/he is a man or a woman, e.g., “my wife”, “my man”, “my
girlfriend”, “my boyfriend”, etc. We do this for English only. We devise a similar feature
for the author’s age, searching for the pattern “I am ” followed by a number, which
is assumed to indicate the age of the author. We also count the frequencies of
part-of-speech tags. For this purpose, we employ the Natural Language Toolkit for Python [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
We also tried to include some readability metrics such as the Automated Readability
Index, but they performed poorly on the test sets.
        </p>
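        <p>The self-reference cues can be sketched with two small regular expressions. The phrase lists below contain only the examples given in the text plus "my husband", which is an assumption; the plausibility range for the stated age and the name self_reference_features are also hypothetical. The sketch only counts the phrases and leaves it to the classifier to weight them.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Hypothetical cue lists; the paper gives only a few example phrases.
CUES_WIFE_GIRLFRIEND = re.compile(r"\bmy (?:wife|girlfriend)\b", re.IGNORECASE)
CUES_HUSBAND_BOYFRIEND = re.compile(r"\bmy (?:husband|boyfriend|man)\b", re.IGNORECASE)
AGE_CUE = re.compile(r"\bI am (\d{1,3})\b")


def self_reference_features(text):
    """Count gender-revealing phrases and extract a self-stated age, if any."""
    # Keep only plausible ages (assumed range) to skip e.g. "I am 5 minutes late".
    ages = [int(m) for m in AGE_CUE.findall(text) if 10 <= int(m) <= 99]
    return {
        "n_wife_girlfriend": len(CUES_WIFE_GIRLFRIEND.findall(text)),
        "n_husband_boyfriend": len(CUES_HUSBAND_BOYFRIEND.findall(text)),
        "stated_age": ages[0] if ages else 0,
    }
```
]]></preformat>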
        <p>From these features, we selected the best-performing ones. For age classification,
we eventually used the number of function words, the number of img HTML tags,
the number of sentences, the ratio of capital letters to all letters, and the number
of punctuation signs. For the gender classification task, we included the phrases
that explicitly reveal the author’s gender, the average word
length, the number of img HTML tags, the ratio of capital letters to all
letters, and the number of punctuation signs. All of these features are included in
the models for English, Spanish, and Dutch.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Lexical Content-based Features</title>
        <p>The content-based features look deeper into the words used and the topics in the
posts of the authors. The most essential feature was bag-of-words. For
age classification, we transformed the authors’ posts into a matrix of word uni-gram and
bi-gram counts. For the gender classification task, we employ a similar bag-of-words
approach and further transform the count matrix into a normalized TF-IDF
representation. The document frequency weighting we use is logarithmic and the normalization is cosine. For
gender classification, the TF-IDF bag-of-words approach proved to be effective with character
tri-grams and word uni- and bi-grams. We tuned this feature to use only tokens with
document frequency between 30 and 80 percent. Another interesting content feature is
based on Non-negative Matrix Factorization (NMF), which we use for topic extraction. We
extracted a total of 20 topics. This feature is used in both age and gender classification.</p>
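        <p>One possible way to express these content features with scikit-learn is sketched below. The mapping of the 30 to 80 percent document frequency range to min_df and max_df, the use of TF-IDF input for NMF, and the function names are assumptions made for this illustration; the paper does not state how the components were configured.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline


def age_content_features():
    """Raw counts of word uni-grams and bi-grams."""
    return CountVectorizer(ngram_range=(1, 2))


def gender_content_features(min_df=0.3, max_df=0.8):
    """TF-IDF over word uni-/bi-grams and character tri-grams, L2 (cosine) normalized."""
    return FeatureUnion([
        ("word_tfidf", TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                                       min_df=min_df, max_df=max_df, norm="l2")),
        ("char_tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 3),
                                       min_df=min_df, max_df=max_df, norm="l2")),
    ])


def topic_features(n_topics=20):
    """Topic weights per author from NMF (input representation is an assumption)."""
    return make_pipeline(TfidfVectorizer(), NMF(n_components=n_topics, random_state=0))
```
]]></preformat>
        <p>Each function returns an unfitted transformer, so the pieces can be combined with the style and semantic features in a single scikit-learn pipeline.</p>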
        <p>We also created dictionaries containing the Point-wise Mutual
Information (PMI) between the words and the classes. These dictionaries were extracted
from the training data for every language. During the testing phase, we calculate the
sum of the PMI values over all the words of a given author, and we provide these sums as features to the
classifier. Unfortunately, this approach did not perform very well in the cross-genre
setting because of the different vocabulary one uses when switching to another context.</p>
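        <p>The PMI dictionaries can be illustrated with the sketch below, which estimates PMI(w, c) = log(P(w, c) / (P(w) P(c))) from author-level word presence. Counting presence per author rather than word frequency, assigning a score of zero to unseen word-class pairs, and the names build_pmi and pmi_features are assumptions of this illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter, defaultdict


def build_pmi(docs, labels, min_count=1):
    """Return {label: {word: PMI}} estimated from tokenized training documents."""
    n_docs = len(docs)
    class_count = Counter(labels)
    word_count = Counter()
    joint_count = Counter()
    for tokens, label in zip(docs, labels):
        for word in set(tokens):  # presence per author, not frequency
            word_count[word] += 1
            joint_count[word, label] += 1
    pmi = defaultdict(dict)
    for (word, label), n_wc in joint_count.items():
        if word_count[word] >= min_count:
            # P(w,c) / (P(w) P(c)) = n_wc * N / (n_w * n_c)
            pmi[label][word] = math.log(n_wc * n_docs / (word_count[word] * class_count[label]))
    return dict(pmi)


def pmi_features(tokens, pmi, classes):
    """Sum the PMI of all tokens per class; unseen pairs contribute 0 (assumption)."""
    return [sum(pmi.get(c, {}).get(w, 0.0) for w in tokens) for c in classes]
```
]]></preformat>
        <p>Because words that never occur in the training genre contribute nothing to the sums, a feature of this kind depends on vocabulary overlap between the training and the target genre.</p>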
      </sec>
      <sec id="sec-4-5">
        <title>Semantic Features</title>
        <p>
          The semantic features look into the overall sentiment expressed by the words used
in the posts. For this purpose, we use the NRC Word-Emotion Association Lexicon [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
We accumulate the sentiment from each of the words. The sentiment is collected for
the positive and negative meanings of the words, as well as for the emotions of anger,
anticipation, disgust, fear, joy, sadness, surprise, and trust. From these sentiment categories,
we finally chose the best-performing ones: positive, negative,
joy, surprise, and trust. They showed good performance for age classification in all
three languages and were not included in the gender classification.
        </p>
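        <p>The accumulation of lexicon associations can be sketched as follows. The three-word lexicon below is a toy stand-in and does not reproduce actual NRC entries; the name emotion_features and the dictionary layout are likewise assumptions of this illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

# Toy stand-in for the lexicon: word -> set of associated categories (not real NRC data).
TOY_LEXICON = {
    "happy": {"positive", "joy", "trust"},
    "gift": {"positive", "joy", "surprise", "anticipation"},
    "awful": {"negative", "fear", "disgust"},
}

# The categories reported as best-performing for age classification.
SELECTED = ("positive", "negative", "joy", "surprise", "trust")


def emotion_features(tokens, lexicon=TOY_LEXICON, categories=SELECTED):
    """Count, per category, how many tokens of an author are associated with it."""
    totals = Counter()
    for token in tokens:
        totals.update(lexicon.get(token, ()))
    return [totals[c] for c in categories]
```
]]></preformat>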
        <p>
          For gender classification, we also used the number of swear words. For the
different languages, we collected lists of swear words from several resources [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10,11,12</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments &amp; Evaluation</title>
      <p>
        For building the classification models for the two tasks and the three languages, we tried
a large variety of machine learning algorithms and different ensembles thereof. For
gender classification, a linear SVM (liblinear) performed best, while for age classification,
libsvm with a radial basis function kernel worked best. We used the implementations of the
algorithms from scikit-learn, which are built on top of LIBLINEAR [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and LIBSVM
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For the final models, we tuned the parameters for liblinear (C was set to 0.7) and
for libsvm (C was set to 1.25 and gamma was set to 0.125). From all of the features,
we selected the ones that performed best on the different test sets described in
Section 3. In the following tables, we report the scores we achieved for the available
domains and for the real test set (testset2 on TIRA). In Table 1, we present the accuracy
for the author gender classification, and in Table 2, we present the accuracy for the
author age-group classification.
      </p>
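      <p>In scikit-learn, the two classifiers with the reported parameter values could be instantiated roughly as shown below. Only C and gamma come from the text; the choice of the LinearSVC and SVC classes, all remaining defaults, and the function names are assumptions of this sketch.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from sklearn.svm import SVC, LinearSVC


def make_gender_classifier():
    """Linear SVM backed by LIBLINEAR, with the reported C value."""
    return LinearSVC(C=0.7)


def make_age_classifier():
    """RBF-kernel SVM backed by LIBSVM, with the reported C and gamma values."""
    return SVC(kernel="rbf", C=1.25, gamma=0.125)
```
]]></preformat>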
      <sec id="sec-5-1">
        <title>Feature Selection</title>
        <p>For feature selection, we used several approaches. We tried setting a variance
threshold of 80 percent, meaning that all features that have the same 0/1 value in more than
80 percent of the training examples are removed. We also tried selecting different
percentages of the features based on the F-value between label and feature
or on the chi-square statistic of non-negative features.
Unfortunately, none of the feature selection attempts yielded good results on the test sets, and
we did not use feature selection for the final models.</p>
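        <p>These selection strategies correspond to standard scikit-learn transformers, as in the sketch below. For a 0/1 feature that takes one value with probability p, the variance is p(1 - p), so an 80 percent threshold translates to 0.8 * (1 - 0.8). The function names and the idea of passing the percentage as a parameter are assumptions of this illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from sklearn.feature_selection import SelectPercentile, VarianceThreshold, chi2, f_classif


def drop_near_constant(p=0.8):
    """Remove 0/1 features that take the same value in more than a fraction p of samples."""
    # Variance of a Bernoulli variable: p * (1 - p).
    return VarianceThreshold(threshold=p * (1 - p))


def keep_top_percent(percent, score_func=f_classif):
    """Keep the given percentage of features, ranked by F-value (default) or chi2."""
    return SelectPercentile(score_func=score_func, percentile=percent)
```
]]></preformat>
        <p>Note that chi2 requires non-negative feature values, which holds for counts and TF-IDF weights.</p>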
      </sec>
      <sec id="sec-5-2">
        <title>Self Training</title>
        <p>
          We reasoned that if we could classify some authors from the target domain with
high confidence, then we could use their data for training. This approach
is known as self-training and shows good results in a number of NLP tasks [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Thus,
we modified the system to pass through several iterations of training and testing. At
each iteration, the entries classified with high confidence by the SVM classifier are
added to the training data for the next iteration. We hoped that in this way we could
provide more of the needed data from the target domain, and indeed, this was the case in
some of our experiments. However, this approach introduces two new parameters to the
system: the confidence level for accepting new training data and the number of iterations.
They were hard to tune for the different test sets, and when they were not tuned well, the results were
worse. Thus, we did not include self-training in our final submission, because we could not be sure
how it would perform on the unknown test set.
        </p>
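        <p>A self-training loop of this kind can be sketched for the binary case as follows. The sketch assumes a classifier that follows the scikit-learn convention (fit returns the classifier, decision_function returns signed scores, and a positive score means classes_[1]), and it uses the absolute score as the confidence. The names self_train and MidpointClassifier, and the toy classifier itself, are hypothetical and serve only to exercise the loop.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def self_train(make_classifier, X_train, y_train, X_target,
               threshold=1.0, n_iterations=3):
    """Iteratively add confidently classified target examples to the training set."""
    X_aug, y_aug = list(X_train), list(y_train)
    remaining = list(X_target)
    pseudo_labeled = []
    clf = make_classifier().fit(X_aug, y_aug)
    for _ in range(n_iterations):
        if not remaining:
            break
        scores = clf.decision_function(remaining)
        confident, rest = [], []
        for x, score in zip(remaining, scores):
            if abs(score) >= threshold:
                # Positive scores map to classes_[1], negative ones to classes_[0].
                confident.append((x, clf.classes_[int(score > 0)]))
            else:
                rest.append(x)
        if not confident:
            break
        pseudo_labeled.extend(confident)
        X_aug.extend(x for x, _ in confident)
        y_aug.extend(label for _, label in confident)
        remaining = rest
        clf = make_classifier().fit(X_aug, y_aug)
    return clf, pseudo_labeled


class MidpointClassifier:
    """Toy 1-D binary classifier: thresholds at the midpoint of the class means.

    It assumes that the first class (in sorted label order) has the lower mean.
    """

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        means = []
        for c in self.classes_:
            values = [x[0] for x, label in zip(X, y) if label == c]
            means.append(sum(values) / len(values))
        self.mid_ = sum(means) / 2
        return self

    def decision_function(self, X):
        return [x[0] - self.mid_ for x in X]
```
]]></preformat>
        <p>The two parameters discussed above appear here as threshold and n_iterations; a threshold that is too low admits wrongly labeled target examples into the training data, which is one way the results can deteriorate.</p>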
        <p>We have presented the system developed by our team for participating in the PAN-2016
Author Profiling Task. It includes different lexical and semantic features and uses
liblinear and libsvm for training the models for the age and gender classification tasks in
the three languages. The system is easily extendable and can serve as a basis for other
attempts to tackle the problem.</p>
        <p>For future improvement and investigation of the problem of domain-agnostic
author profiling, we consider the bootstrapping approach worth developing further.
It showed some promising results on the official test set, where for Dutch we achieved an
accuracy above 0.70 for the gender task. However, it performed poorly
on the other languages, and it was not part of our final submitted system. We believe
that this approach can lead to better results, but further study is required to identify
suitable parameter values.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was performed by a team of students from MSc programs in Computer
Science at Sofia University “St. Kliment Ohridski”.</p>
      <p>We thank Sofia University “St. Kliment Ohridski” for the support and guidance
for our team’s participation in the CLEF 2016 Conference.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural Language Processing with Python</source>
          . O'Reilly Media Inc. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          :
          <article-title>LIBSVM: A library for support vector machines</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>2</volume>
          ,
          <fpage>27:1</fpage>
          -
          <lpage>27:27</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schölkopf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zien</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised learning</article-title>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>K.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsieh</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.J.:</given-names>
          </string-name>
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          ,
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burrows</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoppe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>TIRA: Configuring, Executing, and Disseminating Information Retrieval Experiments</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Text-based Information Retrieval (TIR 12) at DEXA</source>
          . pp.
          <fpage>151</fpage>
          -
          <lpage>155</lpage>
          . Los Alamitos, California (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Grivas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannakopoulos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Author profiling using stylometric and structural feature groupings: Notebook for PAN at CLEF 2015</article-title>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mohammad</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          :
          <article-title>Crowdsourcing a word-emotion association lexicon</article-title>
          .
          <source>Computational Intelligence</source>
          <volume>29</volume>
          (
          <issue>3</issue>
          ),
          <fpage>436</fpage>
          -
          <lpage>465</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling</article-title>
          . In: Kanoulas,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Lupu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Clough</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Sanderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Hanbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Toms</surname>
          </string-name>
          , E. (eds.)
          <article-title>Information Access Evaluation meets Multilinguality, Multimodality, and Visualization</article-title>
          .
          <source>5th International Conference of the CLEF Initiative (CLEF 14)</source>
          . pp.
          <fpage>268</fpage>
          -
          <lpage>299</lpage>
          . Springer, Berlin Heidelberg New York (
          <year>Sep 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Wikipedia:
          <article-title>Dutch profanity - Wikipedia, the free encyclopedia</article-title>
          (
          <year>2016</year>
          ), https://en.wikipedia.org/w/index.php?title=Dutch_profanity&amp;oldid=719768946, [Online; accessed 18-May-2016]
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Wikipedia:
          <article-title>English profanity - Wikipedia, the free encyclopedia</article-title>
          (
          <year>2016</year>
          ), https://en.wiktionary.org/wiki/Category:English_vulgarities, [Online; accessed 18-May-2016]
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Wikipedia:
          <article-title>Spanish vulgarities - Wikipedia, the free encyclopedia</article-title>
          (
          <year>2016</year>
          ), https://en.wiktionary.org/wiki/Category:Spanish_vulgarities, [Online; accessed 18-May-2016]
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>