<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Straightforward Multimodal Approach for Author Profiling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mario Ezra Aragón</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Pastor López-Monroy</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering, Universidad Autónoma de Chihuahua; Chihuahua</institution>
          ,
          <addr-line>Chih., México, 31100</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Houston;</institution>
          <addr-line>Houston TX, USA 77004</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>In this paper we evaluate different strategies from the literature for text and image classification at PAN 2018. The main objective of this shared task is to identify the gender of users from the tweets and images they posted. We evaluate four popular strategies for the text representation: 1) Bag of Terms (BoT), 2) Second Order Attributes (SOA), 3) Convolutional Neural Network (CNN) models and 4) an Ensemble of n-grams at word and character level. For the image representation we used a CNN based on [6]. We observed that the n-gram Ensemble obtained the highest performance. For our participation we chose the Ensemble and performed an early fusion with the image representation to create a multimodal representation.</p>
      </abstract>
      <kwd-group>
        <kwd>Author Profiling</kwd>
        <kwd>Bag of Words</kwd>
        <kwd>CNN</kwd>
        <kwd>Mining</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Text</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Author Profiling (AP) is a well-known task in Natural Language Processing
(NLP) that consists of extracting as much information as possible about authors
from their documents [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. AP can help with problems such as the detection of persons
of interest, security, prevention, political opinion analysis and business intelligence. The PAN
2018 shared task tackles this problem with machine learning and NLP techniques. Its main
objective is to identify the gender of users, with the novelty that both the tweets and
the images they posted are available as information. The shared task covers three
languages: English, Spanish and Arabic.
      </p>
      <p>
        In this work we separately evaluate AP in three modalities: Textual, Visual and
Textual-Visual. For the textual modality, we mainly evaluate what we argue should be the de
facto baseline: a large ensemble of n-gram histograms at word and character level.
We then compare this approach with three strategies: Bag-of-Terms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Second
Order Attributes [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and CNNs [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The core idea behind our evaluation is to determine
which approach better captures the thematic content, which according to the literature
has been the cornerstone for effectively profiling users [
        <xref ref-type="bibr" rid="ref11 ref13 ref14">11,13,14</xref>
        ]. Regarding the
visual modality, we evaluate only one very simple, yet effective, CNN-based method:
we extract the category (class) layer of VGG16 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as image features,
average the feature vectors of all images belonging to the same user, and train an SVM on top.
Intuitively, this approach exploits the posting behavior of users [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
the idea is to capture the visual content that users post, under the
intuition that such content differs significantly between males and females
and is thus highly discriminative. This is somewhat analogous to observing the thematic content
when classifying documents. Finally, for the Textual-Visual modality, we bring together
textual and visual thematic information in a single approach: we perform an early
fusion of our multiple n-gram histograms and the features extracted with
VGG16. This is precisely the approach we submitted to PAN 2018.
      </p>
      <p>The remainder of this paper is organized as follows. Section 2 presents AP related
work. Section 3 describes the strategies we evaluated for the text
representation. Section 4 describes our approach for the image representation, and
Section 5 describes how the multimodal representation is created. Sections 6 and
7 present the experimental settings and the experimental results, respectively.
Finally, Section 8 presents our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In this section we review AP work that has been proposed to
handle this task. Methods range from simple representations, such as removing
stopwords and creating a Bag of Terms [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], to more complex representations
using traditional word embeddings [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or embeddings that exploit the morphology and
semantics of words [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the authors proposed a simple
classification method based on the similarity between objects, considering the different terms used
in a user's tweets. Another approach is to extract groups of
terms present in the tweets [
        <xref ref-type="bibr" rid="ref13 ref22">13,22</xref>
        ], where the authors also used extra
information such as emojis, document sentiment and POS tags. These authors
found that including extra information such as emojis or POS tags does not improve
performance.
      </p>
      <p>
        Another popular approach is to address the task as a profile-based problem [
        <xref ref-type="bibr" rid="ref18 ref4">18,4</xref>
        ],
where target profiles, and groups of subprofiles within them, are built from the users' tweets.
In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] the authors built a system that combines different classifiers
with the objective of identifying the behavior of different users. Some
approaches also handle this task with relatively new techniques such as deep learning. For
instance, in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] the authors generate embedding representations that are classified
using deep averaging networks. This model receives the word embeddings as input;
the first layer averages those embeddings, and the subsequent hidden layers transform the
computed average. In [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] the authors used a CNN-based deep learning model whose features are
a matrix of letter 2-grams including punctuation marks. These deep learning
approaches obtained accuracies above the average results of the task.
      </p>
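      <p>To make the deep averaging network described above concrete, the following dependency-free Python sketch shows only its forward pass. It is not the model of [<xref ref-type="bibr" rid="ref21">21</xref>]: the ReLU non-linearity, the layer sizes and the weights passed in are illustrative assumptions, the function names are hypothetical, and training is omitted.</p>
      <preformat preformat-type="code">
```python
import math


def relu_layer(weights, bias, h):
    """Affine transform followed by a ReLU (the non-linearity is an assumption)."""
    return [max(0.0, b + sum(w * x for w, x in zip(row, h))) for row, b in zip(weights, bias)]


def dan_forward(embeddings, tokens, hidden_layers, out_w, out_b):
    """Deep averaging network: average the word vectors, then transform the average."""
    dim = len(next(iter(embeddings.values())))
    vecs = [embeddings[t] for t in tokens if t in embeddings]  # unknown tokens are skipped
    # First layer: element-wise average of the input word embeddings.
    h = [sum(v[j] for v in vecs) / len(vecs) if vecs else 0.0 for j in range(dim)]
    # Subsequent hidden layers transform the computed average.
    for weights, bias in hidden_layers:
        h = relu_layer(weights, bias, h)
    # Output layer and softmax over the classes (e.g., male/female).
    logits = [b + sum(w * x for w, x in zip(row, h)) for row, b in zip(out_w, out_b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```
      </preformat>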
      <p>
        Visual and multimodal approaches to AP have been studied less than
textual ones. For the visual modality, research has focused on the
gender recognition task [
        <xref ref-type="bibr" rid="ref24 ref25 ref26">24,25,26</xref>
        ], where some
general statistics of the images have been used as features. Multimodal
strategies have only recently been explored [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] the authors
used a weighted image-and-text strategy for gender classification: a CNN
determines a score for a user's images, and this score is combined with
textual features by averaging.
      </p>
      <p>Inspired by this related work, we evaluate different approaches
(see Section 3) and select the best one for the task's test data. We
propose an early fusion of the textual and visual features.</p>
    </sec>
    <sec id="sec-3">
      <title>Textual Modality Strategies</title>
      <p>In this section we describe the strategies that we selected from the literature for
the textual representation. All of them have achieved remarkable
performance in different NLP classification tasks.</p>
      <sec id="sec-3-1">
        <title>Bag of Terms (BoT)</title>
        <p>
          The bag of terms is the simplest and best-known strategy for text representation: a
text is described by the occurrences of words within it. i) First, a vocabulary is
created from the training data, and then ii) the presence of each word
is measured by its frequency [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Since this representation is a histogram, it ignores the
structure of the text, accounting only for the occurrence of words in the document
and not for their position or order.
        </p>
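        <p>The two steps above can be sketched in Python as follows. Lowercased whitespace tokenization is a simplifying assumption, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def build_vocabulary(train_docs):
    """Step i): build the vocabulary from the training documents only."""
    terms = sorted({t for doc in train_docs for t in tokenize(doc)})
    return {t: i for i, t in enumerate(terms)}


def bag_of_terms(doc, vocab):
    """Step ii): histogram of term frequencies; word order and position are discarded."""
    vec = [0] * len(vocab)
    for term, freq in Counter(tokenize(doc)).items():
        if term in vocab:  # out-of-vocabulary terms are ignored
            vec[vocab[term]] = freq
    return vec
```
        </preformat>
        <p>For example, with the vocabulary built from "the cat" and "a dog" (sorted as a, cat, dog, the), the text "the cat the bird" maps to [0, 1, 0, 2]; the unseen word "bird" is ignored, and any reordering of the same words gives the same vector.</p>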
      </sec>
      <sec id="sec-3-2">
        <title>Second Order Attributes (SOA)</title>
        <p>
          In this representation, document vectors are built in a space of profiles, where each
value in the vector represents the relationship between the document and
a target profile or subprofile [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Each profile is divided
into several subprofiles using a clustering algorithm. First, the relationship of each term with
the (sub)profiles is captured, which yields a term vector in the profile space. Then, the
document is represented by adding up the term vectors of the terms it contains,
each weighted by its relative frequency in the document.
        </p>
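        <p>A minimal Python sketch of these second order attributes follows, under stated assumptions: the term-profile weight w(t, p), the sum of log2(1 + tf(t, d) / len(d)) over the documents d of profile p, is our reading of [<xref ref-type="bibr" rid="ref4">4</xref>]; the L2 normalization of term vectors is an assumption; and the clustering that produces subprofiles is omitted, so the labels passed in may simply be hypothetical subprofile identifiers such as "female-0".</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter


def term_profile_vectors(docs, labels):
    """Represent every term in the space of (sub)profiles.

    w(t, p) = sum over training documents d labelled p of log2(1 + tf(t, d) / len(d));
    each term vector is then L2-normalized (the normalization is an assumption).
    """
    profiles = sorted(set(labels))
    index = {p: j for j, p in enumerate(profiles)}
    raw = {}
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        for term, tf in Counter(tokens).items():
            vec = raw.setdefault(term, [0.0] * len(profiles))
            vec[index[label]] += math.log2(1 + tf / len(tokens))
    vectors = {}
    for term, vec in raw.items():
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[term] = [x / norm for x in vec]
    return profiles, vectors


def soa_document_vector(doc, term_vectors, n_profiles):
    """A document is the sum of its term vectors, weighted by relative term frequency."""
    tokens = doc.lower().split()
    d = [0.0] * n_profiles
    for term, tf in Counter(tokens).items():
        for j, x in enumerate(term_vectors.get(term, [])):  # unseen terms contribute nothing
            d[j] += (tf / len(tokens)) * x
    return d
```
        </preformat>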
      </sec>
      <sec id="sec-3-3">
        <title>CNN Models</title>
        <p>
          For this strategy we used CNN models based on [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], with three different
training variants:
– CNN-Rand: all weights, including the word embeddings, are randomly initialized and
then updated during training.
– CNN-Static: pre-trained word embedding vectors initialize the
embedding layer, and these embedding weights are kept fixed during training.
– CNN-NonStatic: similar to CNN-Static, but the embedding weights are
updated (fine-tuned) during training.
        </p>
        <p>
          We used filter windows of sizes 3, 4 and 5 with 100 feature maps each, a dropout
rate of 0.5, and stochastic gradient descent over shuffled mini-batches for training. We
used pre-trained 300-dimensional word vectors, obtained
with word2vec [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for English and FastText [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for Spanish and Arabic.
        </p>
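        <p>The architecture above can be illustrated with a minimal, dependency-free sketch of the forward pass of a Kim-style CNN [<xref ref-type="bibr" rid="ref5">5</xref>] with windows 3, 4 and 5. It uses toy dimensions (the setup above uses 300-dimensional vectors and 100 maps per window). SGD training, dropout and the loading of word2vec or FastText vectors are omitted. The hypothetical <monospace>TRAINABLE</monospace> table only records which parameter groups an optimizer would update in each variant. The padding id 0 and all function names are assumptions.</p>
        <preformat preformat-type="code">
```python
import math
import random

WINDOW_SIZES = (3, 4, 5)  # filter window sizes used in the text

# Parameter groups the optimizer would update in each training variant.
TRAINABLE = {
    "CNN-Rand": {"embeddings", "filters", "output"},       # random embeddings, trained
    "CNN-Static": {"filters", "output"},                   # pre-trained embeddings, frozen
    "CNN-NonStatic": {"embeddings", "filters", "output"},  # pre-trained embeddings, fine-tuned
}


def init_params(vocab_size, emb_dim, n_maps, n_classes, pretrained=None, seed=0):
    """Random initialization; `pretrained` (vocab_size rows of emb_dim values) replaces the embeddings."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    embeddings = pretrained if pretrained is not None else rand_matrix(vocab_size, emb_dim)
    filters = {h: [(rand_matrix(h, emb_dim), 0.0) for _ in range(n_maps)] for h in WINDOW_SIZES}
    output = rand_matrix(n_classes, n_maps * len(WINDOW_SIZES))
    return {"embeddings": embeddings, "filters": filters, "output": output}


def forward(params, token_ids, pad_id=0):
    """Class probabilities for one sentence (dropout is only applied during training)."""
    # Pad short sentences so that every filter window fits; pad_id=0 is an assumption.
    token_ids = list(token_ids) + [pad_id] * max(0, max(WINDOW_SIZES) - len(token_ids))
    x = [params["embeddings"][t] for t in token_ids]
    features = []
    for h in WINDOW_SIZES:
        for weights, bias in params["filters"][h]:
            # Convolution over each window of h consecutive word vectors, followed by ReLU.
            acts = [
                max(0.0, bias + sum(w * v
                                    for w_row, x_row in zip(weights, x[i:i + h])
                                    for w, v in zip(w_row, x_row)))
                for i in range(len(x) - h + 1)
            ]
            features.append(max(acts))  # max-over-time pooling
    # Fully connected output layer (bias omitted for brevity) and softmax.
    logits = [sum(w * f for w, f in zip(row, features)) for row in params["output"]]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```
        </preformat>
        <p>CNN-Static and CNN-NonStatic would call <monospace>init_params</monospace> with the same pre-trained matrix and differ only in whether the embeddings are updated during training.</p>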
      </sec>
      <sec id="sec-3-4">
        <title>N-Grams Subspaces for Author Profiling</title>
        <p>The first step of our approach is to build the representation of the text. The
method has two stages: i) extract n-grams of size one
to four at word level and of size two to five at character level, and ii) select the best
n-grams of each group with the χ<sup>2</sup> statistic and concatenate the
selected n-grams of all groups. Figure 1 shows the overall process
of extraction and creation of the n-grams. Below we explain the
two stages.</p>
        <p>
          <bold>Extract n-grams.</bold> The first stage is to create the groups of n-grams [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] of
size one to four at word level and two to five at character level. Each group is
built in three steps: i) the documents are represented by the occurrences of
the group's n-grams; ii) the term vectors of each group are normalized;
and iii) the weights of each group are smoothed by the inverse document frequency, adding
one to document frequencies to prevent zero divisions, as if an extra document
containing every term had been seen.
        </p>
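        <p>A minimal sketch of this first stage in Python follows. It uses the smoothed IDF ln((1 + N) / (1 + df)) + 1, as in scikit-learn's <monospace>TfidfVectorizer</monospace> with <monospace>smooth_idf=True</monospace>, and applies the usual order (IDF weighting, then L2 normalization). The exact order and implementation the authors used are assumptions, as are the lowercasing and the function names.</p>
        <preformat preformat-type="code">
```python
import math
from collections import Counter

# One group per (level, n): word n-grams of size 1-4 and character n-grams of size 2-5.
GROUPS = [("word", n) for n in range(1, 5)] + [("char", n) for n in range(2, 6)]


def ngrams(text, level, n):
    """Word n-grams (joined by spaces) or character n-grams of a lowercased text."""
    units = text.lower().split() if level == "word" else list(text.lower())
    sep = " " if level == "word" else ""
    return [sep.join(units[i:i + n]) for i in range(len(units) - n + 1)]


def tfidf_group(docs, level, n):
    """Smoothed TF-IDF vectors (sparse dicts) for one n-gram group."""
    counts = [Counter(ngrams(d, level, n)) for d in docs]
    df = Counter(term for c in counts for term in c)
    # idf = ln((1 + N) / (1 + df)) + 1: adding one to the document frequencies acts
    # as if an extra document containing every term had been seen (no zero divisions).
    idf = {t: math.log((1 + len(docs)) / (1 + f)) + 1 for t, f in df.items()}
    vectors = []
    for c in counts:
        v = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({t: x / norm for t, x in v.items()})  # L2 normalization
    return vectors


def extract_all_groups(docs):
    """Build every group independently; groups are concatenated after chi2 selection."""
    return {(level, n): tfidf_group(docs, level, n) for level, n in GROUPS}
```
        </preformat>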
        <p>
          <bold>χ<sup>2</sup> feature selection.</bold> The second stage selects the best
features of each group of n-grams with the χ<sup>2</sup> statistic, which follows a χ<sup>2</sup><sub>k</sub> distribution under the independence hypothesis [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
With this test we keep the features that are most likely to be relevant
for detecting gender.
        </p>
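        <p>The selection stage can be sketched in Python as follows. The score is computed the way scikit-learn's <monospace>chi2</monospace> does it (observed per-class sums of non-negative feature values versus the sums expected under independence). In practice <monospace>SelectKBest(chi2, k=...)</monospace> from <monospace>sklearn.feature_selection</monospace> would be a typical choice. The number k of n-grams kept per group is not reported in the text, so it is a free parameter here, and the function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict


def chi2_scores(vectors, labels):
    """chi2 statistic of every feature against the class labels: observed per-class
    sums of the (non-negative) feature values versus the sums expected if the
    feature were independent of the class."""
    classes = sorted(set(labels))
    class_prob = {c: labels.count(c) / len(labels) for c in classes}
    observed = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for term, value in vec.items():
            observed[term][y] += value
    scores = {}
    for term, per_class in observed.items():
        total = sum(per_class.values())
        score = 0.0
        for c in classes:
            expected = class_prob[c] * total
            if expected > 0:
                score += (per_class[c] - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_and_concatenate(groups, labels, k):
    """Keep the k highest-scoring n-grams of each group, then concatenate the groups."""
    fused = [{} for _ in labels]
    for name, vectors in groups.items():
        scores = chi2_scores(vectors, labels)
        best = set(sorted(scores, key=scores.get, reverse=True)[:k])
        for out, vec in zip(fused, vectors):
            # Keys are (group, n-gram) pairs so that the groups stay separate after fusion.
            out.update({(name, term): x for term, x in vec.items() if term in best})
    return fused
```
        </preformat>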
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Visual Modality: CNN for features extraction</title>
      <p>
        The second component of our approach is the extraction of image features. Each user
posted 10 images on social media. We use the well-known state-of-the-art model
of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with weights pre-trained on ImageNet. We take the last layer (the class layer
with 1000 classes) of the pre-trained model as the features of each input image, and then
compute the mean vector of the features of the user's 10 images. These mean vectors are
used to train the image model. The idea behind this approach is to capture the
distribution of the images that users post and to discriminate between users accordingly. Since
the model is designed for visual object recognition (including objects and scenes),
we expect similar values for users who post similar images. The pre-trained model
is VGG16, a CNN whose name refers to its
16 weight layers. Figure 2 shows the extraction of the features from the
images for training the image model.
      </p>
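      <p>The per-user averaging step can be sketched as follows. The function name is hypothetical. The commented Keras calls (<monospace>tf.keras.applications.VGG16(weights="imagenet")</monospace> and <monospace>tf.keras.applications.vgg16.preprocess_input</monospace>) are one possible way to obtain the 1000 class scores per image. They are not necessarily the authors' setup and are not executed here.</p>
      <preformat preformat-type="code">
```python
def user_image_vector(per_image_scores):
    """Mean of the per-image VGG16 class-layer outputs (1000 values each) of one user."""
    if not per_image_scores:
        raise ValueError("the user has no images")
    n_images = len(per_image_scores)
    dim = len(per_image_scores[0])
    return [sum(scores[j] for scores in per_image_scores) / n_images for j in range(dim)]


# One possible way to obtain per_image_scores with Keras (not executed here):
#   model = tf.keras.applications.VGG16(weights="imagenet")  # 1000-way ImageNet softmax
#   batch = tf.keras.applications.vgg16.preprocess_input(images)  # images: (10, 224, 224, 3)
#   per_image_scores = model.predict(batch).tolist()
```
      </preformat>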
    </sec>
    <sec id="sec-5">
      <title>Early Fusion: Textual and Visual Representations</title>
      <p>The last step is to classify users using both text and
images. Our approach is an early fusion of the Textual and Visual representations:
we concatenate the previous vectors and pass the new representation to the
classifier, a Support Vector Machine (SVM), for training and classification. Our
hypothesis is that combining both kinds of features should improve the results by giving more
information about the users than using only one kind of feature. Figure 3 describes this
feature extraction from the text and the images for the concatenation.</p>
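      <p>The concatenation can be sketched as below, assuming (hypothetically) that the selected n-grams are kept as a sparse dictionary and the averaged VGG16 output as a dense list. The "text" and "image" key prefixes are an illustrative choice that keeps the two feature spaces disjoint in the fused vector.</p>
      <preformat preformat-type="code">
```python
def early_fusion(text_features, image_vector):
    """Concatenate a sparse text representation and a dense image vector into one
    feature dict; the "text"/"image" key prefixes keep the two spaces disjoint."""
    fused = {("text", name): value for name, value in text_features.items()}
    fused.update({("image", j): value for j, value in enumerate(image_vector)})
    return fused
```
      </preformat>
      <p>The fused dictionary can then be passed to any linear classifier that accepts sparse features.</p>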
    </sec>
    <sec id="sec-6">
      <title>Experimental Settings</title>
      <p>
        The objective of this task is to determine the gender of a user from a set of
tweets and images that the user posted. We evaluated three settings separately: i) text only,
where we extract the groups of n-grams and train an SVM classifier for
prediction; ii) images only, where we use VGG16 to extract the features of the
images, compute their mean vector and train another SVM classifier; and
iii) text and images, where we concatenate both feature vectors and train a
third SVM classifier. The shared task has three languages on which to test the models, and
we trained one model per language and setting, which yields nine different predictions for
the test dataset. The overview paper [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] describes the tasks, data and
evaluation in detail.
      </p>
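      <p>The text does not state which SVM implementation or hyperparameters were used; scikit-learn's <monospace>LinearSVC</monospace> would be a common choice. The following dependency-free sketch substitutes a Pegasos-style linear SVM (hinge loss with an L2 penalty and no bias term) and trains one model per language and setting, i.e. nine models. The language codes, <monospace>lam</monospace>, <monospace>epochs</monospace> and all function names are assumptions.</p>
      <preformat preformat-type="code">
```python
import random
from itertools import product

LANGUAGES = ("en", "es", "ar")
SETTINGS = ("text", "image", "text+image")


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD for a linear SVM (hinge loss + L2 penalty, no bias term).
    X: list of sparse feature dicts; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w.get(f, 0.0) * v for f, v in X[i].items())
            for f in w:  # regularization step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if 1.0 > margin:  # hinge loss is active: step toward the example
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, x):
    """Sign of the linear score: +1 or -1."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) >= 0 else -1


def train_all(data):
    """data[(language, setting)] = (X, y); one classifier per pair, i.e. nine models."""
    return {key: train_linear_svm(*data[key]) for key in product(LANGUAGES, SETTINGS)}
```
      </preformat>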
    </sec>
    <sec id="sec-7">
      <title>Experimental Results</title>
      <p>To test the models before running them on the TIRA platform, we first split the training dataset into
70% for training and 30% for testing, and extracted the text and image features. We measured
accuracy over the predictions. For the test dataset, we trained our
models on all the users of the training dataset and then predicted the gender of the test users. Table 1 shows
the detailed classification results obtained with the text, image and combined strategies for
the three languages. These results show that the n-gram Ensemble
performs better than the other strategies for text representation. SOA and CNN did not
perform as well as the n-gram Ensemble on this task, which could be due to the base
terms (words) used in those representations. This presents an opportunity to
integrate the n-gram idea into them in search of better performance. The image
representation alone performed worse than the text alone, but
combining both representations improves the results.</p>
      <p>To study the remarkable performance of the n-grams, we extracted the 10 best
word n-grams of each group, as ranked by χ<sup>2</sup>, from the English and Spanish training corpora
and then hand-picked 5 of them; Table 2 shows these
n-grams. The table illustrates the words that people prefer
to use when they tweet about topics that interest them.</p>
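      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Five hand-picked word n-grams (1 to 4 words) among the 10 best ranked by χ<sup>2</sup> in the English and Spanish training corpora.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th colspan="4">English</th>
              <th colspan="4">Spanish</th>
            </tr>
            <tr>
              <th>1 word</th>
              <th>2 words</th>
              <th>3 words</th>
              <th>4 words</th>
              <th>1 word</th>
              <th>2 words</th>
              <th>3 words</th>
              <th>4 words</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>cute</td>
              <td>More for</td>
              <td>have the best</td>
              <td>have the best day</td>
              <td>amiga</td>
              <td>mi novio</td>
              <td>el gol de</td>
              <td>de Trump https co</td>
            </tr>
            <tr>
              <td>girls</td>
              <td>my bed</td>
              <td>in the league</td>
              <td>liked YouTube video from</td>
              <td>equipo</td>
              <td>te amo</td>
              <td>en mi corazón</td>
              <td>EE UU https co</td>
            </tr>
            <tr>
              <td>league</td>
              <td>my mum</td>
              <td>so excited to</td>
              <td>new photo to Facebook</td>
              <td>gol</td>
              <td>gol de</td>
              <td>más grande de</td>
              <td>en EE UU https</td>
            </tr>
            <tr>
              <td>lovely</td>
              <td>my wife</td>
              <td>happy birthday mate</td>
              <td>photo to Facebook https</td>
              <td>jugador</td>
              <td>un equipo</td>
              <td>porque no me</td>
              <td>la vida https co</td>
            </tr>
            <tr>
              <td>mum</td>
              <td>the league</td>
              <td>have lovely day</td>
              <td>posted new photo to</td>
              <td>partido</td>
              <td>mi corazón</td>
              <td>que mi mamá</td>
              <td>que si https co</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>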
      <p>To analyze the performance of the image model, we selected some images and obtained
from our model the probability of each being posted by a male or a female user. Figure 4 shows these
probabilities for some pictures from the three languages. For the English users, we
can see that sports-related images are more common for males, while landscapes
or cats are more common for females. For the Spanish users,
males more commonly post about sports and video games, while
females prefer pictures of artists and landscapes. The last part of the figure shows
images from Arabic users, where males also have a high probability
of posting sports-related content (even higher than English and Spanish users), while
females commonly post more colorful pictures. In general, however, there are many
neutral pictures about politics, social events or comics that are harder to classify.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>In this notebook we presented an approach to determine the gender of a user
from the tweets and images they post. For the text part we evaluated four different
strategies, of which the n-gram Ensemble obtained the best overall performance. We analyzed the n-gram
representation to see which groups of words have the most discriminative value for the classes,
and observed that the model captures important words that users post when they tweet about topics of interest.
For the image part we used a VGG16 model to extract features from the images and
capture the kind of images that people usually post. The images alone did not achieve the
expected results, due to the similarity of image topics such as politics or social events.
Finally, we concatenated the text and image features to see whether the
model could gain extra information for the classification. These experiments
provide evidence that text alone gives better results than images alone,
and that combining both kinds of features improves the results on the training and test sets, supporting
our hypothesis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Walck</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Hand-book on Statistical Distributions for experimentalists</article-title>
          .
          <source>Internal Report SUF-PFY/96-01</source>
          , Stockholm (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition</source>
          . Third Edition draft, Chapter 4 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <source>Neural Network Methods in Natural Language Processing</source>
          . Synthesis Lectures on Human Language Technologies (ed. G. Hirst), Morgan &amp; Claypool
          , (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lopez-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-Y-Gomez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villasenor-Pineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villatoro-Tello</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Inaoe's participation at pan'13: Author profiling task</article-title>
          .
          <source>In: Notebook Papers of CLEF 2013 LABs and Workshops</source>
          , Valencia, Spain, (
          <year>September 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>CoRR</source>
          , (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Comparative Study of CNN and RNN for Natural Language Processing</article-title>
          .
          <source>CoRR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Moreno-Lopez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalita</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Deep Learning applied to NLP</article-title>
          .
          <source>CoRR</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Use of language and author profiling: Identification of gender and age</article-title>
          .
          <source>Natural Language Processing and Cognitive Science</source>
          , page
          <volume>177</volume>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Álvarez-Carmona</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pellegrin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sánchez-Vega</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villaseñor-Pineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villatoro-Tello</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>A visual approach for age and gender identification on Twitter</article-title>
          . IOS Press (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haagsma</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>N-GrAM: New Groningen Author-profiling Model</article-title>
          .
          <source>Notebook for PAN at CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nie</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (eds.)
          <article-title>Working Notes Papers of the CLEF 2018 Evaluation Labs</article-title>
          .
          <source>CEUR Workshop Proceedings, CLEF and CEUR-WS.org</source>
          (Sep
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Agrawal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Age and Gender Identification using Stacking for Classification</article-title>
          .
          <source>Notebook for PAN at CLEF 2016</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Bakkar-Deyab</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duarte</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Author Profiling Using Support Vector Machines</article-title>
          .
          <source>Notebook for PAN at CLEF 2016</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bougiatiotis</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krithara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Author Profiling using Complementary Second Order Attributes and Stylometric Features</article-title>
          .
          <source>Notebook for PAN at CLEF 2016</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Adame-Arcia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro-Castro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ortega-Bueno</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Author Profiling, instance-based Similarity Classification</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Akhtyamova</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardiff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Twitter Author Profiling Using Word Embeddings and Logistic Regression</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plotnikova</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pawar</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benajiba</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Subword-based Deep Averaging Networks for Author Profiling in Social Media</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Škrjanec</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zupan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PAN 2017: Author Profiling - Gender and Language Variety Prediction</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Schaetti</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>UniNE at CLEF 2017: TF-IDF and Deep-Learning for Author Profiling</article-title>
          .
          <source>Notebook for PAN at CLEF 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Azam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gavrilova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Gender prediction using individual perceptual image aesthetics</article-title>
          .
          <source>Journal of WSCG</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ):
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsuboshita</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kato</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Gender estimation for SNS user profiling using automatic image annotation</article-title>
          . In:
          <source>2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          , July (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Hum</surname>
            ,
            <given-names>N. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlin</surname>
            ,
            <given-names>P. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hambright</surname>
            ,
            <given-names>B. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portwood</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schat</surname>
            ,
            <given-names>A. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevan</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          :
          <article-title>A picture is worth a thousand words: A content analysis of Facebook profile photographs</article-title>
          .
          <source>Computers in Human Behavior</source>
          ,
          <volume>27</volume>
          (
          <issue>5</issue>
          ):
          <fpage>1828</fpage>
          -
          <lpage>1833</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Merler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          :
          <article-title>You are what you tweet...pic! Gender prediction based on semantic analysis of social media images</article-title>
          . In:
          <source>2015 IEEE International Conference on Multimedia and Expo (ICME)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          , June (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Taniguchi</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakaki</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigenaka</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsuboshita</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohkuma</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Weighted Combination of Text and Image Classifiers for User Gender Inference</article-title>
          , pages
          <fpage>87</fpage>
          -
          <lpage>93</lpage>
          . Association for Computational Linguistics (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>