<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Twitter Text and Image Gender Classification with a Logistic Regression N-gram Model</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Groningen</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>We present our participation in the PAN 2018 Author Profiling shared task, classifying authors by gender for English, Arabic and Spanish. We participated in all sub-tasks and propose a system for classification with text, images and the combination of the two. Our final submitted system is a Logistic Regression classifier that uses word and character n-grams as textual features and a set of automatically derived image-based features, such as the presence, proportion and number of faces to detect selfies, as well as the faces' emotions and gender. We experimented with word embeddings, which negatively affected our system's performance. Our cross-validated training results show slight improvements in performance for Arabic and Spanish when image-based features are added to text-based features. Our highest scores on the PAN 2018 test dataset are accuracies of 81.2% for English using only text-based features, 78.7% for Arabic using both text- and image-based features and 80.3% for Spanish using only text-based features. Overall, we finished 6th in the global ranking with an average accuracy of 79.6% for our text and image combination system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The field of author profiling is about inferring traits of an author, such as gender, age
and personality. With the rise of social media platforms such as Twitter and Facebook,
the field has gained more interest. Profiling an author is desirable from multiple
viewpoints: from a security point of view, to detect authors with criminal intentions,
and from a marketing point of view, to narrow down target audiences for online
advertisements.</p>
      <p>
        In recent years, multiple shared tasks have been organized on the topic of author
profiling [18,16]. In this paper, we describe our approach for the Author Profiling shared
task at PAN 2018 [17]. This year’s Author Profiling task is the 6th iteration of this
task and differs slightly from previous years, since the gold standard data now
includes images. The task is to build a system that classifies a Twitter author’s gender from
100 of their tweets and 10 posted images. Though the images are new for this shared task,
previous work already created systems that are capable of detecting gender, emotional
expressions and personalities from images [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By combining such image classification
systems with textual classification systems, we can determine whether adding
images improves the final accuracy.
      </p>
      <p>
        In the last two years, the winning systems for 2016 [22] and 2017 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] were both
SVM classifiers that made use of word n-grams and character n-grams. Although
deep learning methods have been introduced, such as Recurrent Neural Networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and
Convolutional Neural Networks [19,20], they have not yet been able to beat those systems.
Therefore, our approach focuses on the successful models of the previous iterations
of the shared task: an SVM classifier such as in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and a Logistic Regression classifier as
used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We take these systems as baselines and try to improve them
by performing a parameter search, experimenting with word embeddings and adding
image-based features. The latter include a feature to indicate selfies, since females
tend to post more selfies than males [
        <xref ref-type="bibr" rid="ref5">5,21</xref>
        ].
      </p>
      <p>Method</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>
        The PAN 2018 training corpus consists of tweets in three languages:
English, Spanish and Arabic. For each author there are 100 tweets and 10 images, labeled
by gender. The gender labels (male and female) are evenly distributed over the training
corpus. Table 1 shows an overview of the PAN 2018 training corpus released by the
organizers.
      </p>
      <p>
        The main set of features we used were n-grams. The winners of the previous year’s
Author Profiling shared task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], as well as [
        <xref ref-type="bibr" rid="ref1 ref10 ref12 ref13 ref4 ref6 ref9">1,4,6,9,10,12,13,19</xref>
        ] showed that word
n-grams and character n-grams are very robust features for this task. Another advantage
of using n-grams is that they are not handcrafted and are thus easy to generate.
Also, they do not depend on pre-trained word embeddings or on large text corpora
for training word embeddings. For all three languages, we experimented with different
lengths of word and character n-grams.
      </p>
      <p>
        Aside from the n-gram features, we experimented with word embeddings. For
English, we used pre-trained word embeddings from GloVe [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. We used
embeddings with vector lengths of 100 and 200 dimensions that were created from a
corpus consisting of 2 billion tweets containing 27 billion tokens. For Spanish, we used
pre-trained word embeddings from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] with vectors of 200 dimensions. These
embeddings were constructed from a total of 58.7 million Spanish tweets
containing 1.1 billion tokens. For Arabic, we trained our own 200-dimensional word
embeddings on roughly 70 million recently scraped Arabic tweets.
      </p>
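      <p>As a minimal sketch of the word and character n-gram features described above, the two feature sets could be built and joined with scikit-learn as follows. The n-gram lengths are the word 1- and 2-grams and character 3- to 5-grams of the baseline setting; the TF-IDF weighting, the helper name build_ngram_features and the toy documents are assumptions made for illustration and are not taken from our system.</p>
      <preformat>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion


def build_ngram_features():
    """Join word 1-2 grams and character 3-5 grams (TF-IDF weighting is an assumption)."""
    word_ngrams = TfidfVectorizer(analyzer="word", ngram_range=(1, 2))
    char_ngrams = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
    return FeatureUnion([("word", word_ngrams), ("char", char_ngrams)])


# Hypothetical toy documents; each string stands for all tweets of one author.
docs = [
    "first author writes about the weather today",
    "second author writes about the match tonight",
    "third author shares a link and a photo",
]
features = build_ngram_features()
X = features.fit_transform(docs)  # sparse matrix: one row per author
print(X.shape)
```
      </preformat>
      <p>Each row of X holds the concatenated word and character n-gram weights of one author and can be passed directly to a linear classifier.</p>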
    </sec>
    <sec id="sec-3">
      <title>2.4 Images</title>
      <p>
        This year’s new addition to the gender classification task is classification by images. Our
approach to using the images for this task is to rely on existing image feature extraction
tools from related research. In our system, we used the software from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In that study, a convolutional neural network (CNN) was implemented to find and
classify faces in images by gender and emotion. The CNN model consists of four residual
depth-wise separable convolutions, each followed by a batch
normalization operation and a ReLU activation function. The last layer of the model
applies global average pooling and a soft-max activation function to produce the
prediction. The system achieved an accuracy of 95% on the IMDB gender dataset and
66% on the FER-2013 emotion dataset. That system, including all code and pre-trained
models, is available under an open-source license.<sup>1</sup></p>
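      <p>To illustrate the final layer described above, the following sketch applies global average pooling followed by a soft-max to a set of feature maps. It is written in plain Python for readability and is not the implementation of the cited software; the function names and the tiny 2x2 feature maps are hypothetical.</p>
      <preformat>
```python
import math


def global_average_pool(feature_maps):
    """Reduce each 2-D feature map (one per class) to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2x2 feature maps for a two-class problem.
maps = [
    [[1.0, 3.0], [2.0, 2.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
pooled = global_average_pool(maps)  # one score per class
probs = softmax(pooled)             # class probabilities
print(pooled, probs)
```
      </preformat>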
      <p>
        The software from [
        <xref ref-type="bibr" rid="ref2">2</xref>
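        ] was integrated into our system without preprocessing the
images. The software converts the images of a Twitter user into a set of 13 features:
        <list list-type="order">
          <list-item><p>Average number of faces</p></list-item>
          <list-item><p>Number of images that include a face</p></list-item>
          <list-item><p>Average area the faces take up</p></list-item>
          <list-item><p>Average area the largest face takes up</p></list-item>
          <list-item><p>Average number of men</p></list-item>
          <list-item><p>Average number of women</p></list-item>
          <list-item><p>Percentage of faces being angry</p></list-item>
          <list-item><p>Percentage of faces being disgusted</p></list-item>
          <list-item><p>Percentage of faces being fearful</p></list-item>
          <list-item><p>Percentage of faces being happy</p></list-item>
          <list-item><p>Percentage of faces being sad</p></list-item>
          <list-item><p>Percentage of faces being surprised</p></list-item>
          <list-item><p>Percentage of faces being neutral</p></list-item>
        </list>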
      </p>
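      <p>The aggregation of per-image face detections into these 13 user-level features could be sketched as follows. The input format (a list of face dictionaries per image), the emotion label strings and the exact averaging choices (for example, averaging face area over images rather than over faces) are assumptions made for illustration; they are not taken from the cited software.</p>
      <preformat>
```python
EMOTIONS = ["angry", "disgusted", "fearful", "happy", "sad", "surprised", "neutral"]


def user_image_features(images):
    """Aggregate face detections of one user into 13 features.

    `images` has one entry per image; each entry is a list of face dicts with
    the hypothetical keys "area" (fraction of the image covered, 0 to 1),
    "gender" ("man" or "woman") and "emotion" (one of EMOTIONS or None).
    """
    n_images = len(images)
    faces = [face for image in images for face in image]
    n_faces = len(faces)

    def per_image(total):
        return total / n_images if n_images else 0.0

    def per_face(count):
        return count / n_faces if n_faces else 0.0

    features = [
        per_image(n_faces),                                    # 1. average number of faces
        float(sum(1 for image in images if image)),            # 2. images that include a face
        per_image(sum(f["area"] for f in faces)),              # 3. average area of all faces
        per_image(sum(max(f["area"] for f in image)
                      for image in images if image)),          # 4. average area of largest face
        per_face(sum(1 for f in faces if f["gender"] == "man")),    # 5. proportion of men
        per_face(sum(1 for f in faces if f["gender"] == "woman")),  # 6. proportion of women
    ]
    for emotion in EMOTIONS:                                   # 7-13. proportion per emotion
        features.append(per_face(sum(1 for f in faces if f["emotion"] == emotion)))
    return features


# Hypothetical detections for a user with four images.
example = [
    [{"area": 0.5, "gender": "woman", "emotion": "happy"}],
    [],
    [{"area": 0.1, "gender": "man", "emotion": None},
     {"area": 0.2, "gender": "woman", "emotion": "neutral"}],
    [],
]
vector = user_image_features(example)
print(len(vector), vector)
```
      </preformat>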
      <p>The first two features describe the presence and number of detected faces in the
images. Table 2 shows the number of faces detected for each language and gender. We
see little to no difference between the genders in how often a face is detected.</p>
      <p>
        Features three and four capture the relative area of an image that is covered by
detected faces. These features are intended to capture selfies. Previous research [
        <xref ref-type="bibr" rid="ref5">5,21</xref>
        ]
studied differences in selfie-related behaviour between males and females. One of the findings of those
studies is that females tend to take and post more selfies on social media.
      </p>
      <sec id="sec-3-1">
        <title>1 https://github.com/oarriaga/face_classification</title>
        <p>Having a large face area in an image with only one detected face could indicate that the image is
a selfie, and therefore these features could be helpful in the classification task.</p>
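        <p>The intuition behind the selfie-related features can be illustrated with a simple rule. Note that our features are continuous area averages; the binary rule below, its function name and its threshold of 0.25 are hypothetical and serve only to make the idea concrete.</p>
        <preformat>
```python
def looks_like_selfie(face_areas, min_area=0.25):
    """Return True if the image has exactly one face covering at least min_area.

    face_areas holds, for each detected face, the fraction of the image it covers.
    The default threshold is a hypothetical value, not one used in the paper.
    """
    return len(face_areas) == 1 and face_areas[0] >= min_area


one_large_face = looks_like_selfie([0.4])
two_faces = looks_like_selfie([0.4, 0.3])
one_small_face = looks_like_selfie([0.05])
print(one_large_face, two_faces, one_small_face)
```
        </preformat>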
        <p>Features five and six describe the gender of the detected faces. The values of these
two features are floats ranging from 0 to 1, indicating the proportion of each gender.
Table 3 shows, for each author gender, the probability of posting more faces of a specific gender in an
image. For all languages, males post more images of male faces than of
female faces; for Arabic the difference is especially large.
English and Spanish females post slightly more images with female faces than with
male faces. Arabic females are the exception: they post more male faces than female
faces, but to a lesser extent than their male counterparts.</p>
        <p>
          Lastly, when one of the seven emotions from [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] could be detected, which the
software was not always capable of, we stored the proportions of these emotions in seven
float values ranging from 0 to 1. Table 4 shows an overview of the emotions for each
gender and language. The table shows that, in general, there are small to no differences
between males and females regarding emotions. The only conclusion that holds for all
languages is that females tend to post more happy people. For English, males post more
images of angry people, whereas for Arabic the opposite holds. Also, no face is ever classified as
surprised, which raises the question of whether the system of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] can accurately detect this emotion. Overall,
we expect these features to be of limited benefit for our system.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Models</title>
      <p>
        To arrive at our best system, we experimented with different classifiers. We used the
Python package scikit-learn [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to implement the LinearSVC classifier as used in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
the Logistic Regression classifier with the parameters C = 1e2 and fit_intercept = False
as used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We also tried a K-Nearest Neighbour classifier.
      </p>
      <p>
        The results of all tested classification models can be found in Table 5. For every
model, we measured its performance by accuracy in a 10-fold cross-validation setup.
All models use the n-gram features from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We found that the
Logistic Regression classifier resulted in the best performance, so we use
this classifier in the following experiments.
      </p>
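      <p>The comparison of classifiers in a 10-fold cross-validation setup can be sketched with scikit-learn as below. The Logistic Regression parameters C = 1e2 and fit_intercept = False are those given in the text; the synthetic data, the default settings of the other two classifiers and the raised max_iter are assumptions made so that the example runs on its own.</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in data; the paper uses n-gram features of tweets instead.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "LinearSVC": LinearSVC(),
    # C and fit_intercept as stated in the text; max_iter is an assumption.
    "LogisticRegression": LogisticRegression(C=1e2, fit_intercept=False, max_iter=1000),
    "KNeighbors": KNeighborsClassifier(),
}

# Mean accuracy over 10 folds for each classifier.
scores = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```
      </preformat>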
      <p>
        For the Logistic Regression model we performed a parameter search, mainly to find
the optimal lengths of word and character n-grams. Our baseline n-gram model was
the n-gram model used in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which uses word 1- and 2-grams and character
3- to 5-grams. We tested different n-gram settings but were unable to outperform
the settings from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Table 6 shows the results for the best settings, as well as the best
results found for word and character n-grams separately.
      </p>
      <p>Table 6 (columns: En, Ar, Es). Rows: bag of words; word n-grams (n=1,2); char n-grams (n=3,4,5); word n-grams (n=1,2) + char n-grams (n=3,4,5): 0.831, 0.779, 0.776.</p>
      <p>
For the preprocessing of the text data, we lowercased all tweets and subsequently
tokenized them with the NLTK Tweet Tokenizer.<sup>2</sup> We also replaced every username
with @username and every URL with URL.</p>
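      <p>The preprocessing steps can be sketched as follows. To keep the example free of external dependencies, tokenization is done by splitting on whitespace, which is a simplification of the NLTK Tweet Tokenizer; the regular expressions and the function name are likewise assumptions.</p>
      <preformat>
```python
import re

USERNAME_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")


def preprocess(tweet):
    """Lowercase a tweet, mask URLs and usernames, and split on whitespace.

    Whitespace splitting stands in for the NLTK Tweet Tokenizer here.
    """
    text = tweet.lower()
    text = URL_RE.sub("URL", text)             # every URL becomes the token URL
    text = USERNAME_RE.sub("@username", text)  # every mention becomes @username
    return text.split()


tokens = preprocess("Hey @John_Doe check https://t.co/abc123 NOW")
print(tokens)  # ['hey', '@username', 'check', 'URL', 'now']
```
      </preformat>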
      <p>Table 7 shows that our preprocessing methods indeed improve performance for
each language. Generalizing over URLs was especially beneficial.</p>
      <p>In this section we report the results of our systems on the training corpus (10-fold
CV) and on the final test set.</p>
      <p>
        The results in Table 8 show that using only n-grams as features gives a
good baseline result that is in line with the findings in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The model that only uses
embeddings performs worse than the model that utilizes n-grams. Moreover, the model
that combines embeddings and n-grams also performs worse.
      </p>
      <p>
        The image-only model performed poorly, with accuracies around 60%. However,
using these features in combination with the n-gram features gave us slight performance
improvements for Arabic and Spanish. Although we found an increase in accuracy
for Arabic, approximate randomization testing [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]<sup>3</sup> showed that this improvement is
not significant.
      </p>
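      <p>For readers unfamiliar with approximate randomization, the idea can be sketched as below. This is not the script we used; it is a minimal illustration that assumes per-item correctness flags for two systems and a two-sided test on the accuracy difference. The function name, the number of trials and the seed are hypothetical.</p>
      <preformat>
```python
import random


def approximate_randomization(correct_a, correct_b, trials=10000, seed=0):
    """Two-sided approximate randomization test on paired correctness flags.

    correct_a and correct_b are equal-length lists of 0/1 values telling whether
    system A and system B classified each item correctly. Returns an estimated
    p-value for the observed difference in accuracy.
    """
    rng = random.Random(seed)
    n = len(correct_a)
    observed = abs(sum(correct_a) - sum(correct_b)) / n
    at_least_as_large = 0
    for _ in range(trials):
        total_a = total_b = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() >= 0.5:  # swap the two outputs for this item
                a, b = b, a
            total_a += a
            total_b += b
        if abs(total_a - total_b) / n >= observed:
            at_least_as_large += 1
    # Add-one smoothing avoids reporting an impossible p-value of exactly 0.
    return (at_least_as_large + 1) / (trials + 1)


# Identical systems: every shuffle matches the observed difference of 0.
same = approximate_randomization([1, 0, 1, 1], [1, 0, 1, 1], trials=100)
# Very different systems: shuffles almost never reach the observed difference.
different = approximate_randomization([1] * 20, [0] * 20, trials=200)
print(same, different)
```
      </preformat>
      <p>A small p-value indicates that the observed accuracy difference is unlikely to arise when the two systems' outputs are exchangeable.</p>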
    </sec>
    <sec id="sec-5">
      <title>Official Results</title>
      <p>We submitted three final systems: one for classification based on text only, one for
images only and one for the combination of text and images. All three final
systems use a Logistic Regression classifier.</p>
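      <p>One straightforward way to combine the two modalities is to concatenate the text features and the image features column-wise before training a single classifier. The sketch below assumes this concatenation; the toy texts, labels, random image features, character n-gram vectorizer and max_iter setting are hypothetical and only serve to make the example runnable.</p>
      <preformat>
```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one text and 13 image features per author.
texts = [
    "first author text sample",
    "second author text sample",
    "another short example text",
    "one more example of text",
]
labels = ["male", "female", "male", "female"]
image_features = np.random.default_rng(0).random((4, 13))

vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 5))
X_text = vectorizer.fit_transform(texts)

# Concatenate sparse text features and dense image features column-wise.
X_combined = hstack([X_text, csr_matrix(image_features)]).tocsr()

clf = LogisticRegression(C=1e2, fit_intercept=False, max_iter=1000)
clf.fit(X_combined, labels)
predictions = clf.predict(X_combined)
print(X_combined.shape, predictions)
```
      </preformat>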
      <sec id="sec-5-1">
        <title>2 http://www.nltk.org/</title>
        <p>3 We used the script by Vincent Van Asch: https://www.clips.uantwerpen.be/scripts/art</p>
        <p>In this paper, we presented our approach to the PAN 2018 author profiling shared task
of predicting an author’s gender using text-based and image-based features. We
submitted a Logistic Regression classifier using word and character n-grams as text-based
features and several automatically extracted image features. We found that using only
text-based n-gram features gave us the best results for English and Spanish, whereas the
combination of text-based and image-based features gave us the best results for Arabic.
As additional text-based features we tested word embeddings, but results on the training
data show that these hurt our system’s performance.</p>
        <p>
          For this shared task we experimented with using images to predict an author’s
gender. We used an image feature extraction tool to classify detected faces in images by
gender and emotion. We also tried to construct features that could indicate selfies, as
females tend to post more selfies than males. Our results showed that using only such
image-based features performs poorly, with accuracy scores around 60% with a
Logistic Regression classifier. Adding these features to a text-based n-gram model does
not influence the score much. On the PAN 2018 test dataset, the images decreased the scores slightly for English and
Spanish, but gave us a small improvement for Arabic.
Our submitted system only used image-based features extracted from detected faces,
but the data showed that not all images include a face. Therefore, for future research we
suggest a system with a larger set of image-based features.
        </p>
        <p>16. Potthast, M., Pardo, F.M.R., Tschuggnall, M., Stamatatos, E., Rosso, P., Stein, B.: Overview of PAN’17 - author identification, author profiling, and author obfuscation. In: CLEF (2017)</p>
        <p>17. Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M., Stein, B.: Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In: Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L. (eds.) Working Notes Papers of the CLEF 2018 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep 2018)</p>
        <p>18. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations. In: Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings/Balog, Krisztian [edit.]; et al. pp. 750–784 (2016)</p>
        <p>19. Schaetti, N.: UniNE at CLEF 2017: TF-IDF and deep-learning for author profiling. Cappellato et al. [<xref ref-type="bibr" rid="ref13">13</xref>] (2017)</p>
        <p>20. Sierra, S., Montes-y-Gómez, M., Solorio, T., González, F.A.: Convolutional neural networks for author profiling. Working Notes Papers of the CLEF (2017)</p>
        <p>21. Sorokowski, P., Sorokowska, A., Oleszkiewicz, A., Frackowiak, T., Huk, A., Pisanski, K.: Selfie posting behaviors are associated with narcissism among men. Personality and Individual Differences 85, 123–127 (2015)</p>
        <p>22. op Vollenbroek, M.B., Carlotto, T., Kreutz, T., Medvedeva, M., Pool, C., Bjerva, J., Haagsma, H., Nissim, M.: GronUP: Groningen user profiling (2016)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alrifai</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebdawi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghneim</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Arabic tweeps gender and dialect prediction</article-title>
          . Cappellato et al. [13]
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Arriaga</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valdenegro-Toro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plöger</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Real-time convolutional neural networks for emotion and gender classification</article-title>
          .
          <source>CoRR abs/1710</source>
          .07557 (
          <year>2017</year>
          ), http://arxiv.org/abs/1710.07557
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwyer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medvedeva</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rawee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haagsma</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>N-gram: New groningen author-profiling model</article-title>
          (
          <year>2017</year>
          ), Conference and Labs of the Evaluation Forum (CLEF 2017): Information Access Evaluation meets Multilinguality, Multimodality, and Visualization; Conference date: 11-09-2017 through 14-09-2017
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ciobanu</surname>
            ,
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zampieri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malmasi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dinu</surname>
            ,
            <given-names>L.P.</given-names>
          </string-name>
          :
          <article-title>Including dialects and language varieties in author profiling</article-title>
          .
          <source>arXiv preprint arXiv:1707.00621</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dhir</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pallesen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torsheim</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreassen</surname>
            ,
            <given-names>C.S.:</given-names>
          </string-name>
          <article-title>Do age and gender differences exist in selfie-related behaviours?</article-title>
          <source>Computers in Human Behavior</source>
          <volume>63</volume>
          ,
          <fpage>549</fpage>
          -
          <lpage>555</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kheng</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laporte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granitzer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Insa lyon and uni pasau's participation at pan@ clef'17: Author profiling task</article-title>
          .
          Cappellato et al. [13]
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kodiyan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hardegger</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neuhaus</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cieliebak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Author profiling with bidirectional rnns using attention with grus: notebook for pan at clef 2017</article-title>
          . In:
          <source>CLEF 2017 Evaluation Labs and Workshop, Working Notes Papers, Dublin, Ireland, 11-14 September 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kuijper</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenthe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noord</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>Ug18 at semeval-2018 task 1: Generating additional training data for predicting emotion intensity in spanish</article-title>
          .
          <source>In: Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>279</fpage>
          -
          <lpage>285</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Markov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gómez-Adorno</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidorov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Language-and subtask-dependent feature selection and classifier parameter tuning for author profiling</article-title>
          .
          <source>Working Notes Papers of the CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skrjanec</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zupan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>PAN 2017: Author profiling - gender and language variety prediction</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Noreen</surname>
            ,
            <given-names>E.W.</given-names>
          </string-name>
          :
          <article-title>Computer-intensive methods for testing hypotheses</article-title>
          . Wiley New York (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ogaltsov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romanov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Language variety and gender classification for author profiling in pan 2017</article-title>
          . Cappellato et al. [13]
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>R.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira Neto</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          :
          <article-title>Using character n-grams and style features for gender and language variety classification</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          ), http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>