<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Evolutionary Approach to Build User Representations for Profiling of Bots and Humans in Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto López-Santillán</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Carlos González-Gurrola</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Montes-y-Gómez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graciela Ramírez-Alonso</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olanda Prieto-Ordaz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Ciencias Computacionales, Instituto Nacional de Astrofísica</institution>
          ,
          <addr-line>Óptica y Electrónica, Tonantzintla, Puebla</addr-line>
          ,
          <country country="MX">México</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Facultad de Ingeniería, Universidad Autónoma de Chihuahua</institution>
          ,
          <addr-line>Circuito No. 1, Nuevo Campus Universitario, Apdo. postal 1552, Chihuahua, Chih., México. C.P. 31240</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
<p>In this work, we describe a novel term-weighting scheme to produce document representations of Twitter posts for the Author Profiling task at PAN 2019. The purpose of this task is to predict the type and gender of an author: bot/human and bot/female/male, respectively. The novelty of our approach resides in using an evolutionary algorithm to combine traditional statistical features, e.g. tf and idf, into an improved weighting scheme. Results suggest that our proposal outperforms the baseline, which uses the mean of the word embeddings, by up to 6% in accuracy when predicting author type in the English language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Bots are autonomous programs that aim to pose as humans on online platforms. They
can be employed in applications that range from customer service activities
to attempts to influence mass opinion. Take, for instance, the 2016 U.S. presidential
election, where political organizations influenced voters through the use of these
software agents [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The problem of correctly distinguishing bots from human beings has
been addressed with a variety of approaches, including computing
statistics of their behavior summarized, for example, in the "friend to followers ratio", "friend
count" and "follower count" [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, given the constant improvement of these agents, more
sophisticated approaches are needed, with the capacity to analyze even the
discourse that they create.
      </p>
      <p>
        Author Profiling (AP) consists of predicting characteristics of authors based on
information that they create or share. Given the characteristics of bots, it is natural
to approach their recognition as an AP task. PAN at CLEF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a yearly
competition on AP, proposed for 2019 to identify bots through two prediction problems: type of
writer (human or bot) and gender (bot, female or male), based on anonymized Twitter
posts [
        <xref ref-type="bibr" rid="ref10 ref11">11,10</xref>
        ].
      </p>
      <p>
        For this competition we addressed the problem using Word Embeddings (WEs) and
Genetic Programming (GP), the latter to calculate a weighting scheme by which the former
could be aggregated into Document Embeddings (DEs). The first principal component of the
DEs, obtained with Principal Component Analysis (PCA), was later subtracted from them to
boost prediction accuracy. Combining WEs to produce DEs has been attempted
before, using aggregation functions such as sum, mean and median [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nonetheless, we
propose custom aggregation formulas evolved through GP which, to the best of our
knowledge, have been used for the AP problem only in our more comprehensive work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and
in the present report.
      </p>
      <p>
        In the present document we report the results attained by our approach in the
training and competition stages at PAN 2019 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Next we describe our methodology, results and conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        Our proposed approach to the bot/human distinction, depicted in Figure 1, uses
an ensemble of novel ML and Natural Language Processing (NLP) algorithms. The
competitive results attained by this technique on other datasets raise the question of
whether it could deliver a practical outcome in the PAN 2019 AP task. We
have already tested a variant of this methodology on the AP datasets from PAN
2013-2018, achieving promising results, as demonstrated in the work of López-Santillán et al.
in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Next we elaborate on the main steps of this approach.
      </p>
      <sec id="sec-2-1">
        <title>Word Embeddings</title>
        <p>
          The first step in our technique is to construct a vocabulary of Word Embeddings (WEs)
from the training dataset. To do so we used the Skip-gram variant of word2vec (w2v)
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] to produce our own set of WEs. We also tested other WE algorithms such as GloVe
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and fastText [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], but preliminary results showed a better performance of w2v on
our task. Furthermore, we tried WEs pre-trained on large corpora such as Wikipedia,
available for the English and Spanish languages. Nonetheless, WEs trained on the
PAN 2019 dataset delivered better accuracy.
        </p>
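        <p>As an illustration of the Skip-gram principle behind these WEs (the actual embeddings were trained with word2vec; the window size and tokens below are illustrative), a sketch of how (target, context) training pairs are formed:</p>
        <preformat>
```python
# Illustrative sketch: Skip-gram forms (target, context) training pairs
# from a tokenized tweet, pairing each word with its neighbours within
# a fixed window. The window size here is an assumption for illustration.

def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs within +/- `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["bots", "pose", "as", "humans"], window=1)
# each word is paired with its immediate neighbours
```
        </preformat>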
      </sec>
      <sec id="sec-2-2">
        <title>Computing Statistical Features of Dataset</title>
        <p>Simultaneously with the previous step, we calculate several statistics from the training
data. These variables represent the importance of terms within the dataset from different
perspectives. Table 1 presents the list of features that we considered, with an
explanation below of what each variable captures. At the end of this stage there are seven
different dictionaries, each mapping the words in the vocabulary of the dataset to
their respective values according to each formula. For instance, the tf-idf feature is computed as
tf-idf(t, d) = (1 + log tf(t, d)) · log(N / df(t)).</p>
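        <p>A minimal sketch of this sublinear tf-idf weighting (the function name is illustrative):</p>
        <preformat>
```python
import math

def tf_idf(tf_td, N, df_t):
    """Sublinear tf-idf: (1 + log tf) scaled by the idf term log(N / df)."""
    if tf_td == 0:
        return 0.0
    return (1 + math.log(tf_td)) * math.log(N / df_t)

# a term seen 3 times by a user, in a corpus of 1000 tweets
# where it appears in 10 of them
w = tf_idf(3, 1000, 10)
```
        </preformat>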
        <p>X0 is the term frequency by user, where t is a word from the vocabulary and d is the set of
documents (tweets) by the user.</p>
        <p>X1 is the idf value, which represents the importance of the term across the dataset,
where N is the total number of documents (tweets) in the dataset and df(t) is the number of
documents where the term appears; words are thus weighted according
to how often they appear in documents.</p>
        <p>X2 is the term frequency by user, divided by the sum of the lengths of all
documents (tweets) where the term appears, where D is the number of documents
containing the term.</p>
        <p>X3 computes the distance (in words) between the first and last appearance
of a term by user, where w0 is the initial position and wlast the last position.
X4 is the distance (in documents) between the first and last appearance of a
term by user, where d0 is the initial document (tweet) and dlast is the last document
where the term appears.</p>
        <p>X5 establishes the mutual information (dependence) between x (features from a
tf-idf sparse matrix) and y (the target), where P(x, y) is the joint probability of x and
y, P(x) is the probability of the feature vector and P(y) the probability of the
target.</p>
        <p>X6 is the product of the term frequency in a document and its idf value. This means
that a term appearing frequently in just a few documents (tweets) will have a
greater tf-idf value than a term appearing very often in many documents.
X7 determines the importance of terms according to the most likely topic a
document (tweet) belongs to. Topics are extracted from the dataset using the
Non-Negative Matrix Factorization method, where F is a tf-idf sparse matrix of features,
W is an array of shape (topics, terms_in_vocabulary) containing values
for all terms according to the topic, and H is a non-negative matrix that multiplied
by W yields the original features (F).</p>
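        <p>The factorization behind X7 can be sketched with a tiny multiplicative-update NMF (the matrix sizes and values below are illustrative; library implementations and naming conventions for W and H vary):</p>
        <preformat>
```python
import numpy as np

def nmf(F, k, iters=500, seed=0):
    """Tiny multiplicative-update NMF: F (docs x terms) ~= W @ H.
    Here H (topics x terms) holds the per-topic term weights; note the
    naming differs from the text above, which assigns that array to W."""
    rng = np.random.default_rng(seed)
    n, m = F.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, m)) + 1e-3
    for _ in range(iters):
        # standard Lee-Seung updates for the Frobenius objective
        H *= (W.T @ F) / (W.T @ W @ H + 1e-9)
        W *= (F @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# a rank-2 toy "tf-idf" matrix: 3 tweets x 3 terms
F = np.array([[3., 0., 1.],
              [6., 0., 2.],
              [0., 4., 0.]])
W, H = nmf(F, k=2)
# per-topic term weights live in the rows of H
```
        </preformat>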
        <p>
          We designed the last variable (X7) particularly for AP tasks. As already mentioned,
more extensive work has been carried out on a larger number of datasets, where this feature
has shown promising results in the AP problem, as detailed in López-Santillán et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Composing Document Embeddings: An Evolutionary Approach</title>
        <p>
          Once the WEs and the statistical values for all terms in the training vocabulary are
computed, we need a strategy to represent all the tweets from each user. Representing
larger chunks of text is a matter of current interest in the NLP community.
Our approach composes Document Embeddings (DEs), which condense all posts from
a user into a single vector, by aggregating the WEs of all terms in the tweets through
a Weighted Average Scheme (WAS). To calculate the term weights for the
WAS, we opted for Genetic Programming (GP), an evolutionary algorithm that uses
symbolic regression to evolve mathematical equations approximating a
target, minimizing the error with each passing generation of new individuals (equations).
GP encodes these equations as tree graphs, whose terminal nodes are
variables and constants. We implemented the GP phase using the gplearn library for
Python [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Table 2 shows the parameters employed by GP.
        </p>
        <p>A different equation was devised for each language/target combination. For
instance, equation 1 is the evolved formula to aggregate WEs into DEs when predicting gender
(bot / female / male) in the Spanish language.</p>
        <p>Here X2wn is the term frequency in documents containing the word; X4wn is the
term dispersion in number of documents (Twitter posts between the first and last
appearance of a term); and X5wn is the information-gain value of the term for predicting
gender.</p>
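        <p>The mechanics of applying such an evolved formula can be sketched as follows; the combination below is a hypothetical stand-in, not the actual equation 1:</p>
        <preformat>
```python
import math

# Sketch only: per-term statistics (X2, X4, X5 as defined in section 2.2)
# are plugged into a GP-evolved closed-form expression, yielding one scalar
# weight per term. The expression below is hypothetical, not the evolved one.

def gp_weight(x2, x4, x5):
    """Hypothetical evolved weight over the term statistics."""
    return x5 * math.sqrt(x2) + math.log(1 + x4)

# one weight per vocabulary term, given its (X2, X4, X5) statistics
w = gp_weight(0.02, 40, 0.8)
```
        </preformat>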
        <p>Finally, we produce the DE for all Twitter posts of each user by computing a
weighted average of the WEs of all terms in those posts, using the corresponding formula to
calculate the term weights. Equation 2 shows how we integrate the DE for each user in the
dataset. It is worth noting that each statistical value (as detailed in section 2.2) was
tested collectively as well as individually, yet the evolutionary approach to combining
these statistics performed better every time.</p>
        <p>DEs-GP-Fmla = ( Σ_{n=1}^{i} WE_n · GPWeight[w_n] ) / ( Σ_{n=1}^{i} GPWeight[w_n] )   (2)</p>
        <p>Where WE_n is the Word Embedding of the n-th term in the tweets of each user, and
GPWeight[w_n] is the GP-computed weight of the n-th term in the posts.</p>
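        <p>Equation 2 can be sketched as follows (array shapes and values are illustrative):</p>
        <preformat>
```python
import numpy as np

# The document embedding is the weighted average of the word embeddings
# of all terms in a user's tweets, using the per-term GP weights.

def document_embedding(word_embs, gp_weights):
    """word_embs: (n_terms, dim) array; gp_weights: (n_terms,) array."""
    w = np.asarray(gp_weights, dtype=float)
    E = np.asarray(word_embs, dtype=float)
    return (E * w[:, None]).sum(axis=0) / w.sum()

de = document_embedding([[1., 0.], [0., 1.]], [3., 1.])
# → weighted mean of the two embeddings: [0.75, 0.25]
```
        </preformat>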
      </sec>
      <sec id="sec-2-4">
        <title>Boosting the Accuracy of DEs</title>
        <p>
          Inspired by the work of Arora et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we used Principal Component Analysis
(PCA) on the DEs, employing only the first principal component: we multiply it by its
transpose and by the DEs themselves, and finally subtract this product from the original
DE vectors. Arora et al. proposed this step after finding a boost of up to 10% in accuracy
on textual-similarity tasks. Equation 3 shows this final processing of the DEs.
DEs = DEs − firstComp(DEs) · firstComp(DEs)^T · DEs   (3)
        </p>
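        <p>This removal step can be sketched using the top singular vector as the first principal direction (matrix values are illustrative):</p>
        <preformat>
```python
import numpy as np

# Remove the projection of every DE onto the first principal component
# of the DE matrix, as in equation 3.

def remove_first_component(DEs):
    """DEs: (n_users, dim). Subtract each row's projection on the top
    right-singular vector (the first principal direction)."""
    X = np.asarray(DEs, dtype=float)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    u = Vt[0]                      # first principal direction, shape (dim,)
    return X - np.outer(X @ u, u)  # X - (X u) u^T

X = np.array([[2., 0.1], [1.9, -0.1], [2.1, 0.0]])
Xc = remove_first_component(X)
# projections on the removed direction are now ~0
```
        </preformat>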
        <p>Although we did not achieve the 10% boost in accuracy attained by Arora et al.,
results in the training phase showed a small but clear increase in accuracy, probably
because the AP problem is more difficult.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Classification</title>
        <p>The dataset provided included data in the English and Spanish languages, with
entries divided into training and validation partitions. Table 3 details the distribution
of samples in the dataset; note the 70% to 30% proportion between the training
and validation partitions. Moreover, each user contains 100 Twitter posts.</p>
        <p>
          We approach the classification stage using a Support Vector Machine (SVM)
algorithm implemented in the Scikit-learn library for Python [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We trained the SVM
classifier on the training partition, then performed a grid search of hyper-parameters
using the validation partition. Table 4 details the best parameters found for each
target. The SVM classifier received the training and validation samples in the form of
the DEs previously built, as detailed in section 2.3.
        </p>
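        <p>The classification stage can be sketched as follows; the grid values and data are illustrative rather than those of Table 4, and the search here uses cross-validation instead of the fixed validation split:</p>
        <preformat>
```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# stand-in DEs and bot/human labels (the real inputs are the DEs of sec. 2.3)
X = np.random.default_rng(0).normal(size=(40, 8))
y = (X[:, 0] > 0).astype(int)

# grid values are illustrative, not the ones reported in Table 4
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
                    cv=3)
grid.fit(X, y)
best = grid.best_params_
```
        </preformat>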
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Since the dataset was provided with training and validation partitions, we
evaluated our method using 70% of all data to train and 30% to test. For comparison purposes
we also implemented a common baseline method, which consists of composing DEs
using the mean of the WEs. Table 5 shows the accuracy attained in the training
stage by both the evolutionary weighting scheme and the baseline (simple mean) when
predicting the type of user (bot or human) and gender (bot, female and male). Moreover,
for the competition phase we trained our SVM classifier with all available data
(training + validation); Table 6 reports the performance of our approach on the
actual competition test data.</p>
      <p>As the results show, our approach outperforms an accepted, popular
baseline. Additionally, a decrease in accuracy was expected for the competition stage,
as seen in Table 6. We have already tested a similar version of the approach explained in this
paper on several datasets (AP task at PAN 2013-2018) and attained competitive
results; a similar performance is expected for the PAN 2019 dataset. Although we cannot
guarantee the same performance, or even a top-quartile score among the participants,
we believe our approach can be implemented as an engineering application that could
deliver good results in the real world.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work was supported by CONACYT project FC-2410.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , T.:
          <article-title>A simple but tough-to-beat baseline for sentence embeddings</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavancas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
          </string-name>
          , E.: Overview of PAN 2019:
          <article-title>Author Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection</article-title>
          . In: Crestani,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Savoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Heinatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.)
          <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF</source>
          <year>2019</year>
          ). Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woolley</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          : Algorithms, bots, and
          <article-title>political communication in the us 2016 election: The challenge of automated political communication for election law and administration</article-title>
          .
          <source>Journal of Information Technology &amp; Politics</source>
          <volume>15</volume>
          (
          <issue>2</issue>
          ),
          <fpage>81</fpage>
          -
          <lpage>93</lpage>
          (
          <year>2018</year>
          ), https://doi.org/10.1080/19331681.
          <year>2018</year>
          .1448735
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>López-Santillán</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-Y-Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>González-Gurrola</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramírez-Alonso</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prieto-Ordaz</surname>
            ,
            <given-names>O.:</given-names>
          </string-name>
          <article-title>A genetic programming strategy to produce document embeddings for author profiling tasks</article-title>
          .
          <source>Under Review</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          . In: Burges,
          <string-name>
            <given-names>C.J.C.</given-names>
            ,
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Welling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Ghahramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            ,
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.Q</surname>
          </string-name>
          . (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Curran Associates, Inc. (
          <year>2013</year>
          ), http://papers.nips.cc/paper/5021-distributed
          <article-title>-representations-of-words-and-phrases-andtheir-compositionality</article-title>
          .pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          , E.:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
          </string-name>
          , C.D.: Glove:
          <article-title>Global vectors for word representation</article-title>
          . In: In EMNLP (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In: Ferro,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <surname>C</surname>
          </string-name>
          . (eds.)
          <article-title>Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of</article-title>
          CLEF. Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A low dimensionality representation for language variety identification</article-title>
          .
          <source>In: Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing</source>
          <year>2016</year>
          ). Springer-Verlag (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling</article-title>
          . In: Cappellato L.,
          <string-name>
            <surname>Ferro</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>M.H.L.D.</surname>
          </string-name>
          (ed.)
          <article-title>CLEF 2019 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR Workshop Proceedings. CEUR-WS.org. CEUR Workshop Proceedings, CLEF and CEUR-WS.org (Sep</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stephens</surname>
          </string-name>
          , T.: gplearn documentation pp.
          <fpage>1</fpage>
          -
          <lpage>55</lpage>
          (
          <year>Apr 2019</year>
          ), https://buildmedia.readthedocs.org/media/pdf/gplearn/stable/gplearn.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Van Der Walt</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eloff</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Using machine learning to detect fake identities: Bots vs humans</article-title>
          .
          <source>IEEE Access 6</source>
          ,
          <fpage>6540</fpage>
          -
          <lpage>6549</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>