<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bots and Gender Profiling on Twitter using Sociolinguistic Features</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edwin Puertas</string-name>
          <email>edwin.puertas@javeriana.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Gabriel Moreno-Sandoval</string-name>
          <email>morenoluis@javeriana.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Flor Miriam Plaza-del-Arco</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Andres Alvarado-Valencia</string-name>
          <email>jorge.alavarado@javeriana.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexandra Pomares-Quimbaya</string-name>
          <email>pomares@javeriana.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>L. Alfonso Ureña-López</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center of Excellence and Appropriation in Big Data and Data Analytics</institution>
          ,
          <addr-line>CAOBA</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pontificia Universidad Javeriana</institution>
          ,
          <addr-line>Bogotá</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad Tecnológica de Bolívar</institution>
          ,
          <addr-line>Cartagena</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Universidad de Jaén</institution>
          ,
          <addr-line>Jaén, Andalucía</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Software bots, or simply bots, are becoming increasingly common in social networks because malicious actors have found them useful for spreading false messages and rumors and even for manipulating public opinion. Although the text that users generate in social networks is a rich source of information for identifying different aspects of its authors, being unable to tell which users are truly human is a major drawback. In this work, we describe our multilingual classification model submitted to PAN 2019, which distinguishes bots from humans and females from males. The solution extracts 18 features from each user's posts and applies a machine learning algorithm, obtaining good performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Bots profiling</kwd>
        <kwd>gender profiling</kwd>
        <kwd>author profiling</kwd>
        <kwd>sociolinguistic</kwd>
        <kwd>computational linguistics</kwd>
        <kwd>user profiling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A recent study by Yang et al. [15] reports steady growth in the number of
autonomous artificial entities known as social bots on digital platforms such as Twitter,
where they can spread messages and influence large populations with ease.
Related research estimates that between 9% and 15% of Twitter accounts
exhibit bot-like behavior [
        <xref ref-type="bibr" rid="ref2">2,13,14</xref>
        ].
      </p>
      <p>
        Bots can be designed to carry out malicious activities that manipulate opinions in a
given domain. Such bots mislead, exploit, and manipulate social media discourse with
rumors, malware, misinformation, spam, and slander, among others [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Some emulate
human behavior to fabricate political support or change the public perception of political
entities [12]. For instance, social bots distorted the online discussion of the 2016 U.S. presidential
election, according to a report published by researchers at Oxford University<sup>5</sup>.
Bots are also used in the marketing domain to manipulate the stock market [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or for terrorist
purposes, such as spreading propaganda and recruiting [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The detection of social
bots is therefore an important research endeavor.
      </p>
      <p>
        The automatic detection of bots in social media has attracted the attention of
researchers in recent years, and many techniques for this problem have been proposed
in the literature. Among systems based on feature-based machine learning
methods, several works are relevant. Davis et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] present
the first publicly available social bot detection framework for Twitter. It analyzes more
than 1000 features grouped into six classes: network, user, friends,
temporal, content, and sentiment. Dickerson et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed SentiBot, an
architecture and associated set of algorithms that automatically identify bots on Twitter
by using a combination of features including tweet sentiment; they conclude that a
number of sentiment-related factors are key to the identification of bots.
      </p>
      <p>
        This paper describes our proposal for the Bots and
Gender Profiling task of PAN 2019 [
        <xref ref-type="bibr" rid="ref4">11,4</xref>
        ] at CLEF. The task investigates
whether the author of a Twitter account is a bot or a human and, in the case of
a human, profiles the author's gender. For that purpose, we study the generation and
analysis of different sociolinguistic features in order to identify how various linguistic
characteristics differ between bots and humans and between women and men.
      </p>
      <p>The rest of the paper is structured as follows. In Section 2, we describe the data used
in our methods. Section 3 presents the details of the proposed system. In Section 4, we
discuss the analysis and evaluation results of our system. We conclude in Section 5 with
remarks on future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Description</title>
      <p>This year’s author profiling task at PAN 2019 is to predict whether the author of a Twitter account is a
bot or a human. The dataset contains tweets in English and Spanish, as shown in Table
1. The data is split evenly between human and bot users. The tweets for each
user come from their timeline, which can span days or months depending on
how frequently the account is used. The last 100 tweets from each author’s timeline were
retrieved.</p>
    </sec>
    <sec id="sec-3">
      <title>Model Description</title>
      <p>
        In this section, we explain the multilingual predictive model used in our submission.
The model used for the Bots and Gender Profiling task at PAN 2019 [
        <xref ref-type="bibr" rid="ref4">11,4</xref>
        ] was
designed to predict two types of classes: bot and gender. We proposed two hypotheses
in accordance with the attributes of the dataset and the goals of the task; they are
described in detail in Table 2.
      </p>
      <p><sup>5</sup> https://nyti.ms/2mNTwnk</p>
      <p>Based on the hypotheses presented in Table 2, we proposed two strategies. The
first generates features from the vocabulary terms used in the tweets. The second
computes statistics for each profile to characterize the use of terms, hashtags,
mentions, URLs, and emojis. The "Training System" was designed on the basis of these
strategies. Figure 1 shows the proposed system for predicting bot and gender, which
consists of the following stages: preprocessing, normalization and transformation,
feature extraction, configuration and classification, and testing.</p>
      <sec id="sec-3-1">
        <title>Preprocessing</title>
        <p>In the preprocessing stage, we concatenate the vocabulary terms of each user’s
tweets so that there is only one document per user profile. In addition, we relabel
hashtags with the word "label_hashtag", mentions with the
word "label_mention", URLs with the word "label_url", and emojis, identified by their
UTF-8 encoding, with the word "label_emoji". Finally, the relabeled words are
searched for and counted across the whole document.</p>
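        <p>The relabeling step described above can be illustrated with a minimal sketch. The paper does not specify how hashtags, mentions, URLs, or emojis are detected, so the regular expressions and the function names relabel and build_profile_document are assumptions made only for illustration; the emoji pattern in particular is a rough approximation.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical patterns: the paper does not say how each token type is detected.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji ranges (an approximation, not a complete emoji definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

LABELS = ("label_url", "label_mention", "label_hashtag", "label_emoji")


def relabel(tweet):
    """Replace URLs, mentions, hashtags and emojis with their label words."""
    # URLs go first so that '#' or '@' inside a URL is not relabeled separately.
    tweet = URL_RE.sub(" label_url ", tweet)
    tweet = MENTION_RE.sub(" label_mention ", tweet)
    tweet = HASHTAG_RE.sub(" label_hashtag ", tweet)
    tweet = EMOJI_RE.sub(" label_emoji ", tweet)
    return " ".join(tweet.split())  # collapse repeated whitespace


def build_profile_document(tweets):
    """Concatenate a user's relabeled tweets and count each label word."""
    document = " ".join(relabel(t) for t in tweets)
    tokens = document.split()
    counts = {label: tokens.count(label) for label in LABELS}
    return document, counts
```
]]></preformat>
        <p>Calling build_profile_document on the list of tweets of one profile returns the single document for that profile together with the number of occurrences of each label.</p>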
      </sec>
      <sec id="sec-3-2">
        <title>Normalization and Transformation</title>
        <p>The next stage covers normalization and transformation. The
normalization process generates random samples for training and testing.</p>
        <p>During the transformation process, the words are converted into a vector representation
and the features for each user profile are calculated. This process can be configured so
that the vector representation of the words is built from n-grams, and
the global features related to the tweets of each user profile can also be
parameterized.</p>
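        <p>To make the n-gram representation concrete, the following sketch builds word n-gram count vectors using only the Python standard library. It is not the authors' implementation: the whitespace tokenization, the n-gram range, and the function names word_ngrams and vectorize are assumptions chosen for illustration.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def word_ngrams(document, n):
    """Return the word n-grams of a document, using whitespace tokenization."""
    tokens = document.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vectorize(documents, n_min=1, n_max=2):
    """Return (vocabulary, count vectors) for n-grams with n in [n_min, n_max]."""
    counted = []
    for doc in documents:
        grams = Counter()
        for n in range(n_min, n_max + 1):
            grams.update(word_ngrams(doc, n))
        counted.append(grams)
    # The vocabulary is the sorted union of all n-grams seen in any document.
    vocabulary = sorted(set().union(*counted))
    vectors = [[c[g] for g in vocabulary] for c in counted]
    return vocabulary, vectors
```
]]></preformat>
        <p>Each profile document becomes one row of counts over a shared vocabulary; the n_min and n_max arguments play the role of the configurable n-gram setting mentioned above.</p>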
      </sec>
      <sec id="sec-3-3">
        <title>Feature Extraction</title>
        <p>
          According to [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], human knowledge is distributed among a large number of
information sources, with data volumes constantly growing. Social networks have become
indispensable tools for the automatic understanding of language because they allow us to
model users’ writing habits by extracting features from the texts they publish.
        </p>
        <p>Bots: For the bot classification hypothesis, we suggest that bots have less
linguistic diversity than humans. For this reason, we proposed using classifiers
that rely on vocabulary features and linguistic diversity.</p>
        <p>Gender: For the gender classification hypothesis, we believe that the vocabulary used
by users is associated with their use of linguistic features. For this reason,
we analyze how authors use emojis, hashtags, and mentions in addition to
the vocabulary.</p>
        <p>
          In fact, the main challenge in classifying bots and gender is
detecting writing style on Twitter. According to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], tweets produced by
bots contain a high number of URLs compared to human tweets; thus, the
average number of URLs per tweet is a valuable feature for classification algorithms. In addition,
it is well known that people don’t always write words, hashtags, mentions, URLs, and
emojis correctly. For these reasons, we extracted features at two levels:
the tweet level and the user profile level. At the tweet level, we extracted the words and the
counts of hashtags, mentions, URLs, and emojis. At the user profile level, we aggregated
the tweet-level results by calculating the average, kurtosis, and
skewness (asymmetry) of the counts of hashtags, mentions, URLs, and emojis. Likewise, we analyze
lexical diversity by comparing the words used in one tweet with the words used in the rest of
the tweets. The 18 resulting features are listed in the following table.
        </p>
        <table-wrap>
          <caption>
            <p>Features extracted for each user profile.</p>
          </caption>
          <table>
            <thead>
              <tr><th>No.</th><th>Feature</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>stats_avg_word</td><td>Average word size per tweet</td></tr>
              <tr><td>2</td><td>stats_kur_word</td><td>Kurtosis of the variable stats_avg_word</td></tr>
              <tr><td>3</td><td>stats_label_emoji</td><td>Number of emojis per tweet for the profile</td></tr>
              <tr><td>4</td><td>stats_label_hashtag</td><td>Number of hashtags per tweet for the profile</td></tr>
              <tr><td>5</td><td>stats_label_mention</td><td>Number of mentions per tweet for the profile</td></tr>
              <tr><td>6</td><td>stats_label_url</td><td>Number of URLs per tweet for the profile</td></tr>
              <tr><td>7</td><td>stat_label_retweets</td><td>Number of retweets per tweet for the profile</td></tr>
              <tr><td>8</td><td>stat_lexical_diversity</td><td>Lexical diversity across all tweets of the profile</td></tr>
              <tr><td>9</td><td>stats_label_word</td><td>Number of words per tweet for the profile</td></tr>
              <tr><td>10</td><td>kurtosis_avg_word</td><td>Kurtosis of the variable stats_kur_word</td></tr>
              <tr><td>11</td><td>kurtosis_label_word</td><td>Kurtosis of the variable stats_label_word</td></tr>
              <tr><td>12</td><td>skew_avg_word</td><td>Skewness (statistical asymmetry) of the variable stats_avg_word</td></tr>
              <tr><td>13</td><td>skew_label_word</td><td>Skewness (statistical asymmetry) of the variable stats_label_word</td></tr>
              <tr><td>14</td><td>stats_person_1_sing</td><td>Number of tweets that use the first person singular</td></tr>
              <tr><td>15</td><td>stats_person_2_sing</td><td>Number of tweets that use the second person singular</td></tr>
              <tr><td>16</td><td>stats_person_3_sing</td><td>Number of tweets that use the third person singular</td></tr>
              <tr><td>17</td><td>stats_person_1_plu</td><td>Number of tweets that use the first and second person plural</td></tr>
              <tr><td>18</td><td>stats_person_3_plu</td><td>Number of tweets that use the third person plural</td></tr>
            </tbody>
          </table>
        </table-wrap>
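        <p>The profile-level aggregation can be sketched as follows. This is an illustrative approximation rather than the authors' code: it assumes population moments for skewness and excess kurtosis, and it swaps in a simple type-token ratio for lexical diversity, which is not necessarily the tweet-against-the-rest comparison described above. The mapping of the computed values onto feature names from the feature list is likewise an assumption.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def central_moment(values, k):
    """k-th central moment of a sequence (population version)."""
    mu = mean(values)
    return sum((v - mu) ** k for v in values) / len(values)


def skewness(values):
    """Population skewness; 0.0 when the values have no spread."""
    m2 = central_moment(values, 2)
    return 0.0 if m2 == 0 else central_moment(values, 3) / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis; 0.0 when the values have no spread."""
    m2 = central_moment(values, 2)
    return 0.0 if m2 == 0 else central_moment(values, 4) / m2 ** 2 - 3.0


def lexical_diversity(tweets):
    """Type-token ratio over all tweets (assumed definition, see lead-in)."""
    tokens = [w for t in tweets for w in t.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def profile_features(tweets):
    """Aggregate tweet-level counts into a few profile-level features."""
    words_per_tweet = [len(t.split()) for t in tweets]
    urls_per_tweet = [t.split().count("label_url") for t in tweets]
    return {
        "stats_label_word": mean(words_per_tweet),
        "kurtosis_label_word": excess_kurtosis(words_per_tweet),
        "skew_label_word": skewness(words_per_tweet),
        "stats_label_url": mean(urls_per_tweet),
        "stat_lexical_diversity": lexical_diversity(tweets),
    }
```
]]></preformat>
        <p>The input is expected to be the relabeled tweets of one profile, so that URLs appear as the word "label_url"; the remaining count-based features would follow the same pattern.</p>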
      </sec>
      <sec id="sec-3-4">
        <title>Settings and Classifiers</title>
        <p>At the configuration stage, the system sets hardware-related parameters such as the number of
processors and threads. In addition, different scenarios can be configured for the use of
the classifiers. Finally, the system can be set to store the best-performing word
vectors and classifiers. In all our experiments, the data
set was divided into 60% for training and 40% for testing.</p>
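        <p>A 60/40 split of this kind could be produced as in the sketch below. The paper does not say whether the split was stratified or how it was seeded, so the plain shuffled split, the fixed seed, and the function name split_profiles are assumptions.</p>
        <preformat><![CDATA[
```python
import random


def split_profiles(profiles, train_fraction=0.6, seed=0):
    """Shuffle profiles and split them into training and test subsets."""
    shuffled = list(profiles)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```
]]></preformat>
        <p>With 10 profiles and the default fraction, 6 profiles go to training and 4 to testing.</p>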
        <p>Based on the goals of the task and on previous results of the
author profiling tasks at PAN, we analyzed different classifiers: Naive Bayes
(NB), Gaussian Naive Bayes (GNB), Complement Naive Bayes (CNB), Logistic
Regression (LR), and Random Forest (RF).</p>
      </sec>
      <sec id="sec-3-5">
        <title>Test</title>
        <p>For the test stage, we developed a software component. It first reads the test data
sets. Then the tweets are processed independently for each user profile. Afterwards,
it calculates the features for each user and builds the vector representation.
The best classifiers for the bot and gender classes are then determined. Finally, the best
predictors are exported. Figure 2 shows the "System Test" used by our models.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Analysis of Results</title>
      <p>During the pre-evaluation phase, we carried out different experiments, and the best ones
were taken into account for the evaluation phase. The system was evaluated using the
usual competition metrics: Accuracy (Acc), Precision (P), Recall (R), and
F1-score (F1). The best systems for bot and gender classification in the pre-evaluation
phase are described below.</p>
      <p>
        The system presented was trained and tested with the dataset
provided by the official site of PAN 2019 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, submissions were made on
the TIRA platform [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for the Bots and Gender Profiling task. The results obtained after
evaluating our system with the training dataset are shown in Table 4. The system uses various
classification algorithms: Random Forest, GaussianNB, ComplementNB, and
Logistic Regression. For English, Random Forest obtained the
best performance for both bots and gender. For Spanish, Random Forest
had the best accuracy for bots, while Logistic Regression had the best accuracy for gender.
In the evaluation phase, the different datasets provided by the task were applied. The measure used
was the macro-F1 score, which summarizes the precision and recall of the models in a
single value. The final results were
obtained with the test2 dataset. In the general ranking of the task, we occupy the 33rd
position, and the 9th position with respect to the LDSE baseline.
      </p>
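      <p>For reference, the metrics named in this section can be computed as in the following sketch. It uses the standard definitions of accuracy, precision, recall, F1, and macro-F1 (the unweighted mean of the per-class F1 scores); the function names are hypothetical and the official evaluation script of the task may differ in its details.</p>
      <preformat><![CDATA[
```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [per_class_scores(y_true, y_pred, label)[2] for label in labels]
    return sum(scores) / len(labels)
```
]]></preformat>
      <p>Because macro-F1 averages the classes with equal weight, a model that performs well on only one of the two classes is penalized even when its overall accuracy is high.</p>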
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>
        The Bots and Gender Profiling task at CLEF PAN 2019 [
        <xref ref-type="bibr" rid="ref4">11,4</xref>
        ] involved several subtasks.
The first was the preprocessing of the corpus, which was composed of 100 posts
for each user profile, for a total of 300,000 posts. Fortunately, quality assurance
during preprocessing was not a challenge because the tweets were already clean and the
dataset was balanced across the target classes. In contrast, feature extraction
was one of the most significant challenges, because it was necessary to achieve good
performance with few text samples per user profile. To deal with this, we decided
to extract features at two levels: the tweet and the user profile. The first level aimed
to obtain traditional counts of words, hashtags, mentions, URLs, and emojis
per tweet. The second was intended to explore each author’s diversity based on the
analysis of the features extracted at the first level. The resulting features proved
very useful for discriminating bots from humans and for distinguishing genders.
      </p>
      <p>The second task, the classification itself, required evaluating
different techniques with different parameterizations and different inputs. The final results
showed that Random Forest and Logistic Regression were the most effective
techniques for this problem.</p>
      <p>In addition, during the final task, the evaluation of the model, the results supported our
hypothesis that lexical diversity, expressed through the 18 features, discriminates well
between the target classes. For the classification of bots, the
best classifier using both the n-grams and the proposed features obtained from the training
dataset reached an accuracy of 0.912, whereas using only the proposed features it
reached an accuracy of 0.907. This demonstrates the predictive value of these features for the
bot detection problem.</p>
      <p>Finally, there are still issues to explore. One important direction is to improve the
profile analysis from the sociolinguistic point of view by integrating features that describe
the interaction dynamics of each user.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>We thank the Center of Excellence and Appropriation in Big Data and Data Analytics
(CAOBA), Pontificia Universidad Javeriana, and the Ministry of Information
Technologies and Telecommunications of the Republic of Colombia (MinTIC). The models and
results presented in this challenge contribute to building the research
capabilities of CAOBA. We also thank Fondo Europeo de Desarrollo Regional (FEDER) and the REDES
project (TIN2015-65136-C2-1-R) from the Spanish Government. The author
Edwin Puertas thanks Universidad Tecnológica de Bolívar. Finally, we
thank the organizing committee of PAN, especially Paolo Rosso, Francisco Rangel,
Matti Wiegmann, and Martin Potthast, for their encouragement and kind support.</p>
      <ref-list>
        <ref id="ref11">
          <mixed-citation>11. Rangel, F., Rosso, P.: Overview of the 7th Author Profiling Task at PAN 2019: Bots and Gender Profiling. In: Cappellato, L., Ferro, N., Losada, D., Müller, H. (eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org (Sep 2019)</mixed-citation>
        </ref>
        <ref id="ref12">
          <mixed-citation>12. Ratkiewicz, J., Conover, M.D., Meiss, M., Gonçalves, B., Flammini, A., Menczer, F.M.: Detecting and tracking political abuse in social media. In: Fifth International AAAI Conference on Weblogs and Social Media (2011)</mixed-citation>
        </ref>
        <ref id="ref13">
          <mixed-citation>13. Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: Detection, estimation, and characterization. In: Eleventh International AAAI Conference on Web and Social Media (2017)</mixed-citation>
        </ref>
        <ref id="ref14">
          <mixed-citation>14. Varol, O., Ferrara, E., Menczer, F., Flammini, A.: Early detection of promoted campaigns on social media. EPJ Data Science 6(1), 13 (2017)</mixed-citation>
        </ref>
        <ref id="ref15">
          <mixed-citation>15. Yang, K.C., Varol, O., Davis, C.A., Ferrara, E., Flammini, A., Menczer, F.: Arming the public with AI to counter social bots. arXiv preprint arXiv:1901.00912 (2019)</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Berger</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The ISIS Twitter census: Defining and describing the population of ISIS supporters on Twitter</article-title>
          .
          <source>The Brookings Project on US Relations with the Islamic World</source>
          <volume>3</volume>
          (
          <issue>20</issue>
          ),
          <fpage>4</fpage>
          -
          <lpage>1</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Behavior enhanced deep bot detection in social media</article-title>
          .
          <source>In: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI)</source>
          . pp.
          <fpage>128</fpage>
          -
          <lpage>130</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galbraith</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danforth</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dodds</surname>
            ,
            <given-names>P.S.:</given-names>
          </string-name>
          <article-title>Sifting robotic from organic text: a natural language approach for detecting automation on Twitter</article-title>
          .
          <source>Journal of Computational Science</source>
          <volume>16</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Daelemans</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kestemont</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manjavacas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specht</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tschuggnall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zangerle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Overview of PAN 2019: Author Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection</article-title>
          . In: Crestani,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Braschler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Savoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Losada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Heinatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Cappellato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          , N. (eds.)
          <source>Proceedings of the Tenth International Conference of the CLEF Association (CLEF</source>
          <year>2019</year>
          ). Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varol</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>BotOrNot: A system to evaluate social bots</article-title>
          .
          <source>In: Proceedings of the 25th International Conference Companion on World Wide Web</source>
          . pp.
          <fpage>273</fpage>
          -
          <lpage>274</lpage>
          . International World Wide Web Conferences Steering Committee (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dickerson</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kagan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subrahmanian</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Using sentiment to detect bots on Twitter: Are humans more opinionated than bots?</article-title>
          <source>In: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</source>
          . pp.
          <fpage>620</fpage>
          -
          <lpage>627</lpage>
          . IEEE Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ferrara</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varol</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The rise of social bots</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <issue>7</issue>
          ),
          <fpage>96</fpage>
          -
          <lpage>104</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Krzywicki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wobcke</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bain</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Compton</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Data mining for building knowledge bases: techniques, architectures and applications</article-title>
          .
          <source>The Knowledge Engineering Review</source>
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <fpage>97</fpage>
          -
          <lpage>123</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <source>Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of CLEF</source>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>A low dimensionality representation for language variety identification</article-title>
          .
          <source>In: International Conference on Intelligent Text Processing and Computational Linguistics</source>
          . pp.
          <fpage>156</fpage>
          -
          <lpage>169</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>