<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fake News Spreader Detection on Twitter using Character N-Grams</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Secure Information Technology SIT</institution>
          ,
          <addr-line>Rheinstrasse 75, 64295 Darmstadt</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>The authors of fake news often take facts from verified news sources and mix them with misinformation to create confusion and provoke unrest among readers. The spread of fake news can therefore have serious implications for our society: it can sway political elections, push down stock prices, or ruin the reputations of corporations and public figures. Several websites have taken on the mission of checking rumors and allegations, but they are often not fast enough to check all the news being disseminated. Social media websites in particular offer an easy platform for the fast propagation of information. To limit the propagation of fake news among social media users, the PAN 2020 author profiling task focuses on fake news spreaders. The aim of the task is to determine whether it is possible to discriminate authors who have shared fake news in the past from those who have never done so. In this notebook, we describe our profiling system for the fake news spreader detection task on Twitter. We experiment with different feature extraction techniques and learning setups from a multilingual perspective, covering English and Spanish. Our final submitted systems use character n-grams as features in combination with a linear SVM for English and Logistic Regression for Spanish. The submitted models achieve an accuracy of 73% and 79% on the official English and Spanish test sets, respectively. Our experiments show that it is difficult to reliably differentiate fake news spreaders on Twitter from users who share credible information, which leaves room for further investigation. Our model ranked 3rd out of 72 competitors.</p>
      </abstract>
      <kwd-group>
        <kwd>Author Profiling</kwd>
        <kwd>Fake News Spreader</kwd>
        <kwd>Fake News Detection</kwd>
        <kwd>Deception Detection</kwd>
        <kwd>Social Media</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Author profiling uses information about people’s writing style to determine specific
characteristics such as the author’s gender, age, personality, or cultural and social context, like
mother tongue and dialects [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Author profiling is used not only in criminal
investigations and in the security sector [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] but also in marketing, where it helps to define target groups.
This year, the author profiling task of PAN 2020 was designed to investigate whether
the author of a Twitter feed is a fake news spreader or not<sup>1</sup> [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The dataset provided by
the organizers covers two languages: English and Spanish.
      </p>
      <p>Fake news poses a serious threat to our society. It can destroy the reputations of
corporations and public figures, push down stock prices, and manipulate people’s
opinions and therefore also their actions. Social media has become an ideal place for
fake news propagation because user-generated content reaches a broad audience very quickly.
Fraudsters use those networks to deceive users and shape specific opinions by leading
readers to believe in a certain political or social agenda. The sheer mass of false
information spread on the internet has reached new heights and cannot be handled by manual
fact-checking alone. However, automatic recognition of fake news is a challenging task.
Knowledge-based and context-based approaches to combat fake news can be applied,
but only after experts have verified that the news is fake. This is often not fast
enough, as fake news spreads very quickly and reaches a broad audience, especially on
social media websites.</p>
      <p>
        Style-based and content-based approaches are a viable alternative [
        <xref ref-type="bibr" rid="ref13 ref14 ref3 ref6 ref8">14,13,3,6,8</xref>
        ] and have
proven effective in addressing the problem of author profiling in social
networks [
        <xref ref-type="bibr" rid="ref1 ref2">2,1</xref>
        ]. Style-based approaches analyze how authors express themselves in
writing, whereas content-based approaches consider the topic of the text. We
propose a content-based approach for identifying possible fake news spreaders on Twitter
as a first step towards preventing fake news from being propagated among online users.
We investigate whether it is possible to discriminate authors who have shared fake news
in the past from those who share credible information. We conduct different learning
experiments for English (EN) and Spanish (ES). The performance of our
system is ranked by accuracy. The best-performing models achieve an overall accuracy
of 73% and 79% on the English and Spanish corpus, respectively. The results show that
it is not easy to reliably differentiate fake news spreaders from users spreading
credible information. Our model ranked 3rd out of 72 competitors.
      </p>
      <p>In the following, we describe our approach for the author profiling task at PAN 2020.
After a review of related work in Section 2, Section 3 details the Twitter data that was
provided by the PAN organizers and shows some key statistics observed in the corpus.
The preprocessing steps and features used to train our models are detailed in Section 4.
Our models and classification results are discussed in Section 5. We also provide some
information about the alternative methods we tested (Section 6) and conclude our work
in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Potthast et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used the manually fact-checked BuzzFeed news corpus<sup>2</sup> and extended
it with linked articles, ratings and other metadata. The enriched BuzzFeed-Webis Fake
News Corpus<sup>3</sup> was then used to analyze the writing style of different news creators,
namely mainstream, hyperpartisan and satire news. Hyperpartisan refers to extremely
left-wing or right-wing standpoints. Using the unmasking method, which was originally
proposed for authorship verification by Koppel and Schler [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Potthast et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] showed that
the writing style of extremely one-sided news and satire can be distinguished from the
writing style of mainstream news (F1 of 78%). Fake news, on the other hand, could not be
detected by its style alone [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p><sup>1</sup> PAN at CLEF 2020 “Profiling Fake News Spreaders on Twitter”: https://pan.webis.de/clef20/pan20-web/author-profiling.html
<sup>2</sup> https://github.com/BuzzFeedNews/2016-10-facebook-fact-check</p>
      <p>
        Liu and Wu [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a method for the early detection of fake news on social media.
To this end, the propagation path of each news story was modeled as a multivariate time series.
Each tuple in the path is a numerical vector that represents the characteristics of a user who
engaged in spreading the news story. The user features (e.g. length of the user name,
age, followers, account verification) were extracted from the profile and transformed
into a fixed-length sequence. A time series classifier incorporating an RNN and a
CNN was built to capture the users’ characteristics and to predict whether a given news story is
fake or true. Experiments on two Twitter datasets and a SinaWeibo<sup>4</sup> corpus showed that
the model can detect fake news within five minutes after it starts to spread. The model
achieved an accuracy of 85% on the Twitter data and 92% on the SinaWeibo corpus.
      </p>
      <p>
        Zhou et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] studied different features of fake news spread on social
networks, relating to the news itself, the spreaders of the fake news and the
relationships among the engaged users. To this end, they analyzed features such as the frequency and
number of news items that were spread, the distance between fake news spreaders in a
network, and the number of user engagements. Empirical studies validated the selected
patterns, showing that fake news spreads farther and attracts more readers than
true news. Additionally, fake news spreaders are more connected and engaged than
other users. The Twitter user accounts were derived from PolitiFact<sup>5</sup> and BuzzFeed<sup>6</sup>.
The extracted features were additionally used to train classifiers such as SVM, KNN
and Random Forests. Random Forests performed best among the classifiers,
achieving an F1-score of 93% on the PolitiFact corpus and 84% on the BuzzFeed corpus.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset and Corpus Analysis</title>
      <p>
        To train our system, we used the PAN 2020 author profiling corpus<sup>7</sup> provided by Rangel
et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The corpus consists of 300 Twitter user accounts for each of the two
languages, English (EN) and Spanish (ES). The tweets of every Twitter user are stored in an XML file containing 100
tweets per author. Every tweet is stored in a &lt;document&gt; XML tag. The tweets were
manually collected and fact-checked. The dataset is balanced, i.e., both classes contain
the same number of instances: half of the documents per language
folder belong to authors who have been identified as sharing fake news, and the other half are texts
from credible users. Table 1 shows excerpts from the data. Every author received an
alphanumeric author ID, which is stored in a separate text file together with the
corresponding class label. For training and testing, we split the data in a 70/30 ratio.
The gold standard can only be accessed through the TIRA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] evaluation platform
provided by the PAN organizers. The results are hidden from the participants.
      </p>
      <p><sup>3</sup> https://zenodo.org/record/1239675#.XrVvwWgzaUm
<sup>4</sup> https://www.weibo.com
<sup>5</sup> https://www.politifact.com
<sup>6</sup> https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/tree/master/data
<sup>7</sup> https://zenodo.org/record/3692319#.XrlnomgzZaQ</p>
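      <p>The corpus layout described above could be read roughly as in the following minimal Python sketch. Only the document element name comes from the corpus description; the root and wrapper element names, the ":::" separator of the label file and the sample IDs are assumptions made for illustration, and the helper names read_tweets and read_labels are hypothetical.</p>
      <preformat>
```python
import xml.etree.ElementTree as ET


def read_tweets(xml_text):
    """Return the text of every document element in one author's XML file."""
    root = ET.fromstring(xml_text)
    return [doc.text or "" for doc in root.iter("document")]


def read_labels(lines, sep=":::"):
    """Map author IDs to class labels; the separator is an assumption."""
    labels = {}
    for line in lines:
        line = line.strip()
        if line:
            author_id, label = line.split(sep)
            labels[author_id] = label
    return labels


# Build a tiny hypothetical author file instead of reading the real corpus.
author = ET.Element("author", {"lang": "en"})
documents = ET.SubElement(author, "documents")
for text in ["RT #USER#: first tweet #URL#", "second tweet #HASHTAG#"]:
    ET.SubElement(documents, "document").text = text
sample_xml = ET.tostring(author, encoding="unicode")

tweets = read_tweets(sample_xml)
labels = read_labels(["a1b2c3:::1", "d4e5f6:::0"])
```
      </preformat>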
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th>EN and ES True News Tweets</th>
              <th>EN and ES Fake News Tweets</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>“RT #USER#: Best dunk of the contest no doubt about it. Aaron Gordon robbed again #URL#”</td>
              <td>“Jay-Z Must Give Beyonce $5 Million Per Child They Have Together Due to Crazy Prenup. . . #URL#”</td>
            </tr>
            <tr>
              <td>“RT #USER#: Sure would be an interesting day to read a book that examines Trump’s obsession with the king-like powers of his offic. . . ”</td>
              <td>“RT #USER# #USER# When Obama was tapping my phones in October, just prior to Election!”</td>
            </tr>
            <tr>
              <td>“A Data-Driven Approach Aims to Help Cities Recover After Earthquakes #URL#”</td>
              <td>“Why Trump lies, and why you should care The Boston Globe #URL#”</td>
            </tr>
            <tr>
              <td>“Javier Cámara ya es el líder más valorado de los españoles por delante de Pedro Sánchez, según una encuesta #URL# #URL#”</td>
              <td>“Dictadura pura y dura toma tasas y todos felices #URL#”</td>
            </tr>
            <tr>
              <td>“Me gusta la foto. Una foto con variedad, diversidad. Me da la impresion que con más sonrisas que otras. #URL#”</td>
              <td>“GANAR DINERO AHORA ES FACIL – Google te paga 15 dólares por contestar encuestas #URL# #URL#”</td>
            </tr>
            <tr>
              <td>“Navidad en RD: son 3 días gozando, luego 362 llorando y deseando mal a los demás. Dejen su hipocresía !!”</td>
              <td>“Ortega Smith: ‘VOX expulsará de España a todos los inmigrantes ilegales’ #URL#”</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As can be seen in Table 1, the Twitter-specific tokens (hashtags, URLs and user mentions)
were replaced by the corpus providers with the following placeholders: #HASHTAG#, #URL#
and #USER#. Prior to the feature engineering, we analyzed the distribution of different
tokens. Additionally, we determined the sentiment of each tweet (positive, negative, or
neutral) using TextBlob<sup>8</sup>. For named entity recognition (NER), we used the Python
library spaCy. Table 2 shows some key insights for both languages.</p>
      <p>The observations of the corpus content were the following:
– Fake news spreaders:
  – mention other Twitter users less often (#USER#<sup>9</sup>).
  – utilize fewer hashtags (#HASHTAG#).
  – re-post fewer tweets (RT).
  – share slightly more URLs (#URL#).
– Spanish-speaking authors use more emojis than English-speaking Twitter users.
– Half of the English tweets are based on factual information and most of the Spanish
tweets (90%) are free of emotions.
– Fake news tends to be negative more often.
– Tweets of true news spreaders tend to be positive more often.
– Counting the named entities revealed no significant difference between the classes.
– Fake news spreaders tend to tweet slightly more often about other people.
– Uppercased tokens are used equally often by true news and fake news spreaders.
– Spanish fake news spreaders use capitalized phrases more often.</p>
      <p><sup>8</sup> https://textblob.readthedocs.io/en/dev
<sup>9</sup> e.g. “@Username”</p>
    </sec>
    <sec id="sec-4">
      <title>Preprocessing and Feature Extraction</title>
      <p>The preprocessing pipeline was essentially the same for both languages (EN and ES).
The steps for cleaning and structuring the data were performed as follows:
1. First, we extracted the text from the original XML document of each user and
concatenated all 100 tweets into a single text.
2. White space between tokens was normalized to a single space.
3. URLs, hashtags and user mentions were left untouched, as the corpus already replaces
them with placeholders.
4. Numbers and emojis were replaced by the placeholders #NUMBER# and #EMOJI#.
5. Irrelevant signs, e.g. “+”, “*” and “/”, were deleted.
6. Sequences of repeated characters with a length greater than three were normalized
to a maximum of two letters (e.g. “LOOOOOOOOL” to “LOOL”).
7. Words with fewer than three characters were ignored.
8. Stopwords were deleted by using the NLTK (Natural Language Toolkit) library<sup>10</sup>
for each language separately.
9. From the NLTK library, we additionally used the TweetTokenizer to tokenize the
words. The tokenizer is suited to Twitter and other casual language that is often
used in social networks. It also offers several normalization
options, of which we used lowercasing.</p>
      <p><sup>10</sup> https://www.nltk.org/</p>
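      <p>The cleaning steps listed above can be sketched with the Python standard library alone. This is an illustration under several assumptions, not the submitted implementation: the regular-expression tokenization is a simplified stand-in for the NLTK tokenizer and also drops punctuation, the emoji character ranges are approximate, and the stopword set is a tiny hypothetical sample rather than the NLTK lists.</p>
      <preformat>
```python
import re

PLACEHOLDER = re.compile(r"#[A-Z]+#")
# Approximate emoji ranges; an assumption, not an exhaustive definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Tiny illustrative stopword set; the paper uses NLTK's per-language lists.
STOPWORDS = {"the", "and", "for", "was", "that"}


def preprocess(tweets):
    """Concatenate one author's tweets and apply the cleaning steps."""
    text = " ".join(tweets)                       # step 1: one text per author
    text = EMOJI.sub(" #EMOJI# ", text)           # step 4: emoji placeholder
    text = re.sub(r"\d+", " #NUMBER# ", text)     # step 4: number placeholder
    text = re.sub(r"[+*/]", " ", text)            # step 5: drop irrelevant signs
    text = re.sub(r"(.)\1{3,}", r"\1\1", text)    # step 6: shorten long repeats
    text = re.sub(r"\s+", " ", text).strip()      # step 2: normalize white space
    tokens = []
    # Simplified tokenization: placeholders or runs of word characters.
    for token in re.findall(r"#[A-Z]+#|\w+", text):
        if PLACEHOLDER.fullmatch(token):
            tokens.append(token)                  # step 3: keep placeholders
            continue
        token = token.lower()                     # step 9: lowercasing
        # steps 7 and 8: drop short words and stopwords
        if len(token) >= 3 and token not in STOPWORDS:
            tokens.append(token)
    return tokens
```
      </preformat>
      <p>Placeholders are kept in their original uppercase form here so that they remain distinguishable from ordinary words; whether the submitted system lowercased them is not stated.</p>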
      <p>After the Twitter texts were preprocessed, we tested different vectorization techniques,
tuning hyperparameters both manually and with scikit-learn’s grid search
function. The hyperparameters were tuned separately for English and Spanish, but the
features we used were mainly language-independent, which means that the same set of
features can be used across languages. The handcrafted features were presented
in Section 3 (e.g. counts of tokens or named entities). The only language-dependent
feature we experimented with was the sentiment polarity, calculated separately for every
tweet (positive, negative, or neutral). Besides the handcrafted features, we
also experimented with automatically learned features, i.e. term frequency distributions
(tf) and character and word n-grams. Additionally, we made use of scikit-learn’s FeatureUnion<sup>11</sup> to
experiment with feature concatenation. To convert the tokens into numerical feature
vectors for each language, we made use of:
(1) Scikit-learn’s term frequency-inverse document frequency (TF-IDF) vectorizer
(2) GloVe<sup>12</sup> (Global Vectors for Word Representation) word vectors pre-trained on
Twitter data as well as custom-trained word2vec<sup>13</sup> word embeddings
(3) Scikit-learn’s CountVectorizer</p>
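      <p>To make the character n-gram representation concrete, the following pure-Python sketch extracts character n-grams, keeps the most frequent ones as vocabulary, and weights their counts with a smoothed inverse document frequency. It is a simplified stand-in for the scikit-learn vectorizers named above, not their implementation: vector normalization is omitted, and the function names and default values are illustrative assumptions.</p>
      <preformat>
```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of text with n_min to n_max characters."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(texts, n_min=1, n_max=3, max_features=3000):
    """Turn each text into a TF-IDF weighted vector over the top n-grams."""
    counts = [Counter(char_ngrams(t, n_min, n_max)) for t in texts]
    totals = Counter()
    for c in counts:
        totals.update(c)
    # Keep the most frequent n-grams across the corpus as the vocabulary.
    vocab = [g for g, _ in totals.most_common(max_features)]
    n_docs = len(texts)
    doc_freq = {g: sum(1 for c in counts if g in c) for g in vocab}
    # Smoothed inverse document frequency.
    idf = {g: math.log((1 + n_docs) / (1 + doc_freq[g])) + 1 for g in vocab}
    vectors = [[c[g] * idf[g] for g in vocab] for c in counts]
    return vocab, vectors
```
      </preformat>
      <p>Calling char_ngrams("abc") yields the six n-grams a, b, c, ab, bc and abc, which illustrates why character-level features are robust to the spelling variation typical of tweets.</p>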
      <sec id="sec-4-1">
        <title>All tested features and their representations are summarized in Table 3.</title>
        <p>We defined the author profiling task as a binary classification problem: predicting whether the author of a Twitter feed
is a fake news spreader or a reliable Twitter user. For each language (EN and
ES), a separate classification model was trained. As mentioned before, for training and
testing, we split the data in a 70/30 ratio. We tested different features, vectorization
techniques and feature dimensionalities in combination with a Support Vector Machine
(SVM) and Logistic Regression, of which we report the best-performing ones. For the
final SVM, we used a linear kernel with default hyperparameter values<sup>14</sup>. Logistic
Regression was also trained with default hyperparameters<sup>15</sup>.</p>
        <p><sup>11</sup> https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
<sup>12</sup> https://nlp.stanford.edu/projects/glove
<sup>13</sup> https://radimrehurek.com/gensim/models/word2vec.html</p>
        <p>
          The performance on the fake news spreader author profiling task was ranked by
accuracy. Table 4 shows the scores our final system achieved on the official PAN
2020 test set on the TIRA platform [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Accuracy scores were calculated individually
for each language by discriminating between the two classes. Each model was trained
on 70% of the training data. Hyperparameters were tuned on the remaining 30% split.
As the official test set is hidden, the four confusion matrix values (TP, TN, FP and FN) and
other metrics like precision and recall cannot be provided for it. Instead, we report the
classification results and accuracy scores that we achieved on the 30% held-out split
(see Table 5). The highest accuracy in English was obtained using an SVM with TF-IDF
weighted character n-grams with range [1; 3] and the top 3,000 features. In Spanish, the
best results were achieved using Logistic Regression with a feature union of
TF-IDF weighted character n-grams with range [1; 3] and the top 5,000 features and a vector
consisting of character n-gram counts with range [3; 7] and the top 50,000 features. The
submitted models achieve an overall accuracy of 73% and 79% on the English and
Spanish corpus, respectively.
        </p>
        <p><sup>14</sup> https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
<sup>15</sup> https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html</p>
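        <p>The evaluation protocol described in this section, a 70/30 split followed by accuracy, precision and recall computed from the confusion matrix counts, can be illustrated with a short standard-library sketch. The class-wise shuffling, the fixed random seed and the function names are assumptions made for this example; how the split was actually drawn is not specified.</p>
        <preformat>
```python
import random


def stratified_split(labels, train_share=0.7, seed=0):
    """Split item indices into train and test parts, class by class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_share)
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion matrix counts, accuracy, precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```
        </preformat>
        <p>For a balanced corpus of 300 authors, this split yields 210 training and 90 test authors, with both classes equally represented in each part.</p>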
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Other Tested Methods and Features</title>
      <p>In this section, we report our experiments with alternative feature selections
and representation techniques, which could not match the performance of the systems
described above (see Section 5). Besides character n-grams, we
also experimented with word n-grams in the range [1; 7]. Other selected features
comprised counts of emojis, uppercase tokens and phrases, hashtags, user mentions,
URLs and retweets. Additionally, we incorporated sentiment analysis into our feature vector by
using TextBlob. The selected features were presented in Section 4 and Table 3.</p>
      <p>Besides TF-IDF, we tested term frequencies (tf) and word embeddings as feature
representations. For the latter, we utilized GloVe word vectors pre-trained on Twitter data
as well as custom-trained word2vec word embeddings. To combine the different features
in one vector, the inner product of two vectors was required. First, all texts of
the fake news spreaders were concatenated and vectorized. Then, the cosine similarity
between this vector and the vector of every Twitter user was determined. The resulting vector, comprising a
varying number of features, was standardized (using StandardScaler<sup>16</sup>). The final vector
was then used to train the SVM and Logistic Regression models. Our aim was to
test whether emotions and sentiments, emojis, or uppercase tokens in fake news could
improve the classification performance. The training results showed that none of those
features or feature combinations improved the performance in both languages. The
accuracy even decreased slightly.</p>
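      <p>The similarity feature described above can be sketched as follows. The sketch assumes simple white-space tokenization and raw word counts as the vector representation, which is a simplification of the representations we tested; the function names are hypothetical, and the standardization mirrors zero-mean, unit-variance scaling.</p>
      <preformat>
```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_to_spreaders(user_texts, spreader_texts):
    """Score every user against the concatenated fake news spreader text."""
    reference = Counter(" ".join(spreader_texts).split())
    return [cosine_similarity(Counter(t.split()), reference) for t in user_texts]


def standardize(values):
    """Rescale values to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std else 0.0 for v in values]
```
      </preformat>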
    </sec>
    <sec id="sec-6">
      <title>Discussion and Conclusion</title>
      <p>In this paper, we described our participation in the author profiling task at PAN 2020.
The goal was to develop a system for profiling fake news spreaders on Twitter as a
first step towards preventing the propagation of fake news among online users. For
our experiments, we used the PAN 2020 author profiling corpus provided by the
organizers. We conducted different learning experiments from a multilingual perspective,
covering English and Spanish. We evaluated different features, most of them
language-independent, and assessed their importance for the
detection task. We provided some corpus statistics showing that there are
differences between fake and true news spreaders. We experimented with different features,
vectorization techniques and feature dimensionalities.</p>
      <p>For the English language, our model performed best using SVM with TF-IDF weighted
character n-grams with range [1; 3] and top 3,000 features. For the Spanish language,
the best results were achieved using Logistic Regression employing a feature union
of TF-IDF weighted character n-grams with range [1; 3] and top 5,000 features and a
vector consisting of character n-gram counts with range [3; 7] and top 50,000 features.
The submitted models achieve an overall accuracy of 73% and 79% on the English and
Spanish corpus, respectively. Our model ranked 3rd out of 72 competitors.</p>
      <p>The results showed that it is challenging to detect fake news spreaders in Twitter
data, for two reasons. First, not every tweet of a fake news spreader is
false; the feeds contain a mixture of true and false information. Second, Twitter data is short, noisy
and incorporates platform-specific features (such as user mentions and retweets). The
biggest challenge is the orthography: the tweets are strewn with spelling mistakes
and grammatical errors. Word-level approaches perform poorly compared to
approaches based on character n-grams.</p>
      <p><sup>16</sup> https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html</p>
      <p>In the future, we first want to experiment with style-based approaches in order to
determine whether fake news spreaders can be identified by the writing style alone.
Finally, we plan to experiment with different standardization and pre-processing
techniques as our submitted system does not consider misspelled words.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work was supported by the German Federal Ministry of Education and Research
and the Hessen State Ministry for Higher Education, Research and the Arts within their
joint support of the National Research Center for Applied Cybersecurity ATHENE and
under grant agreement "Lernlabor Cybersicherheit" (LLCS) for cyber security research
and training.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Álvarez-Carmona</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>López-Monroy</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montes-y Gómez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villaseñor-Pineda</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meza</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Evaluating topic-based representations for author profiling in social media</article-title>
          . In: Montes y Gómez,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Escalante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.J.</given-names>
            ,
            <surname>Segura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Murillo</surname>
          </string-name>
          , J.d.D. (eds.)
          <source>Advances in Artificial Intelligence - IBERAMIA 2016</source>
          . pp.
          <fpage>151</fpage>
          -
          <lpage>162</lpage>
          . Springer International Publishing,
          Cham
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Argamon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dhawle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          :
          <article-title>Lexical predictors of personality type</article-title>
          . In:
          <source>Proceedings of the Joint Annual Meeting of the Interface and the Classification Society of North America</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Leveraging emotional signals for credibility detection</article-title>
          .
          <source>In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>877</fpage>
          -
          <lpage>880</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schler</surname>
          </string-name>
          , J.:
          <article-title>Authorship verification as a one-class classification problem</article-title>
          . In: Brodley,
          <string-name>
            <surname>C.E</surname>
          </string-name>
          . (ed.)
          <article-title>Machine Learning</article-title>
          ,
          <source>Proceedings of the Twenty-first International Conference (ICML</source>
          <year>2004</year>
          ), Banff, Alberta, Canada,
          <source>July 4-8</source>
          ,
          <year>2004</year>
          . ACM International Conference Proceeding Series, vol.
          <volume>69</volume>
          .
          ACM
          (
          <year>2004</year>
          ), http://doi.acm.org/10.1145/1015330.1015448
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.B.</given-names>
          </string-name>
          :
          <article-title>Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks</article-title>
          . In: McIlraith,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Weinberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.Q</surname>
          </string-name>
          . (eds.)
          <source>Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence</source>
          ,
          <source>(AAAI-18)</source>
          ,
          <source>the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18)</source>
          , New Orleans, Louisiana, USA, February 2-7
          ,
          <year>2018</year>
          . pp.
          <fpage>354</fpage>
          -
          <lpage>361</lpage>
          . AAAI Press (
          <year>2018</year>
          ), https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16826
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pérez-Rosas</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefevre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of fake news</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Computational Linguistics</source>
          . pp.
          <fpage>3391</fpage>
          -
          <lpage>3401</lpage>
          . Association for Computational Linguistics (
          <year>2018</year>
          ), http://aclweb.org/anthology/C18-1287
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <source>Information Retrieval Evaluation in a Changing World - Lessons Learned from 20 Years of CLEF</source>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reinartz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A stylometric inquiry into hyperpartisan and fake news</article-title>
          . In:
          <source>The 56th Annual Meeting of the Association for Computational Linguistics (Long Papers)</source>
          . Association for Computational Linguistics (
          <year>2018</year>
          ), http://arxiv.org/abs/1702.05638
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <source>CLEF 2020 Labs and Workshops, Notebook Papers</source>
          . CEUR Workshop Proceedings (Sep
          <year>2020</year>
          ), CEUR-WS.org
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Profiling fake news spreaders on Twitter</article-title>
          .
          In:
          <source>PAN at CLEF 2020 Fake News Spreader Twitter Dataset</source>
          . Zenodo (Feb
          <year>2020</year>
          ), https://doi.org/10.5281/zenodo.3692319
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koppel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Inches</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the author profiling task at PAN 2013</article-title>
          .
          In:
          <source>CLEF Conference on Multilingual and Multimodal Information Access Evaluation</source>
          . pp.
          <fpage>352</fpage>
          -
          <lpage>365</lpage>
          . CELCT (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>B.H.</given-names>
          </string-name>
          :
          <article-title>Profile of a terrorist</article-title>
          .
          <source>Studies in Conflict &amp; Terrorism</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <fpage>17</fpage>
          -
          <lpage>34</lpage>
          (
          <year>1977</year>
          ), https://doi.org/10.1080/10576107708435394
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Thorne</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vlachos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christodoulopoulos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>FEVER: a large-scale dataset for fact extraction and verification</article-title>
          .
          In:
          <source>Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers). pp.
          <fpage>809</fpage>
          -
          <lpage>819</lpage>
          . Association for Computational Linguistics (
          <year>2018</year>
          ), http://aclweb.org/anthology/N18-1074
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.Y.</given-names>
          </string-name>
          :
          <article-title>Liar, liar pants on fire: A new benchmark dataset for fake news detection</article-title>
          .
          In:
          <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          . pp.
          <fpage>422</fpage>
          -
          <lpage>426</lpage>
          . Association for Computational Linguistics (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafarani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Network-based fake news detection: A pattern-driven approach</article-title>
          .
          <source>SIGKDD Explor. Newsl</source>
          .
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <fpage>48</fpage>
          -
          <lpage>60</lpage>
          (
          Nov
          <year>2019</year>
          ), https://doi.org/10.1145/3373464.3373473
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>