<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multilingual Detection of Fake News Spreaders via Sparse Matrix Factorization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Boško Koloski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Senja Pollak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Blaž Škrlj</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Science - University of Ljubljana</institution>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jožef Stefan Institute</institution>
          ,
          <addr-line>Ljubljana</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Fake news is an emerging problem in online news and social media. Efficient detection of fake news spreaders and spurious accounts across multiple languages is becoming an interesting research problem, and is the key focus of this paper. Our proposed solution to PAN 2020 fake news spreaders challenge models the accounts responsible for spreading the fake news by accounting for different types of textual features, decomposed via sparse matrix factorization, to obtain easy-to-learn-from, compact representations, including the information from multiple languages. The key contribution of this work is the exploration of how powerful and scalable matrix factorization-based classification can be in a multilingual setting, where the learner is presented with the data from multiple languages simultaneously. Finally, we explore the joint latent space, where patterns from individual languages are maintained. The proposed approach scored second on the 2020 PAN shared task for identification of fake news spreaders.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The notion of fake news refers to distortions of news intended to affect the
political landscape and to create confusion and divisions in society. Even though the
phenomenon of fake news is not new, its scale and impact have never been
greater than today, which can be attributed to the digital transformation of the news
industry, and especially to the rise of social media as a news distribution channel [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        One of the crucial problems is the recognition of fake news spreaders. For example,
Twitter bots (fake accounts) are capable of generating fake information and propagating
it through their follower networks, which can impact real-life entities such as stock
markets and possibly even elections [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Automatic detection of such spreaders is thus
becoming one of the key approaches to minimizing the manual annotation costs incurred
by social media owners. This work fits under the framework of the PAN author
profiling tasks [
        <xref ref-type="bibr" rid="ref19 ref21">21,19</xref>
        ], and describes our approach submitted to the PAN 2020 shared
task on Profiling Fake News Spreaders on Twitter [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ].
      </p>
      <p>This paper is structured as follows. Section 2 presents related work, and Section 3
describes the problem addressed in this work. Next, in Section 4, we present the proposed
method, followed by empirical evaluation and discussion.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        A critical mass of fake news can have serious real-life consequences, and can, for
example, impact the election process [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Distinguishing between real and fake news content has
been addressed by linguistic approaches focusing on text properties, such as the
writing style and content [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and by network approaches, where network properties
and behavior complement content-based approaches that rely on deceptive
language and leakage cues to predict deception [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A very relevant subtopic of fake
news research is detection of fake news spreaders. Commonly, fake news spreaders
are implemented as bots [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], and are able to carry out the spreading process in a
completely automated manner. It remains an open question whether active prevention of fake news
spreading is a viable tactic, and to what extent it can be implemented in real-life online
systems [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Further, previous PAN submissions on the topic of bot prediction indicate
(e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]) that the best models perform well when different types of textual features,
covering semantic as well as morphological information, are used.
      </p>
      <p>
        Twitter fake news spreaders can be captured in their own social bubbles, which was
shown to be an efficient defense tactic [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Here, simple tweet frequency distributions
were already indicative of spurious behavior. Classification based on features such as
account age was also shown to work well [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In a recent survey [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], the
authors emphasize that fact-checking is an important step in maintaining online social
media quality. By employing automated systems, capable of prioritizing potentially
interesting users, less time is spent on manual curation, which can be an expensive and
time-consuming process.
      </p>
      <p>
          Traditional classifiers with extensive feature engineering seem to be pervasive in
the literature on distinguishing between bots and humans, but there have also been
attempts to tackle the task with neural networks. In recent work, [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a
behavior enhanced deep model (BeDM) that regards user content as temporal text data
instead of plain text and fuses content information and behavior information using a
deep learning method. They report an F1-score of 87.32% on a Twitter-related dataset.
Finally, low-dimensional representations have recently been shown to perform well for
social media-based profiling [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Problem description</title>
      <p>Provided a timeline of selected tweets with ground-truth labels distinguishing fake
news spreaders from non-spreaders, the goal is to decide whether a new author is a spreader of
fake news or not. Formally, we are given the following decision problem.
Given an author A who tweets in language L ∈ {English, Spanish}, and given, from the
collection of tweets C, the subset of tweets C_A of author A,</p>
      <p>C_A = {t_1, t_2, ..., t_n}</p>
      <p>where t_i represents the content of a tweet,
find a decision function f : C_A → {0, 1}, where</p>
      <p>f(C_A) = 0 if A is a non fake-news spreader, and</p>
      <p>f(C_A) = 1 if A is a fake-news spreader.
This decision problem is a specialization of the problem of author profiling. It requires
learning a representation from C_A suitable for approximating f. The provided data
consists of tweets by 300 English and 300 Spanish authors.</p>
      <p>
        For each author, 100 tweets are provided, making a total of 30,000 English and
30,000 Spanish tweets. The class balance is consistent for both languages, each
having 150 negative and 150 positive samples, as shown in Table 1.
The following describes the proposed method with the
corresponding intermediate steps. Pre-processing proceeds as follows:
1. Punctuation is removed from the original data.
2. URLs and hashtags are removed from the result of step (1).
3. Stop words are removed from the output of step (2).
For each author's collection of tweets, we initially define a collection of n candidate
features from the pre-processed data, which are iteratively selected and weighted, similarly
to Martinc et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The features generated during construction are based on the
following feature types:
– character based: each text is tagged with character n-grams of 2 and
3 characters, generating a predetermined maximum number of features
of up to 15,000.
– word based: each text is tagged with word n-grams of 1 and 2 words,
generating a predetermined maximum number of features
of up to 15,000.
      </p>
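<p>The pre-processing and feature construction steps above can be sketched with scikit-learn's TfidfVectorizer. This is a minimal illustration, not the authors' exact code: the regular expressions and the toy author collections are our own assumptions.</p>

```python
import re
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def preprocess(text):
    # Steps 1-3: remove URLs and hashtags, then punctuation; stop words
    # are dropped later via the vectorizer's stop_words option.
    text = re.sub(r"https?://\S+|#\w+", " ", text)
    text = re.sub(r"[^\w\s]", " ", text)
    return text.lower()

# Toy stand-in: one collection of tweets per author.
authors = [
    ["Check this out https://example.com #breaking", "Totally real news!!!"],
    ["Morning coffee.", "Nice weather in Ljubljana today."],
]
docs = [preprocess(" ".join(tweets)) for tweets in authors]

# Character 2-3-grams and word 1-2-grams, each capped at 15,000 features.
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3), max_features=15000)
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 2),
                           stop_words="english", max_features=15000)
X = hstack([char_vec.fit_transform(docs), word_vec.fit_transform(docs)])
```

<p>Stacking the two sparse blocks horizontally yields one row of combined character and word features per author.</p>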
      <p>At this point, we have prepared word and character features from each author's collection of
tweets, ready to be used in the feature selection step.</p>
      <sec id="sec-3-1">
        <title>Dimensionality reduction via matrix factorization</title>
        <p>
          Next, we perform sparse singular value decomposition (SVD)1[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] that can be
summarized via the following expression:
        </p>
        <p>M = U Σ Vᵀ.</p>
        <p>The final representation (embedding) E is obtained by multiplying back only a portion
of the diagonal matrix Σ and U, giving a low-dimensional, compact representation
of the initial high-dimensional matrix. Note that E ∈ R^{|D| × d}, where d is the number
of diagonal entries considered. The obtained E is suitable for a given downstream
learning task, such as classification (considered in this work). Note that performing
SVD in the text mining domain is also commonly associated with the notion of latent
semantic analysis.</p>
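<p>This step corresponds to scikit-learn's TruncatedSVD implementation (the one footnoted in the text); a minimal sketch on a random sparse stand-in matrix, with sizes chosen for illustration only:</p>

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Random sparse stand-in for the author-feature matrix M (|D| x n).
M = sparse_random(300, 5000, density=0.01, random_state=42)

d = 128  # number of singular values (diagonal entries) kept
svd = TruncatedSVD(n_components=d, random_state=42)
E = svd.fit_transform(M)  # returns U * Sigma, i.e. the embedding E of shape (300, 128)
```

<p>Note that fit_transform already returns the product U Σ, so no explicit multiplication is needed.</p>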
      </sec>
      <sec id="sec-3-2">
        <title>Classifier selection</title>
        <p>
          The classification model we aimed for in this task had to be robust yet highly flexible:
one that would score well on the prepared data without using many features or extensive
processing power. Following this goal, we conducted a series of experiments, trying
different representations with corresponding linear models, as presented in Section 5. The
classifiers used were the following (from scikit-learn [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]): Random Forest, Logistic
Regression, and Support Vector Machines [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
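<p>Comparing the three classifier families can be sketched as follows; the synthetic dataset and the cross-validation setup are illustrative assumptions standing in for the paper's actual experiments.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the SVD-reduced author representations.
X, y = make_classification(n_samples=300, n_features=128, random_state=42)

candidates = {
    "random_forest": RandomForestClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(max_iter=10000),
}
# 5-fold cross-validated accuracy for each candidate model.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in candidates.items()}
```

<p>The candidate with the highest mean score would then be carried forward to the full grid search.</p>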
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conducted experiments</title>
      <p>Considering the size of the dataset and the distribution of the data within it,
we performed a series of experiments. All of them aimed to test the pipeline described
in Section 4. The experiments conducted can be divided into two main categories,
based on the language considered by a given model:
1. Multilingual - Both languages’ data is fused together and is subject to the same
feature construction and representation creation steps.
2. Monolingual - For each language in the dataset, English and Spanish, we create a
separate pipeline, that is also executed exclusively on the data from a given
language.</p>
      <p>For both approaches, we performed an extensive grid search over the parameter space to
find the best hyper-parameter configuration with the help of scikit-learn's GridSearchCV
function. Using 10-fold cross-validation, the grid consisted of reducing the
dimensions, parametrized by k, over the following set:</p>
      <p>k ∈ {128, 256, 512, 640, 768, 1024}</p>
      <p>and the number of generated features n over the set</p>
      <p>n ∈ {2500, 5000, 10000, 20000, 30000}.
1 https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
Monolingual variant: the data from each language was split separately into a
90% training and a 10% validation set, obtaining 270 training examples C_training and 30
validation examples C_validation. Such splits were obtained for each language. Only the
training data was used for feature construction and dimensionality reduction.
Multilingual variant: the data from both languages was merged, after which the same
approach as before was applied. Merging the data from both languages potentially
reduces the computational load required to train two separate models. The data was split
into a 90% training and a 10% validation set, obtaining 540 training examples C_training and
60 validation examples C_validation. In each iteration we generated n features in R^{540 × n} and
reduced them to dimension k, obtaining a matrix from the space R^{540 × k}:
g(C_training, n) : R^{N × n} → R^{N × k} (via SVD),
where g denotes the process described in Section 4.3.</p>
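<p>The grid search over n and k can be sketched as a scikit-learn Pipeline; the toy corpus and the scaled-down grid values below are illustrative assumptions, not the paper's actual data or grid.</p>

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy corpus standing in for the per-author tweet documents.
docs = ["fake news spreading fast", "lovely weather in town",
        "totally fake claims again", "nice coffee this morning"] * 10
labels = [1, 0, 1, 0] * 10

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svd", TruncatedSVD(random_state=42)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# In the paper, the grid spans n (max_features) and k (n_components).
grid = GridSearchCV(pipe, {
    "tfidf__max_features": [20, 40],
    "svd__n_components": [2, 4],
}, cv=5)
grid.fit(docs, labels)
```

<p>Fitting the vectorizer and SVD inside the pipeline ensures that, within each fold, only training data is used for feature construction and dimensionality reduction, matching the splitting protocol described above.</p>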
      <p>Once constructed, the feature space was subject to learning. We experimented with both
logistic regression and linear SVMs; initially, some experiments were also conducted
with a Random Forest model. Hyperparameters were optimized in 5-fold cross-validation,
considering the size of the dataset. Finally, we tested the performance on the
C_validation set.</p>
      <p>
        We visualise the distribution of the dataset reduced to 2 dimensions using UMAP
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] dimensionality reduction in Figure 4. Figures 1 and 2 show the visualizations
obtained with the best monolingual models described in Section 6, and Figure 3 shows the joint
latent space generated by the multilingual model described in the same section.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>We constructed two baselines: one based on TF-IDF features with Logistic Regression
(LR) and L1 regularization, and a second using doc2vec representations with a Random Forest
(RF) classifier. The array of experiments conducted yielded the results presented in
Table 2, and the outcomes of our final submission in Table 3.</p>
      <p>As discussed in Section 5, all training was conducted using the C_training data and
validation was done on the C_validation set. Table 2 shows the model results
as measured in the TIRA training evaluation on the whole C_validation ∪ C_training data.</p>
      <p>The final unofficial evaluation, as reported on TIRA's page, is presented in Table 3.</p>
      <p>The Model column in Table 2 refers to the classifiers used: if two classifiers
are listed, the model is monolingual (the first classifier is for English and the second
for Spanish); if the model is multilingual, only one classifier is used. The
Type column discriminates between the number of languages the model is trained on. The
Name column consists of the vectorizer used, followed by the dimension size or the type of
tokenizer used, and the Dimensions column denotes the number of dimensions SVD reduces
to.</p>
      <p>As can be seen, the highest evaluation score on our training data was obtained
by the multilingual model tfidf_large, with the following hyper-parameters: k = 768
dimensions, n = 5000 features, and a Logistic Regression classifier with regularization
strength 0.002 and fit_intercept = False.</p>
      <p>The monolingual model that performed best is tfidf_cv, which for English is
parametrized as an SVM with hyper-parameters alpha = 0.001 and l1_ratio = 0.8,
with an elastic-net penalty, hinge loss, and power_t = 0.5, and for Spanish as an SVM
with hyper-parameters alpha = 0.0005 and l1_ratio = 0.25, with an elastic-net penalty,
hinge loss, and power_t = 0.9.</p>
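<p>Under the assumption that these hyper-parameter names correspond to scikit-learn's SGDClassifier (whose hinge loss, elastic-net penalty, alpha, l1_ratio, and power_t parameters match the reported configuration), the best monolingual classifiers could be instantiated as follows; the mapping of the reported regularization strength onto a LogisticRegression parameter is likewise our assumption.</p>

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Multilingual tfidf_large classifier (no intercept, as reported).
lr_multi = LogisticRegression(fit_intercept=False, max_iter=1000)

# Monolingual tfidf_cv classifiers: SGD-trained linear SVMs.
svm_en = SGDClassifier(loss="hinge", penalty="elasticnet",
                       alpha=0.001, l1_ratio=0.8, power_t=0.5)
svm_es = SGDClassifier(loss="hinge", penalty="elasticnet",
                       alpha=0.0005, l1_ratio=0.25, power_t=0.9)

# Quick sanity check on synthetic stand-in data.
X, y = make_classification(n_samples=100, n_features=20, random_state=0)
preds = svm_en.fit(X, y).predict(X)
```
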
      <p>A more detailed insight into the performance of the best-performing models, as a
function of the number of word and character n-grams and the accuracy in 5-fold CV,
is given in Figures 5 and 6. The figures show the performance of the
best mono- and multilingual models; the confidence intervals indicate the variability
obtained when repeating the experiments.
The code and the pilot experiments are freely available at https://gitlab.com/skblaz/pan2020.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Conclusions</title>
      <p>
        The series of experiments conducted as part of this work indicates that n-grams are still
a sufficient method for the task of author profiling, compared to more complex
methods such as transformers and word2vec [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which can easily overfit when considering
only hundreds of instances. As part of the initial experiments, we also attempted to
include semantic features [25]; however, the results were not significantly better (nor
worse) and only added to the computational time, hence such features were omitted
from the final solution. We tried to change the feature space by using different NLTK
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] tokenizers - TweetTokenizer and the TPOT [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] automatic model generation and
selection; however, the results obtained were similar to those obtained by manual
construction. The joint vector space, obtained by merging the data from both languages,
maintains the patterns observed when projecting individual language datasets,
indicating that merging the data is a suitable tactic that does not result in complete loss of
information.
      </p>
      <p>In future work, we plan to explore the possibility of detecting fake news
profiles across different languages by first considering latent semantic analysis across
different language settings, and further segmenting the semantic space prior to learning.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The work of the last author was funded by the Slovenian Research Agency through a
young researcher grant. The work of other authors was supported by the Slovenian
Research Agency (ARRS) core research programme Knowledge Technologies
(P2-0103), an ARRS funded research project Semantic Data Mining for Linked Open
Data (financed under the ERC Complementary Scheme, N2-0078) and European
Unions´ Horizon 2020 research and innovation programme under grant agreement No
825153, project EMBEDDIA (Cross-Lingual Embeddings for Less-Represented
Languages in European News Media).
25. Škrlj, B., Martinc, M., Kralj, J., Lavrač, N., Pollak, S.: tax2vec: Constructing interpretable
features from taxonomies for short text classification. Computer Speech &amp; Language p.
101104 (2020)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Automatic deception detection: Methods for finding fake news</article-title>
          .
          <source>Proceedings of the Association for Information Science and Technology Computer Science</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural Language Processing with Python</source>
          . O'Reilly Media
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bovet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Makse</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <article-title>Influence of fake news in twitter during the 2016 us presidential election</article-title>
          .
          <source>Nature communications 10(1)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Brigida</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pratt</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          :
          <article-title>Fake news</article-title>
          .
          <source>The North American Journal of Economics and Finance</source>
          <volume>42</volume>
          ,
          <fpage>564</fpage>
          -
          <lpage>573</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zengi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Behavior enhanced deep bot detection in social media</article-title>
          .
          <source>In: 2017 IEEE International Conference on Intelligence and Security Informatics (ISI)</source>
          . pp.
          <fpage>128</fpage>
          -
          <lpage>130</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>An Emotional Analysis of False Information in Social Media and News Articles</article-title>
          .
          <source>ACM Transactions on Internet Technology (TOIT) 20(2)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gilani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kochmar</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crowcroft</surname>
          </string-name>
          , J.:
          <article-title>Classification of twitter accounts into automated agents and human users</article-title>
          .
          <source>In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining</source>
          <year>2017</year>
          . pp.
          <fpage>489</fpage>
          -
          <lpage>496</lpage>
          . ACM (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Halko</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinsson</surname>
            ,
            <given-names>P.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tropp</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions (</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hearst</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Support vector machines</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>13</volume>
          (
          <issue>4</issue>
          ),
          <fpage>18</fpage>
          -
          <lpage>28</lpage>
          (
          <year>Jul 1998</year>
          ). https://doi.org/10.1109/5254.708428, https://doi.org/10.1109/5254.708428
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caverlee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webb</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Uncovering social spammers: social honeypots+ machine learning</article-title>
          .
          <source>In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval</source>
          . pp.
          <fpage>435</fpage>
          -
          <lpage>442</lpage>
          . ACM (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Škrlj</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Fake or not: Distinguishing between bots, males and</article-title>
          .
          <article-title>CLEF 2019 Evaluation Labs</article-title>
          and Workshop - Working Notes Papers (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Martinc</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skrlj</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pollak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multilingual gender classification with multi-view deep learning: Notebook for PAN at CLEF 2018</article-title>
          . In: Cappellato,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Soulier</surname>
          </string-name>
          ,
          <string-name>
            <surname>L</surname>
          </string-name>
          . (eds.) Working Notes of CLEF 2018 -
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , Avignon, France,
          <source>September 10-14</source>
          ,
          <year>2018</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2125</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2018</year>
          ), http://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>2125</volume>
          /paper_156.pdf
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>McInnes</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Healy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saul</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Großberger</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Umap: Uniform manifold approximation and projection</article-title>
          .
          <source>Journal of Open Source Software</source>
          <volume>3</volume>
          (
          <issue>29</issue>
          ),
          <volume>861</volume>
          (
          <year>2018</year>
          ). https://doi.org/10.21105/joss.00861, https://doi.org/10.21105/joss.00861
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          . In:
          <string-name>
            <surname>Burges</surname>
            ,
            <given-names>C.J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Welling</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Curran Associates, Inc. (
          <year>2013</year>
          ), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mustafaraj</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Metaxas</surname>
            ,
            <given-names>P.T.</given-names>
          </string-name>
          :
          <article-title>The fake news spreading plague: was it preventable?</article-title>
          <source>In: Proceedings of the 2017 ACM on Web Science Conference</source>
          . pp.
          <fpage>235</fpage>
          -
          <lpage>239</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Olson</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urbanowicz</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrews</surname>
            ,
            <given-names>P.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavender</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kidd</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Applications of Evolutionary Computation: 19th European Conference</article-title>
          ,
          <source>EvoApplications</source>
          <year>2016</year>
          , Porto, Portugal, March 30 - April 1,
          <year>2016</year>
          , Proceedings, Part I, chap.
          <source>Automating Biomedical Data Science Through Tree-Based Pipeline Optimization</source>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>137</lpage>
          . Springer International Publishing (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Pérez-Rosas</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleinberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefevre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of fake news</article-title>
          .
          <source>In: Proceedings of the 27th International Conference on Computational Linguistics</source>
          . pp.
          <fpage>3391</fpage>
          -
          <lpage>3401</lpage>
          . Association for Computational Linguistics, Santa Fe, New Mexico, USA (Aug
          <year>2018</year>
          ), https://www.aclweb.org/anthology/C18-1287
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gollub</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiegmann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>TIRA Integrated Research Architecture</article-title>
          . In:
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <article-title>Information Retrieval Evaluation in a Changing World</article-title>
          . Springer (Sep
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franco-Salvador</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A low dimensionality representation for language variety identification</article-title>
          .
          <source>In: International Conference on Intelligent Text Processing and Computational Linguistics</source>
          . pp.
          <fpage>156</fpage>
          -
          <lpage>169</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <article-title>CLEF 2020 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR-WS.org</source>
          (Sep
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter</article-title>
          . In:
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (eds.)
          <article-title>CLEF 2020 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          (Sep
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciampaglia</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varol</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menczer</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>The spread of fake news by social bots</article-title>
          .
          <source>arXiv preprint arXiv:1707.07592</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafarani</surname>
          </string-name>
          , R.:
          <article-title>Fake news: A survey of research, detection methods, and opportunities</article-title>
          .
          <source>arXiv preprint arXiv:1812.00315</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>