<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF 2020 Working Notes</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>UniNE at PAN-CLEF 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Catherine Ikae</string-name>
          <email>Catherine.Ikae@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jacques Savoy</string-name>
          <email>Jacques.Savoy@unine.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, University of Neuchatel</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>In our participation in the “Profiling Fake News Spreaders on Twitter” task (both in English and Spanish), our main objective is to detect Twitter user accounts used to spread disinformation, fake news, and conspiracy theories. To solve this question automatically, based only on the tweets' contents, we propose reducing the number of features (isolated words) to a few hundred. The suggested approach is a two-stage method that ignores infrequent terms and ranks the remaining ones according to their occurrence differences between the two categories. Finally, a classifier is implemented combining a decision tree, a random forest, and boosting. Our first evaluation experiments indicate an overall accuracy of around 70%.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Since 2013, CLEF-PAN has been generating test collections on author profiling with
datasets extracted from social networks (e.g., blogs, tweets) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. During the last years,
UniNE has participated in these text categorization tasks to identify some of the author's
demographics (e.g., gender, age range, psychological traits, geographical origin) or to
know if a set of tweets was created by bots or humans.
      </p>
      <p>This year, the participants need to implement a system identifying whether or not
a set of 100 tweets was sent by a user spreading fake news (or junk news, pseudo-news,
hoaxes, or, in general, disinformation). More precisely, the target task can be
rephrased as deciding whether a set of tweets contains fake news (or misleading
content). The only available information is the tweet contents; the tweet
context (e.g., number of likes, retweets, etc.) and the author source details (e.g.,
information about the Twitter account) are not provided. Moreover, multimedia
elements are not included.</p>
      <p>
        The first step to solve this question is to define precisely what we mean by fake
news. This is not an easy task, mainly because different variants of fake news can be
encountered [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For example, satire or parody, with its irony and sarcasm, could
represent the least harmful form of fake news (e.g., the Ig (ignoble) Nobel Prize). For
others, humor cannot be viewed as fake news because it is evident that the underlying
information is not true. At a higher level, one can see a sentence extracted from its
context (or embedded in the wrong context), while in its most sophisticated form the
news is entirely fabricated (with additional multimedia elements).
      </p>
      <p>
        Usually fake news [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is rendered as normal customer reviews, political or financial
news, or advertising, but with the objective of favoring or undermining the image of a
product or the reputation of a candidate. Its presence could be limited to a few
seconds (e.g., flash ads), to the period of an electoral or advertising campaign, or could
even stay visible longer (to support a conspiracy theory, even an extreme one such as
“Hitler is alive on a Nazi moon base” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>The identification of fake news remains a complex problem. One can take into account
four main sources of evidence, namely a) the news content, b) the news creator or
spreader, c) the post or social context, and d) the target audience (e.g., users or news
platforms).</p>
      <p>
        The content of fake information tends to present more emotional words, usually to
evoke anger and fear in the readers [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Such texts employ more negations (e.g., not,
never), usually with more uppercase letters or words, more spelling errors, more
specific punctuation symbols (e.g., !, ?, as well as !!!), hashtags, mentions, or hyperlinks.
According to Pennebaker's studies [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], lying writers tend to use fewer self-referencing words (I, me,
my, mine) but more nouns and discrepancy verbs (would, should, could, ought).
When telling the truth, writers produce longer and more complex sentences, containing more
numbers, more details, and longer words.
      </p>
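      <p>To make these stylistic cues concrete, the short sketch below counts a few of them in a tweet. It is purely illustrative: the cue lists, the tokenizer, and the function name are our own choices, not part of the PAN task nor of Pennebaker's lexicons.</p>
      <preformat>
```python
import re

# Illustrative cue lists (assumptions, not an official lexicon)
NEGATIONS = {"not", "never", "no", "none"}
SELF_WORDS = {"i", "me", "my", "mine"}

def stylistic_cues(text):
    """Count a few surface cues often linked to deceptive writing."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lower = [t.lower() for t in tokens]
    return {
        "negations": sum(t in NEGATIONS for t in lower),
        "self_words": sum(t in SELF_WORDS for t in lower),
        "uppercase_tokens": sum(t.isupper() and len(t) > 1 for t in tokens),
        "exclamations": text.count("!"),
    }
```
      </preformat>
      <p>Applied to a tweet such as “Merkel is using her IMMIGRATION INVASION …”, this sketch would report two fully uppercase tokens and no self-referencing word.</p>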
      <p>The author’s name and the URL of the source could also be pertinent during the
identification. The user’s credibility could be estimated from the geolocation, from whether
the account was verified, or from the presence of weird tokens in the URL (as well
as uncommon domains). Usually, creators of fake news (humans, bots, or cyborgs)
send many posts during a short time interval. They also tend to have more friends and
followers, and reply more often.</p>
      <p>The social or post context can also provide some information indicating a fake news
spreader, such as a larger number of likes, a higher intensity of retweets, and more shares
and comments than one would expect from a normal user. When monitoring the
temporal activity, some patterns can reveal bot or cyborg activity.</p>
      <p>
        The spreading of fake news presents several advantages for the sender. When
the same fake news is received many times (echo chamber effect), and particularly when
it is received from friends, the misinformation is finally accepted as true. For example,
analyzing Trump's tweets, the probability of seeing the word fake (or faker) just before (or
after) CNN is high (more precisely, one can count 266 occurrences of CNN, in which
88 times the term fake news appears in the short context). The same observation is
valid for the New York Times or the Washington Post. After repeating this
misinformation, only 9% of Republicans consider the New York Times trustworthy
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Therefore, it is not surprising to observe that conservatives, as well as older persons,
tend to share fake news more often than others [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This trend continues to
undermine US politics: accepting (or spreading) conspiracy theories now seems almost
mandatory for a Republican candidate to win a primary election for
Congress or the Senate [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>The rest of this paper is organized as follows. Section 2 describes the text datasets
while Section 3 describes our feature selection procedure. Section 4 exposes our
classifier and shows some of our evaluation results. A conclusion draws the main
findings of our experiments.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Corpus</title>
      <p>
        When limited to the spreading action, one can focus only on the tweet contents. From our
point of view, the problem is therefore to identify a set of tweets containing
disinformation, leading one to consider that the user generates and/or spreads fake news
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This task will be performed both in English and Spanish.
      </p>
      <p>When faced with a new dataset, a first analysis provides an overall picture of the data
and detects and explores some simple patterns related to the different
categories. In the current study, two categories are provided (Category = 0 or 1),
without further information about the precise meaning of the two values. When
observing some examples of tweets reported in Table 1, one can assume that
Category = 1 means fake news.</p>
      <p>England ease to World Cup win over France #HASHTAG# #HASHTAG# #HASHTAG#
#URL# #URL#
Spain rescues 276 migrants crossing perilous Mediterranean #HASHTAG# #HASHTAG#
#HASHTAG# #HASHTAG# #HASHTAG#…
Italy’s Uffizi demands return of Nazi-looted painting #URL#
Trump invites congressional leaders to border security briefing #URL#
#USER# Merkel is using her IMMIGRATION INVASION as a demographic weapon to
destroy Germany. #HASHTAG# #HASHTAG# #HASHTAG# #HASHTAG# #HASHTAG#
#USER# #USER# Trump is 1/2 Scottish and 1/2 German. Trump will smash the shyster rats.
#HASHTAG# #HASHTAG# #HASHTAG#
With Obama’s Approval, Russia Selling 130 Tons of Uranium to Iran #URL#
FBI admits illegal wiretapping of President Trump, issues apology #URL#</p>
      <p>As one can see in Table 1, tweets in Category #0 describe facts without expressing
many emotions. In tweets appearing under the second label, the terms belong to swear
expressions (e.g., shyster rats) or tend to cause fear or anger (e.g., invasion, destroy).</p>
      <p>The available tweets are included in a training corpus available in English and
Spanish. As depicted in Table 2, the training data contains the same number of
documents in the two categories and in the two languages.</p>
      <p>As each document corresponds to 100 tweets, the mean number of tokens (composed
only of letters) per document is around 1,260 for the English language. For the Spanish
language, the mean length is around 1,508, with a significant difference between the
two categories (Category #0: 1,655; Category #1: 1,361).</p>
      <p>[Table 2: number of documents, number of tweets, mean length, |Voc|, hashtags, and URLs per language and category]</p>
      <p>As shown in Table 2, one can observe that the number of hashtags is larger in
Category #0 than in the second one with a difference between them close to 20%. In
addition, the number of URLs (hyperlinks) is higher in Category #1 than in
Category #0. For the English language, the difference is small, but clearly larger for
the Spanish corpus.</p>
      <p>
        As text categorization problems are known for having a large and sparse feature set
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], Table 2 also indicates the number of distinct terms per category (or the vocabulary
size denoted under the label |Voc|) which is 20,509 for the English Category #0. Fusing
the two categories, the English corpus counts 29,521 distinct words (or 40,867
word-types for the Spanish collection).
      </p>
      <p>For both languages, the vocabulary size is larger for Category #0 than for
Category #1 (English: 20,509 vs. 19,851; Spanish: 29,137 vs. 24,825). The texts sent
when spreading fake news are composed with a smaller lexis, implying that the same or
similar expressions are often repeated.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Feature Selection</title>
      <p>To achieve a good understanding of the distinction between normal tweets and
tweets containing fake news, a feature selection function must be applied. As a simple
strategy, the term frequency (tf) or the document frequency (df) has been suggested,
under the assumption that a higher frequency could indicate a more useful feature. Both
functions return similar results and have been shown to be effective for
solving the authorship attribution problem [13]. For example, the Delta method [14] is
based on the 50 to 200 most frequent words to determine the true author of a text.
Identifying disinformation (lies) and authorship identification are, however, not the same
question.</p>
      <p>Moreover, when considering the most frequent words employed, very similar sets of
terms appear in both categories (e.g., “URL”, “HASHTAG”, “the”, “to”, and some
punctuation symbols (' , : ...)). Thus, a simple feature selection based on the tf
information is not fully effective, and the distinction between features associated with
each category is not guaranteed.</p>
      <p>However, it is always useful to ignore features having a low occurrence frequency.
According to Zipf's law, a large number of word-types appear just once or twice.
According to the statistics reported in Table 3, when removing words appearing only once,
the vocabulary size of the English corpus (Category #0) decreases from 20,509 to
10,474 (a reduction of 48.9%). For the Spanish language (Category #0), the reduction
is larger, from 29,137 to 12,882 (a decrease of 55.8%).</p>
      <p>On the other hand, one can encounter terms having a relatively high occurrence
frequency but appearing only in a few documents (one document = a set of 100 tweets).
Thus, we also suggest removing terms having a low document frequency (df), for
example, keeping only terms with a df &gt; 3. The effect on the vocabulary size is shown in the next-to-last
row of Table 3. For example, for the English corpus (Category #0), the vocabulary
decreases from 20,509 to 4,636, a reduction of 77.4%. Similar decreases can
be observed for the other sub-collections. In this study, we have considered both
frequency counts by ignoring terms having a tf &lt; 6 or a df &lt; 4, as indicated in the last
row of Table 3.</p>
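      <p>The two frequency filters above can be sketched in a few lines of Python; the function below is a minimal illustration (with our own naming), not the exact implementation used in our experiments.</p>
      <preformat>
```python
from collections import Counter
import re

def prune_vocabulary(docs, min_tf=6, min_df=4):
    """Keep terms with tf >= 6 and df >= 4, i.e., tf > 5 and df > 3."""
    tf, df = Counter(), Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())  # tokens composed only of letters
        tf.update(tokens)          # term frequency over the whole corpus
        df.update(set(tokens))     # document frequency: one count per document
    return {t for t in tf if tf[t] >= min_tf and df[t] >= min_df}
```
      </preformat>
      <p>With the default thresholds, this filter corresponds to the condition reported in the last row of Table 3.</p>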
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption><p>Vocabulary size per category after applying different frequency thresholds</p></caption>
        <table>
          <thead>
            <tr><th/><th>|Voc|</th><th>tf &gt; 1</th><th>tf &gt; 3</th><th>tf &gt; 5</th><th>df &gt; 3</th><th>tf &gt; 5 and df &gt; 3</th></tr>
          </thead>
          <tbody>
            <tr><td>English Cat. #0</td><td>20,509</td><td>10,474</td><td>5,797</td><td>4,136</td><td>4,636</td><td>2,433</td></tr>
            <tr><td>English Cat. #1</td><td>19,851</td><td>10,672</td><td>6,184</td><td>4,431</td><td>5,247</td><td>3,720</td></tr>
            <tr><td>Spanish Cat. #0</td><td>29,137</td><td>12,882</td><td>6,573</td><td>4,463</td><td>5,838</td><td>3,800</td></tr>
            <tr><td>Spanish Cat. #1</td><td>24,825</td><td>11,514</td><td>6,001</td><td>4,150</td><td>5,386</td><td>3,590</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>After removing infrequent terms, we propose a feature selection method that works
in two stages. In the first, the term frequency (tf) information is taken into account.
For each term, the discriminative power is computed by estimating the occurrence
probability difference in both categories as indicated in Equation 1. In this case, tfi0
indicates the absolute frequency of the ith term in class c0 (or Category #0), and n0 the
text length (in tokens) of all tweets belonging in class c0 (and similarly with class c1).
(!, ") = (!, ") − (!, #) = $%!" - $%!#
&amp;" &amp;#
(1)</p>
      <p>To determine terms able to describe Category #0, only terms having a positive
probD value are extracted. Of course, one can impose a stricter constraint by selecting
terms having a probD larger than a threshold. Similarly, only words with a negative
probD value are chosen to represent Category #1. This step generates two term clusters,
one per class.</p>
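      <p>Equation 1 and the split into two term clusters can be sketched as follows; the function names are ours, and the snippet is only a minimal illustration of this first selection stage.</p>
      <preformat>
```python
from collections import Counter

def prob_diff(tokens_c0, tokens_c1):
    """probD(ti) = tfi0/n0 - tfi1/n1 for every term seen in either class."""
    tf0, tf1 = Counter(tokens_c0), Counter(tokens_c1)
    n0, n1 = len(tokens_c0), len(tokens_c1)
    return {t: tf0[t] / n0 - tf1[t] / n1 for t in set(tf0) | set(tf1)}

def split_clusters(scores, threshold=0.0):
    """Positive probD values describe Category #0, negative ones Category #1."""
    cat0 = {t for t, s in scores.items() if s > threshold}
    cat1 = {t for t, s in scores.items() if -s > threshold}
    return cat0, cat1
```
      </preformat>
      <p>A stricter threshold simply shrinks both clusters, keeping only the most discriminative terms.</p>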
      <p>After this procedure, one can identify some terms more strongly associated to each
category. For example, Table 4 reports the top fifteen words having the largest value
for each language and category (the negative scores for Category #1 have been
multiplied by -1). For both languages, tweets containing true information have more
hashtags, retweets (rt) and user mentions. Tweets spreading fake news have more URL,
“video” meaning that they tend to refer more often to other websites containing
supporting information (in the form of text, video, etc.).</p>
      <p>For the English language, the names of political leaders (“trump”, “obama”,
“clinton”), or the adjective “new” are more recurrent in tweets spreading
disinformation. This is also an indication that political news is more frequently spread
than other domains. It is interesting to see the verb “says” as a feature indicating fake
news (e.g., reporting a sentence spoken by a well-known person).</p>
      <p>
        Some punctuation symbols (, ... ! : ? or ¿) appear more recurrently in normal tweets
than in fake news (e.g., in the sequence RT #USER#:). The comma is more associated
with longer sentences, usually indicating a real story [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as well as the pronoun I.
      </p>
      <p>[Table 4: top fifteen terms per category for the English and Spanish corpora]</p>
      <p>
        After this step, one can stop the feature selection by considering the k terms (with k
= 100 to 250) having the highest and lowest probD scores. To go further in this space
reduction, the second step applies an additional feature selection procedure. In this
perspective, previous studies have shown that the chi-square, odds ratio, or mutual
information tend to produce effective reduced term sets for different text categorization
tasks [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [13]. In this study, the chi-square method was selected to reduce the feature
space to a few hundred terms.
      </p>
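      <p>This second stage can be sketched with scikit-learn's SelectKBest and its chi2 scoring function; the toy term-frequency matrix below is invented for illustration, and k is set far lower than the few hundred terms used in our runs.</p>
      <preformat>
```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Toy matrix: rows = documents, columns = candidate terms (counts)
X = np.array([[5, 0, 1],
              [4, 1, 0],
              [0, 5, 1],
              [1, 4, 0]])
y = np.array([0, 0, 1, 1])  # the two categories

# Keep the k terms with the highest chi-square scores
selector = SelectKBest(chi2, k=2).fit(X, y)
kept = selector.get_support(indices=True)  # indices of the retained terms
```
      </preformat>
      <p>Here the third term, evenly spread over both categories, obtains a chi-square score of zero and is discarded.</p>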
    </sec>
    <sec id="sec-4">
      <title>4 Evaluation</title>
      <p>To define the different machine learning models, the scikit-learn library
(Python) was applied [15], with the default settings defined in the library. The
decision tree approach was applied to define our first model (with the Gini function to
measure node impurity) [16]. As a more complex classifier, a random forest with
200 trees was applied, the final decision being obtained by majority voting. As
another approach belonging to ensemble learning, the bagging model forms one of
our selected approaches. With boosting, represented by our last model, a set of weak
learners is combined to produce a more effective assignment. More precisely, we chose
extreme gradient boosting (XGB) [17], [18] based on a set of 100 decision trees
(maximum depth set to 2).</p>
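      <p>As a sketch of this setup, the snippet below instantiates the three kinds of classifiers with scikit-learn on synthetic data. Note that, to stay self-contained, sklearn's GradientBoostingClassifier stands in for the XGB implementation cited above, with the same 100 trees of maximum depth 2; the synthetic matrix merely replaces the reduced document-term matrix.</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the reduced document-term matrix
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

models = {
    "tree": DecisionTreeClassifier(criterion="gini"),        # Gini node impurity
    "forest": RandomForestClassifier(n_estimators=200),      # majority vote of 200 trees
    # stand-in for extreme gradient boosting: 100 trees, maximum depth 2
    "boost": GradientBoostingClassifier(n_estimators=100, max_depth=2),
}
for model in models.values():
    model.fit(X, y)
```
      </preformat>
      <p>Each fitted model then returns both a predicted category and, via predict_proba, the probability estimate used later for the soft vote.</p>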
      <p>When computing the decision for a new set of tweets, the three classifiers determine
a proposed attribution based on the same set of chosen features. To combine the three
resulting decisions, a simple majority vote could be applied, giving the same
importance to each of the three individual classifiers. This solution corresponds to a
democratic vote.</p>
      <p>However, each classifier returns not only the proposed decision but also an estimated
probability that the input set of tweets belongs to each category. Thus, our second
approach, called soft vote, adds these three probabilities to determine the final
assignment (this merging strategy was used in our early bird submission).</p>
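      <p>The two merging strategies can be sketched without any library; the function names below are ours, and each classifier is represented simply by its predicted label (hard vote) or by its per-class probability vector (soft vote).</p>
      <preformat>
```python
def majority_vote(labels):
    """Hard vote: the category proposed by most classifiers wins."""
    return max(set(labels), key=labels.count)

def soft_vote(probas):
    """Soft vote: sum the per-class probabilities, pick the largest total."""
    totals = [sum(p[c] for p in probas) for c in range(len(probas[0]))]
    return totals.index(max(totals))
```
      </preformat>
      <p>The two schemes can disagree: with probability vectors (0.9, 0.1), (0.4, 0.6), and (0.45, 0.55), the individual labels (0, 1, 1) give Category #1 under the majority vote, while the soft vote selects Category #0 because the summed probability mass (1.75 vs. 1.25) favors the first class.</p>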
      <p>To compute the accuracy rates shown in Table 5, only the training subset is used to
select the feature sets and to generate the document surrogates (the same number of
documents appears in both categories and languages). To achieve a fair evaluation, we
randomly extracted 50 documents from each category to generate the test set.</p>
      <p>From the results depicted in Table 5, one can see that after reducing the feature set
to a few hundred words one can still achieve a good overall effectiveness. Moreover,
having more features does not imply a higher effectiveness level. For
example, using only 150 features in total, our model achieves an accuracy rate of 0.81
(English corpus, majority vote) or 0.78 for the Spanish language (majority vote).
Doubling the number of features does not always improve the overall effectiveness
(English corpus: 0.81 vs. 0.75; Spanish: 0.78 vs. 0.79). It is interesting to note that,
even if, on average, combining different classifiers provides a higher effectiveness, the
best solution for the Spanish corpus is often a single boosting model.</p>
      <p>Table 6 reports our official results achieved with the TIRA system [19], using
the official test subset of the data. Our first results, called early bird results, were
obtained under the soft vote scheme; they appear in the second row of Table 6. Our
official performance was achieved with the majority vote scheme depicted in the last row of
Table 6. In both cases, the infrequent terms have been ignored (tf &gt; 5 and df &gt; 3), and the
top 150 terms having the highest chi-square values have been selected to define the
feature set.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In our participation in the “Profiling Fake News Spreaders on Twitter” task (CLEF PAN
2020), we worked with tweets written in English and Spanish. Overall, we reached
the following main findings. First, we suggested a feature selection approach able to
extract a reduced set of features (precisely 150). Based on such a reduced set, it is
possible to identify the features more associated with normal tweets (e.g., I, this, film,
review, episode, etc.). In addition, the conjunction and and the comma appear more
often in normal posts, indicating the presence of longer sentences. In tweets spreading
fake news, one can count more names of political leaders, as well as the terms says,
post, president, she, he, democrat, etc. This is an indication of the presence of posts
reporting opinions and words uttered by other persons.</p>
      <p>Second, our analysis indicates that tweets containing fake news tend to include more
references (URLs) to other webpages than normal tweets (see Table 2), references used
to support the misinformation or to justify some conspiracy theory. On the other hand,
normal tweets present more retweets and hashtags, as shown in Table 2.</p>
      <p>Third, our attribution approach is based on a model combining three individual
attributions computed by a decision tree, a boosting, and a random forest classifier. It
was a surprise to see that a simple majority scheme achieved a higher accuracy rate
than a merging approach based on the probability estimates computed by each
individual classifier.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stamatatos</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2019a</year>
          ).
          <article-title>A Decade of Shared Tasks in Digital Text Forensics at PAN</article-title>
          .
          <source>Proceedings ECIR 2019</source>
          , Springer LNCS #
          <volume>11437</volume>
          ,
          <fpage>291</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wardle</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Fake News. It's Complicated</article-title>
          .
          <source>First Draft</source>
          , February 16. (URL: https://firstdraftnews.org/latest/fake-news-complicated/).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ghorbani</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>An Overview of Online Fake News: Characterization, Detection, and Discussion</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>57</volume>
          (
          <issue>2</issue>
          ),
          <fpage>102025</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Selk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>No, Hitler isn't Alive on a Nazi Moon Base</article-title>
          . Washington Post, May 20.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hart</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Trump and Us. What he Says and Why People Listen</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Pennebaker</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>The Secret Life of Pronouns</article-title>
          . Bloomsbury Press, New York.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Francia</surname>
            ,
            <given-names>P.L.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Going Public in the Age of Twitter and Mistrust of the Media</article-title>
          . In J. C. Baumgartnet, T.L. Towned (eds),
          <source>The Internet and the 2016 Presidential Campaign</source>
          , Lexington Books, Lanham.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Guess</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nagler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Tucker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Less than you Think: Prevalence and Predictors of Fake News Dissemination on Facebook</article-title>
          .
          <source>Science Advances</source>
          ,
          <volume>5</volume>
          ,
          <fpage>eaau4586</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Philips</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Why QAnon Supporters are Winning Congressional Primaries</article-title>
          . Washington Post, June 13.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter</article-title>
          .
          <source>CLEF 2020 Labs and Workshops</source>
          , Notebook Papers.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rangel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>An Emotional Analysis of False Information in Social Media and News Articles</article-title>
          .
          <source>ACM Transactions on Internet Technology</source>
          ,
          <volume>20</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>Machine Learning in Automated Text Categorization</article-title>
          .
          <source>ACM Computing Surveys</source>
          ,
          <volume>34</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>