    Analyzing User Profiles for Detection of Fake News
                  Spreaders on Twitter
                        Notebook for PAN at CLEF 2020

               María S. Espinosa, Roberto Centeno, and Álvaro Rodrigo

               Natural Language Processing and Information Retrieval Group
                   Universidad Nacional de Educación a Distancia, Spain.
             mespinosa@lsi.uned.es, rcenteno@lsi.uned.es, alvarory@lsi.uned.es



       Abstract The massive flow of digital information to which our society is exposed
       nowadays has led to a great amount of false or extremely biased information being
       shared and consumed by Internet users every day, in the form of rumours, fake news,
       and extremely biased news stories. Disinformation, including misleading and even
       outright false content, is a major issue for today's society, and its full impact on
       politics, the economy and even public health is yet to be quantified. Given the crucial
       role that the spread of fake news plays in our current society, it is becoming essential
       to design tools that automatically verify the veracity of online information. In order
       to address this issue, the PAN@CLEF 2020 competition has proposed a task focused
       on the detection of fake news spreaders on Twitter. In this paper, we offer a detailed
       description of the system developed for this competition. Our system relies on
       psychological and linguistic features to model the behaviour of users.


1   Introduction
The rise of social media in the past years has changed the way people consume infor-
mation, especially news. The amount of time spent online, the immediacy of access, and
the low to nonexistent cost are decisive factors in this global change in news consump-
tion.
    According to a study conducted by the Pew Research Center in 2016 in the United
States, the percentage of American adults getting their news through social media in-
creased from 49% in 2012 to 62% in 20161 . The same study in 2018 reported a value of
68%, confirming that this number is still increasing2 . The shift in how people consume
news during the past few years is undeniable. Nowadays, people are more likely than
ever to use social networks for news instead of more traditional sources such as printed
newspapers and television [23].
   Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons Li-
   cense Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessa-
   loniki, Greece.
 1
   https://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/
 2
   https://www.journalism.org/2018/09/10/news-use-across-social-media-platforms-2018/
    Our society is nowadays subjected to a massive exposure to information, which has
led to a great amount of false or extremely biased information being shared and con-
sumed by Internet users every day. As early as 2013, the need to combat fake news was
pointed out by the World Economic Forum's Global Risks Report, which warned that
"digital wildfires" could spread false information rapidly3 .
    Fake news has become a global issue, especially since recent social and political
events such as the 2016 U.S. Presidential Election, in which online political discussion
was strongly influenced by social media users and bots spreading misinformation, poten-
tially altering public opinion and endangering the integrity of the elections. Furthermore,
a study published in 2019, which analysed a dataset of 171 million tweets posted in the
five months preceding election day, found that, of the 30 million tweets containing a link
to a news outlet, 25% spread either fake or extremely biased news [3].
    The impact of fake news on the global economy, on public health and even on the
creation of panic in society has been extensively documented in the past few years with
countless examples, such as [15], [8], [22] and [17]. These examples illustrate the high
cost associated with the spread of fake news: the absence of control and verification of
information makes social media a fertile ground for the spread of unverified or false
content.
    With this in mind, we can affirm that the magnitude, diversity and substantial dan-
gers of fake news and, more generally, of the disinformation circulating on social media
are becoming a reason for concern due to the potential social cost they may have in
the near future [2]. As a consequence, the research community has launched several
evaluation campaigns to foster the development of systems able to detect false infor-
mation. The Author Profiling task on Profiling Fake News Spreaders on Twitter at the
PAN@CLEF 2020 competition is a good example of such a campaign [14]. In this paper,
we describe the proposal sent to the competition, analysing its main contributions and
the errors detected.
    Our proposal focuses on the content created by a user rather than on the content
created by other users, such as retweets. We use a combination of psychological and
linguistic features aimed at modelling the behaviour of a user in order to detect whether
they are a spreader or a non-spreader.


2     Related Work

Over the past few years, several definitions have been given for the term fake news. One
of the most frequent ones overlaps with the notions of misinformation and disinforma-
tion as well: fake news is fabricated information that mimics news media content and is
intentionally created to deceive, mislead or misinform readers [9]. The
intention behind the creation and dissemination of fake news often has a political or
economic component. Given the crucial role that the spread of fake news plays in our
current society, research on this topic is developing rapidly. In fact, the number of
published papers indexed in the Scopus database on the topic of fake news has increased
considerably, from fewer than 20 in 2006 to more than 200 in 2018 [25]. These works
concentrate on understanding how false information spreads through social media and
how it can be efficiently detected in order to reduce its negative impact on society. This
task has been approached from different perspectives, such as Natural Language Pro-
cessing (NLP), Data Mining (DM), and Social Media Analysis (SMA).
 3
     http://www3.weforum.org/docs/WEF_GlobalRisks_Report_2013.pdf
    Recent research proposes an approach which combines text generation and fact-
checking in order to mitigate the effects of fake news spreading [24]. In many cases the
task is treated as a binary classification problem where a news piece is classified as fake
or real. However, there are cases in which this classification may not be adequate since
the news could be partially true and partially false. For this reason, systems capable of
multi-class classification have also been proposed [16].
    In the field of Natural Language Processing, research has focused on the detection
of fake news and on intervention against its spread, using techniques such as Machine
Learning and Deep Learning [18], and taking into account:
    – Content-based features, which contain information that can be extracted from the
      text, such as linguistic features.
    – Context-based features, which contain surrounding information such as user charac-
      teristics, social network propagation features, or users' reactions to the information.
    Detecting fake news in the context of social media presents characteristics and chal-
lenges that make content-based methods insufficient on their own. Fake news is inten-
tionally created to deceive, which makes it difficult to identify only from its textual
content. For this reason, it is common to use surrounding information, such as the way
in which the news is disseminated, the behaviour of the users involved in this dissemi-
nation, and information related to the author of the news [19].
    Recent research has demonstrated that studying the correlation between user profiles
and the spread of fake news helps to identify both the users more likely to believe fake
news and those more likely to believe real news [20]. Approaches that combine context-
based features with content-based features have been gaining popularity in the past years
due to the promising results obtained in recent studies, such as [21] and [4]. Three main
aspects of this type of information can be studied:
    – User information, such as location, age, number of followers, etc.
    – The responses generated by fake news, which can stand as an important source of
      detection not only because users use responses to express their opinions but also
      because they can help in the construction of a credibility index for users [7].
    – The social networks through which the news is disseminated. The study of the net-
      works through which the information is propagated is especially relevant, since their
      rapid diffusion is exploited to reach the maximum number of users in the shortest
      possible time.

3     Dataset Description and Preprocessing
The dataset provided by the organizers of the task was divided into two collections of
tweets: one in Spanish and one in English. Each of them contained 300 XML documents,
each with 100 tweets written or shared by one user. There was, therefore, information
regarding 300 users per language. In addition, each directory contained a truth file with
the user identifier and a 0 or a 1 determining the class label4 .
    The user identification numbers were not the real Twitter IDs, since these were ob-
fuscated for privacy reasons. Their purpose was to identify the users both in the truth
file and in the user XML documents.
    Similarly, mentions, hashtags, URLs, and usernames in the tweet contents were also
obfuscated, indicating only that one of them had occurred by means of a keyword in
capital letters surrounded by dollar symbols. Therefore, a tweet containing the following
text:
a tweet containing the following text:

      “RT The new president of the USA is @bartsimpson! #usa https://thenews.com/
      new-president-bart-simpson”

      would have the following content in the provided dataset:

      “RT The new president of the USA is $USER$ ! $HASHTAG$ $URL$”

    Our participation in this task was only for the English language; therefore, we only
used the data in the English directory in our model. A minimal loading sketch is shown
below.
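
    The following is a minimal loading sketch for the English directory, written only to
illustrate the layout described above. The name of the per-tweet XML element and the
":::" separator assumed for the truth file are our assumptions about the corpus format,
not guaranteed details of the official data.

    # A minimal loading sketch; the <document> element name and the ':::'
    # separator of truth.txt are assumptions about the corpus layout.
    import glob
    import os
    import xml.etree.ElementTree as ET

    def load_english_data(data_dir):
        truth = {}
        with open(os.path.join(data_dir, "truth.txt"), encoding="utf-8") as fh:
            for line in fh:
                user_id, label = line.strip().split(":::")
                truth[user_id] = int(label)          # 0/1 class label per user

        users = {}
        for path in glob.glob(os.path.join(data_dir, "*.xml")):
            user_id = os.path.splitext(os.path.basename(path))[0]
            tree = ET.parse(path)
            # one text entry per tweet written or shared by the user
            users[user_id] = [doc.text for doc in tree.iter("document")]
        return users, truth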
    Regarding the preprocessing applied to the data, there are three important steps that
we took before the feature extraction (a short code sketch of these steps follows the list):

 1. Tokenization. In this step, the words forming the tweets were separated into tokens
    using the RegexpTokenizer from the Natural Language Toolkit (NLTK) in
    Python [10], which splits a string into substrings using a regular expression that
    matches the tokens. In this case, we selected words of 3 or more alphabetic charac-
    ters.
 2. Stop word removal. After the words were tokenized, the stop words were removed
    from the text. For this task, we used the stop word set from the NLTK corpus,
    combined with a small list of custom words added manually to the set. These were
    commonly used words, such as prepositions, coordinating conjunctions, and deter-
    miners, that were not included in the NLTK corpus stop word set.
 3. Tweet aggregation. After the two previous steps were executed, the resulting tokens
    were aggregated, forming a single document per user. The reason for this aggregation
    was to have a single piece of text per user before the processing phase.
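
    The following is a minimal sketch of these three steps, assuming NLTK and its
stopword corpus are installed (nltk.download("stopwords")); the custom stop word list
shown is purely illustrative and not the actual hand-made list used in our system.

    # Minimal sketch of the preprocessing pipeline (tokenize, remove stop
    # words, aggregate per user); CUSTOM_STOPS is an illustrative stand-in.
    from nltk.corpus import stopwords
    from nltk.tokenize import RegexpTokenizer

    tokenizer = RegexpTokenizer(r"[a-zA-Z]{3,}")   # words of 3+ alphabetic characters
    CUSTOM_STOPS = {"via", "amp"}                   # illustrative custom additions only
    stop_words = set(stopwords.words("english")) | CUSTOM_STOPS

    def preprocess_user(tweets):
        """Tokenize, remove stop words, and aggregate all tweets of one user."""
        tokens = []
        for tweet in tweets:
            tokens += [t for t in tokenizer.tokenize(tweet.lower())   # 1. tokenization
                       if t not in stop_words]                        # 2. stop word removal
        return " ".join(tokens)                                       # 3. one document per user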

    It is important to notice that, as we will see in the following sections, some of this
preprocessing had to be repeated later in the processing phase, because some of the
features of our model take into account metrics such as the number of stop words, the
number of determiners, or the number of coordinating conjunctions.

 4
     The information regarding which class label corresponds to fake news spreaders was not avail-
     able for GDPR reasons.
4      Feature Engineering

The contents that users share on social media can be divided into: (1) content created
by the user, and (2) content created by others. Our model for the task of profiling fake
news spreaders on Twitter is based on the following hypothesis: distinguishing between
user-created and user-shared content will reveal more accurate features of the user's
online behaviour. In other words, analysing these two groups of content separately will
reveal more precise features of the user profiles.
    For this reason, the feature extraction process was applied differently to the content
originally written by the user (i.e., his/her tweets) and to the content shared by the user
but originally written by other users (i.e., retweets). In this section, we describe all the
feature engineering applied to the data, detailing how the distinction between tweets and
retweets was made in each case.
    The complete set of features extracted from the data is depicted in Table 1. The
features used in the model can be divided into the following four main categories:

    – Psychological features, which help to define the user profile in order to differentiate
      between fake news spreaders and real news spreaders.
    – Linguistic features, which help to identify the linguistic traits that characterize each
      category.
    – Twitter actions features. Exploring how users behave in the social network could
      offer some insights into the online behaviour of fake news spreaders.
    – Headline analysis data, which can help in the identification of news pieces shared
      by a user that are actually fake.


4.1     Psychological Features

The psychological features were extracted using a third-party API developed by Symanto5 .
The documents containing the aggregated tweets of each user were sent to the API in
order to retrieve (1) their personality traits, (2) their communication styles, and (3) the
sentiment analysis of their text.
    The personality traits value is either "emotional" or "rational", depending on the
analysis of the user's text. The value returned by the API when the communication
styles are requested is a collection of traits: self-revealing, which means sharing one's
own experience and opinions; fact-oriented, which implies focusing on factual informa-
tion, objective observations or statements; information-seeking, that is, posing questions;
and action-seeking, i.e., aiming to trigger someone's action by giving recommendations,
requests or advice. Finally, the sentiment analysis of the text returns either "positive" or
"negative", depending on the sentiment found in the user's text.
    All the values returned by Symanto's API included a percentage for the predicted
value and had to be converted to a binary representation (0, 1). A sketch of this conver-
sion is given below.
 5
     https://symanto-research.github.io/symanto-docs/
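
    The snippet below is a minimal sketch of this binarization. The response layout
(prediction/probability fields and the set of communication-style labels) is a simplified,
hypothetical stand-in used only for illustration; it does not reproduce the actual Symanto
API response format.

    # Hypothetical, simplified response shape used only to illustrate the
    # conversion to binary features; the real API response differs.
    def to_binary_psych_features(api_response):
        # api_response example (illustrative):
        # {"personality": {"prediction": "emotional", "probability": 0.83},
        #  "communication": ["self-revealing", "fact-oriented"],
        #  "sentiment": {"prediction": "positive", "probability": 0.61}}
        comm = set(api_response["communication"])
        return {
            "pers_pred":   int(api_response["personality"]["prediction"] == "emotional"),
            "self_pred":   int("self-revealing" in comm),
            "info_pred":   int("information-seeking" in comm),
            "action_pred": int("action-seeking" in comm),
            "fact_pred":   int("fact-oriented" in comm),
            "sent_pred":   int(api_response["sentiment"]["prediction"] == "positive"),
        }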
 Category                Feature name  Description                                Set of values
 psychological features  pers_pred     personality prediction value               {emotional, rational}
                         self_pred     self-revealing prediction value            {yes, no}
                         info_pred     information-seeking prediction value       {yes, no}
                         action_pred   action-seeking prediction value            {yes, no}
                         fact_pred     fact-oriented prediction value             {yes, no}
                         sent_pred     sentiment analysis prediction value        {positive, negative}
 linguistic features     num_ADJ       number of adjectives                       N
                         num_DET       number of determiners                      N
                         num_ADP       number of adpositions                      N
                         num_NOUN      number of nouns                            N
                         num_VERB      number of verbs                            N
                         num_PROPN     number of proper nouns                     N
                         num_NUM       number of numerals                         N
                         num_AUX       number of auxiliary verbs                  N
                         num_ADV       number of adverbs                          N
                         num_CONJ      number of coordinating conjunctions        N
                         num_INTJ      number of interjections                    N
                         num_PART      number of particles                        N
                         I-ORG         number of named entities (organizations)   N
                         I-LOC         number of named entities (locations)       N
                         I-PER         number of named entities (persons)         N
                         num_POS       number of positive words                   N
                         num_NEG       number of negative words                   N
                         num_NEUT      number of neutral words                    N
                         num_words     total number of words                      N
 Twitter actions         num_htag      number of hashtags                         N
                         num_rt        number of retweets                         N
                         num_url       number of URLs                             N
                         num_user      number of mentions                         N
 headline analysis data  all_caps      number of words all in caps                N
                         per_stop      percentage of stopwords                    N
                         num_propn     number of proper nouns                     N

               Table 1: Complete set of features extracted from the data
4.2    Linguistic Features

For the extraction of linguistic features, a natural language pipeline called Polyglot was
used [1]. This library is built using distributed word representations (word embeddings)
in conjunction with traditional NLP features for over 100 different languages in order
to solve NLP tasks, such as Part-of-Speech (POS) tagging, Named Entity Recognition
(NER), sentiment analysis, etc.
    For our model's set of features, we chose 12 POS tagging metrics, 3 named entity
recognition metrics and the total word count. Details regarding the specific metrics can
be found in Table 1.
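
    As an illustration of how these counts can be obtained, the following is a minimal
sketch using Polyglot's POS tagger and named entity recognizer. It assumes the English
Polyglot models have been downloaded and is not the exact code of the submitted system.

    # Minimal sketch of the linguistic feature counts with Polyglot (English
    # models assumed to be installed); not the exact submitted code.
    from collections import Counter
    from polyglot.text import Text

    POS_TAGS = ["ADJ", "DET", "ADP", "NOUN", "VERB", "PROPN",
                "NUM", "AUX", "ADV", "CONJ", "INTJ", "PART"]
    NER_TAGS = ["I-ORG", "I-LOC", "I-PER"]

    def linguistic_features(user_text):
        doc = Text(user_text, hint_language_code="en")
        pos_counts = Counter(tag for _, tag in doc.pos_tags)         # universal POS tags
        ner_counts = Counter(entity.tag for entity in doc.entities)  # I-PER, I-LOC, I-ORG
        features = {"num_" + tag: pos_counts.get(tag, 0) for tag in POS_TAGS}
        features.update({tag: ner_counts.get(tag, 0) for tag in NER_TAGS})
        features["num_words"] = len(doc.words)
        return features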


4.3    Twitter Actions Features

The analysis of the users' activities on Twitter was restricted by the data obfuscation
described in Section 3. Therefore, only 4 metrics were recorded from the actions of
the user within Twitter: the number of mentions, the number of URLs, the number of
retweets and the number of hashtags. The values of these metrics were counted over the
total aggregation of tweets of each user.
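
    A minimal sketch of these counts is shown below; it is based on the obfuscation
placeholders described in Section 3, and identifying retweets through the retained "RT"
prefix is an assumption about the data rather than a documented rule.

    # Minimal sketch of the Twitter action counts from the obfuscated text.
    def twitter_action_features(raw_tweets):
        joined = " ".join(raw_tweets)            # aggregation of all tweets of one user
        return {
            "num_user": joined.count("$USER$"),      # mentions
            "num_url":  joined.count("$URL$"),       # shared URLs
            "num_htag": joined.count("$HASHTAG$"),   # hashtags
            "num_rt":   sum(t.lstrip().startswith("RT") for t in raw_tweets),  # retweets
        }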


4.4    Headline Analysis Data

In this category of the model, we study whether there are specific message characteris-
tics that accompany fake news articles that are produced and widely shared. Recent stud-
ies suggest not only that these characteristics exist, but also that some of them can be
found in the headline of the news article [6]. Therefore, we took the 3 most significant
characteristics differentiating fake news headlines from real news headlines and applied
them to the text in our dataset.
    Assuming our main hypothesis to be true, we separated tweets from retweets and
applied these measurements only to the retweet subset.
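
    The sketch below illustrates one possible way to compute the three measurements
(words fully in capitals, percentage of stop words, and number of proper nouns) over the
retweet subset; tokenization details are assumptions rather than the exact submitted code.

    # Minimal sketch of the headline-style measurements over the retweet subset.
    from nltk.corpus import stopwords
    from polyglot.text import Text

    STOP_WORDS = set(stopwords.words("english"))

    def headline_features(retweet_text):
        words = [t for t in retweet_text.split() if t.isalpha()]
        n_stop = sum(w.lower() in STOP_WORDS for w in words)
        pos_tags = Text(retweet_text, hint_language_code="en").pos_tags
        return {
            "all_caps": sum(w.isupper() and len(w) > 1 for w in words),   # words all in caps
            "per_stop": 100.0 * n_stop / len(words) if words else 0.0,    # % of stop words
            "num_propn": sum(tag == "PROPN" for _, tag in pos_tags),      # proper nouns
        }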


5     Experiments and Results

5.1    Experiments

For the creation of our model, we first ran some experiments in order to select the most
important features as well as the best performing algorithms6 . We tested the model
taking into account the set of features available in each category separately, and we also
tested combinations of the features to evaluate their performance on the data.
    With regard to the classification models, we used an open source machine learning
library, scikit-learn [12]. We performed a comparative analysis in which we tested the
model with some of the most popular classification algorithms, such as Logistic Re-
gression, K-Neighbors, Random Forest, Decision Tree, and Support Vector Machines.
 6
     All the experiments and results can be found in a notebook uploaded to
     https://github.com/mariaesp/PANCLEF_PAPER/blob/master/notebooks/Spreaders.ipynb.
 Features Measures LogisticRegression KNeighbors RandomForest DecisionTree SVM
             accuracy          0.57            0.55          0.62           0.62      0.6
 Cat1        precision         0.58            0.56          0.63           0.63      0.59
             recall            0.56            0.55          0.59           0.59      0.61
             accuracy          0.65            0.55          0.66           0.58      0.62
 Cat2        precision         0.65            0.56          0.68           0.59      0.60
             recall            0.72            0.54          0.59           0.57      0.77
             accuracy          0.56            0.53          0.59           0.55      0.53
 Cat3        precision         0.55            0.53          0.58           0.55      0.53
             recall            0.7             0.55          0.57           0.55      0.65
             accuracy          0.61            0.55          0.59           0.57      0.56
 Cat4        precision         0.58            0.48          0.56           0.55      0.54
             recall            0.79            0.66          0.82           0.85      0.88

Table 2: Performance results with different classifiers for the evaluation of the feature
 categories described. Cat1: psychological features, Cat2: linguistic features, Cat3:
                    Twitter actions, Cat4: headline analysis data.



In order to use all the available data in the tests, we used cross-validation with 5 folds.
The results obtained for each classifier can be seen in Table 2. Results are given in terms
of accuracy, the official measure, as well as precision and recall.
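
    A minimal sketch of this comparison is given below, assuming X holds the feature
vectors of one category (or their combination) and y the spreader/non-spreader labels;
the hyperparameters shown are scikit-learn defaults, which need not match those of the
submitted system.

    # Minimal sketch of the 5-fold comparison across classifiers.
    from sklearn.model_selection import cross_validate
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC

    def compare_classifiers(X, y):
        """Evaluate each classifier with 5-fold cross-validation."""
        classifiers = {
            "LogisticRegression": LogisticRegression(max_iter=1000),
            "KNeighbors": KNeighborsClassifier(),
            "RandomForest": RandomForestClassifier(),
            "DecisionTree": DecisionTreeClassifier(),
            "SVM": SVC(),
        }
        for name, clf in classifiers.items():
            scores = cross_validate(clf, X, y, cv=5,
                                    scoring=("accuracy", "precision", "recall"))
            print(f"{name}: acc={scores['test_accuracy'].mean():.2f} "
                  f"prec={scores['test_precision'].mean():.2f} "
                  f"rec={scores['test_recall'].mean():.2f}")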


 Features Measures LogisticRegression KNeighbors RandomForest DecisionTree SVM
             accuracy          0.61             0.6          0.68           0.61      0.62
 All         precision         0.60            0.62           0.7           0.61      0.61
             recall            0.65            0.59          0.64           0.61      0.77

       Table 3: Performance results with different classifiers for the evaluation of the
                                     complete model.



     After comparing the results obtained with the different categories and classifiers,
we trained the model using a combination of all categories. With regard to the classi-
fication algorithm, the Random Forest Classifier outperformed the others in 3 of the
4 categories, as well as in an additional test in which all the features from the four
categories were combined. The results of this experiment can be found in Table 3.
Therefore, the Random Forest Classifier was chosen as the algorithm to train our model.

5.2     Final Model Definition and Results
Once the experiments had allowed us to choose the classifier and the set of features for
our model, we trained the model on the data provided for the task and exported the
trained model in order to make the submission and evaluation in the TIRA7 environment
[13].
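
    The following is a minimal sketch of how the final model could be trained and
serialized for such a submission; the file name is illustrative, and X and y are assumed to
be the feature matrix and labels built as described in Section 4.

    # Minimal sketch of training and serializing the final Random Forest model.
    import joblib
    from sklearn.ensemble import RandomForestClassifier

    def train_and_export(X, y, path="spreader_model.joblib"):
        """Train on all four feature categories and save the model to disk."""
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X, y)
        joblib.dump(model, path)      # exported model, later reloaded with joblib.load
        return model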




     Figure 1: Performance result plot with different classifiers for the evaluation of the
                                        final model


     There were two evaluations of our model. On the one hand, there was an early-bird
submission evaluation for the task and, on the other hand, there was the final submission
evaluation. We participated in both evaluations, first with an early model and then with
a final model. The evaluation results can be found in Table 4.


                        Data          Model  Phase                    Accuracy
                        development   early  experimentation            0.67
                                      final  experimentation            0.68
                        test          early  early-bird submission      0.67
                                      final  final submission           0.64

 Table 4: Early and final model evaluation results throughout the different evaluation
                                       phases.



    With regard to the general results of the competition, our team ranked 61st out of
66 participants. This ranking considers the results in both the Spanish and the English
language, averaging both accuracies. However, since our team only participated in the
English part of the task, we have taken into account only the results for the English
language in order to assess our position. In this case, our team ranked 45th out of 66
participants. Furthermore, if we aggregate the results, that is, if we count all the partici-
pants with the same score as a single participant, our result would rank 16th out of 33.
 7
     https://www.tira.io/
    It is important to notice that our model evolved from the early-bird submission to
the final submission. There were 3 main changes performed on the model:

    – The psychological features could not be included in the first evaluation due to tech-
      nical issues with the platform. The organizers helped us to solve those issues and
      we could add the psychological features in the final submission.
    – The separation of the user's original tweets from the user's retweets was done after
      the early-bird submission was completed. As explained in detail in Section 4, this
      decision was made based on the assumption that our main hypothesis holds. There-
      fore, the way in which several features of our model were calculated also changed
      for the final submission.
    – The fourth category of features, namely the headline analysis data, was added to
      the model for the final evaluation submission.

    As can be observed in the evaluation results, our model performed slightly better
in the evaluation of the early-bird submission than in the final submission. The reasons
for this performance drop are unknown to us, since the final model did perform better
in our experimentation. One possible explanation is that the evaluation dataset has
slightly different characteristics than the training dataset and, therefore, the results vary
accordingly. Nevertheless, the difference in the results is too small to establish the causes
with certainty.


6     Conclusions and Future Work

In this paper, we have described our proposal for the Author Profiling task of Profil-
ing Fake News Spreaders on Twitter at the PAN@CLEF 2020 competition. Our model
aimed to differentiate fake news spreaders from real news spreaders using a combina-
tion of psychological and linguistic traits extracted from the user's data, together with
characteristics extracted from both the user's behaviour in the social network and the
analysis of news headlines.
     On the one hand, one of the next experiments will be to use deep learning in the
training phase of the model. The advances made in the development of Recurrent Neu-
ral Networks (RNNs) and Convolutional Neural Networks (CNNs) in the past years
demonstrate promising results in the field of natural language processing.
     On the other hand, more work needs to be done with regard to the psychological
and psycholinguistic dimensions of the model. There are several psychological models
that we want to explore in the following months, such as the Big5 personality model [5]
and the Myers–Briggs Type Indicator (MBTI) model [11]. Due to the lack of existing
tools for the automatic labelling of these indicators, we will work on the retrieval and
labelling of a larger dataset in order to learn to automatically predict such personality
traits.
     As can be seen in the results presented in the previous section, our results in devel-
opment and test are consistent, which means that, despite the work still needed to
improve it, the model is robust and performs predictably when datasets vary.
    This work is at a very early stage of development and will continue evolving towards
a more efficient and better performing system. This is why we show preliminary results
of our experiments; there is still plenty of room for improvement.


Acknowledgements

This research project has been supported by the European Social Fund through the
Youth Employment Initiative (YEI 2019) and the Spanish Ministry of Science, Inno-
vation and Universities (DeepReading RTI2018-096846-B-C21, MCIU/AEI/FEDER,
UE).


References

 1. Al-Rfou, R., Perozzi, B., Skiena, S.: Polyglot: Distributed word representations for
    multilingual nlp. In: Proceedings of the Seventeenth Conference on Computational Natural
    Language Learning. pp. 183–192. Association for Computational Linguistics, Sofia,
    Bulgaria (August 2013), http://www.aclweb.org/anthology/W13-3520
 2. Allcott, H., Gentzkow, M.: Social media and fake news in the 2016 election. Journal of
    economic perspectives 31(2), 211–36 (2017)
 3. Bovet, A., Makse, H.A.: Influence of fake news in twitter during the 2016 us presidential
    election. Nature communications 10(1), 1–14 (2019)
 4. Crestani, F., Rosso, P.: The role of personality and linguistic patterns in discriminating
    between fake news spreaders and fact checkers. In: Natural Language Processing and
    Information Systems: 25th International Conference on Applications of Natural Language
    to Information Systems, NLDB 2020, Saarbrücken, Germany, June 24–26, 2020,
    Proceedings. p. 181. Springer Nature
 5. Goldberg, L.R.: An alternative "description of personality": the big-five factor structure.
    Journal of personality and social psychology 59(6), 1216 (1990)
 6. Horne, B.D., Adali, S.: This just in: fake news packs a lot in title, uses simpler, repetitive
    content in text body, more similar to satire than real news. In: Eleventh International AAAI
    Conference on Web and Social Media (2017)
 7. Jin, Z., Cao, J., Zhang, Y., Luo, J.: News verification by exploiting conflicting social
    viewpoints in microblogs. In: Thirtieth AAAI conference on artificial intelligence (2016)
 8. Kang, C., Goldman, A.: In washington pizzeria attack, fake news brought real guns. The
    New York Times (2016), https://www.nytimes.com/2016/12/05/business/media/comet-
    ping-pong-pizza-shooting-fake-news-consequences.html
 9. Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F.,
    Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D., et al.: The science of fake news.
    Science 359(6380), 1094–1096 (2018)
10. Loper, E., Bird, S.: Nltk: The natural language toolkit. In: Proceedings of the ACL
    Workshop on Effective Tools and Methodologies for Teaching Natural Language
    Processing and Computational Linguistics. Philadelphia: Association for Computational
    Linguistics (2002)
11. Myers, I.B.: The Myers-Briggs Type Indicator: Manual (1962)
12. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
    Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D.,
    Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal
    of Machine Learning Research 12, 2825–2830 (2011)
13. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture.
    In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World.
    Springer (Sep 2019)
14. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th Author Profiling
    Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: Cappellato, L., Eickhoff,
    C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers.
    CEUR-WS.org (Sep 2020)
15. Rapoza, K.: Can ‘fake news’ impact the stock market? Forbes (2017)
16. Rashkin, H., Choi, E., Jang, J.Y., Volkova, S., Choi, Y.: Truth of varying shades: Analyzing
    language in fake news and political fact-checking. In: Proceedings of the 2017 conference
    on empirical methods in natural language processing. pp. 2931–2937 (2017)
17. Rich, M.: As coronavirus spreads, so does anti-Chinese sentiment. The New York Times
    (2020), https://www.nytimes.com/2020/01/30/world/asia/coronavirus-chinese-racism.html
18. Ruchansky, N., Seo, S., Liu, Y.: Csi: A hybrid deep model for fake news detection. In:
    Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.
    pp. 797–806 (2017)
19. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: A data
    mining perspective. ACM SIGKDD explorations newsletter 19(1), 22–36 (2017)
20. Shu, K., Wang, S., Liu, H.: Understanding user profiles on social media for fake news
    detection. In: 2018 IEEE Conference on Multimedia Information Processing and Retrieval
    (MIPR). pp. 430–435. IEEE (2018)
21. Shu, K., Zhou, X., Wang, S., Zafarani, R., Liu, H.: The role of user profiles for fake news
    detection. In: Proceedings of the 2019 IEEE/ACM International Conference on Advances in
    Social Networks Analysis and Mining. pp. 436–439 (2019)
22. Takahashi, R.: Amid virus outbreak, Japan stores scramble to meet demand for face masks.
    The Japan Times (2020)
23. Tolmie, P., Procter, R., Randall, D.W., Rouncefield, M., Burger, C., Wong Sak Hoi, G.,
    Zubiaga, A., Liakata, M.: Supporting the use of user generated content in journalistic
    practice. In: Proceedings of the 2017 chi conference on human factors in computing
    systems. pp. 3632–3644 (2017)
24. Vo, N., Lee, K.: Learning from fact-checkers: Analysis and generation of fact-checking
    language. In: Proceedings of the 42nd International ACM SIGIR Conference on Research
    and Development in Information Retrieval. pp. 335–344 (2019)
25. Zhou, X., Zafarani, R.: Fake news: A survey of research, detection methods, and
    opportunities. arXiv preprint arXiv:1812.00315 (2018)