Detecting Fake News Spreaders with Behavioural, Lexical and Psycholinguistic Features
Notebook for PAN at CLEF 2020

Héctor Ricardo Murrieta Bello, Lukas Heilmann, and Esben Ronan
University of Copenhagen
xhd160@alumni.ku.dk, tnm946@alumni.ku.dk, srh648@alumni.ku.dk

Abstract. In this paper, we propose a multilingual approach to identifying fake news spreaders on Twitter data, as part of the PAN 2020 competition for author profiling. We manually engineered domain-specific features covering behavioural, lexical and psycholinguistic aspects and evaluated them using traditional machine learning models. The focus of this paper is to explore the problem domain, first by testing domain-specific features on different types of classifiers, and second by evaluating a purely multilingual approach on combined English and Spanish texts. Of the methods tested, the Gradient Boosting classifiers performed best. We extended our experiments to a multilingual design with less preprocessing and feature selection, with results comparable to the monolingual Gradient Boosting models.

Keywords: fake news detection, gradient boosting, LIWC, psycholinguistic features, behavioural features, lexical features, feature engineering.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25 September 2020, Thessaloniki, Greece.

1 Introduction

Fake news has become a catchphrase present all over the political arena in recent years, and its presence is rapidly increasing. The effects are wide-ranging. There is, for example, an undeniable interference of fake news in traditional political processes like political campaigns. It is believed that fake news generated to favour either of the two nominees in the 2016 presidential campaign was shared up to 37 million times on Facebook [19], the top 20 of which generated up to 9 million shares. Fake news reporting the injury of Barack Obama in an explosion wiped out up to $130 billion in stock value [16].

Fake news is not a new problem, but its ability to be easily and rapidly disseminated at large scale across a population is. Fake news as a term thus uniquely locates its domain in the realm of social media, where its new, pressing importance is found. The rapid democratisation of public expression that social media encapsulates operates hand in hand with the new role of fake news.

The multifaceted nature of this phenomenon, and its crucial relevance to the functioning of political processes, public discourse and the well-being of citizens, highlights the importance of developing cutting-edge techniques for detecting the spread of fake news, so that damaging information can be cut off before it spreads beyond a critical mass.

Our study addresses the PAN shared task "Profiling Fake News Spreaders on Twitter" [12]. We approach the problem with a classical machine learning approach, utilising gradient boosting methods on domain-specific features such as lexical features, grammatical features, sentiment analysis with LIWC (http://liwc.wpengine.com/), SentiWordNet (http://sentiwordnet.isti.cnr.it/) and EmoLex (https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm), Twitter DNA, as well as TF-IDF vectorisation.

2 Related Work

Fake news is a complex phenomenon that has elicited a great variety of approaches from different fields of academia.
[19] provides a succinct overview of four broad types of approaches taken so far: i) "knowledge-based", which focuses on the truth value of the knowledge content of the news; ii) "style-based", which concerns the linguistic manner in which news is communicated; iii) "propagation-based", which focuses on how fake news spreads through a network; and iv) "credibility-based", which investigates the credibility of those who create and those who spread news. We will briefly discuss previous research in the category most relevant to our task: style-based detection.

2.1 Style-based Fake News Detection

Style-based approaches focus on quantifiable stylistic features of texts, on which machine learning algorithms can be trained to detect characteristics of the expression of fake news creators and spreaders. These tend towards assessing the intention behind the text rather than its content. There is good experimental evidence in forensic studies to suggest that statements derived from factual experiences differ from statements based in fantasy [19]. Moreover, the current state of the art in style-based approaches ranges from 60-90% accuracy [19], which is significantly higher than a normal person's ability to differentiate, as mentioned in the introduction.

A variety of stylistic features can be studied, of which [19] identify two main streams. The first is "attribute-based language features", which derive from psychological deception theories. They include attributes such as quantity (counts of characters, words, sentences, etc.), sentiment (positivity and negativity), diversity (number of unique words, unique content words, etc.) and uncertainty (degree to which modal words, quantifiers, generalising terms, etc. are used). The other category spans "structure-based language features", which describe content style at four different language levels: i) lexicon, ii) syntax, iii) semantics and iv) discourse. The lexicon level assesses the frequency of letters and words using n-gram models or TF-IDF. The syntax level uses NLP techniques such as POS tagging. The semantic level identifies the topics that texts cover, using packages such as LIWC. Discourse features operate on the level furthest out, such as Rhetorical Structure Theory (RST). Generally, when only one language is used, lexicon-level features seem to perform better than all others: 11 of the 14 studies surveyed by [19] report better performance at the lexicon level.

[?] achieved very high performance (75-99% accuracy) detecting fake news on a selection of Bulgarian news sites. They utilised linguistic features (n-grams), credibility-related features such as sentiment polarity, capitalisation and punctuation, as well as semantic embeddings. More recently, [4] introduced the use of novel stylometric features such as Twitter DNA, text readability and TF-IDF to assess the author's profile.

2.2 Author Profiling

A task related to style-based fake news detection is author profiling. Author profiling is an old problem in which one attempts to infer characteristics about the author of a text based on stylistic analysis. These could be, for example, gender, age, native language, personality or economic class.
Author profiling on social media introduces its own set of problems, but previous PAN tasks have shown success in predicting a variety of characteristics from Twitter data: whether or not the user is a bot [3], personality traits [15], gender [13] and language variety [14].

2.3 Multilingual Approaches

An interesting direction this research is beginning to take is multilingualism. Combining features across different languages was found to outperform approaches using features from a single language in eight out of twelve studies, according to a survey by [19]. A recent innovative approach is the one shown by [3], which incorporated counting features and psycholinguistic features in order to assess whether a user is a bot or a human. They proposed an ensemble model combining a neural network and a fine-tuned BERT model, achieving high accuracy.

3 Dataset

The provided dataset had various idiosyncrasies that constrained the analysis. All metadata of users and tweets (e.g. profile pictures, timestamps) was omitted; only the textual data remained. This prevented any analysis based on suggestive avatars or posting-time patterns. Beyond that, URLs, hashtags and usernames were replaced with the respective placeholders #URL#, #HASHTAG# and #USER#. This prevented any classification according to the news being spread, and instead forced us to focus specifically on stylistic features.

The task itself was crucially focused on classifying fake news spreaders, which meant we had to concentrate on the stylistic features of a fake news spreader. Knowledge-based approaches did not seem relevant, since tweets did not necessarily make knowledge claims, but often consisted of clickbait-like expression, e.g. "#HASHTAG# The One Super Bowl Moment We Can't Stop Watching #URL# #URL#". An interesting subtlety that makes this task more difficult is that there may well be a large presence of "naive" fake news spreaders, who only intermittently spread fake news. Thus, some users may have only a fraction of their tweets containing fake news, yet they are classified as fake news spreaders as a whole.

4 Method

4.1 Preprocessing

To preprocess the raw data, we implemented separate steps that specifically tackled the idiosyncrasies of the dataset. In the first step, we removed all tags which we labelled as user "behaviour": retweets, URLs, hashtags and user mentions. Further steps provided functionality for lowercasing words, removing extra whitespace, expanding contractions, removing punctuation, removing stop words, removing numbers, lemmatising and removing ellipses. Lemmatisation and lowercasing were reserved for higher-level feature extraction such as TF-IDF.
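The cleaning steps above can be illustrated with a short sketch. The helper name, the regular expressions and the tiny stop-word set below are illustrative assumptions rather than our exact pipeline; contraction expansion, number removal and lemmatisation are omitted for brevity.

```python
import re

# Dataset-specific placeholders and the retweet marker, treated as "behaviour" rather than text.
BEHAVIOUR_TAGS = re.compile(r"#(URL|HASHTAG|USER)#|\bRT\b")
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}  # illustrative subset only

def preprocess(tweet, lowercase=True, remove_stop_words=True):
    """Simplified cleaning in the spirit of the steps described above (not the exact pipeline)."""
    text = BEHAVIOUR_TAGS.sub(" ", tweet)        # strip behavioural tags first
    text = re.sub(r"\s+", " ", text).strip()     # collapse extra whitespace
    if lowercase:
        text = text.lower()
    # Keep alphabetic tokens (English/Spanish), then drop stop words.
    tokens = re.findall(r"[a-záéíóúñü']+", text, flags=re.IGNORECASE)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess("RT #USER# The One Super Bowl Moment We Can't Stop Watching #URL#"))
```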
4.2 Feature Extraction

We chose to extract domain-specific features from the respectively preprocessed texts in English and Spanish. We did this through multiple extraction steps, which we illustrate in the following.

Behavioural Features

The first features to be extracted were the URL, hashtag, retweet and user-mention counts per user. We did this because initial data exploration indicated that fake news spreaders often used more hashtags and retweets than normal users. To investigate the structural distribution of these elements we decided to explore a novel tweet-history DNA algorithm designed by [5]. This algorithm was used in the context of bot detection, but it provides a novel way to analyse the behaviour of a Twitter user. We implemented it with some modifications for our purposes.

As proposed in [5], specific tweet behaviours are represented by the values in Equation (1). A weighted linear combination of these behavioural values for each tweet is transformed into a letter using the ASCII index.

A_n = \begin{cases} 0, & \text{plain} \\ 8, & \text{retweet} \\ 16, & \text{reply} \end{cases} + \begin{cases} 1, & \text{has hashtags} \\ 2, & \text{has mentions} \\ 4, & \text{has URLs} \end{cases}    (1)

For example, a retweet with hashtags and mentions but no URLs would be calculated as 0 * 0 + 8 * 1 + 1 * 1 + 2 * 1 + 4 * 0 + 65 = 76 = 'L' (each value is multiplied by 1 if the feature is present and by 0 if it is not; 65 is the ASCII code of 'A'). The resulting letters, one per tweet, are concatenated to form a tweet-history DNA string 100 letters long. From this string we calculate the frequency of every character, so that we finish with a vector of 16 dimensions.
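A minimal sketch of this encoding is shown below. The function names, the boolean flags and the normalisation of the frequency vector are our own illustrative choices; [5] describe the original algorithm.

```python
from collections import Counter

# Behavioural values from Equation (1); 65 is the ASCII code of 'A'.
TYPE_VALUES = {"plain": 0, "retweet": 8, "reply": 16}

def tweet_to_letter(tweet_type, has_hashtags, has_mentions, has_urls):
    """Encode the behaviour of one tweet as a single ASCII character."""
    value = TYPE_VALUES[tweet_type]
    value += 1 * has_hashtags + 2 * has_mentions + 4 * has_urls
    return chr(value + 65)

def dna_features(tweets):
    """Concatenate one letter per tweet into a DNA string and return its
    character-frequency vector (characters ordered alphabetically)."""
    dna = "".join(tweet_to_letter(*t) for t in tweets)
    counts = Counter(dna)
    return dna, [counts[c] / len(dna) for c in sorted(counts)]

# A retweet with hashtags and mentions but no URLs -> chr(8 + 1 + 2 + 65) = 'L'
dna, freqs = dna_features([("retweet", True, True, False), ("plain", False, False, True)])
print(dna, freqs)
```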
Lexical Features

To extract all lexical, grammatical and miscellaneous features we implemented an additional vectorisation step. We counted all character-level features (digits, alpha characters, capital characters, punctuation, etc.), all token-level features (word lengths, syllable counts, emoji/emoticon counts, etc.), as well as POS tag counts, NER tag counts and others (see Table 1 for details). To extract the emoticons and emojis, we utilised the emot library (https://pypi.org/project/emot/). To extract NER tags and POS tags we used the open-source SpaCy library. To count misspellings, we utilised the open-source pyspellchecker library (https://pypi.org/project/pyspellchecker/). Due to the limited coverage of the library, it was crucial to subtract any identified named entities from the total count.

We furthermore implemented some measures of vocabulary diversity. The first was hapax legomena frequency, which counts the words that appear only once [10]. Hapax legomena frequency is commonly used in author profiling [6], as it often correlates with the size of the author's vocabulary. Another way to measure vocabulary diversity is the type-token ratio, the ratio of unique word count to total word count. We also implemented a universal quantifier count, since we hypothesised that fake news spreaders often utilise more absolute manners of expression, e.g. "why liberals are always wrong". The final vector output per user for the lexical measures had a length of 166. Its exact composition is shown in Table 1.

Description                    Number of features
Stop word count                1
Digit character count          1
Alpha character count          1
Capital character count        1
Capital character ratio        1
Capitalised word count         1
Capitalised word ratio         1
Punctuation count              1
Punctuation sequence count     1
Character count                1
Word count                     1
Average word length            1
Syllable count                 1
Average syllable count         1
Emoji count                    1
Emoji ratio                    1
Emoticon count                 1
Emoticon ratio                 1
Misspell count                 1
Type-token ratio               1
Universal quantifier count     1
Hapax count                    1
Individual alphabet count      26
Individual punctuation count   32
Individual POS tag counts      68
Individual NER tag counts      18

Table 1. Index of lexical features implemented in LexicalVectorizer

Psycholinguistic Features

Inspired by the work of Joo et al. [3], we considered psycholinguistic lexica to be valuable resources for extracting features. While humans are trained to perceive such underlying mental mechanisms, machines require the help of annotated lexical resources, which we incorporated in a variety of ways. We started off by incorporating sentiment polarity scores, which have proven useful for automatic authorship profiling in earlier competitions [9].

SentiWordNet 3.0 (http://sentiwordnet.isti.cnr.it/), a publicly available lexical resource for opinion mining in English, holds three sentiment scores for each synonym set present in the popular WordNet (https://wordnet.princeton.edu/): positivity, negativity and objectivity. As an example, the English adjective "amazing" is represented by the values 0.875, 0.125 and 0 for the aforementioned measures. The noun "secret", however, has the scores 0.125, 0.375 and 0.5. Hence, "amazing" can be interpreted as a more positive word, while "secret" reveals a rather negative and objective nature. The objectivity score of each word is calculated with the formula 1 - (positivity + negativity), and consequently serves as a general (inverse) measure of affectivity. To handle words with a plurality of meanings, the part-of-speech tag of the given word is integrated. Accordingly, we computed part-of-speech tags per document as a preparatory step. We then queried SentiWordNet through the Natural Language Toolkit (NLTK, https://www.nltk.org/) and transformed the obtained scores into features.

Linguistic Inquiry and Word Count (LIWC, http://liwc.wpengine.com/) is a lexical resource for psycholinguistic measures. By integrating the tool, we intended to capture the author's social and psychological states, which have proven effective in author profiling tasks [3]. To our advantage, the lexicon was available in both English and Spanish. The first category LIWC captures summarises language variables and linguistic dimensions, such as word frequencies, POS frequencies, etc. The second category aims at associating a word with certain psychological processes, such as affective processes, social processes (e.g. family) and personal concerns [3]. As we had already covered a significant amount of the stylistic features with the aforementioned lexical approach, we directed our attention to the second category.

To deepen the study of the emotional patterns of fake news spreaders, we investigated the NRC Word-Emotion Association Lexicon (EmoLex, https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm) developed by Mohammad et al. This lexicon captures eight basic emotions from Plutchik's wheel of emotions: anger, fear, anticipation, trust, surprise, sadness, joy and disgust [8,?]. It has been used for sentiment and emotion analysis, abusive language detection, personality trait identification, etc. at word, sentence and tweet level, and proved useful in the SemEval 2018 shared task [7].

Token Count Vectorizers

An important set of features to employ alongside the aforementioned feature extraction methods is TF-IDF. TF-IDF measures not just the relevance of a word to a document (using frequency as a heuristic), but its relative relevance compared to the composition of the other documents. TF-IDF is composed of the term frequency, i.e. how often a term appears within a given document, and the inverse document frequency, the logarithm of the inverse of the fraction of documents in which the term appears. These two are multiplied for each term in each document and compiled into a vector. The vector is restricted to a length specified by the hyperparameter "max_features", so that only the most frequent terms are included. After experimentation, performance was found to be best when max_features was equal to 500.
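A minimal sketch with scikit-learn's TfidfVectorizer is given below; the toy documents stand in for the per-user texts (one document per author, i.e. the concatenation of that author's preprocessed tweets).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the per-author documents.
docs = [
    "breaking shocking moment you will not believe what happened",
    "lovely walk in the park with the dog this morning",
    "shocking breaking news you will not believe this one",
]

vectorizer = TfidfVectorizer(max_features=500)   # keep at most the 500 most frequent terms
X_tfidf = vectorizer.fit_transform(docs)         # sparse matrix of shape (n_authors, n_terms)
print(X_tfidf.shape, len(vectorizer.vocabulary_))
```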
4.3 Models

We investigated a variety of different experimental designs, which we now present. We used as input the aforementioned reduced set of features for the English and Spanish texts, as well as all features, in order to allow for later comparisons.

To tune the hyperparameters of each algorithm, we utilised an automated grid search. Grid search is an exhaustive brute-force algorithm that takes a list of value ranges for each hyperparameter, derives all possible combinations and tests their accuracy. Each test was performed by averaging the accuracies of a repeated stratified 10-fold cross-validation (scikit-learn's implementation). Since we used scikit-learn's implementation, we were unable to fit the TF-IDF vectoriser within each fold, as the module handles cross-validation automatically. We reasoned, however, that since this was only used for tuning parameters, it did not matter much.

Multinomial Naive Bayes

Multinomial Naive Bayes is a supervised machine learning algorithm often used for text classification. It relies on Bayesian conditional reasoning over a generative model of the data. One way of modelling the data is to fit Gaussian distributions from the mean and standard deviation of each class; another is to fit simple multinomial distributions. Since "the multinomial distribution describes the probability of observing counts among a number of categories" [18], this algorithm seemed most appropriate for our vectors composed of count features. We tested hyperparameters using grid search. The hyperparameters available to tune were "alpha", for which we tested values between 0.5 and 1.5, and "fit_prior", with the values True and False.

Random Forest

Random Forest is a flexible and quick-to-train ensemble machine learning algorithm based on decision trees [17]. Decision trees iteratively split the dataset into subsets according to a quantitative threshold along an axis, with the resulting class for each subset derived from a majority vote. Random forest counteracts the overfitting tendency of decision trees by averaging the results of decision trees trained on randomly sampled training data (bagging). Hyperparameters to adjust included the number of estimators (decision trees), for which we selected logarithmically spaced values of 10, 100 and 1000, and max_features (for node splitting), for which we chose 'sqrt' and 'log2'.

Support Vector Machine (SVM)

SVM is a supervised learning algorithm which learns by finding the optimal hyperplane to linearly separate the data into distinct areas, corresponding to a positive and a negative classification. The hyperplane is determined using the support vectors, data points close to the hyperplane whose distance (margin) to it is to be maximised. Though in its basic form it is designed for linear classification, it can be applied to non-linear data by using different "kernels" [2]. To tune it we utilised scikit-learn's GridSearchCV with a stratified 5-fold cross-validation procedure, as sketched below. The kernels we tested were 'poly' (polynomial), 'rbf' (radial basis function), 'sigmoid' and 'linear'. The values of the regularisation parameter C we tested were 50, 10, 1.0, 0.1 and 0.01. We used "scale" as the value of the gamma hyperparameter and a linear kernel.
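A minimal sketch of this GridSearchCV tuning, using the SVM grid above as an example, is shown below; the random matrix and labels are toy stand-ins for our engineered per-user feature vectors and the spreader annotations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Toy stand-ins for the engineered feature matrix (60 users x 10 features) and the labels.
X = np.random.default_rng(0).normal(size=(60, 10))
y = np.array([0, 1] * 30)

param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [50, 10, 1.0, 0.1, 0.01],
    "gamma": ["scale"],
}
search = GridSearchCV(SVC(), param_grid, scoring="accuracy",
                      cv=StratifiedKFold(n_splits=5))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```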
Gradient Boosting

Gradient boosting is a powerful supervised machine learning algorithm born out of the theoretical insight that "weak learners" can be turned into better learners [1]. Weak learners are hypotheses that perform only slightly better than chance. An ensemble of these, fitted sequentially, can iteratively tackle difficult problems. Though it is a comparatively computationally expensive algorithm, it is powerful at capturing subtleties in the data, and since our dataset is small, computational cost is not a major issue.

The hyperparameters we selected to tune were the number of estimators, with values of 10, 100, 500, 800 and 1000; the learning rate, with logarithmic values of 0.001, 0.01 and 0.1; subsample, with values of 0.5, 0.7 and 1.0; and max depth, with values of 3, 7 and 9. Since the GradientBoostingClassifier turned out to be our most promising algorithm, we implemented a manual grid search that allowed the TF-IDF vectoriser to be fitted within each fold, so that we could be confident of discovering the best parameters.

5 Results

We subsequently present our results for the aforementioned methods. As we aimed for a stable comparison of performance across the different experimental setups, we consistently utilised a stratified 10-fold cross-validation with 3 repeats, each with a training size of 80% of the data.

5.1 Experiment 1

For the first experiment we tested each dataset individually using the machine learning algorithms previously described. First we tested the algorithms on just the domain-specific features as described in the method section. After that we tested them with a combined vector of domain-specific features and the TF-IDF vector. We then tuned the hyperparameters for each one using scikit-learn's GridSearchCV. As can be seen in Table 2, the Gradient Boosting algorithm performed significantly better on both the English and Spanish datasets than any of the other algorithms. Random Forest performed next best, with the Support Vector Machine and Multinomial Naive Bayes coming last. Interestingly, in every single case we tested, there was a higher accuracy on the Spanish dataset than on the English dataset.

Language  Model                    Features      Accuracy  Std
English   Multinomial Naive Bayes  ALL           0.651     0.089
Spanish   Multinomial Naive Bayes  ALL           0.703     0.073
English   Multinomial Naive Bayes  ALL + TFIDF   0.651     0.089
Spanish   Multinomial Naive Bayes  ALL + TFIDF   0.703     0.073
English   Random Forest            ALL           0.703     0.079
Spanish   Random Forest            ALL           0.72      0.070
English   Random Forest            ALL + TFIDF   0.71      0.078
Spanish   Random Forest            ALL + TFIDF   0.73      0.076
English   Support Vector Machine   ALL           0.652     0.075
Spanish   Support Vector Machine   ALL           0.717     0.085
English   Support Vector Machine   ALL + TFIDF   0.651     0.085
Spanish   Support Vector Machine   ALL + TFIDF   0.711     0.084
English   Gradient Boosting        ALL           0.669     0.075
Spanish   Gradient Boosting        ALL           0.73      0.068
English   Gradient Boosting        ALL + TFIDF   0.726     0.087
Spanish   Gradient Boosting        ALL + TFIDF   0.747     0.064

Table 2. Results for English and Spanish

5.2 Experiment 2

Our multilingual approach consisted of individual preprocessing of the English and the Spanish dataset, followed by extraction of the manually engineered features for each dataset. We deleted language-dependent features such as the sentiment analysis with SentiWordNet (English only) and the ten additional LIWC categories for Spanish. We then concatenated the two feature sets into one. Subsequently, we split the dataset into training and validation sets (20% validation) in each fold. Finally, as a further experiment, we merged the English and Spanish tweets and fitted a TF-IDF vectoriser on the combined texts. In this setup we did not apply any further feature selection or hyperparameter optimisation. Since the best algorithm in the monolingual setup was gradient boosting, we decided to use that algorithm for this section.
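For reference, the evaluation protocol (TF-IDF fitted inside each training fold, repeated stratified 10-fold cross-validation with 3 repeats) can be sketched with a scikit-learn Pipeline as below. The toy documents and the gradient boosting settings are illustrative only, and the hand-crafted feature vectors that were concatenated with the TF-IDF matrix in practice are omitted for brevity.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Toy stand-ins for the per-user documents and their labels (1 = fake news spreader).
docs = ["breaking shocking claim you will not believe this"] * 15 \
     + ["had a lovely walk in the park today with friends"] * 15
labels = [1] * 15 + [0] * 15

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=500)),  # fitted on the training folds only
    ("clf", GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)),
])
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = cross_val_score(pipeline, docs, labels, cv=cv, scoring="accuracy")
print(round(scores.mean(), 3), round(scores.std(), 3))
```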
As can be seen in Table 3, the gradient boosting algorithm applied to the vector combining all features with TF-IDF performed best, though not by a particularly large margin. There was, however, no increase in accuracy compared to the monolingual setups, and it performed markedly worse than on the Spanish dataset alone.

Model              Features        Accuracy  Std
Gradient Boosting  All             0.71      0.075
Gradient Boosting  All + TF-IDF    0.7218    0.0460

Table 3. Accuracies and standard deviations for experiments in the multilingual setup

5.3 Results on test data

Our final submission was made using the methodology of the second experiment. However, during the test phase, we predicted our answers separately for the Spanish and the English tweet feeds. To evaluate our algorithm we made use of the TIRA software submission platform [11]. To our surprise, the accuracy on the Spanish dataset vastly outperformed the accuracy on the English dataset, 0.7450 versus 0.6550 (Table 4). This was in contrast to our first experiments, where the difference was markedly smaller.

Language  Accuracy
English   0.6550
Spanish   0.7450

Table 4. Results for English and Spanish on the test data

6 Conclusion

Our study has demonstrated the importance of carefully selected domain-specific features in the domain of fake news identification. These features are integral in classifying monolingual as well as multilingual texts. We can conclude that it was possible to classify fake news spreaders on a limited dataset consisting of 300 Twitter users by applying gradient boosting to a set of lexical, behavioural and psycholinguistic features, with TF-IDF to represent the text. Furthermore, these features have been shown to provide useful information in multilingual environments and are not limited to monolingual contexts.

References

1. Friedman, J.H.: Stochastic gradient boosting. Computational Statistics & Data Analysis 38(4), 367–378 (2002)
2. Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intelligent Systems and their Applications 13(4), 18–28 (1998)
3. Joo, Y., Hwang, I.: Author profiling on social media: An ensemble learning model using various features. Notebook for PAN at CLEF 2019 (2019)
4. Kosmajac, D., Keselj, V.: Twitter bot detection using diversity measures. In: Proceedings of the 3rd International Conference on Natural Language and Speech Processing. pp. 1–8 (2019)
5. Kosmajac, D., Keselj, V.: Twitter user profiling: Bot and gender identification. In: CLEF (Working Notes) (2019)
6. Martinc, M., Skrjanec, I., Zupan, K., Pollak, S.: PAN 2017: Author profiling - gender and language variety prediction. In: CLEF (Working Notes) (2017)
7. Mohammad, S.M., Bravo-Marquez, F., Salameh, M., Kiritchenko, S.: SemEval-2018 Task 1: Affect in tweets. In: Proceedings of the International Workshop on Semantic Evaluation (SemEval-2018). New Orleans, LA, USA (2018)
8. Mohammad, S.M., Turney, P.D.: Crowdsourcing a word-emotion association lexicon. Computational Intelligence 29(3), 436–465 (2013)
9. Patra, B.G., Banerjee, S., Das, D., Saikh, T., Bandyopadhyay, S.: Automatic author profiling based on linguistic and stylistic features. Notebook for PAN at CLEF, CEUR-WS vol. 1179 (2013)
10. Popescu, I.I., Altmann, G.: Hapax legomena and language typology. Journal of Quantitative Linguistics 15(4), 370–378 (2008)
11. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA Integrated Research Architecture. In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World. Springer (Sep 2019)
12. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th Author Profiling Task at PAN 2020: Profiling Fake News Spreaders on Twitter. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) CLEF 2020 Labs and Workshops, Notebook Papers. CEUR-WS.org (Sep 2020)
13. Rangel, F., Rosso, P.: Overview of the 7th Author Profiling Task at PAN 2019: Bots and gender profiling in Twitter. In: Proceedings of the CEUR Workshop, Lugano, Switzerland. pp. 1–36 (2019)
14. Rangel, F., Rosso, P., Potthast, M., Stein, B.: Overview of the 5th Author Profiling Task at PAN 2017: Gender and language variety identification in Twitter. In: Working Notes Papers of the CLEF (2017)
15. Rangel Pardo, F.M., Celli, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd Author Profiling Task at PAN 2015. In: CLEF 2015 Evaluation Labs and Workshop Working Notes Papers. pp. 1–8 (2015)
16. Rapoza, K.: Can 'fake news' impact the stock market? Forbes (2017)
17. Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P., Feuston, B.P.: Random forest: A classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences 43(6), 1947–1958 (2003)
18. VanderPlas, J.: Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media (2016)
19. Zhou, X., Zafarani, R.: Fake news: A survey of research, detection methods, and opportunities. CoRR abs/1812.00315 (2018), http://arxiv.org/abs/1812.00315