Predicting the Gullibility of Users from their Online Behaviour

Mateja Jovanović, Vida Groznik and Marko Tkalčič
University of Primorska, Titov trg 4, 6000 Koper, Slovenia

Abstract
In this research we aimed to explore the predictors of gullibility in an online environment. We used machine learning algorithms to build models that predict gullibility from social media behaviour. In total, 103 Twitter users completed a survey containing a scale for measuring gullibility. The survey data were then combined with features extracted from the users' activity on Twitter. Besides data directly accessible through the Twitter API, we engineered new features capturing punctuation, emoji usage and TF-IDF text vectors. These data were then standardized and reduced using principal component analysis (PCA). In the modeling phase we used both regression and classification techniques. After comparing the results with their baselines, we conclude that there is an indication that gullibility can be predicted from online behaviour. Further research and analysis, which we have planned, are needed for a better understanding of the relationship between social media activity and gullibility. The results of this experiment suggest that this is a promising direction for future work.

Keywords
gullibility, machine learning, Twitter, predictive modeling

1. Introduction

In a world filled with misinformation and people with bad intentions, gullibility has become a hot research topic. Broadly speaking, gullibility can be defined as “the quality of being easily deceived or tricked, and too willing to believe everything that other people say”¹. Similarly, Wikipedia defines it as “a failure of social intelligence in which a person is easily tricked or manipulated into an ill-advised course of action”².
Human-Computer Interaction Slovenia 2021, November 11, 2021, Koper, Slovenia
Email: matejajovanovicoffice@gmail.com (M. Jovanović); vida.groznik@famnit.upr.si (V. Groznik); marko.tkalcic@famnit.upr.si (M. Tkalčič)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

¹ https://dictionary.cambridge.org/dictionary/english/gullibility
² https://en.wikipedia.org/wiki/Gullibility

There are many different interpretations of this personal trait; however, all authors agree on the need for further research on measuring and describing gullibility. Gullibility is believed to be fully or at least partially responsible for foolish actions such as falling for romance and financial scams, political exploitation, and susceptibility to fake news and other forms of disinformation [1, 2, 3, 4, 5]. Groups of people that are especially vulnerable to exploitation due to gullibility include children, the elderly, and people with developmental disabilities [6]. Besides financial damage, scam victims face other problems, such as trust issues and long-term trauma resulting from the scam [7]. Protective organizations, banks, and insurance companies constantly try to inform people about threats on the internet and provide prevention systems to reduce the risk of scams. Unfortunately, scammers are becoming increasingly creative and sophisticated in the ways they trick people for profit. Moreover, fake news has become more prevalent since the 2016 US presidential election: according to Google Trends, people searched for the term “fake news” notably more often after the election than before it³. In 2016, Oxford Dictionaries declared that we are living in a “post-truth” age.
That term was also chosen as the 2016 Word of the Year⁴. These are just some indicators of the power of disinformation. Fake news already has a large impact and could cause great damage in the future, and combined with the gullibility of individuals it is highly dangerous. Therefore, the goal of this work is to provide a tool for the unobtrusive detection of users' gullibility, which could help both the users themselves and the agencies that wish to protect them.

2. Related work

From the outset, we noticed that the definitions of gullibility in the literature are rather scattered [2, 3, 8, 5, 9, 4]. Researchers have addressed gullibility in different scenarios, mostly because of the assumption that the trait is highly contextual. For example, Greenspan studied gullibility in adults with intellectual disabilities [3]. He claims that this group is especially vulnerable to all kinds of scams and is easily fooled, and he attributes such unfortunate outcomes to gullibility. However, adults with intellectual disabilities are simply the most visible victims of foolish actions. The author believes that other people are gullible as well, to a different extent, and describes how a foolish action can be broken down into four parts in his four-factor model of gullible behaviour [3]. The model is displayed in Fig. 1.

Figure 1: Gullible behaviour as modeled by Greenspan [3]

Other researchers [4, 10] have proposed that gullibility is caused by insensitivity to untrustworthiness cues. Yamagishi et al. [4] tested whether a high level of trust is correlated with a high level of gullibility and found quite the opposite: people with higher initial levels of trust were better at detecting untrustworthiness cues, and therefore less gullible, than people with low initial trust. There is also confusion between gullibility and credulity.
In his work, Greenspan addressed this issue and distinguished between the two terms [3]. Credulity is described as a tendency to believe unlikely propositions without having supporting evidence for them. However, if such credulous beliefs lead to action, so that there is a cause-effect relationship between the belief and the action, this is defined as gullibility [3].

Among the work already done in this domain, we found one study that measures gullibility with a 12-item self-report Gullibility Scale [5]. The authors were thorough, performing five different studies to develop and validate the questionnaire. Moreover, the scale has been behaviourally validated in another study, in which participants were exposed to phishing emails [2]. Both studies showed that the 12-item Gullibility Scale is a reliable method for measuring gullibility. Nevertheless, even after reviewing the current state-of-the-art methods for measuring gullibility, we were unable to find research focused on measuring a user's gullibility unobtrusively, for example from their social media activity. We believe that such a method could contribute to understanding gullible acts in the first place, and could also serve as a tool for protecting potential victims from being exploited in financial, romance, political and other scams. Hence, in this paper we propose such a method.

³ https://trends.google.com/trends/explore?date=all&q=fake%20news
⁴ https://languages.oup.com/word-of-the-year/2016/

3. Methodology

To devise a method for detecting gullibility from users' social media traces, we followed the methodology depicted in Fig. 2. We first performed a pre-study to validate the questionnaire and then collected the data in the main study. We then proceeded with data pre-processing and feature engineering, and finally built and evaluated the predictive models.

3.1. Pre-study

Since researchers have found evidence that gullibility can be highly contextual and that it is correlated with a weak sense of self and high emotionality, we decided to try to reproduce these findings. To do so, we created a pre-study questionnaire consisting of 66 items drawn from 7 different scales and questionnaires. After testing this questionnaire, we removed the items that added little information. We made this trade-off because the long completion time resulted in a high number of incomplete questionnaires. The final version of the survey consisted of 42 questions and took on average 10 minutes to complete.

3.2. Main study

Data were collected through a shareable link that redirected participants to the survey landing page hosted on 1ka.si. Participants were recruited through the authors' personal networks. Before filling in the survey, all participants were given instructions and a consent form. Besides the questions described in Sect. 3.1, we added a form in which participants entered their Twitter usernames. Everyone who wished to participate was asked to provide an unprotected (public) Twitter profile; nevertheless, some entries were invalid and had to be removed during data cleaning. Information about user profiles and the questionnaire responses were stored separately to protect participants' privacy and keep their information confidential.

Figure 2: Methodology pipeline

In the data cleaning stage we removed all invalid data points. Survey entries that were incomplete or contained wrong answers to the attention-check questions were considered invalid and excluded. Similarly, all protected Twitter profiles were excluded together with their respective survey entries. Once the data were cleaned, we summed the answers within each group of questions they came from, e.g.
all emotionality questions were summed into a new variable representing the emotionality score. While summing, we reversed the answers to reverse-scored items. Furthermore, free-form questions were converted into True/False entries, and when summing these we counted how many of them the user answered correctly.

For the second part of the data, we scraped the participants' Twitter profiles. Directly from Twitter we obtained the following information: likes count, followers count, friends (followees) count, statuses (tweets, retweets, replies) count, status text (the tweet's text), location, protected account (boolean), likes received per status, retweets received per status, listed count, and the profile's creation date. However, this was not enough information to start modeling, so we extracted further data from the text of the acquired statuses. First, we detected the language of the statuses and created two groups of statuses per user: one containing only statuses written in English and the other containing all statuses (including the English ones). We did this because we wanted to apply NLP techniques only to English statuses, since some participants tweet in two or more languages. Inspired by research on the use of emojis and punctuation [11, 12], we counted all punctuation marks (e.g. “?”, “,” or “-”) and emojis for each user, using the group with all statuses. For the English statuses we used a common NLP approach consisting of tokenization and stop-word removal; for both steps we used the NLTK library⁵. The last step was to merge the tokens of all users and vectorize the words, using the TF-IDF method to give appropriate weights to the vectors. The extracted information was then merged with the survey data and standardized using scikit-learn's StandardScaler⁶.
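The per-user feature construction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the reverse-keyed item positions, the emoji code-point ranges and the short stop-word list are hypothetical stand-ins (the study used NLTK for tokenization and stop-word removal), and each user's English statuses are assumed to be pooled into one document. The TF-IDF weighting mirrors the scikit-learn `TfidfVectorizer` defaults (raw term counts, smoothed idf ln((1 + n) / (1 + df)) + 1, L2-normalised rows); the subsequent StandardScaler step is not shown.

```python
import math
import re
import string
from collections import Counter

# Hypothetical assumption: reverse-keyed items are given as 1-based positions;
# the real keys depend on each questionnaire and are not listed in the paper.
def subscale_score(answers, reverse_items=(), scale_max=7):
    """Sum Likert answers in 1..scale_max, reversing reverse-keyed items."""
    total = 0
    for position, value in enumerate(answers, start=1):
        if position in reverse_items:
            value = scale_max + 1 - value  # on a 7-point scale: 7 -> 1, 1 -> 7
        total += value
    return total

# Rough emoji heuristic (assumption): single code points in two common emoji
# blocks. Multi-code-point emojis (skin tones, flags, ZWJ sequences) are not handled.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

def is_emoji(ch):
    return any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)

def punctuation_and_emoji_counts(statuses):
    """Count ASCII punctuation marks and emojis over all of a user's statuses."""
    text = "".join(statuses)
    return {
        "punctuation": sum(ch in string.punctuation for ch in text),
        "emojis": sum(is_emoji(ch) for ch in text),
    }

# Tiny illustrative stop-word list (assumption); the study used NLTK's list.
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "of", "the", "this", "to"}

def tokenize(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]

def tfidf(documents):
    """TF-IDF matrix: raw counts times smoothed idf, each row L2-normalised."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # avoid 0/0 for empty docs
        matrix.append([v / norm for v in row])
    return vocabulary, matrix

if __name__ == "__main__":
    print(subscale_score([2, 6, 3], reverse_items={2}))  # 2 + (8 - 6) + 3 = 7
    print(punctuation_and_emoji_counts(["Really?!", "Great day 😀"]))
    # {'punctuation': 2, 'emojis': 1}
    # One document per user (assumption: each user's English tokens are pooled).
    vocab, matrix = tfidf(["The news is fake", "Fake accounts spread fake news"])
    print(vocab)  # ['accounts', 'fake', 'news', 'spread']
```

In practice, NLTK's stop-word corpus and scikit-learn's `TfidfVectorizer` would replace the hand-written helpers; they are spelled out here only to make each weighting step visible. The same scoring helper also shows why 12 items on a 7-point scale give a range of 12 to 84.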
At this point we had produced a large number of input variables for the model. To reduce unnecessary model complexity, we applied principal component analysis (PCA) for dimensionality reduction. We chose 35 components as optimal, since together they explained 69% of the variance in the data. Before modeling, however, the data had to be split properly: because of our small sample size of 103 participants, we used nested five-fold cross-validation for splitting the data and for hyperparameter optimization.

In the modeling phase, the target variable was gullibility, computed by summing the answers to the 12 items of the Gullibility Scale. Each item was answered on a 7-point Likert scale, so the possible range of the gullibility score was 12 to 84 (12 × 1 to 12 × 7); however, the recorded values ranged only from 12 to 60. We approached the prediction of user gullibility both as a regression and as a classification problem. For classification we used random forest, gradient boosting, logistic regression, a support vector classifier (SVC), and bagging in combination with SVC. For regression we used support vector regression (SVR), ridge regression, and stochastic gradient descent (SGD) regression. To compare the models with their baselines we used accuracy, recall, precision and F1 for classification, and RMSE and MAE for regression⁷.

⁵ https://www.nltk.org/
⁶ https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
⁷ https://scikit-learn.org/stable/modules/model_evaluation.html

4. Results

4.1. Correlation matrix

Figure 3 shows the correlations between some of the variables in the data we collected.
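Figure 3 is not reproduced in this text. As a hedged sketch of how such a matrix can be summarised with respect to gullibility, the code below computes Pearson correlations (an assumption, since the coefficient used is not stated) and ranks the other variables by their absolute correlation with the gullibility score. The variable names and values are hypothetical toy data, not the study's data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rank_by_abs_correlation(data, target="gullibility"):
    """(variable, r) pairs sorted by |r| with the target, strongest first."""
    y = data[target]
    pairs = [(name, pearson(values, y)) for name, values in data.items() if name != target]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)

if __name__ == "__main__":
    # Hypothetical toy data, not the study's data.
    toy = {
        "gullibility": [12, 24, 36, 48, 60],
        "age": [30, 20, 40, 25, 35],
        "financial_knowledge": [10, 8, 6, 4, 2],
    }
    for name, r in rank_by_abs_correlation(toy):
        print(f"{name}: {r:+.2f}")
    # financial_knowledge: -1.00
    # age: +0.30
```

Ranking by absolute value keeps strong negative relationships (such as the financial literacy scores discussed below) next to strong positive ones. With pandas available, `DataFrame.corr()` would produce the full matrix in one call.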
Some pairs of variables show high absolute correlations, for example age and cognitive reflection, financial knowledge and financial satisfaction, and emotionality and the sense-of-self score. With respect to gullibility, the highest absolute correlations were with financial knowledge and sense of self. Other features with slightly lower correlations with gullibility are financial skills, emotionality, age and education.

Figure 3: Gullibility correlation matrix

4.2. Age-gender distribution

Figure 4 shows the distribution of genders across age groups. The majority of participants were between 21 and 40 years old. Of the 103 participants, 60 were male, 37 were female and six selected another option.

Figure 4: Age-gender distribution

4.3. Classification

Table 1 summarizes the results of the classification task, in which each user was classified as either gullible or not gullible. The baseline predicted the most frequent class (majority classifier).

Metric            Baseline  RF     GB     LR     SVC    Bagging + SVC
Accuracy (mean)   0.456     0.515  0.506  0.640  0.562  0.611
Accuracy (std)    0.036     0.106  0.068  0.128  0.098  0.096
Precision (mean)  0.180     0.531  0.530  0.629  0.555  0.639
Precision (std)   0.223     0.164  0.116  0.164  0.125  0.174
F1 (mean)         0.248     0.476  0.499  0.634  0.569  0.591
F1 (std)          0.305     0.128  0.112  0.164  0.127  0.122
Recall (mean)     0.400     0.442  0.540  0.641  0.592  0.563
Recall (std)      0.490     0.118  0.243  0.166  0.139  0.110

Table 1: Comparison of the classification models (RF: random forest; GB: gradient boosting; LR: logistic regression; mean and standard deviation over five splits)

4.4. Regression

Table 2 summarizes the results of the regression task, in which we predicted the value of the gullibility variable on the scale from 12 to 84. The baseline predicted the mean of the target variable in the training set.

Metric        Baseline  SVR     Ridge   SGD
RMSE (mean)   10.345    10.794  10.002  10.036
RMSE (std)    1.654     1.662   1.646   1.640
MAE (mean)    8.600     8.675   8.201   8.183
MAE (std)     1.458     1.288   1.343   1.295

Table 2: Comparison of the regression models

5. Discussion

The results in Figure 3 show that gullibility is negatively correlated with financial knowledge and financial skills, which are part of the financial literacy questionnaire. All correlation values in the matrix were generally low, but compared with the other features, financial knowledge and financial skills have a high absolute correlation with gullibility. This is important because we added the financial literacy questionnaire to our survey to investigate whether there is a relationship between gullibility and this fairly contextual feature. These findings are not sufficient to support any claims about gullibility, but they represent a first step in this direction.

Besides these features, sense of self, emotionality, age and gender showed signs of being correlated with gullibility. Emotionality and age were expected to be in this group, since other researchers have reported similar results [2, 6]. Surprisingly, sense of self was positively correlated with gullibility, even though other evidence shows that a weak sense of self is associated with gullibility [2].

Comparing the models' performance, the best models of both approaches, classification and regression, performed better than their baselines. We report the average results over 5 different splits to make the results more reliable and to avoid an optimistic bias caused by a lucky split. Table 1 compares the classification models with the baseline. The best-performing classification model was logistic regression, with a mean accuracy of 0.640; however, when interpreting this average we should take into account the standard deviation, and logistic regression also had the highest standard deviation of accuracy (0.128) among all classification models. In terms of precision, bagging in combination with SVC performed slightly better than logistic regression (0.639 vs. 0.629).

Table 2 presents the results of the regression models compared with their baseline; as for the other models, the baseline results are averaged over all splits. The results did not vary much across the models: ridge and SGD regression achieved slightly lower errors than the baseline (mean RMSE 10.002 and 10.036 vs. 10.345), while SVR was the only model that performed worse than the baseline.

6. Limitations and future work

A possible limitation of this research is the small sample size. We plan to extend our research to address this issue and to establish whether our results are statistically significant. It is also possible that highly gullible people, for example the elderly, do not use Twitter [6]. Despite these limitations, we believe the models have shown some indication that gullibility can be measured from users' online behaviour. In further research on this topic we will try more sophisticated language models, which would enable us to use the information from non-English tweets as well. We tested whether gullibility is correlated with financial literacy and did not find a statistically significant correlation. This could be due to the complexity of the questions used to measure financial literacy. However, the correlations of gullibility with financial knowledge and with financial skills were among the three highest correlations with gullibility. For future work we also suggest testing different (simpler) financial literacy questionnaires.

References

[1] The social psychology of gullibility: Conspiracy theories, fake news and irrational beliefs, Routledge, 2019.
[2] M. S. George, A. K. Teunisse, T. I. Case, Gotcha! Behavioural validation of the Gullibility Scale, Personality and Individual Differences 162 (2020) 110034. URL: https://doi.org/10.1016/j.paid.2020.110034. doi:10.1016/j.paid.2020.110034.
[3] S. Greenspan, Chapter 5 Foolish Action in Adults with Intellectual Disabilities: The Forgotten Problem of Risk-Unawareness, volume 36, 1 ed., Elsevier Inc., 2008.
URL: http://dx.doi.org/10.1016/S0074-7750(08)00005-0. doi:10.1016/S0074-7750(08)00005-0.
[4] T. Yamagishi, M. Kikuchi, M. Kosugi, Trust, gullibility, and social intelligence, Asian Journal of Social Psychology 2 (1999) 145–161. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/1467-839X.00030. doi:10.1111/1467-839X.00030.
[5] A. K. Teunisse, T. I. Case, J. Fitness, N. Sweller, I Should Have Known Better: Development of a Self-Report Measure of Gullibility, Personality and Social Psychology Bulletin 46 (2020) 408–423. doi:10.1177/0146167219858641.
[6] S. Greenspan, Annals of gullibility: Why we get duped and how to avoid it, Praeger, 2008.
[7] D. Glodstein, S. Glodstein, J. Fornaro, Fraud trauma syndrome: The victims of the Bernard Madoff scandal, Journal of Forensic Studies in Accounting & Business 2 (2010).
[8] H. Mercier, How gullible are we? A review of the evidence from psychology and social science, Review of General Psychology 21 (2017) 103–122. doi:10.1037/gpr0000111.
[9] S. Greenspan, G. Loughlin, R. S. Black, Credulity and gullibility in people with developmental disorders: A framework for future research, International Review of Research in Mental Retardation 24 (2001) 101–135. doi:10.1016/s0074-7750(01)80007-0.
[10] J. B. Rotter, Interpersonal trust, trustworthiness, and gullibility, American Psychologist 35 (1980) 1–7. URL: https://doi.org/10.1037/0003-066x.35.1.1. doi:10.1037/0003-066x.35.1.1.
[11] A. A. Md Shoeb, S. Raji, G. De Melo, EmoTag - Towards an emotion-based analysis of emojis, International Conference Recent Advances in Natural Language Processing, RANLP 2019-September (2019) 1094–1103. doi:10.26615/978-954-452-056-4_126.
[12] D. N. Gunraj, A. M. Drumm-Hewitt, E. M. Dashow, S. S. N. Upadhyay, C. M. Klin, Texting insincerely: The role of the period in text messaging, Computers in Human Behavior 55 (2016) 1067–1075. URL: http://dx.doi.org/10.1016/j.chb.2015.11.003. doi:10.1016/j.chb.2015.11.003.