
The PBSDS: A Dataset for the Detection of Pseudoprofound Bullshit

Evan D. DeFrancesco

Carlo Strapparava

Fondazione Bruno Kessler and Università degli Studi di Trento, Italy

This paper introduces the PBSDS, a dataset of tweets containing pseudoprofound bullshit: statements designed to appear profound but lacking substantive meaning. The PBSDS serves as a resource for studying pseudoprofound bullshit and for exploring linguistic factors that may shape how bullshit is perceived. Experiments with classifiers trained on the dataset show promising results, despite limitations such as selection bias and subjective annotation.

Keywords: pseudoprofound bullshit, stylistic analysis, pragmatics

1. Introduction

“Bullshit” refers to communication that is designed to impress but is constructed without concern for truth [1]. Bullshit differs from lying in that the liar deliberately manipulates and subverts truth (usually with the intent to deceive), while the bullshitter is simply unconcerned with what is true and what is false. A liar needs to know the truth value of a proposition; the bullshitter simply does not care.

Although bullshit comes in different forms, in this project we focused specifically on what is referred to as “pseudoprofound bullshit,” which is designed to convey some sort of potentially profound meaning but is actually semantically vacuous [2], e.g., “Hidden meaning transforms unparalleled abstract beauty.” Table 1 reports further examples of pseudoprofound bullshit and non-pseudoprofound bullshit sentences from our dataset.

Table 1: Examples of pseudoprofound bullshit and non-pseudoprofound bullshit from the PBSDS.

Pseudoprofound bullshit?   Sentence
yes                        The unpredictable is a reflection of humble excellence.
no                         You must be good to yourself if you are ever going to be any good for others.
yes                        The law of attraction is always responding to your thoughts. You are attracting in every moment of your life.
yes                        Evolution is an ingredient of subjective excellence.
yes                        Our consciousness is a reflection of the door of balance.
no                         A garden is a zoo for plants.
no                         Scientists are simply adults who retained and nurtured their native curiosity from childhood.

The goal of this project is to construct a dataset of tweets that contain pseudoprofound bullshit in English (the PBSDS). Operating under the assumption that bullshit is similar to spam email, we hypothesize that it should be possible to detect pseudoprofound bullshit using relatively simple classification algorithms.

2. Related work and motivation

Pennycook et al. [2] first explored the psychological nature of pseudoprofound bullshit, establishing an index of bullshit receptivity. They found that a tendency to judge pseudoprofound bullshit statements as profound was correlated with relevant variables such as an intuitive cognitive style and belief in the supernatural. They also found that detecting bullshit was not simply a matter of skepticism but rather of discerning deceptive vagueness in impressive-sounding claims. Walker et al. [3] established a link between illusory pattern perception and the propensity to rate pseudoprofound bullshit statements as profound. Later research by Pennycook and Rand [4] found that pseudoprofound bullshit receptivity correlates positively with perceptions of fake news accuracy and negatively with the ability to distinguish fake and real news. Littrell and Fugelsang [5] extended this understanding by exploring individuals’ susceptibility to misleading information and its association with reduced engagement in reflective thinking. They found that both highly receptive and highly resistant individuals exhibited limited awareness of their detection abilities for pseudoprofound bullshit. Turpin et al. [6] investigated the influence of different types of titles on the perceived profoundness of abstract art, revealing that pseudoprofound bullshit titles specifically enhanced the perceived profundity of the artwork. Nilsson et al. [7] found an association between pseudoprofound bullshit receptivity and social conservatism and economic progressivism. Relatedly, Evans et al. [8] examined scientific bullshit receptivity, which demonstrated positive correlations with pseudoprofound bullshit receptivity, belief in science, conservative political beliefs, and faith in intuition. They found that scientific literacy moderated the relationship between the two types of bullshit receptivity. These studies collectively shed light on the nature of pseudoprofound bullshit, its reception, and the underlying cognitive mechanisms.

However, the development of a dedicated dataset of pseudoprofound bullshit can further facilitate comprehensive investigation and understanding of this phenomenon, contributing to future research endeavors.

Such a dataset could provide researchers with a standardized and reliable resource to study and analyze the phenomenon of pseudoprofound bullshit systematically. It would allow for the exploration of various linguistic, cognitive, and contextual factors that contribute to the perception of profoundness in nonsensical statements. Additionally, an annotated dataset could serve as a benchmark for developing and evaluating computational models and algorithms aimed at detecting and combating pseudoprofound bullshit. It would enable the training and testing of automated systems to recognize and classify instances of pseudoprofound bullshit accurately. This could be instrumental in building tools and technologies to enhance critical thinking, identify deceptive information, and improve media literacy.

3. Data

3.1. Scraping Twitter

We used snscrape (https://github.com/JustAnotherArchivist/snscrape), an easy-to-use Python package, to crawl the Twitter (called X as of 2023) profiles of six accounts and return the 2,000 most recent tweets from each account. The accounts were scraped on 8 August 2023. We selected accounts that, we hoped, would provide a mix of pseudoprofound bullshit, non-pseudoprofound bullshit, profound philosophy and generic statements. For the initial dataset, we chose accounts that were associated with alternative medicine, pseudoscience, new age spirituality, philosophy and scientific communication. In particular, we scraped the following accounts, from which we collected a total of 12,000 tweets:

• @DeepakChopra: Deepak Chopra is a new-age author and alternative medicine promoter. His writing has been described as “incoherent babbling strewn with scientific terms” (https://www.washingtonpost.com/news/answer-sheet/wp/2015/05/15/scientist-why-deepak-chopra-is-driving-me-crazy/).
• @WisdomofChopra: WisdomOfChopra is operated by a bot that produces tweets that are meant to replicate the tone and structure (but not necessarily the content) of Deepak Chopra. The tweets are generated by a simple algorithm: words and phrases are contained within four PHP arrays. The first array contains sentence subjects; the second array contains verb phrases; the third contains determiner phrases and adjectives; the fourth contains nouns. Words and phrases from each array are then combined to generate tweets.
• @TheSecret: The Secret’s Twitter account is largely composed of messages that promote the pseudoscientific “law of attraction,” which claims that positive thoughts attract positive experiences and negative thoughts attract negative experiences.
• @realNDWalsche: Neale Donald Walsch is an American new-age writer and speaker whose work has appeared in a film version of The Secret. His own writing consists primarily of new-age spirituality texts.
• @kate_manne: Kate Manne is an associate professor of philosophy at Cornell University. Her research focuses on moral philosophy, metaethics, moral psychology, feminist philosophy and social philosophy. In 2019, Manne was named one of the world’s top fifty thinkers (https://www.prospectmagazine.co.uk/magazine/prospect-worlds-top-50-thinkers-2019).
• @neiltyson: Neil deGrasse Tyson is an astrophysicist and science communicator.

We recognize that the decision to include artificially generated content from @WisdomofChopra may be seen as a controversial one. However, the distinction between human and artificial origins of the content was secondary for our purposes. What remained paramount was the essence of the content itself: its pseudoprofound nature.

3.2. Data cleaning

From the initial 12,000 tweets collected, we excluded: duplicate tweets; single-word tweets; tweets that were composed only of hashtags; tweets that were direct replies to other Twitter users; tweets that contained URLs; and tweets that contained emojis. We also removed the hashtag (#) and at-sign (@) from tweets. Finally, we decided to remove tweets that explicitly referenced a personal and individual deity (represented in the tweets as “God”), as we did not wish to cause any inadvertent offense by labelling religious beliefs as pseudoprofound bullshit. After data cleaning, we were left with 5,196 tweets, comprising the initial PBSDS.
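The exclusion rules of Section 3.2 can be sketched as a simple filter. This is a minimal illustration, not the authors' cleaning script: the function name `clean_tweets`, the `(text, is_reply)` input format, the order in which the rules are applied, and the rough emoji character ranges are all assumptions made for the example.

```python
import re

# Illustrative patterns (assumptions, not taken from the paper).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji ranges only; not an exhaustive Unicode emoji test.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
DEITY_RE = re.compile(r"\bGod\b")


def clean_tweets(tweets):
    """Filter (text, is_reply) pairs following the rules listed in Section 3.2."""
    seen = set()
    kept = []
    for text, is_reply in tweets:
        tokens = text.split()
        if is_reply:
            continue  # direct replies to other users
        if len(tokens) <= 1:
            continue  # single-word (or empty) tweets
        if all(tok.startswith("#") for tok in tokens):
            continue  # tweets composed only of hashtags
        if URL_RE.search(text) or EMOJI_RE.search(text):
            continue  # tweets containing URLs or emojis
        if DEITY_RE.search(text):
            continue  # explicit references to "God"
        # Strip the hashtag and at-sign characters but keep the words.
        cleaned = text.replace("#", "").replace("@", "").strip()
        if cleaned in seen:
            continue  # duplicates (compared after stripping, an assumption)
        seen.add(cleaned)
        kept.append(cleaned)
    return kept


sample = [
    ("Hidden meaning transforms unparalleled abstract beauty.", False),
    ("Hidden meaning transforms unparalleled abstract beauty.", False),
    ("#wisdom #love", False),
    ("A garden is a #zoo for plants, says @someone", False),
]
print(clean_tweets(sample))
```

Applying the duplicate check after stripping `#` and `@` treats tweets that differ only in those symbols as duplicates; the paper does not say which order was used.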

3.3. Annotation

Two volunteer annotators provided judgments of whether a tweet constituted pseudoprofound bullshit. The annotators were both students in their mid-20s and were previously not familiar with the concept of pseudoprofound bullshit. The annotators were provided with a working definition of pseudoprofound bullshit (i.e., statements that sound profound and meaningful but that are actually semantically vacuous; pseudoprofound bullshit may use grandiose terms to deceive people) as well as several examples of sentences that constituted pseudoprofound bullshit and that did not constitute pseudoprofound bullshit. The working definition was left purposefully vague, given the general difficulty of defining pseudoprofound bullshit. After all, what one person may consider to be pseudoprofound, another person might consider to be actually profound. Annotators were instructed to label the tweet ‘1’ if they believed that it constituted pseudoprofound bullshit and ‘0’ if they did not. Perhaps reflecting the difficulty of arriving at a single sense of pseudoprofound bullshit, Cohen’s kappa was calculated at 0.52, indicating moderate inter-rater reliability [9]. The first author of this paper adjudicated disagreements between the two annotators’ judgments.

4. Dataset description

After annotation, the PBSDS contains 2,756 tweets judged as pseudoprofound bullshit (53.04% of the total dataset) and 2,440 tweets judged as non-pseudoprofound bullshit (46.96% of the total dataset). Although the two classes are reasonably well-balanced, pseudoprofound bullshit may be disproportionately represented in the dataset compared to its overall occurrence in natural language. However, this is not unexpected, given that the dataset was sourced primarily from Twitter accounts that were likely to include a large amount of pseudoprofound bullshit.

5. Experiments and Results

We trained six machine learning classifiers and compared their performance to test the validity of the dataset. The six classifiers selected for the task were the Support Vector Classifier (SVC), K-nearest Neighbors (KNN), Multinomial Naive Bayes (MNB), Decision Tree Classifier (DTC), Logistic Regression Classifier (LRC) and Random Forest Classifier (RFC). All models were implemented via the scikit-learn library [10].

The tweets were vectorized using tf-idf vectorization, and the data was split into a training set (85%) and a testing set (15%).

In order to evaluate and compare the results of the six classifiers, we used the standard metrics in text classification: Precision (P), Recall (R), F-score (F1) and Accuracy (Acc). The results achieved with the six classifiers are reported in Table 2.

Table 2: Results achieved with the six classifiers.

Classifier   SVC      KNN      MNB      DTC      LRC      RFC
P            0.9307   0.8406   0.9008   0.8719   0.9435   0.9309
R            0.7943   0.8227   0.8156   0.8203   0.7896   0.8274
F1           0.8571   0.8315   0.8561   0.8453   0.8597   0.8761
Acc

6. Limitations

The PBSDS has several limitations that could be addressed in future versions of the dataset. The dataset was collected from specific Twitter accounts presumed to contain pseudoprofound bullshit. This may have resulted in an overrepresentation of pseudoprofound content compared to its overall occurrence in natural language. The dataset thus may not fully capture the range and diversity of pseudoprofound bullshit found in other contexts. Relatedly, the PBSDS’s reliance on tweets from specific Twitter accounts limits its generalizability to other platforms or sources of pseudoprofound bullshit. The characteristics and patterns observed in the dataset may not be representative of pseudoprofound content found elsewhere. Future versions of the PBSDS could address this concern by diversifying the sources of data collection. This would involve not only expanding the range of Twitter accounts under examination but also branching out to other social media platforms, blogs, articles, printed publications and even, perhaps, spoken word content. By incorporating a broader spectrum of sources, the dataset would provide a more comprehensive and varied representation of pseudoprofound bullshit.

Additionally, defining and identifying pseudoprofound bullshit can be challenging and subjective. The annotation process relied on the judgments of two annotators, which may have introduced inherent biases and variations in interpretation. Although efforts were made to establish guidelines, the subjective nature of the task may have affected the consistency of annotations. While the inter-rater reliability between the annotators was measured to be moderate, there was still inherent subjectivity and disagreement in determining whether a tweet constituted pseudoprofound bullshit. The resolution of disagreements by a single adjudicator introduced another layer of subjectivity. Introducing a multi-rater system, in which multiple individuals assess the content’s (pseudo)profundity, could add layers of reliability and objectivity to the dataset.
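The inter-rater reliability discussed here is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below shows how such a value can be computed for binary ‘1’/‘0’ labels. The function name `cohens_kappa` and the toy label lists are illustrative assumptions; they are not the PBSDS annotations and do not reproduce the reported value of 0.52.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical): 1 = pseudoprofound bullshit, 0 = not.
annotator_1 = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 2))
```

In the toy example the annotators agree on 7 of 10 items (observed agreement 0.7) while chance agreement is 0.5, giving a kappa of about 0.4. The same formula extends to more than two label values, but a multi-rater setup of the kind suggested above would need a different statistic, such as Fleiss' kappa.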

Finally, the PBSDS comprises 5,196 tweets, which is relatively small in comparison to other text corpora. This limited size may restrict the scope and statistical power of analyses, potentially impacting the generalizability of findings derived from the dataset.

7. Conclusion

Despite its limitations, the PBSDS offers valuable insights into the phenomenon of pseudoprofound bullshit and its detection. The dataset provides a foundation for further research, enabling comprehensive investigations into linguistic patterns, cognitive biases, and societal implications associated with pseudoprofound bullshit. By better understanding and identifying pseudoprofound bullshit, researchers can develop tools and strategies to enhance critical thinking, combat deceptive communication, and promote media literacy in an increasingly complex information landscape.

Acknowledgments

We acknowledge the support of the PNRR project FAIR Future AI Research (PE00000013), under the NRRP MUR program funded by the NextGenerationEU.

References

[1] H. G. Frankfurt, On bullshit, in: On Bullshit, Princeton University Press, 2005.

[2] Pennycook, J. Allan Cheyne, Barr, D. J. Koehler, J. A. Fugelsang, On the reception and detection of pseudo-profound bullshit, Judgment and Decision Making 10 (2015) 549-563. doi:10.1017/S1930297500006999.

[3] A. C. Walker, M. H. Turpin, J. A. Stolz, J. A. Fugelsang, D. J. Koehler, Finding meaning in the clouds: Illusory pattern perception predicts receptivity to pseudo-profound bullshit, Judgment and Decision Making 14 (2019) 109-119.

[4] Pennycook, D. G. Rand, Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking, Journal of Personality 88 (2020) 185-200.

[5] Littrell, J. A. Fugelsang, Bullshit blind spots: The roles of miscalibration and information processing in bullshit detection, Thinking & Reasoning (2023) 1-30.

[6] M. H. Turpin, A. C. Walker, Kara-Yakoubian, N. N. Gabert, J. A. Fugelsang, J. A. Stolz, Bullshit makes the art grow profounder, Judgment and Decision Making 14 (2019) 658-670.

[7] Nilsson, Erlandsson, Västfjäll, The complex relation between receptivity to pseudo-profound bullshit and political ideology, Personality and Social Psychology Bulletin 45 (2019) 1440-1454.

[8] Evans, W. Sleegers, Ž. Mlakar, Individual differences in receptivity to scientific bullshit, Judgment and Decision Making 15 (2020) 401-412.

[9] J. R. Landis, G. G. Koch, The measurement of observer agreement for categorical data, Biometrics (1977) 159-174.

[10] Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, et al., Scikit-learn: Machine learning in Python, The Journal of Machine Learning Research 12 (2011) 2825-2830.