<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The PBSDS: A Dataset for the Detection of Pseudoprofound Bullshit</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evan D. DeFrancesco</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlo Strapparava</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università degli Studi di Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces the PBSDS, a dataset of tweets containing pseudoprofound bullshit: statements designed to appear profound but lacking substantive meaning. The PBSDS serves as a resource for studying pseudoprofound bullshit and for exploring linguistic factors that may influence how bullshit is perceived. Experiments with classifiers trained on the dataset show promising results, despite limitations such as selection bias and subjective annotation.</p>
      </abstract>
      <kwd-group>
        <kwd>pseudoprofound bullshit</kwd>
        <kwd>stylistic analysis</kwd>
        <kwd>pragmatics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>“Bullshit” refers to communication that is designed to impress but is constructed without concern for truth [<xref ref-type="bibr" rid="ref1">1</xref>]. Bullshit differs from lying in that the liar deliberately manipulates and subverts the truth (usually with the intent to deceive), while the bullshitter is simply unconcerned with what is true and what is false. A liar needs to know the truth value of a proposition; the bullshitter simply does not care.</p>
      <p>Although bullshit comes in different forms, in this project we focused specifically on what is referred to as “pseudoprofound bullshit,” which is designed to convey some sort of potentially profound meaning but is actually semantically vacuous [<xref ref-type="bibr" rid="ref2">2</xref>], e.g., “Hidden meaning transforms unparalleled abstract beauty.” Table 1 reports further examples of pseudoprofound bullshit and non-pseudoprofound bullshit sentences from our dataset.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Examples of pseudoprofound bullshit and non-pseudoprofound bullshit sentences from the PBSDS.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Sentence</th>
              <th>Pseudoprofound bullshit?</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>The unpredictable is a reflection of humble excellence.</td>
              <td>yes</td>
            </tr>
            <tr>
              <td>You must be good to yourself if you are ever going to be any good for others.</td>
              <td>no</td>
            </tr>
            <tr>
              <td>The law of attraction is always responding to your thoughts. You are attracting in every moment of your life.</td>
              <td>yes</td>
            </tr>
            <tr>
              <td>Evolution is an ingredient of subjective excellence.</td>
              <td>yes</td>
            </tr>
            <tr>
              <td>Our consciousness is a reflection of the door of balance.</td>
              <td>yes</td>
            </tr>
            <tr>
              <td>A garden is a zoo for plants.</td>
              <td>no</td>
            </tr>
            <tr>
              <td>Scientists are simply adults who retained and nurtured their native curiosity from childhood.</td>
              <td>no</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>The goal of this project is to construct a dataset of English-language tweets that contain pseudoprofound bullshit (the PBSDS). Operating under the assumption that bullshit is similar to spam email, we hypothesize that it should be possible to detect pseudoprofound bullshit using relatively simple classification algorithms.</p>
    </sec>
    <sec id="sec-related-work">
      <title>2. Related work and motivation</title>
      <p>Pennycook et al. [<xref ref-type="bibr" rid="ref2">2</xref>] first explored the psychological nature of pseudoprofound bullshit, establishing an index of bullshit receptivity. They found that a tendency to judge pseudoprofound bullshit statements as profound was correlated with relevant variables such as an intuitive cognitive style and belief in the supernatural. They also found that detecting bullshit was not simply a matter of skepticism but rather of discerning deceptive vagueness in impressive-sounding claims. Walker et al. [<xref ref-type="bibr" rid="ref3">3</xref>] established a link between illusory pattern perception and the propensity to rate pseudoprofound bullshit statements as profound. Later research by Pennycook and Rand [<xref ref-type="bibr" rid="ref4">4</xref>] found that pseudoprofound bullshit receptivity correlates positively with perceptions of fake news accuracy and negatively with the ability to distinguish fake from real news. Littrell and Fugelsang [<xref ref-type="bibr" rid="ref5">5</xref>] extended this understanding by exploring individuals’ susceptibility to misleading information and its association with reduced engagement in reflective thinking. They found that both highly receptive and highly resistant individuals exhibited limited awareness of their own ability to detect pseudoprofound bullshit. Turpin et al. [<xref ref-type="bibr" rid="ref6">6</xref>] investigated the influence of different types of titles on the perceived profoundness of abstract art, revealing that pseudoprofound bullshit titles specifically enhanced the perceived profundity of the artwork. Nilsson et al. [<xref ref-type="bibr" rid="ref7">7</xref>] found an association between pseudoprofound bullshit receptivity and both social conservatism and economic progressivism. Relatedly, Evans et al. [<xref ref-type="bibr" rid="ref8">8</xref>] examined receptivity to scientific bullshit, which correlated positively with pseudoprofound bullshit receptivity, belief in science, conservative political beliefs, and faith in intuition. They found that scientific literacy moderated the relationship between the two types of bullshit receptivity. These studies collectively shed light on the nature of pseudoprofound bullshit, its reception, and the underlying cognitive mechanisms. However, a dedicated dataset of pseudoprofound bullshit would further facilitate the investigation and understanding of this phenomenon in future research.</p>
      <p>Such a dataset could provide researchers with a standardized and reliable resource for studying pseudoprofound bullshit systematically. It would allow for the exploration of the linguistic, cognitive, and contextual factors that contribute to the perception of profoundness in nonsensical statements. Additionally, an annotated dataset could serve as a benchmark for developing and evaluating computational models aimed at detecting and combating pseudoprofound bullshit, enabling the training and testing of automated systems that recognize and classify instances of it accurately. Such systems could, in turn, support tools that enhance critical thinking, identify deceptive information, and improve media literacy.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Data</title>
      <sec id="sec-3-1">
        <title>3.1. Scraping Twitter</title>
        <p>We used snscrape (https://github.com/JustAnotherArchivist/snscrape), an easy-to-use Python package, to crawl the Twitter (as of 2023, called X) profiles of six accounts and retrieve the 2,000 most recent tweets from each account. The accounts were scraped on 8 August 2023. We selected accounts that, we hoped, would provide a mix of pseudoprofound bullshit, non-pseudoprofound bullshit, profound philosophy and generic statements. For the initial dataset, we chose accounts associated with alternative medicine, pseudoscience, new-age spirituality, philosophy and scientific communication. In particular, we scraped the following accounts, from which we collected a total of 12,000 tweets:</p>
        <list list-type="bullet">
          <list-item>
            <p>@DeepakChopra: Deepak Chopra is a new-age author and alternative medicine promoter. His writing has been described as “incoherent babbling strewn with scientific terms” (https://www.washingtonpost.com/news/answersheet/wp/2015/05/15/scientist-why-deepak-chopra-is-drivingme-crazy/).</p>
          </list-item>
          <list-item>
            <p>@WisdomofChopra: WisdomOfChopra is operated by a bot that produces tweets meant to replicate the tone and structure (but not necessarily the content) of Deepak Chopra. The tweets are generated by a simple algorithm: words and phrases are stored in four PHP arrays. The first array contains sentence subjects; the second contains verb phrases; the third contains determiner phrases and adjectives; the fourth contains nouns. Words and phrases from each array are then combined to generate tweets.</p>
          </list-item>
          <list-item>
            <p>@TheSecret: The Secret’s Twitter account is largely composed of messages that promote the pseudoscientific “law of attraction,” which claims that positive thoughts attract positive experiences and negative thoughts attract negative experiences.</p>
          </list-item>
          <list-item>
            <p>@realNDWalsche: Neale Donald Walsch is an American new-age writer and speaker whose work has appeared in a film version of The Secret. His own writing consists primarily of new-age spirituality texts.</p>
          </list-item>
          <list-item>
            <p>@kate_manne: Kate Manne is an associate professor of philosophy at Cornell University. Her research focuses on moral philosophy, metaethics, moral psychology, feminist philosophy and social philosophy. In 2019, Manne was named one of the world’s top fifty thinkers (https://www.prospectmagazine.co.uk/magazine/prospectworlds-top-50-thinkers-2019).</p>
          </list-item>
          <list-item>
            <p>@neiltyson: Neil deGrasse Tyson is an astrophysicist and science communicator.</p>
          </list-item>
        </list>
        <p>We recognize that the decision to include artificially generated content from @WisdomofChopra may be seen as controversial. However, the distinction between human and artificial origins of the content was secondary for our purposes; what mattered was the content itself, namely its pseudoprofound nature.</p>
      </sec>
      <sec id="sec-data-cleaning">
        <title>3.2. Data cleaning</title>
        <p>From the initial 12,000 tweets collected, we excluded: duplicate tweets; single-word tweets; tweets composed only of hashtags; tweets that were direct replies to other Twitter users; tweets that contained URLs; and tweets that contained emojis. We also removed the hashtag (#) and at-sign (@) characters from the remaining tweets. Finally, we decided to remove tweets that explicitly referenced a personal and individual deity (represented in the tweets as “God”), as we did not wish to cause any inadvertent offense by labelling religious beliefs as pseudoprofound bullshit. After data cleaning, we were left with 5,196 tweets, comprising
the initial PBSDS.</p>
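        <p>As an illustration only, the exclusion criteria listed above can be expressed as a single filtering pass over the raw tweets. The following Python sketch is not the authors’ code: the regular expressions, the emoji code-point ranges, the helper names <monospace>keep_tweet</monospace> and <monospace>clean_tweets</monospace>, and the assumption that a direct reply can be recognized by a leading @-mention are all hypothetical.</p>

```python
import re

# Hypothetical filters: the paper lists the exclusion criteria but not how
# they were implemented, so every pattern below is an assumption.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji detector covering the main emoji blocks (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]")
# Explicit references to a personal deity, written as "God" in the tweets.
GOD_RE = re.compile(r"\bGod\b")


def keep_tweet(text: str) -> bool:
    """Return True if a raw tweet passes every exclusion filter."""
    tokens = text.split()
    if len(tokens) in (0, 1):  # empty or single-word tweets
        return False
    if all(tok.startswith("#") for tok in tokens):  # hashtags only
        return False
    if text.lstrip().startswith("@"):  # assumed marker of a direct reply
        return False
    if URL_RE.search(text) or EMOJI_RE.search(text):  # URLs or emojis
        return False
    if GOD_RE.search(text):  # deity references
        return False
    return True


def clean_tweets(raw_tweets: list[str]) -> list[str]:
    """Drop duplicates and excluded tweets, then strip '#' and '@'."""
    seen = set()
    cleaned = []
    for text in raw_tweets:
        if text in seen:  # exact duplicate: keep only the first occurrence
            continue
        seen.add(text)
        if keep_tweet(text):
            cleaned.append(text.replace("#", "").replace("@", ""))
    return cleaned


if __name__ == "__main__":
    raw = [
        "Hidden meaning transforms unparalleled abstract beauty.",
        "Hidden meaning transforms unparalleled abstract beauty.",  # duplicate
        "Namaste",                                 # single word
        "#peace #love",                            # hashtags only
        "@someone Thanks for sharing!",            # reply
        "Read more at https://example.com today",  # URL
        "Good morning \U0001F60A everyone",        # emoji
        "Trust in God always.",                    # deity reference
        "Be kind to #yourself and to @others",
    ]
    print(clean_tweets(raw))
    # → ['Hidden meaning transforms unparalleled abstract beauty.', 'Be kind to yourself and to others']
```

        <p>Matching on a leading @-mention only approximates reply detection; reply metadata returned by the scraper, where available, would likely be a more reliable signal.</p>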
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Annotation</title>
        <p>Two volunteer annotators provided judgments of whether a tweet constituted pseudoprofound bullshit. The annotators were both students in their mid-20s and were previously not familiar with the concept of pseudoprofound bullshit. They were provided with a working definition of pseudoprofound bullshit (i.e., statements that sound profound and meaningful but that are actually semantically vacuous; pseudoprofound bullshit may use grandiose terms to deceive people) as well as several examples of sentences that did and did not constitute pseudoprofound bullshit. The working definition was left purposefully vague, given the general difficulty of defining pseudoprofound bullshit: what one person may consider pseudoprofound, another might consider genuinely profound. Annotators were instructed to label a tweet ‘1’ if they believed that it constituted pseudoprofound bullshit and ‘0’ if they did not. Perhaps reflecting the difficulty of arriving at a single sense of pseudoprofound bullshit, Cohen’s kappa was 0.52, indicating moderate inter-rater reliability [<xref ref-type="bibr" rid="ref9">9</xref>]. The first author of this paper adjudicated disagreements between the two annotators’ judgments.</p>
      </sec>
    </sec>
    <sec id="sec-dataset">
      <title>4. Dataset description</title>
      <p>After annotation, the PBSDS contains 2,756 tweets judged as pseudoprofound bullshit (53.04% of the total dataset) and 2,440 tweets judged as non-pseudoprofound bullshit (46.96% of the total dataset). Although the two classes are reasonably well balanced, pseudoprofound bullshit may be disproportionately represented in the dataset compared to its overall occurrence in natural language. This is not unexpected, given that the dataset was sourced primarily from Twitter accounts that were likely to include a large amount of pseudoprofound bullshit.</p>
    </sec>
    <sec id="sec-experiments">
      <title>5. Experiments and Results</title>
      <p>We trained six machine learning classifiers and compared their performance to test the validity of the dataset. The six classifiers selected for the task were the Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Multinomial Naive Bayes (MNB), Decision Tree Classifier (DTC), Logistic Regression Classifier (LRC) and Random Forest Classifier (RFC). All models were implemented via the scikit-learn library [<xref ref-type="bibr" rid="ref10">10</xref>]. The tweets were vectorized using tf-idf vectorization, and the data were split into a training set (85%) and a testing set (15%).</p>
      <p>In order to evaluate and compare the results of the six classifiers, we used the standard metrics in text classification: Precision (P), Recall (R), F-score (F1) and Accuracy (Acc). The results achieved with the six classifiers are reported in Table 2.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Results achieved with the six classifiers.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Classifier</th>
              <th>P</th>
              <th>R</th>
              <th>F1</th>
              <th>Acc</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>SVC</td>
              <td>0.9307</td>
              <td>0.7943</td>
              <td>0.8571</td>
              <td/>
            </tr>
            <tr>
              <td>KNN</td>
              <td>0.8406</td>
              <td>0.8227</td>
              <td>0.8315</td>
              <td/>
            </tr>
            <tr>
              <td>MNB</td>
              <td>0.9008</td>
              <td>0.8156</td>
              <td>0.8561</td>
              <td/>
            </tr>
            <tr>
              <td>DTC</td>
              <td>0.8719</td>
              <td>0.8203</td>
              <td>0.8453</td>
              <td/>
            </tr>
            <tr>
              <td>LRC</td>
              <td>0.9435</td>
              <td>0.7896</td>
              <td>0.8597</td>
              <td/>
            </tr>
            <tr>
              <td>RFC</td>
              <td>0.9309</td>
              <td>0.8274</td>
              <td>0.8761</td>
              <td/>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-limitations">
      <title>6. Limitations</title>
      <p>The PBSDS has several limitations that could be addressed in future versions of the dataset. The dataset was collected from specific Twitter accounts presumed to contain pseudoprofound bullshit. This may have resulted in an overrepresentation of pseudoprofound content compared to its overall occurrence in natural language, and the dataset thus may not fully capture the range and diversity of pseudoprofound bullshit found in other contexts. Relatedly, the PBSDS’s reliance on tweets from specific Twitter accounts limits its generalizability to other platforms or sources of pseudoprofound bullshit; the characteristics and patterns observed in the dataset may not be representative of pseudoprofound content found elsewhere. Future versions of the PBSDS could address this concern by diversifying the sources of data collection. This would involve not only expanding the range of Twitter accounts under examination but also branching out to other social media platforms, blogs, articles, printed publications and, perhaps, even spoken-word content. By incorporating a broader spectrum of sources, the dataset would provide a more comprehensive and varied representation of pseudoprofound bullshit.</p>
      <p>Additionally, defining and identifying pseudoprofound bullshit can be challenging and subjective. The annotation process relied on the judgments of two annotators, which may have introduced biases and variations in interpretation. Although efforts were made to establish guidelines, the subjective nature of the task may have affected the consistency of annotations: while the inter-rater reliability between the annotators was moderate, there was still inherent subjectivity and disagreement in determining whether a tweet constituted pseudoprofound bullshit. The resolution of disagreements by a single adjudicator introduced another layer of subjectivity. Introducing a multi-rater system, in which multiple individuals assess the content’s (pseudo)profundity, could make the labels more reliable and objective.</p>
      <p>Finally, the PBSDS comprises 5,196 tweets, which is relatively small in comparison to other text corpora. This limited size may restrict the scope and statistical power of analyses, potentially limiting the generalizability of findings derived from the dataset.</p>
    </sec>
    <sec id="sec-4">
      <title>7. Conclusion</title>
      <p>Despite its limitations, the PBSDS offers valuable insights
into the phenomenon of pseudoprofound bullshit and its
detection. The dataset provides a foundation for further
research, enabling comprehensive investigations into
linguistic patterns, cognitive biases, and societal
implications associated with pseudoprofound bullshit. By better
understanding and identifying pseudoprofound bullshit,
researchers can develop tools and strategies to enhance
critical thinking, combat deceptive communication, and
promote media literacy in an increasingly complex
information landscape.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We acknowledge the support of the PNRR project FAIR
Future AI Research (PE00000013), under the NRRP MUR
program funded by the NextGenerationEU.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. G.</given-names>
            <surname>Frankfurt</surname>
          </string-name>
          ,
          <article-title>On bullshit</article-title>
          , in: On Bullshit, Princeton University Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pennycook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Allan</given-names>
            <surname>Cheyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Koehler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fugelsang</surname>
          </string-name>
          ,
          <article-title>On the reception and detection of pseudo-profound bullshit</article-title>
          ,
          <source>Judgment and Decision Making</source>
          <volume>10</volume>
          (
          <year>2015</year>
          )
          <fpage>549</fpage>
          -
          <lpage>563</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1017/S1930297500006999</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Turpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Stolz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fugelsang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Koehler</surname>
          </string-name>
          ,
          <article-title>Finding meaning in the clouds: Illusory pattern perception predicts receptivity to pseudo-profound bullshit</article-title>
          ,
          <source>Judgment and Decision Making</source>
          <volume>14</volume>
          (
          <year>2019</year>
          )
          <fpage>109</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pennycook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Rand</surname>
          </string-name>
          ,
          <article-title>Who falls for fake news? The roles of bullshit receptivity, overclaiming, familiarity, and analytic thinking</article-title>
          ,
          <source>Journal of Personality</source>
          <volume>88</volume>
          (
          <year>2020</year>
          )
          <fpage>185</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Littrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fugelsang</surname>
          </string-name>
          ,
          <article-title>Bullshit blind spots: The roles of miscalibration and information processing in bullshit detection</article-title>
          ,
          <source>Thinking &amp; Reasoning</source>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>30</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Turpin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kara-Yakoubian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. N.</given-names>
            <surname>Gabert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Fugelsang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Stolz</surname>
          </string-name>
          ,
          <article-title>Bullshit makes the art grow profounder</article-title>
          ,
          <source>Judgment and Decision Making</source>
          <volume>14</volume>
          (
          <year>2019</year>
          )
          <fpage>658</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Erlandsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Västfjäll</surname>
          </string-name>
          ,
          <article-title>The complex relation between receptivity to pseudo-profound bullshit and political ideology</article-title>
          ,
          <source>Personality and Social Psychology Bulletin</source>
          <volume>45</volume>
          (
          <year>2019</year>
          )
          <fpage>1440</fpage>
          -
          <lpage>1454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Evans</surname>
          </string-name>
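          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sleegers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ž.</given-names>
            <surname>Mlakar</surname>
          </string-name>
          ,
          <article-title>Individual differences in receptivity to scientific bullshit</article-title>
          ,
          <source>Judgment and Decision Making</source>
          <volume>15</volume>
          (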
          <year>2020</year>
          )
          <fpage>401</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Landis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <article-title>The measurement of observer agreement for categorical data</article-title>
          ,
          <source>Biometrics</source>
          (
          <year>1977</year>
          )
          <fpage>159</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          , et al.,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>The Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>