<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Factoring in Context for the Automatic Detection of Misrepresentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bruna Paz Schmid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Annette Hautli-Janisz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steve Oswald</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Fribourg</institution>
          ,
          <addr-line>Avenue de l'Europe 20, 1700 Fribourg</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Passau</institution>
          ,
          <addr-line>94030 Passau</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>20</volume>
      <issue>2024</issue>
      <fpage>11</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>The aim of the paper is to show how a solid theoretical pragmatic underpinning informs an automatic approach to identifying and classifying misrepresentation in social media. To that end, we present a dataset that encodes misrepresentation as well as the source that is misrepresented, building on a set of pragmatically informed annotation guidelines. The performance of standard statistical classifiers for misrepresentation detection is promising. We also perform a fine-grained manual error analysis. The paper closes with a longitudinal analysis of misrepresentation in our dataset and shows that items labelled as misrepresentation increase in years that coincide with political campaigns.</p>
      </abstract>
      <kwd-group>
        <kwd>Misrepresentation</kwd>
        <kwd>pragmatics</kwd>
        <kwd>natural language processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        A key feature of Trump’s political campaign and one-term presidency, which started in early 2017 and
ended four years later in 2021, was the strategic use of social media. In October and November 2023
alone, CNN fact-checked twelve speeches, concluding that his “fall remarks were teeming with false
claims - a staggering quantity of misrepresentations, exaggerations and outright lies that made sheer
wrongness a central feature of each of his addresses” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Through social media, misrepresentations
occur faster and in a more targeted way than in traditional media outlets where political messages are
usually assessed in terms of their factual content.
      </p>
      <p>Identifying misrepresentation, i.e., a metarepresentation that is not similar enough to the original
representation at the inferential level given a certain context, in a systematic manner is one building
block for helping voters assess the political strategies and worldviews of potential future leaders. But
tackling problematic content that is spread in connection to political campaigns is not simply an issue
of quickly sifting through large quantities of data. What makes it especially challenging is the quality of
the content. As the results of CNN’s fact-checking indicate, “sheer wrongness” comes in various forms,
one of which is misrepresentation – a notion that may be understood as a form of misinformation, that
is, false information.</p>
      <p>The aim of the paper is to show how a solid theoretical pragmatic underpinning informs an automatic
approach to identifying and classifying misrepresentation in social media. To this end, we present a
dataset that encodes misrepresentation as well as the source that is misrepresented, building on a set
of pragmatically informed annotation guidelines. The performance of standard statistical classifiers for
misrepresentation detection is promising. We also perform a fine-grained manual error analysis. The
paper closes with a longitudinal analysis of misrepresentation in our dataset and shows that items
labelled as misrepresentation increase in years that coincide with political campaigns.</p>
      <p>The paper proceeds as follows: Section 2 discusses related work in automatically identifying
misrepresentation. Section 3 presents the theoretical pragmatic underpinning, followed by a description of the
annotation guidelines in Section 4. Section 5 includes information regarding data preprocessing steps,
classification, model performance and error analysis. Section 6 discusses longitudinal aspects of the
study, while Section 7 discusses its results and Section 8 its limitations.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Research and government-driven efforts alike are trying to contain the effects of misrepresentation
by finding ways to automatically identify information online that is misrepresenting the original or is
outright false. For instance, the European Commission has sponsored projects aimed at developing
AI tools, such as Monitio, a media monitoring platform that includes fact-checking [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The fact-checking
system is evidence-based in that it works by retrieving documents from an available collection of news
articles which then serve as evidence for the predictions.
      </p>
      <p>
        However, these approaches to fact-checking and misinformation tend not to differentiate between the
various forms of false information. Instead, identification often occurs with the help of vocabularies and
datasets labelled based on stance detection [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], truthfulness [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], topic-matching [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], or linguistic
features associated with fake news [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        Misrepresentation itself has rarely been a topic of research in NLP. One exception is Michael Yeomans’
study about partisan misrepresentation of political opponents through straw man arguments [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In an
experimental setting, participants were tasked with articulating their own together with their opponents’
positions. They were instructed to write down open-ended responses about the Affordable Care Act.
Responses were then labelled depending on whether they were the participants’ genuine or imitated
positions. At the computational level, texts were scored with sentiment analysis and a lexicon of morally
charged words. This was followed by a machine learning model – a logistic LASSO regression - trained
to distinguish between texts from opponents and supporters. However, it is difficult to conclude that
the study was about the straw man argument specifically since the theoretical underpinning remains
unspecified in the paper. As such, the scope of the study appears to have been limited to the analysis of
partisan incendiary language accompanying misrepresentation.
      </p>
      <p>In the next section, we elaborate on our theoretical underpinning. Our approach being
theory-driven is what differentiates it from others: Every step is guided by pragmatic theory because, next to
identifying misrepresentation, the aim of our study was also to understand the phenomenon from a
linguistic and political perspective.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Pragmatic theory and misrepresentation</title>
      <p>The core theoretical underpinning of the present paper is an observation from pragmatics, namely
that people tend to show regularity in their language use due to the social aspects of communication
[11, pp. 4-6], for instance if people intend to misrepresent or discredit the original. This means that
certain patterns in language depend on the context they are embedded in. Therefore, by defining and
describing the relevant contexts, we can link patterns of language use to certain pragmatic phenomena
such as misrepresentation.</p>
      <p>
        In this study, we build on theories from political discourse analysis [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], pragmatics [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]
and philosophy of language [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] in defining misrepresentation as a metarepresentation that is not
similar enough to the original representation at the inferential level given a certain context. Example
(1) shows a tweet posted by Donald J. Trump on 27 July 2017 in which he claims that the New York
Times (belonging to the left-wing spectrum) asserts that ‘Fox and Friends’ (right-wing spectrum) is the
most powerful TV show in America.
The original text from an opinion piece in the NYT from 19 July 2017 is shown in (2) and includes some
of the text that precedes and follows the quotation:
(2)
      </p>
      <sec id="sec-3-1">
        <title>Original</title>
        <p>
          For years, it was a nontaxing mix of news, lifestyle and conservative couch gab, a warm-up before
Fox’s day of politics and commentary. Suddenly, for no other reason than its No. 1 fan, it is
the most powerful TV show in America. (It’s also easily the most-watched cable news morning
show, averaging 1.6 million viewers in the year’s second quarter, following a post-Trump ratings
boost.) [...] [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
        </p>
        <p>Based on the theoretical pragmatic underpinning of this paper, a mispresentation M in the political
context needs to meet the following criteria C:</p>
        <sec id="sec-3-1-1">
          <title>C1: M is a metarepresentation in terms of intentionality.</title>
          <p>C2: There are perceivable structural or componential differences between the original and its
metarepresentation.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>C3: There is noticeable change in the metarepresentation.</title>
          <p>C4: The difference between the original representation and the metarepresentation results in a
difference in comprehension.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>C5: The difference in comprehension is politically relevant.</title>
          <p>Based on C1, Example (1) is a metarepresentation, because it contains representative content
discernible in a verbatim quotation, in the reported speech verb “said,” and in the use of “[w]ow” to
express a psychological state through the positive evaluation of the content of the verbatim quotation.
The tweet is also significantly shorter than the overall article, satisfying C2.</p>
          <p>Regarding C3, the quotation is isolated from the article and as a result, there is an emphasis on the
content of the quotation, which is evaluated positively with “[w]ow.” Criterion C4 is met for a variety
of reasons: For one, the quotation was taken from an opinion article and therefore represents an
individual’s opinion (and not necessarily NYT’s). Secondly, Fox &amp; Friends being described as “the most
powerful TV show in America” is surprising and concerning given the adverb “[s]uddenly”, which
marks unexpectedness. Removing the beginning of the sentence thus changes the overall sentiment
in the tweet to a positive one that is absent in the original representation. Thirdly, the author of the
original does not appear to believe that the show deserves its new status, since the latter is said to result
from “no other reason than its No. 1 fan”, which undermines the inherent quality of the show. Finally,
from the first to the second sentence in the original, Fox &amp; Friends develops from “conservative couch
gab” to “the most powerful TV show in America”, i.e., the author implies that the world is turned upside
down as a result of the former president’s relationship with Fox &amp; Friends.</p>
          <p>In terms of C5, a politically relevant difference emerges between the original and the
misrepresentation: The New York Times’ original article achieves relevance by being identified as a critical piece
which deems the show’s new importance to be undeserved and perhaps even dangerous, considering
that it is the result of the influence of the president of the United States. From this point of view,
it may be a warning against the manipulation of the media for political purposes given the media’s
role as a check on governmental power and democracy. This criticism is especially strong since it
originates in the media itself. Strikingly, Trump’s metarepresentational rendition achieves political
relevance by concluding the opposite, with important contextual implications: Even the New York
Times, which is biased and left-wing and is known to be critical of Trump, recognizes how important
his favorite show, in which he often participates, is. Additionally, given Trump’s relationship with the
show, presenting it in a positive light may be an attempt to promote himself as well. Thus, it would
represent an instance of positive self-presentation. Whereas the original representation is likely an
attempt to revise available assumptions and thus change the current state of the world where Fox
&amp; Friends is portrayed as undeservedly powerful, the metarepresentation is likely to function as an
attempt to strengthen those same assumptions and thus to protect the current state of the world.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The dataset</title>
      <p>
        The basis of the investigation is the Trump Twitter Archive, a database that contains most tweets posted
from Trump’s personal account, @realDonaldTrump, between 2009 and 2021. The site was launched in
2016 and includes 56,571 tweets [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The maximum character count of the tweets ranges between 140
and 280, reflecting Twitter’s character limit at the time of posting. In the following, we discuss the steps taken to prune the dataset in order to be able to model
pragmatic theory and misrepresentation in a meaningful way. The dataset and classification code are
available at https://github.com/runastef/auto-ident-trump-misrep.git.
      </p>
      <p>Filtering: Two filters are applied consecutively with the aim of increasing the likelihood that the
resulting selection contains misrepresentation. The first filter extracts tweets containing quotation marks,
which usually signal the presence of representative content, for instance in the form of reported speech;
a tweet containing quotation marks is therefore more likely to comment on an original representation.
We exclude tweets predating Trump’s presidential campaign announcement from the
selection, as well as retweets. The same was attempted for quoted replies by excluding tweets containing
the handle @realDonaldTrump. The intention behind excluding retweets and quoted replies was to limit
the conversational context of the tweets to the relevant original representations by reducing the amount
of representative content. This reduces the contextual complexity of the tweets so as to strengthen the
relationship between the utterances and the pragmatic phenomenon of misrepresentation. Finally,
the filter excluded the expressions ‘Nobody’, ‘establishment’, ‘Washington’, ‘elite’, and ‘Congress’. The
resulting pre-annotation dataset, which combines both selections, contained a total of 1,737 tweets.
Annotation study: The annotation of the selected tweets was done by two annotators after
instruction, one of them being a co-author of the paper. The annotation guidelines reflect the criteria for
misrepresentation discussed in Section 3 in that they are the deciding factors when a tweet is judged
to be a misrepresentation of an original. The decision is binary, i.e., ‘misrepresentation’ versus
‘not-misrepresentation’. Inter-annotator agreement measured with Cohen’s Kappa was 0.765 over the whole
dataset, which signals substantial agreement. To increase the quality of the dataset, only tweets that
both annotators agreed on were included. The resulting dataset has 214 items: 107 labelled as instances
of misrepresentation and 107 as not-misrepresentation.</p>
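The reported agreement can be reproduced with a short computation. A minimal sketch of Cohen's Kappa for a binary annotation task; the toy label lists are invented for illustration and are not the study's annotations:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's Kappa for two annotators labelling the same items."""
    assert len(ann1) == len(ann2)
    n = len(ann1)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected agreement: chance that both independently pick the same label.
    c1, c2 = Counter(ann1), Counter(ann2)
    labels = set(ann1) | set(ann2)
    p_e = sum((c1[l] / n) * (c2[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy annotations (M = misrepresentation, N = not-misrepresentation).
a1 = ["M", "M", "N", "N", "M", "N"]
a2 = ["M", "N", "N", "N", "M", "N"]
print(round(cohens_kappa(a1, a2), 3))  # → 0.667
```

A value of 0.765, as obtained over the whole dataset, falls in the range conventionally read as substantial agreement.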
    </sec>
    <sec id="sec-5">
      <title>5. Predicting misrepresentation</title>
      <p>5.1. Preprocessing
In preparation for the text classification algorithms, we normalize the text, remove noise,
and handle problematic expressions such as URLs. Stop words were removed.
Tokenization was done with TweetTokenizer from the NLTK library, which takes into account the
specific linguistic structures prevalent in social media.</p>
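The tokenization step can be sketched as follows, assuming NLTK is installed; the stop word list here is a small invented stand-in for the Trump-adapted list described below:

```python
from nltk.tokenize import TweetTokenizer

# TweetTokenizer keeps Twitter-specific units (handles, hashtags, URLs) intact.
tokenizer = TweetTokenizer()

# Illustrative stop word list only; the study adapts the list to Trump's
# language use (most verbs, conjunctions and pronouns are retained).
STOP_WORDS = {"the", "a", "an", "of", "to", "in"}

def preprocess(tweet):
    tokens = tokenizer.tokenize(tweet.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess('Wow, "@foxandfriends" is the most powerful TV show!')
```

Unlike a whitespace or word-punctuation tokenizer, this keeps `@foxandfriends` as a single token, which matters for tweets that quote or address other accounts.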
      <p>
        The list of stop words is updated to reflect Trump’s language use. Since Trump’s language use lacks
complexity, removing frequent words may result in the removal of a significant amount of meaning
because Trump’s vocabulary contains many elements that would normally count as stop words. As a
result, removing such words could influence the classification in a negative way as important patterns
linked to his language use might be lost. This might cause an issue with the Naïve Bayes classifier,
for example, since it is a probabilistic model that bases its decisions on the frequency with which the
different tokens occur under a certain label during the training phase. Thus, to avoid losing potentially
important information, most verbs and conjunctions are kept while pronouns were removed from the
list of stop words.</p>
      <p>5.2. Classification
In the next step, the TfidfVectorizer (term frequency-inverse document frequency) from
scikit-learn was used for vectorization, i.e., a bag of words representation of all tweets was created,
containing the tf-idf values for each word across all tweets. Normally, this vectorizer does its own
tokenization, i.e., a library-internal module splits the running text into tokens. For the purpose of
this paper, we overwrite this module since tweets have a unique format that can be challenging for
tokenization. Since scikit-learn still requires tokenization for internal reasons, we follow the method
introduced in David Batista’s blog and pass a dummy tokenizer and preprocessor that return the
input unchanged to scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>5.3. Results
The text classification algorithms employed in this study are Naïve Bayes, a Support Vector Classifier
(with a linear kernel) and Random Forest, all imported from the scikit-learn library. All classifiers were
left in their default configurations for learning purposes. On average, all three classifiers performed at
around 70% based on mean accuracy (Naïve Bayes: 0.71, SVC: 0.73, Random Forest: 0.68). The accuracy
scores ranged between 71% and 72% for Naïve Bayes, between 72% and 74% for SVC, and between 67%
and 69% for Random Forest based on a 95% confidence interval. These scores are promising considering
that a larger sample would likely improve classifier performance, since the content of
the training data is expected to be more balanced with more data. The results also hold up
even when compared with related work: Miranda et al.’s [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] evidence-based automated fact checking
platform presents predictions maintained to be correct 58% of the time, and Pérez-Rosas et al.’s [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
fake news detector reportedly presents accuracies between 50% and 76% depending on the domain
associated with the dataset that is used. The results also suggest that the Support Vector Machine
classifier performed slightly better than the other two.</p>
      <p>5.4. Error analysis
The performance of Naïve Bayes and SVC is slightly reduced on test sets containing longer tweets.
That is, the performance seems to decrease the higher the number of tokens in the testing set is. This
could be due to an imbalance between the training set and the testing set. Random Forest was probably
less affected by this because it relies on the decision of multiple classifiers that reach their individual
decisions based on a significantly smaller sample size than SVC and Naïve Bayes during the same run.
      </p>
      <p>Incorrect predictions often arise when the language in the tweet is uncharacteristically complex,
for instance when a supporter’s tweet is copied and posted through Trump’s account and contains
language that is more complex than Trump’s typical writing style. Here is such an example:
(3)</p>
      <sec id="sec-5-1">
        <title>Copied and posted tweet</title>
        <p>"@racheljoycowley: I’m done with Macy’s. Apparently, they follow the trend of trying to force
legislation on every American freedom. Done!" (ID: 616646192609558528)
In an arbitrary run, this tweet was incorrectly predicted as not-misrepresentation by all three classifiers.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Tracking misrepresentation over time</title>
      <p>Having annotated the tweets as either misrepresentation or not, counting them per year
allows for a longitudinal view of the tweeting behavior of Donald J. Trump.</p>
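The per-year counts underlying this longitudinal view can be obtained with a simple aggregation; the (year, label) pairs below are invented for illustration and do not reproduce the dataset's actual distribution:

```python
from collections import Counter

# Hypothetical annotated items: (year posted, label).
items = [
    (2016, "misrepresentation"), (2016, "not-misrepresentation"),
    (2019, "misrepresentation"), (2020, "misrepresentation"),
    (2020, "misrepresentation"), (2020, "not-misrepresentation"),
]

counts = Counter(items)
for year in sorted({y for y, _ in items}):
    m = counts[(year, "misrepresentation")]
    n = counts[(year, "not-misrepresentation")]
    print(year, m, n)
```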
      <p>Figure 1 hints at a possible correlation between the number of tweets containing misrepresentations
that Trump posted and the years in which he was active in presidential election campaigns. The overall
trend towards fewer tweets was already evident in the pre-annotation dataset and is thus not surprising.
What is interesting is the way in which the trend appears to change abruptly in 2018 and then again in
2019. Both categories experience an increase between 2018 and 2019. However, whereas the instances of
not-misrepresentation begin to decrease from 2019 on, the instances of misrepresentation rise sharply.</p>
      <p>It is also worth pointing out that, given the connection to campaign years, the graph may be interpreted
as evidence for the reliability of the created dataset, since misrepresentation is probably more likely to occur
during political campaigns due to the nature of politics.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion and summary</title>
      <p>Future work could include a general analysis of Trump’s language use, which might help to improve
the preprocessing. This could take the form of a linguistic feature engineering study to determine his
writing style. The findings could then be used to create a stop word list that better reflects
Trump’s language use, although a more generalized approach based on country and political
affiliation may be more helpful for the study of misrepresentation itself because it would be easier to
generalize. It may also prove to be more practical in terms of implementation.</p>
      <p>The process of evaluating the performance of the classifiers could be simplified and improved with
an explainer. The LIME (Local Interpretable Model-Agnostic Explanations) Text Explainer was difficult
to implement even after replacing the LinearSVC classifier with an SVC classifier with a linear kernel.
The two should produce similar results, since both implement a linear support vector machine. The change
was necessary because LinearSVC does not support the method predict_proba, which calculates the
probabilities for each class prediction. It would have been difficult to evaluate the performance of the
LinearSVC classifier without predict_proba. LIME was chosen because, in theory, it should work well
with all three classifiers as long as the models are able to “predict the probabilities of the categories.”
According to Albrecht et al., LIME “works locally by taking a look at each prediction separately. This is
achieved by modifying the input vector to find the local components that the predictions are sensitive
to” [20, p. 195]. Then, “[f]rom the behavior in the vicinity of the vector, it will draw conclusions about
which components are more or less important” and “visualize the contributions and explain the decision
mechanism of the algorithm for individual documents” [20, p. 195]. However, LIME’s implementation
and interpretation proved challenging, so it was ultimately not used in the error
analysis. And yet, evaluating the performance of the classifiers would probably have been significantly
more straightforward with such an explainer; in its absence, the process is considerably slower.
Improving explainability would have practical implications for future research, for instance, although
we favor a theory-driven approach, together with the misrepresentation dataset the findings could be
used to improve or expand available vocabularies employed in current fact-checking systems.</p>
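The predict_proba issue described above can be illustrated directly; the toy data here is generated only so the model can be fitted and is not related to the tweet dataset:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# LinearSVC exposes no predict_proba, which explainers like LIME rely on.
assert not hasattr(LinearSVC(), "predict_proba")

# SVC with a linear kernel and probability=True fits an additional
# calibration step (Platt scaling) and can output class probabilities.
clf = SVC(kernel="linear", probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:3])
print(proba.shape)  # → (3, 2)
```

Each row of `proba` sums to one, which is exactly the interface an explainer needs to perturb inputs and observe how predicted probabilities shift.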
      <p>To summarize, this study contributes to research on pragmatically relevant phenomena with
computational linguistic methods by discussing how to account for various aspects of the context at
different stages of the research process. Specifically, efforts were made to retain contextual information
related to the social and political contexts. To this end, a framework was developed for the pragmatic
analysis of political misrepresentation with computational methods. Based on the framework, annotation
guidelines were written to enable the creation of a misrepresentation dataset, which was then employed
to train supervised machine learning algorithms used for text classification. The three algorithms
performed at around 70% with SVC performing slightly better than the other two algorithms.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Limitations</title>
      <p>The findings of this study may be limited by the small size of the dataset, the data selection process,
data source, and the format of the text data.</p>
      <p>The study relied on tweets posted from Trump’s Twitter account. Consequently, the methods applied
in this study might yield different results on the discourses of other individuals, especially if one
considers Trump’s unique language use and political affiliation. Widening the scope of the study to
include political discourse from a larger number of politicians is likely to lead to better insight into
political misrepresentation. To this end, our study will hopefully provide a basis for further research
into a topic that has not received a lot of attention so far.</p>
      <p>The relative novelty of pragmatic research into misrepresentation with computational linguistic
methods also explains the chosen format. The smaller number of tokens present in tweets was expected
to facilitate computation given the theoretical underpinning. Although the small format may limit the
study’s generalizability, it facilitates the qualitative analysis of the results which will help us to widen
the scope of the analysis in future studies on this topic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dale</surname>
          </string-name>
          ,
          <article-title>Trump's avalanche of dishonesty: Fact-checking 102 of his false claims from this fall</article-title>
          ,
          <year>2023</year>
          . URL: https://edition.cnn.com/2023/12/01/politics/trump-dishonesty-avalanche-102-fall-false-claims/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nogueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Secker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Garrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mitchel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <article-title>Automated fact checking in the news room</article-title>
          , in: L. Liu,
          <string-name>
            <given-names>R.</given-names>
            <surname>White</surname>
          </string-name>
          (Eds.),
          <source>The World Wide Web Conference</source>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>3579</fpage>
          -
          <lpage>3583</lpage>
          . doi: 10.1145/3308558.3314135.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vlachos</surname>
          </string-name>
          ,
          <article-title>Emergent: a novel data-set for stance classification</article-title>
          , in: K. Knight, A. Nenkova, O. Rambow (Eds.),
          <source>Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, San Diego, CA, USA,
          <year>2016</year>
          , pp.
          <fpage>1163</fpage>
          -
          <lpage>1168</lpage>
          . doi: 10.18653/v1/N16-1138.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Rashkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Jang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Volkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <article-title>Truth of varying shades: Analyzing language in fake news and political fact-checking</article-title>
          , in:
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hwa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Riedel</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          , Association for Computational Linguistics, Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>2931</fpage>
          -
          <lpage>2937</lpage>
          . doi:10.18653/v1/D17-1317.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>"Liar, liar pants on fire": A new benchmark dataset for fake news detection</article-title>
          , in: R. Barzilay, M.-Y. Kan (Eds.),
          <source>Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics: Short Papers</source>
          , Association for Computational Linguistics
          , Vancouver, Canada,
          <year>2017</year>
          , pp.
          <fpage>422</fpage>
          -
          <lpage>426</lpage>
          . URL: https://aclanthology.org/P17-2067. doi:10.18653/v1/P17-2067.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wiechmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kerz</surname>
          </string-name>
          ,
          <article-title>A language-based approach to fake news detection through interpretable features and BRNN</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Aker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM)</source>
          , Association for Computational Linguistics, Barcelona, Spain,
          <year>2020</year>
          , pp.
          <fpage>14</fpage>
          -
          <lpage>31</lpage>
          . URL: https://aclanthology.org/2020.rdsm-1.2.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Miani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hills</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bangerter</surname>
          </string-name>
          ,
          <article-title>LOCO: The 88-million-word language of conspiracy corpus</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>54</volume>
          (
          <year>2022</year>
          )
          <fpage>1794</fpage>
          -
          <lpage>1817</lpage>
          . doi:10.3758/s13428-021-01698-z.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kuzmin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Larionov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pisarevskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Smirnov</surname>
          </string-name>
          ,
          <article-title>Fake news detection for the Russian language</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Aker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 3rd International Workshop on Rumours and Deception in Social Media (RDSM)</source>
          , Association for Computational Linguistics, Barcelona, Spain,
          <year>2020</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>57</lpage>
          . URL: https://aclanthology.org/2020.rdsm-1.5/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pérez-Rosas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lefevre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          ,
          <article-title>Automatic detection of fake news</article-title>
          , in:
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bender</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Derczynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isabelle</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 27th International Conference on Computational Linguistics</source>
          , Association for Computational Linguistics, Santa Fe, NM, USA,
          <year>2018</year>
          , pp.
          <fpage>3391</fpage>
          -
          <lpage>3401</lpage>
          . URL: https://aclanthology.org/C18-1287.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yeomans</surname>
          </string-name>
          ,
          <article-title>The straw man effect: Partisan misrepresentation in natural language</article-title>
          ,
          <source>Group Processes &amp; Intergroup Relations</source>
          <volume>25</volume>
          (
          <year>2022</year>
          )
          <fpage>1905</fpage>
          -
          <lpage>1924</lpage>
          . doi:10.1177/13684302211014582.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Yule</surname>
          </string-name>
          ,
          <source>Pragmatics</source>
          , Oxford Introductions to Language Study, Oxford University Press, Oxford, UK,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T. A.</given-names>
            <surname>van Dijk</surname>
          </string-name>
          ,
          <article-title>Ideology and discourse analysis</article-title>
          ,
          <source>Journal of Political Ideologies</source>
          <volume>11</volume>
          (
          <year>2006</year>
          )
          <fpage>115</fpage>
          -
          <lpage>140</lpage>
          . doi:10.1080/13569310600687908.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <article-title>Political discourse</article-title>
          , in:
          <string-name>
            <given-names>D.</given-names>
            <surname>Tannen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schiffrin</surname>
          </string-name>
          (Eds.),
          <source>The Handbook of Discourse Analysis</source>
          , Blackwell Handbooks in Linguistics, Wiley Blackwell, Malden and Oxford,
          <year>2015</year>
          , pp.
          <fpage>775</fpage>
          -
          <lpage>794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.-J.</given-names>
            <surname>Noh</surname>
          </string-name>
          ,
          <source>Metarepresentation: A Relevance-Theory Approach</source>
          , volume
          <volume>69</volume>
          , John Benjamins Publishing Company, Amsterdam,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wilson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sperber</surname>
          </string-name>
          ,
          <article-title>Relevance theory</article-title>
          , in: L. R. Horn, G. Ward (Eds.),
          <source>The Handbook of Pragmatics</source>
          , Wiley, Malden, MA,
          <year>2006</year>
          , pp.
          <fpage>607</fpage>
          -
          <lpage>632</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Searle</surname>
          </string-name>
          ,
          <source>Intentionality: An Essay in the Philosophy of Mind</source>
          , Cambridge University Press, Cambridge, UK,
          <year>2012</year>
          . doi:10.1017/CBO9781139173452.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Poniewozik</surname>
          </string-name>
          ,
          <article-title>Watching 'Fox &amp; Friends,' Trump sees a two-way mirror</article-title>
          ,
          <year>2017</year>
          . URL: https://www.nytimes.com/2017/07/19/arts/television/donald-trump-fox-friends.html.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <article-title>Trump Twitter Archive</article-title>
          ,
          <year>2016</year>
          . URL: https://www.thetrumparchive.com/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Batista</surname>
          </string-name>
          ,
          <article-title>Applying scikit-learn TfidfVectorizer on tokenized text</article-title>
          ,
          <year>2018</year>
          . URL: https://www.davidsbatista.net/blog/2018/02/28/TfidfVectorizer/.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Albrecht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramachandran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winkler</surname>
          </string-name>
          ,
          <source>Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications</source>
          , O'Reilly Media, Inc., Sebastopol, CA,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>