                                Overview of the CLAIMSCAN-2023: Uncovering Truth
                                in Social Media through Claim Detection and
                                Identification of Claim Spans
                                Megha Sundriyal1 , Md Shad Akhtar1 and Tanmoy Chakraborty2
                                1 IIIT Delhi, India
                                2 IIT Delhi, India


                                               Abstract
                                               The rapid development of online social media platforms has enabled a significant increase in
                                               content creation and information exchange, which has been highly beneficial. However, these
                                               platforms have also become a haven for those who disseminate false information, propaganda, and fake
                                               news. Claims are essential in forming our perceptions of the world, but sadly, they are frequently used
                                               to trick people by those who spread false information. To address this problem, social media giants
                                               employ content moderators to separate fake news from genuine content. However, the sheer volume
                                               of information makes it difficult to identify fake news effectively. Therefore, it has become crucial to
                                               automatically identify social media posts that make such claims, check their veracity, and differentiate
                                               between credible and false claims. In response, we presented CLAIMSCAN in the 2023 Forum for
                                               Information Retrieval Evaluation (FIRE’2023). The primary objectives centered on two crucial tasks: Task
                                               A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the
                                               words or phrases within the post that form the claim. Task A received 40 registrations, demonstrating a
                                               strong interest and engagement in this timely challenge. Meanwhile, Task B received 28 registrations,
                                               highlighting its significance in the digital era of misinformation.

                                               Keywords
                                               Claims, Social Media, Claim Detection, Claim Span Identification, Twitter, Misinformation, Fact-Checking




                                1. Introduction
                                The rapid growth of online social media platforms has facilitated a significant increase in
                                content creation and information exchange, which has been highly beneficial. However, these
                                platforms have also become a breeding ground for those who spread malicious rumors, fake
                                news, propaganda, and misinformation. Claims play a vital role in shaping our understanding
                                of the world, but unfortunately, they are often used by purveyors of fake news to deceive people.
                                The COVID-19 “Infodemic” is a prime example of this phenomenon, which has resulted in the
                                widespread dissemination of false information about politics and social issues, as well as fake
                                medical claims [1]. To address this issue, social media giants hire content moderators to separate
                                fake news from genuine content. However, the sheer volume of information makes it difficult
                                to identify fake news effectively. As a result, automatically identifying posts on social media

                                Forum for Information Retrieval Evaluation, December 15-18, 2023, India
                                meghas@iiitd.ac.in (M. Sundriyal); shad.akhtar@iiitd.ac.in (M. S. Akhtar); chak.tanmoy.iit@gmail.com
                                (T. Chakraborty)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                       CEUR Workshop Proceedings (CEUR-WS.org)




Table 1
Task A: Representative examples of claims and non-claims.
                                               Text                                                  Claim
  My heartfelt gratitude goes out to the men and women in uniform who did not back down               No
  from putting their lives in danger to save the lives of our citizens in difficult circumstances.
  According to research into the dangers of cooking with aluminum foil, some of the toxic             Yes
  metal can contaminate food. This is especially true when cooking or heating spicy or
  acidic foods in foil. Aluminum levels in the body have been linked to osteoporosis and
  Alzheimer’s disease.
  Furthermore, health insurers should recognize alternative medicine as a treatment option            No
  because there is a chance of recovery.
  Toothpaste Zaps Pimples. Don’t pop your pimples! Daily Glow recommends applying                     Yes
  toothpaste to a pimple before bed and washing it off with warm water when you wake
  up in the morning. Toothpaste draws impurities out of pores while also drying the skin
  and shrinking the pimple.


Table 2
Task B: Representative examples of claims and their claim spans.
                                  Claim                                               Claim Span
  According to research into the dangers of cooking with aluminum           cooking with aluminum foil,
  foil, some of the toxic metal can contaminate food. This is especially    some of the toxic metal can
  true when cooking or heating spicy or acidic foods in foil. Aluminum      contaminate food.
  levels in the body have been linked to osteoporosis and Alzheimer’s
  disease.
  Toothpaste Zaps Pimples. Don’t pop your pimples! Daily Glow rec-          Toothpaste Zaps Pimples.
  ommends applying toothpaste to a pimple before bed and washing
  it off with warm water when you wake up in the morning. Tooth-
  paste draws impurities out of pores while also drying the skin and
  shrinking the pimple.


platforms containing such claims, verifying their validity, and distinguishing between credible
and false claims has emerged as a critical research problem in NLP.
   The concept of a claim, defined by Toulmin [2] as an assertion that deserves attention, is cen-
tral to Argument Mining (AM). However, the segregation of claims is complex and challenging
due to language structure and context variation across different sources. Differentiating between
claims and non-claims is highly subjective and tricky, making it difficult for human annotators
and advanced state-of-the-art neural models alike. Table 1 provides a few examples of claims and
non-claims for illustration. Although claim detection systems have advanced, there
is still room for improvement in their precision and efficiency [3]. The dynamic nature of
online social media platforms presents a significant challenge. New types of misinformation can
emerge quickly, and keeping up with changing trends and patterns can take time. In addition to
the challenges of efficiently identifying claims, another factor affecting the fact-checking task is
extracting precise snippets of the claim from the entire social media post, which often contains
extraneous, irrelevant text [4]. Table 2 depicts claims and their corresponding claim spans.
   Disentangling such argumentative units of misinformation from benign statements has
numerous advantages, including performing downstream tasks like claim check-worthiness and
verification, adding explainability to the coarse-grained claim detection task, and simplifying
the fact-checking process for human fact-checkers. This task, however, is complex and requires
overcoming technical obstacles such as language complexity and variability.
   To this end, we present the CLAIMSCAN-2023, a shared task in the 2023 edition of the Forum
for Information Retrieval Evaluation workshop. Through this shared task, we aim to develop
systems that can effectively detect and identify claims within social media text. To accomplish
this, we propose two sub-tasks:
    • Task A Claim Detection: Given a social media post, the task is to identify whether or not
      the post contains a claim.
    • Task B Claim Span Identification: Given a social media post containing a claim, the
      objective is to pinpoint the exact phrase of the post that constitutes the claim.


2. Background
The growth of online social media has greatly amplified the spread of misinformation, primarily
through disseminating false claims. This presents a significant risk to online users, as misinfor-
mation can spread rapidly without any effective countermeasures in place. Consequently, tasks
related to identifying and handling claims have gained considerable prominence within the field
of Natural Language Processing (NLP), particularly as a crucial precursor to automated fact
verification. Claims, as a core component of misinformation, have been the subject of extensive
research from multiple perspectives in recent years. This includes areas such as Claim Detection
[5, 6, 3], Claim Check-worthiness [7, 8], Claim Span Identification [4, 9], Claim Normalization
[10], and Claim Verification [11, 12, 13, 14].
   Pioneering efforts in the study of claims can be attributed to Bender et al. [15], who introduced
the “Authority and Alignment in Wikipedia Discussions” corpus, which comprised around 365
discussions sourced from Wikipedia Talk Pages. This work garnered substantial attention
from researchers focusing on claims and served as the cornerstone for the challenging field
of automated claim detection. Over the last decade, the investigation of online claims has
gained some traction within the NLP research community. A primary attempt was made by
Rosenthal and McKeown [16]; they used a supervised approach based on sentiment and word-
gram derivatives to mine claims from discussion platforms. Although their work was limited
to traditional machine-learning approaches, it laid the groundwork for future research
in this field. Following research on claim detection, linguistically motivated features such as
sentiment analysis, syntax, context-free grammar, and parse trees were heavily emphasized
[17, 18, 19].
   Given that the majority of studies at the time focused on domain-specific formal texts,
Daxenberger et al. [20] addressed this limitation by conducting cross-domain claim detection
across six diverse datasets, revealing both distinctive and shared features across different
domains. Recent research has led to the use of Large Language Models (LLMs), which hold
great promise. Chakrabarty et al. [5] demonstrated the power of fine-tuning by adapting a ULMFiT
language model to a large Reddit corpus of approximately 5 million
opinionated claims. A generalized claim detection model was proposed by Gupta et al. [6] that
detects the presence of a claim in any online text, regardless of source. They worked with both
structured and unstructured data by training a combination of linguistic encoders (part-of-speech
and dependency trees) and a contextual encoder. Because large language models incur significant
computational overheads, Sundriyal et al. [3] proposed a lighter, definition-centric
framework that generates discernible feature spaces for individual classes
without relying on LLMs. Several computational
social science researchers have expressed interest in the CLEF-2020 shared task organized by the
CheckThat! Lab [21]. Williams et al. [22] won the task by fine-tuning the RoBERTa model [23],
which was further strengthened by mean pooling and dropout. With their RoBERTa vectors
supplemented with Twitter meta-data, Nikolov et al. [24] bagged second position.
   The existing body of claim detection research primarily focuses on identifying claims at the
sentence level rather than delving into the finer details of exact claim spans. As a result, a recent
advancement in this field has moved away from broad, sentence-level claim identification models
and toward more detailed, fine-grained claim span identification [4]. The idea of rationales
was first presented by Zaidan et al. [25], who highlighted segments of the text that justified
the assigned label. They reported a significant improvement in performance after
incorporating these rationales into the training process for sentiment classification of movie
reviews. In the field of argumentation mining, Trautmann et al. [26] released the AURC-8
dataset, which includes token-level span annotations for the argumentative components of
stance, as well as their corresponding labels. The SemEval community has also initiated span
identification tasks in related domains of argument mining, such as toxic comments
[27] and propaganda techniques [28]. These shared tasks attracted many solutions built on
transformers [29], convolutional neural networks [30], data augmentation techniques [31, 32, 33],
and ensemble frameworks [34, 35]. Wührl and Klinger [36] compiled a corpus of around 1200
biomedical-related tweets with claim phrases. Apart from English, argument extraction has
also been examined for other languages like Greek [37, 38] and German [39]. In a recent study
conducted by Sundriyal et al. [4], a systematic approach was presented for identifying claim
spans within social media posts. Additionally, they created an extensive Twitter corpus manually
annotated specifically for this task.


3. Tasks Description and Settings
The CLAIMSCAN-2023 shared task consists of two sub-tasks: Claim Detection and Claim Span
Identification. Participants were free to engage in one or both sub-tasks.

Task A (Claim Detection): Given a social media post, the objective is to identify whether
a claim is present within a provided post or not. This task can be quite demanding, as claims
exhibit diverse structures and can be concealed within extensive text segments. Hence, the
system needs to discern patterns and linguistic cues that are indicative of claims, which may
encompass assertive language, explicit statements on a topic, and allusions to supporting
evidence or sources.

Task B (Claim Span Identification): Following the initial determination of whether the
post contains a claim, the subsequent step entails pinpointing the precise span of the claim
within the post. It is crucial for the system to precisely identify the specific words or phrases
that form the claim, as this information plays a pivotal role in assessing its accuracy during the
fact-checking process.


4. Datasets
To accomplish Task A (Claim Detection), we utilize a publicly available large-scale claim
detection dataset developed and curated for tweets [6]. The dataset was manually annotated
extensively using carefully crafted guidelines, yielding a collection of 9,894 tweets labeled as
either containing a claim or not containing a claim. The statistics of the dataset are detailed
in Table 3. For Task B (Claim Span Identification), we use the CURT dataset, which contains
9,458 claim spans from 7,555 tweets [4]. Table 4 contains the dataset statistics and details.
This dataset has also been annotated manually, with each span identified and tagged using the
BIO (Begin-Inside-Outside) encoding scheme [40], as shown in Table 5. This tagging scheme
indicates whether each word in the tweet is part of a claim span and, if so, whether it begins
the span or continues it.
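
To make the encoding concrete, the following minimal sketch (an illustration in Python, not the
organizers' preprocessing code; the helper name spans_to_bio and the whitespace tokenization are
assumptions) shows how a character-level claim span can be projected onto tokens as BIO tags:

    def spans_to_bio(text, claim_spans):
        """Map (start, end) character offsets of claim spans onto per-token
        BIO tags over a simple whitespace tokenization (illustrative only)."""
        tokens, tags, cursor = [], [], 0
        for token in text.split():
            start = text.index(token, cursor)   # character offset of this token
            end = start + len(token)
            cursor = end
            tag = "O"
            for span_start, span_end in claim_spans:
                if start >= span_start and end <= span_end:
                    # the first token of a span is tagged B, the rest I
                    tag = "B" if start == span_start else "I"
                    break
            tokens.append(token)
            tags.append(tag)
        return tokens, tags

    text = "Home Schooling prevents loads of #Coronavirus deaths."
    print(spans_to_bio(text, [(0, len(text))]))
    # (['Home', 'Schooling', ..., 'deaths.'], ['B', 'I', 'I', 'I', 'I', 'I', 'I'])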

Table 3
Task A: Statistics of claim detection dataset.
                                     Split       Claim       Non-claim
                                     Train set    7354             1055
                                     Test set     1296              189
                                     Overall      8650             1244



Table 4
Task B: Statistics of claim span identification dataset.
                    Dataset                              Train        Test    Validation
                    Total no. of claims                    6044        755       756
                    Avg. length of tweets                  27.40      26.93     27.29
                    Avg. length of spans                   10.90      10.97     10.71
                    No. of span per tweet                  1.25       1.20       1.27
                    No. of single span tweets              4817        629       593
                    No. of multiple span tweets            1201        121       161

  We took great care in developing annotation guidelines for both tasks, which went through
several iterations and have already been published at two highly regarded peer-reviewed
conferences. In addition, to ensure the quality of the data, we conducted pilot studies and
enlisted human annotators who have a strong understanding of claims and are active social
Table 5
A few examples of social media posts from CURT dataset [4] and their corresponding BIO tags depicting
claim spans.
  Text                                                           Span
  @mcford77 @floradoragirl Exactly. that is the point. Home      {O, O, O, O, O, O, O, B, I, I, I, I, I, I}
  Schooling prevents loads of #Coronavirus deaths.
  @JoeySalads Zero. #Covid19 is a hoax. The dead people died     {O, O, B, I, I, I, B, I, I, I, I, I, I, O, O,
  of something else. Where are the rest of the Corpses? If       O, O, O, O, O, O, O, O, O, O, O, O, O,
  #coronavirus is real, then NYC would not be the greatest hit   O, O, O, O, O, O, O, O, O, O, O, O, O,
  spot of DEATH from it in the world by a factor of five. What   O, O, O, O, O, O, O, O, O}
  about Mexico City? Sydney?


media users to manually annotate the datasets. This rigorous process helps to ensure data
accuracy and reliability, resulting in more robust and reliable models. More details about the
datasets can be found in Gupta et al. [6] and Sundriyal et al. [4].


5. Evaluation Metrics
The evaluation metric for both tasks is the F1 score. For Task A, we compute Macro-F1 scores
using the scikit-learn library in Python, as used by existing claim detection systems [6, 3, 5].
For Task B, as the final labels for spans follow the BIO tagging notation, our task becomes
a sequence labeling task. We compute Token-F1 scores following existing span detection
methods [27, 4]. Each team was allowed a maximum of 10 submissions, and the best scores
obtained on test data were used for the leaderboard.
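
As an illustration of the two scoring schemes (a minimal sketch under the assumption that
predictions are available as plain label lists; this is not the official evaluation script),
Task A's Macro-F1 can be obtained directly from scikit-learn, while Task B's Token-F1 can be
computed per post over the tokens tagged as part of a claim and averaged across posts:

    from sklearn.metrics import f1_score

    # Task A: macro-averaged F1 over binary claim (1) / non-claim (0) labels.
    gold = [1, 0, 1, 1, 0]
    pred = [1, 0, 0, 1, 1]
    print(f1_score(gold, pred, average="macro"))   # macro-F1 over both classes, ~0.58 here

    # Task B: token-level F1 for a single post; in this sketch the final score
    # would be the mean of this value over all test posts.
    def token_f1(gold_tags, pred_tags):
        gold_idx = {i for i, t in enumerate(gold_tags) if t != "O"}
        pred_idx = {i for i, t in enumerate(pred_tags) if t != "O"}
        if not gold_idx and not pred_idx:
            return 1.0
        overlap = len(gold_idx & pred_idx)
        precision = overlap / len(pred_idx) if pred_idx else 0.0
        recall = overlap / len(gold_idx) if gold_idx else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    print(token_f1(["O", "B", "I", "I", "O"], ["O", "B", "I", "O", "O"]))  # ~0.8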


6. Participating Systems and Results
Task A received 40 registrations, and Task B received 28 registrations. Out of these, 6 teams
submitted their official runs for Task A, while 4 submitted for Task B. We first describe the
teams that submitted system description papers.

    • Team NLytics [41]: Team NLytics participated in both subtasks. For Task A, they fine-
      tuned the RoBERTa model [23] using RoBERTaForSequenceClassification, optimizing it
      with a binary cross-entropy loss. They employed the AdamW optimizer with an initial
      learning rate of 2e-5. The optimizer followed a schedule in which the learning rate
      increased linearly from 0 to the initial rate during a warm-up period and then decreased
      linearly back to 0. The training process encompassed 20 epochs (a minimal sketch of this
      common fine-tuning setup is given after this list). In Task B, they utilized RoBERTa and
      added a linear-chain Conditional Random Field (CRF) layer [42]. As RoBERTa operates on
      byte pair encoding (BPE) units, whereas the CRF requires whole words, only the initial
      tokens of words were used as input to the CRF, with any word-continuation tokens excluded.
      Training ran for up to 20 epochs, with an early stopping callback monitoring the model’s
      performance on the validation set.
    • Team mjs227 [43]: Team mjs227 participated only in Task B. For identifying claim
      spans, they used the positional transformer architecture, a variant of the transformer
      encoder that employs a position-sensitive attention mechanism called positional attention.
      The underlying language model in their proposed system was RoBERTa-base [23].
    • Team CODE [41]: Team CODE participated in both subtasks. In Task A, they fine-tuned
      a BERT-based model [44] optimized for sequence classification and trained for 5 epochs,
      using a binary cross-entropy (BCE) loss and an Adam optimizer. In Task B, they employed
      the RoBERTa model and fine-tuned it to predict a binary label (0 or 1) for every token,
      indicating whether the token is part of a claim or not. Instead of the BIO tag set, they
      adhered to IO tags. Their model was trained for 4 epochs, and to eliminate noise, they
      excluded instances with claim spans consisting of fewer than three words.
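
The sketch below illustrates the kind of fine-tuning setup shared by most Task A submissions: a
pre-trained RoBERTa encoder with a sequence classification head, the AdamW optimizer at a 2e-5
learning rate, and a linear warm-up/decay schedule. It is an assumed reconstruction built on the
Hugging Face Transformers API, not any team's released code, and the number of training steps and
the toy batch are purely illustrative.

    import torch
    from transformers import (RobertaTokenizerFast, RobertaForSequenceClassification,
                              get_linear_schedule_with_warmup)

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    num_training_steps = 20 * 500          # 20 epochs * batches per epoch (illustrative)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=int(0.1 * num_training_steps),
        num_training_steps=num_training_steps)

    # One illustrative training step on a toy batch (1 = claim, 0 = non-claim).
    batch = tokenizer(["#Covid19 is a hoax.", "Thank you to all frontline workers."],
                      padding=True, truncation=True, return_tensors="pt")
    labels = torch.tensor([1, 0])
    loss = model(**batch, labels=labels).loss   # cross-entropy over the two classes
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()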


Table 6
Task A results for the best run per team based on macro-F1 scores.
                                      Rank         Name         Macro-F1
                                        1         NLytics        0.7002
                                        2      bhoomeendra       0.6900
                                        3         amr8ta         0.6678
                                        4          CODE          0.6526
                                        5      michaelibrahim    0.6324
                                        6         pakapro        0.4321



   The official results for Task A are presented in Table 6. Among the six participating teams,
Team NLytics clinched the top position, attaining a noteworthy macro-F1 score of 0.7002.
Following closely, Team bhoomeendra secured the second position,1 and the third position was
bagged by Team amr8ta.1 In the fourth spot for Task A was Team CODE, achieving a macro-F1
score of 0.6526. The fifth and sixth positions were occupied by Team michaelibrahim and
Team pakapro, with macro-F1 scores of 0.6324 and 0.4321, respectively.1 It is worth noting that
the top five teams scored within roughly 0.07 macro-F1 of each other, while the last-placed team
trailed by a substantial margin.
   The official results for Task B are in Table 7. Team mjs227 achieved the highest ranking among
all participating teams, with a token-F1 of 0.8344. To identify claim spans, they harnessed the
positional transformer architecture, resulting in a substantial enhancement of their model’s
performance. Team bhoomeendra secured the second position in the task, achieving a token-F1
score of 0.8030.1 Team NLytics attained the third spot by fine-tuning a RoBERTa model for
predicting BIO tags for each token in the input sentence, complementing it with a Conditional
Random Field (CRF) layer. In fourth place was Team CODE, who opted for IO tags instead of
BIO tags to signify whether a token was part of the claim or not.


    1
        They did not release their system description papers.
Table 7
Task B results for the best run per team based on token-F1 scores.
                                Rank        Name          Token-F1
                                  1         mjs227          0.8344
                                  2      bhoomeendra        0.8030
                                  3        NLytics          0.7821
                                  4         CODE            0.5714



7. Conclusion
We presented the first edition of the CLAIMSCAN-2023 shared task. This shared task encom-
passed two vital subtasks within the fact-checking process, ranging from detecting claims in
social media posts to determining the exact claim spans. These tasks collectively contribute
to developing technology that aids human fact-checkers in their endeavors. We witnessed
significant participation, with Task A drawing 40 registrations and Task B garnering 28 regis-
trations. A total of 6 teams and 4 teams submitted official runs for Tasks A and B, respectively.
We discussed the main findings of the three participating teams that submitted
system description papers. We look forward to enriching our datasets
with more examples, diverse information sources, and languages. Our overarching objective is
to share our insights and inspire researchers to bridge the gaps in the field, ultimately enhancing
the effectiveness of fact-checking systems and contributing to a safer online environment. In the
future, we also aim to expand the scope of our task to encompass a broader range of modalities,
such as images.


References
 [1] S. B. Naeem, R. Bhatti, The covid-19 ’infodemic’: a new front for information professionals,
     Health information and libraries journal 37 (2020) 233–239. URL: https://europepmc.org/
     articles/PMC7323420. doi:10.1111/hir.12311.
 [2] S. E. Toulmin, The uses of argument, Cambridge university press, 2003.
 [3] M. Sundriyal, P. Singh, M. S. Akhtar, S. Sengupta, T. Chakraborty, Desyr: definition
     and syntactic representation based claim detection on the web, in: Proceedings of the
     30th ACM International Conference on Information & Knowledge Management, 2021, pp.
     1764–1773.
 [4] M. Sundriyal, A. Kulkarni, V. Pulastya, M. S. Akhtar, T. Chakraborty, Empowering the
     fact-checkers! automatic identification of claim spans on Twitter, in: Proceedings of the
     2022 Conference on Empirical Methods in Natural Language Processing, Association for
     Computational Linguistics, Abu Dhabi, United Arab Emirates, 2022, pp. 7701–7715. URL:
     https://aclanthology.org/2022.emnlp-main.525.
 [5] T. Chakrabarty, C. Hidey, K. McKeown, IMHO fine-tuning improves claim detection, in:
     Proceedings of the 2019 Conference of the North American Chapter of the Association for
     Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
     Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp.
     558–563. URL: https://aclanthology.org/N19-1054. doi:10.18653/v1/N19-1054.
 [6] S. Gupta, P. Singh, M. Sundriyal, M. S. Akhtar, T. Chakraborty, LESA: Linguistic encapsu-
     lation and semantic amalgamation based generalised claim detection from online content,
     in: Proceedings of the 16th Conference of the European Chapter of the Association for
     Computational Linguistics: Main Volume, Association for Computational Linguistics,
     Online, 2021, pp. 3178–3188. URL: https://www.aclweb.org/anthology/2021.eacl-main.277.
 [7] I. Jaradat, P. Gencheva, A. Barrón-Cedeño, L. Màrquez, P. Nakov, ClaimRank: Detecting
     check-worthy claims in Arabic and English, in: Proceedings of the 2018 Conference of the
     North American Chapter of the Association for Computational Linguistics: Demonstra-
     tions, Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 26–30.
     URL: https://aclanthology.org/N18-5006. doi:10.18653/v1/N18-5006.
 [8] D. Wright, I. Augenstein, Claim check-worthiness detection as positive unlabelled learning,
     in: Findings of the Association for Computational Linguistics: EMNLP 2020, Association
     for Computational Linguistics, Online, 2020, pp. 476–488. URL: https://aclanthology.org/
     2020.findings-emnlp.43. doi:10.18653/v1/2020.findings-emnlp.43.
 [9] S. Mittal, M. Sundriyal, P. Nakov, Lost in translation, found in spans: Identifying claims in
     multilingual social media, arXiv:2310.18205 (2023).
[10] M. Sundriyal, T. Chakraborty, P. Nakov, From chaos to clarity: Claim normalization to
     empower fact-checking, arXiv:2310.14338 (2023).
[11] S. Zhi, Y. Sun, J. Liu, C. Zhang, J. Han, Claimverif: A real-time claim verification system
     using the web and fact databases, in: Proceedings of the 2017 ACM on Conference
     on Information and Knowledge Management, CIKM ’17, Association for Computing
     Machinery, New York, NY, USA, 2017, p. 2555–2558. URL: https://doi.org/10.1145/3132847.
     3133182. doi:10.1145/3132847.3133182.
[12] A. Hanselowski, H. Zhang, Z. Li, D. Sorokin, B. Schiller, C. Schulz, I. Gurevych, UKP-
     athene: Multi-sentence textual entailment for claim verification, in: Proceedings of the First
     Workshop on Fact Extraction and VERification (FEVER), Association for Computational
     Linguistics, Brussels, Belgium, 2018, pp. 103–108. URL: https://aclanthology.org/W18-5516.
     doi:10.18653/v1/W18-5516.
[13] A. Soleimani, C. Monz, M. Worring, Bert for evidence retrieval and claim verification,
     Advances in Information Retrieval 12036 (2020) 359.
[14] M. Sundriyal, G. Malhotra, M. S. Akhtar, S. Sengupta, A. Fano, T. Chakraborty, Document
     retrieval and claim verification to mitigate covid-19 misinformation, in: Proceedings of the
     Workshop on Combating Online Hostile Posts in Regional Languages during Emergency
     Situations, 2022, pp. 66–74.
[15] E. M. Bender, J. T. Morgan, M. Oxley, M. Zachry, B. Hutchinson, A. Marin, B. Zhang,
     M. Ostendorf, Annotating social acts: Authority claims and alignment moves in Wikipedia
     talk pages, in: Proceedings of the Workshop on Language in Social Media (LSM 2011),
     Association for Computational Linguistics, Portland, Oregon, 2011, pp. 48–57. URL: https:
     //www.aclweb.org/anthology/W11-0707.
[16] S. Rosenthal, K. McKeown, Detecting opinionated claims in online discussions, in:
     Proceedings of the 2012 IEEE Sixth International Conference on Semantic Computing, ICSC
     ’12, IEEE Computer Society, USA, 2012, p. 30–37. URL: https://doi.org/10.1109/ICSC.2012.59.
     doi:10.1109/ICSC.2012.59.
[17] R. Levy, Y. Bilu, D. Hershcovich, E. Aharoni, N. Slonim, Context dependent claim detection,
     in: Proceedings of COLING 2014, the 25th International Conference on Computational
     Linguistics: Technical Papers, 2014, pp. 1489–1500.
[18] M. Lippi, P. Torroni, Context-independent claim detection for argument mining, in:
     Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015, pp. 185–191.
[19] R. Levy, S. Gretz, B. Sznajder, S. Hummel, R. Aharonov, N. Slonim, Unsupervised corpus–
     wide claim detection, in: Proceedings of the 4th Workshop on Argument Mining, 2017, pp.
     79–84.
[20] J. Daxenberger, S. Eger, I. Habernal, C. Stab, I. Gurevych, What is the essence of a claim?
     cross-domain claim identification, in: Proceedings of the 2017 Conference on Empirical
     Methods in Natural Language Processing, Association for Computational Linguistics,
     Copenhagen, Denmark, 2017, pp. 2055–2066. URL: https://aclanthology.org/D17-1218.
     doi:10.18653/v1/D17-1218.
[21] A. Barrón-Cedeno, T. Elsayed, P. Nakov, G. Da San Martino, M. Hasanain, R. Suwaileh,
     F. Haouari, Checkthat! at clef 2020: Enabling the automatic identification and verification
     of claims in social media, in: European Conference on Information Retrieval, Springer,
     Nature Publishing Group, 2020, pp. 499–507.
[22] E. Williams, P. Rodrigues, V. Novak, Accenture at checkthat! 2020: If you say so: Post-hoc
     fact-checking of claims using transformer-based models, arXiv:2009.02431 (2020).
[23] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer,
     V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv:1907.11692
     (2019).
[24] A. Nikolov, G. D. S. Martino, I. Koychev, P. Nakov, Team alex at clef checkthat! 2020:
     Identifying check-worthy tweets with transformer models, arXiv:2009.02931 (2020).
     arXiv:2009.02931.
[25] O. Zaidan, J. Eisner, C. Piatko, Using “annotator rationales” to improve machine learning
     for text categorization, in: Human language technologies 2007: The conference of the
     North American chapter of the association for computational linguistics; proceedings of
     the main conference, 2007, pp. 260–267.
[26] D. Trautmann, J. Daxenberger, C. Stab, H. Schütze, I. Gurevych, Fine-grained argument
     unit recognition and classification, in: Proceedings of the AAAI Conference on Artificial
     Intelligence, volume 34, 2020, pp. 9048–9056. doi:https://doi.org/10.1609/aaai.
     v34i05.6438.
[27] J. Pavlopoulos, J. Sorensen, L. Laugier, I. Androutsopoulos, SemEval-2021 task 5: Toxic
     spans detection, in: Proceedings of the 15th International Workshop on Semantic Evalua-
     tion (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 59–69.
     URL: https://aclanthology.org/2021.semeval-1.6. doi:10.18653/v1/2021.semeval-1.
     6.
[28] G. Da San Martino, A. Barrón-Cedeño, H. Wachsmuth, R. Petrov, P. Nakov, SemEval-
     2020 task 11: Detection of propaganda techniques in news articles, in: Proceedings of the
     Fourteenth Workshop on Semantic Evaluation, International Committee for Computational
     Linguistics, Barcelona (online), 2020, pp. 1377–1414. URL: https://aclanthology.org/2020.
     semeval-1.186.
[29] G. Chhablani, A. Sharma, H. Pandey, Y. Bhartia, S. Suthaharan, NLRG at SemEval-2021 task
     5: Toxic spans detection leveraging BERT-based token classification and span prediction
     techniques, in: Proceedings of the 15th International Workshop on Semantic Evaluation
     (SemEval-2021), Association for Computational Linguistics, Online, 2021, pp. 233–242. URL:
     https://aclanthology.org/2021.semeval-1.27. doi:10.18653/v1/2021.semeval-1.27.
[30] S. Coope, T. Farghly, D. Gerz, I. Vulić, M. Henderson, Span-ConveRT: Few-shot span
     extraction for dialog with pretrained conversational representations, in: Proceedings of
     the 58th Annual Meeting of the Association for Computational Linguistics, Association
     for Computational Linguistics, Online, 2020, pp. 107–121. URL: https://aclanthology.org/
     2020.acl-main.11. doi:10.18653/v1/2020.acl-main.11.
[31] J. Rusert, Nlp_uiowa at semeval-2021 task 5: Transferring toxic sets to tag toxic spans, in:
     Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021),
     2021, pp. 881–887.
[32] R. Palliser-Sans, A. Rial-Farràs, Hle-upc at semeval-2021 task 5: Multi-depth distilbert for
     toxic spans detection, arXiv:2104.00639 (2021).
[33] K. Pluciński, H. Klimczak, Ghost at semeval-2021 task 5: Is explanation all you need?, in:
     Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021),
     2021, pp. 852–859.
[34] Q. Zhu, Z. Lin, Y. Zhang, J. Sun, X. Li, Q. Lin, Y. Dang, R. Xu, HITSZ-HLT at SemEval-
     2021 task 5: Ensemble sequence labeling and span boundary detection for toxic span
     detection, in: Proceedings of the 15th International Workshop on Semantic Evaluation
     (SemEval-2021), Online, 2021.
[35] V. A. Nguyen, T. M. Nguyen, H. Q. Dao, Q. H. Pham, S-nlp at semeval-2021 task 5: An
     analysis of dual networks for sequence tagging, in: Proceedings of the 15th International
     Workshop on Semantic Evaluation (SemEval-2021), 2021, pp. 888–897.
[36] A. Wührl, R. Klinger, Claim detection in biomedical Twitter posts, in: Proceedings of
     the 20th Workshop on Biomedical Language Processing, Association for Computational
     Linguistics, Online, 2021, pp. 131–142. URL: https://aclanthology.org/2021.bionlp-1.15.
     doi:10.18653/v1/2021.bionlp-1.15.
[37] T. Goudas, C. Louizos, G. Petasis, V. Karkaletsis, Argument extraction from news, blogs,
     and social media, in: Hellenic Conference on Artificial Intelligence, Springer, 2014, pp.
     287–299.
[38] C. Sardianos, I. M. Katakis, G. Petasis, V. Karkaletsis, Argument extraction from news, in:
     Proceedings of the 2nd Workshop on Argumentation Mining, 2015, pp. 56–66.
[39] I. Habernal, I. Gurevych, Argumentation mining in user-generated web discourse, Com-
     putational Linguistics 43 (2017) 125–179.
[40] L. Ramshaw, M. Marcus, Text chunking using transformation-based learning, in: Third
     Workshop on Very Large Corpora, 1995. URL: https://aclanthology.org/W95-0107.
[41] A. Pritzkau, J. Waldmüller, O. Blanc, M. Geierhos, U. Schade, Current language models’
     poor performance on pragmatic aspects of natural language, in: Proceedings of the CEUR
     Workshop Proceedings, Goa, India, CEUR, 2023.
[42] J. D. Lafferty, A. McCallum, F. C. N. Pereira, Conditional random fields: Probabilistic
     models for segmenting and labeling sequence data, in: Proceedings of the Eighteenth
     International Conference on Machine Learning, ICML ’01, Morgan Kaufmann Publishers
     Inc., San Francisco, CA, USA, 2001, p. 282–289. URL: https://openreview.net/forum?id=
     HkbzGjZOZB.
[43] M. Sullivan, N. Madani, S. Saha, R. Srihari, Positional transformers for claim span identifi-
     cation, in: Proceedings of the CEUR Workshop Proceedings, Goa, India, CEUR, 2023.
[44] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, arXiv:1810.04805 (2018).