<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SimBa at CheckThat! 2022: Lexical and Semantic Similarity Based Detection of Verified Claims in an Unsupervised and Supervised Way</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alica Hövelmeyer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katarina Boland</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Dietze</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>GESIS - Leibniz Institute for the Social Sciences</institution>
          ,
          <addr-line>Unter Sachsenhausen 6-8, 50667 Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Heinrich-Heine-Universität Düsseldorf (HHU)</institution>
          ,
          <addr-line>Universitätsstraße 1, 40225 Düsseldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Starlink - here. Thanks, @elonmusk pic.twitter.com/dZbaYqWYCf - Mykhailo Fedorov</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>One step in many automated fact-checking pipelines is verified claim retrieval, i.e. checking whether a claim has been fact-checked before. We approach this task as a semantic textual similarity problem. For this, we examine the extent to which an input claim and a verified claim are similar at semantic, textual, lexical and referential levels using a variety of NLP tools. We rank similar pairs based on these features using a supervised and an unsupervised model. We participate in two subtasks and compare our results for subtask 2A: detecting previously fact-checked claims from tweets and subtask 2B: detecting previously fact-checked claims in political debates for English data. We find that the combination of semantic and lexical similarity features performs best in finding relevant claim pairs for both subtasks. Furthermore, our unsupervised method is on par with the supervised one and seems to generalize well over similar tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>fact-checking</kwd>
        <kwd>STS</kwd>
        <kwd>semantic similarity</kwd>
        <kwd>lexical similarity</kwd>
        <kwd>sentence embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A key aspect of our approach is that we compare a supervised and an unsupervised method to rank the given
data by similarity and are able to propose an unsupervised method that is on par with supervised
approaches. Furthermore, there is evidence that our unsupervised method generalizes well over
similar tasks. The code for both subtasks is available on GitHub1.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        This submission is part of the 5th edition of the CheckThat! Lab. Previous editions, also held in
conjunction with the Conference and Labs of the Evaluation Forum (CLEF) and also featuring
the task of Detecting Previously Fact-Checked Claims / Claim Retrieval, took place in 2020 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
and 2021 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The approaches proposed by the participants are similar to ours in various aspects.
      </p>
      <p>
        For the lab in 2020 the data to be processed exclusively consisted of tweets as input claims.
Many of the participants used pre-processing and cleaned the tweets, removing tweet-specific
characters like hashtags [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Some teams solely made use of lexical and string
similarity features[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], whereas other teams used pre-trained language models to evaluate
semantic similarity. These teams fine-tuned RoBERTa[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or used Sentence-BERT [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] or Universal Sentence Encoder[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] in order to calculate the distances between sentence
embeddings. Different variations of blocking techniques were also used[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Similar to
our approach, some teams combined lexical and semantic similarity features [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        In 2021 all teams made use of the sentence embedding model Sentence-BERT. Team
NLytics[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] offered an unsupervised approach based on the distances of sentence
embeddings gained using Sentence-BERT. This approach performed well for only one of the proposed
subtasks.
      </p>
      <p>
        Team DIPS[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and Team Aschern[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] made use of the combination of a semantic similarity
feature (also gained using the sentence embedding model Sentence-BERT) and a string (BM25
by Team DIPS) or lexical (TF-IDF by Team Aschern) similarity feature. Different from us, they
only presented supervised approaches to rank the data based on these features.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Task Definition</title>
      <sec id="sec-3-1">
        <title>3.1. Detection of previously fact-checked claims</title>
        <p>
          One of the tasks that arise in the broader context of automated fact-checking is to check whether
a claim has been fact-checked before. This can be considered the second step of a claim retrieval
and verification pipeline, after the detection of check-worthy claims in different kinds of textual
utterances and before the verification of those claims. This is addressed by task 2 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. More
precisely, the task is to rank the most relevant verified claims out of a collection of already
verified claims for a given input claim.
        </p>
        <p>
          1https://github.com/Alihoe/CLEFCheckThat2aSimBa, https://github.com/Alihoe/CLEFCheckThat2bSimBa
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data</title>
        <p>
          The subtasks cover two different types of media that are used to disseminate claims. Subtask A
deals with tweets, subtask B with political debates and speeches. Both types of text sequences
containing claim utterances will simply be referred to as input claims in the following. For both
tasks different kinds of already fact-checked claims are made available. These will be called
verified claims [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Both input claims and verified claims consist of one or a few coherent sentences.</p>
        <p>The input claims of subtask A are given as strings, divided into a training dataset of 1167
input claims, a development test dataset of 201 input claims and a final test dataset of 209 input
claims. A human-annotated mapping from every input claim to the most relevant verified claim
(query relevance or qrels-file) constitutes the gold standard. Verified claims are crawled from
the fact-checking website Snopes and are provided in JSON format containing title, subtitle,
author, date and a vclaim-entry with the content of the claim.</p>
        <p>The input claims of subtask B are also provided as strings, divided into a training dataset
of 702 input claims, a development test dataset of 79 input claims and a final test dataset of
65 input claims. Here, a human-annotated mapping from every input claim to one or more
relevant verified claims is given in addition to the training data and as a gold standard for the
test data. Furthermore, transcripts of the debates or speeches the input claims are obtained
from are given for the test data. 19250 verified claims are taken from the fact-checking website
PolitiFact and made available in JSON format containing the entries vclaim_id, vclaim, date,
truth_label, speaker, url, title and text.</p>
        <p>The mappings of input claims to verified claims will be referred to as input-ver-claim pairs.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Similarity-Based Features</title>
      <sec id="sec-4-1">
        <title>4.1. Semantic Similarity</title>
        <p>
          The task is formulated as a ranking-problem, where input-ver-claim-pairs are ranked depending
on the relevance of the verified claim for fact-checking the input claim. Thus, the task can be
considered a semantic textual similarity problem (STS) where sentences are compared by their
semantic content to rank sentences containing similar claims highest (cf. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]).
        </p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Sentence Embeddings</title>
          <p>
            One promising way to deal with STS problems is the use of sentence embeddings. Sentence
embeddings are fixed-sized vector representations that capture the meaning of sentences insofar
as embeddings of semantically similar sentences are close in the corresponding vector space
(cf. [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]). Sentence embedding models are usually trained on a huge amount of natural language
data or rely on models that are trained on such data. Thus they reflect the empirical distribution
of linguistic elements and can be viewed as an appropriate method to investigate semantic
similarity, because, according to the distributional hypothesis, "there is a correlation
between distributional similarity and meaning similarity"[
            <xref ref-type="bibr" rid="ref19">19</xref>
            ].
          </p>
          <p>
            The usefulness of the application of sentence embeddings has already been demonstrated by
the participants of last year’s lab. The sentence embedding model Sentence-BERT [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] was used
by the top-ranked teams of both subtask A and subtask B [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Therefore, we use sentence embeddings as
starting points for different components of our approach.
          </p>
          <p>
            Sentence-BERT (SBERT) is a modification of the transformer-based pre-trained language
models BERT [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] or RoBERTa[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] using a Siamese network structure. The language models
are trained on natural language inference (NLI) data and a pooling operation is added to their
outputs in order to derive fixed-sized vector representations of the input sentences.
          </p>
          <p>
            The idea of training on NLI data in a supervised way in order to get meaningful sentence
embeddings was introduced by the authors of the sentence embedding model InferSent[
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]
However, they did not build their model upon a transformer-based language model,
but on an encoder based on a bi-directional LSTM architecture fed with pre-trained word
embeddings (GloVe[
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] or fastText[
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]).
          </p>
          <p>
            Similarly, the model Universal Sentence Encoder[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] (UniversalSE) averages together word
and bi-gram level embeddings, passes the representations through a feed-forward deep neural
network (DNN) and is trained on NLI data.
          </p>
          <p>
            The authors of SimCSE[
            <xref ref-type="bibr" rid="ref23">23</xref>
            ] also train their model on NLI data, but within a
contrastive learning framework. Otherwise their model is similar to Sentence-BERT, relying on the
pre-trained language models BERT and RoBERTa and adding a pooling operation to one of their
output layers.
          </p>
          <p>All sentence embedding models are also able to encode small paragraphs instead of just
sentences.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Measuring Semantic Similarity Using Sentence Embeddings</title>
          <p>
            For all of these sentence embeddings methods, there are pre-trained models available that can
be used out of the box. For Sentence-BERT we used sentence-transformers/all-mpnet-base-v2,
because it performs best for STS tasks compared to the other pretrained models2. For InferSent
we experimented with both versions, but report here only on the results obtained using version
2, which works with fastText[
            <xref ref-type="bibr" rid="ref22">22</xref>
            ], because we got better results than with the GloVe vocabulary
in preliminary experiments. For Universal Sentence Encoder we used the TF2.0 Saved Model (v4)3,
because this is the most widely used model available for Universal Sentence Encoder and for
SimCSE we used princeton-nlp/sup-simcse-roberta-large4, because this also performs best for
STS tasks compared to the other pretrained models 5.
          </p>
          <p>Since sentence embeddings are vector representations of sentences within the same vector
space, their similarity can be measured by applying cosine similarity (CosSim), resulting in
similarity scores which are rational numbers ∈ [-100, 100]. These similarity scores will be
referred to as SentEmb.</p>
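          <p>As an illustration, the following is a minimal sketch, assuming the pre-trained Sentence-BERT model named above, of how a single SentEmb score can be computed; it is not the exact code of our submission.</p>
          <preformat>
# Minimal sketch: one SentEmb score from Sentence-BERT, scaled to [-100, 100].
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def sent_emb_score(input_claim, verified_claim):
    """Cosine similarity of the two sentence embeddings, scaled by 100."""
    embeddings = model.encode([input_claim, verified_claim], convert_to_tensor=True)
    cos_sim = util.cos_sim(embeddings[0], embeddings[1]).item()  # in [-1, 1]
    return 100.0 * cos_sim
          </preformat>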
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Other Measures of Similarity</title>
        <p>In the following, other measures of similarity are presented. An overview of their corresponding
metrics can be found in Table 1.</p>
        <p>2https://www.sbert.net/docs/pretrained_models.html
3https://tfhub.dev/google/universal-sentence-encoder/4
4https://huggingface.co/princeton-nlp/sup-simcse-roberta-large
5https://github.com/princeton-nlp/SimCSE</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Overview of the similarity features, their score types and the corresponding metrics.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Feature</th><th>Score type</th><th>Metric</th></tr>
            </thead>
            <tbody>
              <tr><td>SBERT</td><td>SentEmb</td><td>∈ [-100, 100]</td></tr>
              <tr><td>InferSent</td><td>SentEmb</td><td>∈ [-100, 100]</td></tr>
              <tr><td>UniversalSE</td><td>SentEmb</td><td>∈ [-100, 100]</td></tr>
              <tr><td>SimCSE</td><td>SentEmb</td><td>∈ [-100, 100]</td></tr>
              <tr><td>LevDist</td><td>LevDist</td><td>∈ -Z</td></tr>
              <tr><td>SeqMat</td><td>StringSim</td><td>∈ [0, 1]</td></tr>
              <tr><td>JaccChar</td><td>StringSim</td><td>∈ [0, 1]</td></tr>
              <tr><td>JaccTok</td><td>StringSim</td><td>∈ [0, 1]</td></tr>
              <tr><td>WordCount</td><td>SimCount</td><td>∈ N</td></tr>
              <tr><td>WordRatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
              <tr><td>WordTokRatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
              <tr><td>SynCount</td><td>SimCount</td><td>∈ N</td></tr>
              <tr><td>SynRatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
              <tr><td>SynTokRatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
              <tr><td>NE</td><td>SimCount</td><td>∈ N</td></tr>
              <tr><td>NERatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
              <tr><td>NETokRatio</td><td>SimRatio</td><td>∈ [0, 100]</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-4-2-1">
          <title>4.2.1. String Similarity</title>
          <p>In addition to the study of semantic similarity using sentence embeddings, there are other ways
in which the similarity of sentences can be measured.</p>
          <p>The most naive approach to measure the similarity of two sentences is to compare them at
the string level, i.e. to see how far the characters and strings that make up a sentence differ
from those of other sentences. We used three different methods to measure the string similarity
of sentences: Levenshtein Distance, Jaccard Distance and Sequence Matching.</p>
          <p>Levenshtein Distance (LevDist) is a metric to measure the distance between two strings by
counting the number of operations (insertions, deletions or substitutions) needed to change
one string into the other. Sentences which are similar thus have a small Levenshtein Distance. In
order to adjust this distance score to the other similarity scores, such that a higher value signifies
a higher similarity, we multiplied the Levenshtein Distance by -1. In practice, we thereby get
negative three- or two-digit integers as similarity scores for almost all input-ver-claim pairs.</p>
          <p>In general, Jaccard Distance is used to measure the similarity of sets. It is computed by
dividing the size of the intersection by the size of the union of the sets. The closer this value is
to one, the more similar are the sets. In context of sentence-similarity it can be applied in two
ways: either regarding the characters (JaccChar) or the tokens (JaccTok) a sentence consists
of as elements of a set.</p>
          <p>The Sequence Matching algorithm (SeqMat) provided by the Python library difflib works by
comparing "the longest contiguous matching subsequence that contains no 'junk' elements"
and recursively repeating this on the remaining subsequences. Junk elements are determined
heuristically based on the frequency of their duplicates in the text sequence6.</p>
          <p>
            Both the application of Jaccard Distance and Sequence Matching generate rational numbers
∈ [0, 1]. These similarity scores will be referred to as StringSim.
          </p>
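          <p>A minimal sketch of the string similarity scores described above (LevDist, JaccChar, JaccTok and SeqMat) follows; the helper implementations are illustrative rather than the exact code used for our submission.</p>
          <preformat>
# Minimal sketch of the string similarity features.
from difflib import SequenceMatcher

def lev_dist(a, b):
    """Negated Levenshtein distance: higher (closer to 0) means more similar."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return -prev[-1]

def jaccard(x, y):
    """Size of the intersection divided by the size of the union of two sets."""
    union = set(x).union(y)
    return len(set(x).intersection(y)) / len(union) if union else 0.0

def jacc_char(a, b):          # JaccChar: characters as set elements
    return jaccard(a, b)

def jacc_tok(a, b):           # JaccTok: tokens as set elements
    return jaccard(a.split(), b.split())

def seq_mat(a, b):            # SeqMat: difflib's SequenceMatcher ratio
    return SequenceMatcher(None, a, b).ratio()
          </preformat>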
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Lexical Similarity</title>
          <p>Another type of similarity, which is not clearly distinguishable from semantic and string
similarity, is lexical similarity or similarity of words. We used one method to capture lexical
similarity between sentences and simply counted how often two claims contained the same
words.</p>
          <p>
            For this, we tokenized all claims using NLTK’s word tokenizer[
            <xref ref-type="bibr" rid="ref24">24</xref>
            ], filtered out stop words
and counted how often two claims contained the same tokens (WordCount). In order to value
the number of equal tokens of shorter sentences higher than those of longer ones, we also
computed a normalized ratio. For this we divided 100 by the number of tokens of both claims
and multiplied the obtained value by two times the number of equal tokens.7 We did this both
including stop words (WordTokRatio) and not including them (WordRatio).
          </p>
          <p>Counting equal tokens, we obtained a positive integer similarity score, usually with fewer than
three digits. We call this kind of discrete score SimCount. Computing the ratios, we obtained
percentages ∈ [0, 100], similar to the SentEmb scores. This kind of score will be referred to as
SimRatio.</p>
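          <p>The following minimal sketch illustrates WordCount, WordRatio and WordTokRatio as described above; the exact tokenization and counting details of our submission may differ slightly.</p>
          <preformat>
# Minimal sketch of the lexical similarity features based on shared tokens.
# Requires the NLTK data packages "punkt" and "stopwords" (nltk.download).
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words("english"))

def token_overlap(claim_a, claim_b, remove_stopwords):
    tokens_a = [t.lower() for t in word_tokenize(claim_a)]
    tokens_b = [t.lower() for t in word_tokenize(claim_b)]
    if remove_stopwords:
        tokens_a = [t for t in tokens_a if t not in STOP_WORDS]
        tokens_b = [t for t in tokens_b if t not in STOP_WORDS]
    shared = len(set(tokens_a).intersection(tokens_b))
    total = len(tokens_a) + len(tokens_b)
    # ratio: 100 divided by the number of tokens of both claims,
    # multiplied by two times the number of shared tokens
    ratio = (100.0 / total) * shared * 2 if total else 0.0
    return shared, ratio

def word_count(a, b):       # WordCount: shared tokens, stop words removed
    return token_overlap(a, b, remove_stopwords=True)[0]

def word_ratio(a, b):       # WordRatio: normalized, stop words removed
    return token_overlap(a, b, remove_stopwords=True)[1]

def word_tok_ratio(a, b):   # WordTokRatio: normalized, stop words included
    return token_overlap(a, b, remove_stopwords=False)[1]
          </preformat>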
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.3. Referential Similarity</title>
          <p>Another way to think of similarity between sentences is to examine whether they refer to the
same objects. To represent this kind of similarity we used two methods. Similar to the lexical
similarity approach, we counted how often two claims contained words which are synonyms of
each other. Additionally, we counted how often two claims contain the same named entities
(NEs).</p>
          <p>
            To compare the synonyms, we used WordNet[
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] and looked for all available synsets the
tokens mentioned in a claim are part of. We tokenized the sentences the same way as above.
Then we counted how often two claims contained the same synsets (SynCount). Here we also
computed the ratio of the count of synonyms regarding all synonyms (SynRatio) and all tokens
(SynTokRatio) in the two sentences.
          </p>
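          <p>A minimal sketch of the synonym-based scores using WordNet via NLTK is given below; the exact counting and normalization used for our submission is simplified here.</p>
          <preformat>
# Minimal sketch of SynCount, SynRatio and SynTokRatio based on shared WordNet synsets.
# Requires the NLTK data packages "punkt" and "wordnet" (nltk.download).
from nltk.corpus import wordnet
from nltk.tokenize import word_tokenize

def synsets_of(claim):
    """All WordNet synsets that any token of the claim belongs to."""
    synsets = set()
    for token in word_tokenize(claim):
        for synset in wordnet.synsets(token):
            synsets.add(synset.name())
    return synsets

def syn_count(a, b):          # SynCount: number of shared synsets
    return len(synsets_of(a).intersection(synsets_of(b)))

def syn_ratio(a, b):          # SynRatio: shared synsets relative to all synsets
    union = synsets_of(a).union(synsets_of(b))
    return 100.0 * syn_count(a, b) / len(union) if union else 0.0

def syn_tok_ratio(a, b):      # SynTokRatio: shared synsets relative to all tokens
    n_tokens = len(word_tokenize(a)) + len(word_tokenize(b))
    return 100.0 * syn_count(a, b) / n_tokens if n_tokens else 0.0
          </preformat>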
          <p>
            In order to compare NEs we used the entity-fishing system[
            <xref ref-type="bibr" rid="ref26">26</xref>
            ], which recognizes named
entities mentioned in a text and disambiguates them using Wikidata. The system is able to
return the Wikipedia and Wikidata identifiers of those mentions. We counted how often
two claims contained named entities related to the same Wikipedia or Wikidata entry (NE). We
additionally computed the ratio of the count of NEs regarding all NEs (NERatio) and all
tokens (NETokRatio) in the two sentences.
          </p>
          <p>Similarly to the lexical similarity scores, we obtained two different kinds of metrics for these
similarities: SimCount and SimRatio (see Table 1).</p>
          <p>6https://docs.python.org/3/library/difflib.html
7e.g.: If two claims consisted of ten tokens each and had ten tokens in common, we would obtain a
WordTokRatio of (100/20)*10*2 = 100. If they only had one token in common, the obtained ratio would be (100/20)*1*2 =
10. If both claims consisted of 50 tokens each, the obtained ratios would be (100/100)*10*2 = 20 and (100/100)*1*2 = 2.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Pre-Processing</title>
        <sec id="sec-4-3-1">
          <title>4.3.1. Cleaning tweets</title>
          <p>For both subtasks we experimented with different ways of pre-processing the input claims. We
cleaned the tweets given in subtask 2A to get rid of redundant information. We removed URLs,
@-symbols and user information (see Table 2).</p>
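          <p>A minimal, regex-based sketch of this cleaning step is shown below; the exact rules of our submission may differ.</p>
          <preformat>
# Minimal sketch of the tweet cleaning (URLs, @-symbols, user information).
import re

def clean_tweet(tweet):
    tweet = re.sub(r"https?://\S+|pic\.twitter\.com/\S+", "", tweet)  # remove URLs
    tweet = tweet.replace("@", "")                                    # remove @-symbols
    # user information (e.g. a trailing "- author" signature) would be
    # stripped here as well; the exact rule is not spelled out in the text
    return re.sub(r"\s+", " ", tweet).strip()
          </preformat>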
          <table-wrap id="tab2">
            <label>Table 2</label>
            <caption>
              <p>Example of a cleaned tweet.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Original Tweet</th><th>Cleaned Tweet</th></tr>
              </thead>
              <tbody>
                <tr>
                  <td>Starlink - here. Thanks, @elonmusk pic.twitter.com/dZbaYqWYCf - Mykhailo Fedorov</td>
                  <td>Starlink - here. Thanks, elonmusk</td>
                </tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Including context</title>
          <p>For subtask 2B, we tried incorporating the input claims’ contexts within the speech or debate
they were obtained from. We included the lines that were spoken before and after the relevant
claim and integrated information about the current speaker by prepending "speaker X said" to
the line of speech, where X is substituted by the name of the respective speaker (see Table 3).</p>
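          <p>A minimal sketch of this contextualization, assuming the transcript is available as a list of (speaker, line) pairs:</p>
          <preformat>
# Minimal sketch: prepend "speaker X said" and include the neighbouring lines.
def contextualize(transcript, idx):
    """Return the claim at position idx together with the line before and after it."""
    parts = []
    for i in range(max(0, idx - 1), min(len(transcript), idx + 2)):
        speaker, line = transcript[i]
        parts.append(f'{speaker} said "{line}"')
    return " ".join(parts)
          </preformat>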
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Model</title>
      <sec id="sec-5-1">
        <title>5.1. Unsupervised Approach</title>
        <table-wrap id="tab3">
          <label>Table 3</label>
          <caption>
            <p>Example of a contextualized input claim for subtask 2B.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Contextualized Input Claim</th></tr>
            </thead>
            <tbody>
              <tr>
                <td>donald trump said "And Obama would send pillows and sheets." donald trump said "He wouldn’t send anything else." donald trump said "It’s the whole thing."</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
We tried out an unsupervised and a supervised method to utilize the information we gained
on the different kinds of similarity. The main idea of the unsupervised approach is to rank the
input-ver-claim pairs by the different similarity scores described above. For this, a general
similarity score is computed, combining the varying metrics (see Table 1). This general score
can roughly be compared to the percentage to which two sentences are similar, where two
exactly equal sentences would have a score of roughly 100. However, our way of combining the
different similarity scores does not ensure that the resulting score is smaller than 100. It can
sometimes be slightly higher.</p>
        <p>The general similarity score is computed in the following way (see the sketch below):
• taking the mean of all SentEmb-, SimRatio- and StringSim-scores normalized to [0, 100]
• incorporating the LevDist: first, the LevDist is divided by -100, which generates a positive
factor that is smaller the more similar two sentences are; then the similarity score
obtained by computing the mean is divided by this factor.8
• adding the SimCount-scores to the obtained score</p>
        <p>For the output, the five most similar verified claims for each input claim are determined based
on the general similarity score.</p>
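        <p>The following minimal sketch summarizes this combination and the final ranking; the normalization of the individual scores to [0, 100] is simplified (StringSim scores are multiplied by 100, the other scores are used as they are).</p>
        <preformat>
# Minimal sketch of the general similarity score and the top-5 ranking.
import numpy as np

def general_similarity(sent_emb, sim_ratio, string_sim, lev_dist, sim_count):
    """sent_emb, sim_ratio, string_sim, sim_count: lists of scores; lev_dist: negative integer."""
    normalized = list(sent_emb) + list(sim_ratio) + [100.0 * s for s in string_sim]
    score = float(np.mean(normalized))
    score /= lev_dist / -100.0   # factor below 1 for very similar strings boosts the score
    return score + sum(sim_count)

def top_five(scores_for_input_claim):
    """scores_for_input_claim: dict mapping verified claim ids to general similarity scores."""
    ranked = sorted(scores_for_input_claim.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:5]
        </preformat>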
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Supervised Approach</title>
        <p>For the supervised approach we built a feature set out of the different similarity scores in order
to classify whether a verified claim is relevant for an input claim. We experimented with different
methods to optimize our classification results. We used Blocking and Balancing in order to
optimize our training results. Additionally, we tried out different Classifiers and applied Feature
Selection to further improve our output. Lastly, we also made use of a heuristic based on our
unsupervised approach to find relevant verified claims for all input claims.</p>
        <p>To optimize the training, we used a Blocking approach. Instead of generating negative training
instances by pairing each input claim with all but the true matching verified claims in the dataset,
we computed the 50 most similar verified claims according to each of the four SentEmb scores
and generated negative training instances using only those. More specifically, we extracted four
sets of input-ver-claim pairs, one set for each SentEmb method, with each set containing the
50 most similar verified claims identified by this method. Then we used the union of these
sets as our final training set. We observed that all true input-ver-claim pairs were covered. Besides
the computational advantage of a smaller training set, this way the model may better learn to
distinguish cases that are similar on the surface, as all very dissimilar pairs have been filtered
out before training.</p>
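        <p>A minimal sketch of this blocking step, assuming one similarity matrix per SentEmb method:</p>
        <preformat>
# Minimal sketch: keep the 50 most similar verified claims per input claim and
# per sentence embedding method, and train on the union of these candidate pairs.
import numpy as np

def block_candidates(similarity_matrices, k=50):
    """similarity_matrices: dict of {method: (n_input x n_verified) score matrix}."""
    candidates = set()   # pairs of (input claim index, verified claim index)
    for scores in similarity_matrices.values():
        top_k = np.argsort(-scores, axis=1)[:, :k]
        for input_idx, verified_ids in enumerate(top_k):
            candidates.update((input_idx, int(j)) for j in verified_ids)
    return candidates
        </preformat>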
        <p>Then all similarity scores (including the SentEmb scores) were added as features. As targets we
obtained the relevance scores from the qrels-file of the training data. An unlabeled feature set
was built for the test data.</p>
        <p>After blocking, the percentage of true positives in our training data was still below 1%
for both subtasks. That is why we applied Random Undersampling as a Balancing method and
experimented with different parameters (see Tables 4 and 5).</p>
        <p>Then a Classifier was trained on the training data to predict relevance scores for the test data.
We experimented with different classifiers suited for binary classification, such as KNN,
Logistic Regression, Linear SVC and a Decision Tree (see Tables 6 and 7).</p>
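        <p>A minimal sketch of the balancing and classification step with scikit-learn and imbalanced-learn follows; the hyperparameters shown are illustrative, not the tuned values of our submission.</p>
        <preformat>
# Minimal sketch: Random Undersampling followed by a Logistic Regression classifier.
from imblearn.under_sampling import RandomUnderSampler
from sklearn.linear_model import LogisticRegression

def train_classifier(X_train, y_train, positive_share=0.08):
    # sampling_strategy is the ratio of positives to negatives after resampling
    sampler = RandomUnderSampler(sampling_strategy=positive_share / (1 - positive_share))
    X_resampled, y_resampled = sampler.fit_resample(X_train, y_train)
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(X_resampled, y_resampled)
    return classifier
        </preformat>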
        <p>We experimented with different selections of features out of the similarity features presented
above. The influence of the ensemble of features is shown in Tables 13 and 14. Additionally, we
included the feature TokenCount, which represents the sum of tokens of both input claim and
verified claim.</p>
        <p>8e.g.: Given is a SentEmb mean of 50.0. If two sentences consist of quite similar strings, one could imagine
them having a LevDist of -50. If two sentences are not that similar, they could have a LevDist of -200. Applying the
technique described, incorporating LevDist would result in the similarity score 100 for the similar sentences and 25 for the
varying sentences. This way it is not ensured that the obtained similarity score is ∈ [0, 100]. In practice, however,
the calculated values are in this range.</p>
        <p>If no relevant verified claim was predicted for an input claim, we relied on our unsupervised
approach heuristically and chose the five most similar verified claims based on the mean of
sentence embedding similarity scores. For 2A we chose SBERT, InferSent and SimCSE as SentEmb
scores, for 2B all four models, including UniversalSE.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <sec id="sec-6-1">
        <title>6.1. Evaluation Metric</title>
        <p>
          The task is considered a ranking task and is evaluated as such. The official ranking evaluation
measure is Mean Average Precision at 5 (MAP@5). Additionally, the provided scorer computes
the measures MAP@k for k = 1, 3, 5, 10, MRR and Precision@k for k = 3, 5, 10 (cf. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). The
MAP@k metric measures the mean average precision of the correctly retrieved pairs in the top k of the
returned output. MRR, or Mean Reciprocal Rank, measures how far the assigned rank of a correct pair
differs from its ideal rank (i.e. the first rank for subtask A) on average, by averaging the reciprocal of the assigned rank.
        </p>
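        <p>For reference, a minimal sketch of the two main metrics for the simple case of exactly one relevant verified claim per input claim (as in subtask A):</p>
        <preformat>
# Minimal sketch of MAP@k and MRR for a single relevant verified claim per input claim.
def average_precision_at_k(ranked_ids, relevant_id, k=5):
    for rank, verified_id in enumerate(ranked_ids[:k], start=1):
        if verified_id == relevant_id:
            return 1.0 / rank
    return 0.0

def map_at_k(rankings, gold, k=5):
    """rankings: {input id: ranked verified ids}; gold: {input id: relevant verified id}."""
    return sum(average_precision_at_k(rankings[i], gold[i], k) for i in gold) / len(gold)

def mrr(rankings, gold):
    total = 0.0
    for i, relevant_id in gold.items():
        ranks = [r for r, v in enumerate(rankings[i], start=1) if v == relevant_id]
        total += 1.0 / ranks[0] if ranks else 0.0
    return total / len(gold)
        </preformat>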
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Subtask 2A</title>
        <p>For Subtask 2A we got the best result with our unsupervised approach, combining the similarity
scores of SBERT, SimCSE, WordCount and WordTokRatio with a MAP@5 of 0.9175 (see Table
13).</p>
        <p>However, the output we submitted made use of SBERT, SimCSE and WordCount and scored
slightly worse (0.9075) (see Table 8). We still achieved a score above the baselines utilizing a
simple and fast unsupervised ranking method.
</p>
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Subtask 2B</title>
        <p>For Subtask 2B we got the best results using a supervised approach. All similarity features were
included except for JaccChar (see Table 14). We made use of Random Undersampling to
increase the percentage of positives in the training data (relevant input-ver-claim pairs) to 8%.
Then a Logistic Regression Classifier was trained and predicted 111 input-ver-claim pairs.
The unsupervised heuristic described above was used to find relevant verified claims for the
remaining input claims. This way the output achieved a MAP@5 of 0.4882.</p>
        <p>The output we submitted scored slightly worse than our best result, with a MAP@5 of
0.459. To generate this output we used Linear Support Vector Classification and sampled to
14% positives. The considered features were SimCSE, JaccTok, WordCount, WordRatio, SynCount
and SynRatio. This is nevertheless the top-ranked result for subtask 2B (see Table 9).</p>
      </sec>
      <sec id="sec-6-4">
        <title>6.4. Result of Pre-Processing</title>
        <p>It turned out that our pre-processing approach did not improve our results on the test data
for Subtask 2A (see Table 10), although it did for the development test data. This is an issue
worth investigating in future work. Tweet-specific units of text such as user information were
removed, and it turned out that it would have been useful to incorporate this kind of information
for solving task 2A. Nevertheless, the pre-processing ensured that the data of both tasks was
more similar and thereby helped in assessing the similarity of claims in general contexts.</p>
        <p>The incorporation of context for subtask 2B did not improve the results on either the
development test data or the final test data. That is why we used the original data for subtask
2B.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Observations</title>
      <sec id="sec-7-1">
        <title>7.1. Evaluation of Features</title>
        <sec id="sec-7-1-1">
          <title>7.1.1. Powerful Features for Subtask A and Subtask B</title>
          <p>The results of using the supervised approach on single features (see Table
11) give a good overview of their independent performance. As expected, the most successful
features for both subtasks are the cosine similarities of the sentence embeddings. Especially
SBERT, UniversalSE and SimCSE performed best on both tasks. This is because, as explained
above, sentence embeddings are well suited to capture semantic textual similarity.</p>
          <p>Interestingly, SBERT is the most powerful feature for Subtask 2A and SimCSE the most
powerful one for Subtask 2B. It would be worth further investigation to identify the reason for
this difference. Both models are pre-trained on a large share of the same data, so maybe the
contrastive training objective of SimCSE is partly responsible for it.</p>
          <p>Another important observation is the fact that the lexical similarity features WordCount,
WordRatio and WordTokRatio also perform really well for both tasks. This is somewhat surprising,
because these features are generated in such a simple way.</p>
          <p>In contrast, the Jaccard similarity of characters, JaccChar, is the weakest similarity feature.
This can be explained by the fact that the consideration of equal characters, regardless of their
order, does not carry much information about the meaning of a sentence as a whole.</p>
          <p>One interesting finding regarding the differences between the subtasks is the varying
performance of the string similarity features. The string similarity features LevDist and SeqMat are the
only features that produce a higher MAP@5 for Subtask 2B than for Subtask 2A. Looking at the
data, it is noticeable that the input claims and the verified claims provided for Subtask 2B often
share long, continuous strings (see Table 12).</p>
        </sec>
        <sec id="sec-7-1-2">
          <title>7.1.2. Feature Set</title>
          <p>One of the most intriguing observations is the fact that both the unsupervised and the supervised
approach perform best if lexical similarity is considered besides semantic similarity (see Tables
13 and 14). The SentEmb features do not seem to cover lexical similarity, and their performance
benefits from the additional information contained in the lexical similarity features. This is also
supported by the observation that these two types of features do not have a strong correlation
(see Tables 15 and 16).</p>
          <p>It can also be observed that, especially for subtask B, it is helpful to consider the combination
of almost all similarity features in the supervised approach (see Table 14).</p>
          <p>Overall, a higher number of features mostly increases the performance of the supervised
approach and decreases it for the unsupervised approach, as relatively uninformative features
have too high an impact on the latter.</p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>7.2. Supervised vs Unsupervised Approach</title>
        <p>One important observation with respect to our results is the fact that the unsupervised approach
performs nearly as well as the supervised approach for subtask B and even better than the
supervised approach for subtask A.</p>
        <p>Since the task is a ranking problem, the unsupervised approach seems to perform sufficiently
well for the given task. For similar tasks with the constraint to only find pairs that are relevant
with a high certainty, the supervised approach might be more helpful.</p>
        <p>Also, it is reasonable to assume that the unsupervised approach generalizes well over similar
tasks, because it is independent of the training data. This assumption is supported by the fact
that the features that produce the best outputs are almost the same for both subtask A and
subtask B for the unsupervised approach (see Tables 13 and 14), while the supervised approach
relies on different features for the subtasks to produce good outputs.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>8. Future Work</title>
      <p>It would be interesting to investigate the generalizability of our approach and to check whether the
assumption that the unsupervised approach generalizes better than the supervised approach
holds. Also, a detailed assessment of the impact of pre-processing would be beneficial for related
work.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Conclusion</title>
      <p>We treated the task of detecting previously fact-checked claims as an STS task. To solve it, we
investigated different kinds of similarity measures between sentences, covering semantic, lexical
and referential similarity. We found that it is beneficial to combine semantic similarity measures,
gained by calculating the distance of sentence embeddings, with lexical similarity measures,
gained by counting shared words. Furthermore, we found that an unsupervised approach can be
even more successful than a supervised approach for this task. Overall, our proposed approaches
provide very good results for both subtasks, with a MAP@5 of 0.907 for subtask A and a MAP@5
of 0.459 for subtask B, both scoring above the baselines and even being the top-ranked output
for subtask B.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Appendix</title>
      <p>Tables 15 and 16 report the correlations between all similarity features (SBERT, InferSent,
UniversalSE, SimCSE, LevDist, JaccChar, JaccTok, SeqMat, WordCount, WordRatio, WordTokRatio,
SynCount, SynRatio, SynTokRatio, NE, NERatio, NETokRatio).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] P. Nakov, G. Da San Martino, F. Alam, S. Shaar, H. Mubarak, N. Babulkov, Overview of the CLEF-2022 CheckThat! lab task 2 on detecting previously fact-checked claims, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, CLEF '2022, Bologna, Italy, 2022.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] S. Shaar, A. Nikolov, N. Babulkov, F. Alam, A. Barrón-Cedeño, T. Elsayed, M. Hasanain, R. Suwaileh, F. Haouari, G. D. S. Martino, P. Nakov, Overview of CheckThat! 2020 English: Automatic identification and verification of claims in social media, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), CLEF (Working Notes), volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_265.pdf.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Shaar, F. Haouari, W. Mansour, M. Hasanain, N. Babulkov, F. Alam, G. Da San Martino, T. Elsayed, P. Nakov, Overview of the CLEF-2021 CheckThat! lab task 2 on detecting previously fact-checked claims in tweets and political debates, in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, CLEF '2021, Bucharest, Romania (online), 2021. URL: http://ceur-ws.org/Vol-2936/paper-29.pdf.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] M. Bouziane, H. Perrin, A. Cluzeau, J. Mardas, A. Sadeq, Team Buster.ai at CheckThat! 2020: Insights and recommendations to improve fact-checking, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_134.pdf.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] E. Thuma, N. P. Motlogelwa, T. Leburu-Dingalo, M. Mudongo, UB_ET at CheckThat! 2020: Exploring ad hoc retrieval approaches in verified claims retrieval, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_204.pdf.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] T. McDonald, Z. Dong, Y. Zhang, R. Hampson, J. Young, Q. Cao, J. L. Leidner, M. Stevenson, The University of Sheffield at CheckThat! 2020: Claim identification and verification on Twitter, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_162.pdf.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] G. S. Cheema, S. Hakimov, R. Ewerth, Check_square at CheckThat! 2020: Claim detection in social media via fusion of transformer and syntactic features, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_216.pdf.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. URL: https://arxiv.org/abs/1907.11692. doi:10.48550/ARXIV.1907.11692.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019. arXiv:1908.10084.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] L. C. Passaro, A. Bondielli, A. Lenci, F. Marcelloni, UNIPI-NLE at CheckThat! 2020: Approaching fact checking from a sentence similarity perspective through the lens of transformers, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_169.pdf.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] U. Shukla, A. Sharma, TIET at CLEF CheckThat! 2020: Verified claim retrieval, in: L. Cappellato, C. Eickhoff, N. Ferro, A. Névéol (Eds.), Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, September 22-25, 2020, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2696/paper_197.pdf.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>St. John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          , M. Guajardo-Cespedes, S. Yuan,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          , Universal sentence encoder, CoRR abs/1803.11175 (
          <year>2018</year>
          ). URL: http://arxiv.org/abs/1803.11175. arXiv:1803.11175.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Rico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Araujo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          ,
          <article-title>NLP&amp;IR@UNED at CheckThat! 2020: A preliminary approach for check-worthiness and claim retrieval tasks using neural networks and graphs</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pritzkau</surname>
          </string-name>
          , NLytics at CheckThat! 2021:
          <article-title>Detecting previously fact-checked claims by measuring semantic similarity</article-title>
          , in: Working Notes of CLEF 2021-
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , CLEF '
          <year>2021</year>
          , Bucharest, Romania (online),
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2936/paper-47.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mihaylova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Borisova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chemishanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hadzhitsanev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hardalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , DIPS at CheckThat! 2021:
          <article-title>Verified claim retrieval</article-title>
          , in: Working Notes of CLEF 2021-
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , CLEF '
          <year>2021</year>
          , Bucharest, Romania (online),
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2936/paper-45.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chernyavskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ilvovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , Aschern at CheckThat! 2021:
          <article-title>Lambda-calculus of fact-checked claims</article-title>
          , in: Working Notes of CLEF 2021-
          <article-title>Conference and Labs of the Evaluation Forum</article-title>
          , CLEF '
          <year>2021</year>
          , Bucharest, Romania (online),
          <year>2021</year>
          . URL: http://ceur-ws.org/Vol-2936/paper-38.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Diab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gonzalez-Agirre</surname>
          </string-name>
          ,
          <article-title>SemEval-2012 task 6: A pilot on semantic textual similarity</article-title>
          ,
          <source>in: *SEM 2012: The First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval</source>
          <year>2012</year>
          ),
          <source>Association for Computational Linguistics</source>
          , Montréal, Canada,
          <year>2012</year>
          , pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          . URL: https://aclanthology.org/S12-1051.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Conneau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <article-title>Supervised learning of universal sentence representations from natural language inference data</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1705.02364. doi:10.48550/ARXIV.1705.02364.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahlgren</surname>
          </string-name>
          ,
          <article-title>The distributional hypothesis</article-title>
          ,
          <source>The Italian Journal of Linguistics</source>
          <volume>20</volume>
          (
          <year>2008</year>
          )
          <fpage>33</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers),
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , GloVe:
          <article-title>Global vectors for word representation</article-title>
          ,
          <source>in: Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . URL: http://www.aclweb.org/anthology/D14-1162.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , T. Mikolov,
          <article-title>Enriching word vectors with subword information</article-title>
          ,
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yao</surname>
          </string-name>
          , D. Chen,
          <article-title>SimCSE: Simple contrastive learning of sentence embeddings</article-title>
          ,
          <source>in: Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Loper</surname>
          </string-name>
          ,
          <article-title>Natural language processing with Python: analyzing text with the natural language toolkit</article-title>
          ,
          <source>"O'Reilly Media, Inc."</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C.</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          ,
          <source>WordNet: An Electronic Lexical Database, Bradford Books</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lopez</surname>
          </string-name>
          , entity-fishing, https://github.com/kermitt2/entity-fishing, 2016-2022. swh:1:dir:cb0ba3379413db12b0018b7c3af8d0d2d864139c.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>