ClaimFinder: A Framework for Identifying Claims in Microblogs

Wee Yong Lim, Mong Li Lee, Wynne Hsu
Department of Computer Science, National University of Singapore
a0109697@u.nus, {leeml,whsu}@comp.nus.edu.sg

ABSTRACT
Twitter is a microblogging platform that allows users to post short public messages. Posts shared by users pertaining to real-world events or themes can provide a rich "on-the-ground" live update of the events for the benefit of everyone. Unfortunately, the posted information may not all be credible, and rumours can spread over this platform. Existing credibility assessment work has focused on identifying features for discriminating the credibility of messages at the tweet level. However, it does not handle tweets that contain multiple pieces of information, each of which may have a different level of credibility. In this work, we introduce the notion of a claim based on subject and predicate terms, and propose a framework to identify claims from a corpus of tweets related to some major event or theme. Specifically, we draw upon work done in open information extraction to extract from tweets tuples that comprise subjects and their predicates. Then we cluster these tuples to identify claims such that each claim refers to only one aspect of the event. Tweets corresponding to the tuples in each cluster serve as evidence supporting the subsequent credibility assessment task. Extensive experiments on two real-world datasets show the effectiveness of the proposed approach in identifying claims.

Copyright (c) 2016 held by author(s)/owner(s); copying permitted only for private and academic purposes. Published as part of the proceedings of the 6th Workshop on Making Sense of Microposts (#Microposts2016, @WWW2016), available online as CEUR Vol-1691 (http://ceur-ws.org/Vol-1691). #Microposts2016, Apr 11th, 2016, Montréal, Canada. ACM ISBN 978-1-4503-2138-9. DOI: 10.1145/1235

1. INTRODUCTION
Communications over the web have increasingly become user-driven, with multiple platforms for users to post messages that can be seen by the general public. Unfortunately, unlike traditional news media, there is little or no mechanism to ensure the credibility of the posted messages. Take the popular microblogging platform Twitter as an example, where users can freely post or re-post any short messages, known as tweets, from their mobile accounts. Such a platform allows for the fast dissemination of first-hand and repeated information. When a major event occurs, many tweets are generated or re-tweeted containing messages that may be true, false or speculative.

In fact, our observation of collected tweets related to major events indicates that a majority of tweets were forwarded (re-tweeted) by multiple users with little or no change to the content of the message. Given these minimal changes, the primary motivation of these users stems from their desire to disseminate the information in the tweet. Such dissemination would indeed serve a social utility if the information is true, but would be detrimental if the information is false or even speculative.

Research in information credibility has been gaining momentum in recent years [4, 5, 18, 10]. Figure 1 shows the steps involved in a credibility assessment framework. Collecting a set of tweets related to a major event can be done manually using keywords relevant to natural disaster, terrorist or shooting incident events [10], or automatically via some event detection method, e.g. TwitterMonitor [12]. These tweets are then analyzed to identify topics for subsequent credibility classification [4, 5, 18]. Features used to help identify suspicious tweets include sentiment [15], location [22] and message propagation characteristics [14], amongst others.

Figure 1: Credibility assessment framework involving tweet collection, claims identification and classification.

Methods to find topics in a corpus of tweets can be broadly divided into feature-based and topic modeling based approaches. The former extracts features such as keywords from each tweet and clusters the tweets based on these features [2].
Each cluster of tweets defines a topic. For topic modeling based approaches, a topic is represented by a word distribution. The work in [23] observes that "a single tweet is usually about a single topic" and designs a TwitterLDA model where the words in a tweet are chosen either from a topic or from background noise words.

We observe that tweets typically contain multiple claims, and argue that current approaches which cluster tweets based on topics are too coarse-grained to identify all the claims in the tweets. Take for example the following tweet on the Nashville flood:

"Middle TN (Nashville) has been hit by a terrible flood. Text 90999 to make $10 donation to the REDCROSS disaster relief. #nashvilleflood"

This tweet has two claims: (1) Nashville has been hit by a flood, and (2) one can make a $10 donation by texting to 90999. It is important to identify both claims for subsequent credibility assessment. This is because while the first claim is likely to be true, the second claim appears highly suspicious. Existing credibility assessment work that utilizes tweet-level features will only give a single credibility score to this tweet, and does not differentiate between the two claims.

In this work, we formalize the concept of a "claim" in a corpus of tweets related to some major event. Our goal is to design a framework to identify the set of claims such that each claim refers to only one aspect of the event. Subsequently, the credibility of these claims can be verified against official sources. Note that the credibility assessment task itself is beyond the scope of this work.

We draw upon work done in the field of Open Information Extraction (IE) to extract entities in the tweets and the relationships between these entities. Then we construct <subject, predicate> tuples from these entities/relationships. Finally, we cluster the tuples to form claims. The tweets that correspond to the cluster of tuples can be regarded as evidence supporting any subsequent credibility classification task. Extensive experiments on two real-world datasets of tweets demonstrate the effectiveness of our proposed approach in identifying meaningful claims.

The paper is organized as follows. Section 2 defines the problem. Section 3 describes the proposed approach, and Section 4 gives an incremental method to identify claims. We present experiment results in Section 5, followed by related work in Section 6, and conclude in Section 7.

2. PROBLEM DEFINITION
The objective of this work is to identify claims by grouping the tweets related to some major event such that the tweets in each group refer to the same claim, which can be true, false, speculative, conversational or simply spam in nature. We introduce the concept of a claim as follows:

Definition 1. A claim is the assertion of a subject and the corresponding predicate expression for the subject. It has the structure (S, P), where S is the set of words that refer to the same subject, and P is the set of words that express the same predicate on S.

The set of words that refer to the same subject/predicate is very much context dependent. For example, in a corpus of tweets on the missing flight MH370 incident, the words "plane" and "MH370 aircraft" are likely to reflect the same subject, whilst this may not be true in other contexts involving multiple planes, such as news reports on manoeuvres between military planes¹. Here, we assume that the major event provides the context for the claims, and we want to identify the claims within the event.

¹ http://edition.cnn.com/2014/08/22/world/asia/us-china-air-encounter/

Since we do not assume that a tweet contains only one claim, we use an Open Information Extraction (OpenIE) tool [6] to extract from each tweet zero or more triples of the form (E1, R, E2), where E1 and E2 are each a set of words referring to real-world entities, while R is a set of words describing the relationship between the entities E1 and E2. Each triple is mapped to a subject-predicate tuple <S, P> that has a structure similar to a claim, where S = E1 ∪ E2 and P = R. Thus, a tweet is associated with a set of subject-predicate tuples {t1, t2, ...}.

Problem Statement. Let D be a corpus of tweets related to a major event, where the ith tweet in D is mapped to a set of tuples {ti1, ti2, ...}, 1 ≤ i ≤ |D|. Let T be the set of subject-predicate tuples obtained from all the tweets in D. The goal is to obtain a partitioning C of the tuples in T such that C identifies the most number of claims in D.

By partitioning the tuples, we obtain a soft clustering of the corresponding tweets, since a tweet can contain more than one claim. The tweets that correspond to the tuples in each cluster provide evidence for the credibility assessment of the claim.

Example. To provide an intuition of the tuple clustering and claim identification process, Table 1 shows the OpenIE triples and the subject-predicate tuples obtained for 3 tweets. To simplify discussion, let us cluster these tuples based on the similarity of their subject words. For each cluster, we construct a claim by taking the union of the words in S and P respectively. Table 2 shows the clusters obtained and the corresponding claims. Note that our approach identifies the multiple claims contained in the tweets. For example, tweet 1 has two claims (c1 and c2), tweet 2 has two claims (c2 and c3), while tweet 3 has three claims (c3, c4 and c5).

Tweet 1: "MAS CEO confirms SAR ops and says airline is working to verify speculation that the mh370 may have landed in Nanning."
  OpenIE triples: (mas ceo, confirm, sar ops); (mh370, land, nanning)
  Tuples: <{mas,ceo,sar,ops}, {confirm}>; <{mh370,nanning}, {land}>

Tweet 2: "MH370 landing safely in Nanming is pure speculation. No distress signal or call was received at all"
  OpenIE triples: (mh370, land, nanming); (distress signal call, receive)
  Tuples: <{mh370,nanming}, {land}>; <{distress,signal,call}, {receive}>

Tweet 3: "So you want me to believe that mh370 has crashed in water, Aussies found debris but still no signals captured"
  OpenIE triples: (mh370, crash, water); (aussie, found, debris); (signal, capture)
  Tuples: <{mh370,water}, {crash}>; <{aussie,debris}, {found}>; <{signal}, {capture}>

Table 1: Subject-predicate tuples obtained from sample tweets.

We will elaborate on our approach to identify claims in the next section.

3. CLAIMS IDENTIFICATION
Different from the past tweet clustering work reviewed in Section 6, this work focuses on claim identification by clustering tuples mapped from OpenIE extractions of the tweets. We propose a 3-step ClaimFinder method (see Algorithm 1) which comprises:

1. Preprocessing. We preprocess each tweet to remove known noise and tokenize the sentences prior to applying the OpenIE process.

2. Subject-predicate tuple extraction. We use the state-of-the-art OpenIE technique ClausIE [6] to extract basic semantic units of information from the content of each tweet. Each extraction is mapped to a subject-predicate tuple <S, P>.

3. Clustering subject-predicate tuples. We define a similarity measure to compute the distance between the tuples. Then we can utilize methods such as agglomerative or spectral clustering [16] to cluster the tuples. Each cluster of tuples forms a claim.
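The triple-to-tuple mapping (S = E1 ∪ E2, P = R) and the claim construction by union can be sketched in a few lines. This is a minimal illustration over the tuples of Table 1; the function names and the hard-coded cluster are our own, not code from the paper.

```python
# Sketch of the triple-to-tuple mapping and claim construction described
# above. Function names and the hard-coded cluster are illustrative only.

def to_tuple(e1, r, e2):
    """Map an OpenIE triple (E1, R, E2) to a tuple <S, P> with
    S = E1 union E2 and P = R."""
    return (frozenset(e1) | frozenset(e2), frozenset(r))

def claim_from_cluster(cluster):
    """Form a claim (S, P) as the union of the tuples' S and P sets."""
    S = frozenset().union(*(s for s, _ in cluster))
    P = frozenset().union(*(p for _, p in cluster))
    return (S, P)

# The two "land" tuples of Table 1, which form cluster c2 in Table 2:
t1 = to_tuple({"mh370"}, {"land"}, {"nanning"})
t2 = to_tuple({"mh370"}, {"land"}, {"nanming"})
S, P = claim_from_cluster([t1, t2])
print(sorted(S), sorted(P))  # ['mh370', 'nanming', 'nanning'] ['land']
```

Note that the union deliberately discards the subject/object distinction of the triple, for the interchangeability reason discussed in Section 5.3.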
c1: { <{mas,ceo,sar,ops}, {confirm}> }
    Claim: ({mas,ceo,sar,ops}, {confirm}), i.e. MAS CEO confirms SAR ops
c2: { <{mh370,nanning}, {land}>, <{mh370,nanming}, {land}> }
    Claim: ({mh370,nanning,nanming}, {land}), i.e. MH370 has landed in Nanning/Nanming
c3: { <{distress,signal,call}, {receive}>, <{signal}, {capture}> }
    Claim: ({distress,signal,call}, {receive,capture}), i.e. Signal received/captured
c4: { <{mh370,water}, {crash}> }
    Claim: ({mh370,water}, {crash}), i.e. MH370 crashed in water
c5: { <{aussie,debris}, {found}> }
    Claim: ({aussie,debris}, {found}), i.e. Australia found debris

Table 2: Claims obtained by clustering the tuples in Table 1.

Algorithm 1 ClaimFinder
Input: corpus D of tweets; number of clusters N
Output: set C of clusters of tuples
 1: T = ∅                          // initialise set of tuples
 2: for twt ∈ D do
 3:   F = OpenIE(Preprocess(twt))
 4:   for triple (E1, R, E2) ∈ F do
 5:     T ← T ∪ {<(E1 ∪ E2), R>}
 6:   end for
 7: end for
 8: C ← Cluster(T, N)              // cluster the tuples
 9: return C

We describe each step in the following subsections.

3.1 Preprocessing
This phase corresponds to the function Preprocess in Algorithm 1 line 3. We preprocess each tweet via a series of data cleaning operations to reduce the noise that may affect the subsequent OpenIE extraction. These include removing "rt" keywords (which indicate a retweeted message), URLs, user mentions, emoticons, colons, quote marks and the "#" signs of hashtags. The tweet content is tokenized using the twokenizer tool designed for Twitter content².

3.2 Subject-Predicate Tuple Extraction
After preprocessing the tweets, each sentence is fed to an OpenIE tool to generate a list of relation triples. This step corresponds to the OpenIE function call in Algorithm 1 line 3. We chose to use ClausIE, the state-of-the-art OpenIE technique, in this work. ClausIE takes as input each sentence in a tweet and identifies the entities E1 and E2, as well as their relationship R. The output is a triple (E1, R, E2). Each triple (E1, R, E2) is then mapped to a subject-predicate tuple (Algorithm 1 lines 4-5).

3.3 Clustering Subject-Predicate Tuples
At this juncture, we have obtained a set T of subject-predicate tuples from the original corpus of tweets D. We use the popular Porter stemmer [17] to stem the words in S and P, and filter the most frequent and infrequent words from the tuples.

We define the similarity between each pair of subject-predicate tuples ti = <Si, Pi> and tj = <Sj, Pj> as follows:

similarity(ti, tj) = w · |Si ∩ Sj| / |Si ∪ Sj| + (1 − w) · |Pi ∩ Pj| / |Pi ∪ Pj|   (1)

where w is a weight, 0 ≤ w ≤ 1, which is empirically determined. Note that this similarity metric is based on the Jaccard index between the corresponding sets of the two tuples. This allows tuple comparison operations to be approximated and scaled up (see Section 4).

We can now apply existing clustering techniques to cluster the tuples in T. Here, we choose two commonly used methods, namely agglomerative and spectral clustering, in our evaluation. Agglomerative clustering is a bottom-up hierarchical clustering approach, which initializes each subject-predicate tuple as a cluster by itself and successively merges the most similar pair of clusters at each step, till the specified number of clusters has been generated. Each cluster c is represented by a tuple tc formed by taking the union of the respective S and P terms of the tuples in the cluster, that is,

tc = < S1 ∪ ... ∪ Sn, P1 ∪ ... ∪ Pn >   for all <Si, Pi> ∈ c

On the other hand, spectral clustering takes in a similarity matrix between all pairs of tuples and constructs a Laplacian matrix. Then it performs an eigendecomposition to obtain the top m eigenvectors, effectively reducing the dimensionality to m. Finally, we use k-means to cluster these eigenvectors to obtain the desired clusters.

The output of ClaimFinder is a set C of tuple clusters. This corresponds to lines 8-9 in Algorithm 1.
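The weighted similarity of Equation 1 is straightforward to sketch. The value of w below is the best MH370 setting reported later in Section 5.3, used here purely for illustration.

```python
# Sketch of the tuple similarity in Equation 1: a weighted sum of the
# Jaccard indices of the subject sets and of the predicate sets.

def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two word sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similarity(ti, tj, w=0.6):
    """Equation 1; w trades off subject terms against predicate terms."""
    (si, pi), (sj, pj) = ti, tj
    return w * jaccard(si, sj) + (1 - w) * jaccard(pi, pj)

ti = ({"mh370", "nanning"}, {"land"})
tj = ({"mh370", "nanming"}, {"land"})
print(round(similarity(ti, tj), 2))  # 0.6 * 1/3 + 0.4 * 1 = 0.6
```

A precomputed matrix of these pairwise scores is exactly what a similarity-based clusterer (agglomerative or spectral) consumes.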
Each cluster corresponds to a claim. For each tuple in the cluster, we can retrieve the corresponding tweets from which the tuple is derived. This forms a grouping of the tweets that can provide evidence to verify the credibility of the claim. Note that a tweet can belong to more than one grouping, as it may contain multiple claims.

² http://www.cs.cmu.edu/~ark/TweetNLP/

4. INCREMENTAL APPROACH
Considering the streaming nature of tweets, especially for ongoing controversial major events rife with the propagation of rumours, we also propose an incremental approach to quickly identify claims from incoming tweets. Algorithm 2 gives the details of the ClaimFinderINC method.

Algorithm 2 ClaimFinderINC
Input: incoming tweet twt; split threshold thres
Output: set of buckets B = {b1, b2, ...}
 1: F = OpenIE(Preprocess(twt))
 2: for triple ∈ F do
 3:   extract <S, P> tuple from triple
 4:   i = LSH(MinHash(<S, P>))
 5:   bi ← bi ∪ {<S, P>}
 6: end for
 7: if |bi| ≥ thres then
 8:   Split(bi) into c1 and c2
 9:   let tc1 and tc2 be the representative tuples of c1 and c2 respectively
10:   initialize bi = ∅
11:   j = LSH(MinHash(tc1))
12:   bj ← bj ∪ {c1}
13:   k = LSH(MinHash(tc2))
14:   bk ← bk ∪ {c2}
15: end if

Each incoming tweet is preprocessed and the tuples constructed as described in Sections 3.1 and 3.2. We create a set of empty buckets and assign each tuple to the bucket determined by a Locality Sensitive Hashing (LSH) function with MinHash (lines 2-6 of Algorithm 2). LSH allows us to quickly estimate the similarity between the set of subject and predicate words in the tuple and those in the bucket.

Let us first consider the subject term S in a tuple t. Since S is an arbitrary-sized set of words, we choose its top n most frequent corpus words, giving a set S′, and apply m hash functions to S′. For each hash function hi, we obtain the minimum hash value among the n words, denoted by min(hi(S′)). With this, we form a vector

( min(h1(S′)), ..., min(hm(S′)) )

Similarly, we form a second vector based on the predicate term P as

( min(h1(P′)), ..., min(hm(P′)) )

where P′ is the set of top n most frequent words in P. These two vectors form the MinHash signature of a tuple.

Next, we apply LSH on the MinHash signatures. Tuples with similar subject and predicate terms will be hashed to the same bucket. This is because if the word attaining the minimum hash value is present in both sets Si and Sj, then min(h(Si)) = min(h(Sj)). This eliminates the need to perform pairwise similarity computations between a tuple from an incoming tweet and each cluster. The tuples whose MinHash signatures have been mapped to the same bucket are subsequently merged into a cluster by taking the union of their S and P terms respectively.

Our incremental approach provides a mechanism to re-adjust the clusters should the size of a cluster increase beyond some threshold (lines 7-15 of Algorithm 2). This is achieved by treating the cluster as a mini-corpus to be further partitioned via standard clustering methods based on the similarity measure defined in Equation 1. After the adjustment, a merging operation may be applied to re-group the clusters into the specified number of clusters.

5. PERFORMANCE STUDIES
We implement the proposed algorithms ClaimFinder and ClaimFinderINC in Python, and carry out experiments on a 2.3 GHz CPU with 8 GB RAM running Ubuntu 14.04. Our concept of claims is based on subject-predicate tuples. We also compare with the following representations:

• tweet: the full text of the tweet.

• keywords: a bag-of-words containing the nouns, verbs, hashtags and cardinal numbers present in a tweet. The Stanford POS tagger, using a trained model for tweets [7], is used to identify these keywords.

• ngrams: the set of n consecutive words in the tweet, ignoring stop words. We use n = 3 as it has been shown to best capture the semantics in a tweet [1], generating 7,691 ngrams for the MH370 dataset and 3,998 ngrams for the Castillo dataset. Note that the similarity between a pair of ngrams is based on the Jaccard index (like Equation 1) rather than the fraction of overlapping tweets that contain both ngrams as used in [1].

5.1 Datasets
We try to identify the claims in two real-world datasets:

• MH370 Dataset. We crawled and collected tweets on the crash of Malaysia Airlines flight MH370 in 2014 for our experiments. This event involves the mysterious disappearance of a Boeing 777 plane en route from Kuala Lumpur to Beijing on 8 March 2014. Perceived mishandling of the public communication of the situation created an unfortunately conducive environment for the proliferation of various rumours related to MH370, with sustained public interest in the status of the flight and the cause of the disappearance. Such rumours range from the absurd, such as alien abduction, to more plausible ones, such as the plane's safe landing in China during the early stage of the crisis. The location of the plane and the cause of the disappearance remain unknown today. The tweet corpus was collected using the keyword "MH370" via Twitter's REST API. In total, 510,433 tweets from 8 March to 9 April were collected. We extracted a subset of tweets from the MH370 dataset using keywords of 6 known rumour and credible claims. Overall, 3,764 tweets were identified and manually labeled with the corresponding claims. Table 3 gives the details. These claims form the ground truth.

• Castillo Dataset. We also obtain a subset of tweets with specific claims from 6 annotated topics in the Castillo dataset [5]. Table 4 shows the 6 claims, which pertain to President Obama. There are altogether 1,336 tweets, of which 811 are unique. The nomenclature of the claims follows that of the original annotated topics in [5], but with the prefix "T" instead of "TM" to indicate a filtered subset. We use these claims as ground truth.

Claim  Description                 #tweets  #unique tweets
M1     MH370 landed in Nanning        1393     271
M2     Pilot commit suicide            312     242
M3     Plane change course             203      78
M4     MH370 off course               1070     207
M5     Alien abduct MH370              538     398
M6     MH370 sighted in Maldives       248      50

Table 3: Groundtruth claims in MH370 dataset.

Claim  Description                                                   #tweets  #unique tweets
T269   President Obama visiting the Gulf of Mexico                       168      85
T876   President Obama sending troops to the US-Mexico border            466     283
T1494  President Obama praising/hailing lawmakers for a bill              48      39
T2370  President Obama signing the bill related to border security       212     104
T2384  President Obama supporting/endorsing building of a mosque
       near ground zero                                                  373     233
T2499  President Obama is Muslim                                          69      67

Table 4: Groundtruth claims in Castillo dataset.

5.2 Evaluation Metric
We evaluate the performance of the algorithms based on the proportion of claims they are able to identify. Let G be the set of ground truth claims and Dg be the set of tweets corresponding to a claim g ∈ G. The output of our algorithm is a set of tuple clusters, denoted C, where each cluster c ∈ C refers to a claim. In other words, C is the set of claims identified by an algorithm. For each tuple cluster c ∈ C, we retrieve all the tweets associated with the tuples in c, denoted by Dc.

We define a match function to compute the fraction of tweets common to both Dc and Dg as follows:

match(c, g) = 2 × |Dc ∩ Dg| / (|Dc| + |Dg|)   (2)

Note that when Dc and Dg are identical sets of tweets, we have match(c, g) = 1. On the other hand, when Dc and Dg are totally disjoint sets of tweets, match(c, g) = 0. Given a claim c, we say that c sufficiently covers a ground truth claim g if match(c, g) ≥ 0.8.

We introduce a metric called Coverage to measure the ability of a method to identify claims as follows:

Coverage = |Cmatch| / |G|   (3)

where Cmatch = {g ∈ G | ∃ c ∈ C, match(c, g) ≥ 0.8}. The set Cmatch contains the ground truth claims that have been covered by some cluster in C.

5.3 Performance of ClaimFinder
We have two versions of ClaimFinder depending on the clustering technique used. ClaimFinder(Agglomerative) implements the bottom-up agglomerative clustering in line 8 of Algorithm 1, while ClaimFinder(Spectral) utilizes spectral clustering.

We run an initial set of experiments on each of the datasets to find the parameter settings that achieve the best coverage results (Figures 2 and 3) for ClaimFinder. These parameters are the input number of clusters N and the weight w in Equation 1 that controls the relative importance of the S and P terms when computing the similarity scores between tuples. For the MH370 dataset, we have N = 18 and w = 0.6, whereas for the Castillo dataset, N = 6 and w = 0.8. In addition, words occurring in less than 3% or more than 30% of the tweets are filtered prior to clustering the MH370 dataset. For the smaller Castillo dataset, a higher minimum threshold of 4% is used. These thresholds are determined empirically based on the frequencies of words in the groundtruth claims.

Figures 2 and 3 show the coverage for ClaimFinder using the different representations and clustering techniques. Spectral clustering gives better performance on both datasets, while keywords and ngrams generally give lower coverage regardless of the clustering technique employed.

We observe that the proposed subject-predicate tuples consistently identify more claims in both datasets, and argue that this effectiveness indicates merit in discriminating the entity and relation terms using different weights for the different types of terms. This is not possible using keywords or ngrams. In addition, it is not effective to discriminate between the subject and object entities obtained directly from the OpenIE triple, due to the interchangeability of the positions of the entities in the sentence (e.g. plane abducted by alien vs alien abducts plane).

Figure 2: Performance of ClaimFinder (MH370).
Figure 3: Performance of ClaimFinder (Castillo).

5.3.1 Comparison with TwitterLDA
TwitterLDA [23] is designed for identifying topics in tweets. These topics are used to cluster the tweets for credibility assessment. We compare the performance of TwitterLDA using various tweet representations, namely full tweet, keywords, and subject-predicate tuples.

In addition to the original TwitterLDA model, we also experimented with its variants using author pooling and temporal pooling. For the MH370 dataset, there are 3,764 tweets from 3,557 authors. These tweets are posted across a period of 15 days, and thus a daily (24-hour) time frame is chosen for its temporal pooling. For the Castillo dataset, there are 1,336 tweets from 1,100 authors, posted between 1 May and 20 August 2010. The longer timeframe motivates the use of a weekly (7-day) time frame for temporal pooling.

The implementation of the TwitterLDA based approaches is based on the publicly available code³, run with the default 100 iterations. TwitterLDA requires the number of topics as an input parameter. Our initial experiments show that the best performance is achieved when the number of topics is 12 for both datasets. We use this setting to obtain the coverage of the various TwitterLDA models.

³ https://github.com/minghui/TwitterLDA

Figure 4: Performance of TwitterLDA (MH370).
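The match and Coverage measures of Section 5.2, used throughout this evaluation, reduce to a few lines over sets of tweet IDs. The toy cluster and ground-truth sets below are invented purely for illustration.

```python
# Sketch of the evaluation metrics: match(c, g) is the Dice coefficient of
# the tweet sets Dc and Dg (Equation 2), and Coverage is the fraction of
# ground truth claims matched by some cluster at threshold 0.8 (Equation 3).

def match(dc, dg):
    """Equation 2 over two sets of tweet ids."""
    return 2 * len(dc & dg) / (len(dc) + len(dg))

def coverage(clusters, truths, thres=0.8):
    """Equation 3: fraction of ground truth claims sufficiently covered."""
    covered = {g for g, dg in truths.items()
               if any(match(dc, dg) >= thres for dc in clusters)}
    return len(covered) / len(truths)

clusters = [{1, 2, 3, 4}, {5, 6}]               # tweet ids per identified claim
truths = {"g1": {1, 2, 3, 4, 7}, "g2": {8, 9}}  # tweet ids per ground truth claim
print(coverage(clusters, truths))  # match = 8/9 >= 0.8 for g1 only, so 0.5
```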
Figure 5: Performance of TwitterLDA (Castillo).

Figures 4 and 5 show the results. We observe that using the subject-predicate tuples representation always achieves the best coverage regardless of the TwitterLDA model used. This indicates that the subject-predicate tuples are able to capture the underlying semantics of a claim. Using keywords generally yields better coverage than using ngrams or the full text of the tweet. Using the full tweet results in relatively bad coverage, indicating that when there are multiple claims in a tweet, some of these claims may be missed.

Overall, the best performance is obtained when the proposed subject-predicate tuples are used in conjunction with TwitterLDA(Weekly Pooled). This is because there is a temporal correlation among the claims, that is, posts containing the same claims are likely to be sent within similar time windows. In contrast, TwitterLDA(Author Pooled) does not perform well due to the low tweet-to-author ratio in both datasets.

When we compare the coverage of the best performing variant of TwitterLDA, i.e. TwitterLDA(Weekly Pooled) in Figures 4 and 5, against the best performing ClaimFinder version, i.e. ClaimFinder(Spectral) with subject-predicate tuples, we see that the latter significantly increases the number of claims identified in both datasets. We note that the MH370 dataset is noisier (it has a more diverse set of words) than the Castillo dataset, and believe that the larger improvement for the former is an indication of the weakness of TwitterLDA in dealing with the noise.

5.4 Effectiveness of ClaimFinder
As a case study on the effectiveness of the proposed claim identification approach, we retrieve the sets of subject-predicate tuples in the clusters that match some ground truth claim, as well as their corresponding tweets. The identified claims and sample tweets obtained using ClaimFinder(Spectral) are shown in Tables 5 and 6 for the MH370 and Castillo datasets respectively. We see that the tweets retrieved based on the clusters found by ClaimFinder closely match the description of the ground truth claim, indicating that the subject-predicate tuples are able to capture the semantics of a claim.

M5 (Alien abduct MH370):
  "CNN has yet to rule out the theory that MH370 was abducted by aliens. Muldar, where are you?"
  "The #MH370 was abducted by aliens? How come?"
  "Rumors: Malaysia Airline MH370 Abducted by Aliens? - News - Bubblews"
  "What if the plane is abducted by the aliens?"
  "#MH370 if a mysterious island (Lost) can happen, so does an alien spaceship."
  "Has somebody floated alien abduction theory for MH370?"

M6 (MH370 sighted in Maldives):
  "BREAKING: Malaysia transport minister says reports of missing plane sighted over Maldives are untrue"
  "Minister: Maldives says it's not true that the plane was sighted in its airspace #MH370"
  "MH370: Reports that plane sighted in #Maldives not true"
  "RT Yahoo MY: Plane sighted in Maldives? Not true, says Hishammuddin"
  "RT TODAYonline: #MH370 press con: Reports of plane sighted at Maldives are not true; forensic work underway to look at data deleted from..."

Table 5: Sample claims found in MH370 dataset.

T269 (President Obama visiting the Gulf of Mexico):
  "President Obama will visit the Gulf of Mexico in the next 48 hours to check out the oil spill and response, per a White House official."
  "RT @CNN: President Obama will visit the Gulf of Mexico in the next 48 hours to check out the oil spill and response."
  "President Obama to visit Gulf of Mexico region in next 48 hours to check oil spill response, White House says."
  "RT @GWPStudio: President Obama to visit site of oil spill in the Gulf of Mexico in next 48 hours http://bit.ly/cZ0q73 #oilspill"
  "RT @CNN: Just in: President Barack Obama will visit the Gulf of Mexico oil spill area on Sunday morning."

T2384 (President Obama supports building of a mosque near ground zero):
  "RT @croedemeierAP: WASHINGTON (AP) - President Obama supports allowing mosque to be built near ground zero in Manhattan."
  "President Obama supports allowing mosque to be built near ground zero"
  "Obama backs Mosque near ground zero (AP): AP - President Barack Obama on Friday forcefully endorsed building ..."
  "Breaking news: President Obama backs mosque near ground zero"
  "Looks interesting: Obama backs mosque near ground zero: President Obama threw his support behind a controversial p..."

Table 6: Sample claims found in Castillo dataset.

5.5 Scalability of ClaimFinderINC
Finally, we evaluate the scalability of the proposed incremental method ClaimFinderINC to identify claims. We use 100 hash functions to generate the MinHash values, and spectral clustering for the splitting and merging operations. There are two parameters in ClaimFinderINC, namely the number of LSH vectors and the threshold to split a cluster. We use 50 LSH vectors for both the MH370 and Castillo datasets. The split threshold is 10 and 30 tuples for the MH370 and Castillo datasets respectively.

Figure 6 shows the runtime of ClaimFinderINC compared to ClaimFinder under spectral clustering and ClaimFinder under agglomerative clustering (in log scale). We observe that ClaimFinderINC is several orders of magnitude faster than both versions of ClaimFinder and remains scalable as the number of tweets increases.
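The MinHash-plus-LSH bucketing at the core of ClaimFinderINC (Section 4) can be sketched as below. The md5-derived hash family and the banding scheme are generic stand-ins, not the paper's exact configuration.

```python
import hashlib

# Sketch of ClaimFinderINC's bucketing (Section 4): compute a MinHash
# signature over a tuple's word set, then band the signature so that
# tuples agreeing on any band fall into the same LSH bucket. The hash
# family and band size below are illustrative assumptions.

def h(i, word):
    """The i-th hash function, derived from a single keyed digest."""
    return int(hashlib.md5(f"{i}:{word}".encode()).hexdigest(), 16)

def minhash(words, m=100):
    """MinHash signature: the minimum hash value under each of m functions."""
    return tuple(min(h(i, w) for w in words) for i in range(m))

def lsh_keys(sig, band=2):
    """Bucket keys: tuples that agree on any one band share a bucket."""
    return {(b, sig[b:b + band]) for b in range(0, len(sig), band)}

sig1 = minhash({"mh370", "nanning", "land"})
sig2 = minhash({"mh370", "nanming", "land"})
# Word sets with high Jaccard similarity agree on many signature positions,
# so with high probability they share at least one bucket:
print(len(lsh_keys(sig1) & lsh_keys(sig2)))
```

Because bucket keys can be looked up in a hash table, an incoming tuple is routed to candidate clusters without any pairwise similarity computation, which is the source of the speedup seen in Figure 6.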
For future work, we plan to in- vestigate existing features as well as information from other sources for credibility assessment. 8. REFERENCES [1] L.M. Aiello, G. Petkos, and C. Martin et al. Sensing trending topics in twitter. IEEE Transactions on Multimedia, 15(6), 2013. [2] H. Becker, M. Naaman, and L. Gravano. Beyond trending topics: Real-world event identification on twitter. In AAAI Conference on Weblogs and Social Media, 2011. [3] D.M. Blei, A.Y. Ng, and M.I. Jordan et al. Latent dirichlet allocation. Journal of Machine Learning Research, 2003. Figure 6: Scalability of ClaimF inderIN C (MH370). [4] C. Castillo, M. Mendoza, and B. Poblete. Information credibility on twitter. In WWW, 2011. [5] C. Castillo, M. Mendoza, and B. Poblete. Predicting 6. RELATED WORK information credibility in time-sensitive social media. There are two main approaches to cluster tweets, namely Internet Research, 2013. features-based and topic modeling based clustering. Feature- [6] L.D. Corro and R. Gemulla. Clausie: Clause-based open information extraction. In WWW, 2013. based approach typically represent each tweet as a vector or [7] L. Derczynski, A. Ritter, and S. Clark et. al. Twitter set of features from which a similarity measure can then part-of-speech tagging for all: Overcoming sparse and noisy be used to quantify the distance between any given pair data. In Recent Advances in NLP, 2013. of tweets. A commonly used set of features is the TFIDF [8] I.S. Dhillon. Co-clustering documents and words using scores of the words present within the tweet content. Other bipartite spectral graph partitioning. In ACM SIGKDD, features useful for differentiating individual tweet to their 2001. event include references to temporal, geographical and user [9] E. Ferrara, M. JafariAsbagh, and O. Varol et. al. Clustering information extracted from the tweet content [21]. These memes in social media. In Advances in Social Networks Analysis and Mining, 2013. 
features are then used to cluster the tweets [9, 20, 8]. [10] A. Gupta, P. Kumaraguru, C. Castillo, and P. Meier. The alternative to features-based clustering is the genera- Tweetcred: Real-time credibility assessment of content on tive topic modeling approaches, e.g., LDA [3]. However, the twitter. In Social Informatics. 2014. limited number of words present in microblog pose a major [11] L. Hong and B.D. Davison. Empirical study of topic problem due to the lack of word co-occurrence within the modeling in twitter. In SIGKDD Workshop on Social tweets [11]. Empirical studies show that aggregating tweets Media Analytics, 2010. such that each document is the concatenation of tweets from [12] M. Mathioudakis and N. Koudas. Twittermonitor: Trend a user, hashtag or time window improves the topic cluster- detection over the twitter stream. In ACM SIGMOD, 2010. ing results [11][19][13]. The work in [23] assume that “a [13] R. Mehrotra, S. Sanner, W. Buntine, and L. Xie. Improving lda topic models for microblogs via tweet pooling and single tweet is usually about a single topic” and propose automatic labeling. In ACM SIGIR, 2013. the TwitterLDA model where words in a tweet are either [14] M. Mendoza, B. Pobletey, and C. Castillo. Twitter Under chosen from a topic or are background noise words. The Crisis: Can we trust what we RT? In 1st Workshop on TwitterLDA model is able to generate more coherent repre- Social Media Analytics,, 2010. sentative topic words compared to a standard LDA model. [15] J. O’Donovan, B. Kang, and G. Meyer et. al. Credibility in To date, prior work on tweet or keywords clustering are context: An analysis of feature distributions in twitter. In designed mainly for topic or event detection, of which are International Conference on Social Computing, 2012. overly encompassing in nature for the credibility assessment [16] F. Pedregosa, G. Varoquaux, and A. Gramfort et. al. Scikit-learn: Machine learning in Python. Journal of task. 
For example, an entity-oriented sample topic in [23] Machine Learning Research, 12:2825–2830, 2011. ‘ ‘iphone6, #iphone, apple, app” correspond to tweets refer- [17] M.F. Porter. An algorithm for suffix stripping. Program, ring to the iPhone and/or the technology company while a 14:130–137, 1980. event-oriented topic “health, flu, swine, #h1n1, #swineflu” [18] V. Qazvinian, E. Rosengren, D.R. Radev, and Q. Mei. correspond to tweets referring to the virus outbreak. The Rumor has it: Identifying misinformation in microblogs. In problem that there are multiple claims of varying credibility EMNLP, 2011. made within the tweets in each cluster remains unaddressed. [19] Y. Wang, J. Liu, and J. Qu et. al. Hashtag graph based topic model for tweet mining. In IEEE Data Mining, 2013. [20] C. Wartena and R. Brussee. Topic detection by clustering 7. CONCLUSION keywords. In DEXA, 2008. In this work, we observed that tweets may contain mul- [21] Y. Xia, X. Yang, and C. Wu et. al. Information credibility tiple claims and define a claim as comprising of subjects and on twitter in emergency situation. In Pacific Asia Conference on Intelligence and Security Informatics, 2012. predicates terms. We described a method called ClaimF inder [22] F. Yang, Y. Liu, X. Yu, and M. Yang. Automatic detection to identify claims in a corpus of tweets related to some real of rumor on sina weibo. In ACM SIGKDD Workshop on world event. In particular, we use OpenIE techniques to Mining Data Semantics, 2012. identify entities and their relationships in tweets and map [23] W. Zhao, J. Jiang, and J. Weng et. al. Comparing twitter them to subject-predicate tuples. These tuples are then clus- and traditional media using topic models. In European tered such that each cluster refers to a claim. We further in- Conference on Advances in Information Retrieval, 2011. 20 · #Microposts2016 · 6th Workshop on Making Sense of Microposts · @WWW2016
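The conclusion summarizes the pipeline: OpenIE-style subject-predicate tuples are extracted from tweets and then clustered so that each cluster corresponds to one claim. The paper does not spell out the vectorization behind its spectral clustering step, so the following is only a minimal sketch using scikit-learn (which the paper cites [16]); the toy tuples, the TF-IDF representation and the cluster count are illustrative assumptions, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import SpectralClustering

# Hypothetical subject-predicate tuples flattened to token strings;
# in the paper these would come from OpenIE extraction over tweets.
tuples = [
    "plane sighted maldives not true",
    "reports plane sighted maldives not true",
    "obama visit gulf of mexico oil spill",
    "obama visit gulf of mexico in 48 hours",
]

# TF-IDF rows are L2-normalized by default, so X @ X.T is the
# cosine similarity between tuples, usable as a spectral affinity.
X = TfidfVectorizer().fit_transform(tuples)
affinity = (X @ X.T).toarray()

# One cluster per claim; n_clusters=2 is an assumption for this toy data.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
# Tuples about the same claim end up with the same label.
```

The tweets whose tuples fall in the same cluster would then serve as the evidence set for that claim in the downstream credibility assessment step.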
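Section 5.5 gives only the parameters of ClaimFinderINC (100 MinHash functions, 50 LSH vectors, a split threshold), not its implementation. Below is a hedged sketch of the standard MinHash-signature plus banded-LSH scheme over token sets that those parameters suggest; the function names, hash family and toy tuples are all illustrative.

```python
import hashlib
import random
from collections import defaultdict

P = (1 << 61) - 1  # large Mersenne prime for the universal hash family

def minhash_signature(tokens, num_hashes=100, seed=7):
    """MinHash: for each hash function h_i(x) = (a*x + b) % P,
    keep the minimum value over the token set."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_hashes)]
    ints = [int(hashlib.md5(t.encode()).hexdigest(), 16) % P for t in tokens]
    return tuple(min((a * x + b) % P for x in ints) for a, b in params)

def lsh_buckets(signatures, num_bands=50):
    """Split each signature into bands; tuples sharing any band become
    candidates for the same claim cluster, avoiding all-pairs comparison."""
    rows = len(next(iter(signatures.values()))) // num_bands
    buckets = defaultdict(set)
    for tid, sig in signatures.items():
        for b in range(num_bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(tid)
    return buckets

# Toy subject-predicate tuples as token sets.
tuples = {
    "t1": {"obama", "visit", "gulf", "mexico"},
    "t2": {"obama", "visit", "gulf", "mexico"},   # same claim as t1
    "t3": {"plane", "sighted", "maldives"},
}
sigs = {tid: minhash_signature(toks) for tid, toks in tuples.items()}
candidates = lsh_buckets(sigs)  # t1 and t2 share every bucket
```

Presumably, buckets growing past the split threshold (10 tuples for MH370, 30 for Castillo) would then be re-clustered with the spectral splitting and merging operations mentioned in Section 5.5.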