
Fully Automatic Approach to Identify Factual or Fact-checkable Tweets

Sarthak Anand

sarthaka.ic@nsit.net.in 2

Rajat Gupta

Rajiv Ratn Shah

Ponnurangam Kumaraguru

0 Indraprastha Institute of Information Technology, Delhi 110020, INDIA
1 Maharaja Agrasen Institute of Technology, Delhi 110086, INDIA
2 Netaji Subhas Institute of Technology, New Delhi 110078, INDIA

This paper presents the solution of team MIDAS of IIIT Delhi for the IRMiDis track in FIRE 2018. We present our solution for identifying factual or fact-checkable tweets in a dataset of about 50,000 tweets posted during the 2015 Nepal earthquake. We provide a rule-based approach for this task and compare it with a semi-supervised approach. After preprocessing steps including tokenization and cleaning, we calculate a factuality score based on the number of proper nouns and quantitative values in a tweet and finally rank the tweets according to this score. Experimental results show that this simple rule-based approach provides results comparable to those of the semi-supervised approach.

Keywords: Social media analysis, Unsupervised learning, Information retrieval, Microblogs, Disaster

1 Introduction

Social media usage has increased considerably over the last decade. People use social media for various purposes and create a huge amount of user-generated content. In addition to reporting news and events, social media platforms are increasingly used to aid relief operations during mass emergencies, e.g., during the 2018 Kerala floods.

However, messages posted on these sites often contain rumors and false information. In such situations, identifying factual or fact-checkable tweets, i.e., tweets that report some relevant and verifiable fact, is extremely important for the effective coordination of post-disaster relief operations. Cross-verifying such critical information is also a practical necessity to ensure its trustworthiness. Given the scale of these platforms, it is not feasible to check and verify user-generated content manually in time. Since it is very important to reach people stuck in such emergencies in time, automated IR techniques are needed to identify and process information from multiple sources and to verify its credibility.

In this paper, we present one such approach, which achieved the best performance among fully automatic submissions for identifying factual tweets in the FIRE 2018 challenge [1].

2 Related Work

Identifying factual and non-factual tweets can be treated as a supervised classification problem. Much work has already been done on supervised classification [4, 7, 8], but all of these works require large amounts of manually labeled data.

Although most works focus on supervised techniques, some have employed unsupervised ones. For instance, Bjorn Schuller et al. [6] proposed a knowledge-based approach that does not require labeled training data. Moreover, Shailesh S. Deshpande et al. [2] proposed a rule-based approach for sentence classification, which they tested on identifying specific and non-specific sentences; they computed several features for each sentence to derive a specificity score. Similar to their approach, we extract features from each sentence, such as the number of proper nouns (PROPN) and the number of quantitative values (NUM), and compute a factuality score (a higher score indicates more factual information). We use this score to rank the tweets by the factual information they contain and take the top k tweets as fact-checkable tweets.

3 Problem and Data Description

The Information Retrieval from Microblogs during Disasters (IRMiDis) challenge had two sub-tasks. Sub-task 1 was to identify factual or fact-checkable tweets related to the Nepal earthquake and rank them by their factuality scores. Sub-task 2 was to map the fact-checkable tweets to appropriate news articles. Submissions were categorized into three types based on the amount of manual intervention: fully automatic, semi-automatic, and manual.

Data Description. The dataset for sub-task 1 consists of about 50,000 tweets posted during the 2015 Nepal earthquake. The dataset for sub-task 2 includes around 6,000 news articles related to the same earthquake. Refer to [1] for more details.

4 Automatic Methodology

The problem at hand is to rank tweets based on the factual information they contain. The following sections describe the intuition behind our approach and the steps performed to achieve our results:

1. Pre-processing the tweets, POS tagging, and finding proper nouns and quantitative values (Section 4.2).
2. Computing a factuality score based on the proper nouns and quantitative values (Section 4.3).

4.1 Intuition

Similar to the findings of Shailesh S. Deshpande et al. [2], we find that tweets containing factual information tend to include named entities, such as organizations (e.g., UN or NDRF) or other proper nouns (e.g., PM Modi), and quantitative information such as dates, times, or numbers (e.g., 5 dead or 5 tonnes). Based on this observation, we score each tweet by the number of proper nouns and quantitative values it contains; we call this the factuality score.

4.2 Data Preprocessing and POS tagging

Since the data given to us is raw, noisy, and error-prone, it cannot be used directly for analysis; some preprocessing is needed to make it suitable for POS tagging. The following preprocessing steps were performed:

1. Tokenization: Tokenization breaks the given text into individual words. We use spaCy's word tokenizer to tokenize the tweets.
2. Normalization: We perform the following steps, which are specific to tweets, to normalize our corpus:

Stop-words and punctuation removal: Tweets usually contain mentions, hashtags, URLs, punctuation marks, and emoji. These are not useful for determining the amount of information in a tweet and are therefore removed from our corpus.
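A minimal sketch of this normalization step follows, using only Python's standard library. It is an illustration, not the authors' code: the paper tokenizes with spaCy, whereas this sketch uses regular expressions and whitespace splitting as a stand-in. The patterns, the function name `normalize_tweet`, and the omission of stop-word removal are assumptions.

```python
import re

# Patterns for the tweet-specific tokens that the text says are removed.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Any character that is neither a word character nor whitespace (punctuation and
# most emoji), except a symbol sitting between two digits, such as the comma in "1,000".
SYMBOL_RE = re.compile(r"(?<!\d)[^\w\s]|[^\w\s](?!\d)")


def normalize_tweet(text: str) -> list[str]:
    """Remove URLs, mentions, hashtags, punctuation and emoji; return whitespace tokens."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, SYMBOL_RE):
        text = pattern.sub(" ", text)
    return text.split()


print(normalize_tweet("RT @ndrf: 5 dead, 1,000 injured in #Nepal http://t.co/x 🙏"))
# ['RT', '5', 'dead', '1,000', 'injured', 'in']
```

Keeping separators inside numbers is a deliberate choice in this sketch: splitting 1,000 into two tokens would inflate the NUM count used for the factuality score.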

POS tagging. Our approach uses two main features to compute the factuality score: the number of proper nouns and the number of quantitative values in a tweet. We obtain both from spaCy's POS tagger.

4.3 Computing Factuality Score

Submitted Approach. In this approach, we count the proper nouns and quantitative values in each tweet. To map each count to the range [0, 1], we divide the PROPN and NUM counts by the maximum value observed for the respective count. Finally, we take the average of the two normalized values. Table 1 shows examples of calculating the factuality score; underlined words are proper nouns and italicized words are numbers. For these examples, the maximum values of PROPN and NUM were 17 and 13, respectively. (Shortcomings of this approach and suggestions for addressing them are described in Section 7.) The code is available on GitHub at https://github.com/isarth/Fire_task_1.

5 Semi-automatic Methodology

To compare our automatic approach with a supervised one, we manually labeled around 1,500 tweets as factual or non-factual and treated sub-task 1 (see Section 3) as a binary classification problem. The confidence score of the classifier is treated as the factuality score, which is then used to rank the tweets. The steps performed for the semi-automatic approach are:

1. Manually label a small set of tweets from the dataset.
2. Pre-process the tweets as described in Section 4.2.
3. Train a binary classifier and rank the tweets according to its confidence score (see Section 5.1 for details).
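Steps 1 and 3 need the manually labeled tweets in a form that a fastText classifier can train on. The sketch below is an illustration rather than the authors' code: it writes each labeled tweet in fastText's supervised input format (one example per line, with the label prefixed by `__label__`, fastText's default prefix) and shuffles the examples into training and validation sets. The label names, the helper names, and the 80/20 split are assumptions, since the paper does not state them.

```python
import random
from typing import List, Sequence, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def to_fasttext_line(text: str, label: str) -> str:
    """Format one labeled tweet as a fastText supervised training line."""
    # fastText reads one example per line, so collapse newlines and repeated whitespace.
    clean = " ".join(text.split())
    return f"{LABEL_PREFIX}{label} {clean}"


def train_valid_split(
    examples: Sequence[Tuple[str, str]], valid_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Shuffle (text, label) pairs and split them into training and validation lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical labeled examples; the paper labeled about 1,500 tweets.
labeled = [
    ("5 dead in\nKathmandu after aftershock", "factual"),
    ("pray for Nepal", "non_factual"),
]
train, valid = train_valid_split(labeled, valid_fraction=0.5)
train_lines = [to_fasttext_line(text, label) for text, label in train]
```

The training lines could then be written to a file and passed to `fasttext.train_supervised(input=path, wordNgrams=2)` from the fasttext Python package; `wordNgrams=2` is presumably what a bi-gram model refers to.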

5.1 Binary Classifier

To classify tweets as factual or non-factual, we train two fastText [3] models: a CBOW model and a bi-gram model. We split our labeled dataset into training and validation sets. Table 2 shows the performance of both classifiers. Finally, to rank tweets by factuality, we use the confidence score of the bi-gram classifier as the factuality score.

Table 2. Performance of the fastText classifiers.

Model      Validation Accuracy
CBOW       0.756
Bi-gram    0.796
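The ranking step, which uses the classifier's confidence as the factuality score, can be sketched as follows. This is a minimal illustration under stated assumptions: `predict` is any callable returning labels with probabilities, the label name `__label__factual` is hypothetical, and the digit-based stub stands in for a trained model. With the fasttext Python package, such a callable could presumably be `lambda t: model.predict(t, k=2)`, which returns both labels and their probabilities for single-line input.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical label name; fastText prefixes labels with "__label__" by default.
FACTUAL = "__label__factual"

# A predict function maps a tweet to (labels, probabilities).
Predict = Callable[[str], Tuple[Sequence[str], Sequence[float]]]


def factual_confidence(predict: Predict, text: str) -> float:
    """Probability the classifier assigns to the factual label (0.0 if not returned)."""
    labels, probs = predict(text)
    for label, prob in zip(labels, probs):
        if label == FACTUAL:
            return float(prob)
    return 0.0


def rank_by_confidence(predict: Predict, tweets: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (tweet, factuality score) pairs sorted from most to least factual."""
    scored = [(t, factual_confidence(predict, t)) for t in tweets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-in classifier for illustration only: "factual" whenever a digit appears.
def stub_predict(text: str):
    if any(ch.isdigit() for ch in text):
        return ("__label__factual", "__label__non_factual"), (0.9, 0.1)
    return ("__label__non_factual", "__label__factual"), (0.8, 0.2)


ranked = rank_by_confidence(stub_predict, ["pray for Nepal", "5 dead in Kathmandu"])
print(ranked)  # [('5 dead in Kathmandu', 0.9), ('pray for Nepal', 0.2)]
```

Looking up the factual label explicitly, instead of taking the top prediction, gives every tweet a comparable score even when the classifier's top label is non-factual.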

6 Results and Analysis

Table 3 compares the results of the automatic and semi-automatic approaches in the FIRE’18 challenge. Table 4 summarizes the final results of the other teams that participated in the FIRE’18 task with automatic submissions. We were ranked first, with an NDCG score of 0.6835; the lowest NDCG score achieved in the competition was 0.1271. Table 5 summarizes the final results of the other teams that participated in the FIRE’18 task with semi-automatic submissions; we were ranked second in that category. For detailed results, refer to [1].

7 Conclusion and Future Work

We have presented our automatic approach for calculating a factuality score based on the number of proper nouns and quantitative values in a tweet, which provided results comparable to those of the semi-automatic approach in the FIRE’18 Information Retrieval from Micro-blogs during Disasters (IRMiDis) task. Our best automatic submission achieved an NDCG score of 0.6835, which placed our team first among the automatic submissions.

On further exploration, we found two minor issues in the automatic approach described in Section 4.3:

To overcome these issues, we suggest capping the PROPN and NUM counts at an upper bound. Each individual score is then computed as the minimum of the tweet's count (PROPN or NUM) and the bound, divided by the bound to map it to the range [0, 1], and the factuality score is the average of the two individual scores. Finding a suitable value for the bound requires further exploration; addressing these shortcomings is left as future work.
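The submitted score from Section 4.3 and the capped variant suggested here can be sketched side by side. Writing P and N for a tweet's PROPN and NUM counts, the submitted score is (P / max P + N / max N) / 2, with the maxima taken over the dataset; with the maxima of 17 and 13 reported in Section 4.3, a hypothetical tweet with 3 proper nouns and 2 numbers would score (3/17 + 2/13) / 2 ≈ 0.165. The capped variant replaces each ratio with min(count, cap) / cap, where cap is the upper bound. This is a stdlib-only illustration, not the code in the authors' repository: it takes already-tagged tweets as (token, tag) pairs using Universal POS labels (the coarse tags spaCy exposes as `token.pos_`), and the function names, the hand-written tags, and the example cap value are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One POS-tagged tweet: (token, coarse POS tag) pairs with Universal POS labels.
TaggedTweet = Sequence[Tuple[str, str]]


def count_features(tagged: TaggedTweet) -> Tuple[int, int]:
    """Return (number of PROPN tokens, number of NUM tokens) in one tweet."""
    propn = sum(1 for _, pos in tagged if pos == "PROPN")
    num = sum(1 for _, pos in tagged if pos == "NUM")
    return propn, num


def factuality_scores(tweets: Sequence[TaggedTweet], cap: Optional[int] = None) -> List[float]:
    """Average of the normalized PROPN and NUM counts for each tweet.

    cap=None: divide each count by its maximum over all tweets (submitted approach).
    cap=c:    clip each count at c, then divide by c (suggested variant).
    """
    counts = [count_features(t) for t in tweets]
    if not counts:
        return []
    if cap is None:
        # `or 1` avoids division by zero when no tweet has a PROPN (or NUM) token.
        max_propn = max(p for p, _ in counts) or 1
        max_num = max(n for _, n in counts) or 1
        return [(p / max_propn + n / max_num) / 2 for p, n in counts]
    if cap <= 0:
        raise ValueError("cap must be a positive integer")
    return [(min(p, cap) / cap + min(n, cap) / cap) / 2 for p, n in counts]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tweets, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hand-written tags for illustration only.
tweets = [
    [("PM", "PROPN"), ("Modi", "PROPN"), ("sends", "VERB"), ("5", "NUM"), ("tonnes", "NOUN")],
    [("pray", "VERB"), ("for", "ADP"), ("Nepal", "PROPN")],
    [("stay", "VERB"), ("safe", "ADJ")],
]
print(factuality_scores(tweets))              # [1.0, 0.25, 0.0]
print(factuality_scores(tweets, cap=1))       # [1.0, 0.5, 0.0]
print(top_k(factuality_scores(tweets), k=2))  # [0, 1]
```

With the cap, a single tweet with an unusually large number of proper nouns or numbers no longer determines the denominator for every other tweet.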

We also aim to improve the model by exploring techniques we have not yet tried, such as using other features, like the TF-IDF [5] scores of words, in combination with the features we already use. Knowledge-based classification [6] can also be explored.

References

1. Basu, M., Ghosh, S., Ghosh, K.: Overview of the FIRE 2018 track: Information Retrieval from Microblogs during Disasters (IRMiDis). In: Proceedings of FIRE 2018 - Forum for Information Retrieval Evaluation (December 2018)

2. Deshpande, S.S., Palshikar, G.K., Athiappan, G.: Unsupervised approach to sentence classification (2010)

3. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. CoRR abs/1607.01759 (2016)

4. Lei Shen, J.Z.: Empirical evaluation of RNN architectures on sentence classification task

5. Ramos, J.: Using tf-idf to determine word relevance in document queries

6. Schuller, B., Knaup, T.: Learning and knowledge-based sentiment analysis in movie review key excerpts

7. Wang, S., Manning, C.D.: Baselines and bigrams: Simple, good sentiment and topic classification (2012)

8. Zhang, X., Zhao, J.J., LeCun, Y.: Character-level convolutional networks for text classification. CoRR abs/1509.01626 (2015)