Estimating Credibility of News Authors from their WIKI Validated Predictions

Navya Yarrabelly, DSAC, IIIT Hyderabad (yarrabelly.navya@research.iiit.ac.in)
Kamalakar Karlapalem, DSAC, IIIT Hyderabad (kamal@iiit.ac.in)

Copyright © 2018 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. In: D. Albakour, D. Corney, J. Gonzalo, M. Martinez, B. Poblete, A. Vlachos (eds.): Proceedings of the NewsIR'18 Workshop at ECIR, Grenoble, France, 26-March-2018, published at http://ceur-ws.org

Abstract

In this paper, we consider a set of articles or reports by journalists and others in which they predict or promise something about the future. The problem we approach is determining the credibility of the authors based on whether their predictions come true. The two specific problems we address are extracting the predictions from the articles and annotating them with various prediction attributes. We then determine the truth of these predictions, using Wikipedia as a credible source from which to extract relevant facts that can ascertain their validity. We propose and build an end-to-end system for automated prediction validation (APV) by extracting future speculations and predictions from news articles and social media. We considered 28 news articles, extracted 97 predictions from them, and the credibility scores (F-scores) for these articles range from 0.57 to 0.71.

1 Introduction

In newspaper articles, many journalists evaluate the current state of affairs and predict possible future scenarios. [6] estimates from their investigations that nearly one-third of news articles contain predictive statements. Therefore, it is imperative to determine the passages, sentences and phrases of news articles that predict future scenarios. A person well versed in reading articles can easily determine the predictability aspects of a news article, and over time gains some assurance about which articles or news agencies correctly predict some of the future scenarios. It is important and necessary, therefore, to enhance our ability to computationally determine the credibility of journalists based on their ability to predict future scenarios correctly. As a step in this direction, we take up the automatic verification of predictive statements against facts collected from credible information sources. This task of machine reading at scale combines the difficulty of relevant article retrieval (finding the significant facts) with that of machine perception of content (entailment of predictions from facts).

Consider the following prediction, published on date 'd'.

Example: The Reserve Bank of India may lower the economic growth projection for 2017-18 to 6.7 per cent later this month, from its August forecast of 7.3 per cent, in view of issues with GST implementation and lower kharif output estimates.

In this predictive sentence, we have to precisely extract and validate only the predictive part, "The Reserve Bank of India may lower the economic growth projection for 2017-18 to 6.7 per cent"; "in view of issues with GST implementation and lower kharif output estimates" is the premise on which the prediction is made, and "from its August forecast of 7.3 per cent" is a supporting clause. The reference future date for this prediction, "later this month", is translated to the actual date 'd+30'. The facts relevant to the predictive part that are published after the target date 'd+30' are extracted to determine the entailment relation from fact to prediction.
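The paper does not spell out the rules used for this date translation; the following is a minimal sketch of how such relative temporal references could be resolved against a publication date. The `OFFSETS` table is hypothetical, standing in for whatever temporal-expression identifier the system actually uses.

```python
from datetime import date, timedelta

# Hypothetical day offsets for a few relative temporal expressions;
# the paper's actual temporal-expression rules are not specified.
OFFSETS = {
    "later this month": 30,
    "next week": 7,
    "by the end of the year": 365,
}

def resolve_target_date(expression: str, published: date) -> date:
    """Translate a relative future reference to an absolute target date,
    e.g. 'later this month' published on date d becomes d + 30 days."""
    days = OFFSETS.get(expression.lower())
    if days is None:
        raise ValueError(f"unknown temporal expression: {expression}")
    return published + timedelta(days=days)

# Usage: the RBI prediction published on 2017-10-01 gets target date 2017-10-31.
print(resolve_target_date("later this month", date(2017, 10, 1)))
```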
Contributions: The main contributions of the proposed approach are:
(1) To translate predictions to structured queries, we annotate the predictions with a wide range of attributes (in Table 1). These attributes can further be used by an IR system to retrieve predictions made in reference to a future time period, targeting an event, etc.
(2) We also report a timeline story of the facts relevant to a prediction, with analysis, and the fact sources confirming the truth of the prediction. News IR systems can also produce recommendations or follow-up links for an article being read, based on the predictive attributes and the timeline of facts extracted.
(3) We propose an approach to tackle open-domain prediction validation using Wikipedia as the sole knowledge source.

2 Related Work

Research has in the past focused on how to answer questions, but has not devoted attention to discerning the accuracy of predictions and promises. To the best of our knowledge, [5] is the only work focused on estimating the validity of predictions, which it does by calculating the cosine similarity between predicted news and the relevant events that actually occurred. We instead offer semantic and syntactic analysis based on the structure of relation triplets in a predictive sentence, and incorporate domain-specific knowledge into the system. Moreover, their retrieval model is limited to a manually collected set of topics contained in predictions. Though applications of future-oriented information retrieval have been studied by a number of researchers, the problem of validating predictions from a Natural Language Understanding perspective has received limited study. [7] presents a search engine for future and past events relevant to a user's query. [3] automatically generates summaries of future events related to queries; their methods rely on extracting and processing statements containing future temporal references. [6] retrieves and ranks predictions that are relevant to a news article using four features: term similarity, entity-based similarity, topic similarity, and temporal similarity.

Relevance to Fact Checking and QA Systems: To some extent our problem can be compared with fact checking and question answering systems. Though research has been done on truth assessment of factual statements relying on iterative peer voting, leveraging language to infer the accuracy of fact candidates has only just begun. [14] calculates the credibility of an uncertain fact by comparing it with other related facts; fact validity is estimated from the co-occurrence degree of the doubted object and predicate, relying on page counts for web queries. [4] and [8] propose converting a fact-checking question into a set of factoid-style questions and validating the answers against those retrieved by factoid question answering systems. Our problem differs from existing fact checking and question answering systems in its retrieval problem, as we have to validate only the predictive part of a sentence and retrieve the relevant facts that occurred within the implicit temporal constraints imposed.

3 Predictions

3.1 Predictions Extraction

From each article, we annotate the sentences as predictive or factual using the implementation from [15]. It also identifies the predictive phrase in the prediction and resolves the scope of the prediction in a complex sentence.
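The classifier of [15] is not reproduced here. Purely as an illustration of the task, a naive baseline flags sentences carrying future-oriented modal or speculative cues; the cue list below is hypothetical and is not the feature set of [15].

```python
import re

# Hypothetical future/speculation cues; [15] uses a learned model, not this list.
FUTURE_CUES = re.compile(
    r"\b(will|would|may|might|is likely to|is expected to|predicted to|promise[sd]?)\b",
    re.IGNORECASE,
)

def looks_predictive(sentence: str) -> bool:
    """Naive baseline: a sentence counts as 'predictive' if it contains a future cue."""
    return bool(FUTURE_CUES.search(sentence))

print(looks_predictive("The Reserve Bank of India may lower the growth projection."))  # True
print(looks_predictive("The RBI lowered the growth projection in August."))           # False
```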
3.2 Semantic Graph Model for Predictive Sentence Simplification

News articles often contain long and syntactically complex sentences, with relevant dependent relations spanning various clauses. We need to identify constituents that commonly supply no more than contextual background information. Inspired by work on sentence simplification using relation graphs (https://github.com/Lambda-3/Graphene) and syntactic sub-structures [11, 1], we follow a syntax-based sentence simplification approach to determine such constituents and to annotate predictions with various attributes. We construct a Triplet-Level Semantic Graph Model (TLSGM), in which relation triplets are the vertices and the semantic relationships between the triplets govern the edges. From the TLSGM, we identify the core triplets of the predictive part of the sentence and dis-embed the other, peripheral triplets w.r.t. the head predictive phrase extracted in Section 3.1. Only these core triplets are then validated to determine the accuracy of the prediction.

Vertices: Vertices in the TLSGM represent (subject, predicate phrase, object) relation triplets extracted from the prediction.

Edges: An edge between two nodes N1 -> N2 represents the semantic relation of node N2 w.r.t. node N1. Edges can be formed either from the subject or object of a node to another node describing or modifying the noun phrase of that subject/object, following the rules for noun descriptors, while edges formed from a predicate to another node follow the verb descriptor rules given below. We illustrate the descriptor rules using the following example sentences.

1. Example 1: Mary Kom, who won Bronze at London Olympics, still has a fifty-fifty chance of gaining a wildcard entry to the 2016 Rio Olympics. (Mary Kom, has, fifty-fifty chance) is the head predictive triplet (H).

2. Example 2: The Reserve Bank of India is likely to leave interest rates unchanged in order to keep the inflation rate controlled.

Rules for Noun Descriptors

Modifiers and dependents of the head of the noun phrase of either the subject or object of a triplet are discussed below, categorized by dependency relation.

acl:relcl, appos: A relative clause modifier from the head noun of an NP to the head of a relative clause. The clause introduced by this dependency only gives additional information on the noun phrase and does not remark on the future predictive action, which is our focus of interest. Example 1 has the relation acl:relcl(Kom, won) from the subject of node H, and the edge between H and N2: (Mary Kom, won, Bronze at London Olympics) is only an additional descriptor of H. Node N2 and its edges are pruned from the graph.

acl: An adjectival clause introduced by a noun.
• If the dependent is a verb and it has no subject, it takes the object of the governor. Example 1 has the relation acl(chance, gaining) from the object of H, and the edge between H and N2: (fifty-fifty chance, gaining, wildcard entry to the 2016 Rio Olympics) further specifies the predictive action of H; hence node N2 is retained in the graph.
• If the dependent is an adjective, it only describes the subject/object. This relation is also used for optional depictives, which modify the nominal of which they provide a secondary predication. Example 2 has the relation acl(rates, unchanged) from the object of H, and the edge between H and N2: (interest rates, unchanged) acts as a qualifier reference for the entities contained in the prediction.
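As a minimal sketch of the underlying data structure (our own illustration, not the authors' implementation), the TLSGM for Example 1 can be held in a labeled directed graph: triplets are nodes, and each edge records the dependency rule that created it and whether the descriptor is kept or pruned.

```python
import networkx as nx

# Triplets are stored as (subject, predicate, object) tuples.
H  = ("Mary Kom", "has", "fifty-fifty chance")                 # head predictive triplet
N2 = ("Mary Kom", "won", "Bronze at London Olympics")          # relative clause descriptor
N3 = ("fifty-fifty chance", "gaining",
      "wildcard entry to the 2016 Rio Olympics")               # specifies predictive action

tlsgm = nx.DiGraph()
tlsgm.add_edge(H, N2, rule="acl:relcl", keep=False)  # background info: pruned
tlsgm.add_edge(H, N3, rule="acl", keep=True)         # kept as a core descriptor

# Core triplets = head plus neighbors reachable through 'keep' edges.
core = {H} | {v for _, v, d in tlsgm.out_edges(H, data=True) if d["keep"]}
print(core)
```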
Rules for Verb Descriptors

xcomp: An open clausal complement (xcomp) of a VP, without its own subject, whose reference is determined by an external subject.
• If the governor of the relation contains an object of its own, the clause introduced by xcomp provides attributes to the relation contained in the governor predicate and acts as a purpose or consequence clause. Ex: Microsoft share values may go down by 10 dollars to give space to the new iPhone launch. We create an edge (Microsoft share values, may go down, by 10 dollars) -> (, give, space to the new iPhone launch), governed by the relation xcomp(go, give).
• If the governor of the relation does not contain an object, the dependent predicate modifies the head predicate. We modify the predicate of the current node to include the dependent predicate connected by the xcomp relation. Example 2 has the relation xcomp(likely, leave); we modify H to (The Reserve Bank of India, is likely to leave, interest rates unchanged).

ccomp: A clausal complement of a verb is a dependent clause with an internal subject which functions like an object of the verb or adjective. The clause introduced further describes the future course of action referred to by the governor predicate. Ex: Modi promised that Indian GDP growth rate would cross 8% this year. This has the relation ccomp(promised, cross), which adds an edge from (Modi, promised, ) to (GDP growth rate, would cross, 8%).

advcl: An adverbial clause modifier of a VP or S is a clause modifying the verb to introduce a temporal, consequence, conditional or purpose clause, adding specificity to the head clause. Example 2 has the relation advcl(leave, keep), which adds an edge from H to triplet N2: (RBI, to keep, inflation rate controlled). The validity of the predictive sentence should be determined regardless of the truth of the purpose/conditional clause; hence node N2 and its edges are discarded from the graph.

3.3 Prediction Attributes

Each node in the TLSGM is further classified and labeled with reference to the root node, i.e., the head prediction node of the graph. We determine the characteristics of the following constituents using a number of syntactic features (dependency relation types, constituency-based parse trees, as well as POS and NER labels). Attributes: Action; Event; Event Location; Event Time; Purpose/Consequence of the predictive action; Premise; Conditional clause; Qualifier Reference, which adds specificity attributes of the entities involved in the prediction; Numeric Quantifier Reference; and Certainty Perspective, which isolates predictive stances taken by an author from third-party voices that the author presents.
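A minimal sketch of how these annotations could be carried alongside a prediction is given below. The field names are ours, mirroring the attribute list above; the paper's Table 1 defines the authoritative set.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PredictionAnnotation:
    # Core predictive content
    action: str                                 # head predictive phrase, e.g. "may lower"
    event: Optional[str] = None                 # e.g. "economic growth projection for 2017-18"
    event_location: Optional[str] = None
    event_time: Optional[str] = None            # e.g. "later this month"
    # Supporting structure
    purpose_or_consequence: Optional[str] = None
    premise: Optional[str] = None               # e.g. "in view of issues with GST implementation"
    conditional_clause: Optional[str] = None
    qualifier_refs: List[str] = field(default_factory=list)
    numeric_quantifier_refs: List[str] = field(default_factory=list)
    certainty_perspective: Optional[str] = None  # author's own stance vs third-party voice
```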
4 Extracting Relevant Facts

In this section, we describe our system for Automatic Prediction Validation (APV), which consists of three components: (1) a keyword selection module that selects keywords specific to the predictive part, dis-embedding the linguistic peripheral clauses identified in Section 3.2; (2) a Document Retriever module for finding facts relevant to the prediction; and (3) a machine comprehension model, the Document Reader, for ascertaining the accuracy of predictions from a small collection of relevant facts.

4.1 Keyword Selection

Obtaining the facts pertinent to a prediction is in itself a complicated problem. Predictions have event- and temporal-based constraints, clausal complements, appositives, relative clauses, etc. that add specificity to or modify the action of an event. To overcome the query drift introduced by these clauses, we further dis-embed the keywords expressing the time constraints, premise clauses, certainty perspective (annotated in Section 3.2) and the speculative words used. We identify the headword of the predictive phrase and use a rule-based approach to detect the predictive sentence fragments and to select keywords pertaining to the predictive action and its attributes in the sentence. Let K be the set of relation triplets; we add the head vertex of the graph (TLSGM) to K and recursively add selected nodes from its edges to K. We select nodes with edge labels corresponding to the Action, Event, Qualifier and Quantifier References described in Table 1. We then issue proximity queries in which subject, predicate and object must occur within a window of 7 words. We further expand the query set iteratively by adding purpose clauses, and expand the keywords in a query with their synonyms.

Example: For the predictive sentence "Lizzie Armitstead is predicted to win gold medal in cycling road race at the Rio Olympics."
Query: (Lizzie Armitstead ~ win ~ gold medal) OR (Lizzie Armitstead ~ win ~ cycling road race) OR (Lizzie Armitstead ~ win ~ Rio Olympics)
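A sketch of how such queries could be rendered in Lucene's phrase-proximity syntax, assuming a standard Lucene/Elasticsearch query parser (the paper does not give its exact query form):

```python
def proximity_query(subject: str, predicate: str, objects: list, window: int = 7) -> str:
    """Build a Lucene-style query: each (subject, predicate, object) triplet becomes a
    phrase whose terms must co-occur within `window` words (Lucene phrase slop)."""
    clauses = [f'"{subject} {predicate} {obj}"~{window}' for obj in objects]
    return " OR ".join(clauses)

q = proximity_query(
    "Lizzie Armitstead", "win",
    ["gold medal", "cycling road race", "Rio Olympics"],
)
print(q)
# "Lizzie Armitstead win gold medal"~7 OR "Lizzie Armitstead win cycling road race"~7 OR ...
```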
4.2 Candidate Relevant Facts Extraction from Wikipedia

To extract pertinent facts that can ascertain the accuracy of the predictions, we use Wikipedia as the knowledge source. Wikipedia's publicly available APIs (https://www.mediawiki.org/wiki/API:Search) give access to the revision history of each article, and its up-to-date knowledge marked with timestamps makes it a reliable source for event-based prediction validation. We use TagMe (https://tagme.d4science.org/tagme/) as a semantic interpreter that maps fragments of natural language text into a weighted sequence of Wikipedia concepts relevant to the input. Using the query set from Section 4.1, we extract the top 50 documents from a local Lucene index of the English Wikipedia dump. To further extract the relevant snippet from an article, we include only the article content whose revision dates fall within the time window referenced by the temporal constraints extracted for the validity of the prediction. We use the word2vec implementation in the Gensim Python library [13], with Wikipedia as the corpus, to generate embeddings representing contextual term vectors. Inspired by [10], we adapt the Zero Filter, Terms Filter, Exact Sequence Filter, Normalization Filter, N-grams Filter and Density Filter to extract and rank the relevant candidate facts from the retrieved articles. Additionally, we implement the following filters:

• Distance filter: Assigns a score to a fact based on the distance between the subject and object of each triplet in the prediction.
• Category filter: For all the annotated Wikipedia concepts in the prediction and facts, we build category vectors and assign a score based on the cosine similarity between the prediction category vector and the fact category vector.
• Wikipedia concept relevance: Cumulative pairwise similarity score of the Wikipedia concepts extracted from the prediction and from the fact's context in the Wiki article.
• Context similarity: Distributional semantic similarity score between words and phrases from the prediction and the fact.

From these candidate facts, we keep the top 100 facts sorted by their current score.

4.3 Validation of Predictions

Our approach translates the prediction and the fact into a semantic representation, incorporating knowledge from external sources, and then tries to determine whether the representation of the prediction is subsumed by that of the fact. We pass all (prediction, fact) pairs to two components: 1. the Relation Alignment for Textual Similarity Recognition (RATSR) framework, described below, and 2. an RTE system which performs rich syntactic analysis of the linguistic phenomena between the entailment pair.

The RATSR framework has three major components: 1. Preprocessor: prediction and fact pairs are annotated with a range of analytical tools. 2. Graph Generator: applies metrics to compare triplets in specified annotation views to generate a match graph over the prediction and fact constituents of the entailment pair. 3. Alignment Scorer: filters the edges in the match graph to focus on a scoring function based on the alignment output.

(1) Preprocessor: Sentence and word segmentation; POS tagging; dependency parsing; named entity recognition; co-reference resolution; temporal expression identifiers; Wikipedia concept annotator; multi-word expression identifiers (https://radimrehurek.com/gensim/models/phrases.html#id2); phrasal verb identifiers; quantifier and qualifier references. These resources are used for annotating both predictions and facts at the sentence level and the triplet level.

(2) Graph Generator: Similarity metrics are applied to the relevant constituent pairs drawn from the prediction and the fact. [2] uses relation triplet similarity, calculated across subject, verb and object pairs using PPDB [12], as a feature for stance classification. We construct a relation match graph (RMG) by iterating over each triplet in the prediction and the fact, calculating similarity over various views to give a similarity score between the two triplets being compared, and creating an edge with the similarity score as its weight. We propose methods for similarity between triplets for the various annotations mentioned in the preprocessing step.

• Triplet Similarity Score using Latent Semantic Analysis models (Score = S1): Adapting the implementation from [9] and using multiplication as the vector composition operator for phrases with more than one word, we define the similarity of SPO triplets using distributional models as given below. The probability that fact triplet t_f: (s_f, v_f, o_f) implies prediction triplet t_p: (s_p, v_p, o_p) is

  P(t_p -> t_f) = P(s_p | t_f)(1 - P(s_p)) + P(v_p | t_f)(1 - P(v_p)) + P(o_p | t_f)(1 - P(o_p))   (1)

  P(s_p | t_f) = P(s_p | s_f) + P(s_p | v_f) + P(s_p | o_f)   (2)

• Triplet Similarity Score using Lexical Semantic models (Score = S2): We calculate similarity scores between the subject, predicate and object pairs of the prediction and fact using synonym and antonym similarity from WordNet, PPDB and Wikipedia concept similarity; hyponym and hypernym similarity from the Wikipedia and WordNet taxonomy structures; the length of the path between the two entities in DBpedia; and numeric reference similarity. We then combine these scores into a cumulative lexical similarity score between the two triplets.

(3) Alignment: The goal of the alignment component is to decompose the text and hypothesis into semantic constituents and determine which prediction triplet should be aligned to which fact triplet. In contrast to aligning words [2] from prediction to fact, we align triplets: this exploits the semantic roles of the constituents, facilitates the analysis of which specific prediction attributes (in Table 1) are matched in the fact, and allows validation against a cluster of relevant facts. We use a maximum-weight perfect bipartite graph matching algorithm to align triplets from the prediction to relevant triplets from the facts; a sketch of this scoring and matching is given at the end of this subsection.

From the similarity scores obtained from the RATSR framework and the RTE system, we set threshold limits to label each entailment pair as true, false or unrelated.
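The sketch below illustrates equations (1)-(2) together with the alignment step, under our own simplifications rather than the authors' code: the distributional probabilities P(x | y) are stood in for by clipped cosine similarities of word2vec-style vectors, the marginals P(s_p), P(v_p), P(o_p) by a fixed prior, phrase composition is omitted, and the maximum-weight matching is realized with the Hungarian algorithm from SciPy (which minimizes cost, hence the negated weights).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sim(u: np.ndarray, v: np.ndarray) -> float:
    """Stand-in for P(x | y): cosine similarity clipped to [0, 1]."""
    c = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(0.0, c)

def triplet_score(tp, tf, prior=0.5):
    """Eqs. (1)-(2): score that fact triplet tf implies prediction triplet tp.
    tp and tf are (subject, verb, object) embedding triples; `prior` stands in
    for the marginals P(s_p), P(v_p), P(o_p), which we do not estimate here."""
    score = 0.0
    for x in tp:                                   # x ranges over s_p, v_p, o_p
        p_x_given_tf = sum(sim(x, y) for y in tf)  # eq. (2)
        score += p_x_given_tf * (1.0 - prior)      # eq. (1)
    return score

def align(pred_triplets, fact_triplets):
    """Maximum-weight bipartite matching of prediction triplets to fact triplets."""
    W = np.array([[triplet_score(tp, tf) for tf in fact_triplets]
                  for tp in pred_triplets])
    rows, cols = linear_sum_assignment(-W)         # negate: SciPy minimizes cost
    return list(zip(rows, cols)), W[rows, cols].sum()

# Toy usage with random 50-d embeddings, two triplets on each side.
rng = np.random.default_rng(0)
pred = [tuple(rng.normal(size=50) for _ in range(3)) for _ in range(2)]
fact = [tuple(rng.normal(size=50) for _ in range(3)) for _ in range(2)]
print(align(pred, fact))
```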
5 Results & Discussion

Dataset Preparation: We collected two datasets, one of predictions in the sports domain and the other of campaign promises made by Barack Obama. We automatically extracted predictions from articles on the Rio Olympics from 6 sites (A: https://www.eurosport.co.uk, B: http://edition.cnn.com/, C: https://www.foxsports.com.au, D: http://www.couriermail.com.au/, E: https://www.theguardian.com/, F: https://www.thehindu.com/news) and manually filtered the predictions that can be objectively evaluated and reduced to factoid questions. The resulting 'Olympics Predictions' dataset consists of 97 predictions made for various events in the trials for the Rio Olympics and the Rio Olympics 2016. We further manually annotated each prediction as true if it came true, and false otherwise. We collected the second dataset, 'Obama Promises', from PolitiFact (http://www.politifact.com/truth-o-meter/promises/obameter/browse/), where each promise is labeled 'broken', 'promise kept' or 'compromised'. We collected 257 such promises that can be objectively evaluated.

We ran our prediction validation system on the two datasets to label each prediction. Table 2 presents the credibility scores (normalized by the number of predictions) of the 6 sources we considered, and Table 3 compares the labels produced by our system against the actual labels.

Table 2: Credibility scores for the news sites

  Source | True Predictions | False Predictions | Factual Statements | Credibility Score
  A      | 14               | 7                 | 67%                | 29.3
  B      | 4                | 9                 | 88%                | 17.4
  C      | 7                | 6                 | 72%                | 31.1
  D      | 7                | 4                 | 69%                | 37.2
  E      | 7                | 10                | 71%                | 21.3
  F      | 9                | 13                | 66%                | 21.7

Table 3: Results for predictions validation

  Dataset        | TP  | TN | FP | FN | F-score
  Rio Olympics   | 37  | 29 | 20 | 9  | 0.718
  Obama Promises | 111 | 30 | 26 | 93 | 0.651
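The F-scores in Table 3 are consistent with the standard formula F1 = 2TP / (2TP + FP + FN), as a quick check confirms:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from confusion counts: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)

print(round(f1(37, 20, 9), 3))    # Rio Olympics   -> 0.718
print(round(f1(111, 26, 93), 3))  # Obama Promises -> 0.651
```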
Discussion: 'Obama Promises' contains multi-sentence predictions and requires more robust NLP modules to identify the main predictive clause that has to be validated, as distinct from other supporting predictive clauses (example: "Create a $10 billion fund to help homeowners refinance or sell their homes. The Fund will not help speculators, people who bought vacation homes or people who falsely represented their incomes"). The high false negative rate can be attributed to drift in both the fact retrieval module and the validation module caused by these insignificant predictive clauses. 'Rio Predictions' contains mostly event-based predictions, and the high false positive rate for this dataset is partly due to omitting explicit negative entity similarity in the context of a given prediction. For example, the entities 'Usain Bolt' and 'Wayde van Niekerk' are negatively related in the context of 'winning a medal at Rio Olympics'. This negative similarity should be translated into negative triplet similarity, and further into labeling the prediction-fact entailment pair as a contradiction. We plan to address this in future work by generating alternative statements for a prediction, automatically identifying the doubt unit in a sentence and filling it with relevant comparable entities/phrases.
References

[1] Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 344–354, 2015.

[2] William Ferreira and Andreas Vlachos. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1163–1168, 2016.

[3] Adam Jatowt, Kensuke Kanazawa, Satoshi Oyama, and Katsumi Tanaka. Supporting analysis of future-related information in news archives and the web. In Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 115–124. ACM, 2009.

[4] Hiroshi Kanayama, Yusuke Miyao, and John Prager. Answering yes/no questions via question inversion. In Proceedings of COLING 2012, pages 1377–1392, 2012.

[5] Kensuke Kanazawa, Adam Jatowt, and Katsumi Tanaka. Improving retrieval of future-related information in text collections. In Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01, pages 278–283. IEEE Computer Society, 2011.

[6] Nattiya Kanhabua, Roi Blanco, and Michael Matthews. Ranking related news predictions. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 755–764. ACM, 2011.

[7] Hideki Kawai, Adam Jatowt, Katsumi Tanaka, Kazuo Kunieda, and Keiji Yamada. ChronoSeeker: Search engine for future and past events. In Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, page 25. ACM, 2010.

[8] Mio Kobayashi, Ai Ishii, Chikara Hoshino, Hiroshi Miyashita, and Takuya Matsuzaki. Automated historical fact-checking by passage retrieval, word statistics, and virtual question-answering. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 967–975, 2017.

[9] Dmitrijs Milajevs, Mehrnoosh Sadrzadeh, and Thomas Roelleke. IR meets NLP: On the semantic similarity between subject-verb-object phrases. In Proceedings of the 2015 International Conference on the Theory of Information Retrieval, pages 231–240. ACM, 2015.

[10] Piero Molino, Pierpaolo Basile, Annalina Caputo, Pasquale Lops, and Giovanni Semeraro. Exploiting distributional semantic models in question answering. In 2012 IEEE Sixth International Conference on Semantic Computing (ICSC), pages 146–153. IEEE, 2012.

[11] Christina Niklaus, Bernhard Bermeitinger, Siegfried Handschuh, and André Freitas. A sentence simplification system for improving relation extraction. arXiv preprint arXiv:1703.09013, 2017.

[12] Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 425–430, 2015.

[13] Radim Řehůřek and Petr Sojka. Software framework for topic modelling with large corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May 2010. ELRA. http://is.muni.cz/publication/884893/en.

[14] Yusuke Yamamoto and Katsumi Tanaka. Finding comparative facts and aspects for judging the credibility of uncertain facts. In Web Information Systems Engineering - WISE 2009, pages 291–305, 2009.

[15] Navya Yarrabelly and Kamalakar Karlapalem. Extracting predictive statements with their scope from news articles. In The 12th International AAAI Conference on Web and Social Media (ICWSM-18), submitted for publication.